Skip to content
Snippets Groups Projects

Updated cell splitting strategy

Merged Pedro Gonnet requested to merge faster_rebuilds into master

It's a small improvement, but it's also just a start.

Edited by Matthieu Schaller

Merge request reports

Merged by avatar (May 25, 2025 8:32am UTC)

Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Should space_split_recursive() construct the content of the buffer in the cell's frame (i.e. removing cell->loc from the positions)? I am just thinking about cases where the top-level mesh is large and the tree rather deep.

  • Author Developer

    Not doing that quite yet, still want to see what we can gain from building this in different places... I.e. this is still WIP, but I think for now we can go with this first change.

    Edited by Pedro Gonnet
  • ok. Let's just make sure that an EAGLE_100 or EAGLE_50x2 or runs.

  • Or EAGLE_50, a run of that aborts, with debugging checks on:

    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    application called MPI_Abort(MPI_COMM_WORLD, -1) - process 5
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
    

    Just running SPH on 8 nodes of COSMA5.

  • In fact with debugging checks on the much simpler run.sh fails on EAGLE_6.

  • Author Developer

    The check that fails is the following:

      for (int k = 0; k < count; k++) {
        if (buff[k].x[0] != parts[k].x[0] || buff[k].x[1] != parts[k].x[1] ||
            buff[k].x[2] != parts[k].x[2])
          error("Inconsistent buff contents.");
      }

    Which is no longer true if buff holds floats and part.x holds doubles. I will change this test to

      for (int k = 0; k < count; k++) {
        if (buff[k].x[0] != (float)parts[k].x[0] || buff[k].x[1] != (float)parts[k].x[1] ||
            buff[k].x[2] != (float)parts[k].x[2])
          error("Inconsistent buff contents.");
      }

    which should not fail.

  • Author Developer

    Argh... Fixed that, now it fails in checkCellhdxmax...

    Wondering what will happen if I round the cell locations to floats...

  • Author Developer

    Actually, this just breaks a whole bunch of other stuff. What a mess.

  • Author Developer

    OK, screw it, I just realized that we weren't really using the buffers the way I though we were.

    Currently, we pack the particle position and an int. In each recursion level of space_split_recursive, cell_split is called and assumes that the buffer and the parts array match. The int in the buffer is then set to the sub-cell ID of each part, and both the buffer and the parts are sorted into the right sub-cells.

    So what could we do differently? Well, instead of keeping the buffer and parts in sync at ever recursion level, store two ints in the buffer: one corresponding to the particle offset in the super-cell, and the other to be used for sorting into the sub-cells. Then do the whole recursion for a top-level cell, but only sorting the buffer entries into the correct cells. Once the buffer entries have been sorted into their cells, use the particle offsets stored in the buffer to move each particle to its rightful spot.

    This is a bit of a major change, so I won't finish it tonight (or if I do, I'll break a whole lot of stuff in the process), so watch this space!

  • Looks like we think alike: !298 (closed) and re-try old ideas. Might be better now...

  • As discussed this morning you can ecrase the "weight" of your particles to a relevant point for EAGLE-like things by configuring with --with-chemistry=EAGLE and adding

    EAGLEChemistry:
      InitMetallicity:         0.
      InitAbundance_Hydrogen:  0.752
      InitAbundance_Helium:    0.248
      InitAbundance_Carbon:    0.000
      InitAbundance_Nitrogen:  0.000
      InitAbundance_Oxygen:    0.000
      InitAbundance_Neon:      0.000
      InitAbundance_Magnesium: 0.000
      InitAbundance_Silicon:   0.000
      InitAbundance_Iron:      0.000
      CalciumOverSilicon:      0.0941736
      SulphurOverSilicon:      0.6054160

    to your YAML file.

  • Pedro Gonnet added 4 commits

    added 4 commits

    • d6d517d2 - Revert "use floats instead of double for cell splitting, makes space_rebuild ~10% faster."
    • 63024b65 - first step in breaking stuff, modify cell_split so that it only splits the…
    • 88cf897b - add a wrapper space_split_cell that allocates the buffers and re-sorts the…
    • 35acbea9 - add particle re-ordering at the bottom of the recusrion in…

    Compare with previous version

  • Pedro Gonnet added 1 commit

    added 1 commit

    • c6d9d0fb - fix the shuffling so that it now actually works, still need to re-ling parts and gparts.

    Compare with previous version

  • Pedro Gonnet added 1 commit

    added 1 commit

    • ef9f6714 - add a function to link both parts and sparts to gparts in one go.

    Compare with previous version

  • Pedro Gonnet added 1 commit

    added 1 commit

    • 8596ab82 - put it all together. works for hydro, but i get a wrong time bin with gravity.

    Compare with previous version

  • Pedro Gonnet added 134 commits

    added 134 commits

    Compare with previous version

  • Author Developer

    So this now works for hydro, but if I switch on gravity I get

    [00011.9] space_rebuild: (re)building space
    [00014.0] space_parts_sort: Sorting succeeded.
    [00016.0] space_gparts_sort: Sorting succeeded.
    [00037.7] engine_init_particles: Converting internal energy variable.
    [00038.0] engine_init_particles: Running initial fake time-step.
    [00058.0] runner.c:runner_do_kick2():1177: Particle in wrong time-bin
    
    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fff37a23700 (LWP 27474)]
    0x00007ffff6a24c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

    @matthieu, any ideas what might be causing this? I'll have a closer look myself as soon as I can!

  • What test-case, configuration options and runtime flags are you using here?

    Edited by Matthieu Schaller
  • Author Developer

    Sorry, I should have been more specific -- I though maybe you could see something bad in the diffs directly.

    I was running EAGLE_12 out of the box with -g. The code was configured with checks enabled. If I run without -g, all goes well, and it looks like the failing check is on a gpart, so it's probably something I messed up there, but I can't see what :frowning2:

  • I can't see anything obvious from the diff. I'll have a more detailed look later. I can imagine that some particle counts or cell-wide time-steps are not computed correctly which can lead to this issue.

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading