Updated cell splitting strategy
It's a small improvement, but it's also just a start.
Merge request reports
Activity
Not doing that quite yet, still want to see what we can gain from building this in different places... I.e. this is still WIP, but I think for now we can go with this first change.
Edited by Pedro GonnetOr EAGLE_50, a run of that aborts, with debugging checks on:
[0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. application called MPI_Abort(MPI_COMM_WORLD, -1) - process 5 [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents. [0005] [00170.6] cell.c:cell_split():790: Inconsistent buff contents.
Just running SPH on 8 nodes of COSMA5.
The check that fails is the following:
for (int k = 0; k < count; k++) { if (buff[k].x[0] != parts[k].x[0] || buff[k].x[1] != parts[k].x[1] || buff[k].x[2] != parts[k].x[2]) error("Inconsistent buff contents."); }
Which is no longer true if
buff
holds floats andpart.x
holds doubles. I will change this test tofor (int k = 0; k < count; k++) { if (buff[k].x[0] != (float)parts[k].x[0] || buff[k].x[1] != (float)parts[k].x[1] || buff[k].x[2] != (float)parts[k].x[2]) error("Inconsistent buff contents."); }
which should not fail.
OK, screw it, I just realized that we weren't really using the buffers the way I though we were.
Currently, we pack the particle position and an
int
. In each recursion level ofspace_split_recursive
,cell_split
is called and assumes that the buffer and the parts array match. Theint
in the buffer is then set to the sub-cell ID of each part, and both the buffer and the parts are sorted into the right sub-cells.So what could we do differently? Well, instead of keeping the buffer and parts in sync at ever recursion level, store two
int
s in the buffer: one corresponding to the particle offset in the super-cell, and the other to be used for sorting into the sub-cells. Then do the whole recursion for a top-level cell, but only sorting the buffer entries into the correct cells. Once the buffer entries have been sorted into their cells, use the particle offsets stored in the buffer to move each particle to its rightful spot.This is a bit of a major change, so I won't finish it tonight (or if I do, I'll break a whole lot of stuff in the process), so watch this space!
Looks like we think alike: !298 (closed) and re-try old ideas. Might be better now...
As discussed this morning you can ecrase the "weight" of your particles to a relevant point for EAGLE-like things by configuring with
--with-chemistry=EAGLE
and addingEAGLEChemistry: InitMetallicity: 0. InitAbundance_Hydrogen: 0.752 InitAbundance_Helium: 0.248 InitAbundance_Carbon: 0.000 InitAbundance_Nitrogen: 0.000 InitAbundance_Oxygen: 0.000 InitAbundance_Neon: 0.000 InitAbundance_Magnesium: 0.000 InitAbundance_Silicon: 0.000 InitAbundance_Iron: 0.000 CalciumOverSilicon: 0.0941736 SulphurOverSilicon: 0.6054160
to your YAML file.
added 4 commits
- d6d517d2 - Revert "use floats instead of double for cell splitting, makes space_rebuild ~10% faster."
- 63024b65 - first step in breaking stuff, modify cell_split so that it only splits the…
- 88cf897b - add a wrapper space_split_cell that allocates the buffers and re-sorts the…
- 35acbea9 - add particle re-ordering at the bottom of the recusrion in…
Toggle commit listadded 1 commit
- c6d9d0fb - fix the shuffling so that it now actually works, still need to re-ling parts and gparts.
added 1 commit
- ef9f6714 - add a function to link both parts and sparts to gparts in one go.
added 1 commit
- 8596ab82 - put it all together. works for hydro, but i get a wrong time bin with gravity.
added 134 commits
-
8596ab82...079f0ea3 - 133 commits from branch
master
- 3cb72fca - Merge remote-tracking branch 'origin/master' into faster_rebuilds
-
8596ab82...079f0ea3 - 133 commits from branch
So this now works for hydro, but if I switch on gravity I get
[00011.9] space_rebuild: (re)building space [00014.0] space_parts_sort: Sorting succeeded. [00016.0] space_gparts_sort: Sorting succeeded. [00037.7] engine_init_particles: Converting internal energy variable. [00038.0] engine_init_particles: Running initial fake time-step. [00058.0] runner.c:runner_do_kick2():1177: Particle in wrong time-bin Program received signal SIGABRT, Aborted. [Switching to Thread 0x7fff37a23700 (LWP 27474)] 0x00007ffff6a24c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
@matthieu, any ideas what might be causing this? I'll have a closer look myself as soon as I can!
What test-case, configuration options and runtime flags are you using here?
Edited by Matthieu SchallerSorry, I should have been more specific -- I though maybe you could see something bad in the diffs directly.
I was running
EAGLE_12
out of the box with-g
. The code was configured with checks enabled. If I run without-g
, all goes well, and it looks like the failing check is on agpart
, so it's probably something I messed up there, but I can't see what