Rewrite of MPI loops - Add stellar feedback loops
This includes and supersedes @lhausammann's !700 (closed) merge request.
Also fixes #522 (closed), #476 (closed), #449 (closed), the original issue of #515 (closed) (the negative wait, not the hanging), and possibly #537 (closed) and #520 (closed).
The main changes are:
- Each block of tasks (hydro, gravity, stars) is now at a fixed level. Only local tasks can move up and down between levels.
- MPI communications happen within one block only.
- Add MPI communications for stars density and feedback (original !700 (closed)).
- Add a separate drift task for the stars.
- Move all the star feedback tasks to after the star formation task (itself after the cooling).
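The ordering in the last bullet can be pictured as a small dependency graph. The sketch below is only an illustration in Python, not SWIFT's scheduler code: the task names are hypothetical labels, and placing the stars density task before the stars feedback task is an assumption.

```python
from graphlib import TopologicalSorter

# Hypothetical task labels (not SWIFT identifiers).
# Each key maps to the set of tasks it must wait for.
deps = {
    "star_formation": {"cooling"},         # star formation runs after the cooling
    "stars_density": {"star_formation"},   # assumption: density precedes feedback
    "stars_feedback": {"stars_density"},   # feedback runs after star formation
}

# A topological sort yields an execution order that respects every dependency.
order = list(TopologicalSorter(deps).static_order())
print(" -> ".join(order))
```

Because the chain is linear, there is only one valid order, with the cooling first and the feedback last.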
This was heavily tested, but since it's a big change, a third pair of eyes is very welcome!
Activity
@lhausammann could you check that it does not break your galaxy examples in the GEAR directory?
@folkert could you check that this does not break your isolated galaxy example?
@jkeger could you check that this does not break your planet impact example?
mentioned in merge request !700 (closed)
added 1 commit
- feb8c0b8 - The kick2 should always depend upon the stars drift even when using feedback.
Running with your updated EAGLE_12 ICs, with debugging checks on, I see:
[0002] [00229.9] runner_doiact_stars.h:runner_do_nonsym_pair_stars_density():209: Particle pj not drifted to current time
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
after 79 steps.
The command was:
mpirun -np 14 ../../swift_mpi -a -t 4 --cosmology --hydro --self-gravity --stars --feedback --threads=4 eagle_12.yml
and configure:
./configure --with-parmetis --with-feedback=thermal --with-stars=EAGLE --with-star-formation=EAGLE --with-entropy-floor=EAGLE --with-chemistry=EAGLE --disable-hand-vec --enable-debug --enable-debugging-checks --enable-sanitizer --enable-undefined-sanitizer
Adding some debugging we see:
[0001] [00376.9] runner_doiact_stars.h:runner_do_nonsym_pair_stars_density():209: Particle pj not drifted to current time (211106232532992 != 281474976710656
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
and it stopped on the same line, so repeatable.
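The two integers in that message are worth decoding: 281474976710656 is exactly 2^48 and 211106232532992 is exactly 3·2^46. The sketch below is a Python illustration, not the SWIFT check itself. It assumes, from the order in the message, that the first value is the particle's drift time and the second is the current time; `check_drifted` is a hypothetical helper.

```python
def check_drifted(ti_drift: int, ti_current: int) -> None:
    """Hypothetical stand-in for the debugging check that fired."""
    if ti_drift != ti_current:
        raise RuntimeError(
            f"Particle pj not drifted to current time ({ti_drift} != {ti_current})"
        )

# Values copied from the error message above.
# Assumption: first is the particle's drift time, second the current time.
ti_drift = 211106232532992
ti_current = 281474976710656

assert ti_current == 2**48
assert ti_drift == 3 * 2**46

lag = ti_current - ti_drift  # how far behind the particle is, in integer time units
print(f"lag = 2**{lag.bit_length() - 1}, drift/current = {ti_drift / ti_current}")

try:
    check_drifted(ti_drift, ti_current)
except RuntimeError as err:
    print(err)
```

Under that assumption the particle is behind by 2^46 integer time units, a quarter of the current time, which suggests a whole drift was skipped rather than a rounding problem.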
added 1 commit
- e8f0359e - Isolated galaxy example: change Birth_time to BirthTime similar as has been done in the code
Thanks Peter. It looks like I am missing a dependency. Running on 4 nodes, it took 2000 steps before reaching a similar situation. Back to the drawing board. BTW, I have also added the EAGLE-6 and EAGLE-25 examples with stars to the binary repository.
Thanks for the update @folkert. The new name is more in line with the other variables. I should have updated the analysis script as well.
added 3 commits
added 1 commit
- 45bf163e - Removed debugging code. Code formatting and documentation.
added 1 commit
- e898607a - Removed last traces of the debugging communication task.
@pdraper I have now run the offending example above for more than 2000 steps without any issues. There was indeed a missing dependency.
Indeed. Running with that I get:
[0001] [00315.0] runner.c:runner_do_stars_ghost():267: Smoothing length correction not going in the right direction
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
during space_rebuild.
Edited by Peter W. Draper
You will need to increase the smallest h in the IC. The method used to compute the smoothing length seems to underestimate some h and raises this bug.
Edited by Loic Hausammann