Draft: MPI for sink particles
Done so far:
- Sink particle exchange with the proxies (proxy.h/.c, engine_strays.c, etc.)
- Tasks implementation
- cell_unskip is up to date, but not engine_marktasks
To do:
- Clean up debugging messages (some may be worth keeping)
- Clean up/update comments
- Check some comments I left alongside the black holes when I found weird things/potential bugs.
- Add !1938 (merged) for sink particles
- I disabled calls to gravity functions in GEAR sink sink_iact.h. Move these to some sink_data/sink_merger attribute (similar to potential) to avoid MPI issues when the cells are foreign.
- Check compilation with Default sink
- Take into account the fact that we can use SF and SF_sink at the same time --> use a different sf_counts for SF_sink
Documentation:
- Add a list of MPI-related files
- Update the sink doc to say where we can access gpart and where we cannot, to avoid MPI issues.
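The to-do item about moving the gravity calls into a sink attribute could look roughly like the sketch below. It assumes that a sink's gpart pointer cannot be relied on when the sink lives in a foreign cell (the log further down shows gpart = (nil)), so the value is copied into the sink itself while the sink is still local. All names (toy_sink, toy_gpart, toy_sink_cache_potential, toy_sink_get_potential) are hypothetical and are not the SWIFT structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gravity particle. */
struct toy_gpart {
  double potential;
};

/* Hypothetical stand-in for a sink particle. */
struct toy_sink {
  struct toy_gpart *gpart; /* Assumed NULL/invalid on a foreign copy. */
  double potential;        /* Cached copy that travels with the sink. */
};

/* On the owning rank: copy the value out of the gpart while the
 * pointer is still valid. Does nothing if there is no gpart. */
void toy_sink_cache_potential(struct toy_sink *s) {
  if (s->gpart != NULL) s->potential = s->gpart->potential;
}

/* Interaction code reads only the cached value, so it never
 * dereferences gpart and works for local and foreign sinks alike. */
double toy_sink_get_potential(const struct toy_sink *s) {
  return s->potential;
}
```

The design choice here is simply that anything the interaction loops need from the gpart is duplicated into the data that is actually sent over MPI.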
Activity
added 26 commits
- 68a99629...a0a82398 - 25 commits from branch master
- c0d88cea - Merge with master + fix conflict
added 1 commit
- cfa4be56 - Select the right color for send/recv_sink_density
Bug: I have
[0002] [00199.0] runner_do_nonsym_pair_sinks_naive_swallow: WARNING: Particle sj -666 not drifted to current time. Mass = 0.000000e+00, gpart = (nil). si->id = 12796, ci->cellID = 327686, cj->cellID = 458757, ci->nodeID = 2, cj->nodeID = 0, ci->sinks.count = 1, cj->sinks.count = 2, ci->super = 6, cj->super = 5
This is during the pair swallow interactions: runner_sinks.c:runner_do_nonsym_pair_sinks_naive_swallow(). It seems that sj was not initialized. I set the value -666 in engine.c for memory allocation.
Strangely, when I comment out L254, runner_iact_nonsym_sinks_gas_swallow(), the error disappears... Swallowing problem? Memory problem when swallowing? Is everything correctly updated?
The problem is caused by gas swallowing in pair interactions. When the two cells are on the local node, there is no problem. The crash happens when the two cells interacting for the sink swallow are one foreign and one local (as in the log above, where ci is on node 2 and cj on node 0). It seems that the pointers of the foreign cell part or sink are, at some point, not updated. So, when we access the last sink in the foreign cell, we are accessing a non-initialized sink, which is not drifted.
I tried deactivating runner_do_sinks_sink_swallow(), but the crash still happened. I tried deactivating runner_do_sinks_gas_swallow(), which allows the simulation to run until the end.
Given that the sink has an id = -666, a timebin = time_bin_not_created and all its properties set to 0 or NULL, we are reading a sink slot that lies past the last initialized sink in memory. Since deactivating gas removal "solves" the problem, the bug is linked to gas removal, which seems to leave a wrong memory offset.
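The diagnosis above relies on a sentinel id written at allocation time. A minimal sketch of that debugging technique follows, assuming a simplified sink buffer; the names toy_sink, TOY_SENTINEL_ID, toy_sinks_alloc and toy_first_uninitialised are hypothetical and only the value -666 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sink particle; not the SWIFT struct. */
struct toy_sink {
  long long id;
  double mass;
};

/* Sentinel written into every slot at allocation time. */
#define TOY_SENTINEL_ID (-666LL)

/* Allocate `capacity` zeroed sinks and stamp each with the sentinel id.
 * Returns NULL if the allocation fails. */
struct toy_sink *toy_sinks_alloc(size_t capacity) {
  struct toy_sink *s = calloc(capacity, sizeof(struct toy_sink));
  if (s == NULL) return NULL;
  for (size_t i = 0; i < capacity; i++) s[i].id = TOY_SENTINEL_ID;
  return s;
}

/* Return the index of the first slot in [0, count) that still carries
 * the sentinel, or -1 if all `count` slots look initialised. A hit
 * means `count` (or the pointer offset) covers memory that was never
 * filled with a real particle. */
long toy_first_uninitialised(const struct toy_sink *s, size_t count) {
  for (size_t i = 0; i < count; i++)
    if (s[i].id == TOY_SENTINEL_ID) return (long)i;
  return -1;
}
```

Calling such a check on a cell's sink array before the pair interaction would flag a stale count or offset earlier than the "not drifted" warning does.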
Update: The bug is probably caused by "broken" foreign hydro super-cell dependencies. This is cell 5, which is local on node 0; the picture shows the tasks' dependencies on node 1. Green arrows show the missing dependencies.
So, I "simply" need to add the missing dependencies...
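To illustrate why a missing dependency lets a task touch data that has not arrived yet, here is a toy wait-counter model of task dependencies. It is a sketch under assumptions, not the SWIFT scheduler: toy_task, toy_addunlock, toy_is_runnable and toy_complete are hypothetical names.

```c
#include <assert.h>

#define TOY_MAX_UNLOCKS 4

/* Hypothetical miniature task: `wait` counts unresolved dependencies. */
struct toy_task {
  int wait;
  int done;
  int nr_unlocks;
  struct toy_task *unlocks[TOY_MAX_UNLOCKS];
};

/* Record that `tb` must not run before `ta` has completed.
 * Returns 0 on success, -1 if `ta` cannot hold more unlocks. */
int toy_addunlock(struct toy_task *ta, struct toy_task *tb) {
  if (ta->nr_unlocks >= TOY_MAX_UNLOCKS) return -1;
  ta->unlocks[ta->nr_unlocks++] = tb;
  tb->wait++;
  return 0;
}

/* A task may run only once all of its dependencies have completed. */
int toy_is_runnable(const struct toy_task *t) {
  return !t->done && t->wait == 0;
}

/* Mark `t` as done and release every task it unlocks. */
void toy_complete(struct toy_task *t) {
  t->done = 1;
  for (int i = 0; i < t->nr_unlocks; i++) t->unlocks[i]->wait--;
}
```

In this model, a swallow task with no unlock from the matching receive task is runnable immediately, which mirrors the situation where the foreign cell's data is read before it has been updated.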
Edited by Darwin
added 5 commits
- 1cd4ceb9 - Remove wrong recv_density dependency that activate tasks not linked through dependencies
- ad56d7b8 - Cleaning in engine_maketasks.h
- 7bd1f684 - give calloc the parameters in the right order
- c9077afa - Set the correct %x for message printing
- 3c4cd34e - Set the correct %x for message printing
@matthieu, can you please look at what I have done so far?
Not the GEAR sink specifics (which I disfigured to make the rest work). The task dependencies should be more or less good, but not their activation in cell_unskip.c (engine_marktasks.c is not updated yet).
added 2 commits
@matthieu, this is a friendly reminder to check what I have done so far in the sink MPI. Do it when you have time! :)
I need to finish !1938 (merged) first as it is required for GEAR SF to work over MPI. Then I can start looking at sinks. I am hoping to have !1938 (merged) done this week.
mentioned in merge request !1943 (merged)
mentioned in merge request !2007 (merged)
mentioned in merge request !2018 (merged)
added 57 commits
- 8b9b7436...df3f05c9 - 56 commits from branch master
- 7128f0ed - Merge with master + fix conflict