
Mpi periodic gravity

Merged Matthieu Schaller requested to merge mpi_periodic_gravity into master

Here is where I am at with the split periodic gravity calculation over MPI.

The run survives for some steps but can get stuck in reproducible ways. For instance, running mpirun -np 4 swift_mpi -s -S -c -G -t 4 eagle_12.yml -v 1 always gets stuck on step 43. We end up with an unbalanced number of send-recv tasks on that step, with one node having an extra unmatched recv blocking the calculation. Note that this is not directly after a rebuild, but it may involve cells that have not had any action performed on them since a rebuild.

I have looked at the obvious things that would prevent the task activation mechanism from making a correct symmetric decision, but without success so far. I will come back to this in a few days once other commitments have passed. As always, any comments or suggestions are welcome.
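
For context, here is a minimal sketch (not SWIFT code; the function and counter names are hypothetical) of the kind of cross-check that exposes such an imbalance: each rank counts the send and recv tasks it has activated and the totals are compared across all ranks, since globally every activated send must be matched by exactly one recv.

    /* Hypothetical sketch: compare the number of activated send and recv
     * tasks across all ranks. Counter names are illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    void check_comm_balance(long long n_sends_activated,
                            long long n_recvs_activated) {

      long long counts[2] = {n_sends_activated, n_recvs_activated};
      long long totals[2] = {0, 0};

      /* Sum the activated sends and recvs over all ranks. */
      MPI_Allreduce(counts, totals, 2, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

      /* An imbalance here is what leaves one node blocked on an
       * unmatched recv. */
      if (totals[0] != totals[1]) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
          fprintf(stderr, "Unbalanced comms: %lld sends vs %lld recvs\n",
                  totals[0], totals[1]);
      }
    }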

Edited by Matthieu Schaller


Activity

  • added 1 commit

    • 073d67f2 - Look for grav_mm tasks that will also need send_ti updates


  • So I found another test that fails:

    mpirun -np 2 ../swift_mpi -s -G -S -t 2 eagle_6.yml

    It failed because a send_ti task was not available for the grav_mm task (in the engine_init_particles phase). That looked simple to fix, so I have pushed a fix.

    We now stop at:

           2   6.103516e-07   1.000000e+00    0.00000   3.051758e-07   41   42          615         4620         3633              2512.215      7
    [0001] [00116.6] runner.c:runner_do_end_force():1761: g-particle (id=4719203974277, type=Gas) did not interact gravitationally with all other gparts gp->num_interacted=1038453, total_gparts=1661079 (local num_gparts=879491)

    which is more difficult to understand.
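
    For reference, the check that fires here has this shape (a stand-alone paraphrase with simplified names, not the actual runner.c code): at the end of the force loop, each g-particle's accumulated interaction counter is compared against the global number of g-parts, so any gravity task that was never activated shows up as a shortfall.

        /* Stand-alone paraphrase of the end-of-force debugging check quoted
         * above; the struct and field names are simplified stand-ins. */
        #include <stdio.h>
        #include <stdlib.h>

        struct toy_gpart {
          long long id;
          long long num_interacted; /* incremented by every gravity task */
        };

        static void end_force_check(const struct toy_gpart *gp,
                                    long long total_gparts) {
          if (gp->num_interacted != total_gparts) {
            fprintf(stderr,
                    "g-particle (id=%lld) did not interact gravitationally "
                    "with all other gparts: num_interacted=%lld, "
                    "total_gparts=%lld\n",
                    gp->id, gp->num_interacted, total_gparts);
            exit(EXIT_FAILURE);
          }
        }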

  • Thanks for reporting this one. I am not sure I agree with the first fix though. The thing I was trying to achieve was to not create send_ti tasks for M-M calculations since these do not require the particles. I'll investigate this EAGLE_6 problem.

  • But don't you need the timestep updates to check if the task should be made active regardless?

    Anyway, I agree that fix looks wrong, as the EAGLE_12 test is now failing in the hydro part of marktasks (EAGLE_6 runs forever without the debugging checks).

    BTW, the failure with the EAGLE_6 check was at:

            /* If the local cell is active, send its ti_end values. */
            if (ci_active_gravity)
              scheduler_activate_send(s, ci->send_ti, cj->nodeID);

    ~line 3767. That ci->send_ti was NULL. Given your reasoning I guess this line should be removed instead?

  • added 1 commit

    • 1cef67c9 - Revert "Look for grav_mm tasks that will also need send_ti updates"


  • added 3 commits

    • f5a50abe - Do not activate communication tasks when unlocking an M-M one.
    • 82e9ddbc - Do not link the M-M tasks in with the other gravity pair tasks.
    • 5bc51ffb - Merge branch 'mpi_periodic_gravity' of gitlab.cosma.dur.ac.uk:swift/swiftsim…


  • So... looks like I had forgotten to hit git push on the last two commits...

    But that still has the original issue.

    The idea, in brief, is to not communicate anything when we have an M-M task. Since we have full knowledge of a neighbouring node's tree, we can use its multipole without having to ask for it. It just means we have to drift their multipole whenever one of our multipoles needs it; this is done when unskipping the tasks. That should, in principle, have no impact on the time-step decision, since that decision relates to particles lower in the tree, which will be communicated if necessary and will have an associated send/recv of the ti_end. A sketch of this unskipping logic is given below.
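
    To make that concrete, here is a minimal sketch of the unskipping step, assuming simplified stand-in types and helpers rather than SWIFT's actual cell machinery:

        /* Hypothetical sketch of the M-M unskip step described above: no
         * send/recv task is ever activated; a foreign multipole is simply
         * flagged for drifting so it can be used directly. */
        #include <stdbool.h>

        struct toy_cell {
          int nodeID;           /* rank that owns this cell */
          bool drift_multipole; /* picked up later by the drift machinery */
        };

        /* Request that a cell's multipole be drifted to the current time. */
        static void activate_multipole_drift(struct toy_cell *c) {
          c->drift_multipole = true;
        }

        /* Unskip an M-M pair: only drifts are activated. Time-step (ti_end)
         * information is untouched, as it travels with the particle
         * sends/recvs activated by the regular pair tasks. */
        static void unskip_grav_mm_pair(struct toy_cell *ci,
                                        struct toy_cell *cj,
                                        int local_nodeID) {
          if (ci->nodeID != local_nodeID) activate_multipole_drift(ci);
          if (cj->nodeID != local_nodeID) activate_multipole_drift(cj);
        }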

  • mpirun -np 2 ../swift_mpi -s -G -S -t 2 eagle_6.yml

    runs smoothly with the missing push...

  • Yes, and EAGLE_12 sticks waiting for unpaired tasks after a number of steps. I can see that now...

    Edited by Peter W. Draper
  • Exactly. There is one recv_gpart activated that does not have a matching send_gpart activated.

  • Although that EAGLE_6 example gets stuck on step 4317.

  • Matthieu Schaller added 42 commits


  • One interesting fact (or not?) is that for the EAGLE_12 box, the rebuild that occurs around 10 steps before the code hangs is entirely due to the condition we now impose to rebuild when more than X% of the g-parts have moved. If I remove that condition, this rebuild is not triggered and we then don't hang on step 43. This is why I mentioned the other day that I thought something was being done incorrectly around rebuild time, which would tie in with the other discussion we are having about proxy exchanges. A sketch of the rebuild condition is given below.
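
    For reference, the condition in question has this shape (a sketch with illustrative names; the threshold X is whatever fraction the run is configured with):

        /* Sketch of the rebuild trigger described above: request a rebuild
         * once more than a given fraction of the g-parts have moved since
         * the last rebuild. Names and threshold are illustrative only. */
        static int gpart_motion_requires_rebuild(long long n_gparts_moved,
                                                 long long n_gparts_total,
                                                 double max_moved_fraction) {
          return (double)n_gparts_moved >
                 max_moved_fraction * (double)n_gparts_total;
        }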

  • So. I am now officially stuck on this one. I can't figure out why one send/recv pair of tasks is not activated in a symmetric way.
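
    One way to narrow this down (and roughly what the tracing commits below do; the helper here is a hypothetical illustration, not the actual code) is to tag each top-level cell with an ID, log every send/recv activation against that ID on each rank, and diff the resulting lists for the stuck step:

        /* Hypothetical tracing helper: one line per activated gravity
         * communication, keyed by cell ID, so the per-rank logs can be
         * sorted and diffed to find the unmatched recv_gpart. */
        #include <stdio.h>

        enum comm_kind { COMM_SEND_GPART, COMM_RECV_GPART };

        static void log_comm_activation(FILE *log, int step, long long cellID,
                                        enum comm_kind kind, int other_rank) {
          fprintf(log, "step=%d cell=%lld %s other_rank=%d\n", step, cellID,
                  kind == COMM_SEND_GPART ? "send_gpart" : "recv_gpart",
                  other_rank);
        }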

  • added 2 commits

    • 24f8a411 - Make sure to reset all the timings when recycling a cell
    • 0b6fc758 - Added a lot of debugging calls to track assymetries between nodes.


  • Things to improve but that do not trigger the bug:

    • proxy construction not using multipoles.
  • added 1 commit

    • 602a1164 - Record also the communications that have been activated between nodes.


  • added 1 commit

    • 20f107e9 - Assign cellID to the top-level cells. Track the send and recvs on step 43.


  • added 1 commit

    • 76587e71 - Traced the bug back to the splitting condition in scheduler_splittask not being…


  • added 1 commit

    • 7a669568 - Removed some of the debugging checks

