Redistribution problems - MPI performance
Here is the stdout (with verbosity on) of a recent cosmological run. This is a gravity-only run with 1536^3 (about 3.6 billion) particles. It was run on 8 nodes with 2 MPI ranks per node and 14 threads per rank, i.e. 16 ranks and 224 threads in total.
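For reference, the basic sizes of this setup can be checked with a few lines of arithmetic. The "perfect balance" figure is simply the particle count divided evenly over the ranks; it is a yardstick, not something the code aims for exactly.

```python
# Back-of-the-envelope sizes for the run described above.
n_side = 1536
n_particles = n_side ** 3                 # total number of particles

nodes, ranks_per_node, threads_per_rank = 8, 2, 14
n_ranks = nodes * ranks_per_node          # MPI ranks
n_threads = n_ranks * threads_per_rank    # threads across the whole job

# Particles per rank if the load were split perfectly evenly.
particles_per_rank = n_particles // n_ranks

print(f"{n_particles:,} particles")
print(f"{n_ranks} ranks, {n_threads} threads")
print(f"{particles_per_rank:,} particles per rank (perfect balance)")
```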
There are a few things that are worth noting at this stage:
- We hit the default repartitioning trigger (5% imbalance) quite often: almost every 4 steps at the start.
- Repartitioning is horribly expensive (e.g. `engine_repartition: took 1454052.947 ms`, i.e. about 24 minutes), whereas the redistribution seems more or less acceptable (e.g. `engine_redistribute: took 10907.899 ms`, i.e. about 11 s). Is the timer correctly placed here?
- We typically move about 60% of the particles around when redistributing.
- We create a large number of gravity tasks, and that costs us time during the rebuild, for instance when ranking them.
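To make the 5% trigger concrete, here is a minimal sketch of an imbalance check. The metric used (spread of per-rank step times divided by their mean) and the names `imbalance` and `should_repartition` are assumptions for illustration, not necessarily the exact formula in the code.

```python
def imbalance(rank_times):
    """Assumed metric: (max - min) / mean of the per-rank step times."""
    mean = sum(rank_times) / len(rank_times)
    return (max(rank_times) - min(rank_times)) / mean


def should_repartition(rank_times, trigger=0.05):
    """Return True when the measured imbalance exceeds the trigger (5% by default)."""
    return imbalance(rank_times) > trigger


# Hypothetical example: 16 ranks, one of which is 8% slower than the others.
balanced = [10.0] * 16
skewed = [10.0] * 15 + [10.8]
print(should_repartition(balanced))
print(should_repartition(skewed))
```

With a threshold this low, a single slow rank is enough to fire the trigger, so a cost model that misjudges the gravity work on some ranks can plausibly keep tripping it.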
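The repartition and redistribute timings quoted in the list above can be put side by side with the repartition frequency. This sketch assumes each repartition is followed by exactly one redistribute, takes "every 4 steps" literally, and trusts that the timers measure what their labels say.

```python
# Timings copied from the log lines quoted above (milliseconds).
repartition_ms = 1454052.947
redistribute_ms = 10907.899

steps_between_repartitions = 4   # "almost every 4 steps at the start"
n_particles = 1536 ** 3
moved_fraction = 0.60            # ~60% of particles moved per redistribute

repartition_s = repartition_ms / 1000.0
# Assumption: one redistribute per repartition, cost spread over the interval.
amortised_s_per_step = (repartition_ms + redistribute_ms) / 1000.0 / steps_between_repartitions
particles_moved = moved_fraction * n_particles

print(f"repartition: {repartition_s:.0f} s ({repartition_s / 60:.1f} min)")
print(f"repartition / redistribute: {repartition_ms / redistribute_ms:.0f}x")
print(f"amortised overhead: {amortised_s_per_step:.0f} s per step")
print(f"particles moved per redistribute: {particles_moved:.2e}")
```

Under these assumptions the repartition itself, not the particle exchange, dominates the overhead by roughly two orders of magnitude.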
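On the last bullet: assuming "ranking" means assigning each task a level in the dependency graph, the work is linear in the number of tasks plus dependencies, so every extra gravity task adds to the rebuild. The sketch below uses Kahn's topological sort; `rank_tasks` and the `unlocks` adjacency list are hypothetical and not taken from the scheduler.

```python
from collections import deque


def rank_tasks(n_tasks, unlocks):
    """Assign each task a topological rank using Kahn's algorithm.

    unlocks[i] lists the tasks that task i unlocks (its dependents).
    Runs in O(tasks + dependencies) time.
    """
    # Count how many tasks each task is waiting on.
    waits = [0] * n_tasks
    for deps in unlocks:
        for j in deps:
            waits[j] += 1

    rank = [0] * n_tasks
    queue = deque(i for i in range(n_tasks) if waits[i] == 0)
    visited = 0
    while queue:
        i = queue.popleft()
        visited += 1
        for j in unlocks[i]:
            # A dependent sits at least one level after the task unlocking it.
            rank[j] = max(rank[j], rank[i] + 1)
            waits[j] -= 1
            if waits[j] == 0:
                queue.append(j)
    if visited != n_tasks:
        raise ValueError("task graph contains a cycle")
    return rank


# Hypothetical graph: task 0 unlocks three gravity tasks, which all unlock task 4.
example = rank_tasks(5, [[1, 2, 3], [4], [4], [4], []])
print(example)
```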
I have started working on the last point by merging some of the multipole-multipole tasks into larger ones that each perform more than one interaction. This should reduce the per-task overheads.
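A minimal sketch of the merging idea, assuming each multipole-multipole interaction is a `(ci, cj)` cell pair and that interactions sharing the first cell are packed together up to a fixed size. The function `merge_mm_pairs` and the `max_per_task` cap are hypothetical.

```python
from collections import defaultdict


def merge_mm_pairs(pairs, max_per_task):
    """Group multipole-multipole pair interactions into larger tasks.

    pairs: list of (ci, cj) cell-index pairs, one interaction each.
    Interactions sharing the same ci are packed together, at most
    max_per_task per task. Returns a list of tasks (lists of pairs).
    """
    by_cell = defaultdict(list)
    for ci, cj in pairs:
        by_cell[ci].append((ci, cj))

    tasks = []
    for ci in sorted(by_cell):
        group = by_cell[ci]
        # Chop each cell's interaction list into chunks of max_per_task.
        for start in range(0, len(group), max_per_task):
            tasks.append(group[start:start + max_per_task])
    return tasks


# Hypothetical example: 8 cells, each interacting with 26 other cells.
pairs = [(ci, cj) for ci in range(8) for cj in range(8, 34)]
tasks = merge_mm_pairs(pairs, max_per_task=8)
print(len(pairs), "->", len(tasks))
```

The cap is a trade-off: larger tasks mean fewer tasks to create, rank and schedule, but coarser tasks give the scheduler less freedom to balance work across threads.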
I have not yet worked on getting sensible cost estimates for the gravity-related tasks. That may be why we trigger repartitioning so often.
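To illustrate why missing gravity cost estimates can cause repeated repartitioning, here is a toy partitioner that balances on estimated costs and is then judged on actual costs. It uses the simple longest-processing-time greedy heuristic as a stand-in; the real repartitioning uses a graph partitioner such as METIS/ParMETIS, and the cell costs below are invented for illustration.

```python
import heapq


def greedy_partition(estimated_costs, n_ranks):
    """Assign cells to ranks with the longest-processing-time (LPT) heuristic."""
    heap = [(0.0, r) for r in range(n_ranks)]  # (estimated load, rank)
    assignment = [0] * len(estimated_costs)
    # Place the most expensive cells first, each on the least-loaded rank.
    order = sorted(range(len(estimated_costs)),
                   key=lambda c: estimated_costs[c], reverse=True)
    for c in order:
        load, r = heapq.heappop(heap)
        assignment[c] = r
        heapq.heappush(heap, (load + estimated_costs[c], r))
    return assignment


def rank_loads(assignment, actual_costs, n_ranks):
    """Total actual cost landing on each rank for a given assignment."""
    loads = [0.0] * n_ranks
    for c, r in enumerate(assignment):
        loads[r] += actual_costs[c]
    return loads


def max_over_mean(loads):
    return max(loads) / (sum(loads) / len(loads)) - 1.0


# Hypothetical: every other cell carries three times more gravity work.
actual = [3.0, 1.0] * 4
no_gravity_estimate = [1.0] * len(actual)  # gravity cost ignored

blind = rank_loads(greedy_partition(no_gravity_estimate, 2), actual, 2)
informed = rank_loads(greedy_partition(actual, 2), actual, 2)
print(blind, max_over_mean(blind))        # [12.0, 4.0] 0.5
print(informed, max_over_mean(informed))  # [8.0, 8.0] 0.0
```

A partition that looks perfectly balanced under the wrong cost model can be badly unbalanced in practice, which would then fire the imbalance trigger again at the next check.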