Only activate the tend comms that are needed
In large simulations, especially with small time-steps, we swamp the system with tend
communications at the end of a step.
This is because the current logic has no good way of deciding which ones to launch (owing to the complexities of the sync + limiter), so we decided to activate all of them. That can be N^3 * 125 communications, where N is the number of top-level cells on one side. That's about 4e6 comms in a 32^3 setup like the second-largest COLIBRE runs!
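As a quick check of the figure above, here is a minimal Python sketch of the arithmetic. The function name and its parameters are hypothetical and only restate the N^3 * 125 estimate; they are not taken from the SWIFT code.

```python
def max_tend_comms(cells_per_side, comms_per_cell=125):
    """Upper bound on tend comms when every one of them is activated.

    cells_per_side: N, the number of top-level cells on one side.
    comms_per_cell: the factor of 125 quoted in the description.
    """
    return cells_per_side ** 3 * comms_per_cell


# 32^3 top-level cells, as in the set-up mentioned above.
print(max_tend_comms(32))  # 4096000, i.e. about 4e6
```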
Here, we improve upon this by doing the following:
- Construct an array of booleans (char) the size of the top-level grid.
- The timestep_collect, sync, and limiter tasks, when running at the top level, set the boolean to 'true' if they ended up changing anything related to the time-step in this cell.
- We then all-reduce the array across all nodes.
- Each node then activates the tend comms involved in local cells for which the boolean is true.
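The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in this branch: the all-reduce is emulated with a logical OR over per-rank lists rather than a real MPI call, and every name here (`mark_updated`, `all_reduce_or`, `comms_to_activate`, `tend_comms_by_cell`) is hypothetical.

```python
def mark_updated(flags, cell_index):
    """Called by a task that changed something time-step related in a cell."""
    flags[cell_index] = 1


def all_reduce_or(flags_per_rank):
    """Emulate a logical-OR all-reduce: every rank gets the same combined array."""
    n_cells = len(flags_per_rank[0])
    return [int(any(flags[i] for flags in flags_per_rank)) for i in range(n_cells)]


def comms_to_activate(reduced_flags, tend_comms_by_cell):
    """Return only the tend comms attached to cells whose flag is set.

    tend_comms_by_cell is a hypothetical map from a top-level cell index to
    the tend comms this rank takes part in for that cell.
    """
    active = []
    for cell in sorted(tend_comms_by_cell):
        if reduced_flags[cell]:
            active.extend(tend_comms_by_cell[cell])
    return active


# Toy set-up: a 2^3 top-level grid shared by two ranks.
n_cells = 8
rank0_flags = [0] * n_cells
rank1_flags = [0] * n_cells
mark_updated(rank0_flags, 3)  # a task on rank 0 changed cell 3
mark_updated(rank1_flags, 5)  # a task on rank 1 changed cell 5

reduced = all_reduce_or([rank0_flags, rank1_flags])

# Hypothetical comms known to rank 0; cell 2 saw no change, so it stays idle.
rank0_comms = {
    2: ["send cell 2 -> rank 1"],
    3: ["send cell 3 -> rank 1"],
    5: ["recv cell 5 <- rank 1"],
}
active = comms_to_activate(reduced, rank0_comms)
print(active)
```

In this toy case only two of the three comms known to rank 0 are launched; the saving grows with the fraction of top-level cells that saw no time-step change in a given step.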
This means we trade a lot of communications for a global reduction bottleneck.
On a COLIBRE L100N1504 run on 20 nodes (80 ranks, 32^3 TLCs), we see a 5-10% speed-up. In particular, all the steps involving very few particles are significantly faster (from 500+ ms down to 200 ms). The impact will be larger at even higher resolution.
Activity
added MPI performance labels
@mivkov this may have consequences for how the RT sub-stepping works over MPI, but I have not checked in detail yet.
assigned to @pdraper
@pdraper your thoughts on this would be very welcome too.
added 5 commits
- a3144dc4...df3f05c9 - 2 commits from branch master
- 632fd5b2 - Merge branch 'master' into reduced_dt_comms
- ceeed2db - Time the new task launch
- a84ce43f - Time the new function
added 1 commit
- 20f51ab2 - Reorder the operations in the time integration to use the same updates as in master
added 1 commit
- 2c8b8468 - Use an atomic operation to update the space-carried array of top-level cell updates
added 9 commits
- ecb70aa5...1c2c5a6f - 8 commits from branch master
- d01ffba8 - xMerge branch 'master' into reduced_dt_comms