Only activate the tend comms that are needed

Matthieu Schaller requested to merge reduced_dt_comms into master

In large simulations, especially those with small time-steps, we swamp the system with tend communications at the end of a step. The current logic has no good way of deciding which ones to launch (because of the complexities of the sync + limiter), so we activate all of them. That can be N^3 * 125 communications, where N is the number of top-level cells along one side. That's about 4e6 comms in a 32^3 setup like the one-before-largest COLIBRE runs!
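As a quick check of that figure, here is a minimal sketch of the arithmetic. The function name `max_tend_comms` is hypothetical and is not part of the code base; the factor of 125 per top-level cell is taken directly from the estimate above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a SWIFT function): upper bound on the number of
 * tend communications when every top-level cell activates all of them.
 * The factor of 125 per cell is the figure quoted in the description. */
static uint64_t max_tend_comms(uint64_t cells_per_side) {
  assert(cells_per_side > 0);
  const uint64_t comms_per_cell = 125;
  const uint64_t num_cells = cells_per_side * cells_per_side * cells_per_side;
  return num_cells * comms_per_cell;
}
```

For N = 32 this gives 32768 * 125 = 4096000, i.e. the ~4e6 quoted above. Doubling N multiplies the count by 8.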

Here, we improve upon this by doing the following:

  • Construct an array of booleans (char) with one entry per top-level cell.
  • The timestep_collect, sync, and limiter tasks, when run at the top level, set the boolean to 'true' if they changed anything related to the time-step in this cell.
  • We then all-reduce the array across all nodes.
  • Each node then activates the tend comms involved in local cells for which the boolean is true.
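The steps above can be sketched roughly as follows. This is an illustration only, not the code in this branch: all function names are hypothetical, and the all-reduce is replaced by a plain loop that ORs per-rank arrays held in one buffer, so the sketch runs without MPI. In the real code this step would be a single MPI all-reduce over the flag array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: called by a top-level task (timestep_collect, sync, limiter)
 * when it changed anything time-step related in the given cell. */
static void mark_cell_updated(char *flags, size_t num_cells, size_t cell_id) {
  assert(cell_id < num_cells);
  flags[cell_id] = 1;
}

/* Stand-in for the all-reduce: logical OR of the per-rank flag arrays.
 * rank_flags holds num_ranks contiguous arrays of num_cells entries each. */
static void reduce_flags_or(const char *rank_flags, size_t num_ranks,
                            size_t num_cells, char *global_flags) {
  memset(global_flags, 0, num_cells);
  for (size_t r = 0; r < num_ranks; r++)
    for (size_t c = 0; c < num_cells; c++)
      if (rank_flags[r * num_cells + c]) global_flags[c] = 1;
}

/* Count the local cells whose flag is set after the reduction. These are
 * the cells for which a node would activate the tend comms. */
static size_t count_flagged_local_cells(const char *global_flags,
                                        const char *is_local,
                                        size_t num_cells) {
  size_t count = 0;
  for (size_t c = 0; c < num_cells; c++)
    if (is_local[c] && global_flags[c]) count++;
  return count;
}
```

With one char per top-level cell, the reduced array for a 32^3 grid is 32768 bytes.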

This means we trade a lot of communications for a global reduction bottleneck.

On a COLIBRE L100N1504 run on 20 nodes (80 ranks, 32^3 top-level cells), we see a 5-10% speed-up. In particular, all the steps involving very few particles are significantly faster (from 500+ ms down to 200 ms). The impact should be larger at even higher resolution.

Edited by Matthieu Schaller