Draft: Reduced MPI message size for hydro
This merge request reduces the data volume that needs to be sent to update the proxies for the hydro interactions. In the original scheme, the entire particle was sent prior to each interaction loop, yielding a total communication volume of 2P or 3P, depending on the number of loops, where P is the volume of the full particle data. However, because of the way the hydro interactions work, almost all variables stored in the particle change at most once during the loops: the position is drifted but remains constant after that, the density is updated during the first loop but should not be used prior to that anyway, and so on. In other words, the total volume that actually needs to be sent (distributed over the 2 or 3 communications) should be roughly P. For the default SPHENIX hydro scheme, this means a potential volume reduction of 2/3.
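The volume estimate can be written out explicitly. The symbols $n$ (number of communications) and $p_i$ (volume sent in communication $i$) are introduced here for illustration only; $P$ is the volume of the full particle data, and the approximation assumes that each variable is sent about once in total:

```latex
V_\mathrm{old} = n P, \qquad
V_\mathrm{new} = \sum_{i=1}^{n} p_i \approx P, \qquad
1 - \frac{V_\mathrm{new}}{V_\mathrm{old}} \approx 1 - \frac{1}{n}
```

For $n = 2$ this gives a reduction of about 1/2, and for $n = 3$ about 2/3, which is the upper bound quoted for SPHENIX.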
To reduce the size of the messages, an additional "reduced particle" needs to be created for the various communications. Prior to the send, a new buffer of these "particles" is allocated and the required variables are copied over from the original particle into the reduced particle. This reduced buffer is then used for the send. On the receiving end, a new "reduced" receive buffer of the same size is allocated and upon successful completion of the communication the variables from the reduced buffer are copied over into the particles in the proxy. The latter is done during the recv task and will hence already show up in task plots and timers. The same is not true for the send, since that is emitted in the scheduler. The buffer allocation and copying for sends is therefore done in a new task that unlocks the send task, similar to what is already done for the limiter.
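The pack, send, unpack flow described above could look roughly like the following. This is a minimal sketch, not the code in this merge request: `struct part` is a simplified stand-in for the real particle, and `reduced_part_xv`, `pack_xv`, `unpack_xv` and `demo_roundtrip` are hypothetical names. A plain `memcpy` stands in for the actual MPI transfer so that the example does not need an MPI runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the full particle. */
struct part {
  double x[3];
  float v[3];
  float h;
  float rho;
  float u;
};

/* Hypothetical reduced particle: only the fields assumed to be needed
 * on the remote rank before the first loop. */
struct reduced_part_xv {
  double x[3];
  float v[3];
  float h;
};

/* Send side: allocate a reduced buffer and copy the selected fields.
 * The caller owns the returned buffer. Returns NULL on allocation failure. */
struct reduced_part_xv *pack_xv(const struct part *parts, size_t count) {
  struct reduced_part_xv *buf = malloc(count * sizeof(*buf));
  if (buf == NULL) return NULL;
  for (size_t i = 0; i < count; i++) {
    memcpy(buf[i].x, parts[i].x, sizeof(buf[i].x));
    memcpy(buf[i].v, parts[i].v, sizeof(buf[i].v));
    buf[i].h = parts[i].h;
  }
  return buf;
}

/* Receive side: copy the received fields into the proxy particles,
 * leaving all other fields of the proxy untouched. */
void unpack_xv(const struct reduced_part_xv *buf, struct part *proxy,
               size_t count) {
  assert(buf != NULL && proxy != NULL);
  for (size_t i = 0; i < count; i++) {
    memcpy(proxy[i].x, buf[i].x, sizeof(proxy[i].x));
    memcpy(proxy[i].v, buf[i].v, sizeof(proxy[i].v));
    proxy[i].h = buf[i].h;
  }
}

/* Round trip on a single process. Returns 0 on success, 1 on failure. */
int demo_roundtrip(void) {
  enum { N = 4 };
  struct part local[N];
  struct part proxy[N];
  memset(local, 0, sizeof(local));
  memset(proxy, 0, sizeof(proxy));
  for (int i = 0; i < N; i++) {
    local[i].x[0] = 1.0 * i;
    local[i].v[1] = 2.0f * i;
    local[i].h = 0.5f + i;
    local[i].rho = 7.0f; /* not part of the reduced particle */
  }

  struct reduced_part_xv *send_buf = pack_xv(local, N);
  struct reduced_part_xv *recv_buf = malloc(N * sizeof(*recv_buf));
  if (send_buf == NULL || recv_buf == NULL) {
    free(send_buf);
    free(recv_buf);
    return 1;
  }

  /* Stand-in for the MPI send/receive pair. */
  memcpy(recv_buf, send_buf, N * sizeof(*recv_buf));
  unpack_xv(recv_buf, proxy, N);

  int ok = 1;
  for (int i = 0; i < N; i++) {
    if (proxy[i].x[0] != local[i].x[0]) ok = 0;
    if (proxy[i].v[1] != local[i].v[1]) ok = 0;
    if (proxy[i].h != local[i].h) ok = 0;
    if (proxy[i].rho != 0.0f) ok = 0; /* rho was never sent */
  }
  free(send_buf);
  free(recv_buf);
  return ok ? 0 : 1;
}
```

In the scheme described here, the equivalent of `pack_xv` would run in the new task that unlocks the send, and the equivalent of `unpack_xv` would run inside the recv task.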
A major disadvantage of this new approach is that hydro schemes (and all subgrid schemes that use data stored in the part) can no longer be ignorant about what happens during the communications. To deal with this in a somewhat clean way, I have created a new file, `hydro_pack.h`, that deals with the definition of the "reduced particle" structs and all packing and unpacking related functions. This file needs to be implemented for each hydro scheme in a similar way as `hydro_part.h`, and should in turn use a similar file (with structs and functions) to deal with subgrid schemes. The developer/maintainer of a specific hydro scheme or subgrid scheme is then responsible for deciding which variables need to be sent prior to each loop, and for copying these variables between the particles on both ranks and the intermediate "reduced particle".
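To illustrate what a scheme-specific `hydro_pack.h` might contain, here is a minimal sketch with one reduced struct per communication. Everything in it is an assumption made for illustration: the field list of `struct part`, the assignment of fields to communications, and all struct and function names are hypothetical and do not reflect the actual SPHENIX particle layout or the interface in this merge request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified particle. */
struct part {
  double x[3];
  float v[3];
  float h;
  float mass;
  float u;
  float rho;
  float pressure;
  float alpha_visc;
  float alpha_diff;
};

/* One reduced struct per communication (illustrative field choices). */
struct reduced_part_xv {
  double x[3];
  float v[3];
  float h;
  float mass;
  float u;
};

struct reduced_part_rho {
  float rho;
  float pressure;
};

struct reduced_part_gradient {
  float alpha_visc;
  float alpha_diff;
};

/* Pack/unpack pair for the first communication. */
static inline void hydro_pack_xv(const struct part *p,
                                 struct reduced_part_xv *r) {
  for (int k = 0; k < 3; k++) {
    r->x[k] = p->x[k];
    r->v[k] = p->v[k];
  }
  r->h = p->h;
  r->mass = p->mass;
  r->u = p->u;
}

static inline void hydro_unpack_xv(const struct reduced_part_xv *r,
                                   struct part *p) {
  for (int k = 0; k < 3; k++) {
    p->x[k] = r->x[k];
    p->v[k] = r->v[k];
  }
  p->h = r->h;
  p->mass = r->mass;
  p->u = r->u;
}

/* Pack/unpack pair for the communication after the density loop. */
static inline void hydro_pack_rho(const struct part *p,
                                  struct reduced_part_rho *r) {
  r->rho = p->rho;
  r->pressure = p->pressure;
}

static inline void hydro_unpack_rho(const struct reduced_part_rho *r,
                                    struct part *p) {
  p->rho = r->rho;
  p->pressure = r->pressure;
}

/* Pack/unpack pair for the communication after the gradient loop. */
static inline void hydro_pack_gradient(const struct part *p,
                                       struct reduced_part_gradient *r) {
  r->alpha_visc = p->alpha_visc;
  r->alpha_diff = p->alpha_diff;
}

static inline void hydro_unpack_gradient(const struct reduced_part_gradient *r,
                                         struct part *p) {
  p->alpha_visc = r->alpha_visc;
  p->alpha_diff = r->alpha_diff;
}

/* Bytes sent per particle over all three communications. */
static inline size_t hydro_pack_total_size(void) {
  return sizeof(struct reduced_part_xv) + sizeof(struct reduced_part_rho) +
         sizeof(struct reduced_part_gradient);
}
```

Keeping each pack function next to its matching unpack function makes it harder for the two sides to drift apart when a field is added. A subgrid scheme could expose the same kind of pairs and be called from inside these functions.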
A lot of things remain to be done:
- right now, inactive cells (cells with only inactive particles) are only communicated once, i.e. by `send_xv`. Since inactive particles can be neighbours of active particles on the remote rank, we need to make sure all the required variables are sent through for these cells. This means either activating the other sends again for these cells, or creating a different send for inactive cells that sends more information than the reduced `send_xv`
- because of the above problem, the reduced `send_xv` still sends the whole particle, but in a less efficient way than what was done before (because we still use an intermediate buffer)
- the current implementation only filters out hydro variables; subgrid related variables are blindly `memcpy`d for now
- the current implementation only works for SPHENIX
- lots of documentation is missing
I have tested the new implementation on steps 100-120 of the EAGLE_12 low z example (on 4 ranks and 3 threads per rank) and that seems to work. A preliminary test on a FLAMINGO benchmark run shows a total message volume reduction of 15% and a time gain of 5s (on a total of 140s) for a representative large step. So some progress, but not spectacular.