Implements the mesh gravity calculation in an MPI-distributed fashion. This allows us to run with much larger meshes and reduces the memory footprint per rank.
The base implementation is similar to !1045 (closed).
- Each rank accumulates its own contributions to the density field. These need to be stored in some kind of sparse array to avoid making any assumptions about how the domain decomposition is done. Unlike !1045 (closed), we do not use a hashmap but rather a simple array, since we can actually predict what we need and construct nice keys.
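The "nice keys" mentioned above could look something like the sketch below, which packs the three cell indices into a single 64-bit integer so contributions can live in a plain array sorted by key. The function names are illustrative, not SWIFT's actual identifiers.

```c
#include <stdint.h>

/* Hypothetical sketch: pack 3D mesh cell indices into one 64-bit key.
 * The three indices fit comfortably in a single integer for any practical
 * mesh size, so a flat sorted array can replace a hashmap. */
static uint64_t cell_index_to_key(uint64_t i, uint64_t j, uint64_t k,
                                  uint64_t N) {
  return (i * N + j) * N + k;
}

/* Inverse mapping: recover (i, j, k) from a key. */
static void key_to_cell_index(uint64_t key, uint64_t N, uint64_t *i,
                              uint64_t *j, uint64_t *k) {
  *k = key % N;
  *j = (key / N) % N;
  *i = key / (N * N);
}
```

Because keys constructed this way increase monotonically with (i, j, k), sorting by key also groups cells by x slab, which is convenient for the exchange step.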
- Each rank allocates a slab of the full mesh as required by FFTW.
- The mesh contributions are sent to whichever rank stores the corresponding slab. This is reasonably straightforward because the coordinates of a cell indicate which rank it's stored on.
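As a sketch of the cell-to-rank mapping: FFTW distributes the first (x) dimension of the mesh in contiguous slabs, so the x index of a cell determines its owner. The snippet below assumes an even split of ceil(N/nr_ranks) planes per rank; in practice the slab extents should be taken from `fftw_mpi_local_size_3d()` rather than recomputed like this.

```c
/* Illustrative sketch, not SWIFT's actual code: map a mesh cell's x index
 * to the MPI rank holding its slab. Assumes slabs of ceil(N / nr_ranks)
 * contiguous x planes, matching FFTW's default block distribution; the
 * real extents come from fftw_mpi_local_size_3d(). */
static int slab_rank_of_cell(int x, int N, int nr_ranks) {
  const int slab_width = (N + nr_ranks - 1) / nr_ranks; /* ceil division */
  return x / slab_width;
}
```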
- The FFT/Green function/CIC deconvolution/inverse FFT are carried out to make a slab-distributed mesh containing the potential.
- Each rank calculates which cells it needs to calculate the potential gradient for its particles and requests them from whichever node they're on. This is slightly trickier than constructing the mesh because we need several cells around each particle to evaluate the gradient.
- We can then evaluate the potential and acceleration on the particles as usual.
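The need for several cells around each particle comes from the finite-difference stencil used for the gradient. As an illustration, a 4th-order central difference needs cells up to two mesh spacings away in each direction; the actual stencil used in SWIFT may differ.

```c
/* Sketch: one component of the potential gradient from a 4th-order
 * central finite difference. phi[] holds five consecutive potential
 * values along one axis, centred on the cell of interest; h is the
 * mesh spacing. Illustrates why a 5-cell-wide patch per particle is
 * needed; not necessarily the stencil SWIFT uses. */
static double gradient_1d(const double phi[5], double h) {
  return (phi[0] - 8.0 * phi[1] + 8.0 * phi[3] - phi[4]) / (12.0 * h);
}
```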
Another difference from !1045 (closed) is that I am using a hand-written bucket sort instead of the qsort() that was used originally. Since we only need a crude sort into a fixed, small number of bins, this is much faster (although it is still the current bottleneck).
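The bucket sort described above could be sketched as a two-pass counting sort keyed by destination rank (names are illustrative, not SWIFT's). With a small fixed number of buckets this is O(n) with no comparison overhead, and the per-bucket counts produced in the first pass are exactly the send counts needed for the exchange.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a bucket (counting) sort by destination rank. dest[i] gives
 * the bucket of value[i]; on return, sorted[] holds the values grouped by
 * bucket and counts[] the number of entries per bucket. */
static void bucket_sort_by_rank(const int *dest, const double *value, int n,
                                int nr_buckets, double *sorted, int *counts) {
  int *offsets = malloc(nr_buckets * sizeof(int));
  memset(counts, 0, nr_buckets * sizeof(int));

  /* Pass 1: count entries per bucket (doubles as the send-count array). */
  for (int i = 0; i < n; i++) counts[dest[i]]++;

  /* Exclusive prefix sum gives each bucket's start offset. */
  offsets[0] = 0;
  for (int b = 1; b < nr_buckets; b++)
    offsets[b] = offsets[b - 1] + counts[b - 1];

  /* Pass 2: scatter each value to its bucket's region, preserving order. */
  for (int i = 0; i < n; i++) sorted[offsets[dest[i]]++] = value[i];

  free(offsets);
}
```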
The code needs to be configured with `--enable-mpi-mesh-gravity`, and the runtime parameter `Gravity:distributed_mesh` must be set to 1.
To do:
- Use the threadpool to speed up the three bucket sorts, at least to construct the counts.
- Construct the array of send/recv counts at the same time as the bucket sort. Or just derive it from the bucket counts?
- Check whether the potential patches can be smaller than they currently are and only cover the particle extent.