## MPI parallel mesh gravity

Currently the mesh gravity calculation stores a full copy of the gravity mesh and repeats the FFT calculations on every MPI rank. In runs with many MPI ranks this increases memory use, generates a lot of communication, and limits the mesh size that can be used.

I've been looking into using the MPI support in FFTW3. The sequence of operations would be something like

- Each rank accumulates it's own contributions to the density field. These would need to be stored in some kind of sparse array to avoid making any assumptions about how the domain decomposition is done. The hash map code used in the FoF implementation might be suitable.
- Each rank allocates a slab of the full mesh as required by FFTW
- The mesh contributions are sent to whichever rank stores the corresponding slab. This is reasonably straightforward because the coordinates of a cell indicate which rank it's stored on.
- The FFT/Green function/CIC deconvolution/inverse FFT are carried out to make a slab-distributed mesh containing the potential
- Each rank calculates which cells it needs to calculate the potential gradient for its particles and requests them from whichever node they're on. This is slightly trickier than constructing the mesh because we need several cells around each particle to evaluate the gradient.
- We can then evaluate the potential and acceleration on the particles as usual

This branch executes the first three steps alongside the usual mesh calculations and reports how well the distributed density mesh agrees with the full mesh. Then it just discards the distributed mesh. Currently it's all single threaded and the distributed mesh is not quite in the right format because MPI FFTW needs some specific padding in the last dimension.