Implements the mesh gravity calculation in an MPIdistributed fashion. This allows to run with much larger meshes and reduce the memory footprint.
Base implementation is similar to !1045 (closed).
 Each rank accumulates it's own contributions to the density field. These would need to be stored in some kind of sparse array to avoid making any assumptions about how the domain decomposition is done. Unlike !1045 (closed), we do not use a hashmap but rather a simple array since we can actually predict what we need and construct nice keys.
 Each rank allocates a slab of the full mesh as required by FFTW
 The mesh contributions are sent to whichever rank stores the corresponding slab. This is reasonably straightforward because the coordinates of a cell indicate which rank it's stored on.
 The FFT/Green function/CIC deconvolution/inverse FFT are carried out to make a slabdistributed mesh containing the potential
 Each rank calculates which cells it needs to calculate the potential gradient for its particles and requests them from whichever node they're on. This is slightly trickier than constructing the mesh because we need several cells around each particle to evaluate the gradient.
 We can then evaluate the potential and acceleration on the particles as usual.
Another difference with !1045 (closed) is that I am using a handwritten bucket sort instead of the qsort()
that was used originally.
Since we only need a crude sort with a fixed small number of bins this is much much faster. (But still the current bottleneck)
The code needs to be configures with enablempimeshgravity
and the runtime parameter Gravity:distributed_mesh
must be set to 1.
Implements #524 (closed). Bypasses #716 (closed). Likely supersedes !1045 (closed).
Possible improvements:

Use the threadpool to speedup the three bucket sorts. At least to construct the counts. 
Construct the array of send/recv counts at the same time as the bucket sort. Or just from the bucket counts? 
Check whether the potential patches can be smaller than currently and only cover the particle extent.