Implements the mesh gravity calculation in an MPI-distributed fashion. This allows running with much larger meshes and reduces the memory footprint.
The base implementation is similar to !1045 (closed).
One difference from !1045 (closed) is that I am using a hand-written bucket sort instead of the qsort() that was used originally.
Since we only need a crude sort with a small, fixed number of bins, this is much faster, although it is still the current bottleneck.
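For readers unfamiliar with the idea, here is a minimal sketch of a bucket sort over a small, fixed number of bins. It is not the code in this MR: the struct `mesh_patch_entry`, the function `bucket_sort_by_rank` and the choice of the destination rank as the bin key are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical record: one mesh-cell contribution and the bin it belongs to.
 * Using the destination MPI rank as the bin key is an assumption. */
struct mesh_patch_entry {
  int dest_rank; /* Bin index, expected in [0, num_bins) */
  double value;  /* Payload carried along with the key */
};

/* Group `count` entries by dest_rank in O(count + num_bins) time.
 *
 * `out` must have room for `count` entries and must not alias `in`.
 * `bin_offsets` must have room for num_bins + 1 values; on success,
 * bin b occupies out[bin_offsets[b]] .. out[bin_offsets[b + 1] - 1].
 *
 * Returns 0 on success, -1 on bad input or allocation failure. */
int bucket_sort_by_rank(const struct mesh_patch_entry *in,
                        struct mesh_patch_entry *out, size_t count,
                        int num_bins, size_t *bin_offsets) {
  if (num_bins <= 0) return -1;

  size_t *cursor = calloc((size_t)num_bins, sizeof(size_t));
  if (cursor == NULL) return -1;

  /* Pass 1: count how many entries fall in each bin. */
  for (size_t i = 0; i < count; ++i) {
    const int bin = in[i].dest_rank;
    if (bin < 0 || bin >= num_bins) {
      free(cursor);
      return -1;
    }
    cursor[bin]++;
  }

  /* Exclusive prefix sum: turn the counts into start offsets. */
  size_t running = 0;
  for (int b = 0; b < num_bins; ++b) {
    const size_t n = cursor[b];
    bin_offsets[b] = running;
    cursor[b] = running;
    running += n;
  }
  bin_offsets[num_bins] = running;

  /* Pass 2: scatter each entry to the next free slot of its bin.
   * Entries within a bin keep their original relative order. */
  for (size_t i = 0; i < count; ++i) {
    out[cursor[in[i].dest_rank]++] = in[i];
  }

  free(cursor);
  return 0;
}
```

Unlike qsort(), this makes two linear passes and never compares elements, which is why it pays off when the number of bins is small compared to the number of entries.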
The code needs to be configured with --enable-mpi-mesh-gravity, and the runtime parameter Gravity:distributed_mesh must be set to 1.
Implements #524 (closed). Bypasses #716 (closed). Likely supersedes !1045 (closed).
Possible improvements: