Draft: Fast one-sided MPI version
Reworked version of the asyncreallyonesided branch with only speed in mind, so accumulation and atomics are gone, and probably the ordering guarantees too, but it is faster than the usual asynchronous point-to-point MPI. This is a possible way to use these techniques in SWIFT and avoid copies on the receive side. It doesn't seem likely that we can use registered memory on the send side: it is not possible to associate enough regions with a window, and on the send side the particles that have foreign copies are not held in one contiguous memory segment, unlike the memory used for foreign particles.
Although we could register the complete buffers and only send parts of them, since I think the offsets used with MPI_Win_attach are based on the address space as a whole.
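A minimal sketch of the offset arithmetic behind that idea, not code from this branch. For a window made with MPI_Win_create_dynamic, the target displacement passed to MPI_Put is the absolute address on the target (as returned by MPI_Get_address), so a sub-range of a fully attached buffer is addressed as base address plus byte offset. The struct `demo_part` and the helpers `subrange_disp` and `subrange_bytes` are hypothetical names for illustration, and `int64_t` stands in for `MPI_Aint` so the block compiles without MPI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a particle; NOT SWIFT's real struct part. */
struct demo_part {
  double x[3];
  float mass;
  long long id;
};

/* Absolute displacement of element `first` inside a remote buffer.
 * `remote_base` is assumed to be the value the target obtained from
 * MPI_Get_address() on the start of the buffer it attached with
 * MPI_Win_attach(), and then sent to the origin rank.
 * int64_t is used here as a stand-in for MPI_Aint. */
static int64_t subrange_disp(int64_t remote_base, size_t first,
                             size_t elem_size) {
  return remote_base + (int64_t)(first * elem_size);
}

/* Number of bytes covering `count` consecutive elements. */
static size_t subrange_bytes(size_t count, size_t elem_size) {
  return count * elem_size;
}

/* Assumed usage on the origin rank, inside an access epoch:
 *
 *   MPI_Put(&local[first], (int)subrange_bytes(n, sizeof(struct demo_part)),
 *           MPI_BYTE, target_rank,
 *           (MPI_Aint)subrange_disp(remote_base, first,
 *                                   sizeof(struct demo_part)),
 *           (int)subrange_bytes(n, sizeof(struct demo_part)), MPI_BYTE, win);
 */
```

This way only one region per buffer needs attaching, and each transfer selects the particles it needs by offset. The selected particles still have to be contiguous within a single put, so scattered particles would need one put per run.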
Edited by Peter W. Draper