Use non-buffered MPI sends for small messages
An attempt to tune the task-based MPI particle exchanges without affecting other parts of the code, such as repartitioning, which work better with buffered MPI.
Seems to give the same improvement in the MPI tic and toc timings as the tuning in #366. Sadly, for longer runs the improvement is harder to find, since other factors, like longer running task chains, are far more dominant. It can still be seen in EAGLE_50 runs, where small steps gain a millisecond or two, so it is worth keeping.
Note that eager sends like these work best when the matching receives are already posted; otherwise the remote node has to buffer and copy the message anyway. Their use should therefore be kept under control and not extended without suitable consideration.
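For illustration only, the size-based choice described above could be sketched roughly as follows. This is not the code in this merge request: the `send_mode` enum, `choose_send_mode`, `count_nonbuffered` and the `small_limit` threshold are all hypothetical names, and the sketch deliberately makes no MPI calls, so it only shows the dispatch logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical send modes; the names are illustrative, not SWIFT's. */
enum send_mode {
  send_mode_nonbuffered, /* small message: candidate for an eager send */
  send_mode_default      /* larger message: stays on the existing path */
};

/* Pick a send mode from the message size in bytes. `small_limit` is an
 * assumed tunable threshold, not a value taken from the code base. */
static enum send_mode choose_send_mode(size_t nbytes, size_t small_limit) {
  return (nbytes <= small_limit) ? send_mode_nonbuffered : send_mode_default;
}

/* Count how many of `n` messages would take the non-buffered path. Useful
 * for checking that a given threshold does not grow the eager traffic
 * beyond what the receivers can be expected to have posted. */
static size_t count_nonbuffered(const size_t *sizes, size_t n,
                                size_t small_limit) {
  assert(sizes != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    if (choose_send_mode(sizes[i], small_limit) == send_mode_nonbuffered)
      count++;
  }
  return count;
}
```

Keeping the decision in one small helper like this would make it easy to audit how many exchanges end up on the eager path when the threshold is changed.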
Edited by Peter W. Draper