Best of the one-sided implementations. It was not that fast, so work moved on to RDMA afterwards.

Note: this implementation copies memory around, which, on reflection two years later, is probably part of the reason it is slow.

