Draft: Ragged one-sided registered buffers.
Attempt at a technique that uses the local memory buffers directly instead of a large receive buffer per subtype. If this works (I am worried it will run out of registered memory handles), we can share the particle buffers with MPI and use DMA to avoid copying....
Works with Intel MPI 2022, but it is not as fast as point-to-point. Fails with Open MPI, which has a hard limit on the number of regions that can be attached to a window (OMPI_OSC_UCX_ATTACH_MAX, I think).
Back to the drawing board.