Free the offsets array at the end of the particle sorting routines.
Unless I am mistaken about the logic here, we should free the offsets
array at the end of the function.
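For reference, here is a minimal sketch of the pattern being proposed. It assumes the offsets array is pure scratch space for a counting sort by cell index. The names `particle_t` and `sort_particles_by_cell` are hypothetical and do not come from the actual routine.

```c
#include <stdlib.h>

/* Hypothetical particle record; the real code's layout may differ. */
typedef struct {
    int cell;   /* cell index, assumed to lie in [0, ncells) */
    double x;   /* payload carried along during the sort */
} particle_t;

/* Counting sort of particles by cell (a sketch, not the actual routine).
 * Returns 0 on success, -1 if the offsets array cannot be allocated. */
int sort_particles_by_cell(const particle_t *in, particle_t *out,
                           size_t n, size_t ncells)
{
    /* offsets[c] ends up holding the first output slot for cell c. */
    size_t *offsets = calloc(ncells + 1, sizeof *offsets);
    if (offsets == NULL)
        return -1;

    /* Count particles per cell, shifted by one for the prefix sum. */
    for (size_t i = 0; i < n; ++i)
        offsets[in[i].cell + 1]++;

    /* Prefix sum turns the per-cell counts into starting offsets. */
    for (size_t c = 0; c < ncells; ++c)
        offsets[c + 1] += offsets[c];

    /* Scatter each particle into its cell's next free slot (stable). */
    for (size_t i = 0; i < n; ++i)
        out[offsets[in[i].cell]++] = in[i];

    /* The offsets array is only needed inside this function: free it. */
    free(offsets);
    return 0;
}
```

Since nothing outside the function keeps a pointer to `offsets`, freeing it just before the return is enough to plug the leak.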
Also, should we use an aligned allocation here? The memswap
function documentation urges callers to pass aligned addresses, but
that may not apply to objects smaller than an AVX vector.
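If an aligned allocation does turn out to be worthwhile, one option is a sketch like the following. It assumes a C11 toolchain (for `aligned_alloc`) and a 32-byte alignment matching a 256-bit AVX register. The helper name `alloc_offsets_aligned` and the `AVX_ALIGN` constant are hypothetical. It does not call memswap, whose exact requirements are the open question above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 32 bytes, the width of a 256-bit AVX register. */
#define AVX_ALIGN 32u

/* Hypothetical helper: allocate n size_t offsets on an AVX_ALIGN boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Release the result with free(). */
size_t *alloc_offsets_aligned(size_t n)
{
    if (n > SIZE_MAX / sizeof(size_t))
        return NULL;                    /* n * sizeof(size_t) would overflow */
    size_t bytes = n * sizeof(size_t);
    bytes = (bytes + AVX_ALIGN - 1) / AVX_ALIGN * AVX_ALIGN;
    if (bytes == 0)
        bytes = AVX_ALIGN;              /* avoid a zero-size request */
    return aligned_alloc(AVX_ALIGN, bytes);
}
```

Memory from `aligned_alloc` is released with plain `free()`, so the cleanup at the end of the function would stay the same either way.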