Expanding GPU task coverage
We discussed doing the drift, sort, kick, etc. on the GPU, which in the long term could also reduce the amount of data transferred. I think almost anyone could implement these GPU routines and, assuming the design of the host code is sound, it should be easy to extend the GPU code to handle additional task types too. This probably goes hand in hand with #368.