Creation of CUDA tasks and cells, transfer of this data to the GPU, and queue initialisation.

I am working on this at the moment, and I think it is the complicated part. Here is how I expect it to work. The sections in bold are completed; the rest are in progress.

N.B. I had to extend some of the structures (the cell and task structures) so that each CPU version links to its GPU version, which makes this process easier (and likely faster).

Each cell is given an integer ID - this is done using a depth-first search (DFS), though the choice of traversal shouldn't matter much; the IDs just need to be in a specific order for the CUDA tasks (since mapping pointers between host and device is hard/not good).
Create copies of the density, force and ghost tasks in GPU form. - Do we want any other types of task on the GPU @matthieu ?
Create the data transfer tasks recursively from the cells. Particle data is copied at leaf level with dependencies dealing with higher level cells.
Create the dependencies for the work tasks and between the transfer and work tasks.
Create the device dependency array from the data created in the previous section (I can't think of a good way to do the previous section in-place efficiently).
Asynchronously copy the tasks and unlock arrays to the device. - Note this is not asynchronous at the moment.
Create the CUDA cell structs.
Fill the task queues and copy to the device.
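
To make two of the steps above more concrete, here is a rough host-side sketch in plain C of (a) giving each cell an integer ID with a DFS and (b) flattening per-task unlock lists into one contiguous array that could then be copied to the device in a single transfer. The `struct cell` and `struct task` here are simplified, hypothetical stand-ins and not the real structures, and the names `assign_ids` and `flatten_unlocks` and the offsets-plus-flat-array layout are my assumptions, not necessarily what the final code does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the real cell and task structs. */
struct cell {
  int id;                    /* integer ID assigned by the DFS */
  int nr_children;           /* number of valid entries in children[] */
  struct cell *children[8];  /* sub-cells, if any */
};

struct task {
  int nr_unlocks;  /* number of tasks this task unlocks */
  int unlocks[4];  /* indices of those tasks (fixed size for the sketch) */
};

/* Assign IDs in depth-first pre-order, starting from next_id.
 * Returns the next unused ID, i.e. the number of cells visited so far. */
static int assign_ids(struct cell *c, int next_id) {
  assert(c != NULL);
  c->id = next_id++;
  for (int k = 0; k < c->nr_children; k++)
    next_id = assign_ids(c->children[k], next_id);
  return next_id;
}

/* Flatten the per-task unlock lists into one array of task indices.
 * offsets must have room for nr_tasks + 1 entries; the unlocks of task i
 * end up in flat[offsets[i]] .. flat[offsets[i + 1] - 1].
 * Returns a malloc'd array (caller frees), or NULL on allocation failure. */
static int *flatten_unlocks(const struct task *tasks, int nr_tasks,
                            int *offsets) {
  int total = 0;
  for (int i = 0; i < nr_tasks; i++) {
    offsets[i] = total;
    total += tasks[i].nr_unlocks;
  }
  offsets[nr_tasks] = total;

  /* Allocate at least one element so an empty list is still a valid pointer. */
  int *flat = malloc(sizeof(int) * (size_t)(total > 0 ? total : 1));
  if (flat == NULL) return NULL;

  for (int i = 0; i < nr_tasks; i++)
    for (int j = 0; j < tasks[i].nr_unlocks; j++)
      flat[offsets[i] + j] = tasks[i].unlocks[j];
  return flat;
}
```

Because everything is expressed as integer indices rather than pointers, `offsets` and `flat` are valid on the device as-is. For the asynchronous copy step, `cudaMemcpyAsync` on a stream would be the natural call, but as far as I know it only overlaps with host work if the host buffers are page-locked (e.g. allocated with `cudaMallocHost`), so the `malloc` above would need replacing for that.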
