Implement lock-free subcell splitting and other speed-ups.
Based on work in the zoom-master branch by Matthieu and Will.
Avoids locking the memory used for subcells during splitting by giving each threadpool thread its own pool of memory. In simple tests this speeds things up nicely, especially during step 0.
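The per-thread pool idea can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `struct cell`, `struct cell_pool`, `pool_get_cell`, `pool_put_cell` and `POOL_CHUNK` are hypothetical names, and the real cell structure is far larger. Each threadpool thread only ever touches the pool at its own index, so taking or returning a subcell needs no lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: number of cells fetched from the heap per refill. */
#define POOL_CHUNK 64

/* Hypothetical stand-in for a cell. */
struct cell {
  struct cell *next;       /* Free-list link while the cell sits in a pool. */
  struct cell *progeny[8]; /* Subcells created by splitting. */
};

/* One pool per threadpool thread. Only that thread uses it. */
struct cell_pool {
  struct cell *free_list; /* Singly linked list of unused cells. */
  size_t nr_refills;      /* How often this pool went back to the heap. */
};

/* Take one cell from the pool of thread `tid`, refilling it from the heap
 * in chunks when it is empty. Returns NULL if the allocation fails. */
static struct cell *pool_get_cell(struct cell_pool *pools, int tid) {
  struct cell_pool *p = &pools[tid];
  if (p->free_list == NULL) {
    struct cell *chunk = calloc(POOL_CHUNK, sizeof(struct cell));
    if (chunk == NULL) return NULL;
    /* Thread the new chunk into a free list. */
    for (int i = 0; i < POOL_CHUNK - 1; i++) chunk[i].next = &chunk[i + 1];
    chunk[POOL_CHUNK - 1].next = NULL;
    p->free_list = chunk;
    p->nr_refills++;
  }
  struct cell *c = p->free_list;
  p->free_list = c->next;
  c->next = NULL;
  return c;
}

/* Return a cell to the pool of thread `tid` for later reuse. */
static void pool_put_cell(struct cell_pool *pools, int tid, struct cell *c) {
  c->next = pools[tid].free_list;
  pools[tid].free_list = c;
}
```

In this sketch the only shared resource is the heap allocator, which is hit once per `POOL_CHUNK` cells rather than once per subcell. The chunks are never freed here, which a real implementation would need to handle.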
Also speeds up the engine by assigning each cell's runner (its owner) more uniformly and at random, and by using more of the information in the task weights when scheduling tasks.
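As a rough illustration of the owner assignment only (the weight-aware scheduling is not covered), the sketch below draws each cell's owner uniformly from the available runners. All names here (`struct cell_stub`, `assign_random_owners`, `xorshift32`) are hypothetical, and the small xorshift generator is just a stand-in for whatever random source the branch actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cell: only the owner field matters here. */
struct cell_stub {
  int owner; /* Index of the runner that owns this cell. */
};

/* Small xorshift32 generator. The state must be non-zero. */
static uint32_t xorshift32(uint32_t *state) {
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Give every cell an owner drawn (approximately) uniformly from
 * [0, nr_runners). The modulo bias is negligible for small runner counts. */
static void assign_random_owners(struct cell_stub *cells, int nr_cells,
                                 int nr_runners, uint32_t seed) {
  assert(nr_runners > 0 && seed != 0);
  uint32_t state = seed;
  for (int i = 0; i < nr_cells; i++)
    cells[i].owner = (int)(xorshift32(&state) % (uint32_t)nr_runners);
}
```

Drawing owners at random, rather than deriving them from the cell's position, is one way to avoid spatially clustered work landing on the same runner's queue.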
An EAGLE_50 volume run on a single COSMA 8 node shows speed-ups of the order of 20% over the initial 128 steps. This is also faster than 8xMPI on a node, but the reasons for that are more nuanced, and the advantage may be smaller for a proper MPI run (since, for instance, the MPI limits on the fastest possible step and on communications within a step will return).
Edited by Peter W. Draper