This follows on from the discussion of the performance of tree construction in the zoom case.
After a more detailed analysis, @dc-rope1 identified the bottleneck as the call to
space_getcells(), which takes a global lock across all the (pool) threads.
The solution proposed here is to remove the lock entirely: we maintain N pools of cells, one per thread. Each thread can then allocate what it needs without taking a lock. When recycling, a cell is put back in the pool of the thread that allocated it.
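As a minimal sketch of the idea (all names here, e.g. pool_get, pool_recycle, N_POOL_THREADS, and the simplified struct cell, are illustrative rather than the MR's actual code):

```c
#include <stdlib.h>

#define N_POOL_THREADS 16 /* Illustrative; one pool per pool thread. */

struct cell {
  struct cell *next; /* Free-list link. */
  int owner;         /* Id of the thread that allocated this cell. */
};

/* One free list per pool thread, indexed by thread id: each thread
 * only ever touches its own list, so no lock is required. */
static struct cell *free_list[N_POOL_THREADS];

/* Allocate a cell from the calling thread's own pool. */
struct cell *pool_get(const int tid) {
  struct cell *c = free_list[tid];
  if (c != NULL)
    free_list[tid] = c->next;
  else
    /* Pool empty: grow it with a fresh allocation for this thread only.
     * (Error handling omitted for brevity.) */
    c = (struct cell *)calloc(1, sizeof(struct cell));
  c->owner = tid; /* Record the allocating thread (see below). */
  return c;
}

/* Return a cell to the pool of the thread that allocated it. Note this
 * is only lock-free if recycling runs on the owning thread (or uses an
 * atomic list push). */
void pool_recycle(struct cell *c) {
  c->next = free_list[c->owner];
  free_list[c->owner] = c;
}
```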
To get this to work, we need to add a new type of mapper function where the thread id is passed as an argument to the function we call.
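In terms of the threadpool, this could look something like the following; the _tid suffix and the exact signature are placeholders, not the final API:

```c
/* Existing mapper type: the callback sees only its chunk of the data. */
typedef void (*threadpool_map_function)(void *map_data, int num_elements,
                                        void *extra_data);

/* Sketch of the new mapper type: the callback also receives the id of
 * the pool thread running it, so it can address its own cell pool. */
typedef void (*threadpool_map_function_tid)(void *map_data, int num_elements,
                                            void *extra_data, int tid);

/* Hypothetical example: a tree-construction mapper grabbing cells
 * lock-free via pool_get() from the sketch above. */
void space_split_mapper_tid(void *map_data, int num_elements,
                            void *extra_data, int tid) {
  for (int i = 0; i < num_elements; i++) {
    struct cell *c = pool_get(tid); /* Per-thread pool, no lock taken. */
    (void)c; /* ... attach c as progeny of the i-th cell ... */
  }
}
```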
This MR also changes the definition of cell->owner to be the thread that allocated the
cell, rather than a somewhat arbitrary fraction of the total particle
array. I also expand the use of this owner in the enqueueing.
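A hedged sketch of what the enqueueing side could look like (scheduler_enqueue_by_owner and queue_insert are invented names for illustration, with struct cell as in the first sketch above):

```c
/* Simplified structs for illustration only. */
struct task { struct cell *ci; };
struct queue { int count; /* ... */ };
struct scheduler { struct queue *queues; int nr_queues; };

/* Assumed to exist: pushes a task onto the given queue. */
void queue_insert(struct queue *q, struct task *t);

/* Enqueue a task on the queue of the thread that allocated its cell,
 * so runners tend to pick up work on cells they themselves built. */
void scheduler_enqueue_by_owner(struct scheduler *s, struct task *t) {
  int qid = t->ci->owner;
  /* Fall back if the cell has no owner, or if there are fewer runner
   * queues than pool threads (see the open point below). */
  if (qid < 0 || qid >= s->nr_queues) qid = 0;
  queue_insert(&s->queues[qid], t);
}
```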
This should help with #760 and #742.
Still to do:
- Deal with the case where the number of pool threads is not the same as the number of runner threads.
- Deal with the list recycling.