Draft: Thread-parallel cell allocation

Matthieu Schaller requested to merge parallel_getcells into master

This follows from the discussion of the performance of the tree construction in the zoom case. After a more detailed analysis, @dc-rope1 identified the bottleneck as the call to space_getcells(), which takes a global lock shared by all the (pool) threads.

The solution proposed here is to remove the lock entirely. To do so, we maintain N pools of cells, one per thread. Each thread can then allocate what it needs without taking a lock. When recycling, we put each cell back in the pool of the thread that allocated it.
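As a rough illustration of the per-thread pools, here is a minimal sketch. It is not the code in this MR: `struct cell_pools`, `pool_get()`, `pool_recycle()` and `NUM_POOLS` are hypothetical names, and the cell is stripped down to the two fields needed here. The sketch assumes that a given pool is only ever touched by one thread at a time (for example because recycling happens outside the threaded section), which is what makes it safe without a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_POOLS 4 /* hypothetical: one pool per thread */

/* Hypothetical, stripped-down cell; not the actual SWIFT struct. */
struct cell {
  int owner;         /* id of the thread whose pool this cell came from */
  struct cell *next; /* link in the owner's free list */
};

/* One free list per thread, so no list is shared between threads. */
struct cell_pools {
  struct cell *free_list[NUM_POOLS];
};

/* Take a cell from the pool of thread tid; allocate one if the pool is empty.
 * No lock is taken: only thread tid is assumed to touch free_list[tid]. */
struct cell *pool_get(struct cell_pools *p, int tid) {
  assert(tid >= 0 && tid < NUM_POOLS);
  struct cell *c = p->free_list[tid];
  if (c != NULL) {
    p->free_list[tid] = c->next;
  } else {
    c = malloc(sizeof(struct cell));
    if (c == NULL) return NULL;
  }
  c->owner = tid;
  c->next = NULL;
  return c;
}

/* Put a cell back in the pool of the thread that allocated it. */
void pool_recycle(struct cell_pools *p, struct cell *c) {
  c->next = p->free_list[c->owner];
  p->free_list[c->owner] = c;
}
```

A cell obtained with `pool_get(&pools, 1)` carries `owner == 1`, and `pool_recycle()` uses that field to find the right pool, so the caller does not have to remember where the cell came from.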

To get this to work, we need to add a new type of mapper function where the thread id is passed as an argument to the function we call.
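The shape of such a mapper could look like the sketch below. The names `map_function_with_tid`, `map_with_tid()` and `tag_with_tid()` are hypothetical and are not the threadpool API in this MR. To keep the example self-contained, the driver is a serial stand-in: the loop over `tid` plays the role of the pool threads, each of which would receive one chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapper type: a plain mapper plus the id of the calling thread. */
typedef void (*map_function_with_tid)(void *map_data, int num_elements,
                                      void *extra_data, int tid);

/* Serial stand-in for a thread pool. The N elements are split into
 * num_threads contiguous chunks and chunk number tid is handed to fn
 * together with tid. In a real pool each iteration would run on its own
 * thread. */
void map_with_tid(map_function_with_tid fn, void *map_data, size_t N,
                  size_t stride, int num_threads, void *extra_data) {
  assert(num_threads > 0);
  char *base = (char *)map_data;
  const size_t nt = (size_t)num_threads;
  const size_t chunk = (N + nt - 1) / nt; /* ceil(N / num_threads) */
  for (int tid = 0; tid < num_threads; tid++) {
    const size_t first = (size_t)tid * chunk;
    if (first >= N) break;
    const size_t count = (N - first < chunk) ? (N - first) : chunk;
    fn(base + first * stride, (int)count, extra_data, tid);
  }
}

/* Example mapper: tag every element with the id of the thread that got it. */
void tag_with_tid(void *map_data, int num_elements, void *extra_data,
                  int tid) {
  int *owners = (int *)map_data;
  (void)extra_data; /* unused in this example */
  for (int i = 0; i < num_elements; i++) owners[i] = tid;
}
```

With the thread id available inside the mapped function, that function can pick its own pool of cells directly instead of going through a shared, locked one.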

This MR also changes the definition of cell->owner to be the thread that allocated the cell rather than a somewhat arbitrary fraction of the total particle array. I also expand the use of this owner in the enqueuing.

This should help with #760 and #742.

Todo:

  • Deal with the case where the number of pool threads is not the same as the number of runner threads.
  • Deal with the list recycling.
Edited by Matthieu Schaller
