Parallel space_rebuild()

Matthieu Schaller requested to merge parallel_get_index into master

Looking at some of the scaling results, it turns out that the last remaining significant chunk of non-parallel code is space_rebuild(), and the majority of the time in there is spent computing the cell index of the particles. This can easily be done in parallel, and on EAGLE_25 it shows significant improvements in code speed and scalability. I should say, though, that this comes from running on 16 cores only and is based on the vTune outputs (which usually match the actual tests).
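
For readers unfamiliar with the idea, here is a minimal sketch of why the cell-index computation parallelises so easily: each particle's index depends only on its own position, so the loop iterations are independent. This is not the code in this branch. The names `struct particle`, `flat_cell_id` and `get_cell_indices` are hypothetical, the box is assumed periodic with particles at most one box length outside it, and an OpenMP pragma stands in for whatever threading mechanism the code actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified particle: only the position matters here. */
struct particle {
  double x[3];
};

/* Flatten a 3D cell coordinate (i, j, k) into a single row-major index. */
static int flat_cell_id(const int cdim[3], int i, int j, int k) {
  return (i * cdim[1] + j) * cdim[2] + k;
}

/* Compute the cell index of every particle.
 *   dim:  box size in each dimension (assumed periodic)
 *   cdim: number of cells in each dimension
 *   ind:  output array of length nr_parts
 * Each iteration reads one particle and writes one entry of ind,
 * so the loop can be split across threads without any locking. */
void get_cell_indices(const struct particle *parts, size_t nr_parts,
                      const double dim[3], const int cdim[3], int *ind) {
  assert(cdim[0] > 0 && cdim[1] > 0 && cdim[2] > 0);

  /* Inverse cell width in each dimension. */
  const double ih[3] = {cdim[0] / dim[0], cdim[1] / dim[1], cdim[2] / dim[2]};

#pragma omp parallel for
  for (size_t p = 0; p < nr_parts; p++) {
    int c[3];
    for (int d = 0; d < 3; d++) {
      double x = parts[p].x[d];

      /* Periodic wrap, assuming the particle drifted by less than one box. */
      if (x < 0.0)
        x += dim[d];
      else if (x >= dim[d])
        x -= dim[d];

      c[d] = (int)(x * ih[d]);

      /* Guard against rounding that lands exactly on the upper edge. */
      if (c[d] >= cdim[d]) c[d] = cdim[d] - 1;
    }
    ind[p] = flat_cell_id(cdim, c[0], c[1], c[2]);
  }
}
```

Without OpenMP enabled the pragma is ignored and the loop simply runs serially, which makes it easy to compare the two on the same test case.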

What do you think?
