Parallel rebuild2
Fixes #458 (closed). Also renders the discussion in #483 (closed) void.
added 13 commits
- 8c7d70a9...6e883e98 - 12 commits from branch master
- af1b599d - Post-merge fixes.
assigned to @pdraper
mentioned in merge request !655 (closed)
added MPI enhancement performance labels
added 1 commit
- 750ccb14 - The timing of the star linking does not include any reweight time. Corrected the message.
Thanks, that looks better now. I assume we're waiting for #458 (closed) to see which 'improvements' we are keeping?
- Resolved by Matthieu Schaller
So it's super-weird to me that this is so slow...
My only guess is that the spin-locks used around the `MPI_Waitany` (currently `swift_lock_type`) generate too much background noise. Can you try replacing the `swift_lock_type` with a `pthread_mutex` and initialize it with `PTHREAD_MUTEX_INITIALIZER`? The mutex has a higher unlock latency than the spin-lock, but doesn't burn any CPU while waiting.
As for having different threads look at different chunks of the array, that can cause load imbalances. Say you have two threads and ten entries: there may be zero available in the first chunk but all available in the second, and the first thread will wait idly while the second has too much work.
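For reference, a minimal sketch of the two points above, and not the actual SWIFT code. It assumes a plain shared counter as a stand-in for the `MPI_Waitany` call, and every name in it (`claim_lock`, `claim_entry`, `worker`, `run_workers`) is hypothetical. The lock is a statically initialized pthread mutex, and each thread claims the next available entry from one shared pool instead of owning a fixed chunk, so no thread sits idle while entries remain.

```c
#include <pthread.h>

#define NUM_ENTRIES 10 /* illustrative sizes, matching the example above */
#define NUM_THREADS 2

/* Hypothetical lock replacing the spin-lock. The static initializer
 * removes the need for an explicit pthread_mutex_init() call. */
static pthread_mutex_t claim_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_entry = 0;         /* next unclaimed entry */
static int processed[NUM_ENTRIES]; /* times each entry was handled */

/* Claim the next entry under the mutex; returns -1 when none remain.
 * In the real code the protected call would be the MPI wait (assumption). */
static int claim_entry(void) {
  pthread_mutex_lock(&claim_lock);
  const int idx = (next_entry < NUM_ENTRIES) ? next_entry++ : -1;
  pthread_mutex_unlock(&claim_lock);
  return idx;
}

/* Each worker keeps claiming entries until the shared pool is empty. */
static void *worker(void *arg) {
  int *count = (int *)arg;
  int idx;
  while ((idx = claim_entry()) >= 0) {
    processed[idx]++; /* stand-in for unpacking one entry */
    (*count)++;
  }
  return NULL;
}

/* Run all workers and return the total number of entries processed.
 * counts[t] receives the number of entries handled by thread t. */
static int run_workers(int counts[NUM_THREADS]) {
  pthread_t threads[NUM_THREADS];
  int created = 0;
  int total = 0;

  next_entry = 0;
  for (int i = 0; i < NUM_ENTRIES; i++) processed[i] = 0;

  for (int t = 0; t < NUM_THREADS; t++) {
    counts[t] = 0;
    if (pthread_create(&threads[t], NULL, worker, &counts[t]) != 0) break;
    created++;
  }
  for (int t = 0; t < created; t++) {
    pthread_join(threads[t], NULL);
    total += counts[t];
  }
  return total;
}
```

Calling `run_workers(counts)` from a program built with `-pthread` processes every entry exactly once. How the entries split between the threads depends on scheduling, which is the point: the split follows availability rather than a fixed chunk boundary.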
mentioned in issue #458 (closed)
FYI, I have been working on an orthogonal solution to this. We currently construct far too many proxies compared to what we will need later on when constructing the tasks. Reducing their number speeds up the whole exchange of cells a lot, but I want to make sure the new decision making does not lead to forgotten corner cases where we actually need the much more conservative approach currently in use.
Here is a suggestion before some of this drifts too much out of date: Let's merge in the good parts and keep the parallelization of the cell unpacking for later.
Also, the reduction of the number of proxies has a dramatic effect on this. I'll make a separate merge request for these changes.
added 73 commits
- 750ccb14...e1bdccb3 - 71 commits from branch master
- 6be06310 - Do not use the threadpool parallelization for the waiting and unpacking of cell proxies.
- 35314bbc - Merge branch 'master' into parallel_rebuild2
added 1 commit
- 97320153 - Document the mapper function to remember that it is currently unused.
Ok. I have reverted the threadpool call for the waiting and unpacking of cells.
Once merged, we can resume work by reverting commit 6be06310.
mentioned in commit bae9515b