Delayed foreign allocation
Fixes #456 (closed).
We no longer blindly allocate memory for foreign particles based on the top-level cell content. The new strategy is to recurse down the cells a single time, after all the tasks (including the local-foreign pairs and MPI comms) have been constructed. This first pass only goes down to the level where we reach tasks (so it's rather quick) and counts the number of particles we need to allocate. We then allocate that many and finally link the particle arrays to their cells in the same way as before, just starting from the super level instead of the top level.
This saves quite a bit of memory and is also marginally faster as we only recurse into the parts of the (foreign) tree that we need to.
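As a rough illustration of the count-then-link idea described above, here is a minimal sketch in C. It is not the code from this branch: `struct cell`, `struct part`, the `has_tasks` flag and all function names are hypothetical simplifications, and the sketch assumes that the first level carrying tasks plays the role of the super level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, stripped-down types: the real structures carry far more. */
struct part {
  double x[3];
};

struct cell {
  int count;               /* Number of particles in this cell. */
  int has_tasks;           /* Non-zero if tasks are attached at this level. */
  struct cell *progeny[8]; /* Children, NULL where absent. */
  struct part *parts;      /* Particle array, set by the linking pass. */
};

/* Pass 1: count the particles we need, stopping at the first level that
 * has tasks. Branches without any task contribute nothing. */
static size_t count_foreign_parts(const struct cell *c) {
  if (c == NULL) return 0;
  if (c->has_tasks) return (size_t)c->count;
  size_t n = 0;
  for (int k = 0; k < 8; k++) n += count_foreign_parts(c->progeny[k]);
  return n;
}

/* Link a cell and its whole sub-tree to consecutive slices of the buffer.
 * Returns the number of particles consumed. */
static size_t link_subtree(struct cell *c, struct part *parts) {
  c->parts = parts;
  size_t used = 0;
  int split = 0;
  for (int k = 0; k < 8; k++) {
    if (c->progeny[k] == NULL) continue;
    split = 1;
    used += link_subtree(c->progeny[k], parts + used);
  }
  return split ? used : (size_t)c->count;
}

/* Pass 2: walk down to the task level and start linking from there. */
static size_t link_from_task_level(struct cell *c, struct part *parts) {
  if (c == NULL) return 0;
  if (c->has_tasks) return link_subtree(c, parts);
  size_t used = 0;
  for (int k = 0; k < 8; k++)
    used += link_from_task_level(c->progeny[k], parts + used);
  return used;
}

/* Count, allocate once, then link. Returns the buffer (NULL if nothing is
 * needed or the allocation fails); the caller owns it. */
struct part *allocate_foreign_parts(struct cell **tops, int ntop,
                                    size_t *total) {
  *total = 0;
  for (int i = 0; i < ntop; i++) *total += count_foreign_parts(tops[i]);
  if (*total == 0) return NULL;

  struct part *buf = malloc(*total * sizeof(struct part));
  if (buf == NULL) return NULL;

  size_t offset = 0;
  for (int i = 0; i < ntop; i++)
    offset += link_from_task_level(tops[i], buf + offset);
  assert(offset == *total); /* Both passes must agree on the size. */
  return buf;
}
```

In this sketch, a top-level cell holding 10 particles of which only a 4-particle sub-cell has tasks leads to an allocation of 4 particles rather than 10, and cells above the task level keep a `NULL` particle pointer.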
Future improvements possibly include:
- Do the same for the stars once we have a better star-over-MPI strategy.
- Use the threadpool to parallelize the recursion since it is embarrassingly parallel. (But it's also fast so do we care?)
Activity
mentioned in issue #456 (closed)
Sorry to overfill your time @pdraper... I'll slow down a bit in the next few weeks.
added 1 commit
- 71c42df2 - Correct calculation of the memory allocated for foreign particles.
added 34 commits
- f80c65cf...aa863097 - 33 commits from branch master
- 2eb69dff - Merge branch 'master' into delayed_foreign_allocation
mentioned in commit 0d8f1852