Delayed allocation of foreign particle arrays
When exchanging the proxies, we directly allocate memory for all the particles in the hierarchy of the corresponding top-level cell. We then construct the tasks and push them down the tree. In practical applications, no self/pair tasks are left at the top level after this step, which means that we allocate far more memory than the tasks actually need. This is especially true for calculations involving gravity.
One possible solution would be to delay the allocation of the foreign particle arrays until after all the communication tasks have been created. This would be somewhat messier and would require more individual allocations, but it would reduce our memory footprint dramatically.
Edited by Matthieu Schaller