Free space used by foreign particles before attempting a repartition,
Free the space used by foreign particles before attempting a repartition. This frees space for the particle type copy we need.
added enhancement label
assigned to @matthieu
Seems to do the expected thing, i.e. make more memory available for repartitioning. Here is a plot showing the output from the memory report log with and without this fix, running on 4 nodes with EAGLE_50, 16x16x16 top-level cells and forcing repartitioning every other step:
The upper line is before the fix (the time axis offset is deliberate). Colours change for each step. Note that I've excluded the sorts as they are a pain to process, so this is not the complete story, but it is interesting to note that the peak seen in step 0, that hump at around 500 secs, is caused by the buffers used when writing the snapshot.
Here's the breakdown of the memory in use (excluding sorts) at the end of step 0:
# Memory use by label:
##  label                              MB        numactive
##
## cells_sub                           5515.900  4885
## cells_top                              1.951     1
## cells_with_particles_top               0.007     1
## fftw_mesh.potential                   16.000     1
## gparts                             15188.057     1
## gparts_foreign                      7375.277     1
## gparts_in                              0.018     3
## gparts_out                             0.018     3
## links                                418.320     1
## local_cells_top                        0.007     1
## local_cells_with_particles_top         0.007     1
## local_cells_with_tasks_top             0.007     1
## multipoles_sub                      1788.940  4885
## multipoles_top                         0.633     1
## parts                              14558.686     1
## parts_foreign                       4351.396     1
## parts_in                               0.037     3
## parts_out                              0.037     3
## pcells_in                           1048.758     3
## pcells_out                           950.666     3
## queues                                 0.002     1
## runners                                0.087     1
## sparts                               657.152     1
## sparts_foreign                       779.746     1
## sparts_in                              0.037     3
## sparts_out                             0.037     3
## tasks                               8585.178     1
## tasks_ind                            357.716     1
## tid_active                           357.716     1
## unlock_ind                            78.125     1
## unlocks                              156.250     1
## xparts                              7279.343     1
## xparts_in                              0.018     3
## xparts_out                             0.018     3
##
# Total memory still in use : 69466.145 (MB)
# Peak memory usage : 77830.465 (MB)
#
# Memory use by process (all/system): VIRT = 94876.398 SHR = 11.082 CODE = 1.883 DATA = 94732.898 RES = 77546.684 (MB)
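For anyone wanting to pull numbers out of a report like this, here is a minimal Python sketch. It is not part of the merge request or of the tooling that made the plots. It assumes the report is laid out one label per line as `## <label> <MB> <numactive>`, and the names `parse_memory_report` and `foreign_mb` are made up for illustration. The sample rows are copied from the breakdown above. The sketch sums the `*_foreign` buffers, which are the allocations this change releases before repartitioning.

```python
import re

# One table row: "## <label> <MB> <numactive>". The header row and the bare
# "##" separator rows do not match because their columns are not numeric.
ROW = re.compile(r"^##\s+(\S+)\s+([0-9.]+)\s+(\d+)\s*$")


def parse_memory_report(text):
    """Return {label: (megabytes, numactive)} for every table row in text."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            label, mb, count = match.groups()
            usage[label] = (float(mb), int(count))
    return usage


def foreign_mb(usage):
    """Sum the MB held by labels ending in '_foreign' (assumed naming)."""
    return sum(mb for label, (mb, _) in usage.items()
               if label.endswith("_foreign"))


# A few rows copied from the report above, used as sample input.
SAMPLE = """\
# Memory use by label:
## label MB numactive
##
## gparts 15188.057 1
## gparts_foreign 7375.277 1
## parts 14558.686 1
## parts_foreign 4351.396 1
## sparts 657.152 1
## sparts_foreign 779.746 1
##
"""

usage = parse_memory_report(SAMPLE)
total_foreign = foreign_mb(usage)
print(f"foreign buffers: {total_foreign:.3f} MB")
```

On the full report the three foreign buffers come to roughly 12.5 GB of the 69.5 GB still in use at the end of step 0.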
@nnrw56, I know you like to see this sort of thing!
Here we go. These are the four ranks (with this fix) plotted all together.
Interesting that after the first repartition the variations are smaller than during the initial partition. I didn't expect to see that and am wondering if there is a problem in the initial partition code, which is supposed to balance memory.
This is indeed awesome :)
We're getting to a point where the tooling is just as amazing as the code itself, which is a really nice place to be.
Edited by Pedro Gonnet
mentioned in commit 6cd6230f