Free space used by foreign particles before attempting a repartition

Merged: Peter W. Draper requested to merge repart-lessmem into master

Free the space used by foreign particles before attempting a repartition; this makes room for the particle type copy we need.
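A minimal sketch of why the ordering matters, assuming a simple accounting model in which, at the peak, the repartition holds everything from the end of the step plus the copy of a particle type it needs. The function name and the sizes are hypothetical and for illustration only; this is not SWIFT's code or API.

```python
def repartition_peak_mb(local_mb, foreign_mb, copy_mb, free_foreign_first):
    """Peak MB in use while the particle copy for a repartition exists.

    Hypothetical accounting model: only running totals are tracked and
    nothing is actually allocated.
    """
    in_use = local_mb + foreign_mb  # state at the end of a step
    peak = in_use
    if free_foreign_first:
        # Assumption: the foreign arrays can be rebuilt after the repartition.
        in_use -= foreign_mb
    in_use += copy_mb  # copy of a particle type needed to redistribute it
    return max(peak, in_use)


# Illustrative sizes in MB, not taken from the run discussed below.
without_fix = repartition_peak_mb(100, 60, 100, free_foreign_first=False)
with_fix = repartition_peak_mb(100, 60, 100, free_foreign_first=True)
print(f"peak without freeing: {without_fix} MB, with freeing: {with_fix} MB")
```

In this model the saving at the peak is exactly the size of the foreign arrays, as long as the copy is at least as large as what was freed.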

Edited by Peter W. Draper

Merged (May 28, 2025 1:59am UTC)

Activity

  • Peter W. Draper unmarked as a Work In Progress

  • Seems to do the expected thing, i.e. make more memory available for repartitioning. Here is a plot showing the output from the memory report log with and without this fix, running on 4 nodes with EAGLE_50, 16x16x16 top-level cells and forcing repartitioning every other step:

    [Plot: free-foreign, memory in use over time with and without the fix]

    The upper line is before the fix (the time axis offset is deliberate). Colours change for each step. Note that I've excluded the sorts, as they are a pain to process, so this is not the complete story; but it is interesting to note that the peak seen in step 0, the hump at around 500 secs, is caused by the buffers used when writing the snapshot.

    Here's the breakdown of the memory in use (excluding sorts) at the end of step 0:

    # Memory use by label:
    ##  label                                        MB        numactive
    ## 
    ##  cells_sub                              5515.900             4885
    ##  cells_top                                 1.951                1
    ##  cells_with_particles_top                  0.007                1
    ##  fftw_mesh.potential                      16.000                1
    ##  gparts                                15188.057                1
    ##  gparts_foreign                         7375.277                1
    ##  gparts_in                                 0.018                3
    ##  gparts_out                                0.018                3
    ##  links                                   418.320                1
    ##  local_cells_top                           0.007                1
    ##  local_cells_with_particles_top            0.007                1
    ##  local_cells_with_tasks_top                0.007                1
    ##  multipoles_sub                         1788.940             4885
    ##  multipoles_top                            0.633                1
    ##  parts                                 14558.686                1
    ##  parts_foreign                          4351.396                1
    ##  parts_in                                  0.037                3
    ##  parts_out                                 0.037                3
    ##  pcells_in                              1048.758                3
    ##  pcells_out                              950.666                3
    ##  queues                                    0.002                1
    ##  runners                                   0.087                1
    ##  sparts                                  657.152                1
    ##  sparts_foreign                          779.746                1
    ##  sparts_in                                 0.037                3
    ##  sparts_out                                0.037                3
    ##  tasks                                  8585.178                1
    ##  tasks_ind                               357.716                1
    ##  tid_active                              357.716                1
    ##  unlock_ind                               78.125                1
    ##  unlocks                                 156.250                1
    ##  xparts                                 7279.343                1
    ##  xparts_in                                 0.018                3
    ##  xparts_out                                0.018                3
    ## 
    # Total memory still in use :  69466.145  (MB)
    # Peak memory usage         :  77830.465  (MB)
    #
    # Memory use by process (all/system):  VIRT = 94876.398 SHR = 11.082 CODE = 1.883 DATA = 94732.898 RES = 77546.684 (MB)
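To put a number on what this change frees on this rank, a small sketch that sums the `*_foreign` entries of the report above. It assumes the `##  label  MB  numactive` line format shown there; the parser itself is hypothetical and not part of the memory logger. The three foreign entries (gparts, parts, sparts) come to about 12.5 GB at the end of step 0.

```python
import re

# The particle lines copied from the report above; a full log would be read from file.
REPORT = """\
##  gparts                                15188.057                1
##  gparts_foreign                         7375.277                1
##  parts                                 14558.686                1
##  parts_foreign                          4351.396                1
##  sparts                                  657.152                1
##  sparts_foreign                          779.746                1
"""

# label, MB, numactive; the header line ("MB") and bare "##" lines do not match.
LINE = re.compile(r"^##\s+(\S+)\s+([0-9.]+)\s+(\d+)\s*$")


def parse_report(text):
    """Map each label to its MB figure."""
    usage = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            usage[m.group(1)] = float(m.group(2))
    return usage


usage = parse_report(REPORT)
foreign_mb = sum(mb for label, mb in usage.items() if label.endswith("_foreign"))
print(f"foreign particles: {foreign_mb:.3f} MB")
```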
  • @nnrw56, I know you like to see this sort of thing!

  • Oh, neat plot.

    We can reduce the size of the i/o buffers if necessary.

    Do you have an intuition as to why we do not reach the same level in the flat bits? Is it because we only (re-)allocate what we need?

  • This is just one rank of several; I've seen variations at this level between them, so I expect it is just down to the details of the repartitioning. Let me plot that...

  • Also, why does it ramp down? Isn't the freeing of memory instantaneous?

  • Here we go. These are the four ranks (with this fix) plotted all together.

    [Plot: free-foreign-variation, memory in use over time on all four ranks with the fix]

    Interesting that after the first repartition the variations are smaller than after the initial partition. I didn't expect to see that, and am wondering if there is a problem in the initial partition code, which is supposed to balance memory.
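One way to put a number on the variation between ranks, before and after repartitioning, is the ratio of the largest per-rank memory use to the mean. This is a sketch: the metric is a common choice but an assumption here, not necessarily what the partition code balances against, and the per-rank values below are made up rather than read from the plot.

```python
def memory_imbalance(per_rank_mb):
    """Largest per-rank usage divided by the mean; 1.0 means perfect balance."""
    if not per_rank_mb:
        raise ValueError("need at least one rank")
    mean = sum(per_rank_mb) / len(per_rank_mb)
    return max(per_rank_mb) / mean


# Hypothetical per-rank usage in MB for four ranks.
print(memory_imbalance([70000.0, 72000.0, 68000.0, 70000.0]))
```

Since the rank with the most memory in use is the one that runs out first, the maximum over the mean tracks the failure risk more directly than, say, the standard deviation.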

  • The ramping is just an artefact of the sampling: each point is recorded at a malloc/free and I join these with straight lines. It looks better than just the points.
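The ramp can be reproduced with a toy log. A sketch, assuming the log is a list of `(time, MB in use)` pairs recorded at each malloc/free (a hypothetical format, not the memory logger's): nothing changes between two events, so a step plot gives the true value, while joining the points with straight lines turns an instantaneous free into a ramp.

```python
def step_value(samples, t):
    """MB in use at time t: the value set by the last malloc/free at or before t."""
    value = None
    for ts, mb in samples:
        if ts <= t:
            value = mb
    return value


def interpolate(samples, t):
    """MB in use at time t as drawn when samples are joined by straight lines."""
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return m1
            return m0 + (m1 - m0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")


# Hypothetical log: last allocation at t=0 s, a single large free at t=10 s.
samples = [(0.0, 100.0), (10.0, 40.0)]
print(step_value(samples, 5.0), interpolate(samples, 5.0))
```

Halfway between the two events the line plot shows a value part way down the ramp, although the memory was actually still fully in use until the free.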

  • So if we exclude step 0, it looks like we now need less memory in the redistribution process than in the regular steps. That's great progress. If we survive the steps, we are likely to survive the run overall.

    That leaves just the worry about the i/o, which we can adjust as well.

  • For this volume, yes. Previously repartitioning spiked to around the same level as the peak in step 0, and presumably above it when some imbalance was needed, which is why it failed later for the big volume.

  • BTW, I am ready to accept. Just waiting for Pedro to see the plots and rejoice.

  • @nnrw56, I know you like to see this sort of thing!

    This is indeed awesome :)

    We're getting to a point where the tooling is just as amazing as the code itself, which is a really nice place to be.

    Edited by Pedro Gonnet
  • Agreed, nice to work from information rather than guessing. The memory logger is about ready for acceptance as well.

  • mentioned in commit 6cd6230f
