Only repartition when required

Merged Peter W. Draper requested to merge repartition-less into master

Only repartition when the previous step processed some large fraction of all the particles, and then only when the loads between the ranks are out of balance. This is for several reasons:

  • Repartitioning is expensive, so should only be done when necessary.
  • Frequent repartitioning with multi-dt is not necessary (for the EAGLE volumes anyway).
  • It is more representative to check the load balance when all tasks have been run.
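
The two conditions above can be sketched as a single predicate. This is only a minimal illustration under stated assumptions, not the code in this branch: the names `cputime_imbalance` and `repartition_needed`, the `min_updated_fraction` threshold, and the choice of (max - min) / mean as the imbalance measure are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: fractional spread of the per-rank CPU times,
 * computed as (max - min) / mean. The measure actually used in the
 * branch may differ. */
double cputime_imbalance(const double *cputimes, size_t nr_ranks) {
  if (nr_ranks == 0) return 0.0;
  double min = cputimes[0], max = cputimes[0], sum = 0.0;
  for (size_t k = 0; k < nr_ranks; k++) {
    if (cputimes[k] < min) min = cputimes[k];
    if (cputimes[k] > max) max = cputimes[k];
    sum += cputimes[k];
  }
  const double mean = sum / (double)nr_ranks;
  return (mean > 0.0) ? (max - min) / mean : 0.0;
}

/* Hypothetical predicate: only consider repartitioning when the last step
 * updated a large enough fraction of the particles, and then only when
 * the ranks are out of balance by more than max_imbalance. */
int repartition_needed(size_t nr_updated, size_t nr_total,
                       double min_updated_fraction, const double *cputimes,
                       size_t nr_ranks, double max_imbalance) {
  if (nr_total == 0) return 0;
  const double updated_fraction = (double)nr_updated / (double)nr_total;
  if (updated_fraction < min_updated_fraction) return 0;
  return cputime_imbalance(cputimes, nr_ranks) > max_imbalance;
}
```

Checking the updated fraction first means the imbalance is never evaluated on a step where only a few particles were active, which is the case the third bullet is concerned with.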

The load balance is determined from the user CPU time per step (including the CPU time from all threads). We exclude the system time as that is not down to processing and tends to even out the ranks artificially, much as elapsed time does (since we wait for all the MPI tasks to come together).
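
For readers unfamiliar with how user CPU time can be read, here is a rough sketch using POSIX `getrusage`. It is an assumption that the branch uses this call; the function name `user_cputime_seconds` is hypothetical. Only `ru_utime` (user time) is summed, so system time is left out as described above.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: user CPU time, in seconds, consumed so far by this
 * process (all of its threads) plus any children that have been waited
 * for. System time (ru_stime) is deliberately ignored.
 * Returns -1.0 if getrusage fails. */
double user_cputime_seconds(void) {
  struct rusage self, children;
  if (getrusage(RUSAGE_SELF, &self) != 0) return -1.0;
  if (getrusage(RUSAGE_CHILDREN, &children) != 0) return -1.0;

  const double self_s =
      (double)self.ru_utime.tv_sec + 1e-6 * (double)self.ru_utime.tv_usec;
  const double children_s = (double)children.ru_utime.tv_sec +
                            1e-6 * (double)children.ru_utime.tv_usec;
  return self_s + children_s;
}
```

Calling this at the start and end of a step and taking the difference gives a per-step figure that each rank could then share with the others for comparison.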

The allowed load imbalance is set by the parameter DomainDecomposition:trigger. This can also be set to a number greater than one, in which case the old scheme of repartitioning every 'trigger' steps is used (previously trigger was always 100).
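
The dual meaning of the trigger value can be illustrated as follows. This is a sketch under assumptions rather than the branch's implementation: the function name `trigger_fires` is hypothetical, and using `step % interval` for the fixed-interval case is a guess at how "every 'trigger' steps" is counted.

```c
/* Hypothetical interpretation of DomainDecomposition:trigger.
 *
 *  - trigger > 1:  treat it as a step interval and fire every
 *                  (int)trigger steps, whatever the imbalance is.
 *  - otherwise:    treat it as the largest fractional imbalance
 *                  tolerated before a repartition is requested. */
int trigger_fires(double trigger, int step, double imbalance) {
  if (trigger > 1.0) {
    const int interval = (int)trigger; /* at least 1 here */
    return (step % interval) == 0;
  }
  return imbalance > trigger;
}
```

With a trigger of 100 this reproduces a fixed every-100-steps schedule, while a value such as 0.05 (an illustrative number, not a recommended setting) would request a repartition only when the measured imbalance exceeds 5%.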

Activity

  • Peter W. Draper Added 4 commits:

    • bd1c4c1c - Add function that returns the CPU time used by a process and it's children
    • fdc1b7b8 - Correct documentation issue
    • 43df5f7b - Switch to using CPU time to estimate the fraction difference in runtimes between ranks
    • 30bb8907 - The step tic/toc is now only used in the thread task dumps
  • Peter W. Draper Added 2 commits:

  • Peter W. Draper Title changed from WIP: Only repartition when required to Only repartition when required

  • Reassigned to @nnrw56

  • @nnrw56 it's probably time you had a look at this. I probably need to do some more evaluation, but how does it all seem to you?

  • Peter W. Draper Title changed from Only repartition when required to WIP: Only repartition when required

  • Peter W. Draper Added 32 commits:

  • Peter W. Draper Added 401 commits:

  • Peter W. Draper Added 1 commit:

  • Now that we can do longer runs, here are some plots of time per step against step. First, using the existing repartitioning scheme:

    [Figure: master]

    The repartition steps are the green squares.

    Now for the current branch:

    [Figure: less]

    and finally my most recent tweak (where we do a repartition after the second step regardless):

    [Figure: latestless]

    Don't get too excited about the seemingly longer run; that turned out to be more about the filesystem behaviour. The actual speed-up is more like 7%.

  • I've repeated the above now that the filesystem is more stable, and reduced the number of nodes from 20 to 12. The message remains the same: the balance stays stable with fewer repartitions, which gives a speed-up in wall-clock time of around 7%.

    Since @rgb asked for a binned version of the plot to make the variations more obvious, here is an attempt for the new runs.

    [Figure: repart-median]

    This shows the median value per 100 steps, for a classic run with repartitioning every 100 steps in blue, and the new code in red. The points at which a new repartition was performed are the green crosses. That seems to show that the balance was worse until the second repartition (the first is a little hidden at step 3), but we are largely as good afterwards. The extra steps are the speed up.
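
    The binning used for that plot (the median of the time per step over each block of 100 steps) can be reproduced along these lines. This is an illustrative sketch only, not the script that made the plot; the function name `binned_medians` is hypothetical, and including a trailing partial bin is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int compare_doubles(const void *a, const void *b) {
  const double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Hypothetical helper: write the median of each consecutive block of
 * bin_size values from times[0..n) into medians[] and return the number
 * of bins written. A shorter final block is still reported.
 * medians must have room for (n + bin_size - 1) / bin_size entries.
 * Returns 0 if bin_size is 0 or the scratch allocation fails. */
size_t binned_medians(const double *times, size_t n, size_t bin_size,
                      double *medians) {
  if (bin_size == 0) return 0;
  double *scratch = malloc(bin_size * sizeof(double));
  if (scratch == NULL) return 0;

  size_t nbins = 0;
  for (size_t start = 0; start < n; start += bin_size) {
    const size_t count = (n - start < bin_size) ? n - start : bin_size;
    /* Sort a copy so the caller's data keeps its step order. */
    memcpy(scratch, times + start, count * sizeof(double));
    qsort(scratch, count, sizeof(double), compare_doubles);
    medians[nbins++] =
        (count % 2) ? scratch[count / 2]
                    : 0.5 * (scratch[count / 2 - 1] + scratch[count / 2]);
  }
  free(scratch);
  return nbins;
}
```

    Using the median rather than the mean keeps a single slow step (for example, one that includes a repartition) from dominating its bin.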

    Here are the raw data, just for completeness.

    [Figure: repart-noless-full]

    [Figure: repart-less-full]

  • Peter W. Draper Added 51 commits:

    • ceaaa6bf...a679332b - 49 commits from branch master
    • d67758ac - Merge remote-tracking branch 'origin/master' into repartition-less
    • bd803b48 - Extend possible schemes to include a number of steps as well as a
  • Peter W. Draper Added 1 commit:

  • Peter W. Draper Added 84 commits:

  • Peter W. Draper Added 1 commit:

    • d76b4aa1 - Stop repartitioning in the step after a repartition, that makes no sense and is …
  • Peter W. Draper Title changed from WIP: Only repartition when required to Only repartition when required

  • This looks ready to go now, so please have a look and re-assign to me or @matthieu for merging.

          e->forcerepart = 1;
        }
      }

    #ifdef SWIFT_DEBUG_TASKS
      /* Save the cputimes for analysis. */
      fprintf(e->file_cputimes, "%6d ", e->step);
      for (int k = 0; k < e->nr_nodes; k++) {
        fprintf(e->file_cputimes, " %14.7g", elapsed_cputimes[k]);
      }
      fprintf(e->file_cputimes, "\n");
      fflush(e->file_cputimes);
    #endif
      }
    }

  • Peter W. Draper Added 1 commit:

    • fc3e4d3e - Move logic about whether to trigger a repartition into function and remove the
  • Peter W. Draper Added 1 commit:
