Log per-rank CPU times and memory use to assess the load balance.
Writes the user and system CPU times of each rank to a file, rank_cpu_balance.log, together with the current load balance estimate. Also records the resident set size of the process on each rank in a separate file, rank_memory_balance.log. The resident set size is essentially the core memory in use. Note that it is only sampled when considering whether to repartition, as are the CPU times, so it is indicative rather than a peak value.
These logs should be useful when trying to understand how the overall balance per node is working.
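For readers unfamiliar with how such values can be obtained, here is a minimal sketch, not the code in this merge request. It assumes a Linux system, takes the CPU times from a single POSIX getrusage() call and the current resident set size from /proc/self/statm. The struct rank_usage, the function names and the output column layout are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical record of one usage sample for a rank; not a SWIFT type. */
struct rank_usage {
  double user_sec; /* User CPU time in seconds. */
  double sys_sec;  /* System CPU time in seconds. */
  long rss_kb;     /* Resident set size in KiB, -1 if unavailable. */
};

/* Convert a struct timeval to seconds. */
double timeval_to_sec(struct timeval tv) {
  return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}

/* Current resident set size in KiB, read from /proc/self/statm
 * (Linux only). Returns -1 if the file cannot be read. */
long current_rss_kb(void) {
  FILE *f = fopen("/proc/self/statm", "r");
  if (f == NULL) return -1;
  long size_pages = 0, resident_pages = 0;
  const int nread = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
  fclose(f);
  if (nread != 2) return -1;
  return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Fill *out using one getrusage() call. Returns 0 on success. */
int sample_rank_usage(struct rank_usage *out) {
  assert(out != NULL);
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
  out->user_sec = timeval_to_sec(ru.ru_utime);
  out->sys_sec = timeval_to_sec(ru.ru_stime);
  out->rss_kb = current_rss_kb();
  return 0;
}

/* Append one line per rank and step; the column order is an assumption. */
void write_usage_line(FILE *fp, int step, int rank,
                      const struct rank_usage *u) {
  fprintf(fp, "%d %d %.6f %.6f %ld\n", step, rank, u->user_sec, u->sys_sec,
          u->rss_kb);
}
```

In an MPI run each rank would take its own sample and the values would then be gathered on one rank before writing. That gathering step is left out here.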
As part of this, the CPU times used to estimate the balance now only include the time used by the tasks in engine_launch(); previously all the time used in the step was considered. Given that the steps considered are those that interact all particles, this is not likely to make much difference, but it clarifies what we are using.
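The idea of charging only the task-running phase can be sketched as a before/after difference of process CPU time around a single call. This is an illustration under assumptions, not the change made here: cpu_time_of() and busy_work() are hypothetical names, busy_work() merely stands in for the work done inside engine_launch(), and using user time alone is a choice made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User CPU seconds consumed so far by this process (all threads). */
double user_cpu_seconds(void) {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return 0.0;
  return (double)ru.ru_utime.tv_sec + 1e-6 * (double)ru.ru_utime.tv_usec;
}

/* Return the user CPU seconds consumed while work(arg) ran, so that time
 * spent elsewhere in the step is not counted. */
double cpu_time_of(void (*work)(void *), void *arg) {
  assert(work != NULL);
  const double before = user_cpu_seconds();
  work(arg);
  const double after = user_cpu_seconds();
  return after - before;
}

/* Hypothetical stand-in for the task-running phase: accumulates a sum
 * into the double that arg points at. */
void busy_work(void *arg) {
  volatile double *acc = arg;
  for (long i = 0; i < 2000000; i++) *acc += 1e-6 * (double)i;
}
```

Because RUSAGE_SELF covers every thread of the process, the difference includes the CPU time of all threads that were running during the call.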
Activity
added 259 commits
- 89857dca...9bc31c9a - 258 commits from branch master
- 03c37d8b - Merge remote-tracking branch 'origin/master' into repartition-cputime-update
added 1 commit
- 49508b68 - Only use one call to get CPU times and rationalise around that
added MPI enhancement feature request labels
added 1 commit
- 0d2aa4ac - Fix problem in mean calcs (index starts at 2 not 1)
added 1 commit
- a24ae526 - Extend to include the resident set size of the ranks in a separate file
assigned to @matthieu
Here are the two logs plotted for a run of EAGLE_25 on 14 ranks (COSMA5 2 ranks per node).
CPU: [plot not shown]
Sorry, that Y axis should say "CPU time (user)".
MEM: [plot not shown]
Interesting to see that the initial partition balances memory better than work. The colouring by rank shows that we are stable across repartitions, that is, the memory and work of each rank stay in the order set by the first repartition.
Edited by Peter W. Draper
added 7 commits
- a24ae526...d8ff8f33 - 5 commits from branch master
- 859eb5e4 - Merge remote-tracking branch 'origin/master' into repartition-cputime-update
- 7b4f6b03 - Change so that we get the balance logs when not attempting to do the…