Add ParMETIS support
Adds ParMETIS repartitioning, so that the new partition of the cell graph is calculated across all the MPI nodes.
ParMETIS also has methods that refine an existing solution, not just create a new one, which can be used to reduce the amount of particle movement.
As part of this work we are reducing the number of repartitioning techniques offered, dropping those that tend to give marginally worse solutions in tests. The remaining ones are "costs/costs", "costs/none", "none/costs" and "costs/time", that is: balanced vertex and edge weights, vertex weights only, edge weights only, and edges weighted by the expected time of the next updates.
Initially this work intended to remove support for METIS, but testing the actual runtimes shows that METIS still produces the best balance, so we are keeping it as a continuing option, now enhanced by the ParMETIS options. ParMETIS may prove to be a better choice at scales larger than our current simulations.
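For context, the adaptive path uses ParMETIS's `ParMETIS_V3_AdaptiveRepart` routine, which starts from the current partition and trades the edge-cut against the cost of moving vertices. The wrapper below is only a minimal sketch of what such a call looks like, not SWIFT's implementation: the function name, the uniform `vsize`, the 5% imbalance tolerance and the default options are all illustrative.

```c
/* Sketch only: the shape of an adaptive repartition call with ParMETIS.
 * Array names, tolerances and error handling are illustrative, not SWIFT's. */
#include <stdlib.h>
#include <mpi.h>
#include <parmetis.h>

/* Repartition a distributed cell graph in CSR form, starting from the
 * current assignment in part[] so that ParMETIS can trade edge-cut
 * against the cost of moving cells (and hence particles). */
void repart_adaptive(idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
                     idx_t *adjwgt, idx_t nparts, real_t itr, idx_t *part,
                     MPI_Comm comm) {

  idx_t wgtflag = 3;            /* Both vertex and edge weights supplied. */
  idx_t numflag = 0;            /* C-style, zero-based numbering. */
  idx_t ncon = 1;               /* One balance constraint per vertex. */
  idx_t options[3] = {0, 0, 0}; /* Defaults; far fewer knobs than METIS. */
  idx_t edgecut = 0;

  /* Equal target weight for each partition and a 5% imbalance tolerance. */
  real_t *tpwgts = malloc(nparts * sizeof(real_t));
  for (idx_t i = 0; i < nparts; i++) tpwgts[i] = 1.0 / (real_t)nparts;
  real_t ubvec = 1.05;

  /* vsize is the cost of moving each vertex; NULL means uniform. The itr
   * parameter sets the ratio of redistribution cost to edge-cut: small
   * values favour keeping cells (and their particles) where they are. */
  if (ParMETIS_V3_AdaptiveRepart(vtxdist, xadj, adjncy, vwgt, /*vsize=*/NULL,
                                 adjwgt, &wgtflag, &numflag, &ncon, &nparts,
                                 tpwgts, &ubvec, &itr, options, &edgecut,
                                 part, &comm) != METIS_OK)
    MPI_Abort(comm, 1);

  free(tpwgts);
}
```

When building a partition from scratch rather than refining one, the analogous routine is `ParMETIS_V3_PartKway`; the commits below make the choice of refinement mode and the `itr` value configurable.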
Other significant updates in this request:
- Initial partitioning schemes given more obvious names: grid, region, memory or vectorized. Region balances by volume and memory by particle distribution; the others are only interesting when (Par)METIS is not available.
- Weights are now calculated using floats and the sum is scaled into the range of `idx_t`. That should avoid integer overflow issues (a sketch of the scaling idea follows this list).
- Weights from MPI tasks are no longer used; the position of these is strictly a free parameter of the solution.
- The balance of weights between vertices and timebins is defined as an equipartition. Previously the limits were matched.
- New clocks function `clocks_random_seed()` to return "random" seeds based on the remainder of the current number of nanoseconds (sketched below).
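The weight scaling can be pictured as below; this is a minimal sketch assuming the raw weights are doubles and that half of `IDX_MAX` is a safe target for the total. The function name and margin are illustrative, not SWIFT's code.

```c
/* Sketch only: rescale floating-point weights so that their sum fits
 * comfortably within METIS's idx_t range; the half-of-IDX_MAX margin
 * is an illustrative choice. */
#include <metis.h>

void scale_weights_to_idx(int nweights, const double *w, idx_t *out) {

  /* Sum the raw weights in double precision. */
  double sum = 0.0;
  for (int i = 0; i < nweights; i++) sum += w[i];

  /* Scale factor that maps the total onto half of the idx_t range. */
  const double scale = (sum > 0.0) ? 0.5 * (double)IDX_MAX / sum : 1.0;

  for (int i = 0; i < nweights; i++) {
    /* Clamp to at least 1 so no vertex or edge ends up weightless. */
    const idx_t v = (idx_t)(w[i] * scale);
    out[i] = (v > 0) ? v : 1;
  }
}
```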
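And a rough sketch of the kind of nanosecond-based seed described in the last bullet; the real `clocks_random_seed()` may differ in the modulus and return type.

```c
/* Rough sketch of a nanosecond-based "random" seed in the spirit of
 * clocks_random_seed(); the modulus and return type are assumptions. */
#include <stdlib.h>
#include <time.h>

unsigned int random_seed_sketch(void) {

  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);

  /* The nanosecond remainder changes rapidly from call to call, so it
   * gives a cheap seed that differs between invocations and ranks. */
  return (unsigned int)(ts.tv_nsec % RAND_MAX);
}
```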
Merge request reports
Activity
mentioned in merge request !503 (closed)
added 1 commit
- c69715c8 - Revert "Use threadpool to speed up particle loops"
added 1 commit
- 624012ee - Undo debugging change that always reports numbers of particles moved
added 89 commits
- a9b78167...079f0ea3 - 88 commits from branch master
- 986ab6d1 - Merge remote-tracking branch 'origin/master' into parmetis-perm
Back to WIP, as the balance in longer runs isn't as good as master. More investigation is required, but the most likely culprit is the initial partition, since this difference can be seen in the first steps. It isn't possible to get the same partition as serial METIS as the options[] array tuning isn't available in ParMETIS.
added 115 commits
- 3c383661...a3866435 - 114 commits from branch master
- 34aa1a37 - Merge remote-tracking branch 'origin/master' into parmetis-perm
added 59 commits
- 9dc16374...a727dda0 - 56 commits from branch master
- 1a9680cb - Change balanced initial partition to memory, should be more obvious
- 346b3b77 - Use METIS for the initial partition and when not 'refining', this gives a better global solution
- af9cb136 - Merge remote-tracking branch 'origin/master' into parmetis-perm
added 1 commit
- 7fc8e370 - Make the itr value a parameter, it will almost certainly benefit from tuning
added 40 commits
- 7fc8e370...07f1e5b1 - 39 commits from branch master
- 4a8fc023 - Merge remote-tracking branch 'origin/master' into parmetis-perm
Still looking into why the balance using this scheme is not as good; here are some plots that may give a clue. These use my old technique of dumping the cells at the end of each step, extracting those that are top-level and active, and working out which are on the edges of the domains, i.e. will be doing MPI. The y-axis is just the step and the x-axis the ratio of active cells to cells; points are coloured by how many active cells are in the step and there is a histogram showing the density of points (not always visible). For a run of EAGLE_50 on 8 nodes for 8192 steps that is forced to repartition after 1000 steps we get the following, first for master:
Now this branch:
added 17 commits
- 8a463288...7b0aa8f0 - 16 commits from branch master
- e7b09db5 - Merge remote-tracking branch 'origin/master' into parmetis-perm
added 1 commit
- 14671b49 - Generalise process_cells for any number of ranks
added 692 commits
- d7c24ae2...e95331dd - 690 commits from branch master
- 4a18404b - Merge remote-tracking branch 'origin/master' into parmetis-perm
- dc8a6050 - Formatting
added 48 commits
- 67e9b219...e9d05916 - 47 commits from branch master
- de28106f - Merge branch 'master' into parmetis-perm
added 1 commit
- 3717d3ad - Bring back METIS as a configure option and allow adaptive and normal refinement with ParMETIS
added 4 commits
- 551894c3 - Merge remote-tracking branch 'origin/master' into parmetis-perm
- bbefa289 - Fix the sum of weights sent to PARMETIS/METIS so that we avoid overflows
- 1a44222f - Merge remote-tracking branch 'origin/master' into parmetis-perm-test
- 0aa63048 - Merge in changes that limit sum of weights to less than METIS idx_t range
added 1 commit
- f3a06c4b - Make use of METIS an option when ParMETIS is configured
added 1 commit
- b0258bb5 - Need pick_metis() when only have ParMETIS and usemetis is true
added 1 commit
- 67bd6d88 - Keep timebin weights in some balance with vertex weights
added 1 commit
- 5a2a6074 - Add function to give a different seed for randoms
added 1 commit
- 2dcd0dff - Iterate a few solutions using ParMETIS when not refining
added 188 commits
- 2dcd0dff...4669036f - 187 commits from branch master
- 2c149381 - Merge remote-tracking branch 'origin/master' into parmetis-perm
added 1 commit
- 1358ac9a - Fix bug sending regionids to other ranks and don't use initial partition as basis for refinement
After a lot of effort here is a plot of the final "solution". First without a log Y axis, so the range is reduced from the full data:
Now with the usual log Y axis and all the data:
The result is that, although we see gains from not moving particles around so much, they are very small compared to the variations we see in the overall balance, so the three schemes (METIS, ParMETIS adaptive, ParMETIS refine) are pretty much the same within the measurement error. The plan is now to offer ParMETIS as an option for testing as we move to larger simulations (these measurements were done using EAGLE_50 on 8 nodes of COSMA6, SPH only, as MPI gravity isn't available at present).
added 112 commits
- 45c63d97...50457d68 - 111 commits from branch master
- 3e6f8b0c - Merge remote-tracking branch 'origin/master' into parmetis-perm
For interest I've been running some jobs and looking for indications about where our balance is, and how that is affecting the runtime, comparing master (simple METIS) and this code (hybrid ParMETIS/METIS). I haven't managed to draw any useful conclusions, but the results are interesting enough to share. These are from two runs of 3 hours on the same 8 nodes of COSMA6, SPH only, EAGLE_50.
First the comparison of the simulation time achieved, plotted as steps versus internal time:
Surprisingly this code performs more steps and proceeds further into the simulation.
We can see this if we plot the cumulative time in steps against the simulation time.
So the speed up comes early and stays with us.
Now a favourite plot, number of updated particles versus the time per step:
This gives some indication of why that is. Clearly the shortest time steps are running faster, but when?
It seems at all times, but this plot also shows that we do better some of the time at other step loci as well.
Burrowing deeper, for these runs I also output the real CPU time (user) per step as a min, max and mean across the ranks. This allows us to ask how well our scaling is for these different timesteps.
CPU estimate plots: mean, max and min across the ranks, for ParMETIS/METIS and METIS.
So here we have simulation time against the ratio of CPU time to time per step. Note that the CPU time is divided by the number of engine threads, 16 in this case; that is only approximate as we have a threadpool of 16 as well, but these two thread groups rarely operate at the same time (the only instance is when initially queuing tasks). The size of the circle is a function of the number of particle updates in the step, so as we expect the large circles are more efficient. I think you can see that the ParMETIS run is more efficient at small and intermediate steps, but you need to look closely.
The really interesting plots are the min/max ones. These directly show the imbalance between the nodes. Maybe we should be doing better than this, especially for the larger steps.
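For reference, the plotted quantity is formed roughly as below; this is a hand-rolled illustration of the arithmetic described above, not the actual analysis script.

```c
/* Illustration of the plotted ratio: per-step user CPU time, divided by
 * the number of engine threads, as a fraction of the step's wall-clock
 * time. 1.0 means every thread was busy for the whole step; values near
 * 0 mean the rank was mostly idle. */
double step_cpu_efficiency(double cpu_user_seconds, int nr_threads,
                           double step_wallclock_seconds) {
  return (cpu_user_seconds / (double)nr_threads) / step_wallclock_seconds;
}
```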
added 122 commits
- 3e6f8b0c...79c022ee - 119 commits from branch master
- 45a713e9 - Merge remote-tracking branch 'origin/master' into parmetis-perm
- 9b6b6704 - Merge remote-tracking branch 'origin/master' into parmetis-perm
- e8c760fe - Still need celllist when using plain METIS
Thanks for the very detailed analysis.
I have been running some EAGLE-50 examples with gravity over MPI and I find that the time for the small steps is about 500ms, which is about 20x worse than in EAGLE_25. I look forward to trying out the ParMETIS version and using it to recalibrate the gravity task weights.
@nnrw56 you will like these plots.
@matthieu, yes, I do! Although I'm not sure I understand all of them.
So the main message is that ParMETIS now does better than METIS, and this is mainly due to the smaller timesteps? If so, then that is excellent news, and means that it is working as expected!
I'm not quite sure I understand the final set of plots, though I probably would get them framed and hung in my office :) It looks like, on average, we're getting better CPU usage for ParMETIS, which means we're probably waiting less on send/recv tasks, which is consistent with my interpretation of the first two plots. Is that correct?
@pdraper, do you have a plot, or at least the data to make one, showing the actual amount of data sent back and forth for each time step? My guess is that we should see ParMETIS sending fewer particles than METIS, because it builds and maintains a better partition, which is the main source of improvement for the other results.
In any case, awesome stuff!
Yes, ParMETIS seems to do better overall.
We could possibly improve things on the smallest steps:
Our interpretation is that it is related to the fact that on the smallest steps we still try to run with 16 threads even if there is only one cell to update and hence only one chain of dependencies. Looking at task plots we see that on small steps multiple threads run these tasks, so the minimum time is not 0 and the max is not 1, whilst it could be. We could consider unleashing only 1 thread on the task graph when there is little to do. That would reduce the contention between threads, improve cache re-use and allow the one and only running core to clock up.
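Purely as a sketch of that idea (none of these names or the threshold exist in the code), the engine could cap how many workers it wakes when a step has very little queued:

```c
/* Hypothetical sketch: wake a single worker thread when a step has very
 * little to do, cutting queue contention, keeping caches warm and letting
 * the one busy core boost its clock. Names and threshold are invented. */
int threads_to_wake(int nr_queued_tasks, int nr_engine_threads) {

  /* Below this much work, extra threads mostly just fight over the queues. */
  const int small_step_threshold = 32;

  return (nr_queued_tasks < small_step_threshold) ? 1 : nr_engine_threads;
}
```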