Performance of full model
Here is an early analysis of a full-model EAGLE-25 run on 8 nodes (16 ranks) of cosma-7. The code was built with ParMETIS, tbbmalloc and the maximal level of code optimization and vectorization.
Total measured time: 84118.528 s
Total time: 84430.300000 s
Time spent in the different code sections:
- 'Engine Launch ' (203508 calls, time: 37320.2026s): 44.2024%
- 'Engine Collect End Of Step ' (203506 calls, time: 35810.9253s): 42.4148%
- 'Space Rebuild ' ( 8728 calls, time: 2364.8237s): 2.8009%
- 'Engine Exchange Cells ' ( 8728 calls, time: 1570.1745s): 1.8597%
- 'Writing Particle Properties ' ( 100 calls, time: 981.0003s): 1.1619%
- 'Creating Recv Tasks ' ( 8728 calls, time: 912.8609s): 1.0812%
- 'Communicating Rebuild Flag ' (203506 calls, time: 823.3623s): 0.9752%
- 'Engine Drift All ' ( 8947 calls, time: 716.6176s): 0.8488%
Elements in 'Other' category (<0.8%):
- 'Exchanging Cell Tags ' ( 8728 calls, time: 535.5152s): 0.6343%
- 'Gpart Assignment ' ( 8728 calls, time: 506.9014s): 0.6004%
- 'Engine Unskip ' (194786 calls, time: 455.3908s): 0.5394%
- 'Engine Print Task Counts ' (212236 calls, time: 303.6919s): 0.3597%
- 'Recursively Linking Foreign Arrays ' ( 8728 calls, time: 247.5270s): 0.2932%
....
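As a sanity check on the table above, the percentages can be recomputed from the reported times. This is a minimal sketch, assuming the percentages are taken relative to the "Total time" value (84430.3 s) rather than the "Total measured time"; the helper names `share` and `mean_cost_per_call` are illustrative and not part of any analysis script.

```python
# Timings copied from the list above: (section name, calls, seconds).
sections = [
    ("Engine Launch", 203508, 37320.2026),
    ("Engine Collect End Of Step", 203506, 35810.9253),
    ("Space Rebuild", 8728, 2364.8237),
    ("Engine Exchange Cells", 8728, 1570.1745),
]

# Assumption: percentages are relative to the reported "Total time".
TOTAL_TIME = 84430.3  # seconds


def share(seconds, total=TOTAL_TIME):
    """Percentage of the total run time spent in one section."""
    return 100.0 * seconds / total


def mean_cost_per_call(calls, seconds):
    """Average wall-clock time of one call, in seconds."""
    return seconds / calls


for name, calls, seconds in sections:
    print(f"{name:30s} {share(seconds):8.4f}%  "
          f"{mean_cost_per_call(calls, seconds):.4f} s/call")
```

Under that assumption, the first two sections together account for roughly 87% of the run, which is why the remaining entries matter comparatively little.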
The time spent in 'Engine Collect End Of Step' is essentially load-imbalance time, i.e. ranks waiting for the slowest rank to finish the step.
At about 42% of the total time, this is not great. We may need to re-assess the weights assigned to our tasks.
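To illustrate why waiting at the end of a step reflects imbalance, here is a simplified model, not taken from the code itself: if every rank must wait for the slowest one, each rank idles for the difference between the slowest rank's busy time and its own. The function name `imbalance_summary` and the per-rank busy times are hypothetical.

```python
def imbalance_summary(busy_times):
    """Return per-rank wait times and the fraction of rank-time lost to waiting.

    Simplified model (an assumption): all ranks synchronise at the end of
    the step, so each one waits until the slowest rank has finished.
    """
    slowest = max(busy_times)
    waits = [slowest - t for t in busy_times]
    lost_fraction = sum(waits) / (len(busy_times) * slowest)
    return waits, lost_fraction


# Hypothetical busy times (seconds) for four ranks in one step.
hypothetical_busy = [1.0, 2.0, 3.0, 2.0]
waits, lost = imbalance_summary(hypothetical_busy)
print(waits, round(lost, 3))
```

In this toy case one third of the available rank-time is spent waiting, even though only one rank is noticeably slower than the others.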
The run re-partitioned 41 times over the course of these 200k steps.
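For scale, the repartition and rebuild frequencies can be estimated from the numbers above. This sketch assumes that the call count of 'Engine Collect End Of Step' equals the number of steps and that every 'Space Rebuild' call is one rebuild.

```python
# Assumption: one 'Engine Collect End Of Step' call per step.
STEPS = 203506
REPARTITIONS = 41   # from the text above
REBUILDS = 8728     # 'Space Rebuild' calls from the table above

steps_per_repartition = STEPS / REPARTITIONS
steps_per_rebuild = STEPS / REBUILDS

print(f"about one repartition every {steps_per_repartition:.0f} steps")
print(f"about one rebuild every {steps_per_rebuild:.1f} steps")
```

Under these assumptions, the domain decomposition is refreshed only once every few thousand steps, while rebuilds happen every couple of dozen steps.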
Edited by Matthieu Schaller