Performance of full model
Here is an early analysis of a full-model EAGLE-25 run on 8 nodes (16 ranks) of cosma-7. The code was built with ParMETIS and tbbmalloc, at the maximal level of code optimization and vectorization.
Total measured time: 84118.528 s
Total time: 84430.300000 s

Time spent in the different code sections:
- 'Engine Launch ' (203508 calls, time: 37320.2026s): 44.2024%
- 'Engine Collect End Of Step ' (203506 calls, time: 35810.9253s): 42.4148%
- 'Space Rebuild ' ( 8728 calls, time: 2364.8237s): 2.8009%
- 'Engine Exchange Cells ' ( 8728 calls, time: 1570.1745s): 1.8597%
- 'Writing Particle Properties ' ( 100 calls, time: 981.0003s): 1.1619%
- 'Creating Recv Tasks ' ( 8728 calls, time: 912.8609s): 1.0812%
- 'Communicating Rebuild Flag ' (203506 calls, time: 823.3623s): 0.9752%
- 'Engine Drift All ' ( 8947 calls, time: 716.6176s): 0.8488%

Elements in 'Other' category (<0.8%):
- 'Exchanging Cell Tags ' ( 8728 calls, time: 535.5152s): 0.6343%
- 'Gpart Assignment ' ( 8728 calls, time: 506.9014s): 0.6004%
- 'Engine Unskip ' (194786 calls, time: 455.3908s): 0.5394%
- 'Engine Print Task Counts ' (212236 calls, time: 303.6919s): 0.3597%
- 'Recursively Linking Foreign Arrays ' ( 8728 calls, time: 247.5270s): 0.2932%
....
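To sanity-check the percentages or rank the sections by cost per call, the report lines can be parsed with a few lines of Python. This is only a sketch: the regular expression is an assumption about the line format inferred from the dump above, and `parse_report`, `REPORT` and `TOTAL_TIME` are illustrative names, not part of the code that produced the report.

```python
import re

# A few lines copied from the report above (one line per section).
REPORT = """
- 'Engine Launch ' (203508 calls, time: 37320.2026s): 44.2024%
- 'Engine Collect End Of Step ' (203506 calls, time: 35810.9253s): 42.4148%
- 'Space Rebuild ' ( 8728 calls, time: 2364.8237s): 2.8009%
"""
TOTAL_TIME = 84430.3  # "Total time" from the report, in seconds

# Assumed line format: - 'Name ' (N calls, time: T s): P%
LINE_RE = re.compile(
    r"-\s*'(?P<name>[^']*)'\s*"
    r"\(\s*(?P<calls>\d+) calls, time:\s*(?P<time>[\d.]+)s\):\s*"
    r"(?P<pct>[\d.]+)%"
)


def parse_report(text):
    """Return {section name: (calls, seconds, reported percentage)}."""
    sections = {}
    for m in LINE_RE.finditer(text):
        sections[m.group("name").strip()] = (
            int(m.group("calls")),
            float(m.group("time")),
            float(m.group("pct")),
        )
    return sections


sections = parse_report(REPORT)
for name, (calls, seconds, pct) in sections.items():
    ms_per_call = 1e3 * seconds / calls
    recomputed = 100.0 * seconds / TOTAL_TIME
    print(f"{name:28s} {ms_per_call:10.2f} ms/call "
          f"{recomputed:8.4f}% (reported {pct}%)")
```

Recomputing each fraction against "Total time" (rather than "Total measured time") reproduces the reported percentages, which suggests that is the denominator the report uses.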
The time spent in 'Engine Collect End Of Step' is essentially load-imbalance time, i.e. ranks waiting for the slowest one to finish the step.
At 42.4% of the total run time, this is not great. We may need to re-assess the weights assigned to our tasks.
The run re-partitioned 41 times over the course of these 200k steps, i.e. roughly once every 5,000 steps on average.
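For a rough feel of the scale of the imbalance, the figures quoted above can be combined as in the sketch below. It assumes that each call to 'Engine Collect End Of Step' corresponds to one time-step, which the report does not state explicitly; the variable names are illustrative only.

```python
# Numbers copied from the timing report above.
collect_time_s = 35810.9253  # 'Engine Collect End Of Step'
launch_time_s = 37320.2026   # 'Engine Launch'
n_steps = 203506             # assumption: one collect call per time-step
n_repartitions = 41

# Average number of steps between two repartitions.
steps_per_repartition = n_steps / n_repartitions

# Average time per step spent in the end-of-step collection.
wait_per_step_s = collect_time_s / n_steps

# Collection time relative to the time spent launching the engine.
wait_to_launch_ratio = collect_time_s / launch_time_s

print(f"steps per repartition : {steps_per_repartition:.0f}")
print(f"collect time per step : {wait_per_step_s * 1e3:.1f} ms")
print(f"collect / launch ratio: {wait_to_launch_ratio:.3f}")
```

Under that assumption, the run spends nearly as long in the end-of-step collection as in 'Engine Launch' itself, with a repartition only about once every 5,000 steps.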