Draft: Decrease default vectorization threshold when compiling gravity interactions
When optimizing with AVX2, some of the loops (notably the one at line 1092) are no longer vectorized. Lowering the default vectorization threshold makes the compiler vectorize them again and gives a good speed-up.
Fixes #865 (closed)
Now a draft, as the real problem could be a regression in how we use OpenMP pragmas as compilation hints.
added Configuration performance vectorization labels
assigned to @matthieu
mentioned in issue #865 (closed)
mentioned in merge request !1763 (closed)
From the big job:
Last five steps involving all the particles before the change in flags:
# Step          Time  Scale-factor   Redshift     Time-step  Time-bins  Updates      g-Updates  s-Updates  sink-Updates  b-Updates  Wall-clock time [ms]  Props  Dead time [ms]
  5678  6.608618e-03     0.5371942  0.8615242  1.140760e-06      41 46        0  1199808512000          0             0          0            839950.938    257       39973.425
  5710  6.645208e-03     0.5393804  0.8539791  1.145912e-06      41 49        0  1199808512000          0             0          0            859338.562    257       42677.680
  5742  6.681961e-03     0.5415755  0.8464646  1.151054e-06      41 46        0  1199808512000          0             0          0            877712.125    257       43392.041
  5774  6.718881e-03     0.5437795  0.8389806  1.156219e-06      41 47        0  1199808512000          0             0          0            863896.125    257       40606.240
  5806  6.755965e-03     0.5459925  0.8315269  1.161391e-06      41 46        0  1199808512000          0             0          0            830822.625    257       41168.996
First five steps involving all the particles after the change in flags:
# Step          Time  Scale-factor   Redshift     Time-step  Time-bins  Updates      g-Updates  s-Updates  sink-Updates  b-Updates  Wall-clock time [ms]  Props  Dead time [ms]
  5838  6.793214e-03     0.5482145  0.8241034  1.166552e-06      41 48        0  1199808512000          0             0          0            747684.812    257       29803.102
  5870  6.830630e-03     0.5504456  0.8167100  1.171735e-06      41 46        0  1199808512000          0             0          0            774931.125    257       32042.306
  5902  6.868211e-03     0.5526857  0.8093465  1.176924e-06      41 47        0  1199808512000          0             0          0            778530.625    257       30668.313
  5934  6.905958e-03     0.5549350  0.8020129  1.182110e-06      41 46        0  1199808512000          0             0          0            768847.062    257       31783.021
  5966  6.943871e-03     0.5571934  0.7947091  1.187301e-06      41 50        0  1199808512000          0             0          0            793773.250    257       33252.469
The code did redistribute after some of these steps but the reported imbalance remains about the same at 38%.
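To put a number on the change, the wall-clock columns of the two tables above can be averaged and compared. The following is only a minimal sketch: the variable names are made up for illustration, and a five-step mean on either side of the flag change is a rough estimate, not a controlled benchmark.

```python
from statistics import mean

# Wall-clock times [ms] copied from the two tables above
# (five full steps before and five after the change in flags).
wall_before = [839950.938, 859338.562, 877712.125, 863896.125, 830822.625]
wall_after = [747684.812, 774931.125, 778530.625, 768847.062, 793773.250]

mean_before = mean(wall_before)
mean_after = mean(wall_after)

# Speed-up factor and fractional reduction in wall-clock time per step.
speedup = mean_before / mean_after
reduction = 1.0 - mean_after / mean_before

print(f"mean before: {mean_before:.1f} ms")
print(f"mean after:  {mean_after:.1f} ms")
print(f"speed-up:    {speedup:.3f}x ({100.0 * reduction:.1f}% less wall-clock time)")
```

On these ten steps this works out to a bit under 10% less wall-clock time per full step.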
Edited by Matthieu Schaller. Resolved by Matthieu Schaller.
I can run the DMO test we used to benchmark the different modes in the summer if it helps.
I've run some DMO tests with EAGLE_50x2 and the change still seems better there, so it is not just some small-volume effect (like more turbo boost). That still leaves open the question of how the different resolution may impact the choice of interactions. Are there some smaller example snapshots of this run, or maybe of FLAMINGO at this resolution?
Closing as superseded by !1804 (merged)