Draft: Add ICC Milan optimization and change AMD architecture to corei7-avx.
Add ICC Milan CPU ids and optimization.
Change the basic architecture to corei7-avx from core-avx2 as tests show that to be 20% faster for gravity only and the same speed for hydro.
Merge request reports
Activity
added Configuration architecture compilation enhancement performance vectorization labels
assigned to @pdraper
@matthieu I would have preferred to just apply
-march=corei7-avx
to the gravity iact code as that is sufficient, but there is no way to do that. Also failed to work out what the crucial optimization difference was, at least one less inline and some vectorized strided movements, was reported by ICC, but no obvious code changes made those work.Must just be better at 128 widths. Feel free to try as that would be better!
requested review from @matthieu
Changed to draft as I checked this at high z and things look slower by 3%, so I guess speeding up the direct interactions isn't always useful and the other gravity code is sightly better with AVX2. Messy since I really am stuck for a way to apply this flag to only the gravity convenience library. This is because the optimizing flags are added to
CFLAGS
which is always appended last so that the user flags can override those of the configure system, which is where we need to make this change from and we already have-march=core-avx2
inCFLAGS
.Edited by Peter W. Draperadded 1 commit
- aaaef57a - Shell out the optimization flags into OPT_CFLAGS so we can control how they...
added 1 commit
- 49cc96df - Remove unused changes. Now in different file.
added 1 commit
- 1df75787 - Use OPT_CFLAGS as optimizations are no longer in CFLAGS
So this keeps the low-z improvement, but at high-z things are only slightly better and simple avx2 is still the winner. Seems we are at an impasse and cannot have our cake and eat it. Guess the proper solution would be to look at the autovectorization and work out why exactly AVX is faster than AVX2 for some of the gravity interactions.
Going to close this for now.
mentioned in issue #865 (closed)
Superseded by !1797 (closed)