Speeds up self-gravity in my EAGLE_50 tests, built with Intel/2018 and configured with:
./configure --with-tbbmalloc --with-parmetis --enable-debug
Full steps go from 155 down to 105, roughly a 1.5x speedup (the x2 quoted in the log message included some I/O). These are MPI runs on a single node using all the cores (4x16=64).
Only known to work on the EPYC nodes at Durham, but patterns for other AMD CPUs that claim AVX2 support are included. See: