SPH time-stepping may be broken
I ran some new tests of my parmetis-perm branch after merging in the latest master,
and it seems that the time-step increment used in SPH has changed a lot and is now much
smaller than it was. For instance, previously we had:
256 2.441406e-06 1.000000e+00 0.00000 9.536743e-09 36 44 9622536 0 0 3613.280 0
257 2.450943e-06 1.000000e+00 0.00000 9.536743e-09 36 36 89 0 0 15.354 0
258 2.460480e-06 1.000000e+00 0.00000 9.536743e-09 36 37 1886 0 0 31.786 0
259 2.470016e-06 1.000000e+00 0.00000 9.536743e-09 36 36 89 0 0 18.534 0
260 2.479553e-06 1.000000e+00 0.00000 9.536743e-09 36 38 24722 0 0 42.264 0
261 2.489090e-06 1.000000e+00 0.00000 9.536743e-09 36 36 89 0 0 16.262 0
262 2.498627e-06 1.000000e+00 0.00000 9.536743e-09 36 37 1898 0 0 26.806 0
263 2.508163e-06 1.000000e+00 0.00000 9.536743e-09 36 36 90 0 0 16.547 0
264 2.517700e-06 1.000000e+00 0.00000 9.536743e-09 36 39 119088 0 0 95.241 0
265 2.527237e-06 1.000000e+00 0.00000 9.536743e-09 36 36 90 0 0 18.810 0
266 2.536774e-06 1.000000e+00 0.00000 9.536743e-09 36 37 1903 0 0 28.251 0
267 2.546310e-06 1.000000e+00 0.00000 9.536743e-09 36 36 90 0 0 17.129 0
268 2.555847e-06 1.000000e+00 0.00000 9.536743e-09 36 38 24846 0 0 43.432 0
269 2.565384e-06 1.000000e+00 0.00000 9.536743e-09 36 36 90 0 0 17.163 0
270 2.574921e-06 1.000000e+00 0.00000 9.536743e-09 36 37 1911 0 0 29.760 0
and now we see:
256 2.441406e-06 1.000000e+00 0.00000 9.536743e-09 36 44 9621833 0 0 3568.128
257 2.441704e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 10.8250
258 2.442002e-06 1.000000e+00 0.00000 2.980232e-10 31 32 3 0 0 10.050
259 2.442300e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 9.5710
260 2.442598e-06 1.000000e+00 0.00000 2.980232e-10 31 33 5 0 0 10.060
261 2.442896e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 12.107
262 2.443194e-06 1.000000e+00 0.00000 2.980232e-10 31 32 3 0 0 11.997
263 2.443492e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 10.859
264 2.443790e-06 1.000000e+00 0.00000 2.980232e-10 31 34 5 0 0 12.097
265 2.444088e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 10.489
266 2.444386e-06 1.000000e+00 0.00000 2.980232e-10 31 32 3 0 0 12.253
267 2.444685e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 12.099
268 2.444983e-06 1.000000e+00 0.00000 2.980232e-10 31 33 5 0 0 12.844
269 2.445281e-06 1.000000e+00 0.00000 2.980232e-10 31 31 2 0 0 12.025
270 2.445579e-06 1.000000e+00 0.00000 2.980232e-10 31 32 3 0 0 11.894
So the increment changes from 9.536743e-09 to 2.980232e-10, and never recovers.
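As a quick sanity check on those two numbers (a rough sketch, not a diagnosis), the snippet below computes the ratio of the old increment to the new one and compares it with the change in the sixth log column. Reading that column as the minimum active time-bin is an assumption on my part; the variable names are purely illustrative.

```python
import math

# Time-step increments copied from the two log excerpts above.
dt_before = 9.536743e-09  # before merging master
dt_after = 2.980232e-10   # after merging master

ratio = dt_before / dt_after
levels = math.log2(ratio)  # number of factor-of-two halvings between the two

# Assumption: the sixth log column is the minimum active time-bin.
bin_before, bin_after = 36, 31
bin_drop = bin_before - bin_after

print(f"ratio = {ratio:.3f}, log2(ratio) = {levels:.3f}, bin drop = {bin_drop}")
```

If that reading of the columns is right, the increment has dropped by a power of two that matches the drop in the time-bin column, which would point at some particles being pushed onto a much deeper time-bin rather than at a rounding or output problem.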
This is using the command:
mpirun -np 8 ../swift_mpi -a -t 16 -s eagle_50.yml
on COSMA6 with the swift/c5/intel/intelmpi/2017-parallel modules.
I tried to reproduce this on smaller volumes, but without success, and I haven't had
time to test master itself, only my merge.
The last commit known to give reasonable time-steps was 1358ac9a (on my branch), which is just after the merge 79d55a6b on master.
Flagging now as I don't want to sit on this for two weeks. May try some more tests if I find time.