
SPH time-stepping may be broken

I ran some new tests of my parmetis-perm branch after merging in the latest master, and it seems that the timestep increment used in SPH has changed a lot and is now much smaller than it was. For instance, previously we had:

     256   2.441406e-06   1.000000e+00    0.00000   9.536743e-09   36   44      9622536            0            0              3613.280      0
     257   2.450943e-06   1.000000e+00    0.00000   9.536743e-09   36   36           89            0            0                15.354      0
     258   2.460480e-06   1.000000e+00    0.00000   9.536743e-09   36   37         1886            0            0                31.786      0
     259   2.470016e-06   1.000000e+00    0.00000   9.536743e-09   36   36           89            0            0                18.534      0
     260   2.479553e-06   1.000000e+00    0.00000   9.536743e-09   36   38        24722            0            0                42.264      0
     261   2.489090e-06   1.000000e+00    0.00000   9.536743e-09   36   36           89            0            0                16.262      0
     262   2.498627e-06   1.000000e+00    0.00000   9.536743e-09   36   37         1898            0            0                26.806      0
     263   2.508163e-06   1.000000e+00    0.00000   9.536743e-09   36   36           90            0            0                16.547      0
     264   2.517700e-06   1.000000e+00    0.00000   9.536743e-09   36   39       119088            0            0                95.241      0
     265   2.527237e-06   1.000000e+00    0.00000   9.536743e-09   36   36           90            0            0                18.810      0
     266   2.536774e-06   1.000000e+00    0.00000   9.536743e-09   36   37         1903            0            0                28.251      0
     267   2.546310e-06   1.000000e+00    0.00000   9.536743e-09   36   36           90            0            0                17.129      0
     268   2.555847e-06   1.000000e+00    0.00000   9.536743e-09   36   38        24846            0            0                43.432      0
     269   2.565384e-06   1.000000e+00    0.00000   9.536743e-09   36   36           90            0            0                17.163      0
     270   2.574921e-06   1.000000e+00    0.00000   9.536743e-09   36   37         1911            0            0                29.760      0

and now we see:

     256   2.441406e-06   1.000000e+00    0.00000   9.536743e-09   36   44      9621833            0            0              3568.128
     257   2.441704e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                10.8250
     258   2.442002e-06   1.000000e+00    0.00000   2.980232e-10   31   32            3            0            0                10.050
     259   2.442300e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                 9.5710
     260   2.442598e-06   1.000000e+00    0.00000   2.980232e-10   31   33            5            0            0                10.060
     261   2.442896e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                12.107
     262   2.443194e-06   1.000000e+00    0.00000   2.980232e-10   31   32            3            0            0                11.997
     263   2.443492e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                10.859
     264   2.443790e-06   1.000000e+00    0.00000   2.980232e-10   31   34            5            0            0                12.097
     265   2.444088e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                10.489
     266   2.444386e-06   1.000000e+00    0.00000   2.980232e-10   31   32            3            0            0                12.253
     267   2.444685e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                12.099
     268   2.444983e-06   1.000000e+00    0.00000   2.980232e-10   31   33            5            0            0                12.844
     269   2.445281e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                12.025
     270   2.445579e-06   1.000000e+00    0.00000   2.980232e-10   31   32            3            0            0                11.894

So the increment changes from 9.536743e-09 to 2.980232e-10, and never recovers.
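
To put a number on the drop, here is a minimal sketch that compares step 257 from the two logs. The logs carry no header here, so the column reading is an assumption: the fifth whitespace-separated field is taken to be the time-step increment and the sixth the smallest occupied time bin.

```python
import math

# Step 257 copied verbatim from the two logs above.
before = "257   2.450943e-06   1.000000e+00    0.00000   9.536743e-09   36   36           89            0            0                15.354      0"
after = "257   2.441704e-06   1.000000e+00    0.00000   2.980232e-10   31   31            2            0            0                10.8250"


def parse(row):
    """Return (increment, bin) from one log row.

    Assumption: field 4 is the time-step increment and field 5 is the
    smallest occupied time bin (zero-based indices after splitting).
    """
    fields = row.split()
    return float(fields[4]), int(fields[5])


dt_old, bin_old = parse(before)
dt_new, bin_new = parse(after)

ratio = dt_old / dt_new      # how much smaller the increment became
levels = math.log2(ratio)    # the same drop expressed in factors of two

print(f"increment ratio: {ratio:.1f}")
print(f"log2(ratio):     {levels:.1f}")
print(f"bin change:      {bin_old - bin_new}")
```

If that reading of the columns is right, the increment shrank by a factor of 32, i.e. five power-of-two levels, which matches the 36 to 31 change in the column that follows it.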

This is using the command:

mpirun -np 8 ../swift_mpi -a -t 16 -s eagle_50.yml

on COSMA6 with the swift/c5/intel/intelmpi/2017-parallel modules.

I tried to reproduce this on smaller volumes, but with no luck, and I haven't had time to look at master itself, only my merge.

The last commit known to behave reasonably was 1358ac9a (on my branch), which is just after the merge 79d55a6b on master.

Flagging now as I don't want to sit on this for two weeks. May try some more tests if I find time.
