Multi-time step MPI problems.
Tests of the following batch script:
    #!/bin/bash -l
    #
    # Batch script for bash users
    #
    #BSUB -L /bin/bash
    #BSUB -n 4
    #BSUB -J SWIFT-mpi-test
    #BSUB -oo job.dump
    #BSUB -eo job.err
    #BSUB -q bench1
    #BSUB -P durham
    #BSUB -R span[ptile=1]
    #BSUB -x
    #BSUB -W 00:30

    NTHREADS=12

    module purge
    module load swift
    module load swift/c4/intel/intelmpi/5.1.2

    mpirun -np 4 ../swift_mpi -t $NTHREADS -f SodShock/sodShock.hdf5 -m 0.01 -w 5000 -c 1. -d 1e-7 -e 0.01
give rise to a deadlock unless the affinity code is completely disabled (many threads appear to be sharing the same cores). Once the affinity code is removed, the job runs, but negative time steps are occasionally seen, for instance:
    42  0.005371   0.000122  1024127  490.381
    43  0.004639  -0.000732  1024128  134.555
    44  0.004883   0.000244  1024128  121.384
    45  0.005127   0.000244  1024128  112.810
    46  0.005371   0.000244  1024128  117.759
and the job eventually fails when a regrid of the top-level cells occurs. This never happened
for the SodShock test with a fixed dt.
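Negative steps such as the one at step 43 above can be picked out of a log automatically. The following is a minimal sketch, not part of SWIFT. It assumes that the first three whitespace-separated columns are the step number, the simulation time and the time-step size. That reading is inferred from the values only (0.005371 - 0.000732 = 0.004639), and the function name find_negative_steps is hypothetical.

```python
def find_negative_steps(lines):
    """Return (step, time, dt) for every row whose time-step column is negative.

    Assumed column layout (not confirmed by the source):
    step number, simulation time, time-step size, then further columns.
    """
    bad = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines and anything too short to be a step row.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        try:
            step, time, dt = int(fields[0]), float(fields[1]), float(fields[2])
        except ValueError:
            # Not a numeric step row (e.g. a free-text message); ignore it.
            continue
        if dt < 0.0:
            bad.append((step, time, dt))
    return bad


# The five rows quoted in this note.
SAMPLE = """\
42 0.005371 0.000122 1024127 490.381
43 0.004639 -0.000732 1024128 134.555
44 0.004883 0.000244 1024128 121.384
45 0.005127 0.000244 1024128 112.810
46 0.005371 0.000244 1024128 117.759
"""

for step, time, dt in find_negative_steps(SAMPLE.splitlines()):
    print(f"negative time step at step {step}: time={time}, dt={dt}")
```

Running the same function over the step output of a full run would show whether the negative steps cluster around particular events, assuming that output uses the same column layout.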
For reference the SHA at this time was: 33caebf8