
OpenMPI deadlocks on MPI exchanges using full physics

I've been running an OpenMPI build of the full physics test from #584, which deadlocks after a variable (not reproducible) period. Using the task dumper from !859 (merged), I can now get a dump of the currently unskipped tasks. Some basic checks on the one dump I've captured so far show 10 fewer active sends than recvs, and on closer inspection the unmatched ones are all spart exchanges.
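The send/recv check above amounts to tallying active sends and recvs per exchange subtype and reporting any subtype where the two totals differ. A minimal sketch follows, assuming (hypothetically) that the task dump has already been reduced to `(task_type, subtype)` pairs such as `("send", "spart")`, aggregated over all ranks. The real dump format from !859 is not described here, so parsing it is left out, and the function name `unmatched_exchanges` is illustrative only.

```python
from collections import Counter


def unmatched_exchanges(rows):
    """Return {subtype: recvs - sends} for every subtype whose counts differ.

    `rows` is an iterable of (task_type, subtype) pairs. This input shape is an
    assumption, not the actual task-dump format; rows from every rank must be
    combined so that a send on one rank can balance a recv on another.
    """
    rows = list(rows)
    sends = Counter(sub for typ, sub in rows if typ == "send")
    recvs = Counter(sub for typ, sub in rows if typ == "recv")
    diff = {}
    for sub in set(sends) | set(recvs):
        # A positive value means recvs are waiting on sends that never appear.
        d = recvs[sub] - sends[sub]
        if d != 0:
            diff[sub] = d
    return diff


if __name__ == "__main__":
    example = [("send", "xv"), ("recv", "xv"), ("recv", "spart"), ("recv", "spart"), ("send", "spart")]
    print(unmatched_exchanges(example))
```

A result such as `{"spart": 10}` would match the situation described above: ten spart recvs posted with no corresponding sends active.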

The only odd thing about all this is that the Intel MPI runs never show this problem. I'm trying to capture another instance to see what that reports.

COSMA modules in use:

intel_comp/2018 openmpi/3.0.1 fftw/3.3.7 parallel_hdf5/1.10.3 parmetis/4.0.3 gsl/2.4
Edited by Peter W. Draper