Dumper thread only dumps 1 MPI rank
When running with multiple ranks and the dumper thread enabled, the dumper thread regularly dumps only one rank. Looking at the implementation, this makes sense: the dumper thread checks for the existence of .dump. If the file exists, the thread dumps and then deletes it. If the ranks do not check for .dump simultaneously, chances are that it has already been deleted by the time the other ranks check for it.
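The race can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: it assumes only the check, dump, delete sequence described above, and the names `poll_shared_trigger` and `simulate` are hypothetical. The ranks poll one after another, which stands in for unsynchronized polling across processes.

```python
import os
import tempfile


def poll_shared_trigger(directory, rank):
    """Mimic one rank's dumper thread: check for .dump, 'dump', then delete it."""
    trigger = os.path.join(directory, ".dump")
    if not os.path.exists(trigger):
        return False  # the file is gone (or was never there), so no dump
    # A real dumper thread would write this rank's state here.
    try:
        os.remove(trigger)
    except FileNotFoundError:
        pass  # another rank removed it between our check and our delete
    return True


def simulate(n_ranks):
    """Return the ranks that dumped when polling happens one rank at a time."""
    with tempfile.TemporaryDirectory() as directory:
        # The user requests a dump by creating the shared trigger file.
        open(os.path.join(directory, ".dump"), "w").close()
        return [r for r in range(n_ranks) if poll_shared_trigger(directory, r)]


if __name__ == "__main__":
    print(simulate(4))  # prints [0]: only the first rank to poll sees the file
```

In a real run the polling order is not fixed, so which rank wins is arbitrary, but the outcome is the same: whichever rank deletes the file first hides it from the ranks that poll later.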
This is a bit annoying if you want to check for deadlocks caused by MPI communications, since chances are only 2 ranks will be responsible. If neither of those ranks notices .dump, then the dump is not very useful.
Would it be a good idea to add a rank-specific .dump (e.g. .dump.0, .dump.1, ...)? That would address this issue and give the user more control over what is dumped, in case you have already identified the problematic ranks in another way.
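To make the proposal concrete, here is a rough sketch of how a per-rank trigger could behave. It is an illustration under assumptions, not a patch: the file naming `.dump.<rank>` follows the suggestion above, and the helper names `poll_rank_trigger` and `simulate_rank_triggers` are hypothetical.

```python
import os
import tempfile


def poll_rank_trigger(directory, rank):
    """Check only this rank's own trigger file, e.g. .dump.3 for rank 3."""
    trigger = os.path.join(directory, f".dump.{rank}")
    if not os.path.exists(trigger):
        return False
    # A real dumper thread would write this rank's state here.
    os.remove(trigger)  # no other rank touches this file, so no race
    return True


def simulate_rank_triggers(n_ranks, requested):
    """Return the ranks that dumped after the user created the requested files."""
    with tempfile.TemporaryDirectory() as directory:
        for r in requested:
            open(os.path.join(directory, f".dump.{r}"), "w").close()
        return [r for r in range(n_ranks) if poll_rank_trigger(directory, r)]


if __name__ == "__main__":
    print(simulate_rank_triggers(4, [1, 3]))  # prints [1, 3]
```

Because each file is read and deleted by exactly one rank, the polling order no longer matters. Under this scheme, creating .dump.1 and .dump.3 would request dumps from ranks 1 and 3 only.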