Error running EAGLE_75 on testing/benchmark cluster
When running an EAGLE_75 box on 6 nodes of the testing cluster, I get the following MPI error:
[0005] [01022.0] scheduler.c:scheduler_enqueue():1302: Failed to emit irecv for particle data.
Invalid tag, error stack:
MPI_Irecv(170): MPI_Irecv(buf=0x7f7d4a0cd380, count=5201, dtype=USER<contig>, src=2, tag=4342092, MPI_COMM_WORLD, request=0x7f8adf83e020) failed
MPI_Irecv(109): Invalid tag, value is 4342092
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 5
This happens after 257 steps.
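One thing that may be relevant: the MPI standard only guarantees that tags from 0 up to 32767 are valid, and the tag in the error above (4342092) is far beyond that guaranteed minimum. The sketch below only illustrates that range check. `tag_is_valid` and `report_tag` are hypothetical helpers, not SWIFT or MPI functions, and `tag_ub` is a stand-in for whatever the MPI library reports through the `MPI_TAG_UB` attribute (queried with `MPI_Comm_get_attr`). The actual limit for Intel MPI on this cluster does not appear in the log, so the value used here is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* The MPI standard only guarantees that MPI_TAG_UB is at least 32767. */
#define GUARANTEED_TAG_UB 32767

/* Hypothetical helper (not part of SWIFT or MPI): returns 1 if `tag` lies in
 * the valid range [0, tag_ub], 0 otherwise. `tag_ub` stands in for the value
 * an implementation reports through the MPI_TAG_UB attribute. */
int tag_is_valid(int tag, int tag_ub) {
  /* Every conforming MPI implementation satisfies this bound. */
  assert(tag_ub >= GUARANTEED_TAG_UB);
  return tag >= 0 && tag <= tag_ub;
}

/* Hypothetical helper: prints whether a tag fits within the given bound. */
void report_tag(int tag, int tag_ub) {
  printf("tag %d is %s for an upper bound of %d\n", tag,
         tag_is_valid(tag, tag_ub) ? "valid" : "invalid", tag_ub);
}
```

Calling `report_tag(4342092, GUARANTEED_TAG_UB)` reports the tag as invalid, which would match the "Invalid tag" message if the library's real upper bound on this fabric turns out to be below 4342092.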
Setup
[0000] [00000.0] main: MPI is up and running with 6 node(s).
Welcome to the cosmological hydrodynamical code
    ______       _________________
   / ___/ |     / /  _/ ___/_  __/
   \__ \| | /| / // // /_   / /
  ___/ /| |/ |/ // // __/  / /
 /____/ |__/|__/___/_/    /_/
SPH With Inter-dependent Fine-grained Tasking
Version : 0.6.0
Revision: v0.6.0-414-g7a595a27, Branch: master, Date: 2017-09-19 17:08:21 +0100
Webpage : www.swiftsim.com
Config. options: 'CC=icc --with-metis --with-hdf5=/usr/local/hdf5/bin/h5cc CFLAGS=-xCORE-AVX512'
Compiler: ICC, Version: 17.0.20160721
CFLAGS : '-xCORE-AVX512 -O3 -ansi_alias -w2 -Wunused-variable -Wshadow -Werror'
HDF5 library version: 1.8.19
MPI library: Intel(R) MPI Library 2017 for Linux* OS (MPI std v3.1)
METIS library version: 5.1.0
Submission Script
#!/bin/bash
#SBATCH -N 6
#SBATCH -o out_file.o%j
#SBATCH -e err_file.e%j
#SBATCH --exclusive
#SBATCH --tasks-per-node=1
#SBATCH -J SWIFT-Benchmark
#SBATCH -t 240
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/skl1user1/Eagle/software/lib
mpirun -bootstrap srun ../swift_mpi -s -a -t 20 -n 4096 eagle_75.yml
Any ideas? I have never seen this bug before. How could the MPI_Irecv fail to be emitted?