"repartition has failed" error in SWIFT with MPI
I am running a zoom simulation with 1365^3 particles using Swift_mpi on 12 nodes of cosma7. The run crashes soon after it starts.
I have tried the following GRAVITY parameters:
mesh_side_length : 3072, 2048, 1024 (tried all three values)
max_top_level_cells: 32
cell_split_size: 200
This is the error that appears in the output file:
[0000] [03528.4] repart_edge_metis: weight mapper took 0.831 ms.
[0000] [03550.6] repart_edge_metis: Node 0 is not present after repartition
[0000] [03550.6] repart_edge_metis: Node 2 is not present after repartition
[0000] [03550.6] repart_edge_metis: WARNING: repartition has failed, continuing with the current partition, load balance will not be optimal
[0000] [03550.6] partition_repartition: took 22417.633 ms.
Job IDs: 6379480 6383290 6385510
Outfiles can be found here:
/cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/SwiftSib25Mpcx8.6385510.swift.out
/cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/SwiftSib25Mpcx8.6383290.swift.out
/cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/SwiftSib25Mpcx8.6379480.swift.out
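For comparing the three runs, a small shell helper along these lines could count how often the repartition messages quoted above appear in each output file. This is only a sketch: `count_repart_messages` is a hypothetical name, not part of SWIFT, and it assumes the message text is exactly as shown in the log excerpt.

```shell
#!/bin/bash
# Hypothetical helper (not part of SWIFT): for each output file given as an
# argument, count the lines reporting a missing node and the lines reporting
# a failed repartition.
count_repart_messages() {
    local f missing failed
    for f in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when there are none, so "|| true" keeps "set -e" scripts alive.
        missing=$(grep -c "is not present after repartition" "$f" || true)
        failed=$(grep -c "repartition has failed" "$f" || true)
        echo "$f: missing_nodes=$missing failed=$failed"
    done
}

# Example call with one of the files listed above:
#   count_repart_messages /cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/SwiftSib25Mpcx8.6385510.swift.out
```

Each output line gives the file name followed by the two counts, which makes it easy to see whether the warning occurs once or at every repartition attempt.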
Submit script is:
#!/bin/bash -l
#SBATCH --ntasks=12
#SBATCH -J SwiftSib25Mpcx8
#SBATCH -o /cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/%x.%J.swift.out
#SBATCH -e /cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/out_files/%x.%J.swift.err
#SBATCH -p cosma7
#SBATCH -A dp004
#SBATCH --exclusive
#SBATCH --cpus-per-task=28
#SBATCH --time=72:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=isabel.santos@durham.ac.uk
module purge
module load intel_comp/2021.1.0 compiler
module load intel_mpi/2018
module load ucx/1.8.1
module load parallel_hdf5/1.10.6
module load fftw/3.3.9cosma7
module load gsl/2.5
module load parmetis/4.0.3-64bit
mpirun -np $SLURM_NTASKS /cosma7/data/dp004/dc-sant3/swift_Sib25Mpc/swiftc7_mpi -v 1 --pin --cosmology --self-gravity --threads=$SLURM_CPUS_PER_TASK --fof Sib25Mpcx8_params.yml
echo "Job done, info follows..."
sacct -j $SLURM_JOBID --format=JobID,JobName,Partition,AveRSS,MaxRSS,AveVMSize,MaxVMSize,Elapsed,ExitCode
Edited by Isabel Santos