EAGLE-50 fails with 'PARMETIS ERROR: sum weight for constraint 0 is zero'

Yesterday I submitted a grid of EAGLE 50Mpc boxes running with full physics, using MPI with ParMETIS. They reached their 24-hour run-time limit and were resubmitted automatically. One of them failed on the first time step after the restart. The last few lines of stdout look like this:

[0013] [00022.4] engine_compute_next_fof_time: Next FoF time set to a=1.463327e-01.
[0000] [00022.4] main: engine_config took 1582.840 ms.
#   Step           Time Scale-factor     Redshift      Time-step Time-bins      Updates    g-Updates    s-Updates    b-Updates  Wall-clock time [ms]  Props
   63166   9.898754e-04    0.1460576    5.8466137   6.856289e-09   36   47    424678394    850518004       574189         6413                 0.000      0
PARMETIS ERROR: sum weight for constraint 0 is zero.

The wall-clock time for the time step is reported as zero. I imagine this might be why ParMETIS is complaining about zero weights.
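To illustrate the suspected mechanism, here is a minimal Python sketch. It is not SWIFT's or ParMETIS's code. It assumes, as guessed above, that the vertex weights handed to the partitioner come from measured timings, so that a step with no recorded time yields all-zero weights. The function names `sum_constraint_weights` and `guard_zero_weights` are hypothetical, and falling back to uniform weights is only one possible workaround, not necessarily the right fix.

```python
def sum_constraint_weights(vertex_weights, ncon=1):
    """Return the per-constraint sums of a flat weight array.

    The array is assumed to be vertex-major, i.e. the ncon weights of
    vertex i sit at positions i*ncon .. i*ncon + ncon - 1.
    """
    sums = [0] * ncon
    for i, w in enumerate(vertex_weights):
        sums[i % ncon] += w
    return sums


def guard_zero_weights(vertex_weights, ncon=1):
    """Return a copy in which any constraint whose weights sum to zero
    is replaced by uniform weights of 1 (hypothetical fallback)."""
    sums = sum_constraint_weights(vertex_weights, ncon)
    fixed = list(vertex_weights)
    for c, total in enumerate(sums):
        if total == 0:
            # Every vertex gets the same weight for this constraint.
            for i in range(c, len(fixed), ncon):
                fixed[i] = 1
    return fixed
```

With a single constraint and no recorded timings, every weight is zero and the sum for constraint 0 is zero, which matches the wording of the error message. A guard of this kind would have to run before the repartitioning call.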

I'm running commit 77dc3b54 from master and starting SWIFT with this:

ccc_mprun ${codedir}/swiftsim/examples/swift_mpi --verbose=0 \
    --param=Restarts:resubmit_on_exit:1 \
    --param=Restarts:resubmit_command:${codedir}/irene/resub.sh \
    --param=Restarts:max_run_time:23.0 \
    --param=Snapshots:output_list:${codedir}/output_times.txt \
    --param=InitialConditions:file_name:/ccc/store/cont005/ra4707/hellyjoh/EAGLE_ICs/SwiftICs/EAGLE_L0050N0752_ICs.hdf5 \
    --param=EAGLECooling:dir_name:${codedir}/Data/coolingtables/ \
    --param=EAGLEFeedback:filename:${codedir}/Data/yieldtables/ \
    --pin --cosmology ${eagle_flags} \
    --threads=24 eagle_50.yml

Here's the stdout file: swift.3857086.1.out. Stderr just has messages saying the ParMETIS function call failed, e.g.:

[0000] [00049.9] partition.c:pick_parmetis():1036: Call to ParMETIS_V3_AdaptiveRepart failed.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
In: PMI_Abort(-1, application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3)

The input parameters were eagle_50.yml.