
Issue with EAGLE_100 with gcc7 and openMPI 2.x

Running EAGLE_100 on our new machine with the gcc toolset results in a crash (see the end of the output below).

Running branch skylake_server_support @ c67f1691

Config:
# Define the system of units to use internally.
InternalUnitSystem:
  UnitMass_in_cgs:     1.989e43      # 10^10 M_sun in grams
  UnitLength_in_cgs:   3.085678e24   # Mpc in centimeters
  UnitVelocity_in_cgs: 1e5           # km/s in centimeters per second
  UnitCurrent_in_cgs:  1             # Amperes
  UnitTemp_in_cgs:     1             # Kelvin

# Parameters governing the time integration
TimeIntegration:
  time_begin: 0.    # The starting time of the simulation (in internal units).
  time_end:   1e-2  # The end time of the simulation (in internal units).
  dt_min:     1e-10 # The minimal time-step size of the simulation (in internal units).
  dt_max:     1e-4  # The maximal time-step size of the simulation (in internal units).

# Parameters governing the snapshots
Snapshots:
  basename:            eagle # Common part of the name of output files
  time_first:          0.    # Time of the first output (in internal units)
  delta_time:          1e-3  # Time difference between consecutive outputs (in internal units)

# Parameters governing the conserved quantities statistics
Statistics:
  delta_time:          1e-2 # Time between statistics output

# Parameters for the self-gravity scheme
Gravity:
  eta:                   0.025    # Constant dimensionless multiplier for time integration.
  epsilon:               0.0001   # Softening length (in internal units).
  theta:                 0.7      # Opening angle (Multipole acceptance criterion)

# Parameters for the hydrodynamics scheme
SPH:
  resolution_eta:        1.2348   # Target smoothing length in units of the mean inter-particle separation (1.2348 == 48Ngbs with the cubic spline kernel).
  CFL_condition:         0.1      # Courant-Friedrichs-Lewy condition for time integration.

# Parameters related to the initial conditions
InitialConditions:
  file_name:  /lustre/scafellpike/local/HCH028/mjm02/axc67-mjm02/swiftsim_skl/examples/EAGLE_100/EAGLE_ICs_100.hdf5     # The file to read
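
As a sanity check on the SPH settings above, the comment on resolution_eta (1.2348 == 48 neighbours) can be reproduced with a short Python sketch. It assumes the usual relation N = 4/3 * pi * (gamma * eta)^3 and a kernel support ratio gamma = 1.825742 for the 3D cubic spline; both the formula and the constant are assumptions made here for illustration, not values taken from this report. The function and variable names are hypothetical.

```python
import math

# Assumed ratio of kernel support radius to smoothing length for the
# 3D cubic spline (M4) kernel. This constant is an assumption of the sketch.
KERNEL_GAMMA_CUBIC_SPLINE_3D = 1.825742


def neighbour_count(eta, kernel_gamma=KERNEL_GAMMA_CUBIC_SPLINE_3D):
    """Expected number of neighbours inside the kernel support in 3D."""
    return 4.0 / 3.0 * math.pi * (kernel_gamma * eta) ** 3


# resolution_eta from the config in this issue.
n_ngb = neighbour_count(1.2348)

# A relative tolerance of 1e-4 in h maps to roughly 3 * N * 1e-4 neighbours,
# because N scales as h^3 (again an assumption about how the tolerance is reported).
delta_ngb = 3.0 * n_ngb * 1e-4

print(f"neighbours: {n_ngb:.3f}, tolerance in neighbours: {delta_ngb:.4f}")
```

Under these assumptions the result is close to the 48 neighbours quoted in the config comment, so the hydro parameters themselves look consistent.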

End of output:

[0000] [02960.2] engine_policy: engine policies are [  steal  keep  mpi  numa_affinity  hydro  ]
[0000] [02960.2] eos_print: Equation of state: Ideal gas.
[0000] [02960.2] eos_print: Adiabatic index gamma: 1.666667.
[0000] [02960.2] hydro_props_print: Hydrodynamic scheme: Gadget-2 version of SPH (Springel 2005) in 3D.
[0000] [02960.2] hydro_props_print: Hydrodynamic kernel: Cubic spline (M4) with eta=1.234800 (48.00 neighbours).
[0000] [02960.2] hydro_props_print: Hydrodynamic relative tolerance in h: 0.00010 (+/- 0.0144 neighbours).
[0000] [02960.2] hydro_props_print: Hydrodynamic integration: CFL parameter: 0.1000.
[0000] [02960.2] hydro_props_print: Hydrodynamic integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
[0000] [02960.2] engine_init: Absolute minimal timestep size: 6.938894e-20
[0000] [02960.2] engine_init: Minimal timestep size (on time-line): 7.450580e-11
[0000] [02960.2] engine_init: Maximal timestep size (on time-line): 7.812500e-05
[0000] [02960.2] main: engine_init took 6.364 ms.
[0000] [02960.2] main: Running on 3234281470 gas particles, 0 star particles and 0 DM particles (0 gravity particles)
[0000] [02960.2] main: from t=0.000e+00 until t=1.000e-02 with 64 threads and 64 queues (dt_min=1.000e-10, dt_max=1.000e-04)...
[0000] [03069.0] engine_init_particles: Computing initial gas densities.
[0020] [03231.6] scheduler.c:scheduler_ranktasks():1018: Unsatisfiable task dependencies detected.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 20 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
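
The time-step sizes that engine_init reports "on time-line" differ from the dt_min and dt_max given in the config. A minimal Python sketch follows, assuming the time-line splits the interval from time_begin to time_end into power-of-two fractions and rounds each requested step down to the nearest such fraction. That rule is inferred from the numbers in the log, not confirmed from the code, and the function name is hypothetical.

```python
import math


def snap_to_timeline(dt, t_begin, t_end):
    """Largest (t_end - t_begin) / 2**k, with integer k >= 0, not exceeding dt."""
    span = t_end - t_begin
    # Smallest k such that span / 2**k <= dt; never go below k = 0.
    k = max(0, math.ceil(math.log2(span / dt)))
    return span / 2 ** k


# Values from the TimeIntegration section of the config in this issue.
t_begin, t_end = 0.0, 1e-2
dt_min_timeline = snap_to_timeline(1e-10, t_begin, t_end)
dt_max_timeline = snap_to_timeline(1e-4, t_begin, t_end)

print(f"dt_min on time-line: {dt_min_timeline:.5e}")
print(f"dt_max on time-line: {dt_max_timeline:.5e}")
```

With this rule the two values agree with the logged 7.450580e-11 and 7.812500e-05 to better than one part in a million, which suggests the time-integration setup is behaving as intended and the failure lies in the task dependencies reported by scheduler_ranktasks.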

Job script (it is a mess):

#BSUB -J TEST_script
#BSUB -o /lustre/scafellpike/local/HCH028/mjm02/axc67-mjm02/out.%J.txt
#BSUB -e /lustre/scafellpike/local/HCH028/mjm02/axc67-mjm02/err.%J.txt
#BSUB -n 32
#BSUB -R 'span[ptile=1]'
#BSUB -W 16:00
#BSUB -q scafellpikeSKL
#BSUB -x

module purge
module load use.scafellpike gcc7/7.2.0 openmpi-gcc7/2.1.1 hdf5-gcc7/1.10.1

cd /lustre/scafellpike/local/HCH028/mjm02/axc67-mjm02/swift-gcc/swiftsim/examples/EAGLE_100
mpirun -np 32 ../swift_mpi -s -t 64 eagle_100.yml 2>&1 | tee output.log

I will run again with -v 2 shortly and update this issue when I'm next in (Tuesday).

Edited by Aidan Chalk