New timestepping race condition / dependency issue
Hey y'all
I've been setting up new RT examples and noticed what looks like a new race condition or dependency issue in the current master.
It shows up intermittently during startup:
[00000.1] main: Reading initial conditions took 31.387 ms.
[00000.1] part_verify_links: All links OK
[00000.1] part_verify_links: took 1.584 ms.
[00000.1] main: Read 8000 gas particles, 0 sink particles, 0 star particles, 0 black hole particles, 0 DM particles, 0 DM background particles, and 0 neutrino DM particles from the ICs.
[00000.1] space_regrid: (re)griding space cdim=(9 9 9)
[00000.1] main: space_init took 4.131 ms.
[00000.1] potential_print_backend: External potential is 'No external potential'.
[00000.1] main: space dimensions are [ 1.000 1.000 1.000 ].
[00000.1] main: space is periodic.
[00000.1] main: highest-level cell dimensions are [ 9 9 9 ].
[00000.1] main: 8000 parts in 729 cells.
[00000.1] main: 8000 gparts in 729 cells.
[00000.1] main: 0 sinks in 729 cells.
[00000.1] main: 0 sparts in 729 cells.
[00000.1] main: 0 bparts in 729 cells.
[00000.1] main: maximum depth is 0.
[00000.1] engine_init: took 0.206 ms.
[00000.1] engine_config: Running simulation 'RT Cooling Test'.
[00000.1] engine_config: no processor affinity used
[00000.1] engine_policy: engine policies are [ 'steal' 'keep' 'numa affinity' 'hydro' 'external gravity' 'stars' 'feedback' 'rt' ]
[00000.1] eos_print: Equation of state: Ideal gas.
[00000.1] eos_print: Adiabatic index gamma: 1.666667.
[00000.1] pressure_floor_print: Pressure floor is 'none'
[00000.1] hydro_props_print: Hydrodynamic scheme: GIZMO MFV (Hopkins 2015) in 3D.
[00000.1] hydro_props_print: Hydrodynamic kernel: Cubic spline (M4) with eta=1.234800 (48.00 neighbours).
[00000.1] hydro_props_print: Hydrodynamic relative tolerance in h: 0.00010 (+/- 0.0144 neighbours).
[00000.1] hydro_props_print: Hydrodynamic integration: CFL parameter: 0.6000.
[00000.1] hydro_props_print: Hydrodynamic integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
[00000.1] hydro_props_print: Neighbour number definition: Unweighted.
[00000.1] hydro_props_print: Maximal time-bin difference between neighbours: 2
[00000.1] hydro_props_print: Minimal gas temperature set to 10.000000
[00000.1] hydro_props_print: No particle splitting
[00000.1] entropy_floor_print: Entropy floor is 'no entropy floor'.
[00000.1] stars_props_print: Stars kernel: Cubic spline (M4) with eta=1.234800 (48.00 neighbours).
[00000.1] stars_props_print: Stars relative tolerance in h: 0.00010 (+/- 0.0144 neighbours).
[00000.1] stars_props_print: Stars integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
[00000.1] stars_props_print: Maximal iterations in ghost task set to 30
[00000.1] engine_config: Absolute minimal timestep size: 6.938894e-19
[00000.1] engine_config: Minimal timestep size (on time-line): 5.960465e-09
[00000.1] engine_config: Maximal timestep size (on time-line): 6.103516e-06
[00000.1] engine_config: Restarts will be dumped every 5.000000 hours
[00000.1] engine_config: Using 9 threads in the thread-pool
[00000.1] engine_config: took 3.468 ms.
[00000.1] main: Running on 8000 gas particles, 0 sink particles, 0 stars particles 0 black hole particles, 0 neutrino particles, and 0 DM particles (8000 gravity particles)
[00000.1] main: from t=0.000e+00 until t=1.000e-01 with 1 ranks, 9 threads / rank and 9 task queues / rank (dt_min=1.000e-08, dt_max=1.000e-05)...
[00000.1] engine_init_particles: Setting particles to a valid state...
[00000.1] engine_init_particles: Computing initial gas densities and approximate gravity.
[00000.1] space_rebuild: (re)building space
[00000.3] engine_init_particles: Converting internal energy variable.
[00000.3] engine_init_particles: Running initial fake time-step.
[00000.3] space_rebuild: (re)building space
[00000.3] space_regrid: (re)griding space cdim=(8 8 8)
[00000.5] timestep.h:get_part_timestep():202: part (id=4512) wants a time-step (6.384528e-14) below dt_min (1.000000e-08)
./run.sh: line 23: 384333 Aborted (core dumped) ../../swift --hydro --threads=9 --verbose=0 --radiation --external-gravity --stars --feedback -n 1 ./rt_cooling_test.yml
At other times, the same run starts up and completes just fine.
So far I haven't seen a related problem later on, while the code is running.
I opened a new branch with the new RT example that triggers this issue. To reproduce, check out the branch timestep_race_condition_debugging and run the example examples/RadiativeTransferTests/CoolingTest. Compile swift with
--with-rt=GEAR_1 --with-rt-riemann-solver=GLF --with-hydro-dimension=3 --with-hydro=gizmo-mfv --with-riemann-solver=hllc --enable-debug --enable-debugging-checks --enable-optimization=no
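Since the crash only happens on some runs, it may help to run the example a number of times and count how often it aborts. Below is a minimal sketch of such a helper. The function name count_failures and the run_N.log file names are made up for illustration and are not part of swift. It also assumes that the command returns a non-zero exit status when swift aborts. If run.sh does not pass that status on, searching the saved logs for the "below dt_min" message would be an alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of swift): run a command N times and
# print how many of the runs exited with a non-zero status.
# Usage: count_failures N command [args...]
count_failures() {
  local n="$1"
  shift
  local failures=0
  local i
  for ((i = 1; i <= n; i++)); do
    # Keep the output of every run so that failing runs can be inspected later.
    if ! "$@" > "run_${i}.log" 2>&1; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures"
}

# Example, from within examples/RadiativeTransferTests/CoolingTest:
#   count_failures 20 ./run.sh
```

The printed number gives a rough failure rate, which is handy for checking whether a candidate fix actually makes the problem go away rather than just making it rarer.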
I'll keep track of the issue in here.