Restarts at runtime update
Use a single runtime across all ranks to avoid exact timing issues.
Put into collectgroup to avoid an extra synchronization point.
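A minimal sketch of the idea in the description, emulated serially so it runs without MPI. The struct, its fields, the function names, and the choice of a max reduction are all assumptions for illustration and are not taken from the actual collectgroup code. The point is that the runtime is combined in the same pass as the other reduced quantities, so every rank tests the same value and no extra synchronization is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rank record of quantities that are reduced together. */
struct collect_data {
  long long updates; /* combined by summing */
  float runtime;     /* combined by taking the maximum (an assumption) */
};

/* Serial stand-in for the collective reduction: combine all ranks' records
 * in one pass, so the runtime rides along with the existing reduction. */
struct collect_data collect_reduce(const struct collect_data *ranks,
                                   size_t nranks) {
  assert(nranks > 0);
  struct collect_data out = ranks[0];
  for (size_t i = 1; i < nranks; i++) {
    out.updates += ranks[i].updates;
    if (ranks[i].runtime > out.runtime) out.runtime = ranks[i].runtime;
  }
  return out;
}

/* Every rank evaluates this with the same agreed runtime, so they cannot
 * disagree about whether a restart dump is due. */
int should_write_restart(float agreed_runtime, float interval) {
  return agreed_runtime >= interval;
}
```

With local runtimes of 0.999, 1.001 and 1.0 and an interval of 1.0, the ranks would disagree if each tested its own clock. After the reduction they all see 1.001 and all write. In an MPI code the loop would presumably be replaced by a single collective such as MPI_Allreduce.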
Activity
@matthieu assigned to John for further testing, but you will need to merge it.
Edited by Peter W. Draper
I've used this branch to run the extra 50Mpc boxes which Josh identified as having better sizes. I don't think I can use it for the original 8 runs because it changes the format of the restart files and I don't want to restart from z=127.
Only one of those 8 runs had crashed so I've merged in the extra barrier after writing restarts (35eaa463) and restarted it without the fix from this merge request. Any of those 8 might fail again but I don't think they can corrupt their restart files at least.
The 50Mpc box that crashed has crashed again, so I do need to apply this fix to the older runs.
Can I make restarting work by moving runtime out of the engine struct? I was thinking of just having 'float engine_runtime' in engine.c, 'extern float engine_runtime' in engine.h and then changing e.runtime to engine_runtime elsewhere. Would that work as a temporary fix?
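A rough single-file sketch of the temporary fix proposed above, with comments marking which lines would go in engine.h and which in engine.c. The cut-down struct and the two helper functions are hypothetical. It also assumes the restart files store the engine struct as-is, which is why keeping runtime out of the struct would leave the older restart files readable.

```c
#include <assert.h>

/* --- would go in engine.h --- */
/* Declaration only: lets other source files refer to the variable. */
extern float engine_runtime;

/* Cut-down stand-in for the real engine struct (hypothetical). It has no
 * runtime field, so its layout stays as the older restart files expect. */
struct engine {
  int step;
};

/* --- would go in engine.c --- */
/* The single definition of the global. */
float engine_runtime = 0.f;

/* Hypothetical helper: store the runtime agreed across ranks. */
void engine_set_runtime(float hours) {
  assert(hours >= 0.f);
  engine_runtime = hours;
}

/* Code that previously read e.runtime reads the global instead. */
int engine_should_dump_restart(const struct engine *e, float limit_hours) {
  (void)e; /* the struct is no longer consulted for the runtime */
  return engine_runtime >= limit_hours;
}
```

One consequence of this layout is that the global is not saved with the struct, so it starts from zero after a restart unless it is set again.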
mentioned in commit 04c82352