Skip to content
Snippets Groups Projects

First version of the multiple time-stepping

Merged Matthieu Schaller requested to merge newTimeDefinition into master

Eventually...

That should open up debugging from a larger crowd.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Peter W. Draper Title changed from First version of the multiple time-stepping to WIP: First version of the multiple time-stepping

    Title changed from First version of the multiple time-stepping to WIP: First version of the multiple time-stepping

  • WIP as there are a number of issues to fix.

  • Peter W. Draper Added 5 commits:

    Added 5 commits:

    • 03c44d0f - Use autoconf affinity policy (again)
    • 0fda419f - MPI and affinity control are not compatible
    • b56f6179 - Use single swift command instead of the old _fixdt and _mindt
    • 41606240 - Don't attempt to write non-existent field dt
    • 15dd7891 - Recover missing mpi_type definitions, remove unused variables, remove repeated d…
  • I've built this and have gotten it to run and came up with the commits above.

    The MPI version was completely broken as it missed the new type definitions. It also ran very slowly on a single machine (haven't tried any multiple machine tests yet), which came down to a clash with the new affinity code. I've ifdef'd that out for MPI, but it is an area we should think about a bit more.

    My SodShock test runs for 2063 time steps then crashes.

    engine_step: t_end_min=0.003807 t_end_max=0.004215
    engine_step: Step: 2063 e->time=0.003807
    space_regrid: h_max is 1.017e-02 (cell_max=0.000e+00).
    space_regrid: set cell dimensions to [ 44 5 5 ].
    Segmentation fault (core dumped)

    This is down to a particle having -nan valued position, which happens at lines:

    cell_getid(cdim, p->x[0] * ih[0], p->x[1] * ih[1], p->x[2] * ih[2]);
    cells[ind[k]].count++;

    in engine.c. The index ind[k] is nan, since:

    (gdb) print p->x[0]
    $1 = -nan(0x8000000000000)
    (gdb) print p->x[1]
    $2 = -nan(0x8000000000000)
    (gdb) print p->x[2]
    $3 = -nan(0x8000000000000)

    One possible cause happens a little earlier:

    engine_step: t_end_min=0.003783 t_end_max=0.004185
    engine_step: Step: 1947 e->time=0.003783
    runner_doghost: Smoothing length failed to converge on 1 particles.

    but I haven't attempted to check that.

    More generally there is a line:

     message("No active particles, skipping....");

    that is rather verbose, commented that out in my version. Also we have a lot of commented out sections. Ideally I'd like to not see redundant code being committed to master.

    One final point. Can we keep the fixdt version working? Or are we confident that technique will never be useful again?

  • Ok,I'll reduce the verbosity and implement a fixdt mode. I will also update the running scripts in consequence.

  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

  • Added 1 commit:

    • ba4e4e6c - Removed nested function in engine_init_particles()
  • Thanks for testing it BTW. I needed someone to look at the code and break it.

  • Added 1 commit:

    • 4cc942a3 - Create executables with a fixdt policy
  • Added 1 commit:

    • 601c8317 - Print policy when starting the engine.
  • Added 1 commit:

  • Added 1 commit:

    • 2722f6fc - Reintroduce task-marking to prevent enqueing unnecessary tasks.
  • Matthieu Schaller Added 56 commits:

    Added 56 commits:

  • Matthieu Schaller Added 4 commits:

    Added 4 commits:

    • 31390109 - MPI version of engine policy output
    • 91393c9e - Read final time, min and max dt from the command line
    • c58eb6e4 - Propagate time-stepping policy all the way to the kick task
    • 24a05c3d - With fixdt policy, place the timestep on the timeline.
  • Matthieu Schaller Added 4 commits:

    Added 4 commits:

  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • 0dc8f2c0 - Clean up main()
    • 71bacd96 - Updated task plotting script for the new tasks
  • Matthieu Schaller Added 3 commits:

    Added 3 commits:

    • 28c96a8f - Corrected time-step evolution and better use of the fixdt policy in the kick task
    • eba84165 - Clean-up drift task and make it more const-correct
    • 9f114949 - Cleaned-up init tasks
  • Added 1 commit:

  • Added 1 commit:

    • af271cdd - Updated default run scripts with the new policy and parameters
  • Alright, so the fixdt policy is now back in the picture. We have two sets of executables. One for the fixdt policy (MPI and non-MPI) and one for the "normal" multiple timestep policy (MPI and non-MPI).

    The running scripts have been updated.

  • Verbosity has also been reduced. I'll work on a better looking output to std though.

  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • 8166b729 - Better test for engine dt policy in the kick task
    • 6c047dbf - The engine policies are now an enumerated type
  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • cc0c4a4e - Cleaned up engine. MPI only functions are now hidden in non-MPI mode.
    • ef4609fc - More cleaning-up. Only useful stuff remains.
  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • 7006e143 - cell_getid() now moved to cell.h and uniquely defined there.
    • 1a012d68 - Removed debugging code from the interaction functions
  • Added 1 commit:

  • Added 1 commit:

    • 05dee395 - Print library information in the HDF5 as well
  • Added 1 commit:

    • 321516d4 - Extracted the physics from the tasks into a new set of files. The correct file i…
  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • 8a98575a - More const-correctness in the kick task
    • 7084afb1 - The drift task now computes the maximal displacement and maximal h in each cell.
  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

  • Added 1 commit:

    • 839cbab3 - Moved the interaction functions to the hydro directory
  • Added 1 commit:

    • ee114579 - Same treatment for the gravity core routines
  • Matthieu Schaller Added 4 commits:

    Added 4 commits:

    • 003d8fc2 - Doxygen also has to look in the new directories for documentation.
    • d5d33686 - Fix documentation
    • fc847208 - The particle definition is now split in multiple files. The correct one gets imp…
    • cf0b83ef - Removed multi-dt policy. It is now the default. fixdt has to be explicitly switched on.
  • Added 1 commit:

    • c900b14e - Cleaned-up unnecessary constants
  • Added 1 commit:

    • 17cf7788 - Added timers to the init tasks
  • Added 1 commit:

  • Added 1 commit:

  • Started to look at this again, and the SodShock test ran without any issues (single node, 6 threads for 8190 time steps), good.

    However, the SedovBlast test doesn't get very far

    # Step  Time  time-step  CPU Wall-clock time [ms]
    engine_step: t_end_min=0.000000 t_end_max=0.000000
    engine_step: Step: 0 e->time=0.000000
    engine_step: t_end_min=0.000031 t_end_max=0.007812
    engine_step: Step: 1 e->time=0.000031
    engine_step: t_end_min=0.000061 t_end_max=0.007812
    engine_step: Step: 2 e->time=0.000061
    engine_step: t_end_min=0.000061 t_end_max=0.007812
    engine_step: Step: 3 e->time=0.000061
    engine_step: t_end_min=0.000061 t_end_max=0.007812
    engine_step: Step: 4 e->time=0.000061
    space_regrid: h_max is 5.613e-02 (cell_max=0.000e+00).
    Segmentation fault (core dumped)

    This happens at line 343 of space.c:

    343         cells[ind[k]].count++;

    the value of ind[k] is 524236, but we only have 15625 cells. Checking the particle data we see:

    (gdb) print *p
    $3 = {x = {169.39329781083813, 6.2777476187669521, 3.8292171049782753}, v = {4.32336538e+09, 
        218439120, 30988146}, a = {4.83560232e+16, 2.4432002e+15, 3.46596546e+14}, 
      h = 0.0556494445, t_begin = 6.10947609e-05, t_end = 6.11543655e-05, u = 0.78253454, 
      rho = 0.994605839, rho_dh = 0.969468594, {density = {div_v = 0.996183693, 
          wcount_dh = 0.515203595, curl_v = {2697.84912, -0.797386408, 272116416}, 
          wcount = 0.932364762}, force = {balsara = 0.996183693, POrho2 = 0.515203595, 
          u_dt = 2697.84912, h_dt = -0.797386408, v_sig = 272116416, c = 0.932364762}}, 
      mass = 0.00011920929, id = 549252, gpart = 0x0}

    so the particle is outside of the space and travelling very fast...

    BTW, the parameters are:

    ../swift -t 1 -f sedov.hdf5 -m 0.02 -w 5000 -c 1. -d 1e-7 -e 0.01

    from the new runs.sh.

    Edited by Peter W. Draper
  • Added 1 commit:

    • 9817bc32 - Properly collect the number of particles updated
  • Added 1 commit:

  • Thanks for testing this. It looks like the time-step choice is not great in the (pretty extreme) case of the Sedov blast.

    I will also implement a check to wrap particles back within the box correctly and make sure their values of h are sensible.

  • I've also ran the CosmoVolume for a good while without any problems. Here are some animations of velocity for the SodShock and CosmoVolume. To my eye some of the artefacts of the previous SodShock look to have been fixed, and it's good to see any action without gravity in the CosmoVolume (still not a lot...):

    SodShock Velocity

    CosmoVolume Velocity

  • Thanks for testing this.

    The end is near. :) I have been digging through the code to figure out why some time-steps go rogue. Looks like a derivative somewhere was missing a magic factor.

  • Matthieu Schaller Added 11 commits:

    Added 11 commits:

    • 159c9532 - Now initialise the particles using the hydro functions.
    • 990a8e98 - The scheduler now takes a submask argument as well to only run through part of the task graph
    • 2d4053cc - Less verbosity in the scheduler
    • 626f555a - Correct usage of entropy flag in ICs for Gadget-2 compatibility
    • a3596a2e - Rewrote the Gadget-2 hydro implementation. Everything OK until the end of the density loop.
    • 92c299ad - Rewrote the Gadget-2 hydro implementation. Everything OK until the end of the density loop.
    • 7f50b0b1 - Velocity curl and divergence now correct as well.
    • 6bfdc639 - Drift is now Gadget-2 identical
    • 83787596 - Gadget-2 force loop now correct
    • 7ef2eea8 - Stupid lack of initialisation here....
    • 921a4e6e - Accelerations and time-steps now match Gadget-2
  • Matthieu Schaller Added 6 commits:

    Added 6 commits:

  • @pdraper, could you give it another go ?

    I think the code behaves much better now. Still need some cleaning-up but the physics should be correct.

  • Doesn't build for me:

    serial_io.c: In function 'read_ic_serial':
    serial_io.c:189:38: error: 'struct part' has no member named 'u'
                        (char*)(&(part[0]).field), importance)
                                          ^
    serial_io.c:313:7: note: in expansion of macro 'readArray'
           readArray(h_grp, "InternalEnergy", FLOAT, *N, 1, *parts, N_total, offset,
           ^
    serial_io.c: In function 'write_output_serial':
    serial_io.c:501:39: error: 'struct part' has no member named 'u'
                         (char*)(&(part[0]).field))
                                           ^
    serial_io.c:672:7: note: in expansion of macro 'writeArray'
           writeArray(h_grp, "InternalEnergy", FLOAT, N, 1, N_total, offset, parts,
    

    Can you try from clean sources.

  • Added 1 commit:

    • 97a04907 - Correct i/o also in MPI mode(s)
  • Matthieu Schaller Added 3 commits:

    Added 3 commits:

    • 6683c882 - Updated the tools
    • a82e26d2 - Removed useless entry in Gadget-2 particle definition
    • b40b689c - No particle initialisation in space_init(). It has to happen properly later
  • Alright. I think I am happy with it. If you are as well, I think it is good to go.

    I have opened a bunch of issues for me to remember to do some minor improvements to the current version.

  • Thanks, the SedovBlast is running now, so far 12000 steps without any issues. The SodShock also ran.

    How do I test the dt version? Just pick start and end times and put dt_min and dt_max to the same values?

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading