
Gadget2 part update

Merged Matthieu Schaller requested to merge gadget2-part-update into master

A compilation of changes from @jwillis and myself, refactoring the particle definition in the Gadget version a bit. This has knock-on consequences for the other schemes.

Also, an initial vectorization of the Gadget-2 interaction routines will be pushed in, together with the associated tests. These will be refined and actually used in a separate branch.

Merge request reports


Activity

  • James Willis mentioned in merge request !204 (merged)

  • Implementation of #187 (closed).

  • There seems to be an issue running the SedovBlast test via run.sh:

    #   Step           Time      Time-step    Updates  g-Updates  Wall-clock time [ms]
           0   0.000000e+00   0.000000e+00          0          0              8803.949
           1   3.051758e-05   3.051758e-05    1048576          0               879.906
           2   6.103516e-05   3.051758e-05         14          0                23.688
           3   9.155273e-05   3.051758e-05        189          0                24.373
    [00011.3] runner_do_ghost: Smoothing length failed to converge on 1 particles.
    [00011.3] runner_do_ghost: Smoothing length failed to converge on 3 particles.
           4   1.220703e-04   3.051758e-05          4          0                24.450

    At which point it core dumps. This is with GCC 4.8 on my desktop.

  • There is a similar-looking issue with the CosmoVolume:

    #   Step           Time      Time-step    Updates  g-Updates  Wall-clock time [ms]
           0   0.000000e+00   0.000000e+00          0          0             21155.365
           1   5.960464e-08   5.960464e-08    1841127          0              6900.954
           2   1.192093e-07   5.960464e-08        564          0               119.457
           3   2.384186e-07   1.192093e-07       1334          0               130.578
           4   3.576279e-07   1.192093e-07       3490          0               139.845
           5   4.768372e-07   1.192093e-07        444          0               118.617
           6   5.960464e-07   1.192093e-07       5578          0               167.864
           7   7.152557e-07   1.192093e-07        443          0               116.737
           8   8.344650e-07   1.192093e-07       1171          0               126.240
           9   9.536743e-07   1.192093e-07        475          0               119.247
          10   1.072884e-06   1.192093e-07       7849          0               212.132
          11   1.132488e-06   5.960464e-08        500          0               149.703
          12   1.192093e-06   5.960464e-08          2          0               112.284
          13   1.251698e-06   5.960464e-08       1199          0               133.224
          14   1.311302e-06   5.960464e-08         16          0               103.389
          15   1.370907e-06   5.960464e-08        512          0               116.993
          16   1.430511e-06   5.960464e-08         21          0               100.890
          17   1.490116e-06   5.960464e-08       2716          0               129.947
          18   1.549721e-06   5.960464e-08         33          0               101.545
          19   1.609325e-06   5.960464e-08        521          0               122.414
          20   1.668930e-06   5.960464e-08         43          0               108.338
          21   1.728535e-06   5.960464e-08       1245          0               152.489
          22   1.788139e-06   5.960464e-08         47          0               109.386
    [00034.4] runner_do_ghost: Smoothing length failed to converge on 44 particles.
    [00034.4] runner_do_ghost: Smoothing length failed to converge on 48 particles.
          23   1.847744e-06   5.960464e-08        522          0               152.064
    ./run.sh: line 10: 24341 Segmentation fault      (core dumped) ../swift -s -t 16 cosmoVolume.yml
    
    
  • Alright....

  • Matthieu Schaller Added 13 commits:

    • 8e1ea32d...b51f933f - 11 commits from branch master
    • d2c4261f - Merge branch 'master' into gadget2-part-update
    • 09f6effd - Don't move the entropy_dt variable into the force sub-structure
  • Sorry about that. I had not tested the code in a non-fixdt case. I have updated my test script to include these cases as well.

    It is now fixed.

  • Thanks. Looks like there is an issue with MPI. During repartitioning we come to a halt waiting for broadcasts that never complete. Strangely, I only see this problem with an optimized build; when I prepare to debug it, it runs without a problem. This is reproducible.

    Just to be clear my build options are:

    ./configure --with-metis --disable-vec --enable-debug

    using swift/c4/intel/intelmpi/5.1.2 and running a job on the bench1 queue using the CosmoVolume:

    
    #!/bin/bash -l
    #
    # Batch script for bash users
    #
    #BSUB -L /bin/bash
    #BSUB -n 4
    #BSUB -J SWIFT-mpi-test
    #BSUB -oo job%J.dump
    #BSUB -eo job%J.err
    #BSUB -q bench1
    #BSUB -P durham
    #BSUB -R span[ptile=1]
    #BSUB -x
    #BSUB -W 00:30
    
    NTHREADS=12
    
    module purge
    module load swift
    module load swift/c4/intel/intelmpi/5.1.2
    
    mpirun -np 4 ../swift_mpi -t $NTHREADS -s cosmoVolume.yml
    

    output from the job:

    #   Step           Time      Time-step    Updates  g-Updates  Wall-clock time [ms]
           0   0.000000e+00   0.000000e+00          0          0              1801.967
           1   5.960464e-08   5.960464e-08    1841127          0               716.156
           2   1.192093e-07   5.960464e-08        564          0                19.449

    which is the last output we see.

    Checking with ddt we see that rank 0 is waiting to reduce h_max in space_regrid and the other three ranks are in repart_edge_metis attempting to receive a list of cells from rank 0. I suspect the problem is that the space currently claims to have no cells (s->nr_cells == 0), so there isn't any space to receive into...

  • That is odd given that none of this should have changed.

  • Isn't a space with no cells a problem of the domain decomposition?

  • The number of cells isn't changed; this is just an exchange of a list of cell re-assignments, so that seems unlikely. Just reporting what I see...

  • Sure. :) Thanks for that.

    Just puzzled by my own induced bugs...

  • This "feature" is also present in master. I am in the process of tracing the changes back to the point where it was introduced.

  • The slight consolation is that the EAGLE_12 case runs very well with the exact same code and submission script.

  • I may have found the problem. It seems that when we read the flag_entropy:

    readAttribute(h_grp, "Flag_Entropy_ICs", INT, flag_entropy);

    we have room for a single int, whereas the stored value is a vector with a value for each particle type. I worked around that and this test now runs. Does this value really need to be a vector?

  • Peter W. Draper Added 1 commit:

    • 1023f11a - Remove documented but not present argument vel
  • Haven't found any other issues, so will accept this.

  • Peter W. Draper Status changed to merged

  • Peter W. Draper mentioned in commit dbb272c5

  • mentioned in issue #192 (closed)

  • Peter W. Draper mentioned in commit 794a9482
