Gadget2 part update
A compilation of changes from @jwillis and myself that refactors the particle definition in the Gadget version a bit. This has knock-on consequences for the other schemes.
Also, an initial vectorization of the Gadget-2 interaction routines will be pushed along with the associated tests. These will be refined and actually used in a separate branch.
Activity
mentioned in merge request !204 (merged)
Implementation of #187 (closed).
There seems to be an issue running the SedovBlast testrun.sh:

    # Step  Time          Time-step     Updates  g-Updates  Wall-clock time [ms]
    0       0.000000e+00  0.000000e+00  0        0          8803.949
    1       3.051758e-05  3.051758e-05  1048576  0          879.906
    2       6.103516e-05  3.051758e-05  14       0          23.688
    3       9.155273e-05  3.051758e-05  189      0          24.373
    [00011.3] runner_do_ghost: Smoothing length failed to converge on 1 particles.
    [00011.3] runner_do_ghost: Smoothing length failed to converge on 3 particles.
    4       1.220703e-04  3.051758e-05  4        0          24.450

At which point it core dumps. Using GCC 4.8 on my desktop.
There is a similar looking issue with the CosmoVolume:

    # Step  Time          Time-step     Updates  g-Updates  Wall-clock time [ms]
    0       0.000000e+00  0.000000e+00  0        0          21155.365
    1       5.960464e-08  5.960464e-08  1841127  0          6900.954
    2       1.192093e-07  5.960464e-08  564      0          119.457
    3       2.384186e-07  1.192093e-07  1334     0          130.578
    4       3.576279e-07  1.192093e-07  3490     0          139.845
    5       4.768372e-07  1.192093e-07  444      0          118.617
    6       5.960464e-07  1.192093e-07  5578     0          167.864
    7       7.152557e-07  1.192093e-07  443      0          116.737
    8       8.344650e-07  1.192093e-07  1171     0          126.240
    9       9.536743e-07  1.192093e-07  475      0          119.247
    10      1.072884e-06  1.192093e-07  7849     0          212.132
    11      1.132488e-06  5.960464e-08  500      0          149.703
    12      1.192093e-06  5.960464e-08  2        0          112.284
    13      1.251698e-06  5.960464e-08  1199     0          133.224
    14      1.311302e-06  5.960464e-08  16       0          103.389
    15      1.370907e-06  5.960464e-08  512      0          116.993
    16      1.430511e-06  5.960464e-08  21       0          100.890
    17      1.490116e-06  5.960464e-08  2716     0          129.947
    18      1.549721e-06  5.960464e-08  33       0          101.545
    19      1.609325e-06  5.960464e-08  521      0          122.414
    20      1.668930e-06  5.960464e-08  43       0          108.338
    21      1.728535e-06  5.960464e-08  1245     0          152.489
    22      1.788139e-06  5.960464e-08  47       0          109.386
    [00034.4] runner_do_ghost: Smoothing length failed to converge on 44 particles.
    [00034.4] runner_do_ghost: Smoothing length failed to converge on 48 particles.
    23      1.847744e-06  5.960464e-08  522      0          152.064
    ./run.sh: line 10: 24341 Segmentation fault (core dumped) ../swift -s -t 16 cosmoVolume.yml

Added 13 commits:
- 8e1ea32d...b51f933f - 11 commits from branch master
- d2c4261f - Merge branch 'master' into gadget2-part-update
- 09f6effd - Don't move the entropy_dt variable into the force sub-structure
Thanks. Looks like there is an issue with MPI. During repartitioning we come to a halt waiting for broadcasts that never complete. Strangely, I only see this problem when running an optimized build; when I prepare to debug it, it runs without a problem. This is reproducible.
Just to be clear my build options are:
./configure --with-metis --disable-vec --enable-debug
using swift/c4/intel/intelmpi/5.1.2 and running a job on the bench1 queue using the CosmoVolume:

    #!/bin/bash -l
    #
    # Batch script for bash users
    #
    #BSUB -L /bin/bash
    #BSUB -n 4
    #BSUB -J SWIFT-mpi-test
    #BSUB -oo job%J.dump
    #BSUB -eo job%J.err
    #BSUB -q bench1
    #BSUB -P durham
    #BSUB -R span[ptile=1]
    #BSUB -x
    #BSUB -W 00:30

    NTHREADS=12
    module purge
    module load swift
    module load swift/c4/intel/intelmpi/5.1.2
    mpirun -np 4 ../swift_mpi -t $NTHREADS -s cosmoVolume.yml

output from the job:

    # Step  Time          Time-step     Updates  g-Updates  Wall-clock time [ms]
    0       0.000000e+00  0.000000e+00  0        0          1801.967
    1       5.960464e-08  5.960464e-08  1841127  0          716.156
    2       1.192093e-07  5.960464e-08  564      0          19.449

which is the last output we see.
Checking with ddt we see that rank 0 is waiting to reduce h_max in space_regrid and the other three ranks are in repart_edge_metis attempting to receive a list of cells from rank 0. I suspect the problem is that the space currently claims to have no cells (s->nr_cells == 0), so there isn't any space to receive into...

I may have found the problem. It seems that when we read the flag_entropy:

    readAttribute(h_grp, "Flag_Entropy_ICs", INT, flag_entropy);

we have room for a single int, whereas the stored value is a vector with one value for each particle type. The read therefore writes past the end of the destination. I worked around that and this test now runs. Does this value really need to be a vector?
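
For illustration, here is a minimal sketch of the kind of guard that would catch this mismatch between the stored length and the destination size. It is not the actual readAttribute code: read_int_attribute_checked, read_flag_entropy and N_PART_TYPES are hypothetical names, and the value of 6 particle types is an assumption based on the usual Gadget header layout. The stored attribute is modelled as a plain array so the example does not depend on HDF5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: one flag per particle type, 6 types as in Gadget headers. */
#define N_PART_TYPES 6

/* Hypothetical stand-in for an attribute reader. Copies stored_count ints
 * into dst only if dst has room for all of them.
 * Returns 0 on success, -1 if the destination is too small. */
int read_int_attribute_checked(int *dst, size_t dst_count, const int *stored,
                               size_t stored_count) {
  assert(dst != NULL && stored != NULL);
  if (stored_count > dst_count) return -1; /* would overflow dst */
  memcpy(dst, stored, stored_count * sizeof(int));
  return 0;
}

/* Reads the whole per-type vector into a correctly sized buffer and returns
 * the flag for one particle type, or -1 if the read is rejected. */
int read_flag_entropy(const int *stored, size_t stored_count, int type) {
  int flags[N_PART_TYPES] = {0};
  if (type < 0 || type >= N_PART_TYPES) return -1;
  if (read_int_attribute_checked(flags, N_PART_TYPES, stored, stored_count) != 0)
    return -1;
  return flags[type];
}
```

With this shape, passing the address of a single int as the destination for a six-element attribute is rejected instead of silently overwriting neighbouring memory, which is the sort of corruption that can show up only in optimized builds.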
Added 1 commit:
- 1023f11a - Remove documented but not present argument vel
mentioned in commit dbb272c5
mentioned in issue #192 (closed)
mentioned in commit 794a9482