# SWIFTsim issues

Issue feed from https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues (feed updated 2018-05-04).

## [#295 Generic cache construction for all flavours of SPH](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/295)
*Matthieu Schaller, updated 2018-05-04. Milestone: Vectorization of all the core SPH tasks. Assignee: James Willis.*

At the moment only Gadget-2 SPH can be vectorized as the caches are not generic and targeted only at this flavour. We need to make this more generic.

## [#294 Vectorization of the sort task](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/294)
*Matthieu Schaller, updated 2017-12-12. Milestone: Vectorization of all the core SPH tasks. Assignee: James Willis.*

Some loops in the sort task can be vectorized.

## [#293 Vectorization of the drift tasks](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/293)
*Matthieu Schaller, updated 2017-12-12. Milestone: Vectorization of all the core SPH tasks. Assignee: James Willis.*

The loops in the drift task can be vectorized if a little help is given to the compiler.

## [#214 swift: thread sanitizer output](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/214)
*Massimiliano Culpo, updated 2018-06-01. Milestone: Code is C11 (or gnu11) compliant. Assignee: Pedro Gonnet.*

From:
```
$ git branch -vv
...
* master c541791 [origin/master] Merge branch 'gizmo_volume_io' into 'master'
$ module list
Currently Loaded Modules:
1) gcc-6.2.0-gcc-4.8-fw44bda 2) openmpi-2.0.0-gcc-6.2.0-rmv3caz 3) hdf5-1.10.0-patch1-gcc-6.2.0-dbjkmep 4) metis-5.1.0-gcc-6.2.0-kbowe7l
```
Configure line :
```
$ ../sources/configure CC=mpicc CPPFLAGS="-I${METIS_ROOT}/include" --prefix=$PWD/../install --disable-optimization --enable-debug=yes --enable-parallel-hdf5 --enable-mpi --with-metis=${METIS_ROOT} --enable-compiler-warnings CFLAGS="-fsanitize=thread" LDFLAGS="-fsanitize=thread"
```
When I run the example `UniformBox_3D` :
```
$ ../../install/bin/swift -s -C -t 1 uniformBox.yml
```
I get a lot of warnings about data races, like:
```
==================
WARNING: ThreadSanitizer: data race (pid=29715)
Atomic write of size 4 at 0x7d800000f320 by thread T1:
#0 __tsan_atomic32_compare_exchange_weak /home/mculpo/PycharmProjects/spack/var/spack/stage/gcc-6.2.0-fw44bdamcou7rerucarajomex4zjg4s6/gcc-6.2.0/libsanitizer/tsan/tsan_interface_atomic.cc:809 (libtsan.so.0+0x00000005fc63)
#1 queue_insert ../../sources/src/queue.c:105 (swift+0x000000491329)
#2 scheduler_enqueue ../../sources/src/scheduler.c:1183 (swift+0x0000004818c1)
#3 scheduler_enqueue_mapper ../../sources/src/scheduler.c:1018 (swift+0x000000480fcd)
#4 threadpool_runner ../../sources/src/threadpool.c:68 (swift+0x0000004905bc)
Previous read of size 4 at 0x7d800000f320 by thread T2:
#0 queue_get_incoming ../../sources/src/queue.c:56 (swift+0x000000490de2)
#1 queue_gettask ../../sources/src/queue.c:180 (swift+0x0000004916d6)
#2 scheduler_gettask ../../sources/src/scheduler.c:1299 (swift+0x000000481db9)
#3 runner_main ../../sources/src/runner.c:1222 (swift+0x00000046bd41)
Location is heap block of size 4096 at 0x7d800000f000 allocated by main thread:
#0 malloc /home/mculpo/PycharmProjects/spack/var/spack/stage/gcc-6.2.0-fw44bdamcou7rerucarajomex4zjg4s6/gcc-6.2.0/libsanitizer/tsan/tsan_interceptors.cc:538 (libtsan.so.0+0x0000000268bc)
#1 queue_init ../../sources/src/queue.c:148 (swift+0x00000049149d)
#2 scheduler_init ../../sources/src/scheduler.c:1375 (swift+0x0000004822ca)
#3 engine_init ../../sources/src/engine.c:3495 (swift+0x00000047984d)
#4 main ../../sources/examples/main.c:458 (swift+0x000000404bd9)
Thread T1 (tid=29717, running) created by main thread at:
#0 pthread_create /home/mculpo/PycharmProjects/spack/var/spack/stage/gcc-6.2.0-fw44bdamcou7rerucarajomex4zjg4s6/gcc-6.2.0/libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x000000027d6d)
#1 threadpool_init ../../sources/src/threadpool.c:109 (swift+0x0000004908e1)
#2 engine_init ../../sources/src/engine.c:3480 (swift+0x0000004795f3)
#3 main ../../sources/examples/main.c:458 (swift+0x000000404bd9)
Thread T2 (tid=29718, running) created by main thread at:
#0 pthread_create /home/mculpo/PycharmProjects/spack/var/spack/stage/gcc-6.2.0-fw44bdamcou7rerucarajomex4zjg4s6/gcc-6.2.0/libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x000000027d6d)
#1 engine_init ../../sources/src/engine.c:3506 (swift+0x000000479a46)
#2 main ../../sources/examples/main.c:458 (swift+0x000000404bd9)
SUMMARY: ThreadSanitizer: data race ../../sources/src/queue.c:105 in queue_insert
==================
```
For the most part they seem to be read-on-write problems (likely due to the fact that the reads are not atomic).

## [#129 Replace 'volatile' with thread synchronisation](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/129)
*Angus Lepper, updated 2020-01-07. Milestone: Code is C11 (or gnu11) compliant. Assignee: Aidan Chalk.*

'volatile' is neither necessary nor sufficient for thread synchronisation, e.g. in the scheduler. For one, read/write order is preserved only amongst other volatile reads/writes (cf. http://goo.gl/oBwjgZ - the signal is set before the count is updated). For another, there is no guarantee of memory ordering, although this may be less visible on x86 than on other platforms. This is distinct from other 'benign' causes of inter-run variation.

All 'volatile' qualifiers should be removed, and all cross-thread access should be suitably protected (some of it already is).
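As a concrete illustration of the direction both this issue and #214 point in, here is a minimal sketch (not SWIFT's actual queue code; the struct and function names are made up) of replacing a plain or `volatile` shared counter with a C11 atomic, so that cross-thread reads and writes are well defined:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a shared queue counter that several runner
 * threads read while another thread updates it. */
struct counter_sketch {
  atomic_int first_incoming; /* was: volatile int first_incoming; */
};

/* Writer: publish a new value. The release ordering makes everything the
 * writer did before this store visible to readers that observe the new value. */
static void counter_publish(struct counter_sketch *c, int value) {
  atomic_store_explicit(&c->first_incoming, value, memory_order_release);
}

/* Reader: an acquire load pairs with the release store above, so this read
 * is no longer a data race and no longer relies on 'volatile'. */
static int counter_read(struct counter_sketch *c) {
  return atomic_load_explicit(&c->first_incoming, memory_order_acquire);
}
```

ThreadSanitizer understands C11 atomics, so accesses rewritten along these lines stop being reported as data races; anything still reported would then be a genuine ordering bug rather than missing atomicity.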
## [#58 Variation between single-threaded runs](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/58)
*Angus Lepper, updated 2017-12-12. Milestone: Code is C11 (or gnu11) compliant. Assignee: Matthieu Schaller.*

I'd expect results of successive runs, at least on a single thread, to be identical. However, I see some differences. These vary from run to run, so perhaps something's being used uninitialised. I'm using the comparison script from the position_cache branch, which just does a brute force comparison of - by default - coordinates and velocities. After a build, from the examples/ directory:
git checkout origin/position_cache validate
./test_fixdt -r 2 -t 1 -f SodShock/sodShock.hdf5 -m 0.01 -w 5000 -d 1e-4
mv output_002.hdf5 A.hdf5
./test_fixdt -r 2 -t 1 -f SodShock/sodShock.hdf5 -m 0.01 -w 5000 -d 1e-4
mv output_002.hdf5 B.hdf5
./validate A.hdf5 B.hdf5
# says e.g. 993745 changes, up to 1% (I've seen up to 4%.)
I can't reproduce this at all with -r 1.

## [#436 Network Configuration Effects](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/436)
*Josh Borrow, updated 2018-06-15. Milestone: Improved MPI scaling.*

Morning all,
I have been running some tests on COSMA-7 to see if the extra latency introduced when doing a triple-hop (node - switch - top-level switch - switch - node) vs. a single-hop (node - switch - node) affects SWIFT. We would hope that it doesn't, otherwise we might be wise to include the network topology in our domain decomposition.
### Set-up
Here I present a worst-case scenario. SWIFT, using `tbbmalloc` and Gadget-2 hydro (so we have as little work as possible), running with the grid domain decomposition. This should give us, pretty much, minimal work with maximum communication.
In terms of particle set-up, I used the EAGLE-50r2 (i.e. EAGLE-50 copied 8 times), with two boxes on each node, for a total of 4 nodes.
SWIFT was invoked with
```
mpirun -np 8 ../swift_mpi -c -a -s -n 8192 -t 14 eagle_50.yml
```
as each node on C7 has 2x14 core chips.
### Results
Summary: We don't care about latency! Asynchronous communication works.
The interesting thing here is the following histogram. It's the classic cumulative wall-clock time used against number of particles updated in each step.
![wallclock_histogram](/uploads/bacdd73f010ea9b965af2de2a294c73f/wallclock_histogram.png)
Normally we say that we get "killed" in the big steps. However, I think an apt metaphor here is that we get "absolutely murdered" in the big steps - the last jump up occurs where every particle in the system is updated and synchronised.
In terms of the tasks that each of the runs actually produces, I think this comparison movie is helpful:
![out](/uploads/74eb0b32b410c65729b09974c2fb44ba/out.mp4)
At the moment I do not understand why, with a single hop, we typically get more updates for our money. I'm confident it's not an artefact of the plot, because the little blob of steps updating a single particle always stays in the same place. Unfortunately I didn't keep the output from the simulation, so I can't check if they got "the same answer". Based on these histograms, I would guess not.
The following histogram shows the breakdown of the runtimes for each of the runs. Notice how similar they look.
![runtime_histogram](/uploads/2433f604c992caf65823969e8835c8ce/runtime_histogram.png)
In summary: It would probably be a waste of time considering the network topology in the redistribute/domain decomposition, based on these results.
We can discuss this at the telecon.

## [#366 MPI task timings](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/366)
*Peter W. Draper, updated 2019-11-11. Milestone: Improved MPI scaling. Assignee: Peter W. Draper.*

We spend a lot of time in some steps not processing data as we are apparently waiting for MPI tasks to complete. What is going on?
## [#371 Expanding GPU task coverage](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/371)
*Aidan Chalk, updated 2017-10-31. Milestone: GPU Swift part 2.*

We discussed doing the drift, sort, kick etc. on the GPU (also aiming to reduce the data transfer potentially long term).
I think almost anyone can implement these GPU routines and (hoping the design of the host code is correct..) it should be easy to extend the GPU to use additional task types too. This probably goes hand-in-hand with #368.

## [#370 GPU + MPI](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/370)
*Aidan Chalk, updated 2020-08-29. Milestone: GPU Swift part 2. Assignee: Aidan Chalk.*

We need to work out a strategy for doing the GPU + MPI within the current infrastructure.
The current idea is probably:
1) Use current MPI tasks on host.
2) Have a cell variable which lets the device know whether a non-local cell has been received yet - the GPU checks this when looking at load tasks and places the task at the end of the queue if the cell has not been received yet (see the sketch after this list).
3) Have an additional wait value on the MPI send tasks on the host mid-density. This is decremented on the device once the data for that task has been unloaded from the GPU after density. If the wait reaches 0 we have to find a way to put these into the host's queue (not worked out yet). There is also an issue that if the number of CPU threads is 1 it will not progress at present.
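A very rough sketch of point 2) above, purely to make the idea concrete; none of these fields or helpers exist in SWIFT, and the real cell structure is of course different:

```c
#include <stdatomic.h>

/* Hypothetical minimal cell, for illustration only. */
struct cell_sketch {
  int is_foreign;          /* cell belongs to another MPI rank */
  atomic_int mpi_received; /* set to 1 by the host once the recv completed */
};

/* Host side: called after the MPI receive for this cell's particles finishes. */
static void cell_mark_received(struct cell_sketch *c) {
  atomic_store_explicit(&c->mpi_received, 1, memory_order_release);
}

/* Queue side: can the GPU load task for this cell run now, or should it be
 * pushed back to the end of the queue? */
static int cell_load_task_ready(struct cell_sketch *c) {
  if (!c->is_foreign) return 1; /* local data is always available */
  return atomic_load_explicit(&c->mpi_received, memory_order_acquire);
}
```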
This feels a bit messy, a potential alternative (still messy) is:
1) Ignore current MPI tasks on host for density (or gravity if we change what is run on GPU).
2) Have a list of "ready to send cells" populated by the GPU and sent by the thread that launched the GPU kernel, and a list of "received cells" populated by the CPU thread and checked by the GPU as above. This thread does all of the communication required for the GPU.
I think the best (easiest) way to do multi-GPU + CPU stuff will be:
1 rank per GPU + 1 rank for any unused CPUs on the node - using multi GPUs in a single rank is probably overly complex with our model.
Any other ideas for this would be appreciated - using GPU Direct to do the data transfer is also possible, rather than manually copying back to the CPU, and may be better when using 1 rank per GPU.

## [#369 Reduction in data transfer tasks for the GPU](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/369)
*Aidan Chalk, updated 2017-12-21. Milestone: GPU Swift part 2. Assignee: Aidan Chalk.*

Currently we move all the particle data to/from the GPU every step even if it's unused. I started discussing this in !355 but I need to finish it at some point; it should help significantly for small steps, I hope.

## [#368 Profiling/minikernel implementation of self/pair + looking into improvements](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/368)
*Aidan Chalk, updated 2018-01-05. Milestone: GPU Swift part 2.*

From where @lhausammann and @jborrow started at the hackathon, I think this is one of the main things to look into w.r.t. GPU performance currently.
Current things to experiment with (probably in the order we should do them) :
- [ ] Shared memory usage ( @lhausammann started looking into this and thinks its beneficial )
- [ ] Sorted/unsorted pair interactions ( @lhausammann also started looking into this)
- [ ] Profiling of minikernels and megakernels ( @jborrow started looking at this)
- [ ] Subcelling of large self or pair interactions (similar to the CPU). This is a major bottleneck for the SodShock at present, I believe: we interact cells containing thousands of particles with a naive n^2 approach, which results in a significant performance loss vs. the CPU.
My preferred route to do this would be to have someone else experiment with the first 2 from the above list and write short (few pages?) reports detailing what the findings were in terms of what works well and doesn't work well. Once we have these I can port the improvements back into the megakernel and see if they're beneficial while the last 2 points are started on the minikernels.
@matthieu does this sound reasonable?

## [#886 Allow users to choose the compression filters for the particle lightcone at runtime](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/886)
*Matthieu Schaller, updated 2024-03-27. Assignee: Rob McGibbon.*
We currently use `BFloa16` and that is not accurate enough.
Also, we may want to actually apply the same filters as in the snapshots rather than hardcode them.
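For reference, exposing the filter choice at runtime only needs a string from the parameter file plus the standard HDF5 property-list calls. The sketch below is generic HDF5 only: the filter labels and the helper name are invented here, and SWIFT's actual lossy filters are richer than this.

```c
#include <hdf5.h>
#include <string.h>

/* Build a dataset-creation property list according to a filter name read from
 * the parameter file at runtime instead of a hardcoded choice. */
static hid_t lightcone_dataset_props(const char *filter_name, int rank,
                                     const hsize_t *chunk_dims) {
  hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(plist, rank, chunk_dims); /* filters require chunked storage */

  if (strcmp(filter_name, "DScale3") == 0) {
    /* Lossy: keep ~3 decimal digits via the scale-offset filter. */
    H5Pset_scaleoffset(plist, H5Z_SO_FLOAT_DSCALE, 3);
  } else if (strcmp(filter_name, "GZip") == 0) {
    /* Lossless deflate, as a safe fallback. */
    H5Pset_deflate(plist, 4);
  }
  /* "None": leave the property list untouched. */
  return plist;
}
```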
## [#884 Debug check fails when checking whether to write snapshot after final simulation step when output list is used](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/884)
*Mladen Ivkovic, updated 2024-03-12.*

Porting issue here from slack, with instructions for reproducible example.
Configure with:
```
./configure --with-hydro-dimension=1 --enable-debug --enable-debugging-checks
```
Reproducible example:
```
swiftsim/examples/RadiativeTransferTests/CosmoAdvection_1D
```
(yes, it's an RT example, but the debug check fail also occurs without RT.)
run with
```
../../../swift --hydro --cosmo rt_advection1D_medium_redshift.yml
```
Run crashes after final step:
```
[00001.8] engine_print_stats: Saving statistics at a=1.405296e-01
[00001.8] engine_dump_snapshot: Dumping snapshot at a=1.407294e-01
61 9.276920e-01 0.1398616 6.1499254 9.757966e-03 50 50 1000 0 0 0 0 8.487 24 0.101
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
```
`gdb` backtrace:
```
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
Thread 1 "swift" received signal SIGABRT, Aborted.
0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff4c42476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff4c287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
#5 0x0000000000435f0a in engine_step (e=e@entry=0x7ffffff9a638) at engine.c:2465
#6 0x0000000000408283 in main (argc=<optimised out>, argv=<optimised out>) at swift.c:1720
(gdb) frame 3
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
1294 if (ti_end < ti_start) error("ti_end must be >= ti_start");
(gdb) p ti_end
$1 = <optimised out>
(gdb) p ti_start
$2 = <optimised out>
(gdb) frame 4
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
(gdb) l
1265 const int with_cosmology = (e->policy & engine_policy_cosmology);
1266
1267 /* Time until the next snapshot */
1268 double time_to_next_snap;
1269 if (e->policy & engine_policy_cosmology) {
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
1271 e->ti_next_snapshot);
1272 } else {
1273 time_to_next_snap = (e->ti_next_snapshot - e->ti_current) * e->time_base;
1274 }
(gdb) p e->ti_current
$3 = 139611588448485376
(gdb) p e->ti_next_snapshot
$4 = -1
```
Stan's analysis:
> What I believe is happening is that when using a snapshot file, and you have reached the last snapshot, the ti_next_snapshot is explicitly set to -1. However, when running with debugging checks, if ti_next_snapshot is negative, it throws an error.
I don't remember the exact files/lines where it happens, but the -1 is deliberately set in one of the files that checks for snapshot times in a snapshot list.
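A minimal sketch of the kind of guard this analysis suggests (this is not the actual fix; it only reuses the engine fields visible in the gdb listing above and treats the `-1` sentinel explicitly):

```c
#include <float.h>

#include "cosmology.h"
#include "engine.h"

/* Hypothetical helper mirroring the snippet shown in engine_io_check_snapshot_triggers():
 * return the time to the next snapshot, or "infinity" if the output list is
 * exhausted and ti_next_snapshot has been set to the -1 sentinel. */
static double time_to_next_snapshot(const struct engine *e) {
  if (e->ti_next_snapshot < e->ti_current) return DBL_MAX; /* nothing left */

  if (e->policy & engine_policy_cosmology)
    return cosmology_get_delta_time(e->cosmology, e->ti_current,
                                    e->ti_next_snapshot);
  return (e->ti_next_snapshot - e->ti_current) * e->time_base;
}
```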
## [#883 testActivePair fails](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/883)
*Matthieu Schaller, updated 2024-03-08.*

Summary from jenkins:
```
Running ./testActivePair -n 6 -r 1 -d 0 -f active
[00000.0] main: Seed used for RNG: 1709851327
Tolerances read from file
Checking all particles in the file.
No differences found
Tolerances read from file
Checking all particles in the file.
Relative difference larger than tolerance (2.400000e-03) for particle 17089, column h_dt:
File 1: a = -1.701384e-05
File 2: b = -1.692253e-05
Difference: |a-b|/|a+b| = 2.690624e-03
Accuracy test failed
==================
./configure --disable-optimization
gcc 7.3.0
```
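The failing check compares the two runs field by field using the symmetric relative difference quoted in the log, `|a-b|/|a+b|`. A tiny standalone reproduction of that arithmetic for the quoted `h_dt` values, just to make the metric explicit (this is not the test's actual code):

```c
#include <math.h>
#include <stdio.h>

static double rel_diff(double a, double b) { return fabs(a - b) / fabs(a + b); }

int main(void) {
  const double a = -1.701384e-05; /* particle 17089, h_dt, run 1 */
  const double b = -1.692253e-05; /* particle 17089, h_dt, run 2 */
  printf("|a-b|/|a+b| = %e (tolerance 2.400000e-03)\n", rel_diff(a, b));
  return 0; /* prints ~2.69e-03, i.e. above the tolerance */
}
```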
## [#881 oneAPI 2023 & 2024 icx raises warnings with mpich/4.1.2](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/881)
*Mladen Ivkovic, updated 2024-03-08.*

Here are the warnings I'm seeing:
```
space.c:1698:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1698 | MPI_Exscan(&local_nr_parts, &offset_parts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1700:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1700 | MPI_Exscan(&local_nr_sinks, &offset_sinks, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1702:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1702 | MPI_Exscan(&local_nr_sparts, &offset_sparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1704:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1704 | MPI_Exscan(&local_nr_bparts, &offset_bparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1706:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1706 | MPI_Exscan(&local_nr_dm, &offset_dm, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1708:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1708 | MPI_Exscan(&local_nr_dm_background, &offset_dm_background, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~
1709 | MPI_LONG_LONG_INT, MPI_SUM, MPI_COMM_WORLD);
| ~~~~~~~~~~~~~~~~~
space.c:1710:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1710 | MPI_Exscan(&local_nr_nuparts, &offset_nuparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
```
```
space_unique_id.c:60:17: warning: argument type 'const int *' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
60 | MPI_Allgather(&require_new_batch, 1, MPI_INT, all_requires, 1, MPI_INT,
| ^~~~~~~~~~~~~~~~~~ ~~~~~~~
```
```
engine.c:325:18: warning: argument type 'double (*)[4]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
325 | MPI_Gather(&timemem, 4, MPI_DOUBLE, timemems, 4, MPI_DOUBLE, 0,
| ^~~~~~~~ ~~~~~~~~~~
engine.c:1284:31: warning: argument type 'float (*)[5]' doesn't match specified 'MPI' type tag that requires 'float *' [-Wtype-safety]
1284 | MPI_Allreduce(MPI_IN_PLACE, &e->s->max_mpole_power,
| ^~~~~~~~~~~~~~~~~~~~~~
1285 | SELF_GRAVITY_MULTIPOLE_ORDER + 1, MPI_FLOAT, MPI_MAX,
| ~~~~~~~~~
```
```
task.c:1578:67: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1578 | int res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : sum), sum, size,
| ^~~
1579 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1582:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1582 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tsum), tsum, size,
| ^~~~
1583 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1586:65: warning: argument type 'int (*)[33]' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
1586 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : count), count, size,
| ^~~~~
1587 | MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~
task.c:1590:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1590 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : min), min, size,
| ^~~
1591 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1594:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1594 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmin), tmin, size,
| ^~~~
1595 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1598:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1598 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : max), max, size,
| ^~~
1599 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1602:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1602 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmax), tmax, size,
| ^~~~
1603 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
```
```
distributed_io.c:952:17: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
952 | MPI_Allreduce(N, N_total, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:957:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
957 | MPI_Gather(N, swift_type_count, MPI_LONG_LONG_INT, N_counts, swift_type_count,
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:1503:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1503 | MPI_Exscan(N, global_offsets, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM,
| ^ ~~~~~~~~~~~~~~~~~
```
```
parallel_io.c:1606:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1606 | MPI_Exscan(N, offset, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
```
```
fof_catalogue_io.c:532:17: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
532 | MPI_Allreduce(&num_groups, &num_groups_total, 1, MPI_LONG_LONG, MPI_SUM,
| ^~~~~~~~~~~ ~~~~~~~~~~~~~
```
```
neutrino/Default/neutrino.c:218:14: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~ ~~~~~~~~~~
neutrino/Default/neutrino.c:218:21: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~~~~~~~ ~~~~~~~~~~
```
```
swift.c:1327:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1327:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
```
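One straightforward way to silence this class of warning is to keep the C type and the MPI datatype consistent, e.g. by staging the `size_t` counters through genuine `long long` variables. A sketch, not the actual patch; the variable name follows the `space.c` excerpt above:

```c
#include <mpi.h>
#include <stddef.h>

/* Exclusive prefix sum of a size_t count over all ranks, passing a real
 * 'long long' to MPI so it matches MPI_LONG_LONG_INT exactly. */
static size_t exscan_count(size_t local_nr_parts, MPI_Comm comm) {
  int rank;
  MPI_Comm_rank(comm, &rank);

  long long local = (long long)local_nr_parts;
  long long offset = 0;
  MPI_Exscan(&local, &offset, 1, MPI_LONG_LONG_INT, MPI_SUM, comm);

  if (rank == 0) offset = 0; /* MPI_Exscan leaves rank 0's result undefined */
  return (size_t)offset;
}
```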
## [#877 Investigate whether MPI_Continuation can be useful](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/877)
*Matthieu Schaller, updated 2023-11-30. Assignee: Matthieu Schaller.*

Maybe for the proxy exchange of cells that can't be thread parallelised?

## [#875 Wendland C6 missing neighbour contributions](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/875)
*Thomas Sandnes, updated 2023-11-30. Assignee: Matthieu Schaller.*

w < 0 and/or dw_dx > 0 for at least some (maybe all) pairs with ~0.85 < x < 1 in kernel_deval with Wendland C6 and eta = 1.866. This might affect other kernels as well.
A reproducible test case on master branch:
- run hydro test examples/HydroTests/SodShock_3D as normal
- ./configure --with-hydro=minimal --with-kernel=wendland-C6 --disable-hand-vec
- set eta = 1.866 in .yml file
- To catch error: on line 277 of kernel_hydro.h (x < 0.85 is chosen to demonstrate the range of x affected without a fix, rather than to be an indicator of whether the problem is solved):
  if (x < 0.85 && w <= 0)
    error("Test Error: w = %.20f for x = %.20f", w, x);
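A slightly more general version of the same check, sweeping the whole kernel support instead of instrumenting line 277, can make the affected range visible at a glance. This is only a sketch: it assumes it is compiled within the SWIFT source tree with the same configure options, and that `kernel_deval(u, &w, &dw_dx)` and `kernel_gamma` are as defined in `src/kernel_hydro.h`:

```c
#include <stdio.h>

#include "kernel_hydro.h" /* kernel_deval(), kernel_gamma */

int main(void) {
  /* The kernel should be non-negative and non-increasing over its support. */
  for (int i = 0; i <= 1000; ++i) {
    const float u = kernel_gamma * (float)i / 1000.f; /* u = r/h */
    float w, dw_dx;
    kernel_deval(u, &w, &dw_dx);
    if (w < 0.f || dw_dx > 0.f)
      printf("u = %.4f  w = %e  dw_dx = %e\n", u, w, dw_dx);
  }
  return 0;
}
```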
error("Test Error: w = %.20f for x = %.20f", w, x);Matthieu SchallerMatthieu Schallerhttps://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/864Update spin jet AGN feedback RTD documentation2024-02-26T11:12:04ZMatthieu SchallerUpdate spin jet AGN feedback RTD documentationAfter !1727 the documentation is out of date.After !1727 the documentation is out of date.Filip HuskoFilip Huskohttps://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/863Issue with the way `cell_hydro.dx_max_sort` is computed2023-12-05T16:49:57ZYolan UyttenhoveIssue with the way `cell_hydro.dx_max_sort` is computedI want to preface this with saying that I'm pretty sure, this is a real issue, but that I'm also aware that this really is part of the core of SWIFT and has been looked at and tested extensively, so I might be wrong after all...
I want to preface this with saying that I'm pretty sure this is a real issue, but that I'm also aware that this really is part of the core of SWIFT and has been looked at and tested extensively, so I might be wrong after all...
### Description of the issue:
Because of the way `cell_hydro.dx_max_sort` is currently computed (see [lines 323-326](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L323-326) and lines [365](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L365), [371](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L371) in `cell_drift.c` and [211-213](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/drift.h?ref_type=heads#L211-213) in `drift.h`), it is not monotonically increasing.
When some sorting directions are computed later than others, `cell_hydro.dx_max_sort` is no longer guaranteed to be an upper bound of the individual particles' particle shift differences for the sorted `sid`'s (as checked in e.g. [the hydro dopair functions](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/runner_doiact_functions_hydro.h?ref_type=heads#L1371-1380)).
This is because `cell_hydro.dx_max_sort` is computed from `hydro_part.x_diff_sort`, which stores the vector offset of the particle from its position at the time the first sorts of the cell containing it were computed (since the last cleanup of the sorts). Suppose that for some `sid` the sorts are updated at a later timestep than the last cleanup (where `hydro_part.x_diff_sort` has been reset), and that subsequently the particles move back to positions closer to their positions at cleanup. Then `cell_hydro.dx_max_sort` can decrease and actually become smaller than some of the particles' shift differences for the `sid` that was sorted later (when the length of `hydro_part.x_diff_sort` was larger). This violates one of the core assumptions made when using the sorting arrays in a neighbor loop.
### Expected behavior:
At any time, for all the particles, the particle's shift difference for any currently sorted 'sid' (indicated by `cell_hydro.sorted`) should be smaller than `cell_hydro.dx_max_sort` of the cell(s) containing the particle.
This is implicitly assumed when checking [whether resorting is needed](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_unskip.c?ref_type=heads#L623) and explicitly checked in e.g. the [the hydro dopair functions](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/runner_doiact_functions_hydro.h?ref_type=heads#L1371-1380).
### Observed behavior:
For some specific task structures, some directions will not be sorted immediately after a cleanup of the sorts. For some particular movement of the particles, this can cause `cell_hydro.dx_max_sort` to decrease after those directions are sorted until it becomes smaller than some particles' shift differences for those directions (mechanism described above). I caught this behavior in the moving mesh branch with exaggerated steering causing some oscillatory motions in some of the particles.
### Why this might have gone unnoticed:
I think this might have gone unnoticed for several reasons:
- It only occurs for specific task structures and specific particle motions
- It is only enforced when debug checks are activated
- The result might not even be wrong if `cell_hydro.dx_max_sort` is too small. This just means that we *might* miss some particle interactions for some SID's
- Even if we do miss a few particle interactions, that will still be hard to see from the results alone
### Proposed fix:
Making [this change](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/commit/6ca0725f843f5ed9b15d9f2c5c0c656770ea6781) to the computation of `cell_hydro.dx_max_sort` fixes the issue for me, but does increase the number of times particles are sorted a bit.
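For readers without the diff open, the essence of the proposed change is to make the per-cell bound monotonic, i.e. take the maximum with the previous value instead of overwriting it. A sketch in pseudo-SWIFT C (field names follow the issue text and the SWIFT headers are assumed; this is not the actual patch):

```c
#include <math.h>

#include "cell.h"
#include "part.h"

/* Update the cell's sort displacement bound from its particles without ever
 * letting it decrease, so it stays valid for *every* sid sorted since the
 * last sort cleanup, not just the most recently sorted ones. */
static void cell_update_dx_max_sort(struct cell *c, const struct part *parts) {
  float dx_max_sort = c->hydro.dx_max_sort; /* start from the old bound */
  for (int k = 0; k < c->hydro.count; k++) {
    const float *d = parts[k].x_diff_sort; /* offset since sorts were built */
    const float dx = sqrtf(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    if (dx > dx_max_sort) dx_max_sort = dx;
  }
  c->hydro.dx_max_sort = dx_max_sort; /* never overwritten with a smaller value */
}
```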
I'm curious if anybody has other suggestions.