SWIFTsim issues — https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues (feed updated 2024-03-27)

Issue #886: Allow users to choose the compression filters for the particle lightcone at runtime
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/886
Matthieu Schaller, 2024-03-27

We currently use `BFloat16` and that is not accurate enough.
Also, we may want to actually apply the same filters as in the snapshots rather than hardcode them.

Assignee: Rob McGibbon

Issue #885: Compilation error with `--with-subgrid=SPIN_JET_EAGLE`
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/885
Darwin, 2024-03-15

Compiling on the master branch with `./configure --with-subgrid=SPIN_JET_EAGLE` gives the errors:
```
././black_holes/SPIN_JET/black_holes_spin.h: In function 'j_BH':
././black_holes/SPIN_JET/black_holes_spin.h:99:7: error: absolute value function 'fabsf' given an argument of type 'double' but has parameter of type 'float' which may cause truncation of value [-Werror=absolute-value]
99 | fabsf(bp->subgrid_mass * bp->subgrid_mass * bp->spin *
| ^~~~~
mv -f .deps/mpi-collectgroup.Tpo .deps/mpi-collectgroup.Plo
././black_holes/SPIN_JET/black_holes_spin.h: In function 'merger_spin_evolve':
././black_holes/SPIN_JET/black_holes_spin.h:1281:7: error: absolute value function 'fabsf' given an argument of type 'double' but has parameter of type 'float' which may cause truncation of value [-Werror=absolute-value]
1281 | fabsf(bpi->subgrid_mass * bpi->subgrid_mass * bpi->spin *
| ^~~~~
././black_holes/SPIN_JET/black_holes_spin.h:1284:7: error: absolute value function 'fabsf' given an argument of type 'double' but has parameter of type 'float' which may cause truncation of value [-Werror=absolute-value]
1284 | fabsf(bpj->subgrid_mass * bpj->subgrid_mass * bpj->spin *
| ^~~~~
```
I used gcc 13.2.1.

Assignee: Filip Husko

Issue #884: Debug check fails when checking whether to write snapshot after final simulation step when output list is used
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/884
Mladen Ivkovic, 2024-03-12

Porting issue here from Slack, with instructions for a reproducible example.
Configure with:
```
./configure --with-hydro-dimension=1 --enable-debug --enable-debugging-checks
```
Reproducible example:
```
swiftsim/examples/RadiativeTransferTests/CosmoAdvection_1D
```
(Yes, it's an RT example, but the debug-check failure also occurs without RT.)
Run with:
```
../../../swift --hydro --cosmo rt_advection1D_medium_redshift.yml
```
The run crashes after the final step:
```
[00001.8] engine_print_stats: Saving statistics at a=1.405296e-01
[00001.8] engine_dump_snapshot: Dumping snapshot at a=1.407294e-01
61 9.276920e-01 0.1398616 6.1499254 9.757966e-03 50 50 1000 0 0 0 0 8.487 24 0.101
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
```
`gdb` backtrace:
```
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
Thread 1 "swift" received signal SIGABRT, Aborted.
0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff4c42476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff4c287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
#5 0x0000000000435f0a in engine_step (e=e@entry=0x7ffffff9a638) at engine.c:2465
#6 0x0000000000408283 in main (argc=<optimised out>, argv=<optimised out>) at swift.c:1720
(gdb) frame 3
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
1294 if (ti_end < ti_start) error("ti_end must be >= ti_start");
(gdb) p ti_end
$1 = <optimised out>
(gdb) p ti_start
$2 = <optimised out>
(gdb) frame 4
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
(gdb) l
1265 const int with_cosmology = (e->policy & engine_policy_cosmology);
1266
1267 /* Time until the next snapshot */
1268 double time_to_next_snap;
1269 if (e->policy & engine_policy_cosmology) {
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
1271 e->ti_next_snapshot);
1272 } else {
1273 time_to_next_snap = (e->ti_next_snapshot - e->ti_current) * e->time_base;
1274 }
(gdb) p e->ti_current
$3 = 139611588448485376
(gdb) p e->ti_next_snapshot
$4 = -1
```
Stan's analysis:
> What I believe is happening is that when using a snapshot file, and you have reached the last snapshot, the ti_next_snapshot is explicitly set to -1. However, when running with debugging checks, if ti_next_snapshot is negative, it throws an error.
> I don't remember the exact files/lines where it happens, but the -1 is deliberately set in one of the files that checks for snapshot times in a snapshot list.

Issue #883: testActivePair fails
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/883
Matthieu Schaller, 2024-03-08

Summary from Jenkins:
```
Running ./testActivePair -n 6 -r 1 -d 0 -f active
[00000.0] main: Seed used for RNG: 1709851327
Tolerances read from file
Checking all particles in the file.
No differences found
Tolerances read from file
Checking all particles in the file.
Relative difference larger than tolerance (2.400000e-03) for particle 17089, column h_dt:
File 1: a = -1.701384e-05
File 2: b = -1.692253e-05
Difference: |a-b|/|a+b| = 2.690624e-03
Accuracy test failed
==================
./configure --disable-optimization
gcc 7.3.0
```

Issue #882: oneAPI 2023 & 2024 icx don't get -Werror flag passed
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/882
Mladen Ivkovic, 2024-01-24

When configuring with icx, the CFLAGS variable doesn't get the `-Werror` flag passed, and by default I can compile despite warnings.

Issue #881: oneAPI 2023 & 2024 icx raises warnings with mpich/4.1.2
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/881
Mladen Ivkovic, 2024-03-08

Here are the warnings I'm seeing:
```
space.c:1698:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1698 | MPI_Exscan(&local_...Here are the warnings I'm seeing:
```
space.c:1698:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1698 | MPI_Exscan(&local_nr_parts, &offset_parts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1700:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1700 | MPI_Exscan(&local_nr_sinks, &offset_sinks, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1702:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1702 | MPI_Exscan(&local_nr_sparts, &offset_sparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1704:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1704 | MPI_Exscan(&local_nr_bparts, &offset_bparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1706:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1706 | MPI_Exscan(&local_nr_dm, &offset_dm, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1708:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1708 | MPI_Exscan(&local_nr_dm_background, &offset_dm_background, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~
1709 | MPI_LONG_LONG_INT, MPI_SUM, MPI_COMM_WORLD);
| ~~~~~~~~~~~~~~~~~
space.c:1710:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1710 | MPI_Exscan(&local_nr_nuparts, &offset_nuparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
```
```
space_unique_id.c:60:17: warning: argument type 'const int *' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
60 | MPI_Allgather(&require_new_batch, 1, MPI_INT, all_requires, 1, MPI_INT,
| ^~~~~~~~~~~~~~~~~~ ~~~~~~~
```
```
engine.c:325:18: warning: argument type 'double (*)[4]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
325 | MPI_Gather(&timemem, 4, MPI_DOUBLE, timemems, 4, MPI_DOUBLE, 0,
| ^~~~~~~~ ~~~~~~~~~~
engine.c:1284:31: warning: argument type 'float (*)[5]' doesn't match specified 'MPI' type tag that requires 'float *' [-Wtype-safety]
1284 | MPI_Allreduce(MPI_IN_PLACE, &e->s->max_mpole_power,
| ^~~~~~~~~~~~~~~~~~~~~~
1285 | SELF_GRAVITY_MULTIPOLE_ORDER + 1, MPI_FLOAT, MPI_MAX,
| ~~~~~~~~~
```
```
task.c:1578:67: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1578 | int res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : sum), sum, size,
| ^~~
1579 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1582:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1582 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tsum), tsum, size,
| ^~~~
1583 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1586:65: warning: argument type 'int (*)[33]' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
1586 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : count), count, size,
| ^~~~~
1587 | MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~
task.c:1590:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1590 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : min), min, size,
| ^~~
1591 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1594:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1594 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmin), tmin, size,
| ^~~~
1595 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1598:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1598 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : max), max, size,
| ^~~
1599 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1602:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1602 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmax), tmax, size,
| ^~~~
1603 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
```
```
distributed_io.c:952:17: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
952 | MPI_Allreduce(N, N_total, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:957:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
957 | MPI_Gather(N, swift_type_count, MPI_LONG_LONG_INT, N_counts, swift_type_count,
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:1503:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1503 | MPI_Exscan(N, global_offsets, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM,
| ^ ~~~~~~~~~~~~~~~~~
```
```
parallel_io.c:1606:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1606 | MPI_Exscan(N, offset, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
```
```
fof_catalogue_io.c:532:17: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
532 | MPI_Allreduce(&num_groups, &num_groups_total, 1, MPI_LONG_LONG, MPI_SUM,
| ^~~~~~~~~~~ ~~~~~~~~~~~~~
```
```
neutrino/Default/neutrino.c:218:14: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~ ~~~~~~~~~~
neutrino/Default/neutrino.c:218:21: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~~~~~~~ ~~~~~~~~~~
```
```
swift.c:1327:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1327:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
```

Issue #880: memory leak in repart_edge_metis src/partition.c:1635
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/880
Mladen Ivkovic, 2024-02-02

Stumbled upon this today when debugging my own bugs. The sanitizer detects a memory leak in `repart_edge_metis` (src/partition.c:1635).
Here's my sanitizer output:
```
Direct leak of 2916 byte(s) in 1 object(s) allocated from:
#0 0x7f28738bf91f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x55a6bfad779a in repart_edge_metis /home/mivkov/EPFL/rt/swiftsim/src/partition.c:1635
#2 0x55a6bfadae0b in partition_repartition /home/mivkov/EPFL/rt/swiftsim/src/partition.c:1873
#3 0x55a6bf997913 in engine_repartition /home/mivkov/EPFL/rt/swiftsim/src/engine.c:213
#4 0x55a6bf9a0053 in engine_prepare /home/mivkov/EPFL/rt/swiftsim/src/engine.c:1454
#5 0x55a6bf9a9d5f in engine_step /home/mivkov/EPFL/rt/swiftsim/src/engine.c:2564
#6 0x55a6bf91a81d in main /home/mivkov/EPFL/rt/swiftsim/swift.c:1723
#7 0x7f2871a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```

Assignee: Peter W. Draper

Issue #879: Adaptive gas softening
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/879
Matthieu Schaller, 2024-02-29

Add the option to run with adaptive softening for the gas.

Assignee: Matthieu Schaller

Issue #878: Standalone FOF generality
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/878
Stuart McAlpine, 2024-01-10

I was hoping to use the standalone FoF code, but for outputs from another code.
Trying to put together a dummy SWIFT-like snapshot, with coordinates and velocities of DM particles in a PartType1 group, I was hoping not to have to reproduce the whole SWIFT header information.
Is it basically locked into the assumption of a full SWIFT snapshot layout? I assume the standalone FoF bit won't need much input?

Issue #877: Investigate whether MPI_Continuation can be useful
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/877
Matthieu Schaller, 2023-11-30

Maybe for the proxy exchange of cells that can't be thread-parallelised?

Assignee: Matthieu Schaller

Issue #876: IPO on OSX with clang
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/876
Matthieu Schaller, 2023-11-30

@nshchutskyi just ran into trouble on his machine trying to compile the code. Now that we enforce IPO, we require that clang has `llvm-ranlib` installed, which seems not to exist on OSX.
@jborrow maybe you have an idea? Should `llvm-ar` be used instead?

Assignee: Peter W. Draper

Issue #875: Wendland C6 missing neighbour contributions
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/875
Thomas Sandnes, 2023-11-30

w < 0 and/or dw_dx > 0 for at least some (maybe all) pairs with ~0.85 < x < 1 in kernel_deval with Wendland C6 and eta = 1.866. This might affect other kernels as well.
A reproducible test case on master branch:
- run hydro test examples/HydroTests/SodShock_3D as normal
- ./configure --with-hydro=minimal --with-kernel=wendland-C6 --disable-hand-vec
- set eta = 1.866 in .yml file
- To catch the error, add on line 277 of kernel_hydro.h (x < 0.85 is chosen to demonstrate the range of x affected without a fix, rather than to be an indicator of whether the problem is solved):

```
  if (x < 0.85 && w <= 0)
    error("Test Error: w = %.20f for x = %.20f", w, x);
```

Assignee: Matthieu Schaller

Issue #874: METIS github support
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/874
Peter W. Draper, 2023-11-07

The github release of METIS has unbundled the GKlib component into its own release.
This means we should also test for the presence of that library, not just `-lmetis`.
See !1813.

Assignee: Peter W. Draper

Issue #873: Gravity cache size increase leads to crash
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/873
Matthieu Schaller, 2023-12-20

Bug uncovered by @Roduit and @yvesrevaz.
When the gravity cache of a thread is too small and it gets reallocated, the code crashes in `gravity_cache_clean()` while freeing the old arrays. It's unclear why this happens. The caches are purely thread-local as they are carried by the `runner` object. They are originally allocated by the main thread in `engine_config()` but that should not matter.
Yves seems to have also caught it crashing in `gravity_cache_zero_output()`.
This is not seen in normal operations as the caches never grow since they are allocated to be the size of a leaf cell. The new sink particle scheme creates a lot of particles and will thus ask for a cache reallocation on occasion.
The bug has likely been there forever but just never triggered.

Issue #872: Add a meta-"snapshot" for the FOF catalogs
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/872
Matthieu Schaller, 2023-12-01

Same as for the regular distributed snapshots.

Issue #871: Safer i/o list construction
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/871
Matthieu Schaller, 2024-03-04

Following !1803, it would be good to:
- Check that the number of entries we list for output does not go past the list size (currently 100 fields per particle)
- Check that the list does not contain gaps which segfault in the actual write.

Issue #870: testRandomCone fails
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/870
Matthieu Schaller, 2023-10-19

```
./testRandomCone
[350508.9] testRandomCone.c:test_cone():103: Generated distribution of random unit vectors within a cone exceeds the limit imposed by the tolerance.
```

Assignee: Filip Husko

Issue #869: missing call to ../getPS2020CoolingTables.sh in EAGLE_50
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/869
Yves Revaz, 2023-09-27

A call to
../getPS2020CoolingTables.sh
in the run.sh of
swiftsim/examples/EAGLE_ICs/EAGLE_50
is missing.
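A sketch of the missing snippet for `run.sh` (the script path is from the issue; the guard pattern and the HDF5 table filename are assumptions modelled on the other table-fetching blocks in the EAGLE_ICs examples):

```shell
# Fetch the PS2020 cooling tables if not already present.
# NOTE: the filename checked here is an assumed example, not from the issue.
if [ ! -e UV_dust1_CR1_G1_shield1.hdf5 ]; then
    echo "Fetching PS2020 cooling tables..."
    ../getPS2020CoolingTables.sh
fi
```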
Let me know if you want me to create a branch to fix it.

Assignee: Matthieu Schaller

Issue #868: MPI Bcast for params
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/868
Stuart McAlpine, 2023-09-27

I'm not sure how this has arisen for me, but I did recently pull and recompile swift (from main). The setup did work before with no issues.
I'm having issues with MPI hangs, over a certain number of ranks, when it broadcasts the params file in swift.c
```
#ifdef WITH_MPI
/* Broadcast the parameter file */
MPI_Bcast(params, sizeof(struct swift_params), MPI_BYTE, 0, MPI_COMM_WORLD);
#endif
```
This is on c7-rp. With 10 ranks (14 tasks per rank, so 5 nodes), it works. Beyond that it hangs.
I was originally using Intel MPI 2018; I tried a newer MPI, same thing, although with Intel 2021 it does at least give an error.
```
[0000] [00000.0] main: Reading runtime parameters from file 'params.yml'
[m7197:209413:0:209413] ib_mlx5_log.c:139 Transport retry count exceeded on mlx5_0:1/RoCE (synd 0x15 vend 0x81 hw_synd 0/0)
[m7197:209413:0:209413] ib_mlx5_log.c:139 RC QP 0x5c96 wqe[2]: SEND --e [va 0x2afcde9f5400 len 8256 lkey 0x5ff11]
[m7197:209414:0:209414] ib_mlx5_log.c:139 Transport retry count exceeded on mlx5_0:1/RoCE (synd 0x15 vend 0x81 hw_synd 0/0)
[m7197:209414:0:209414] ib_mlx5_log.c:139 RC QP 0x5c91 wqe[0]: SEND --e [va 0x2aed6eddee80 len 8256 lkey 0x61336]
/cosma/local/software/ucx/ucx-1.8.1/src/uct/ib/mlx5/ib_mlx5_log.c: [ uct_ib_mlx5_completion_with_err() ]
...
129 }
130
131 ucs_log(log_level,
==> 132 "%s on "UCT_IB_IFACE_FMT"/%s (synd 0x%x vend 0x%x hw_synd %d/%d)\n"
133 "%s QP 0x%x wqe[%d]: %s",
134 err_info, UCT_IB_IFACE_ARG(iface),
135 uct_ib_iface_is_roce(iface) ? "RoCE" : "IB",
/cosma/local/software/ucx/ucx-1.8.1/src/uct/ib/mlx5/ib_mlx5_log.c: [ uct_ib_mlx5_completion_with_err() ]
...
129 }
130
131 ucs_log(log_level,
==> 132 "%s on "UCT_IB_IFACE_FMT"/%s (synd 0x%x vend 0x%x hw_synd %d/%d)\n"
133 "%s QP 0x%x wqe[%d]: %s",
134 err_info, UCT_IB_IFACE_ARG(iface),
135 uct_ib_iface_is_roce(iface) ? "RoCE" : "IB",
==== backtrace (tid: 209414) ====
0 0x00000000000206f9 uct_ib_mlx5_completion_with_err() /cosma/local/software/ucx/ucx-1.8.1/src/uct/ib/mlx5/ib_mlx5_log.c:132
1 0x0000000000045da1 uct_rc_mlx5_iface_handle_failure() /cosma/local/software/ucx/ucx-1.8.1/src/uct/ib/rc/accel/rc_mlx5_iface.c:217
2 0x0000000000040bdd uct_ib_mlx5_poll_cq() /cosma/local/software/ucx/ucx-1.8.1/src/uct/ib/mlx5/ib_mlx5.inl:81
3 0x000000000002a715 ucs_callbackq_dispatch() /cosma/local/software/ucx/ucx-1.8.1/src/ucs/datastruct/callbackq.h:211
4 0x000000000002a715 uct_worker_progress() /cosma/local/software/ucx/ucx-1.8.1/src/uct/api/uct.h:2221
5 0x000000000002a715 ucp_worker_progress() /cosma/local/software/ucx/ucx-1.8.1/src/ucp/core/ucp_worker.c:1951
6 0x000000000000a191 mlx_ep_progress() mlx_ep.c:0
7 0x0000000000020bcd ofi_cq_progress() osd.c:0
8 0x000000000002199b ofi_cq_readfrom() osd.c:0
9 0x0000000000663166 fi_cq_read() /usr/include/rdma/fi_eq.h:385
10 0x00000000001a8f4b MPIDI_Progress_test() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_progress.c:181
11 0x00000000001a8f4b MPID_Progress_test() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_progress.c:236
12 0x00000000001a8f4b MPID_Progress_wait() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_progress.c:297
13 0x000000000080344b MPIR_Wait_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/request/wait.c:40
```

Assignee: Peter W. Draper

Issue #867: Memory leaks in space.c
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/867
Bert Vandenbroucke, 2023-10-07

When running the 3D Sod shock with default configuration and the address sanitizer, leaks are reported from `space.c:454`: [leaks.txt](/uploads/ad14dd7b79a74d1745444f5839e292ea/leaks.txt).
I have not run SWIFT in a (long) while, but I am pretty sure this did not happen a year ago.

Assignee: Matthieu Schaller