SWIFTsim issues
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues (feed updated 2022-07-19T10:34:40Z)

---

**Issue #753: Automatic generation of tasks**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/753 · Loic Hausammann · updated 2022-07-19T10:34:40Z

The task creation / activation / dependencies are becoming more and more complicated. A possible cleanup would be to generate the task system automatically from a simple input file that describes it. After our weekly discussion, the best approach seems to be to do everything directly in C, even if it could slightly increase the computational cost.

@mivkov
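For illustration only, here is one table-driven shape such a C-side description could take; every name below (`task_descr`, `task_table`) is hypothetical, not existing SWIFT API, and the loop just prints the dependency plan a scheduler would then execute:

```c
#include <stdio.h>

/* Hypothetical descriptor: one row per task type, listing the task types
 * that must complete before it (NULL-terminated). */
struct task_descr {
  const char *name;
  const char *depends_on[4];
};

static const struct task_descr task_table[] = {
    {"sort", {NULL}},
    {"density", {"sort", NULL}},
    {"ghost", {"density", NULL}},
    {"force", {"ghost", NULL}},
};

int main(void) {
  const int n = (int)(sizeof(task_table) / sizeof(task_table[0]));
  /* A real implementation would create one task per row and register one
   * unlock per dependency; here we only print the resulting plan. */
  for (int i = 0; i < n; i++)
    for (int j = 0; task_table[i].depends_on[j] != NULL; j++)
      printf("unlock: %s -> %s\n", task_table[i].depends_on[j],
             task_table[i].name);
  return 0;
}
```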
---

**Issue #886: Allow users to choose the compression filters for the particle lightcone at runtime**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/886 · Matthieu Schaller · updated 2024-03-27T10:04:24Z

We currently use `BFloat16` and that is not accurate enough.
Also, we may want to actually apply the same filters as in the snapshots rather than hardcode them.

Assignee: Rob McGibbon

---

**Issue #884: Debug check fails when checking whether to write snapshot after final simulation step when output list is used**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/884 · Mladen Ivkovic · updated 2024-03-12T09:51:35Z
Porting issue here from slack, with instructions for a reproducible example.
Configure with:
```
./configure --with-hydro-dimension=1 --enable-debug --enable-debugging-checks
```
Reproducible example:
```
swiftsim/examples/RadiativeTransferTests/CosmoAdvection_1D
```
(Yes, it's an RT example, but the debug-check failure also occurs without RT.)
Run with:
```
../../../swift --hydro --cosmo rt_advection1D_medium_redshift.yml
```
The run crashes after the final step:
```
[00001.8] engine_print_stats: Saving statistics at a=1.405296e-01
[00001.8] engine_dump_snapshot: Dumping snapshot at a=1.407294e-01
61 9.276920e-01 0.1398616 6.1499254 9.757966e-03 50 50 1000 0 0 0 0 8.487 24 0.101
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
```
`gdb` backtrace:
```
[00001.8] cosmology.c:cosmology_get_delta_time():1294: ti_end must be >= ti_start
Thread 1 "swift" received signal SIGABRT, Aborted.
0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff4c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff4c42476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff4c287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
#5 0x0000000000435f0a in engine_step (e=e@entry=0x7ffffff9a638) at engine.c:2465
#6 0x0000000000408283 in main (argc=<optimised out>, argv=<optimised out>) at swift.c:1720
(gdb) frame 3
#3 0x0000000000483bd7 in cosmology_get_delta_time (c=<optimised out>, ti_start=<optimised out>, ti_end=<optimised out>) at cosmology.c:1294
1294 if (ti_end < ti_start) error("ti_end must be >= ti_start");
(gdb) p ti_end
$1 = <optimised out>
(gdb) p ti_start
$2 = <optimised out>
(gdb) frame 4
#4 0x000000000044c7d4 in engine_io_check_snapshot_triggers (e=e@entry=0x7ffffff9a638) at engine_io.c:1270
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
(gdb) l
1265 const int with_cosmology = (e->policy & engine_policy_cosmology);
1266
1267 /* Time until the next snapshot */
1268 double time_to_next_snap;
1269 if (e->policy & engine_policy_cosmology) {
1270 time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
1271 e->ti_next_snapshot);
1272 } else {
1273 time_to_next_snap = (e->ti_next_snapshot - e->ti_current) * e->time_base;
1274 }
(gdb) p e->ti_current
$3 = 139611588448485376
(gdb) p e->ti_next_snapshot
$4 = -1
```
Stan's analysis:
> What I believe is happening is that when using a snapshot file, and you have reached the last snapshot, the ti_next_snapshot is explicitly set to -1. However, when running with debugging checks, if ti_next_snapshot is negative, it throws an error.
> I don't remember the exact files/lines where it happens, but the -1 is deliberately set in one of the files that checks for snapshot times in a snapshot list.
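If Stan's reading is right, one minimal fix would be to guard the computation. A sketch against the `gdb` listing above (the guard is an assumption, not the change that was actually merged):

```c
/* Sketch: in engine_io_check_snapshot_triggers(), skip the check once the
 * output list is exhausted and ti_next_snapshot has been set to -1. */
if (e->ti_next_snapshot >= 0) {
  double time_to_next_snap;
  if (e->policy & engine_policy_cosmology) {
    time_to_next_snap = cosmology_get_delta_time(e->cosmology, e->ti_current,
                                                 e->ti_next_snapshot);
  } else {
    time_to_next_snap = (e->ti_next_snapshot - e->ti_current) * e->time_base;
  }
  /* ... existing trigger logic using time_to_next_snap ... */
}
```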
---

**Issue #883: testActivePair fails**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/883 · Matthieu Schaller · updated 2024-03-08T08:36:02Z

Summary from jenkins:
```
Running ./testActivePair -n 6 -r 1 -d 0 -f active
[00000.0] main: Seed used for RNG: 1709851327
Tolerances read from file
Checking all particles in the file.
No differences found
Tolerances read from file
Checking all particles in the file.
Relative difference larger than tolerance (2.400000e-03) for particle 17089, column h_dt:
File 1: a = -1.701384e-05
File 2: b = -1.692253e-05
Difference: |a-b|/|a+b| = 2.690624e-03
Accuracy test failed
==================
./configure --disable-optimization
gcc 7.3.0
```
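For reference, a self-contained check of the metric quoted in the log, |a-b|/|a+b| compared against the per-column tolerance, using the numbers from the failing column:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
  /* Values for particle 17089, column h_dt, from the jenkins log above. */
  const double a = -1.701384e-05, b = -1.692253e-05, tol = 2.400000e-03;
  const double rel = fabs(a - b) / fabs(a + b);
  printf("|a-b|/|a+b| = %e (tolerance %e) -> %s\n", rel, tol,
         rel > tol ? "FAIL" : "ok");
  return 0;
}
```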
---

**Issue #881: oneAPI 2023 & 2024 icx raises warnings with mpich/4.1.2**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/881 · Mladen Ivkovic · updated 2024-03-08T09:54:39Z

Here are the warnings I'm seeing:
```
space.c:1698:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1698 | MPI_Exscan(&local_nr_parts, &offset_parts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1700:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1700 | MPI_Exscan(&local_nr_sinks, &offset_sinks, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1702:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1702 | MPI_Exscan(&local_nr_sparts, &offset_sparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1704:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1704 | MPI_Exscan(&local_nr_bparts, &offset_bparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1706:14: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1706 | MPI_Exscan(&local_nr_dm, &offset_dm, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
space.c:1708:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1708 | MPI_Exscan(&local_nr_dm_background, &offset_dm_background, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~
1709 | MPI_LONG_LONG_INT, MPI_SUM, MPI_COMM_WORLD);
| ~~~~~~~~~~~~~~~~~
space.c:1710:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1710 | MPI_Exscan(&local_nr_nuparts, &offset_nuparts, 1, MPI_LONG_LONG_INT, MPI_SUM,
| ^~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
```
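These fire because on LP64 Linux `size_t` is `unsigned long`, while mpich annotates `MPI_LONG_LONG_INT` arguments as requiring `long long *`; the `double (*)[N]` variants further down are the same check tripping over `&array` where the decayed element pointer `array` would already match the tag. A self-contained sketch of one way to silence the `size_t` case, copying through a `long long` temporary (an assumption, not necessarily the fix to adopt); the remaining warning blocks below all follow the same pattern:

```c
#include <mpi.h>
#include <stddef.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  size_t local_nr_parts = 1000; /* &local_nr_parts would trip -Wtype-safety */
  long long local_ll = (long long)local_nr_parts;
  long long offset_ll = 0;

  /* Pointee type now matches the MPI_LONG_LONG_INT tag, so icx is quiet. */
  MPI_Exscan(&local_ll, &offset_ll, 1, MPI_LONG_LONG_INT, MPI_SUM,
             MPI_COMM_WORLD);

  /* The MPI standard leaves rank 0's Exscan output undefined. */
  if (rank > 0) printf("rank %d: offset = %lld\n", rank, offset_ll);

  MPI_Finalize();
  return 0;
}
```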
```
space_unique_id.c:60:17: warning: argument type 'const int *' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
60 | MPI_Allgather(&require_new_batch, 1, MPI_INT, all_requires, 1, MPI_INT,
| ^~~~~~~~~~~~~~~~~~ ~~~~~~~
```
```
engine.c:325:18: warning: argument type 'double (*)[4]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
325 | MPI_Gather(&timemem, 4, MPI_DOUBLE, timemems, 4, MPI_DOUBLE, 0,
| ^~~~~~~~ ~~~~~~~~~~
engine.c:1284:31: warning: argument type 'float (*)[5]' doesn't match specified 'MPI' type tag that requires 'float *' [-Wtype-safety]
1284 | MPI_Allreduce(MPI_IN_PLACE, &e->s->max_mpole_power,
| ^~~~~~~~~~~~~~~~~~~~~~
1285 | SELF_GRAVITY_MULTIPOLE_ORDER + 1, MPI_FLOAT, MPI_MAX,
| ~~~~~~~~~
```
```
task.c:1578:67: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1578 | int res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : sum), sum, size,
| ^~~
1579 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1582:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1582 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tsum), tsum, size,
| ^~~~
1583 | MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1586:65: warning: argument type 'int (*)[33]' doesn't match specified 'MPI' type tag that requires 'int *' [-Wtype-safety]
1586 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : count), count, size,
| ^~~~~
1587 | MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
| ~~~~~~~
task.c:1590:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1590 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : min), min, size,
| ^~~
1591 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1594:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1594 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmin), tmin, size,
| ^~~~
1595 | MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1598:63: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1598 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : max), max, size,
| ^~~
1599 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
task.c:1602:64: warning: argument type 'double (*)[33]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
1602 | res = MPI_Reduce((engine_rank == 0 ? MPI_IN_PLACE : tmax), tmax, size,
| ^~~~
1603 | MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
| ~~~~~~~~~~
```
```
distributed_io.c:952:17: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
952 | MPI_Allreduce(N, N_total, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:957:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
957 | MPI_Gather(N, swift_type_count, MPI_LONG_LONG_INT, N_counts, swift_type_count,
| ^ ~~~~~~~~~~~~~~~~~
distributed_io.c:1503:14: warning: argument type 'const long long *' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1503 | MPI_Exscan(N, global_offsets, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM,
| ^ ~~~~~~~~~~~~~~~~~
```
```
parallel_io.c:1606:14: warning: argument type 'size_t *' (aka 'unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1606 | MPI_Exscan(N, offset, swift_type_count, MPI_LONG_LONG_INT, MPI_SUM, comm);
| ^ ~~~~~~~~~~~~~~~~~
```
```
fof_catalogue_io.c:532:17: warning: argument type 'const size_t *' (aka 'const unsigned long *') doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
532 | MPI_Allreduce(&num_groups, &num_groups_total, 1, MPI_LONG_LONG, MPI_SUM,
| ^~~~~~~~~~~ ~~~~~~~~~~~~~
```
```
neutrino/Default/neutrino.c:218:14: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~ ~~~~~~~~~~
neutrino/Default/neutrino.c:218:21: warning: argument type 'double (*)[7]' doesn't match specified 'MPI' type tag that requires 'double *' [-Wtype-safety]
218 | MPI_Reduce(&sums, &total_sums, 7, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
| ^~~~~~~~~~~ ~~~~~~~~~~
```
```
swift.c:1327:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1327:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1327 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:19: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~ ~~~~~~~~~~~~~~~~~
swift.c:1452:28: warning: argument type 'long long (*)[8]' doesn't match specified 'MPI' type tag that requires 'long long *' [-Wtype-safety]
1452 | MPI_Allreduce(&N_long, &N_total, swift_type_count + 1, MPI_LONG_LONG_INT,
| ^~~~~~~~ ~~~~~~~~~~~~~~~~~
```

---

**Issue #877: Investigate whether MPI_Continuation can be useful**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/877 · Matthieu Schaller · updated 2023-11-30T16:58:48Z

Maybe for the proxy exchange of cells that can't be thread-parallelised?

Assignee: Matthieu Schaller

---

**Issue #875: Wendland C6 missing neighbour contributions**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/875 · Thomas Sandnes · updated 2023-11-30T16:59:15Z
w < 0 and/or dw_dx > 0 for at least some (maybe all) pairs with ~0.85 < x < 1 in kernel_deval with Wendland C6 and eta = 1.866. This might affect other kernels as well.
A reproducible test case on master branch:
- run hydro test examples/HydroTests/SodShock_3D as normal
- ./configure --with-hydro=minimal --with-kernel=wendland-C6 --disable-hand-vec
- set eta = 1.866 in .yml file
- To catch the error, add on line 277 of kernel_hydro.h (x < 0.85 is chosen to demonstrate the range of x affected without a fix, rather than to be an indicator of whether the problem is solved):

```
if (x < 0.85 && w <= 0)
  error("Test Error: w = %.20f for x = %.20f", w, x);
```

Assignee: Matthieu Schaller
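A self-contained illustration of one plausible mechanism (an assumption, not a confirmed diagnosis): the expanded degree-11 Wendland C6 polynomial, obtained by multiplying out (1-x)^8 (1 + 8x + 25x^2 + 32x^3), suffers heavy cancellation in single precision as x approaches 1 and can come out slightly negative, while the factored form stays non-negative. Normalisation constants are omitted, and the exact values will depend on compiler and flags:

```c
#include <stdio.h>

/* Unnormalised Wendland C6: w(x) = (1-x)^8 (1 + 8x + 25x^2 + 32x^3).
 * Expanded coefficients follow from multiplying out the factored form. */
static float w_expanded(float x) {
  float w = 32.f; /* Horner's rule on the degree-11 expanded polynomial */
  w = w * x - 231.f;
  w = w * x + 704.f;
  w = w * x - 1155.f;
  w = w * x + 1056.f;
  w = w * x - 462.f;
  w = w * x + 0.f;
  w = w * x + 66.f;
  w = w * x + 0.f;
  w = w * x - 11.f;
  w = w * x + 0.f;
  w = w * x + 1.f;
  return w;
}

static double w_factored(double x) {
  const double u = 1.0 - x;
  const double u4 = u * u * u * u;
  return u4 * u4 * (1.0 + x * (8.0 + x * (25.0 + 32.0 * x)));
}

int main(void) {
  for (float x = 0.86f; x < 1.0f; x += 0.02f)
    printf("x = %.2f  expanded(float) = % .6e  factored(double) = % .6e\n", x,
           w_expanded(x), w_factored((double)x));
  return 0;
}
```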
---

**Issue #864: Update spin jet AGN feedback RTD documentation**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/864 · Matthieu Schaller · updated 2024-02-26T11:12:04Z

After !1727 the documentation is out of date.

Assignee: Filip Husko

---

**Issue #863: Issue with the way `cell_hydro.dx_max_sort` is computed**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/863 · Yolan Uyttenhove · updated 2023-12-05T16:49:57Z

I want to preface this by saying that I'm pretty sure this is a real issue, but I'm also aware that this really is part of the core of SWIFT and has been looked at and tested extensively, so I might be wrong after all...
### Description of the issue:
Because of the way `cell_hydro.dx_max_sort` is currently computed (see [lines 323-326](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L323-326) and lines [365](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L365), [371](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_drift.c?ref_type=heads#L371) in `cell_drift.c` and [211-213](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/drift.h?ref_type=heads#L211-213) in `drift.h`), it is not monotonically increasing.
When some sorting directions are computed later than others, `cell_hydro.dx_max_sort` is no longer guaranteed to be an upper bound of the individual particles' particle shift differences for the sorted `sid`'s (as checked in e.g. [the hydro dopair functions](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/runner_doiact_functions_hydro.h?ref_type=heads#L1371-1380)).
This is because `cell_hydro.dx_max_sort` is computed from `hydro_part.x_diff_sort`, which stores a particle's vector offset from the position it had when the first sorts of its cell were computed (since the last cleanup of the sorts). Suppose that for some `sid` the sorts are updated at a later timestep than the last cleanup (where `hydro_part.x_diff_sort` was reset), and that subsequently the particles move back closer to their positions at cleanup. Then `cell_hydro.dx_max_sort` can decrease and actually become smaller than some of the particles' shift differences for the `sid` that was sorted later (when the length of `hydro_part.x_diff_sort` was larger). This violates one of the core assumptions made when using the sorting arrays in a neighbour loop.
### Expected behavior:
At any time, for all the particles, the particle's shift difference for any currently sorted 'sid' (indicated by `cell_hydro.sorted`) should be smaller than `cell_hydro.dx_max_sort` of the cell(s) containing the particle.
This is implicitly assumed when checking [whether resorting is needed](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/cell_unskip.c?ref_type=heads#L623) and explicitly checked in e.g. [the hydro dopair functions](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/runner_doiact_functions_hydro.h?ref_type=heads#L1371-1380). The sketch below restates the invariant in code.
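Stated as a debug check, the invariant would look roughly like this; the field names and the `runner_shift` projection are modelled on the description in this issue, not copied from the actual SWIFT code:

```c
/* Sketch of the invariant for one cell c (assumed names throughout). */
for (int sid = 0; sid < 13; sid++) {
  if (!(c->hydro.sorted & (1 << sid))) continue; /* direction not sorted */
  for (int i = 0; i < c->hydro.count; i++) {
    const struct part *p = &c->hydro.parts[i];
    /* Offset accumulated since the sorts, projected on the sid axis. */
    const float dx = p->x_diff_sort[0] * runner_shift[sid][0] +
                     p->x_diff_sort[1] * runner_shift[sid][1] +
                     p->x_diff_sort[2] * runner_shift[sid][2];
    if (fabsf(dx) > c->hydro.dx_max_sort)
      error("dx_max_sort too small for sid=%d", sid);
  }
}
```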
### Observed behavior:
For some specific task structures, some directions will not be sorted immediately after a cleanup of the sorts. For some particular movement of the particles, this can cause `cell_hydro.dx_max_sort` to decrease after those directions are sorted until it becomes smaller than some particles' shift differences for those directions (mechanism described above). I caught this behavior in the moving mesh branch with exaggerated steering causing some oscillatory motions in some of the particles.
### Why this might have gone unnoticed:
I think this might have gone unnoticed for several reasons:
- It only occurs for specific task structures and specific particle motions
- It is only enforced when debug checks are activated
- The result might not even be wrong if `cell_hydro.dx_max_sort` is too small. This just means that we *might* miss some particle interactions for some SID's
- Even if we do miss a few particle interactions, that will still be hard to see from the results alone
### Proposed fix:
Making [this change](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/commit/6ca0725f843f5ed9b15d9f2c5c0c656770ea6781) to the computation of `cell_hydro.dx_max_sort` fixes the issue for me, but does increase the number of times particles are sorted a bit.
I'm curious if anybody has other suggestions.

Assignee: Matthieu Schaller

---

**Issue #858: Upgrade code to use hdf5 1.14**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/858 · Matthieu Schaller · updated 2024-01-10T09:59:22Z
HDF5 1.14.x is now the only officially supported version. We should transition.
Possible caveat:
- Master code (no changes) seems to be hanging when reading data with hdf5 1.14 in parallel with the latest ICX (2022.3.0) and UCX 1.10. Works fine when swapping to hdf5 1.12.0.

Assignee: Rob McGibbon

---

**Issue #856: Heterogeneous METIS partitioning**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/856 · Peter W. Draper · updated 2023-07-17T12:10:29Z
We will need a method to load balance a job that spans heterogeneous clusters, like the new COSMA8 queue, which
will have a mixture of Rome and Milan AMD cpus. Hopefully the imbalance can be assigned to a single number in this
case, so we don't need to worry about task-level runtime differences and work at the level of nodes.
In that case METIS and ParMETIS have a parameter array `tpwgts`, which is used to change the fraction of vertex weight that should be used for each node (partition in METIS terms). So we should assign these values based on the runtime
differences. The hard part would seem to be working out which nodes are which, probably simplest to just
use a flat file of names for that and use the MPI processor names.
One effect of this imbalance will be a change in the memory balance, with the Milan nodes probably being
given more cells and particles, out of necessity.

Assignee: Peter W. Draper
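For illustration, a minimal self-contained example of filling `tpwgts` from per-node-type speed factors and handing it to METIS; the 1.0/1.3 speeds and the tiny ring graph are placeholders, not measured values:

```c
#include <metis.h>
#include <stdio.h>

int main(void) {
  /* Tiny 4-vertex ring graph standing in for the cell graph. */
  idx_t nvtxs = 4, ncon = 1, nparts = 2;
  idx_t xadj[] = {0, 2, 4, 6, 8};
  idx_t adjncy[] = {1, 3, 0, 2, 1, 3, 0, 2};

  /* Assumed relative speeds of the two node types (e.g. Rome vs Milan). */
  const real_t speed[2] = {1.0, 1.3};
  real_t tpwgts[2];
  const real_t total = speed[0] + speed[1];
  for (int p = 0; p < 2; p++)
    tpwgts[p] = speed[p] / total; /* target fractions must sum to 1 */

  idx_t objval, part[4];
  METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy, NULL, NULL, NULL, &nparts,
                      tpwgts, NULL, NULL, &objval, part);

  for (int v = 0; v < 4; v++)
    printf("vertex %d -> partition %d\n", v, (int)part[v]);
  return 0;
}
```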
---

**Issue #853: Erasure encoded restart files**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/853 · Peter W. Draper · updated 2023-03-28T15:33:17Z

Fun idea from Alastair Basden. To protect against the loss of single OSTs, which can happen on the COSMA snap file stores,
why not write out the restart files using an erasure encoded technique. This is the way that RAID devices can
operate, but working at the level of files.
So instead of writing one restart file, we write let's say 6 files and we arrange to erasure encode these so that the loss
of one file isn't fatal as the missing data can be reconstructed from the remaining 5.
There may be code out there that does this already, but clearly we would need a lot of effort to make sure this worked and
didn't impact performance too much. It would also have a 20% overhead, which is better than a simple duplicate.
Also note we would need to make sure each file was written to a different OST.

Assignee: Peter W. Draper · Due: 2099-12-31
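The 5-data-plus-1-parity layout implied by the 20% overhead figure can be illustrated with plain XOR parity (RAID-5 style). A self-contained toy demo of losing and rebuilding one "file", not a proposal for the actual restart-file format:

```c
#include <stdio.h>
#include <string.h>

#define NDATA 5
#define LEN 8

int main(void) {
  /* Five equal-length "restart files" plus one parity file. */
  unsigned char data[NDATA][LEN] = {"chunk-0", "chunk-1", "chunk-2",
                                    "chunk-3", "chunk-4"};
  unsigned char parity[LEN] = {0};
  for (int f = 0; f < NDATA; f++)
    for (int i = 0; i < LEN; i++) parity[i] ^= data[f][i];

  /* Lose file 2 (e.g. its OST died)... */
  unsigned char lost[LEN];
  memcpy(lost, data[2], LEN);
  memset(data[2], 0, LEN);

  /* ...and rebuild it as the XOR of the parity with the survivors. */
  unsigned char rebuilt[LEN];
  memcpy(rebuilt, parity, LEN);
  for (int f = 0; f < NDATA; f++)
    if (f != 2)
      for (int i = 0; i < LEN; i++) rebuilt[i] ^= data[f][i];

  printf("rebuilt file 2: %s (%s)\n", (char *)rebuilt,
         memcmp(rebuilt, lost, LEN) == 0 ? "matches" : "MISMATCH");
  return 0;
}
```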
---

**Issue #847: testGravityDerivatives fails**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/847 · Matthieu Schaller · updated 2023-01-31T02:33:35Z

As reported by the test suite:
```
FAIL: testGravityDerivatives
============================
[00000.0] main: Seed = 1675088708
[00000.0] main: Testing M2L gravity for r=(4.906407e+01 3.594954e+01 1.339894e+01)
[00000.0] main: Mesh scale r_s=9.727518e+01 periodic=0
[00000.0] testGravityDerivatives.c:test():919: Relative difference (3.873397e-04) for 'M2L D_020' (swift=-2.213255e-09) and (exact=-2.214113e-09) exceeds tolerance (1.000000e-04)
```

Assignee: Matthieu Schaller

---

**Issue #834: Discussion: Change chemistry API to enable advection of metallicity (gizmo-mfv, shadowswift)**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/834 · Yolan Uyttenhove · updated 2024-02-29T17:19:52Z

For moving mesh hydro and any hydro scheme with non-zero mass fluxes between particles, we should probably advect the metals used for the chemistry. This would be similar to what @mivkov is already doing in GEARRT (see [here](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/rt/GEAR/rt_additions.h) and [here](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/blob/master/src/rt/GEAR/rt.h#L533-615)).
So following this example, we could add a new function to the chemistry API, e.g. `chemistry_update_mass_fluxes(...)`, which could be called by the relevant hydro schemes during the flux exchange calculations, with those fluxes then applied in the `kick`. This would also require extending the `chemistry_part_data` struct to store those fluxes. A sketch of such an API is given below.
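A minimal sketch under stated assumptions: only the `chemistry_part_data` name and the called-at-flux-exchange/applied-at-kick split come from the discussion above; the fields, the element count, and the upwind donor rule are hypothetical simplifications:

```c
#define CHEMISTRY_ELEMENT_COUNT 4 /* placeholder */

/* Hypothetical extension of the per-particle chemistry data: one
 * accumulated metal-mass flux per tracked element. */
struct chemistry_part_data {
  float metal_mass[CHEMISTRY_ELEMENT_COUNT];
  float metal_mass_flux[CHEMISTRY_ELEMENT_COUNT];
};

/* Called from the hydro flux-exchange loop: mass_flux is the mass moved
 * from pj to pi over the common time step (upwinding details ignored). */
static inline void chemistry_update_mass_fluxes(struct chemistry_part_data *ci,
                                                struct chemistry_part_data *cj,
                                                float mass_flux, float mass_j) {
  for (int e = 0; e < CHEMISTRY_ELEMENT_COUNT; e++) {
    const float dm = mass_flux * cj->metal_mass[e] / mass_j; /* donor's mix */
    ci->metal_mass_flux[e] += dm;
    cj->metal_mass_flux[e] -= dm;
  }
}

/* Called from the kick: fold the accumulated fluxes into the state. */
static inline void chemistry_apply_mass_fluxes(struct chemistry_part_data *c) {
  for (int e = 0; e < CHEMISTRY_ELEMENT_COUNT; e++) {
    c->metal_mass[e] += c->metal_mass_flux[e];
    c->metal_mass_flux[e] = 0.f;
  }
}
```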
# Current status:
The approach discussed here has been implemented for the `EAGLE` chemistry scheme (see !1825) and the `GEAR` and `AGORA` schemes (!1853). The necessary additions (no-ops) have been made to make the `none` and `QLA` chemistry schemes work with all hydro schemes. Any hydro scheme that performs mass fluxes (in master currently only `gizmo-mfv`) will currently not compile with the `GEAR-diffusion` chemistry scheme.
# To-do/questions:
- [ ] Do we need to advect other quantities tracked by the chemistry schemes as well? (e.g. `mass_from_SNIa`, `metal_mass_fraction_from_SNIa`... in `EAGLE`)
- [ ] Adapt `GEAR-diffusion`
- [ ] How to treat diffusion (only relevant for the `GEAR-diffusion` chemistry scheme)?

Assignee: Yolan Uyttenhove

---

**Issue #833: Gizmo: zeroed density remains zero on a particle despite non-zero mass**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/833 · Mladen Ivkovic · updated 2023-09-29T08:41:26Z

While masses get exchanged and updated during the flux exchanges and kick2, the particle densities don't. This leads to the scenario where a particle keeps its density at zero while interacting with other particles, whereas I think it shouldn't.
An option would be to re-compute the updated density in kick2, or pretty much equivalently kick1.
Alternatively, we could re-write the `hydro_get_physical_density` function (which, oddly enough, currently lives in `hydro_setters.h` for some reason) to always return `p->conserved.mass / p->geometry.volume` rather than `p->rho`. A sketch of that accessor follows below.
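A sketch of that alternative accessor; the stub types stand in for the real GIZMO structures, and the `a3_inv` comoving-to-physical factor is an assumption about where the conversion would live:

```c
/* Minimal stand-ins for the SWIFT types referenced (the real definitions
 * live in the GIZMO hydro headers and cosmology.h). */
struct part {
  float rho;
  struct { float mass; } conserved;
  struct { float volume; } geometry;
};
struct cosmology { float a3_inv; };

/* Sketch: derive the density from the conserved mass and current volume
 * instead of returning the cached p->rho. */
static inline float hydro_get_physical_density(const struct part *p,
                                               const struct cosmology *cosmo) {
  if (p->geometry.volume <= 0.f) return 0.f; /* volume not yet computed */
  return cosmo->a3_inv * p->conserved.mass / p->geometry.volume;
}
```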
A concrete example can be found when running the low-z EAGLE_12 example with hydro and gravity only (no feedback, cooling, etc.), i.e.
```
--hydro --threads=16 --stars --self-gravity
```
Particle `6370758347517` gets corrected for an unphysical density in step ~120, and doesn't get reset to proper values until step 196.

---

**Issue #832: Incorrect flux.dt after limiting (Gizmo, moving mesh)**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/832 · Yolan Uyttenhove · updated 2023-09-07T12:13:48Z

As I already discussed briefly [here](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/merge_requests/1656#note_48072), there is an issue with the `flux.dt` of particles that have been limited.
Consider 3 neighbouring particles on timebi...As I already discussed briefly [here](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/merge_requests/1656#note_48072), there is an issue with the `flux.dt` of particles that have been limited.
Consider 3 neighbouring particles on timebins corresponding to the following timesteps:
<details><summary>Before limiting</summary>
![orig](/uploads/0d839524ca1cbfc5efc70f2a26ab6d4f/orig.png)
</details>
I.e. Part 0 and Part 1 have last exchanged fluxes at `ti=16` and Part 0 and 2 have last exchanged fluxes at `ti=0`.
Now suppose that at `ti=20` Part 0 is timestep limited to a new timebin corresponding to a timestep of 4:
<details><summary>After limiting</summary>
![limit](/uploads/e0c560de07fd16b9b6cd368261a64a2d/limit.png)
</details>
Because the `flux.dt` is set during the kick, it will now also correspond to a timestep of 4. This is however only correct for calculating flux exchanges with neighbours which are also on timesteps of 4 or smaller...
For the flux exchange with Part 1, we actually need a `flux.dt` of 8:
<details><summary>Flux exchange with Part 1</summary>
![fluxdt1](/uploads/d557286f19b5823eb6818a5f931d9279/fluxdt1.png)
</details>
And for the flux exchange with Part 2 we need a `flux.dt` of 24 (which does not even correspond to any timebin):
<details><summary>Flux exchange with Part 2</summary>
![fluxdt2](/uploads/0d2b8b62c4953866d2cc2a999ddca4fa/fluxdt2.png)
</details>
The "fix" I implemented for Moving mesh only solves the second case and is probably incorrect for cosmological time integration. This also did not show any noticeably difference in e.g. the Sedov blastwave test, but probably, more extreme examples of this issue can probably be constructed with star particles or sinks instead of hydro only.
To really fix this, I think we need to time-integrate the fluxes from the maximal end of the __previous__ timestep of both particles to the minimal end of the __current__ timestep of both particles, and do that in a way that is correct with cosmological time integration.

Assignee: Yolan Uyttenhove

---

**Issue #830: Implement vectorized quartic spline**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/830 · Matthieu Schaller · updated 2022-12-22T16:26:57Z

Assignee: Matthieu Schaller

---

**Issue #829: (GEAR) RT ToDo's**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/829 · Mladen Ivkovic · updated 2024-01-17T12:11:53Z
Opened this issue to keep track of stuff that needs to be done for RT.
## RT in general
* [ ] cosmology
* [x] flexible sub-cycling ( !1660)
* [ ] Idea: add a time step limiter for radiation injection to dynamically limit _star_ time step sizes? Could be a limiter task, could be as simple as gathering neighbor time bins during feedback loop like we do for gas particles.
* [ ] At the end of sub-cycles, check whether hydro particles need to be "woken up" because their time step sizes decreased due to their internal energy being increased through photoheating. If necessary, abort further sub-cycles and perform a regular step.
* [x] Write sub-cycle data to a log file akin to timesteps.txt (!1841)
## GEAR-RT
* [ ] redshifting photons
* [ ] recombination radiation
* [ ] Doppler shifts?
* [ ] Couple to SPH?
* [ ] Deal with sharp discontinuities in particle configurations. See `examples/RadiativeTransferTests/AdvectionDifferentTimeStepSizes_1D`. (This is a problem for MFV hydrodynamics in Eulerian form as well.) Idea to try: During gradient loop, matrix B/E are already present. Use it to compute Aij. Then use Aij to solve linear advection of some field with value unity and with constant velocity over all space. That will reveal flux imbalances.
* [x] Theory (see branch `GEARRT_theory`) ([!1744](https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/merge_requests/1744))
* [ ] Cosmology theory
* [x] Write RT Riemann solver flavour to screen on startup and to snapshots' metadata. ( !1670)
* [ ] Add "option" for "unphysical rescue" of radiation/radiation fluxes when bad cases are detected
* [x] Check whether we can check for correct grackle version on/before compile time. Result: We can't.
* [ ] Check whether storing pressure tensors is more efficient than re-computing them every iact
* [ ] In hydro_iact: use the already existing min_dt for rt_update_mass_fluxes (also maybe rename rt_update_mass_fluxes)

Assignee: Mladen Ivkovic

---

**Issue #826: High temperature crash in idealized setup with AGN jets and variable masses**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/826 · Filip Husko · updated 2023-03-23T17:20:21Z

I am working on a setup with idealized massive (10^14 - 10^15 Msol) galaxy clusters, with the aim of comparing the cold gas that forms from a cooling flow in the very centre with observations (content, morphology, kinematics, and also X-rays and radio related to feedback). The hybrid AGN jet/thermal model is used in these runs.
The focus of these simulations is the very centre (probably no more than 50 kpc), and we don't really care what the jets do on scales > 100 kpc, e.g. how they deposit their energy, since that doesn't affect the cooling flow in the centre. For this reason I am using variable resolution beyond a small radius, 50-100 kpc, and the gas masses smoothly start dropping with radius. This provides a huge boost in terms of the central resolution (two orders of magnitude or so).
This is generally fine, but becomes a problem when we have low-mass jet particles reaching distances of e.g. 500 kpc, where the ambient particles may be 10^3 times more massive. I am aware that lots of SPH calculations may be unreliable in this regime, but as I said, for this purpose it probably doesn't matter. The actual issue is that very rarely (but at some point during the run) something goes wrong and a crash occurs with the error message at the end. Note that this is from runs in COLIBRE with CHIMES cooling, but I think this is a more general problem, and I used to rarely get similar problems with the EAGLE model when I used variable resolution starting at 500 kpc.
I am aware that the code wasn't meant to handle situations like this, and that this probably isn't a 'bug' per se, but it would be really nice if we could try resolving this (I don't have a good idea where to start - I have tried doing this with debug and debugging checks with no good results). It is possible that this might also help with eventual zoom-in simulations (if the jets reach the low-resolution region).
```
**************
ChimesGasVars:
**************
element_abundances[0] = 8.185575e-02
element_abundances[1] = 0.000000e+00
element_abundances[2] = 0.000000e+00
element_abundances[3] = 0.000000e+00
element_abundances[4] = 0.000000e+00
element_abundances[5] = 0.000000e+00
element_abundances[6] = 0.000000e+00
element_abundances[7] = 0.000000e+00
element_abundances[8] = 0.000000e+00
element_abundances[9] = 0.000000e+00
nH_tot = 9.166757e-04
temperature = 1.040293e+12
TempFloor = 1.000000e+01
divVel = 0.000000e+00
doppler_broad = 8.485281e+00
isotropic_photon_density[0] = 4.300861e-07
G0_parameter[0] = 8.264607e-07
H2_dissocJ[0] = 3.666669e-11
isotropic_photon_density[1] = 2.927030e-11
G0_parameter[1] = 1.075972e-07
H2_dissocJ[1] = 2.047000e-11
cr_rate = 1.530088e-21
metallicity = 4.396624e-01
dust_ratio = 4.653028e-23
cell_size = 1.683080e+18
hydro_timestep = 4.414097e+10
ForceEqOn = 0
ThermEvolOn = 1
InitIonState = 0
constant_heating_rate = 0.000000e+00
abundances[0] = 1.159628e+00
abundances[1] = 2.455163e-03
abundances[2] = 9.975442e-01
abundances[3] = 2.591237e-09
abundances[4] = 4.266575e-06
abundances[5] = 1.618964e-03
abundances[6] = 8.023231e-02
abundances[7] = 0.000000e+00
abundances[8] = 0.000000e+00
abundances[9] = 0.000000e+00
++++++++++++++
[03495.8] cooling/CHIMES_split/cooling.c:cooling_cool_part():1442: CHIMES ERROR: the temperature that is being passed to CHIMES is 1.040293e+12. This is too high!
```

---

**Issue #825: Change the Santa-Barbara plotting script to not use py-sphviewer**
https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/825 · Matthieu Schaller · updated 2023-01-06T12:35:50Z

`swiftsimio` is a better choice, more consistent with the rest of the repository.