
Add functions to permanently remove particles from a simulation

Merged Matthieu Schaller requested to merge removing_particles into master

Add functions to remove particles. When a particle is removed, it is first flagged as inhibited and then ignored in all interactions. Inhibited particles are eliminated from memory at the next rebuild.
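
A minimal sketch of that life-cycle (the struct layout and helpers below are illustrative assumptions; time_bin_inhibited and gpart_is_inhibited() do appear in the debugging checks quoted later in this thread):

    #define time_bin_inhibited 127 /* assumed sentinel value */

    struct gpart {
      double x[3];
      float mass;
      signed char time_bin;
    };

    /* Step 1: removal only flags the particle; it stays in memory for now. */
    static void gpart_remove(struct gpart *gp) {
      gp->time_bin = time_bin_inhibited;
      gp->mass = 0.f; /* inert as a gravity source */
    }

    /* Step 2: every interaction loop skips flagged particles. */
    static int gpart_is_inhibited_sketch(const struct gpart *gp) {
      return gp->time_bin == time_bin_inhibited;
    }

    /* Step 3: the next space rebuild compacts the arrays, freeing the slots. */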

Also fixes an error reported by the undefined behaviour sanitizer and fixes #470 (closed).

Edited by Matthieu Schaller

Merge request reports

Merged (May 28, 2025 2:56pm UTC)


Activity

  • added 1 commit

    • c3480fcb - Added a few more checks that no inhibited particles are going through the calculations.

    Compare with previous version

  • @jborrow could you check that the hydro test-suite still works on this with the debugging checks on and the minimal hydro scheme? Thanks!

    Edited by Matthieu Schaller
  • This may take some time as I think we can no longer access the internet from the compute nodes. I'll have to re-write the automated test suite if this is a permanent change.

  • No rush.

  • @pdraper This is the merge request for the particle re-ordering. The main changes are in space_rebuild and engine_redistribute.

Note that no code currently calls the particle-removing functions. But that can be done by un-commenting the temporary block of code in runner_end_force().

    There is one corner case that is not yet well-handled when you remove a particle and then dump a snapshot before rebuilding. A fix for that will come along with the functions adding particles.

  • @jborrow did you say all the tests passed?

    @jwillis you can see what I mean here with the inhibited particles that have to be left out of caches.

  • Yes, as far as I can see, this introduced no regressions in the hydro test suite (compared to master). There are a few minor issues with some of the schemes but they are not related to this MR, and they haven't been introduced recently.

  • Just to confirm, that was run with minimal?

  • All schemes currently present in master (excepting shadowswift).

  • Okay, I've had a look and it seems that I just need to add a few new masks in some places.

  • Great. That's what I thought as well.
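
    For concreteness, a hedged sketch of such a mask (illustrative only, not SWIFT's actual vector-cache code; it re-uses the gpart sketch from the description):

        /* When gathering particles into an interaction cache, give the
         * inhibited ones zero mass so they contribute nothing to the
         * vectorised interaction. */
        static void cache_fill_sketch(const struct gpart *parts, int count,
                                      float *cache_m) {
          for (int i = 0; i < count; i++) {
            const int inhibited = (parts[i].time_bin == time_bin_inhibited);
            cache_m[i] = inhibited ? 0.f : parts[i].mass;
          }
        }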

  • Hmm, I thought I'd give it a spin and un-commented that test, which gave:

    ==22336==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x2ab5d3d620dc at pc 0x00000062a4f8 bp 0x2ab5e80716a0 sp 0x2ab5e8071690
    READ of size 1 at 0x2ab5d3d620dc thread T46
        #0 0x62a4f7 in gpart_is_inhibited /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/active.h:302
        #1 0x62a4f7 in runner_dopair_grav_pp_truncated /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/runner_doiact_grav.h:380
        #2 0x62a4f7 in runner_dopair_grav_pp /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/runner_doiact_grav.h:833
        #3 0x62d48a in runner_dopair_recursive_grav /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/runner_doiact_grav.h:1597
        #4 0x62d3e6 in runner_dopair_recursive_grav /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/runner_doiact_grav.h:1619
        #5 0x7ffc9d in runner_main /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/runner.c:2477
        #6 0x2ab5bc7616b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
        #7 0x2ab5bedba41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
    
    0x2ab5d3d620dc is located 60 bytes to the right of 159463584-byte region [0x2ab5ca54e800,0x2ab5d3d620a0)
    allocated by thread T0 here:
        #0 0x2ab5bb886076 in __interceptor_posix_memalign (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99076)
        #1 0x579ec9 in read_ic_single /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/single_io.c:487
        #2 0x4071fd in main /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/examples/main.c:763
        #3 0x2ab5becd382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    
    Thread T46 created by T0 here:
        #0 0x2ab5bb823253 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x36253)
        #1 0x51c8e8 in engine_config /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/engine.c:6966
        #2 0x407bb9 in main /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/examples/main.c:926
        #3 0x2ab5becd382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    
    SUMMARY: AddressSanitizer: heap-buffer-overflow /loc/pwda/pdraper/scratch/swift-tests/swiftsim-spare/src/active.h:302 gpart_is_inhibited

    That is running the non-MPI EAGLE_6 test with:

    ./configure --with-metis --enable-debugging-checks --enable-debug --enable-sanitizer --enable-undefined-sanitizer --disable-vec

    So we are dying in the extra check:

        /* Check that the particle we interact with was not inhibited */
        if (gpart_is_inhibited(&gparts_j[pjd], e) && mass_j != 0.f)
          error("Inhibited particle used as gravity source.");
  • added 1 commit

    • 865284d3 - Correctly define 1 as an unsigned long long when bit shifting by 63.

    Compare with previous version
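
    For context, the class of undefined behaviour fixed by 865284d3 looks like this (illustrative, not the actual SWIFT line):

        unsigned long long bad  = 1 << 63;    /* undefined: the literal 1 is a 32-bit int */
        unsigned long long good = 1ULL << 63; /* well-defined 64-bit shift */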

  • added 7 commits

    • b69a6ce9 - Only check the inhibited status of gparts if they are not dummy cache-adding particles.
    • da9bf1d6 - Added star formation task.
    • 3219828e - Collect the number of inhibited particles at the end of a time-step in the same way as the updates.
    • cca41277 - Make the gpart friend of inhibited particles a stand-alone particle.
    • 4413f001 - Move the temporary star formation law to the dedicated task.
    • 10f784f7 - Add a task action for the star formation task. Add the new engine policy's name string.
    • 89c0e00e - Set the correct task action for the long-range gravity task.

    Compare with previous version

  • added 1 commit

    • 1a60a352 - Make the temporary star formation law convert part into gparts.

    Compare with previous version

  • Thanks Peter. That was a more ambitious test than intended. The debugging check was indeed faulty. I have also now pushed a lot more code to get the examples to run.

    This can now be run by un-commenting the star formation policy in main().

  • The sanitizer also brought up an unrelated error so that's also fixed here.

  • added 2 commits

    • 63504ce1 - Correct use of enum types for the Intel compiler.
    • 28a01f67 - Changes to the single i/o mechanism to correctly collect only the non-inhibited DM gparts.

    Compare with previous version

  • added 2 commits

    Compare with previous version

  • added 11 commits

    Compare with previous version

  • Matthieu Schaller changed the description

  • Matthieu Schaller changed title from Add functions to permanetly remove particles from a simulation to Add functions to permanently remove particles from a simulation

  • Thanks, that has run for a number of hours this time (with sanitizers and debugging checks enabled), so the non-MPI version looks good.

  • Thanks for confirming this!

  • Any foreseen dangers in me merging this?

    for (int k = 0; k < gcount; k++) {
      const double m = gparts[k].mass;

    #ifdef SWIFT_DEBUG_CHECKS
      if (gparts[k].time_bin == time_bin_inhibited)
        error("Inhibited particle in P2M. Should have been removed earlier.");
  • Some testing using MPI? What about the snapshot timing issue: could snapshots only be done immediately after a rebuild, for instance, or would the time then be too far from the required one?

  • added 1 commit

    Compare with previous version

  • Yes, that's more sensible. I'll fix the snapshot issue by fishing out the particles we want.

    MPI ought to work already but admittedly there was limited testing of the repartitioning/redistributing. I was going to tackle that as a whole when including the particle spawning feature.

  • OK, here is an MPI run error:

    [0003] [00728.4] part.c:part_verify_links():167: Linked particles are not at the same position!
    gp->x=[1.457784e-01 3.753363e+00 2.581094e+00] p->x=[3.347836e-01 3.875554e+00 2.642131e+00] diff=[-1.890052e-01 -1.221907e-01 -6.103649e-02]

    That was running EAGLE_6 with:

    mpirun -np 4 ../swift_mpi -c -s -G -S -t 2 eagle_6.yml

    configure:

    ./configure --with-metis --enable-debugging-checks --enable-debug --enable-sanitizer --enable-undefined-sanitizer --disable-vec

    Just after step 128 when some interesting things had all kicked off.

    
         127   1.278127e-02    0.9104352    0.0983759   1.637011e-07   43   43            1            1            0                19.507      0
    [0003] [00688.1] runner_do_star_formation: Removing particle id=6173617147839 rho=1.500975e+07
         128   1.278143e-02    0.9104457    0.0983631   1.637023e-07   43   50       800621      1661079        29847            108372.891      0
    [0000] [00727.8] engine_drift_all: Drifting all to a=9.104563e-01
    [0000] [00727.9] engine_repartition: repartitioning space
    [0003] [00728.4] part.c:part_verify_links():167: Linked particles are not at the same position!
    gp->x=[1.457784e-01 3.753363e+00 2.581094e+00] p->x=[3.347836e-01 3.875554e+00 2.642131e+00] diff=[-1.890052e-01 -1.221907e-01 -6.103649e-02]
    

    I'll start that again to see if it reproduces.
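
    For reference, the check that fired compares a part's position against that of its linked gpart; roughly (reconstructed from the error message above, not the verbatim source):

        /* part.c:part_verify_links(): a part and its gpart friend must sit
         * at the same position. */
        if (p->x[0] != gp->x[0] || p->x[1] != gp->x[1] || p->x[2] != gp->x[2])
          error("Linked particles are not at the same position!");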

  • Ok, thanks. Looks like I have my weekend plans sorted now. :)

    That's likely the same thing as the snapshot re-ordering problem mentioned above, but leading to a crash instead of silently dumping incorrect particles.

  • FYI: it did repeat!

  • added 32 commits

    • 6725ee2d...f241e5ea - 30 commits from branch master
    • c22e4447 - Updated doxygen documentation of the MPI collection routines.
    • 768937bf - Merge branch 'master' into removing_particles

    Compare with previous version

  • added 1 commit

    • 245e0644 - Correct wrong address access in the macro of engine_redistribute_savelink_mapper_part()

    Compare with previous version

  • Before 245e0644 this piece of code triggered a seg-fault. I am surprised it never appeared before. The fix seems to do the right thing. What do you think?

    Edited by Matthieu Schaller
  • added 3 commits

    • f7d39bf9 - Remove inhibited particles at the start of the redistribute process.
    • bd4f4927 - When rebuilding just after a redistribute, skip the search for foreign particles
    • 2bc335b9 - Removed the debugging forced repart every 5 steps.

    Compare with previous version

  • These last three commits fix the issue. They also bring in an improvement: when rebuilding just after a redistribute, some sections of the space_rebuild() function can be safely skipped.

  • Thanks, that bug of mine certainly looks fixed. I take it that fixed the original problem, which then revealed an issue with redistribution?

  • Yes, it did. I am actually surprised we never hit it before.

    Anyway, I still seem to have a bug, as the sanitizer crashes somewhere in the depths of parallel HDF5. Not sure yet whether it is a false positive or a genuine issue on my side.

  • OK, I see, so that is a no. With an effective offset of zero, the code just checked the same links over and over again. So although it was a bad check, it should have had no side-effects.

  • Seems we wrote over each other. I would be surprised if that had a side-effect: it was reading valid memory, just the wrong part of it, and it modified nothing.
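
    To make the "effective offset of zero" concrete, a hypothetical illustration of the bug class (not the actual engine_redistribute_savelink_mapper_part() code; check_link() and the struct part usage are made up):

        /* A threadpool mapper that indexes from the array base instead of
         * from the chunk it was handed: the effective offset is zero, so
         * every chunk re-reads elements [0, num_elements) -- the wrong
         * part of valid memory, and nothing is modified. */
        void savelink_mapper_sketch(void *map_data, int num_elements,
                                    void *extra_data) {
          struct part *base = (struct part *)extra_data;
          for (int k = 0; k < num_elements; k++)
            check_link(&base[k]); /* bug: ignores map_data */

          /* Fix: struct part *chunk = (struct part *)map_data;
           *      check_link(&chunk[k]);                        */
        }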

  • Let's hope the HDF5 issue is on our side.

  • added 1 commit

    • f4daa0f3 - Added debugging function to crash if trying to write an inhibited particle to a snapshot.

    Compare with previous version

  • added 1 commit

    • 33c9cc37 - Collect the number of inhibited particles in each space in engine_collect_end_of_step()

    Compare with previous version

  • added 1 commit

    • 05cacf92 - When dumping a snapshot, fish out the particles that have not been inhibited.

    Compare with previous version

  • added 2 commits

    • 0d30b977 - Also collect the particles to be written in the parallel versions.
    • 64058d51 - Code formatting

    Compare with previous version

  • added 1 commit

    • dd11e16d - Remove the debugging checks for the inhibited particles i/o.

    Compare with previous version

  • I fixed the i/o routines to fish out only the particles we actually want to dump (see the sketch below). This now works for MPI i/o as well.

    The sanitizer crash is also in master... So it's not the branch updates.

    Running

        mpirun -np 4 ../swift_mpi -c -s -G -S -t 4 eagle_12.yml

    with

        ./configure --with-metis --enable-debugging-checks --enable-debug --enable-sanitizer --enable-undefined-sanitizer --disable-vec

    and the parallel_hdf5/1.10.3 module crashes. If you use parallel_hdf5/1.8.20 you are safe. Probably worth starting a separate discussion thread on this. I don't know yet whether there were some compilation/optimization differences in the way these two modules were generated or whether it is a genuine HDF5 bug.
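
    Returning to the i/o fix above, a minimal sketch of the filtering step (buffer and field names are made up for the example):

        /* Copy only the non-inhibited particles into a temporary buffer and
         * hand that buffer to the HDF5 writer. */
        size_t nr_written = 0;
        struct gpart *gparts_written = malloc(Ngpart * sizeof(struct gpart));
        for (size_t k = 0; k < Ngpart; k++)
          if (gparts[k].time_bin != time_bin_inhibited)
            gparts_written[nr_written++] = gparts[k];
        /* ... write gparts_written[0 .. nr_written) to the snapshot ... */
        free(gparts_written);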

  • I have deferred the discussion of the hdf5 problem to #480 (closed).

  • added 1 commit

    • 4427cce3 - Do not use collective meta-data writes in HDF5 1.10.x. There seems to be an issue with them.

    Compare with previous version
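
    The workaround presumably amounts to not enabling the 1.10-only collective metadata writes; a sketch using the public HDF5 API (the exact placement in SWIFT's file-access property list setup is an assumption):

        #include <mpi.h>
        #include <hdf5.h>

        hid_t plist_id = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL);
        #if H5_VERSION_GE(1, 10, 0)
        /* H5Pset_coll_metadata_write(plist_id, 1);  <- left disabled: it
         * appears buggy in some 1.10.x builds (see #480). */
        #endif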
