Sink: bug fixes

Merged Darwin requested to merge gear_sink_imf_sampling_merged into master

All known bugs that lead to crashes are solved.

Activity

  • Darwin added GEAR bug labels

  • Darwin requested review from @yvesrevaz

  • Darwin added 1 commit

  • Darwin added 1 commit

  • Darwin added 1 commit

  • Darwin added 1 commit

  • Darwin added 1 commit

  • Note that this branch has the gravity cache growth problem. As @pdraper mentioned in #873 (closed), this growth problem does not occur in master, so we should understand what triggers it here.

  • To be clear, it's not a problem specific to this branch, but so far only this branch has triggered it. I think we will need a bit of guidance to get this problem solved. I will check with @Roduit to see how much time he can dedicate to the problem, as we are approaching the end of the semester. @matthieu I pushed Darwin to submit a merge request to avoid divergence from master in the future, something that is always painful to deal with. But I guess the branch will not be merged until this bug is fixed, right?

  • We need to see.

    Master never reallocates the caches. Here, some added code reallocates the caches, and then the code crashes. What I had not realised is that there are changes to the gravity code itself here. The reallocation feature is new.

    So either the reallocation is the problem or something else is "broken" and the next memory transaction fails.

    One option would be to take the master code and change it to free/reallocate the caches after every gravity call. That should be an extreme version of the memory stress we get with this branch. If normal simulations run in such a configuration then it tells us that the problem is with some other memory operations related to the sinks. Likely their creation or the addition of stars. That latter part could also be tested with the GEAR code without sinks since this one also spawns a lot of stars (but no sinks).

  • Darwin added 5 commits

  • Darwin added 3 commits

  • @matthieu thanks for the clarification. I was not aware of these additional lines in gravity_cache.h that are only present in the sink branch. So, just to be 100% sure, the first test would be to force the cache allocation at every time step, i.e. moving from

      /* Do we need to grow the cache? */
      if (c->count < gcount_padded) gravity_cache_init(c, gcount_padded + VEC_SIZE);

    to

      /* Do we need to grow the cache? */
      if (1) gravity_cache_init(c, gcount_padded + VEC_SIZE);
  • Something like that and then run without sinks.

  • Author Developer

    I've launched the first test @matthieu suggested (without sinks) on the usual homogeneous box test we've been using until now. I have also launched the second test with the GEAR star formation scheme. I will let them run for 10 hours. With the sinks, the problem happened far earlier than that.

  • Author Developer

    So, the test that frees/reallocates the caches after every gravity call crashes quickly (at the 20th time step). I ran it again with the sanitizer to get more information (since the error was just "corrupted double-linked list" with no backtrace). Here's what I get (after some path cleaning; if you need more, don't hesitate to tell me):

    ==667452==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61900438c740 at pc 0x7ff1ee485dae bp 0x7fef87a359a0 sp 0x7fef87a35150
    WRITE of size 1408 at 0x61900438c740 thread T168
        #0 0x7ff1ee485dad in __interceptor_memset /.../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
        #1 0x7eb60b in gravity_cache_zero_output /.../swiftsim_grav_master_test/src/gravity_cache.h:173
        #2 0x80be72 in runner_dopair_grav_pp_truncated_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:394
        #3 0x825ac6 in runner_dopair_grav_pp_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1477
        #4 0x825749 in runner_dopair_grav_pp_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1462
        #5 0x82b82e in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2300
        #6 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
        #7 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
        #8 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
        #9 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
        #10 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
        #11 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
        #12 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
        #13 0x5fb2cf in runner_main /.../swiftsim_grav_master_test/src/runner_main.c:258
        #14 0x7ff1ec082179 in start_thread (/lib64/libpthread.so.0+0x8179)
        #15 0x7ff1eba2fdf2 in __GI___clone (/lib64/libc.so.6+0xfcdf2)
    
    0x61900438c740 is located 0 bytes to the right of 960-byte region [0x61900438c380,0x61900438c740)
    allocated by thread T168 here:
        #0 0x7ff1ee4fbeec in __interceptor_posix_memalign /.../libsanitizer/asan/asan_malloc_linux.cpp:226
        #1 0x7eb1b5 in swift_memalign /.../swiftsim_grav_master_test/src/memuse.h:78
        #2 0x7eb1b5 in gravity_cache_init /.../swiftsim_grav_master_test/src/gravity_cache.h:132
        #3 0x7eb717 in gravity_cache_populate /.../swiftsim_grav_master_test/src/gravity_cache.h:217
        #4 0x823fb9 in runner_dopair_grav_pp /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1277
        #5 0x82b9b2 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2315
        #6 0x82c0b5 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2417
        #7 0x82bfe6 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2412
        #8 0x82bfe6 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2412
        #9 0x5faae4 in runner_main /.../swiftsim_grav_master_test/src/runner_main.c:208
        #10 0x7ff1ec082179 in start_thread (/lib64/libpthread.so.0+0x8179)
    
    Thread T168 created by T0 here:
        #0 0x7ff1ee4a3246 in __interceptor_pthread_create /.../libsanitizer/asan/asan_interceptors.cpp:216
        #1 0x4e5a16 in engine_config /home/roduit/scratch/HomogeneousBox.l8cooling/swiftsim_grav_master_test/src/engine_config.c:926
        #2 0x415fdf in main /.../swiftsim_grav_master_test/swift.c:1546
        #3 0x7ff1eb956492 in __libc_start_main (/lib64/libc.so.6+0x23492)
    
    SUMMARY: AddressSanitizer: heap-buffer-overflow /.../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
  • Do we have a reproducible example? How to configure, compile, and what to run so this error is caught?

    We're running a debugging workshop currently in Durham, this would be a good issue for me to look into this afternoon with various sanitizers. The workshop also gives me a good excuse to spend the afternoon on this.

    Edited by Mladen Ivkovic
    • Author Developer
      Resolved by Mladen Ivkovic

      Here's the link to the files for the reproducible example. The link should be accessible to all. The readme contains information on configuration and compilation. The branch is swiftsim_grav_master_test.

      Edit: The true branch name is grav_cache_test.

      Edited by Darwin