Sink: bug fixing
All known bugs that lead to crashes have been fixed.
Merge request reports
Activity
requested review from @yvesrevaz
Note that this has the problem with the growth of the gravity caches. As @pdraper mentioned in #873 (closed), this growth problem is not present in master, so we should understand what triggers it here.
To be clear, it's not a problem specific to this branch, but so far it's only this branch that has triggered the problem. I think we will need a bit of guidance to get it solved. I will check with @Roduit to see how much time he can dedicate to the problem, as we are approaching the end of the semester. @matthieu I pushed Darwin to submit a merge request to avoid divergences with master in the future; those are always painful to deal with. But I guess the branch will never be merged before this bug is fixed, right?
We need to see.
Master never reallocates the caches. Here, some code addition reallocates the caches and then the code crashes. What I had not realised is that there are changes to the gravity code itself here: the reallocation feature is new.
So either the reallocation itself is the problem, or something else is "broken" and the next memory transaction fails.
One option would be to take the master code and change it to free/reallocate the caches after every gravity call. That should be an extreme version of the memory stress we get with this branch. If normal simulations run in such a configuration, then it tells us that the problem is with some other memory operation related to the sinks, likely their creation or the addition of stars. The latter part could also be tested with the GEAR code without sinks, since this one also spawns a lot of stars (but no sinks).
added 3 commits
- b8b5815d...24674a02 - 2 commits from branch `master`
- 2ece5fd6 - Merge with master
@matthieu thanks for the clarification. I was not aware of these additional lines in `gravity_cache.h` that are only present in the sink branch. So, just to be 100% sure, the first test would be to force the cache allocation at every timestep, i.e. moving from

```c
/* Do we need to grow the cache? */
if (c->count < gcount_padded) gravity_cache_init(c, gcount_padded + VEC_SIZE);
```

to

```c
/* Do we need to grow the cache? */
if (1) gravity_cache_init(c, gcount_padded + VEC_SIZE);
```
I've launched the first test @matthieu suggested (without sinks) on the usual homogeneous box we've been using until now. I have also launched a second test with the GEAR star-formation scheme. I will let them run for 10 hours; with the sinks, the problem appeared far earlier than that.
So, the test that frees/reallocates the caches after every gravity call crashes quickly (at the 20th timestep). I ran it again with the sanitizer to get more information (since the error was just `corrupted double-linked list` without a backtrace). Here's what I get (after some path cleaning; if you need more, don't hesitate to ask):

```
==667452==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61900438c740 at pc 0x7ff1ee485dae bp 0x7fef87a359a0 sp 0x7fef87a35150
WRITE of size 1408 at 0x61900438c740 thread T168
    #0 0x7ff1ee485dad in __interceptor_memset /.../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
    #1 0x7eb60b in gravity_cache_zero_output /.../swiftsim_grav_master_test/src/gravity_cache.h:173
    #2 0x80be72 in runner_dopair_grav_pp_truncated_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:394
    #3 0x825ac6 in runner_dopair_grav_pp_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1477
    #4 0x825749 in runner_dopair_grav_pp_no_cache /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1462
    #5 0x82b82e in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2300
    #6 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
    #7 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
    #8 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
    #9 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
    #10 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
    #11 0x82bd07 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2356
    #12 0x82bb42 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2334
    #13 0x5fb2cf in runner_main /.../swiftsim_grav_master_test/src/runner_main.c:258
    #14 0x7ff1ec082179 in start_thread (/lib64/libpthread.so.0+0x8179)
    #15 0x7ff1eba2fdf2 in __GI___clone (/lib64/libc.so.6+0xfcdf2)

0x61900438c740 is located 0 bytes to the right of 960-byte region [0x61900438c380,0x61900438c740)
allocated by thread T168 here:
    #0 0x7ff1ee4fbeec in __interceptor_posix_memalign /.../libsanitizer/asan/asan_malloc_linux.cpp:226
    #1 0x7eb1b5 in swift_memalign /.../swiftsim_grav_master_test/src/memuse.h:78
    #2 0x7eb1b5 in gravity_cache_init /.../swiftsim_grav_master_test/src/gravity_cache.h:132
    #3 0x7eb717 in gravity_cache_populate /.../swiftsim_grav_master_test/src/gravity_cache.h:217
    #4 0x823fb9 in runner_dopair_grav_pp /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:1277
    #5 0x82b9b2 in runner_dopair_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2315
    #6 0x82c0b5 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2417
    #7 0x82bfe6 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2412
    #8 0x82bfe6 in runner_doself_recursive_grav /.../swiftsim_grav_master_test/src/runner_doiact_grav.c:2412
    #9 0x5faae4 in runner_main /.../swiftsim_grav_master_test/src/runner_main.c:208
    #10 0x7ff1ec082179 in start_thread (/lib64/libpthread.so.0+0x8179)

Thread T168 created by T0 here:
    #0 0x7ff1ee4a3246 in __interceptor_pthread_create /.../libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x4e5a16 in engine_config /home/roduit/scratch/HomogeneousBox.l8cooling/swiftsim_grav_master_test/src/engine_config.c:926
    #2 0x415fdf in main /.../swiftsim_grav_master_test/swift.c:1546
    #3 0x7ff1eb956492 in __libc_start_main (/lib64/libc.so.6+0x23492)

SUMMARY: AddressSanitizer: heap-buffer-overflow /.../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
```
- Resolved by Matthieu Schaller
Is that with or without the GEAR SF on?
Do we have a reproducible example? How should one configure, compile, and run to trigger this error?
We're currently running a debugging workshop in Durham; this would be a good issue for me to look into this afternoon with various sanitizers. The workshop also gives me a good excuse to spend the afternoon on this.
Edited by Mladen Ivkovic
- Resolved by Matthieu Schaller
It is just with hydro and gravity. Darwin is preparing some material to reproduce the bug. @mivkov I hope you can feed your student with this problem later on today.
- Resolved by Mladen Ivkovic