Split the drift into hydro and gravity drift tasks. Implement FFT task and dependencies
Changes:
- Two drift tasks, one per particle type.
- FFT task with gravity ghost tasks for the dependencies.
It passes my usual tests, but could you make sure it does not break anything? That should be the last big change; the rest of the gravity work should be contained within the tasks themselves.
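In case it helps review, here is a minimal sketch of the intended dependency wiring around the FFT task. The field names `grav_ghost_in`/`grav_ghost_out` are illustrative only, and `scheduler_addunlock` is assumed to be the usual dependency helper; this is not the exact code in the branch.

```c
/* Sketch only: the per-cell gravity ghost tasks bracket the single
 * global FFT task, so the long-range part of gravity only runs once
 * the mesh has been built. Field names are illustrative. */
static void make_fft_dependencies(struct scheduler *s, struct cell *c,
                                  struct task *fft) {
  if (c->grav_ghost_in != NULL) scheduler_addunlock(s, c->grav_ghost_in, fft);
  if (c->grav_ghost_out != NULL) scheduler_addunlock(s, fft, c->grav_ghost_out);
}
```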
Merge request reports
Activity
Added 1 commit:
- 0a0a27c4 - Do not compile FFTW code if FFTW is not present.
Added 1 commit:
- 28123d7d - Also add the new task types to the partition decision making.
Didn't get far... Just built using:
./configure --with-metis --enable-debug --enable-debugging-checks
with GCC 5.4.0 and ran the standard SodShock test. That core dumped at:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000042767f in scheduler_activate (t=0x0, s=0x7ffd2634e8f0) at scheduler.h:116
116       if (atomic_cas(&t->skip, 1, 0)) {
[Current thread is 1 (Thread 0x2acca6e9b5c0 (LWP 3743))]
(gdb) bt
#0  0x000000000042767f in scheduler_activate (t=0x0, s=0x7ffd2634e8f0) at scheduler.h:116
#1  engine_unskip (e=0x7ffd2634e8d0) at engine.c:3483
#2  engine_prepare (e=e@entry=0x7ffd2634e8d0) at engine.c:2831
#3  0x000000000042ae55 in engine_step (e=e@entry=0x7ffd2634e8d0) at engine.c:3349
#4  0x0000000000403bec in main (argc=<optimised out>, argv=<optimised out>) at main.c:628
Seems that grav_top_level is NULL at:
#1  engine_unskip (e=0x7ffd2634e8d0) at engine.c:3483
3483      if (e->s->periodic) scheduler_activate(&e->sched, e->s->grav_top_level);
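A guard along these lines would avoid the NULL dereference; this is only a sketch, assuming the top-level FFT task is created solely for periodic runs with self-gravity and that the usual `engine_policy_self_gravity` flag applies here:

```c
/* Sketch: only activate the top-level FFT task if it was actually
 * created, i.e. a periodic run with self-gravity enabled. */
if (e->s->periodic && (e->policy & engine_policy_self_gravity) &&
    e->s->grav_top_level != NULL)
  scheduler_activate(&e->sched, e->s->grav_top_level);
```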
Added 1 commit:
- d92cb31d - Do not activate the FFT task in situations where it should not exist
Looks like there is another issue. Running the EAGLE_12 test case with debugging checks enabled I eventually see :
drift.h:drift_part():86: particle has not been drifted to the current time p->ti_drift=2597046464806912, c->ti_old=2594847441551360, ti_current=2597046464806912
The ticks vary between reports, as does the exact step (it usually takes at least a thousand steps to show up, but can need a few thousand). Seen with GCC 5.4.0 and 4.8.5, with and without vectorisation.
Tried repeating it in master, but that didn't turn up the issue.
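For context, the check that fires looks roughly like the paraphrase below (not the exact source). My reading of the numbers is that the particle has already been drifted to ti_current by some other path while its cell's ti_old still lags behind, so the per-cell and per-particle drift bookkeeping have diverged; that is an interpretation, not a confirmed diagnosis.

```c
/* Paraphrase of the debugging-only consistency check in
 * drift.h:drift_part(); the exact expression may differ. */
#ifdef SWIFT_DEBUG_CHECKS
  if (p->ti_drift != ti_old)
    error(
        "particle has not been drifted to the current time "
        "p->ti_drift=%lld, c->ti_old=%lld, ti_current=%lld",
        p->ti_drift, ti_old, ti_current);
#endif
```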
Added 7 commits:
- d92cb31d...33476206 - 4 commits from branch master
- 8d8be86e - Correctly initialise the engine in the FFT test.
- e56eaf3d - Merge branch 'master' into gravity_multi_dt
- 7d68821a - Reset the particles for every run in test125.
Here's a back trace from another occurrence of that drift bug:
#0  0x00002b6d77677428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00002b6d7767902a in __GI_abort () at abort.c:89
#2  0x000000000041b96e in drift_part (ti_current=1035739953364992, ti_old=1033540930109440, timeBase=6.9388939039072285e-20, dt=1.52587887e-07, xp=0x2b6dcc42db00, p=0x2b6db2287ac0) at drift.h:83
#3  cell_drift_part (c=0x2b6df5177e00, e=0x7ffd71592fe0) at cell.c:1567
#4  0x000000000041b737 in cell_drift_part (c=0x2b6ddc1d0e80, e=0x7ffd71592fe0) at cell.c:1548
#5  0x0000000000465f82 in runner_dosub_subset_density (r=0xc3fce0, ci=0x2b6de8a9c100, parts=0x2b6db1358ae0, ind=0x2b6f83774d40, count=6, cj=0x2b6ddc1d0e80, sid=-1, gettimer=0) at runner_doiact.h:3221
#6  0x0000000000461545 in runner_dosub_subset_density (r=0xc3fce0, ci=0x2b6dd5c5a700, parts=0x2b6db1358ae0, ind=0x2b6f83774d40, count=6, cj=0x2b6df9690080, sid=1, gettimer=1) at runner_doiact.h:2712
#7  0x000000000048d1f3 in runner_do_ghost (r=0xc3fce0, c=0x2b6ded363280, timer=0) at runner.c:768
#8  0x000000000048c4a5 in runner_do_ghost (r=0xc3fce0, c=0x2b6df0ea6e00, timer=0) at runner.c:640
#9  0x000000000048c4a5 in runner_do_ghost (r=0xc3fce0, c=0x2b6de8a9c100, timer=0) at runner.c:640
#10 0x000000000048c4a5 in runner_do_ghost (r=0xc3fce0, c=0x2b6dd5c5a700, timer=0) at runner.c:640
#11 0x000000000048c4a5 in runner_do_ghost (r=0xc3fce0, c=0x2b6e049ff780, timer=0) at runner.c:640
#12 0x000000000048c4a5 in runner_do_ghost (r=0xc3fce0, c=0x2b6dfcaef300, timer=1) at runner.c:640
#13 0x00000000004933b7 in runner_main (data=0xc3fce0) at runner.c:1879
#14 0x00002b6d7667d6ba in start_thread (arg=0x2b6dd0ccf700) at pthread_create.c:333
#15 0x00002b6d7774882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
and a second back trace (different OS to above), which is quite different:
#0  0x00002ae40e6a41d7 in raise () from /lib64/libc.so.6
#1  0x00002ae40e6a58c8 in abort () from /lib64/libc.so.6
#2  0x0000000000420242 in drift_part (timeBase=<optimized out>, ti_current=<optimized out>, ti_old=1974722883485696, dt=<optimized out>, xp=<optimized out>, p=<optimized out>) at drift.h:83
#3  cell_drift_part (c=c@entry=0x2ae4a9297680, e=e@entry=0x7fff14f80e20) at cell.c:1567
#4  0x000000000041fc0c in cell_drift_part (c=0x2ae4893ab880, e=e@entry=0x7fff14f80e20) at cell.c:1548
#5  0x0000000000463227 in runner_dosub_pair1_density (r=0x22f2b30, ci=0x2ae474aa1000, cj=0x2ae4893ab880, sid=12, gettimer=<optimized out>) at runner_doiact.h:2283
#6  0x0000000000463ce7 in runner_dosub_pair1_density (r=0x22f2b30, ci=0x2ae494681080, cj=0x2ae478457f00, sid=12, gettimer=<optimized out>) at runner_doiact.h:2242
#7  0x0000000000463ce7 in runner_dosub_pair1_density (r=0x22f2b30, ci=0x2ae4843f9480, cj=0x2ae47496c800, sid=12, gettimer=<optimized out>) at runner_doiact.h:2242
#8  0x0000000000476754 in runner_main (data=0x22f2b30) at runner.c:1859
#9  0x00002ae40e45adc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00002ae40e76673d in clone () from /lib64/libc.so.6
  else {

    lock_lock(&ci->lock);

-   if (ci->drift == NULL)
-     ci->drift = scheduler_addtask(s, task_type_drift, task_subtype_none,
-                                   0, 0, ci, NULL);

+   /* Drift gparts case */
+   if (t->subtype == task_subtype_grav && ci->drift_gpart == NULL)
+     ci->drift_gpart = scheduler_addtask(
+         s, task_type_drift_gpart, task_subtype_none, 0, 0, ci, NULL);
+
+   /* Drift parts case */
+   else if (t->subtype == task_subtype_density && ci->drift_part == NULL)
+     ci->drift_part = scheduler_addtask(s, task_type_drift_part,
+                                        task_subtype_none, 0, 0, ci, NULL);

mentioned in issue #308 (closed)
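The matching dependency hook-up would then look roughly as below. This is a sketch of the intent, assuming the usual `scheduler_addunlock` helper and with `t` being the interaction task whose drift dependency is being created; treat the exact call sites as assumptions rather than the actual code in the branch.

```c
/* Sketch: hydro interactions wait for the hydro drift, gravity
 * interactions for the gpart drift (names as in the diff above). */
if (t->subtype == task_subtype_density && ci->drift_part != NULL)
  scheduler_addunlock(s, ci->drift_part, t);
else if (t->subtype == task_subtype_grav && ci->drift_gpart != NULL)
  scheduler_addunlock(s, ci->drift_gpart, t);
```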
Added 1 commit:
- d116018c - Implemented a first version of the splittask routine for gravity tasks.
Added 1 commit:
- f866d555 - Use task_subtype_none as subtype is expected
mentioned in issue #303 (closed)
That also failed at various times with the same error. A larger MPI test reported:
[0001] [00642.6] drift.h:drift_part():86: particle has not been drifted to the current time p->ti_drift=5655887813279744, c->ti_old=5651489766768640, ti_current=5655887813279744
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
[0001] [00660.6] runner_doiact_vec.c:runner_dopair1_density_vec():668: particle shift diff exceeds dx_max_sort.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
[0000] [02753.9] runner_doiact_vec.c:runner_dopair1_density_vec():668: particle shift diff exceeds dx_max_sort.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[0002] [01500.2] drift.h:drift_part():86: particle has not been drifted to the current time p->ti_drift=22940210601918464, c->ti_old=22799473113563136, ti_current=22940210601918464
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
[0002] [00154.2] drift.h:drift_part():86: particle has not been drifted to the current time p->ti_drift=870813209198592, c->ti_old=868614185943040, ti_current=870813209198592
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
which you'd guess is the same problem in a new guise. I'll accept this merge request for now, as I don't think these failures are caused by this patch.
mentioned in commit e7c76035
mentioned in issue #26 (closed)