
Rewait tasks can deadlock

Getting back to the topic of raciness (branch thread_safety, issue #58), we have a definite deadlock issue with the rewait tasks. Running:

swift_fixdt -t 1 -f sodShock.hdf5 -m 0.01 -w 5000 -c 0.01 -d 1e-7 -e 0.01

many, many times will eventually deadlock, with the following backtraces (main thread first, then the runner thread):

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000436b6b in scheduler_start (s=s@entry=0x7fffe4d2d080, mask=mask@entry=32768, submask=submask@entry=0) at scheduler.c:1016
#2  0x0000000000430832 in engine_launch (e=e@entry=0x7fffe4d2d060, nr_runners=nr_runners@entry=1, mask=mask@entry=32768, submask=submask@entry=0) at engine.c:1494
#3  0x0000000000405cf8 in space_parts_sort (s=s@entry=0x7fffe4d2ce60, ind=ind@entry=0x2e711f0, N=1024128, min=min@entry=0, max=14399, verbose=verbose@entry=0) at space.c:585
#4  0x0000000000406ba8 in space_rebuild (s=0x7fffe4d2ce60, cell_max=cell_max@entry=0, verbose=0) at space.c:403
#5  0x000000000042f86f in engine_rebuild (e=e@entry=0x7fffe4d2d060) at engine.c:1307
#6  0x000000000042fa58 in engine_prepare (e=e@entry=0x7fffe4d2d060) at engine.c:1354
#7  0x0000000000430935 in engine_init_particles (e=e@entry=0x7fffe4d2d060) at engine.c:1525
#8  0x00000000004034f8 in main (argc=<optimised out>, argv=<optimised out>) at main.c:508

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000436ffa in scheduler_gettask (s=s@entry=0x7fffe4d2d080, qid=0, prev=0x0) at scheduler.c:1298
#2  0x00000000004293aa in runner_main (data=0x16cd060) at runner.c:982
#3  0x00002b5dffcb9182 in start_thread (arg=0x2b5e14cfa700) at pthread_create.c:312
#4  0x00002b5e00b8047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

i.e. lines:

1014   pthread_mutex_lock(&s->sleep_mutex);
1015   while (s->waiting > waiting_old) {
1016     pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);

and

1297       pthread_mutex_lock(&s->sleep_mutex);
1298       if (s->waiting > 0) pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);
1299       pthread_mutex_unlock(&s->sleep_mutex);
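
For reference while debugging, here is a minimal standalone sketch of two threads sleeping on one condition variable without getting stuck. It is not a diagnosis: the code that updates s->waiting and signals s->sleep_cond is not shown above, so demo_sched, demo_waiter, demo_worker and demo_run are all hypothetical names, and the use of pthread_cond_broadcast is an assumption, not what scheduler.c necessarily does. The only point is that each waiter re-checks its predicate in a while loop (the wait at line 1298 uses an if, although the surrounding code may loop) and that every change to the counter happens under the mutex and wakes all waiters.

```c
#include <pthread.h>

/* Hypothetical stand-in for the scheduler fields used in the snippets above. */
struct demo_sched {
  pthread_mutex_t sleep_mutex;
  pthread_cond_t sleep_cond;
  int waiting; /* Number of tasks still outstanding. */
};

/* Waiter: re-checks the predicate in a loop, so a spurious wake-up, or a
   wake-up that arrives before the counter reaches zero, just puts it back
   to sleep instead of letting it continue with stale state. */
static void *demo_waiter(void *data) {
  struct demo_sched *s = (struct demo_sched *)data;
  pthread_mutex_lock(&s->sleep_mutex);
  while (s->waiting > 0) pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);
  pthread_mutex_unlock(&s->sleep_mutex);
  return NULL;
}

/* Worker: "completes" one task per iteration. The counter is only changed
   while holding the mutex, and all waiters are woken after each change
   (assumption: broadcast rather than signal). */
static void *demo_worker(void *data) {
  struct demo_sched *s = (struct demo_sched *)data;
  for (;;) {
    pthread_mutex_lock(&s->sleep_mutex);
    if (s->waiting <= 0) {
      pthread_mutex_unlock(&s->sleep_mutex);
      break;
    }
    s->waiting--;
    pthread_cond_broadcast(&s->sleep_cond);
    pthread_mutex_unlock(&s->sleep_mutex);
  }
  return NULL;
}

/* Runs two waiters and one worker on nr_tasks tasks and returns the number
   of tasks left once every thread has finished. */
int demo_run(int nr_tasks) {
  struct demo_sched s;
  pthread_t waiters[2], worker;

  pthread_mutex_init(&s.sleep_mutex, NULL);
  pthread_cond_init(&s.sleep_cond, NULL);
  s.waiting = nr_tasks;

  for (int k = 0; k < 2; k++)
    pthread_create(&waiters[k], NULL, demo_waiter, &s);
  pthread_create(&worker, NULL, demo_worker, &s);

  for (int k = 0; k < 2; k++) pthread_join(waiters[k], NULL);
  pthread_join(worker, NULL);

  int left = s.waiting;
  pthread_cond_destroy(&s.sleep_cond);
  pthread_mutex_destroy(&s.sleep_mutex);
  return left;
}
```

Built with -pthread, demo_run(1000) should return 0 with all three threads joined. If the real scheduler wakes sleepers with pthread_cond_signal, only one of the two waiters in the backtraces would be woken per call, which may be worth checking.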

Tom sees this effect much more often with the gravity tests...
