SWIFTsim issue #125
Issue created Mar 17, 2016 by Peter W. Draper (@pdraper, Owner)

Rewait tasks can deadlock

Getting back to the topic of raciness (branch thread_safety, issue #58), we have a definite deadlock with the rewait tasks. Running

swift_fixdt -t 1 -f sodShock.hdf5 -m 0.01 -w 5000 -c 0.01 -d 1e-7 -e 0.01

many, many times will eventually deadlock, with the following backtraces (main thread first, then the runner thread):

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000436b6b in scheduler_start (s=s@entry=0x7fffe4d2d080, mask=mask@entry=32768, submask=submask@entry=0) at scheduler.c:1016
#2  0x0000000000430832 in engine_launch (e=e@entry=0x7fffe4d2d060, nr_runners=nr_runners@entry=1, mask=mask@entry=32768, submask=submask@entry=0) at engine.c:1494
#3  0x0000000000405cf8 in space_parts_sort (s=s@entry=0x7fffe4d2ce60, ind=ind@entry=0x2e711f0, N=1024128, min=min@entry=0, max=14399, verbose=verbose@entry=0) at space.c:585
#4  0x0000000000406ba8 in space_rebuild (s=0x7fffe4d2ce60, cell_max=cell_max@entry=0, verbose=0) at space.c:403
#5  0x000000000042f86f in engine_rebuild (e=e@entry=0x7fffe4d2d060) at engine.c:1307
#6  0x000000000042fa58 in engine_prepare (e=e@entry=0x7fffe4d2d060) at engine.c:1354
#7  0x0000000000430935 in engine_init_particles (e=e@entry=0x7fffe4d2d060) at engine.c:1525
#8  0x00000000004034f8 in main (argc=<optimised out>, argv=<optimised out>) at main.c:508

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000436ffa in scheduler_gettask (s=s@entry=0x7fffe4d2d080, qid=0, prev=0x0) at scheduler.c:1298
#2  0x00000000004293aa in runner_main (data=0x16cd060) at runner.c:982
#3  0x00002b5dffcb9182 in start_thread (arg=0x2b5e14cfa700) at pthread_create.c:312
#4  0x00002b5e00b8047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

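Both threads are asleep on the same condition variable, s->sleep_cond, and with a single runner (nr_runners=1) no other thread in these backtraces is left to signal it. The waits are at these lines of scheduler.c, in scheduler_start: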

1014   pthread_mutex_lock(&s->sleep_mutex);
1015   while (s->waiting > waiting_old) {
1016     pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);

and in scheduler_gettask:

1297       pthread_mutex_lock(&s->sleep_mutex);
1298       if (s->waiting > 0) pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);
1299       pthread_mutex_unlock(&s->sleep_mutex);
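
The two waits quoted above differ in shape: scheduler_start re-checks its condition in a while loop, whereas scheduler_gettask tests s->waiting once with an if. For comparison, here is a minimal sketch of the usual predicate-loop pattern for POSIX condition variables. It is not SWIFT's code and is not offered as the fix for this issue; every name in it (toy_sched, toy_task_done, toy_wait_all, toy_run) is hypothetical. The assumption it illustrates is that the counter is only ever changed while holding the same mutex the waiter uses, so a waiter cannot test the counter and then miss the wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_THREADS 16

/* Hypothetical stand-in for the scheduler's sleep state; not SWIFT's struct. */
struct toy_sched {
  pthread_mutex_t sleep_mutex;
  pthread_cond_t sleep_cond;
  int waiting; /* Number of tasks still outstanding. */
};

/* Mark one task as done. The counter is decremented and the broadcast is
 * issued while holding sleep_mutex, so the change cannot slip in between a
 * waiter's test of the counter and its call to pthread_cond_wait. */
static void toy_task_done(struct toy_sched *s) {
  pthread_mutex_lock(&s->sleep_mutex);
  s->waiting--;
  pthread_cond_broadcast(&s->sleep_cond);
  pthread_mutex_unlock(&s->sleep_mutex);
}

/* Block until no tasks are outstanding. The while loop re-tests the
 * predicate after every wakeup, which also covers spurious wakeups. */
static void toy_wait_all(struct toy_sched *s) {
  pthread_mutex_lock(&s->sleep_mutex);
  while (s->waiting > 0) pthread_cond_wait(&s->sleep_cond, &s->sleep_mutex);
  pthread_mutex_unlock(&s->sleep_mutex);
}

struct toy_worker_arg {
  struct toy_sched *s;
  int ntasks;
};

/* Worker thread: completes its share of the tasks one at a time. */
static void *toy_worker(void *data) {
  struct toy_worker_arg *a = data;
  for (int k = 0; k < a->ntasks; k++) toy_task_done(a->s);
  return NULL;
}

/* Start nthreads workers, wait for all tasks to finish, and return the
 * number of tasks still outstanding (0 when everything completed). */
int toy_run(int nthreads, int tasks_per_thread) {
  assert(nthreads > 0 && nthreads <= TOY_MAX_THREADS);

  struct toy_sched s;
  pthread_mutex_init(&s.sleep_mutex, NULL);
  pthread_cond_init(&s.sleep_cond, NULL);
  s.waiting = nthreads * tasks_per_thread; /* Set before any thread starts. */

  pthread_t threads[TOY_MAX_THREADS];
  struct toy_worker_arg arg = {&s, tasks_per_thread};
  for (int i = 0; i < nthreads; i++) {
    int rc = pthread_create(&threads[i], NULL, toy_worker, &arg);
    assert(rc == 0);
    (void)rc;
  }

  toy_wait_all(&s);

  for (int i = 0; i < nthreads; i++) pthread_join(threads[i], NULL);

  int remaining = s.waiting;
  pthread_cond_destroy(&s.sleep_cond);
  pthread_mutex_destroy(&s.sleep_mutex);
  return remaining;
}
```

Calling toy_run(4, 1000) from a small main (compiled with -pthread) should return 0 without hanging. Whether the real scheduler updates s->waiting under sleep_mutex is not shown in the excerpts above, so this only illustrates the pattern to check against.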

Tom sees this effect much more often with the gravity tests...
