Replace calls to rand() by calls to rand_r() in the scheduler
Implements #302 (closed) using @d74ksy's changes.
```diff
     if (qid >= s->nr_queues) error("Bad computed qid.");

     /* If no previous owner, pick a random queue. */
-    if (qid < 0) qid = rand_r((int *)pthread_getspecific(s->local_seed_pointer)) % s->nr_queues;
+    if (qid < 0) qid = rand_r((unsigned int *)pthread_getspecific(s->local_seed_pointer)) % s->nr_queues;
```

From the pthread manual:
Performance and ease-of-use of pthread_getspecific() are critical for functions that rely on maintaining state in thread-specific data. Since no errors are required to be detected by it, and since the only error that could be detected is the use of an invalid key, pthread_getspecific() has been designed to favor speed and simplicity over error reporting.
From Stack Overflow (https://stackoverflow.com/questions/8489819/the-cost-of-thread-local):
Storage space: size of the variable * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads.
There are two basic ways of implementing thread-local storage:
- Using some sort of system call that gets information about the current kernel thread. Sloooow.
- Using some pointer, probably in a processor register, that is set properly at every thread context switch by the kernel - at the same time as all the other registers. Cheap.
On intel platforms, variant 2 is usually implemented via some segment register (FS or GS, I don't remember). Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.
I would expect that this is not an expensive call - TLS is used throughout OpenMP (built, usually, on pthreads) and no one ever considers private variables to be an issue.
Edited by Aidan Chalk
@jwillis when the special node is free, could you test whether this branch makes any difference compared to master? Thanks!
I think that's probably the case, but we should probably repeat the test a few times if we want to be sure - if it's consistently slower TTS on a single thread, then that suggests something is causing a slowdown. I'd be interested to see how it copes at high thread counts too (such as on KNL).
Reassigned to @matthieu
Which one is which?
@d74ksy we currently have much more important issues with the KNL...
mentioned in commit cf58ff8b
mentioned in issue #302 (closed)
Ok. I think we have a problem here.
First of all, the bit of code is never called in @jwillis's test case, simply because all the relevant task types in there have an assigned owner, so they never try to get a random qid. And this is what "saves" us. With gravity on, some tasks don't have an owner assigned, so they actually hit this bit of code, which leads to a crash. Basically,
pthread_getspecific(s->local_seed_pointer)
returns 0, which causes a segfault in the call to rand_r(). That raises two questions:
- How do we fix it?
- Is it expected that we never pass through that bit of code? i.e., should we really list all the task types in the enum just above that call?
runner.c:1659 probably needs to change to
unsigned int seed = r->id+1;
If that doesn't fix it I'll look in more detail, but "thread 0" will set its initial seed to 0, which would immediately cause this issue. I think this is not a NULL pointer, since it doesn't segfault but provides a pointer to 0. The second bit I don't know...
Thanks but that does not help:
```
ASAN:SIGSEGV
=================================================================
==24306==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0d5db00f70 bp 0x7ffdcc9900c0 sp 0x7ffdcc98f858 T0)
    #0 0x7f0d5db00f6f (/lib/x86_64-linux-gnu/libc.so.6+0x3af6f)
    #1 0x7f0d5ee95795 (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x39795)
    #2 0x469ac6 in scheduler_enqueue /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1381
    #3 0x476aa9 in scheduler_enqueue /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1234
    #4 0x476aa9 in scheduler_enqueue_mapper /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1134
    #5 0x476aa9 in scheduler_start /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1237
    #6 0x45b109 in engine_launch /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/engine.c:3310
    #7 0x45bb80 in engine_init_particles /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/engine.c:3369
    #8 0x4055a4 in main /home/matthieu/Desktop/Swift-git/gravity/swiftsim/examples/main.c:676
    #9 0x7f0d5dae682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #10 0x405fd8 in _start (/home/matthieu/Desktop/Swift-git/gravity/swiftsim/examples/swift+0x405fd8)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 ??
==24306==ABORTING
```
Send me a testcase and I'll have a look tomorrow; it means somehow a runner is avoiding pthread_setspecific(), which I wouldn't have thought it could do. Or I missed a bit of code, probably.
Edited by Aidan Chalk
I am as puzzled as you are.
See #355 (closed). In short:
- Use the gravity_testing branch
- Go to
examples/self_gravity_testing/particle_line/
- Run
python makeIC.py 2 1
- Run
../../swift -G -e -t 1 particle_line_theta_0p7.yml
Crashes straight away in the 0th step.
Put a quick fix based on what Pedro said in the meeting in 8eb9db15, I'll test it later on today.
Edited by Aidan Chalk
Ok, hopefully fixed in 8d0d1871, I've not tested it yet, will do shortly.
Ok, I believe it works now. The test you told me to run:
```
27732 9.999390e+01 6.103516e-03 0 1 0 0.054 [00018.2] main: done. Bye.
```
I guess we need @jwillis to test the performance, if there's a testcase large enough to use?