Replace calls to rand() by calls to rand_r() in the scheduler

Merged Matthieu Schaller requested to merge rand_fix into master

Implements #302 (closed) using @d74ksy's changes.

Activity

1305 1305     if (qid >= s->nr_queues) error("Bad computed qid.");
1306 1306
1307 1307     /* If no previous owner, pick a random queue. */
1308      -   if (qid < 0) qid = rand_r((int *)pthread_getspecific(s->local_seed_pointer)) % s->nr_queues;
     1308 +   if (qid < 0) qid = rand_r((unsigned int *)pthread_getspecific(s->local_seed_pointer)) % s->nr_queues;
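
For context: the new line only works if every thread that can reach scheduler_enqueue() has previously stored a pointer to its own seed under the key s->local_seed_pointer. A minimal sketch of that pattern, with illustrative names for everything except rand_r() and the pthread calls (this is not the actual SWIFT code):

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-in for the scheduler fields involved; the real
 * struct scheduler in SWIFT is more complex. */
struct toy_scheduler {
  pthread_key_t local_seed_pointer; /* created once with pthread_key_create() */
  int nr_queues;
};

/* Each worker thread registers the address of its own seed once at start-up. */
static void register_thread_seed(struct toy_scheduler *s, unsigned int *seed) {
  pthread_setspecific(s->local_seed_pointer, seed);
}

/* What the patched line does: rand_r() keeps its state in the caller-supplied
 * seed, so there is no shared hidden state (and no locking) as with rand(). */
static int pick_random_queue(struct toy_scheduler *s) {
  unsigned int *seed = (unsigned int *)pthread_getspecific(s->local_seed_pointer);
  return rand_r(seed) % s->nr_queues;
}
```
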
  • For this case, would it be sufficient to do rand_r((size_t)t) to get a sufficiently random value? Or just pass in a seed as an additional argument?

    I'm just wondering how fast pthread_getspecific is.
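
Two possible readings of the suggestion above, sketched for illustration only (neither is what the branch implements): seed rand_r() from the task pointer itself, or let the caller hand in a seed explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Variant A: derive a throwaway seed from the task pointer t.  The low bits
 * of a heap address are not great entropy, so this only illustrates the idea. */
static int pick_queue_from_task(const void *t, int nr_queues) {
  unsigned int seed = (unsigned int)(uintptr_t)t;
  return rand_r(&seed) % nr_queues;
}

/* Variant B: pass the seed in as an additional argument, so no thread-local
 * lookup is needed at all. */
static int pick_queue_with_seed(unsigned int *seed, int nr_queues) {
  return rand_r(seed) % nr_queues;
}
```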

  • Matthieu Schaller @matthieu started a thread on commit 25721d42
  • Pedro Gonnet @nnrw56 started a thread on commit 25721d42
  • Aidan Chalk @d74ksy started a thread on commit 25721d42
    • From the pthread manual:

      Performance and ease-of-use of pthread_getspecific() are critical for functions that rely on maintaining state in thread-specific data. Since no errors are required to be detected by it, and since the only error that could be detected is the use of an invalid key, the function pthread_getspecific() has been designed to favor speed and simplicity over error reporting.

      From Stack Overflow (https://stackoverflow.com/questions/8489819/the-cost-of-thread-local):

      Storage space: size of the variable * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads.

      There are two basic ways of implementing thread-local storage:

      1. Using some sort of system call that gets information about the current kernel thread. Sloooow.
      2. Using some pointer, probably in a processor register, that is set properly at every thread context switch by the kernel - at the same time as all the other registers. Cheap.

      On Intel platforms, variant 2 is usually implemented via some segment register (FS or GS, I don't remember). Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.

      I would expect that this is not an expensive call: TLS is used throughout OpenMP (built, usually, on pthreads) and no one ever considers private variables to be an issue. (Both access patterns are sketched below.)

      Edited by Aidan Chalk
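
For comparison, the "cheap" flavour of TLS described in the quote can also be spelled directly in C with the GCC/Clang __thread keyword (C11: _Thread_local); access then goes through a segment register on x86 and costs roughly as much as a global read. This is only an alternative sketch, not what the branch does; the branch uses the pthread key shown in the diff.

```c
#include <stdlib.h>

/* One private copy per thread; no pthread_getspecific() lookup needed.
 * Each thread would still want to set this to a distinct value (e.g. from
 * its thread id) before first use, otherwise all threads draw the same
 * sequence. */
static __thread unsigned int local_seed = 1;

static int pick_random_queue_tls(int nr_queues) {
  return rand_r(&local_seed) % nr_queues;
}
```
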
  • Matthieu Schaller @matthieu started a thread on commit 25721d42
  • @jwillis, when the special node is free, could you test whether this branch makes any difference compared to master? Thanks!

  • [Attached image: Screenshot-45, scaling plot]

    The scaling is better, but the time to solution of rand_fix appears to be longer: 764558.476 vs 758665.331 on 16 threads.

  • From your experience running all these, would you trust times at the <1% level?

  • Did you merge master into this branch? It's a 4-month-old branch, and we made a lot of other improvements in that time, right?

  • Yes, it was rebased to the current master. The "12 top level cells" line on the plot is the current unaltered master.

  • How different is the runtime on 1 thread? It would be strange to me if this were a repeatable pattern, since if TLS is that expensive I wonder how OpenMP does it efficiently.

  • My impression here is that the old rand() vs. the new rand_r() with TLS makes no difference to the overall run time. It would just be better practice to use your version.

  • I think that's probably the case, but we should probably repeat the test a few times if we want to be sure: if it's consistently slower time to solution on a single thread, then that suggests something is causing a slowdown. I would be interested to see how it copes at high thread counts too (such as on KNL).

  • I would say that the difference in the times is negligible.

  • Reassigned to @matthieu

  • 1 thread times: 8074326.379 vs 7978294.673

  • Which one is which?

    @d74ksy we currently have much more important issues with the KNL...

  • rand_fix vs master

  • Matthieu Schaller changed the status to merged

  • mentioned in commit cf58ff8b

  • mentioned in issue #302 (closed)

  • OK, I think we have a problem here.

    First of all, this bit of code is never called in @jwillis's test case, simply because all the relevant task types in there have an assigned owner, so they never try to get a random qid. That is what "saves" us. With gravity on, some tasks don't have an owner assigned, so they actually hit this bit of code, which leads to a crash: basically pthread_getspecific(s->local_seed_pointer) returns 0, which causes a segfault in the call to rand_r() (the failure mode is sketched just below the questions).

    That raises two questions:

    • How do we fix it?
    • Is it expected that we never pass through that bit of code? I.e., should we really list all the task types in the enum just above that call?
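
To make the crash mechanism concrete: pthread_getspecific() returns NULL for any thread that never called pthread_setspecific() for that key, and the patched line then dereferences that NULL inside the rand_r() call. A defensive variant is sketched below purely for illustration; it is not the fix adopted in the branch, where the real cure is to make sure every enqueueing thread registers a seed.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical guarded version of the queue pick; key and names illustrative. */
static int pick_random_queue_checked(pthread_key_t key, int nr_queues) {
  unsigned int *seed = (unsigned int *)pthread_getspecific(key);
  if (seed == NULL) {
    /* This is exactly the situation a thread ends up in when nobody called
     * pthread_setspecific() on its behalf. */
    fprintf(stderr, "warning: no thread-local RNG seed registered\n");
    return 0; /* fall back to queue 0 rather than crashing */
  }
  return rand_r(seed) % nr_queues;
}
```
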
  • runner.c:1659 probably needs to change to unsigned int seed = r->id + 1;. If that doesn't fix it, I'll look in more detail, but "thread 0" will set its initial seed to 0, which would immediately cause this issue. I think this is not a NULL pointer, since it doesn't seg fault but provides a pointer to 0. (A sketch of this suggested initialisation follows below.)

    The second bit I don't know...
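
A guess at what the runner-side initialisation around runner.c:1659 could look like with the suggested r->id + 1 applied. The actual SWIFT code is not quoted in this thread, so the struct and field names below are purely illustrative.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-ins, not the real SWIFT structs. */
struct toy_runner {
  int id;                            /* runner/thread index */
  pthread_key_t *local_seed_pointer; /* the scheduler's TLS key */
};

static void *runner_main_sketch(void *data) {
  struct toy_runner *r = (struct toy_runner *)data;

  /* Suggested change: seed from r->id + 1 so that "thread 0" does not start
   * from 0.  The seed lives on this thread's stack for the lifetime of the
   * main loop, so the pointer stored under the key stays valid. */
  unsigned int seed = (unsigned int)r->id + 1;
  pthread_setspecific(*r->local_seed_pointer, &seed);

  /* ... the runner's task loop would go here ... */
  return NULL;
}
```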

  • Thanks, but that does not help:

    ASAN:SIGSEGV
    =================================================================
    ==24306==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0d5db00f70 bp 0x7ffdcc9900c0 sp 0x7ffdcc98f858 T0)
        #0 0x7f0d5db00f6f  (/lib/x86_64-linux-gnu/libc.so.6+0x3af6f)
        #1 0x7f0d5ee95795  (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x39795)
        #2 0x469ac6 in scheduler_enqueue /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1381
        #3 0x476aa9 in scheduler_enqueue /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1234
        #4 0x476aa9 in scheduler_enqueue_mapper /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1134
        #5 0x476aa9 in scheduler_start /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/scheduler.c:1237
        #6 0x45b109 in engine_launch /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/engine.c:3310
        #7 0x45bb80 in engine_init_particles /home/matthieu/Desktop/Swift-git/gravity/swiftsim/src/engine.c:3369
        #8 0x4055a4 in main /home/matthieu/Desktop/Swift-git/gravity/swiftsim/examples/main.c:676
        #9 0x7f0d5dae682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
        #10 0x405fd8 in _start (/home/matthieu/Desktop/Swift-git/gravity/swiftsim/examples/swift+0x405fd8)
    
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV ??:0 ??
    ==24306==ABORTING
  • Send me a test case and I'll have a look tomorrow. That means somehow a runner is avoiding pthread_setspecific, which I wouldn't have thought it could do. Or, more likely, I missed a bit of code.

    Edited by Aidan Chalk
  • I am as puzzled as you are.

    See #355 (closed). In short:

    • Use the gravity_testing branch
    • Go to examples/self_gravity_testing/particle_line/
    • Run python makeIC.py 2 1
    • Run ../../swift -G -e -t 1 particle_line_theta_0p7.yml

    Crashes straight away in the 0th step.

  • I put a quick fix, based on what Pedro said in the meeting, in 8eb9db15; I'll test it later today.

    Edited by Aidan Chalk
  • I think that fix probably won't work, as the enqueueing happens before the runners will be freed? Do the main runners do the enqueueing initially, or is it other pthreads?

  • It does work.

  • Oh actually no. :cry:

    The enqueuing is done by the threadpool threads.

  • I'm implementing a (probably inelegant but minimal) fix so it works for threadpool threads too. I think there is also an issue if the code is single-threaded, so I will change how it is accessed as well (one possible shape of such a fix is sketched below).
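
One possible shape for a fix that also covers threadpool threads and the single-threaded case: initialise the seed lazily in whichever thread first asks for it. This is only a sketch of the idea under discussion; the actual changes are in commits 8eb9db15 and 8d0d1871, which are not quoted here, and the names below are illustrative (allocation-failure handling is elided).

```c
#include <pthread.h>
#include <stdlib.h>

/* Created once at start-up, e.g. with pthread_key_create(&seed_key, free). */
static pthread_key_t seed_key;

/* Return this thread's seed, allocating and registering it on first use, so
 * runners, threadpool workers and the main thread are all covered. */
static unsigned int *get_local_seed(unsigned int initial) {
  unsigned int *seed = (unsigned int *)pthread_getspecific(seed_key);
  if (seed == NULL) {
    seed = (unsigned int *)malloc(sizeof(*seed));
    *seed = initial;
    pthread_setspecific(seed_key, seed); /* freed by the key's destructor */
  }
  return seed;
}

static int pick_random_queue_lazy(int nr_queues) {
  return rand_r(get_local_seed(1u)) % nr_queues;
}
```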

  • The main thread is used in the threadpool now, right?

  • OK, hopefully fixed in 8d0d1871; I've not tested it yet but will do so shortly.

  • OK, I believe it works now. The test you told me to run gives:

     27732   9.999390e+01   6.103516e-03          0          1          0                 0.054
    [00018.2] main: done. Bye.

  • I guess we need @jwillis to test the performance, if there's a test case large enough to use?

  • Given that this bit of code is not used in any of our tests, I am not sure we can say anything about the performance...

    Can we open a separate merge request just for the sake of it?
