Resource reuse
Actually also contains corrections to the paper, since I'm still too dumb to use git
correctly...
Activity
Added 1 commit:
- 8cfb2b41 - fix union.
I see we added a data pointer and size to qsched_addres / struct res - are these actually used anywhere or are they just to make the representation clearer (and less distinct from the MPI version)?
On GTX690, the BH test (1M uniformly random particles) with this version takes 1828.205 ms with 4 threads, whilst the master branch takes 1797.659 ms. With 2 threads the new version takes 3594.454 ms, whilst the master branch takes 3590.812 ms. I'll test it on 64 cores after the meeting.
Added 1 commit:
- 6ef308ab - Minor fix to the QR
Ok - I'm not totally convinced by this, though I don't think the error is due to these changes.
If I run:
./test_bh -n 1000000 -t 2
on 64cores, the code executes successfully with no issues.
However if I run:
./test_bh -n 1000000 -t 32
on 64cores, the code crashes with a floating point exception. I'm not sure why this would occur only with more threads...
I just checked GDB
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffcb9a6700 (LWP 8281)]
0x000000000040d1b2 in queue_task_overlap ()
(gdb) where
#0  0x000000000040d1b2 in queue_task_overlap ()
#1  0x000000000040d7fd in queue_get ()
#2  0x000000000040a5d2 in qsched_gettask ()
#3  0x000000000040ab45 in qsched_pthread_run ()
#4  0x0000003829e079d1 in start_thread (arg=0x7fffcb9a6700) at pthread_create.c:301
#5  0x0000003829ae89dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Looking at it, if
res_union == res_isect
then
return ((float)res_isect) / (res_union - res_isect);
will divide by 0 and crash.
I think we want the task with the most overlap with the prior task, so I think
if (nr_res_a == 0 || nr_res_b == 0) return nr_res_a == nr_res_b ? 1.0f : 0.0f;
should be
if (nr_res_a == 0 || nr_res_b == 0) return 0.0f;
and
if (res_union == 0) return 1.0f;
should be
if (res_union == 0) return 0.0f;
Finally before
return ((float)res_isect) / (res_union - res_isect);
we should check
if(res_union == res_isect) return 1.0f;
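Putting the three proposed changes together, the end of the function could look roughly like the sketch below. This is only an illustration: overlap_score is a hypothetical stand-alone helper, not the actual queue_task_overlap, and the size_t types for res_isect and res_union are an assumption, since the real declarations are not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the real queue_task_overlap: it only shows the
 * order of the guards proposed above. The size_t types are an assumption. */
static float overlap_score(int nr_res_a, int nr_res_b,
                           size_t res_isect, size_t res_union) {
  /* A task without resources cannot overlap with anything. */
  if (nr_res_a == 0 || nr_res_b == 0) return 0.0f;

  /* Nothing to compare against: treat as no overlap. */
  if (res_union == 0) return 0.0f;

  /* Guard the division: equal values would give a zero denominator. */
  if (res_union == res_isect) return 1.0f;

  return ((float)res_isect) / (float)(res_union - res_isect);
}
```

With these guards the denominator can no longer be zero when the final division is reached.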
Ok - I can't debug it with gcc-5.2 for some reason: it won't find the debug information for the pthreads inside the library, though it finds the rest fine (including for the pthreads in the test_bh.c code).
With gcc-4.4 it works, and by compiling with -O0 I did find:
res_union == res_isect
And I'm pretty confident that it is due to:
if (ra->data <= rb->data && rb->data < ra->data + ra->size) {
  if (rb->data + rb->size < ra->data + ra->size)
    res_isect += rb->size;
  else
    res_isect += ra->data + ra->size - rb->data;
} else if (rb->data <= ra->data && ra->data < rb->data + rb->size) {
  if (ra->data + ra->size < rb->data + rb->size)
    res_isect += ra->size;
  else
    res_isect += rb->data + rb->size - ra->data;
}
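To make the suspected double counting easier to see, here is a minimal sketch that isolates the branch logic above and sums it over every pair of resources. struct range, range_isect and pairwise_isect are hypothetical names used only for illustration. It also assumes that res_union is the plain sum of all resource sizes, which is a guess, not something confirmed in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct res: only the two fields used here. */
struct range {
  const char *data;
  size_t size;
};

/* Bytes shared by two ranges, following the same branches as the
 * snippet quoted above. */
static size_t range_isect(const struct range *ra, const struct range *rb) {
  if (ra->data <= rb->data && rb->data < ra->data + ra->size) {
    if (rb->data + rb->size < ra->data + ra->size) return rb->size;
    return (size_t)(ra->data + ra->size - rb->data);
  } else if (rb->data <= ra->data && ra->data < rb->data + rb->size) {
    if (ra->data + ra->size < rb->data + rb->size) return ra->size;
    return (size_t)(rb->data + rb->size - ra->data);
  }
  return 0;
}

/* Sum the intersection over every (a, b) pair. A byte range shared by
 * several pairs is counted once per pair, so the total can reach the
 * sum of all sizes. */
static size_t pairwise_isect(const struct range *a, int na,
                             const struct range *b, int nb) {
  size_t total = 0;
  for (int i = 0; i < na; i++)
    for (int j = 0; j < nb; j++) total += range_isect(&a[i], &b[j]);
  return total;
}
```

If both tasks hold two resources that all cover the same range of n bytes, pairwise_isect returns 4 * n, which equals the sum of the four sizes and would make the denominator res_union - res_isect zero under the assumption above.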
Now if I check
(gdb) print res_a[0].data <= res_b[0].data && res_b[0].data < res_a[0].data + res_a[0].size
$12 = 1
(gdb) print res_a[0].data <= res_b[1].data && res_b[1].data < res_a[0].data + res_a[0].size
$13 = 1
so both of res_b's 2 resources overlap res_a[0]. The same is also true for res_a[1]. Since
(gdb) print res_a[0]->size
$4 = 90720
(gdb) print res_a[1]->size
$5 = 90720
(gdb) print res_b[0]->size
$8 = 90720
(gdb) print res_b[1]->size
$9 = 90720
this results in res_isect == res_union, as we add 90720 to res_isect 4 times.
Ok - I think it's not in the resources/uses sorting.
(gdb) print s->res[tb->locks[0]]
$7 = {lock = 1, hold = 0, owner = 11, parent = 32774, data = 0x7fffc8f93360, size = 94512}
(gdb) print s->res[tb->locks[1]]
$8 = {lock = 0, hold = 0, owner = 11, parent = 32774, data = 0x7fffc8fc18c0, size = 97296}
(gdb) print s->res[ta->locks[1]]
$9 = {lock = 0, hold = 0, owner = 11, parent = 32773, data = 0x7fffc8f659d0, size = 94608}
(gdb) print s->res[ta->locks[0]]
$10 = {lock = 0, hold = 0, owner = 11, parent = 32773, data = 0x7fffc8f38bb0, size = 93840}
(gdb) print res_isect
$11 = 398016
(gdb) print res_union
$12 = 398016
(gdb) print res_a[0]
$13 = (struct res *) 0x7fffc5bb6010
(gdb) print *res_a[0]
$14 = {lock = 1, hold = 0, owner = 2, parent = 4693, data = 0x7fffc6d16240, size = 99504}
Notably,
(gdb) print *res_a[0]
$14 = {lock = 1, hold = 0, owner = 2, parent = 4693, data = 0x7fffc6d16240, size = 99504}
matches neither of
(gdb) print s->res[ta->locks[1]]
$9 = {lock = 0, hold = 0, owner = 11, parent = 32773, data = 0x7fffc8f659d0, size = 94608}
(gdb) print s->res[ta->locks[0]]
$10 = {lock = 0, hold = 0, owner = 11, parent = 32773, data = 0x7fffc8f38bb0, size = 93840}
Added 1 commit:
- be009de7 - Potential fix to the queue_task_overlap function