Summary of Eurohack17
=====================

We were unable to profile our code using the MegaKernel™ due to CUDA limitations, so our work focused on the individual tasks instead.

All speedups are given relative to the naive version and were measured on not-yet-fully-optimized code.

What we have tried:

* Shared memory: a speedup of about 3x (for the self-density task, with shared memory large enough to fit the cell)^1
* Symmetry: about 1.5x
* Sorted computation: 70x

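We take the "symmetry" item to mean computing each pair interaction once and scattering the result to both particles, which roughly halves the pair work. A minimal CPU sketch of that idea, with a made-up interaction kernel (the real SPH density kernel is not shown here):

```c
#include <stddef.h>
#include <math.h>

/* Both versions compute, for every particle, the sum of |x_i - x_j| over
 * all other particles (a stand-in for an SPH density contribution). */

/* Naive: each particle loops over all others, so every pair is visited twice. */
void density_naive(const double *x, double *rho, size_t n) {
    for (size_t i = 0; i < n; i++) {
        rho[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            if (j != i) rho[i] += fabs(x[i] - x[j]);
    }
}

/* Symmetric: visit each pair once and update both particles, halving the
 * number of interactions actually computed. */
void density_symmetric(const double *x, double *rho, size_t n) {
    for (size_t i = 0; i < n; i++) rho[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            double w = fabs(x[i] - x[j]);
            rho[i] += w;
            rho[j] += w;
        }
}
```

On a GPU the symmetric update needs atomic or otherwise serialized writes to `rho[j]`, which is presumably why the gain (about 1.5x) is less than the ideal 2x.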
What tricks we have learned:

* Avoid leaving threads in a warp waiting (e.g. in a loop, instead of `if (i == j) continue;`, manually increment the index so the thread does not idle while the others work)

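The trick above can be sketched in plain C (function names are ours). On the GPU, the `continue` branch makes one thread of the warp idle for an iteration; rewriting the loop so the self index is skipped by index arithmetic keeps every iteration doing useful work:

```c
#include <stddef.h>

/* Naive version: every iteration checks j == i, so the thread whose loop
 * reaches its own index branches and sits out one warp iteration. */
double sum_others_naive(const double *x, size_t n, size_t i) {
    double acc = 0.0;
    for (size_t j = 0; j < n; j++) {
        if (j == i) continue;  /* divergent branch on the GPU */
        acc += x[j];
    }
    return acc;
}

/* Transformed version: loop over n-1 iterations and step the index past
 * the self particle branch-free, so no thread waits on the others. */
double sum_others_skip(const double *x, size_t n, size_t i) {
    double acc = 0.0;
    for (size_t k = 0; k + 1 < n; k++) {
        size_t j = k + (k >= i);  /* maps 0..n-2 onto 0..n-1 skipping i */
        acc += x[j];
    }
    return acc;
}
```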
What we are working on:

* Shared memory with a smaller size than the cell size

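The natural way to use shared memory smaller than the cell is to process the cell in tiles: cooperatively load a chunk into the small buffer, interact everything with that chunk, then move on. A minimal CPU sketch under that assumption (`TILE`, the names, and the interaction kernel are ours; on the GPU the copy would be a cooperative load into `__shared__` memory followed by `__syncthreads()`):

```c
#include <stddef.h>
#include <math.h>

#define TILE 4  /* stand-in for the shared-memory capacity, in particles */

/* Accumulate a toy density rho_i = sum over j != i of |x_i - x_j|,
 * processing the cell's particles in TILE-sized chunks. */
void density_tiled(const double *x, double *rho, size_t n) {
    double buf[TILE];  /* the shared-memory analogue */
    for (size_t i = 0; i < n; i++) rho[i] = 0.0;
    for (size_t start = 0; start < n; start += TILE) {
        size_t len = (n - start < TILE) ? n - start : TILE;
        for (size_t k = 0; k < len; k++)       /* "shared" load of one tile */
            buf[k] = x[start + k];
        for (size_t i = 0; i < n; i++)         /* interact everyone with the tile */
            for (size_t k = 0; k < len; k++)
                if (start + k != i) rho[i] += fabs(x[i] - buf[k]);
    }
}
```

The result is independent of `TILE`, so the buffer size can be tuned to the hardware without changing the physics.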
![Screenshot_2017-09-25_14-15-12](/uploads/02029b0cb2021237c111f54b9a0e79c5/Screenshot_2017-09-25_14-15-12.png)

^1 Number quoted from memory, so it may not be exact.