
Profiling/minikernel implementation of self/pair + looking into improvements

Following on from the work @lhausammann and @jborrow started at the hackathon, I think this is currently one of the main things to look into with respect to GPU performance.

Things to experiment with (probably in the order we should do them):

  • Shared memory usage (@lhausammann started looking into this and thinks it's beneficial)
  • Sorted/unsorted pair interactions (@lhausammann also started looking into this)
  • Profiling of minikernels and megakernels (@jborrow started looking at this)
  • Subcelling of large self or pair interactions (similar to the CPU). I believe this is currently a major bottleneck for the SodShock: we interact cells containing thousands of particles with each other using a naive O(n^2) approach, which results in a significant performance loss compared to the CPU.
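
To make the subcelling point concrete, here is a minimal CPU-side sketch in Python. It is not SWIFT code and not the scheme the CPU version uses; it only illustrates why splitting a large cell cuts the number of distance checks. The function names (`naive_pairs`, `subcell_pairs`), the unit-cube cell and the single fixed cut-off `h` are all assumptions made for illustration (real particles carry individual smoothing lengths).

```python
import itertools
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def naive_pairs(pos, h):
    """Self interaction with the naive approach: N*(N-1)/2 distance checks."""
    checks, found = 0, set()
    for i, j in itertools.combinations(range(len(pos)), 2):
        checks += 1
        if dist2(pos[i], pos[j]) < h * h:
            found.add((i, j))
    return checks, found


def subcell_pairs(pos, h, cell_size=1.0):
    """Bin particles into sub-cells of width >= h, then only test particles
    that sit in the same or in directly adjacent sub-cells."""
    n = max(1, int(cell_size / h))  # sub-cells per dimension
    width = cell_size / n           # >= h whenever h <= cell_size
    bins = defaultdict(list)
    for idx, p in enumerate(pos):
        key = tuple(min(int(c / width), n - 1) for c in p)
        bins[key].append(idx)

    checks, found = 0, set()
    for key, members in bins.items():
        for off in itertools.product((-1, 0, 1), repeat=3):
            other = tuple(k + o for k, o in zip(key, off))
            # Visit each unordered pair of sub-cells exactly once.
            if other not in bins or other < key:
                continue
            for i in members:
                for j in bins[other]:
                    if other == key and j <= i:
                        continue  # skip self and duplicate pairs
                    checks += 1
                    if dist2(pos[i], pos[j]) < h * h:
                        found.add((min(i, j), max(i, j)))
    return checks, found


if __name__ == "__main__":
    random.seed(0)
    particles = [tuple(random.random() for _ in range(3)) for _ in range(500)]
    n_naive, pairs_naive = naive_pairs(particles, h=0.125)
    n_sub, pairs_sub = subcell_pairs(particles, h=0.125)
    print("naive checks:  ", n_naive)
    print("subcell checks:", n_sub)
    print("same neighbours found:", pairs_naive == pairs_sub)
```

Both functions return the same set of neighbour pairs; only the number of distance checks differs. On the GPU the same idea would presumably map each sub-cell pair to its own block or minikernel launch, but that mapping is not shown here.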

My preferred route would be for someone else to experiment with the first two items on the list above and write short reports (a few pages?) detailing what works well and what doesn't. Once we have these, I can port the improvements back into the megakernel and see whether they are beneficial, while work on the last two items starts on the minikernels.

@matthieu, does this sound reasonable?
