DOPAIR2 is wrong
While debugging the GPU version with @d74ksy we realised that DOPAIR2 is wrong.
The way we compute the loop start and end along the axis is incorrect: it does not match the search condition. The other scary aspect is that this does not get caught by any of our unit tests. Their particle distributions are too regular to trigger the bug. Moreover, the error we make is small, as most of the contribution comes from the SELF.
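
To illustrate the kind of mismatch meant here, below is a minimal 1D sketch. It is not the actual DOPAIR2 code: the `struct part` layout, the function names and the `max(h_i, h_j)` search condition are all assumptions made for the example. It compares a brute-force pair count against a sorted-axis loop whose exit bound either ignores the neighbour's search radius (mismatched) or accounts for it (matched).

```c
/* Hypothetical 1D particle: position along the sort axis and search radius. */
struct part {
  double x;
  double h;
};

static double max_d(double a, double b) { return a > b ? a : b; }

/* Reference: test every pair against the (assumed) symmetric search
 * condition |x_i - x_j| < max(h_i, h_j). */
int count_pairs_brute(const struct part *ci, int ni,
                      const struct part *cj, int nj) {
  int count = 0;
  for (int i = 0; i < ni; i++) {
    for (int j = 0; j < nj; j++) {
      double dx = cj[j].x - ci[i].x;
      if (dx < 0.0) dx = -dx;
      if (dx < max_d(ci[i].h, cj[j].h)) count++;
    }
  }
  return count;
}

/* Sorted-axis version. Assumes both arrays are sorted by ascending x and
 * that every particle of cj lies at or to the right of every particle of ci.
 * If match_condition is 0 the loop exit uses h_i only, which does not match
 * the symmetric condition. If it is non-zero the exit uses
 * max(h_i, largest h in cj), which is a safe upper bound for that condition. */
int count_pairs_sorted(const struct part *ci, int ni,
                       const struct part *cj, int nj, int match_condition) {
  /* Largest search radius in cj, needed for the matched bound. */
  double hj_max = 0.0;
  for (int j = 0; j < nj; j++) hj_max = max_d(hj_max, cj[j].h);

  int count = 0;
  for (int i = 0; i < ni; i++) {
    const double bound =
        match_condition ? max_d(ci[i].h, hj_max) : ci[i].h;
    for (int j = 0; j < nj; j++) {
      const double dx = cj[j].x - ci[i].x; /* >= 0 by assumption */
      if (dx >= bound) break; /* cj is sorted: no later j can be in range */
      if (dx < max_d(ci[i].h, cj[j].h)) count++;
    }
  }
  return count;
}
```

With identical `h` for every particle, both variants of `count_pairs_sorted` agree with `count_pairs_brute`, which is why a very regular test distribution would not expose this kind of error. As soon as a particle in `cj` has a larger `h` than its partner in `ci`, the mismatched bound can exit the loop too early and drop pairs.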