Subsize
Use separate parameters to determine when a pair or self interaction should be made a sub-cell task.
So the logic behind the recursion condition is that if the self interaction is split, the resulting pairs are still "normal" pair interactions, i.e. the maximum smoothing length will not exceed the cell edge length.
But before submitting, I'd like to know what the best values are for 16 cores. Since we're running on all cores, there's no need to do this on our special machines, so you could start several runs at the same time :)
That's the bit I was wondering about.
If we have a self that can be split, we are going to execute all 8 selfs on the progenitors, and we are going to do all possible pairs between these 8 progenitors. Now, since we are doing all possible combinations, do we actually care whether the particles are in their correct octant? All possible interactions will be processed anyway. Or am I missing something here? Clearly at the next level we will care, but at this level, is this not too restrictive?
/**
 * @brief Can a sub-self hydro task recurse to a lower level based
 * on the status of the particles in the cell.
 *
 * @param c The #cell.
 */
__attribute__((always_inline)) INLINE static int cell_can_recurse_in_self_task(
    const struct cell *c) {

  /* Is the cell split ? */
  /* Note: No need for more checks here as all the sub-pairs and sub-self */
  /* operations will be executed. So no need for the particle to be at exactly */
  /* the right place. */
  return c->split;
}
/**
 * @brief Can a self task associated with a cell be split into smaller
 * sub-tasks.
 *
 * @param c The #cell.
 */
__attribute__((always_inline)) INLINE static int cell_can_split_self_task(
    const struct cell *c) {

  /* Is the cell split ? */
  /* Note: No need for more checks here as all the sub-pairs and sub-self */
  /* tasks will be created. So no need to check for h_max */
  return c->split &&
         (space_stretch * kernel_gamma * c->h_max < 0.5f * c->dmin);
}
In any case, the comment is incorrect for the second of these functions.
Right, but if h_max gets too large, then the cell pairs are all also essentially O(n^2) and we don't gain a thing by splitting. I tried running without the 0.5f * in cell_can_split_self_task, and it was ~10% slower on my laptop with two threads.
Edited by Matthieu Schaller
Added 1 commit:
- 88356221 - Removed the now unused space_maxcount global variable. Documented the new YAML p…
Cool, thanks for the detailed analysis! Should we try to refine the values to two significant digits, or do you think it's not worth it? Also, for the winning combination, can you check whether it affects the runtime on a single thread at all?
The interesting thing is that we should be making sub-cell tasks less often than we were, or, as a consequence, only making them higher up in the hierarchy.
Already submitted these jobs.
The machine is busy so we will have to wait a bit.
Given how flat the region around the minimum is, I don't think it's worth improving these numbers now. We would probably be fitting noise. Other parameters, such as the number of particles per cell or the number of top-level cells, might also be relevant.
Added 1 commit:
- 135d6034 - Removed incorrect merge and diff lines in space.h