Skip to content
Snippets Groups Projects

Subsize

Merged Pedro Gonnet requested to merge subsize into master

Use separate parameters to determine when a pair or self interaction should be made a sub-cell task.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
    • Owner

      Can I suggest to check the functions cell_can_split_self_task and cell_can_recurse_in_self_task in cell.h ? I am not sure I did not make them too restrictive.

  • Thanks, I'll look at it. Seems sensible to have two different values.

    Did you see the comment I made on the diff regarding the content of the decision functions. I think the "self" versions may be too restrictive.

  • Author Developer

    OK, will have a look when I have a minute!

  • I have a huntch that we can do better but it's hard to explain without a picture.

  • The code works (not that I expected otherwise) so how do you propose we test for the best values ? Parameter search on 16 cores ?

  • Author Developer

    Yup, I had a spreadsheet with powers of two for both parameters and was running this by hand.

    I'd definitely optimize for wall-clock time on all cores, as these are the timings we care about.

  • Ok, so if/when you confirm you are happy with the recursion conditions mentioned above, I'll merge this in. It's a necessary change and we can do detailed parameter search using the master branch to make sure we keep things updated.

  • Author Developer

    So the logic behind the recursion condition is that if the self interaction is split, the resulting pairs are still "normal" pair interactions, i.e. the maximum smoothing length will not exceed the cell edge length.

    But before submitting, I'd like to know what the best values are for 16 cores. Since we're running on all cores, there's no need to do this on our special machines, so you could start several runs at the same time :)

  • That's the bit I was wondering about.

    If we have a self that can be split, we are going to execute all the 8 selfs on the progenitors. And we are going to do all possible pairs between these 8 progenitors. Now, since we are doing all possible combinations, do we actually care whether the particles are in their correct octant ? All possible interactions will be processed anyway. Or am I missing something here ? Clearly at the next level we will care, but at this level, is this not to restrictive ?

    /**
     * @brief Can a sub-self hydro task recurse to a lower level based
     * on the status of the particles in the cell.
     *
     * @param c The #cell.
     */
    __attribute__((always_inline)) INLINE static int cell_can_recurse_in_self_task(
        const struct cell *c) {
    
      /* Is the cell split ? */
      /* Note: No need for more checks here as all the sub-pairs and sub-self */
      /* operations will be executed. So no need for the particle to be at exactly
       */
      /* the right place. */
      return c->split;
    }
    /**
     * @brief Can a self task associated with a cell be split into smaller
     * sub-tasks.
     *
     * @param c The #cell.
     */
    __attribute__((always_inline)) INLINE static int cell_can_split_self_task(
        const struct cell *c) {
    
      /* Is the cell split ? */
      /* Note: No need for more checks here as all the sub-pairs and sub-self */
      /* tasks will be created. So no need to check for h_max */
      return c->split && (space_stretch * kernel_gamma * c->h_max < 0.5f * c->dmin);
    }

    In any case, the comment is incorrect for the second of these functions.

  • Author Developer

    Right, but if h_max gets too large, then the cell pairs are all also essentially O(n^2 ) and we don't gain a thing by splitting.

    I tried running without the 0.5f * in cell_can_split_self_task, and it was ~ 10% slower on my laptop with two threads.

    Edited by Matthieu Schaller
  • My initial search on 16 cores, exploring powers of 2, favours cell_sub_size_self = 32'000 and cell_sub_size_pair = 64'000'000.

    histogram

    Now, clearly I am not bracketing the minimum in the PAIR value so I'll resubmit more jobs.

  • The run with minimal time is about 5% faster than master on 16 cores.

  • Added 1 commit:

    • 88356221 - Removed the now unused space_maxcount global variable. Documented the new YAML p…
  • And here is the updated version with a true minimum. Note the colour scale has changed so can't compare with the previous one directly.

    histogram

    Lowest runtime on 16 cores obtained for cell_sub_size_self = 32'000 and cell_sub_size_pair = 256'000'000

  • Author Developer

    Cool, thanks for the detailed analysis! Should we try to refine the values to two significant digits, or do you think it's not worth it? Also, for the winning combination, can you check if it affects the runtime at one single thread at all?

    The interesting thing is that we should be making sub-cell tasks less often than we currently were, or, by consequence, only make them higher up in the hierarchy.

  • Author Developer

    Oh, and if you can, for the winning combination, task plots would be cool to sanity-check the results :)

    Thanks!

  • Already submitted these jobs. :smile:

    The machine is busy so we will have to wait a bit.

    Given how flat the region is around the minimum, I don't think it's worth improving these numbers now. We would probably be fitting noise. Also some other parameters such as the number of particles per cell or the number of top-level cells might also be relevent.

  • Matthieu Schaller Added 2 commits:

    Added 2 commits:

    • f9bffbff - Updated the default values of space_subsize_pair space_subsize_self to the optim…
    • 951989db - Merge branch 'subsize' of gitlab.cosma.dur.ac.uk:swift/swiftsim into subsize
  • Added 1 commit:

    • 135d6034 - Removed incorrect merge and diff lines in space.h
  • scaling

    Overall faster, but not better scaling. Big difference over MPI but I don't know why.

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading