Inhibit vec

Merged James Willis requested to merge inhibit_vec into master

Implements #481 (closed).

  • Optimised the way we handle inhibited particles in the density and force neighbour searches
  • Make sure inhibited particles do not interact in the vectorised versions of the density and force neighbour searches
Edited by Matthieu Schaller

Activity

  • I have made a small change: >= instead of ==, since I use an extra time-bin value for some things. These particles should never appear here, but let's be safe.

    Any reason why no changes were applied to cache_read_particles_subset()?

    Also, do you have a way of testing the vectorized version? The debugging checks inside the interaction function are only present in the scalar code.

    Finally, do you know whether the cache construction with the new continue in it or the padding still vectorizes?
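
The `>=` versus `==` point above can be illustrated with a minimal sketch. The constant names and values and the `part_is_inhibited()` helper shown here are assumptions made for illustration, not necessarily what the code base defines; the only idea taken from the comment is that any time-bin at or above the inhibited value is treated as non-interacting.

```c
/* Hypothetical time-bin values, for illustration only. */
enum {
  time_bin_inhibited = 56, /* assumed value of the "inhibited" bin */
  time_bin_extra = 57      /* assumed extra bin lying above the inhibited one */
};

/* Reduced particle: only the field needed for the check. */
struct part {
  signed char time_bin;
};

/* Using >= rather than == means a particle carrying the extra
 * time-bin value is also kept out of the neighbour loops. */
static int part_is_inhibited(const struct part *p) {
  return p->time_bin >= time_bin_inhibited;
}
```

With `==`, a particle in the hypothetical `time_bin_extra` bin would be treated as a normal neighbour; with `>=` it is excluded along with the inhibited ones.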

  • Matthieu Schaller changed the description


  • James Willis added 1 commit


  • James Willis added 1 commit


    • bdc6671e - Fix DOPAIR_SUBSET_NAIVE() when debugging checks are enabled.


  • Matthieu Schaller resolved all discussions


  • I think I missed cache_read_particles_subset(); I will do that next.

    The inhibited particles will appear in the vectorised interaction function but will be masked out afterwards.

    So the pair cache construction still vectorises, but the self cache construction doesn't with AVX, because the uninhibited_count variable is used to index the array. That said, when I benchmark it I get the same performance as the master branch. Looking at the vectorisation report again shows that the loop doesn't actually get vectorised, as the expected speedup is too low. The self cache construction does vectorise on COSMA7 with AVX512, though.
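
The indexing issue described above can be sketched as follows. This is not the actual cache_read_particles() code: the struct layouts, the `TIME_BIN_INHIBITED` value and the `cache_fill_compact()` name are hypothetical. It only shows why a running `uninhibited_count` index makes the loop harder to vectorise: the write position of each particle depends on how many earlier particles passed the test.

```c
#include <stddef.h>

#define TIME_BIN_INHIBITED 56 /* assumed value, illustration only */

/* Reduced particle and cache layouts (hypothetical). */
struct part {
  float x[3];
  float h;
  signed char time_bin;
};

struct cache {
  float *x, *y, *z, *h;
};

/* Copy only the uninhibited particles into the cache and return how
 * many were copied. The output index uninhibited_count is a
 * loop-carried dependency: iteration i cannot know where to write
 * until all earlier iterations have decided whether to skip. */
static size_t cache_fill_compact(struct cache *c, const struct part *parts,
                                 size_t count) {
  size_t uninhibited_count = 0;
  for (size_t i = 0; i < count; i++) {
    if (parts[i].time_bin >= TIME_BIN_INHIBITED) continue;
    c->x[uninhibited_count] = parts[i].x[0];
    c->y[uninhibited_count] = parts[i].x[1];
    c->z[uninhibited_count] = parts[i].x[2];
    c->h[uninhibited_count] = parts[i].h;
    uninhibited_count++;
  }
  return uninhibited_count;
}
```

Instruction sets with compress-store support (AVX-512, for example) can express this pattern directly, which is consistent with, though not proof of, the different behaviour reported for the two machines.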

  • Thanks. Sounds good.

    I am just uneasy about the inability to formally test it but don't have anything to suggest right now.

  • Can you remind me which cache construction function is used for the SELF and which is used for the PAIR?

  • cache_read_particles is the SELF and cache_read_two_partial_cells_sorted is the PAIR.

  • James Willis added 1 commit


    • 1483f29f - Place inhibited particles out of range of neighbouring particles so no…
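
One way to read the idea named in this commit title is sketched below, under stated assumptions. Every particle keeps its own cache slot, so the write index is simply `i`, and inhibited particles are given a far-away position that fails the usual distance test. The names `cache_fill_out_of_range()`, `in_range()` and `far_pos` are hypothetical, and the sketch assumes the caller picks `far_pos` farther from every real particle than any smoothing length.

```c
#include <stddef.h>

#define TIME_BIN_INHIBITED 56 /* assumed value, illustration only */

/* Reduced particle and cache layouts (hypothetical). */
struct part {
  float x[3];
  float h;
  signed char time_bin;
};

struct cache {
  float *x, *y, *z, *h;
};

/* Fill one cache slot per particle. Inhibited particles are moved to
 * far_pos, which the caller must choose so that it lies outside the
 * search radius of every real particle. No running counter is needed,
 * so each iteration is independent of the others. */
static void cache_fill_out_of_range(struct cache *c, const struct part *parts,
                                    size_t count, const float far_pos[3]) {
  for (size_t i = 0; i < count; i++) {
    const int inhibited = parts[i].time_bin >= TIME_BIN_INHIBITED;
    c->x[i] = inhibited ? far_pos[0] : parts[i].x[0];
    c->y[i] = inhibited ? far_pos[1] : parts[i].x[1];
    c->z[i] = inhibited ? far_pos[2] : parts[i].x[2];
    c->h[i] = parts[i].h;
  }
}

/* Density-style range test: is cached particle j within h_i of particle i? */
static int in_range(const struct cache *c, size_t i, size_t j) {
  const float dx = c->x[i] - c->x[j];
  const float dy = c->y[i] - c->y[j];
  const float dz = c->z[i] - c->z[j];
  const float r2 = dx * dx + dy * dy + dz * dz;
  return r2 < c->h[i] * c->h[i];
}
```

The trade-off in this sketch is that inhibited particles still occupy cache slots and are still visited by the neighbour loop; they are rejected by the distance test rather than by an explicit branch.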


  • added 4 commits

    • 6ee6efb8 - In runner_doself1_density_vec(), construct the padded positions only once for…
    • 0c77373b - In runner_doself1_density_vec() do not check for r2 > 0. This is never done in…
    • ab7e78da - Revert "In runner_doself1_density_vec() do not check for r2 > 0. This is never…
    • 61172cf8 - Cosmetic changes.

