Draft: Decrease default vectorization threshold when compiling gravity interactions

Closed Peter W. Draper requested to merge vec-threshold into master

When optimizing with AVX2, some of the loops (notably at line 1092) are no longer vectorized. Decreasing the default vectorization threshold makes the compiler vectorize them again and gives good speed-ups.

Fixes #865 (closed)

Now marked as draft, as the real fix may lie in a regression in how we use OpenMP pragmas as compilation hints.

Edited by Peter W. Draper

Closed by Peter W. Draper 1 year ago (Oct 27, 2023 12:00pm UTC)

Merge details

  • The changes were not merged into master.

Activity

  • Peter W. Draper changed the description

  • mentioned in issue #865 (closed)

  • Peter W. Draper mentioned in merge request !1763 (closed)

  • Is that also good on Intel?

  • Yes. I tried this without any AVX512 flags on cosma7 (so just -march=core-avx2) and it shows an improvement as well; in fact, it is probably just as fast in this test.

  • From the big job:

    Last five steps involving all the particles before the change in flags:

    #   Step           Time Scale-factor     Redshift      Time-step Time-bins      Updates    g-Updates    s-Updates sink-Updates    b-Updates  Wall-clock time [ms]  Props Dead time [ms]
        5678   6.608618e-03    0.5371942    0.8615242   1.140760e-06   41   46            0 1199808512000            0            0            0            839950.938    257             39973.425
        5710   6.645208e-03    0.5393804    0.8539791   1.145912e-06   41   49            0 1199808512000            0            0            0            859338.562    257             42677.680
        5742   6.681961e-03    0.5415755    0.8464646   1.151054e-06   41   46            0 1199808512000            0            0            0            877712.125    257             43392.041
        5774   6.718881e-03    0.5437795    0.8389806   1.156219e-06   41   47            0 1199808512000            0            0            0            863896.125    257             40606.240
        5806   6.755965e-03    0.5459925    0.8315269   1.161391e-06   41   46            0 1199808512000            0            0            0            830822.625    257             41168.996

    First five steps involving all the particles after the change in flags:

    #   Step           Time Scale-factor     Redshift      Time-step Time-bins      Updates    g-Updates    s-Updates sink-Updates    b-Updates  Wall-clock time [ms]  Props Dead time [ms]
        5838   6.793214e-03    0.5482145    0.8241034   1.166552e-06   41   48            0 1199808512000            0            0            0            747684.812    257             29803.102
        5870   6.830630e-03    0.5504456    0.8167100   1.171735e-06   41   46            0 1199808512000            0            0            0            774931.125    257             32042.306
        5902   6.868211e-03    0.5526857    0.8093465   1.176924e-06   41   47            0 1199808512000            0            0            0            778530.625    257             30668.313
        5934   6.905958e-03    0.5549350    0.8020129   1.182110e-06   41   46            0 1199808512000            0            0            0            768847.062    257             31783.021
        5966   6.943871e-03    0.5571934    0.7947091   1.187301e-06   41   50            0 1199808512000            0            0            0            793773.250    257             33252.469

    The code did redistribute after some of these steps but the reported imbalance remains about the same at 38%.

    Edited by Matthieu Schaller
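
To put a rough number on the two log excerpts above, the sketch below averages the "Wall-clock time [ms]" and "Dead time [ms]" columns over the five steps on each side of the flag change. The values are copied from the logs; the helper name `relative_reduction` is made up for this illustration and is not part of the code base. The two groups cover different simulation steps and the code repartitioned in between, so this is only an indicative comparison, not a controlled benchmark.

```python
from statistics import mean

# "Wall-clock time [ms]" column, copied from the log excerpts above.
wall_before = [839950.938, 859338.562, 877712.125, 863896.125, 830822.625]
wall_after = [747684.812, 774931.125, 778530.625, 768847.062, 793773.250]

# "Dead time [ms]" column for the same steps.
dead_before = [39973.425, 42677.680, 43392.041, 40606.240, 41168.996]
dead_after = [29803.102, 32042.306, 30668.313, 31783.021, 33252.469]


def relative_reduction(before, after):
    """Fractional drop of the mean going from `before` to `after` (hypothetical helper)."""
    return (mean(before) - mean(after)) / mean(before)


wall_gain = relative_reduction(wall_before, wall_after)
dead_gain = relative_reduction(dead_before, dead_after)

print(f"mean wall-clock: {mean(wall_before):.0f} ms -> {mean(wall_after):.0f} ms "
      f"({wall_gain:.1%} lower)")
print(f"mean dead time:  {mean(dead_before):.0f} ms -> {mean(dead_after):.0f} ms "
      f"({dead_gain:.1%} lower)")
```

On these ten steps the mean wall-clock time per step drops by between 9 and 10 percent and the mean dead time by about 24 percent.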
  • With the code repartitioning, it's hard to extract more specific numbers for the gravity-only contribution from the logs, as the number of particles on rank 0 (the one reporting) varies quite a bit.

  • I've run some DMO tests with EAGLE_50x2 and that still seems better, so it is not just some small-volume effect (like more Turbo Boost). That still leaves open the question of how the different resolution may impact the choice of interactions. Are there some smaller example snapshots of this run, or maybe of FLAMINGO at this resolution?

  • Peter W. Draper resolved all threads

  • Matthieu Schaller resolved all threads

  • Peter W. Draper marked this merge request as draft

  • Peter W. Draper changed the description

  • Closing as superseded by !1804 (merged)
