Looking at the MPICH documentation, this should work.
So maybe also file a bug against MPICH?
Cheers, Pedro
For context, SWIFT was originally designed as a library since we thought it was likely to be integrated as a part of existing cosmological simulation codes, and not just as a stand-alone executable. The reasoning behind this was the perceived reluctance of users to switch codes.
Note that originally, this approach served us well since it forced us to separate different parts of the code properly, and provide reasonable and well-documented APIs for the client-facing functions.
Seeing, though, that SWIFT is doing quite well as a stand-alone code and does not seem to have ever been integrated into another code, I guess it's OK to provide a top-level executable (and not hide it in `examples/swift`).
Cheers, Pedro
Sorry, my bad! I was misremembering this change.
It might be easier, or more readable, to just do:

```c
for (double dist = 1.0; 1.0 + dist > 1.0; dist /= 2) {
  ...
}
```
Also, Jacob mentioned issues where `r` was almost at `h`, so possibly also test particles `1.0 - dist` apart?
Cheers, Pedro
Cool, thanks for clarifying!
Somewhat unrelated question while we're talking about performance...
Is it safe to assume that all these functions will compute `r = sqrtf(r2)`? If so, why not pull that computation out and pass in both `r2` and `r`?
Also, we seem to be doing

```c
const float r = sqrtf(r2);
const float r_inv = 1.0f / r;
```

and not

```c
const float r_inv = 1.0f / sqrtf(r2);
const float r = r2 * r_inv;
```

While the latter looks unnecessarily cumbersome, it allows the compiler to use the `rsqrtss` instruction, which is ~3x faster than the `sqrtss` instruction generated by the former (see the Intel Intrinsics Guide). Note that the latter generates some extra code to "manually" add an extra Newton iteration to the result of `rsqrtss` (see Godbolt), so it probably won't be 3x faster, but those extra instructions can be ILP'd with other code. Note furthermore that the latter variant does not require the `divss` of the former, which is almost as expensive as `sqrtss`.
Cheers, Pedro
We check on start-up whether particles are already at a distance of zero. But of course that only covers the IC case.
I think what Jacob was suggesting was to run a unit test on the particle interaction scheme at the start of the simulation, e.g. with a fake set of particles at the prescribed pairwise distances, and check for non-finite values there.
Cheers, Pedro
Thanks for the clarification!
So one relatively common way to avoid the `0 / Inf` issue would be to compute

```c
const float r_inv = r ? 1.0f / r : 0.0f;
```

which is common enough that the compiler should be able to vectorize it, or at least avoid a branch misprediction penalty.
I also like the suggestion of having a single `DEBUG_CHECK`-guarded test somewhere at the start of the simulation. I wouldn't test a single small `r / h`, but rather zero plus the range 0.5, 0.25, 0.125, ... down to machine epsilon, as well as 0.5, 0.75, 0.875, ... up to 1.0 (again to within machine epsilon).
WDYT?
Cheers, Pedro
Nice! Do you know exactly what was causing the overflows/NaNs in those functions?
Cheers, Pedro
Thanks for the clarifications!
Regarding your first point, powers of `u = r/h` should silently underflow to zero, which is fine.
I like the suggestion of adding `FLT_MIN * h` to `r2`, but I would still like to understand where the numerical issues for small `r2` or large `r_inv` happen.
Ideally, we would have a unit test of sorts that checks each scheme for values of `r` infinitesimally near zero or `h`, which would both root-cause (I too can verb ;)) the current issue and guard against bad numerics in future schemes. I would try to code this myself, but I'm parenting all day today and I think you want a fix sooner rather than later.
Cheers, Pedro
Agreed, I know that can happen, but why `hydro_distance_to_h_min_ratio` and not just check for `r2 == 0.0f`?
Also, why do this check in the main loop and not in the interaction functions themselves, i.e. to avoid the `if`-statement being evaluated for all pairs, regardless of whether they are in range or not?
Or even replace `if (r2 < hig2) {...}` with `if (r2 > 0.0f && r2 < hig2) {...}`? (Actually, here I'm guessing there is some part of the interaction that still contributes if `r2 == 0`, but then wouldn't it make more sense to just `if`-out that part specifically?)
Drive-by code review :)
What I'm most curious about, though, is why this is needed... The value `r2` is computed as a sum of squares, so it should always be greater than or equal to zero, so I'm guessing the loss of precision happens somewhere in `IACT_NONSYM` or in one of the `runner_iact_nonsym_*` functions?
If so, why not catch it there specifically? This would have the advantage of adding a conditional statement not for every pairwise interaction, but only for those within range.
Cheers, Pedro
Expand the comment to explain what you are doing with the random values below?
Adding a conditional statement for each pairwise distance computation is going to ruin a lot of performance optimizations, e.g. ILP and vectorization, beyond the delay introduced by the branch itself.
Did you try to measure if and how this affects performance?
"each others" -> "each other."
"interation loops." -> "interaction loops."
Just chiming in from the sidelines here :)
First of all, these are awesome results!
Secondly, topology can matter a lot when there is congestion, e.g. the probability of losing a packet grows exponentially with the number of network hops. Also, the latency per hop will depend on how much traffic there is at the switch.
Note that while these issues may only affect a small fraction of all packets sent back and forth, if they block for several milliseconds, it will cause stragglers and thus affect the entire simulation.
Peter, do you have any data on packet loss or latency over the switches as a function of network load? Or can things like packet loss and latency be measured at the switches at runtime (i.e. on the switch itself, outside of `swift`)?
Cheers, Pedro