Skip to content
Snippets Groups Projects
Commit c8b6c872 authored by Matthieu Schaller's avatar Matthieu Schaller
Browse files

Updated the short-range gravity correction to the much faster sigmoid version.

parent b9a6c0e4
No related branches found
No related tags found
1 merge request!393Periodic gravity speed and accuracy improvements
...@@ -36,4 +36,17 @@ __attribute__((always_inline)) INLINE static float approx_expf(float x) { ...@@ -36,4 +36,17 @@ __attribute__((always_inline)) INLINE static float approx_expf(float x) {
return 1.f + x * (1.f + x * (0.5f + x * (1.f / 6.f + 1.f / 24.f * x))); return 1.f + x * (1.f + x * (0.5f + x * (1.f / 6.f + 1.f / 24.f * x)));
} }
/**
* @brief Approximate version of expf(x) using a 6th order Taylor expansion
*
*/
__attribute__((always_inline)) INLINE static float good_approx_expf(float x) {
  • Developer

    what kind of accuracy are you looking for in what range? there's likely a lower-degree polynomial for that specific case...

  • Please register or sign in to reply
  • Actually, I need the best possible approximation to e^x / ( 1+e^x ) for x in [0, 6]. It can just be 1 for any large x and I don't care about negative values. Might be that there is a Pade approximant of order less than 6 that does better than my current use of the approximate exp(). Note though that I can re-use the same exp(x) in both the numerator and denominator.

    I need to work out the approximation quality I need. Essentially if I am close enough to the true value then I can use the analytic expression for the corresponding function in Fourier space. This is already much much faster than before.

    I need to work out what "close enough" means though in terms of accuracy of the whole method.

    Edited by Matthieu Schaller
  • Please register or sign in to reply
  • One more thing: With this choice of function and the right coefficients, you can show that the correction factor is close to 1 (within less than 0.001) for all distances that are less than 0.1 of the mesh cell size. That means that for all self/pairs that take place at 4 levels or more below the top-level of the tree (where the FFT is performed), there is no need to compute any of this. With the erf choice, this is only true at 0.005 of the top-level mesh size, which is rarely reached.

    This feature comes from the sharper transition of this function compared to the erf and allows to avoid computing any of these terms for most of the interesting situations.

    I still need to write the bit of code that exploits this feature but it will make most of the discussion less relevant.

  • Please register or sign in to reply
  • Developer

    well, it just so happens that i know a thing or two about rational/padé interpolation :smiley:

    let me see what i can do about exp(x)/(1+exp(x)). i'll start by aiming for single-precision accuracy and we can take it from there.

  • Please register or sign in to reply
  • Alright, then lets spice it up a bit. :smile:

    We also need to evaluate the first few derivatives of that thing for the multipole-multipole part of the calculation. The formulation I currently use is good as it does not require me to recompute more things. I can just build them using powers of 1 / ( 1 + e^x ). I think that this is a nice property to retain.

  • Please register or sign in to reply
return 1.f +
x * (1.f +
x * (0.5f +
x * ((1.f / 6.f) +
x * ((1.f / 24.f) +
x * ((1.f / 120.f) + (1.f / 720.f) * x)))));
}
#endif /* SWIFT_APPROX_MATH_H */ #endif /* SWIFT_APPROX_MATH_H */
...@@ -19,24 +19,30 @@ ...@@ -19,24 +19,30 @@
#ifndef SWIFT_KERNEL_LONG_GRAVITY_H #ifndef SWIFT_KERNEL_LONG_GRAVITY_H
#define SWIFT_KERNEL_LONG_GRAVITY_H #define SWIFT_KERNEL_LONG_GRAVITY_H
#include <math.h> /* Config parameters. */
#include "../config.h"
/* Includes. */ /* Local headers. */
#include "approx_math.h"
#include "const.h" #include "const.h"
#include "inline.h" #include "inline.h"
#include "vector.h"
#define one_over_sqrt_pi ((float)(M_2_SQRTPI * 0.5)) /* Standard headers */
#include <math.h>
/** /**
* @brief Computes the long-range correction term for the FFT calculation. * @brief Computes the long-range correction term for the FFT calculation.
* *
* @param u The ratio of the distance to the FFT cell scale $u = x/A$. * @param u The ratio of the distance to the FFT cell scale \f$u = r/r_s\f$.
* @param W (return) The value of the kernel function. * @param W (return) The value of the kernel function.
*/ */
__attribute__((always_inline)) INLINE static void kernel_long_grav_eval( __attribute__((always_inline)) INLINE static void kernel_long_grav_eval(
float u, float *const W) { float u, float *const W) {
#ifdef GADGET2_LONG_RANGE_CORRECTION
const float one_over_sqrt_pi = ((float)(M_2_SQRTPI * 0.5));
const float arg1 = u * 0.5f; const float arg1 = u * 0.5f;
const float arg2 = u * one_over_sqrt_pi; const float arg2 = u * one_over_sqrt_pi;
const float arg3 = -arg1 * arg1; const float arg3 = -arg1 * arg1;
...@@ -45,6 +51,35 @@ __attribute__((always_inline)) INLINE static void kernel_long_grav_eval( ...@@ -45,6 +51,35 @@ __attribute__((always_inline)) INLINE static void kernel_long_grav_eval(
const float term2 = arg2 * expf(arg3); const float term2 = arg2 * expf(arg3);
*W = term1 + term2; *W = term1 + term2;
#else
const float arg = 2.f * u;
const float exp_arg = good_approx_expf(arg);
const float term = 1.f / (1.f + exp_arg);
*W = arg * exp_arg * term * term - exp_arg * term + 1.f;
*W *= 2.f;
#endif
}
/**
* @brief Returns the long-range truncation of the Poisson potential in Fourier
* space.
*
* @param u2 The square of the Fourier mode times the cell scale
* \f$u^2 = k^2r_s^2\f$.
* @param W (return) The value of the kernel function.
*/
__attribute__((always_inline)) INLINE static void fourier_kernel_long_grav_eval(
double u2, double *const W) {
#ifdef GADGET2_LONG_RANGE_CORRECTION
*W = exp(-u2);
#else
const double u = sqrt(u2);
const double arg = M_PI_2 * u;
*W = arg / sinh(arg);
#endif
} }
#endif // SWIFT_KERNEL_LONG_GRAVITY_H #endif // SWIFT_KERNEL_LONG_GRAVITY_H
...@@ -20,9 +20,6 @@ ...@@ -20,9 +20,6 @@
/* Config parameters. */ /* Config parameters. */
#include "../config.h" #include "../config.h"
/* Some standard headers. */
#include <pthread.h>
#ifdef HAVE_FFTW #ifdef HAVE_FFTW
#include <fftw3.h> #include <fftw3.h>
#endif #endif
...@@ -33,6 +30,7 @@ ...@@ -33,6 +30,7 @@
/* Local includes. */ /* Local includes. */
#include "engine.h" #include "engine.h"
#include "error.h" #include "error.h"
#include "kernel_long_gravity.h"
#include "runner.h" #include "runner.h"
#include "space.h" #include "space.h"
#include "timers.h" #include "timers.h"
...@@ -242,7 +240,9 @@ void runner_do_grav_fft(struct runner* r, int timer) { ...@@ -242,7 +240,9 @@ void runner_do_grav_fft(struct runner* r, int timer) {
if (k2 == 0.) continue; if (k2 == 0.) continue;
/* Green function */ /* Green function */
const double green_cor = green_fac * exp(-k2 * a_smooth2) / k2; double W;
fourier_kernel_long_grav_eval(k2 * a_smooth2, &W);
const double green_cor = green_fac * W / k2;
/* Deconvolution of CIC */ /* Deconvolution of CIC */
const double CIC_cor = sinc_kx_inv * sinc_ky_inv * sinc_kz_inv; const double CIC_cor = sinc_kx_inv * sinc_ky_inv * sinc_kz_inv;
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment