diff --git a/configure.ac b/configure.ac index 82f4f55cc27d5e95d7382749e1b80328bf5bc79e..5c46efc1a1a47837d9d75e4dcfc0632ea9d98b49 100644 --- a/configure.ac +++ b/configure.ac @@ -16,7 +16,7 @@ # along with this program. If not, see <http://www.gnu.org/licenses/>. # Init the project. -AC_INIT([SWIFT],[0.7.0],[https://gitlab.cosma.dur.ac.uk/swift/swiftsim]) +AC_INIT([SWIFT],[0.8.0],[https://gitlab.cosma.dur.ac.uk/swift/swiftsim]) swift_config_flags="$*" AC_COPYRIGHT diff --git a/doc/RTD/source/Cooling/index.rst b/doc/RTD/source/Cooling/index.rst index 0ca3f62f3ca6fdff2abb95501681dc7bf4676fd2..d9462b5233334769f163883ffd9e859cc7b69767 100644 --- a/doc/RTD/source/Cooling/index.rst +++ b/doc/RTD/source/Cooling/index.rst @@ -31,7 +31,7 @@ cooling contains a temperature floor avoiding negative temperature. Grackle ~~~~~~~ -Grackle is a chemistry and cooling library presented in B. Smith et al. 2016 +Grackle is a chemistry and cooling library presented in `B. Smith et al. 2016 <https://arxiv.org/abs/1610.09591>`_ (do not forget to cite if used). Four different modes are available: equilibrium, 6 species network (H, H\\( ^+ \\), e\\( ^- \\), He, He\\( ^+ \\) and He\\( ^{++} \\)), 9 species network (adds H\\(^-\\), H\\(_2\\) and diff --git a/doc/RTD/source/VELOCIraptorInterface/index.rst b/doc/RTD/source/VELOCIraptorInterface/index.rst index 3ae17f04b9950d30d1484ac5117b65063b7739c6..312b9cd3f893dd44f814ad80ac40748db12bd4d5 100644 --- a/doc/RTD/source/VELOCIraptorInterface/index.rst +++ b/doc/RTD/source/VELOCIraptorInterface/index.rst @@ -5,7 +5,7 @@ VELOCIraptor Interface ====================== This section includes information on the VELOCIraptor interface implemented in -SWIFT. There are mainly four subsection; the first section explains shortly +SWIFT. There are mainly four subsections; the first section explains shortly how VELOCIraptor works, the second subsection explains how to configure SWIFT with VELOCIraptor, the third subsection explains how to configure a standalone version of VELOCIraptor and the last subsection explains how the output format diff --git a/doc/RTD/source/VELOCIraptorInterface/output.rst b/doc/RTD/source/VELOCIraptorInterface/output.rst index cf0e386d4a7de982dcca5a2ddc5ebd642c46ec34..2b0b88eea07df607b71018bdd9bc76d046e76477 100644 --- a/doc/RTD/source/VELOCIraptorInterface/output.rst +++ b/doc/RTD/source/VELOCIraptorInterface/output.rst @@ -46,7 +46,7 @@ The second file that is produced by VELOCIraptor is the ``.catalog_particles`` file, this file contains mainly all the IDs of the particles and has two interesting parameters: -+ The ``Num_of_particles_in_groups`` and ``Num_of_particles_in_groups`` ++ The ``Num_of_particles_in_groups`` and ``Total_num_of_particles_in_all_groups`` parameter: Gives the total number of particles in the file or the total number of particles that are in halos. + The ``Particle_IDs``: The list of particles as sorted by halo, in which halo @@ -136,7 +136,7 @@ the unbound particles. Properties file --------------- -The Fourth file is the ``.properties`` file, this file contains many physical +The fourth file is the ``.properties`` file, this file contains many physical useful information of the corresponding halos. This can be divided in several useful groups of physical parameters, on this page we have divided the several variables which are present in the ``.properties`` file. This file has most @@ -179,7 +179,7 @@ Bryan and Norman 1998 properties: + ``Mass_BN98``, The Bryan and Norman (1998) determination of the mass of the halo [#BN98]_. -+ ``R_BN98``, the Bryan and Norman (1998) corresponding radius[#BN98]_. ++ ``R_BN98``, the Bryan and Norman (1998) corresponding radius [#BN98]_. Several Mass types: """"""""""""""""""" diff --git a/doc/RTD/source/VELOCIraptorInterface/whatis.rst b/doc/RTD/source/VELOCIraptorInterface/whatis.rst index e7f067ec4723e41da0f3a95dddad8f55d9897e85..bc14e0925a288315c311972d97951abfc91426a6 100644 --- a/doc/RTD/source/VELOCIraptorInterface/whatis.rst +++ b/doc/RTD/source/VELOCIraptorInterface/whatis.rst @@ -12,7 +12,7 @@ What is VELOCIraptor? In SWIFT it is possible to run a cosmological simulation and at the same time do on the fly halo finding at specific predefined intervals. For finding the -Halos SWIFT uses VELOCIraptor (Elahi, Thacker and Widrow; 2011) [#velo]_, this +halos SWIFT uses VELOCIraptor (Elahi, Thacker and Widrow; 2011) [#velo]_, this is a C++ halo finder that can use MPI. It differs from other halo finder algorithms in the sense that it uses the velocity distributions of the particles in the simulations and the the positions of the particles to get @@ -22,7 +22,7 @@ whether there are substructures in halos. The Algorithm ------------- -The VELOCIraptor algorithm consist basically off the following steps [#ref]_: +The VELOCIraptor algorithm consists basically of the following steps [#ref]_: 1. A kd-tree is constructed based on the maximization of the Shannon-entropy, this means that every level in the kd-tree an equal number of particles @@ -32,21 +32,21 @@ The VELOCIraptor algorithm consist basically off the following steps [#ref]_: takes into account the absolute positions of the particles. 2. The next part is calculating the the centre of mass velocity and the velocity distribution for every individual node in the kd-tree. -3. Than the algorithm estimates the background velocity density function for +3. Then the algorithm estimates the background velocity density function for every particle based on the cell of the particle and the six nearest neighbour cells. This prevents the background velocity density function to be over sensitive for variations between different cells due to dominant halo features in the velocity density function. 4. After this the algorithm searches for the nearest velocity neighbours (:math:`N_v`) from a set of nearest position neighbours (:math:`N_x>N_v`). - The position neighbours do not need to be in the cell of the particles, in + The neighbours' positions do not need to be in the cell of the particles, in general the set of nearest position neighbours is substantially larger than the nearest velocity neighbours, the default is set as :math:`N_x=32 N_v`. 5. The individual local velocity density function is calculated for every particle. 6. The fractional difference is calculated between the local velocity density function and the background velocity density function. -7. Based on the calculated ratio outliers are picked and the outliers are +7. Based on the calculated ratio, outliers are picked and the outliers are grouped together in halos and subhalos. diff --git a/examples/main.c b/examples/main.c index 5ae01a9280ac84c11513137cd5b26c9e8966f7b9..ecec9637be3f8c868be09496351ec9994038d28d 100644 --- a/examples/main.c +++ b/examples/main.c @@ -1230,7 +1230,7 @@ int main(int argc, char *argv[]) { /* Write final output. */ if (!force_stop) { - engine_drift_all(&e); + engine_drift_all(&e, /*drift_mpole=*/0); engine_print_stats(&e); #ifdef WITH_LOGGER logger_log_all(e.logger, &e); diff --git a/src/Makefile.am b/src/Makefile.am index afe4002b29b55a0f5ea4a0aff9ba1dd5adefc005..9b0610667bdcf1f760f6e94d4481848a8fc4d0f0 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -53,8 +53,8 @@ include_HEADERS = space.h runner.h queue.h task.h lock.h cell.h part.h const.h \ # Common source files AM_SOURCES = space.c runner.c queue.c task.c cell.c engine.c engine_maketasks.c \ - engine_marktasks.c serial_io.c timers.c debug.c scheduler.c proxy.c parallel_io.c \ - units.c common_io.c single_io.c multipole.c version.c map.c \ + engine_marktasks.c engine_drift.c serial_io.c timers.c debug.c scheduler.c \ + proxy.c parallel_io.c units.c common_io.c single_io.c multipole.c version.c map.c \ kernel_hydro.c tools.c part.c partition.c clocks.c parser.c \ physical_constants.c potential.c hydro_properties.c \ threadpool.c cooling.c sourceterms.c \ diff --git a/src/cell.c b/src/cell.c index 5515838ed8d8f31dcb80535ace6238b1beccf3f8..eb702fbf7e9d93e80ff31df68cad7a07db6f8d29 100644 --- a/src/cell.c +++ b/src/cell.c @@ -3292,7 +3292,7 @@ void cell_set_super_mapper(void *map_data, int num_elements, void *extra_data) { /* All top-level cells get an MPI tag. */ #ifdef WITH_MPI - if (c->mpi.tag < 0 && c->mpi.sendto) cell_tag(c); + cell_ensure_tagged(c); #endif /* Super-pointer for hydro */ @@ -3364,6 +3364,21 @@ void cell_drift_part(struct cell *c, const struct engine *e, int force) { if (ti_current < ti_old_part) error("Attempt to drift to the past"); #endif + /* Early abort? */ + if (c->hydro.count == 0) { + + /* Clear the drift flags. */ + c->hydro.do_drift = 0; + c->hydro.do_sub_drift = 0; + + /* Update the time of the last drift */ + c->hydro.ti_old_part = ti_current; + + return; + } + + /* Ok, we have some particles somewhere in the hierarchy to drift */ + /* Are we not in a leaf ? */ if (c->split && (force || c->hydro.do_sub_drift)) { @@ -3535,6 +3550,21 @@ void cell_drift_gpart(struct cell *c, const struct engine *e, int force) { if (ti_current < ti_old_gpart) error("Attempt to drift to the past"); #endif + /* Early abort? */ + if (c->grav.count == 0) { + + /* Clear the drift flags. */ + c->grav.do_drift = 0; + c->grav.do_sub_drift = 0; + + /* Update the time of the last drift */ + c->grav.ti_old_part = ti_current; + + return; + } + + /* Ok, we have some particles somewhere in the hierarchy to drift */ + /* Are we not in a leaf ? */ if (c->split && (force || c->grav.do_sub_drift)) { diff --git a/src/cell.h b/src/cell.h index b364364a65ffd9d7c83ab1fc52a1a9a0b1b10200..c22c2ac1e20d5df474ae88ff9d8b8e89707c79e6 100644 --- a/src/cell.h +++ b/src/cell.h @@ -705,6 +705,68 @@ int cell_can_use_pair_mm(const struct cell *ci, const struct cell *cj, int cell_can_use_pair_mm_rebuild(const struct cell *ci, const struct cell *cj, const struct engine *e, const struct space *s); +/** + * @brief Compute the square of the minimal distance between any two points in + * two cells of the same size + * + * @param ci The first #cell. + * @param cj The second #cell. + * @param periodic Are we using periodic BCs? + * @param dim The dimensions of the simulation volume + */ +__attribute__((always_inline)) INLINE static double cell_min_dist2_same_size( + const struct cell *restrict ci, const struct cell *restrict cj, + const int periodic, const double dim[3]) { + +#ifdef SWIFT_DEBUG_CHECKS + if (ci->width[0] != cj->width[0]) error("Cells of different size!"); + if (ci->width[1] != cj->width[1]) error("Cells of different size!"); + if (ci->width[2] != cj->width[2]) error("Cells of different size!"); +#endif + + const double cix_min = ci->loc[0]; + const double ciy_min = ci->loc[1]; + const double ciz_min = ci->loc[2]; + const double cjx_min = cj->loc[0]; + const double cjy_min = cj->loc[1]; + const double cjz_min = cj->loc[2]; + + const double cix_max = ci->loc[0] + ci->width[0]; + const double ciy_max = ci->loc[1] + ci->width[1]; + const double ciz_max = ci->loc[2] + ci->width[2]; + const double cjx_max = cj->loc[0] + cj->width[0]; + const double cjy_max = cj->loc[1] + cj->width[1]; + const double cjz_max = cj->loc[2] + cj->width[2]; + + if (periodic) { + + const double dx = min4(fabs(nearest(cix_min - cjx_min, dim[0])), + fabs(nearest(cix_min - cjx_max, dim[0])), + fabs(nearest(cix_max - cjx_min, dim[0])), + fabs(nearest(cix_max - cjx_max, dim[0]))); + + const double dy = min4(fabs(nearest(ciy_min - cjy_min, dim[1])), + fabs(nearest(ciy_min - cjy_max, dim[1])), + fabs(nearest(ciy_max - cjy_min, dim[1])), + fabs(nearest(ciy_max - cjy_max, dim[1]))); + + const double dz = min4(fabs(nearest(ciz_min - cjz_min, dim[2])), + fabs(nearest(ciz_min - cjz_max, dim[2])), + fabs(nearest(ciz_max - cjz_min, dim[2])), + fabs(nearest(ciz_max - cjz_max, dim[2]))); + + return dx * dx + dy * dy + dz * dz; + + } else { + + const double dx = min(fabs(cix_max - cjx_min), fabs(cix_min - cjx_max)); + const double dy = min(fabs(ciy_max - cjy_min), fabs(ciy_min - cjy_max)); + const double dz = min(fabs(ciz_max - cjz_min), fabs(ciz_min - cjz_max)); + + return dx * dx + dy * dy + dz * dz; + } +} + /* Inlined functions (for speed). */ /** @@ -884,20 +946,23 @@ __attribute__((always_inline)) INLINE static int cell_need_rebuild_for_pair( } /** - * @brief Add a unique tag to a cell mostly for MPI communications. + * @brief Add a unique tag to a cell, mostly for MPI communications. + * + * This function locks the cell so that tags can be added concurrently. * * @param c The #cell to tag. */ -__attribute__((always_inline)) INLINE static void cell_tag(struct cell *c) { +__attribute__((always_inline)) INLINE static void cell_ensure_tagged( + struct cell *c) { #ifdef WITH_MPI -#ifdef SWIFT_DEBUG_CHECKS - if (c->mpi.tag > 0) error("setting tag for already tagged cell"); -#endif - + lock_lock(&c->hydro.lock); if (c->mpi.tag < 0 && (c->mpi.tag = atomic_inc(&cell_next_tag)) > cell_max_tag) error("Ran out of cell tags."); + if (lock_unlock(&c->hydro.lock) != 0) { + error("Failed to unlock cell."); + } #else error("SWIFT was not compiled with MPI enabled."); #endif // WITH_MPI diff --git a/src/engine.c b/src/engine.c index cd1aed30abcc245c501f942e078dda46b80dd71d..fa095f4d377b483289166cf86fa50e757ec1a82b 100644 --- a/src/engine.c +++ b/src/engine.c @@ -981,8 +981,7 @@ void engine_repartition(struct engine *e) { fflush(stdout); /* Check that all cells have been drifted to the current time */ - space_check_drift_point(e->s, e->ti_current, - e->policy & engine_policy_self_gravity); + space_check_drift_point(e->s, e->ti_current, /*check_multipoles=*/0); #endif /* Clear the repartition flag. */ @@ -2118,7 +2117,7 @@ void engine_prepare(struct engine *e) { /* Unskip active tasks and check for rebuild */ if (!e->forcerebuild && !e->forcerepart && !e->restarting) engine_unskip(e); - const ticks tic2 = getticks(); + const ticks tic3 = getticks(); #ifdef WITH_MPI MPI_Allreduce(MPI_IN_PLACE, &e->forcerebuild, 1, MPI_INT, MPI_MAX, @@ -2127,13 +2126,13 @@ void engine_prepare(struct engine *e) { if (e->verbose) message("Communicating rebuild flag took %.3f %s.", - clocks_from_ticks(getticks() - tic2), clocks_getunit()); + clocks_from_ticks(getticks() - tic3), clocks_getunit()); /* Do we need repartitioning ? */ if (e->forcerepart) { /* Let's start by drifting everybody to the current time */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); drifted_all = 1; /* And repartition */ @@ -2145,7 +2144,7 @@ void engine_prepare(struct engine *e) { if (e->forcerebuild) { /* Let's start by drifting everybody to the current time */ - if (!e->restarting && !drifted_all) engine_drift_all(e); + if (!e->restarting && !drifted_all) engine_drift_all(e, /*drift_mpole=*/0); /* And rebuild */ engine_rebuild(e, repartitioned, 0); @@ -2937,7 +2936,7 @@ void engine_step(struct engine *e) { e->step_props = engine_step_prop_none; /* When restarting, move everyone to the current time. */ - if (e->restarting) engine_drift_all(e); + if (e->restarting) engine_drift_all(e, /*drift_mpole=*/1); /* Get the physical value of the time and time-step size */ if (e->policy & engine_policy_cosmology) { @@ -2976,7 +2975,7 @@ void engine_step(struct engine *e) { /* Are we drifting everything (a la Gadget/GIZMO) ? */ if (e->policy & engine_policy_drift_all && !e->forcerebuild) - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/1); /* Are we reconstructing the multipoles or drifting them ?*/ if ((e->policy & engine_policy_self_gravity) && !e->forcerebuild) { @@ -3120,7 +3119,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump everything */ engine_print_stats(e); @@ -3144,7 +3143,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump stats */ engine_print_stats(e); @@ -3156,7 +3155,7 @@ void engine_check_for_dumps(struct engine *e) { e->time = e->ti_next_snapshot * e->time_base + e->time_begin; /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump snapshot */ #ifdef WITH_LOGGER @@ -3179,7 +3178,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump snapshot */ #ifdef WITH_LOGGER @@ -3196,7 +3195,7 @@ void engine_check_for_dumps(struct engine *e) { e->time = e->ti_next_stats * e->time_base + e->time_begin; /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump stats */ engine_print_stats(e); @@ -3219,7 +3218,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump... */ #ifdef WITH_LOGGER @@ -3245,7 +3244,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); /* Dump */ engine_print_stats(e); @@ -3273,7 +3272,7 @@ void engine_check_for_dumps(struct engine *e) { } /* Drift everyone */ - engine_drift_all(e); + engine_drift_all(e, /*drift_mpole=*/0); } velociraptor_init(e); @@ -3342,7 +3341,7 @@ void engine_dump_restarts(struct engine *e, int drifted_all, int force) { restart_remove_previous(e->restart_file); /* Drift all particles first (may have just been done). */ - if (!drifted_all) engine_drift_all(e); + if (!drifted_all) engine_drift_all(e, /*drift_mpole=*/1); restart_write(e, e->restart_file); if (e->verbose) @@ -3413,160 +3412,6 @@ void engine_unskip(struct engine *e) { clocks_getunit()); } -/** - * @brief Mapper function to drift *all* particle types and multipoles forward - * in time. - * - * @param map_data An array of #cell%s. - * @param num_elements Chunk size. - * @param extra_data Pointer to an #engine. - */ -void engine_do_drift_all_mapper(void *map_data, int num_elements, - void *extra_data) { - - const struct engine *e = (const struct engine *)extra_data; - const int restarting = e->restarting; - struct space *s = e->s; - struct cell *cells_top; - int *local_cells_with_tasks_top; - - if (restarting) { - - /* When restarting, we loop over all top-level cells */ - cells_top = (struct cell *)map_data; - local_cells_with_tasks_top = NULL; - - } else { - - /* In any other case, we use th list of local cells with tasks */ - cells_top = s->cells_top; - local_cells_with_tasks_top = (int *)map_data; - } - - for (int ind = 0; ind < num_elements; ind++) { - - struct cell *c; - - /* When restarting, the list of local cells with tasks does not - yet exist. We use the raw list of top-level cells instead */ - if (restarting) - c = &cells_top[ind]; - else - c = &cells_top[local_cells_with_tasks_top[ind]]; - - if (c->nodeID == e->nodeID) { - - /* Drift all the particles */ - cell_drift_part(c, e, 1); - - /* Drift all the g-particles */ - cell_drift_gpart(c, e, 1); - } - - /* Drift the multipoles */ - if (e->policy & engine_policy_self_gravity) { - cell_drift_all_multipoles(c, e); - } - } -} - -/** - * @brief Drift *all* particles and multipoles at all levels - * forward to the current time. - * - * @param e The #engine. - */ -void engine_drift_all(struct engine *e) { - - const ticks tic = getticks(); - -#ifdef SWIFT_DEBUG_CHECKS - if (e->nodeID == 0) { - if (e->policy & engine_policy_cosmology) - message("Drifting all to a=%e", - exp(e->ti_current * e->time_base) * e->cosmology->a_begin); - else - message("Drifting all to t=%e", - e->ti_current * e->time_base + e->time_begin); - } -#endif - - if (!e->restarting) { - - /* Normal case: We have a list of local cells with tasks to play with */ - threadpool_map(&e->threadpool, engine_do_drift_all_mapper, - e->s->local_cells_with_tasks_top, - e->s->nr_local_cells_with_tasks, sizeof(int), 0, e); - } else { - - /* When restarting, the list of local cells with tasks does not yet - exist. We use the raw list of top-level cells instead */ - threadpool_map(&e->threadpool, engine_do_drift_all_mapper, e->s->cells_top, - e->s->nr_cells, sizeof(struct cell), 0, e); - } - - /* Synchronize particle positions */ - space_synchronize_particle_positions(e->s); - -#ifdef SWIFT_DEBUG_CHECKS - /* Check that all cells have been drifted to the current time. */ - space_check_drift_point(e->s, e->ti_current, - e->policy & engine_policy_self_gravity); - part_verify_links(e->s->parts, e->s->gparts, e->s->sparts, e->s->nr_parts, - e->s->nr_gparts, e->s->nr_sparts, e->verbose); -#endif - - if (e->verbose) - message("took %.3f %s.", clocks_from_ticks(getticks() - tic), - clocks_getunit()); -} - -/** - * @brief Mapper function to drift *all* top-level multipoles forward in - * time. - * - * @param map_data An array of #cell%s. - * @param num_elements Chunk size. - * @param extra_data Pointer to an #engine. - */ -void engine_do_drift_top_multipoles_mapper(void *map_data, int num_elements, - void *extra_data) { - - struct engine *e = (struct engine *)extra_data; - struct cell *cells = (struct cell *)map_data; - - for (int ind = 0; ind < num_elements; ind++) { - struct cell *c = &cells[ind]; - if (c != NULL) { - - /* Drift the multipole at this level only */ - if (c->grav.ti_old_multipole != e->ti_current) cell_drift_multipole(c, e); - } - } -} - -/** - * @brief Drift *all* top-level multipoles forward to the current time. - * - * @param e The #engine. - */ -void engine_drift_top_multipoles(struct engine *e) { - - const ticks tic = getticks(); - - threadpool_map(&e->threadpool, engine_do_drift_top_multipoles_mapper, - e->s->cells_top, e->s->nr_cells, sizeof(struct cell), 0, e); - -#ifdef SWIFT_DEBUG_CHECKS - /* Check that all cells have been drifted to the current time. */ - space_check_top_multipoles_drift_point(e->s, e->ti_current); -#endif - - if (e->verbose) - message("took %.3f %s.", clocks_from_ticks(getticks() - tic), - clocks_getunit()); -} - void engine_do_reconstruct_multipoles_mapper(void *map_data, int num_elements, void *extra_data) { @@ -3642,13 +3487,20 @@ void engine_makeproxies(struct engine *e) { const int with_gravity = (e->policy & engine_policy_self_gravity); const double theta_crit_inv = e->gravity_properties->theta_crit_inv; const double theta_crit2 = e->gravity_properties->theta_crit2; - const double max_distance = e->mesh->r_cut_max; + const double max_mesh_dist = e->mesh->r_cut_max; + const double max_mesh_dist2 = max_mesh_dist * max_mesh_dist; + + /* Distance between centre of the cell and corners */ + const double r_diag2 = cell_width[0] * cell_width[0] + + cell_width[1] * cell_width[1] + + cell_width[2] * cell_width[2]; + const double r_diag = 0.5 * sqrt(r_diag2); - /* Maximal distance between CoMs and any particle in the cell */ - const double r_max2 = cell_width[0] * cell_width[0] + - cell_width[1] * cell_width[1] + - cell_width[2] * cell_width[2]; - const double r_max = sqrt(r_max2); + /* Maximal distance from a shifted CoM to centre of cell */ + const double delta_CoM = engine_max_proxy_centre_frac * r_diag; + + /* Maximal distance from shifted CoM to any corner */ + const double r_max = r_diag + 2. * delta_CoM; /* Prepare the proxies and the proxy index. */ if (e->proxy_ind == NULL) @@ -3658,20 +3510,20 @@ void engine_makeproxies(struct engine *e) { e->nr_proxies = 0; /* Compute how many cells away we need to walk */ - int delta = 1; /*hydro case */ + int delta_cells = 1; /*hydro case */ /* Gravity needs to take the opening angle into account */ if (with_gravity) { const double distance = 2. * r_max * theta_crit_inv; - delta = (int)(distance / cells[0].dmin) + 1; + delta_cells = (int)(distance / cells[0].dmin) + 1; } /* Turn this into upper and lower bounds for loops */ - int delta_m = delta; - int delta_p = delta; + int delta_m = delta_cells; + int delta_p = delta_cells; /* Special case where every cell is in range of every other one */ - if (delta >= cdim[0] / 2) { + if (delta_cells >= cdim[0] / 2) { if (cdim[0] % 2 == 0) { delta_m = cdim[0] / 2; delta_p = cdim[0] / 2 - 1; @@ -3686,46 +3538,35 @@ void engine_makeproxies(struct engine *e) { message( "Looking for proxies up to %d top-level cells away (delta_m=%d " "delta_p=%d)", - delta, delta_m, delta_p); + delta_cells, delta_m, delta_p); /* Loop over each cell in the space. */ - int ind[3]; - for (ind[0] = 0; ind[0] < cdim[0]; ind[0]++) { - for (ind[1] = 0; ind[1] < cdim[1]; ind[1]++) { - for (ind[2] = 0; ind[2] < cdim[2]; ind[2]++) { + for (int i = 0; i < cdim[0]; i++) { + for (int j = 0; j < cdim[1]; j++) { + for (int k = 0; k < cdim[2]; k++) { /* Get the cell ID. */ - const int cid = cell_getid(cdim, ind[0], ind[1], ind[2]); - - /* and it's location */ - const double loc_i[3] = {cells[cid].loc[0], cells[cid].loc[1], - cells[cid].loc[2]}; - - /* Loop over all its neighbours (periodic). */ - for (int i = -delta_m; i <= delta_p; i++) { - int ii = ind[0] + i; - if (ii >= cdim[0]) - ii -= cdim[0]; - else if (ii < 0) - ii += cdim[0]; - for (int j = -delta_m; j <= delta_p; j++) { - int jj = ind[1] + j; - if (jj >= cdim[1]) - jj -= cdim[1]; - else if (jj < 0) - jj += cdim[1]; - for (int k = -delta_m; k <= delta_p; k++) { - int kk = ind[2] + k; - if (kk >= cdim[2]) - kk -= cdim[2]; - else if (kk < 0) - kk += cdim[2]; + const int cid = cell_getid(cdim, i, j, k); + + /* Loop over all its neighbours neighbours in range. */ + for (int ii = -delta_m; ii <= delta_p; ii++) { + int iii = i + ii; + if (!periodic && (iii < 0 || iii >= cdim[0])) continue; + iii = (iii + cdim[0]) % cdim[0]; + for (int jj = -delta_m; jj <= delta_p; jj++) { + int jjj = j + jj; + if (!periodic && (jjj < 0 || jjj >= cdim[1])) continue; + jjj = (jjj + cdim[1]) % cdim[1]; + for (int kk = -delta_m; kk <= delta_p; kk++) { + int kkk = k + kk; + if (!periodic && (kkk < 0 || kkk >= cdim[2])) continue; + kkk = (kkk + cdim[2]) % cdim[2]; /* Get the cell ID. */ - const int cjd = cell_getid(cdim, ii, jj, kk); + const int cjd = cell_getid(cdim, iii, jjj, kkk); - /* Early abort (same cell) */ - if (cid == cjd) continue; + /* Early abort */ + if (cid >= cjd) continue; /* Early abort (both same node) */ if (cells[cid].nodeID == nodeID && cells[cjd].nodeID == nodeID) @@ -3745,15 +3586,12 @@ void engine_makeproxies(struct engine *e) { /* This is super-ugly but checks for direct neighbours */ /* with periodic BC */ - if (((abs(ind[0] - ii) <= 1 || - abs(ind[0] - ii - cdim[0]) <= 1 || - abs(ind[0] - ii + cdim[0]) <= 1) && - (abs(ind[1] - jj) <= 1 || - abs(ind[1] - jj - cdim[1]) <= 1 || - abs(ind[1] - jj + cdim[1]) <= 1) && - (abs(ind[2] - kk) <= 1 || - abs(ind[2] - kk - cdim[2]) <= 1 || - abs(ind[2] - kk + cdim[2]) <= 1))) + if (((abs(i - iii) <= 1 || abs(i - iii - cdim[0]) <= 1 || + abs(i - iii + cdim[0]) <= 1) && + (abs(j - jjj) <= 1 || abs(j - jjj - cdim[1]) <= 1 || + abs(j - jjj + cdim[1]) <= 1) && + (abs(k - kkk) <= 1 || abs(k - kkk - cdim[2]) <= 1 || + abs(k - kkk + cdim[2]) <= 1))) proxy_type |= (int)proxy_cell_type_hydro; } @@ -3767,44 +3605,28 @@ void engine_makeproxies(struct engine *e) { for an M2L interaction and hence require a proxy as this pair of cells cannot rely on just an M2L calculation. */ - const double loc_j[3] = {cells[cjd].loc[0], cells[cjd].loc[1], - cells[cjd].loc[2]}; + /* Minimal distance between any two points in the cells */ + const double min_dist_centres2 = cell_min_dist2_same_size( + &cells[cid], &cells[cjd], periodic, dim); - /* Start with the distance between the cell centres. */ - double dx = loc_i[0] - loc_j[0]; - double dy = loc_i[1] - loc_j[1]; - double dz = loc_i[2] - loc_j[2]; - - /* Apply BC */ - if (periodic) { - dx = nearest(dx, dim[0]); - dy = nearest(dy, dim[1]); - dz = nearest(dz, dim[2]); - } - - /* Add to it for the case where the future CoMs are in the - * corners */ - dx += cell_width[0]; - dy += cell_width[1]; - dz += cell_width[2]; - - /* This is a crazy upper-bound but the best we can do */ - const double r2 = dx * dx + dy * dy + dz * dz; - - /* Minimal distance between any pair of particles */ - const double min_radius = sqrt(r2) - 2. * r_max; + /* Let's now assume the CoMs will shift a bit */ + const double min_dist_CoM = + sqrt(min_dist_centres2) - 2. * delta_CoM; + const double min_dist_CoM2 = min_dist_CoM * min_dist_CoM; /* Are we beyond the distance where the truncated forces are 0 * but not too far such that M2L can be used? */ if (periodic) { - if ((min_radius < max_distance) && - (!gravity_M2L_accept(r_max, r_max, theta_crit2, r2))) + if ((min_dist_CoM2 < max_mesh_dist2) && + (!gravity_M2L_accept(r_max, r_max, theta_crit2, + min_dist_CoM2))) proxy_type |= (int)proxy_cell_type_gravity; } else { - if (!gravity_M2L_accept(r_max, r_max, theta_crit2, r2)) + if (!gravity_M2L_accept(r_max, r_max, theta_crit2, + min_dist_CoM2)) proxy_type |= (int)proxy_cell_type_gravity; } } @@ -3816,8 +3638,8 @@ void engine_makeproxies(struct engine *e) { if (cells[cid].nodeID == nodeID && cells[cjd].nodeID != nodeID) { /* Do we already have a relationship with this node? */ - int pid = e->proxy_ind[cells[cjd].nodeID]; - if (pid < 0) { + int proxy_id = e->proxy_ind[cells[cjd].nodeID]; + if (proxy_id < 0) { if (e->nr_proxies == engine_maxproxies) error("Maximum number of proxies exceeded."); @@ -3827,24 +3649,31 @@ void engine_makeproxies(struct engine *e) { /* Store the information */ e->proxy_ind[cells[cjd].nodeID] = e->nr_proxies; - pid = e->nr_proxies; + proxy_id = e->nr_proxies; e->nr_proxies += 1; + + /* Check the maximal proxy limit */ + if ((size_t)proxy_id > 8 * sizeof(long long)) + error( + "Created more than %zd proxies. cell.mpi.sendto will " + "overflow.", + 8 * sizeof(long long)); } /* Add the cell to the proxy */ - proxy_addcell_in(&proxies[pid], &cells[cjd], proxy_type); - proxy_addcell_out(&proxies[pid], &cells[cid], proxy_type); + proxy_addcell_in(&proxies[proxy_id], &cells[cjd], proxy_type); + proxy_addcell_out(&proxies[proxy_id], &cells[cid], proxy_type); /* Store info about where to send the cell */ - cells[cid].mpi.sendto |= (1ULL << pid); + cells[cid].mpi.sendto |= (1ULL << proxy_id); } /* Same for the symmetric case? */ if (cells[cjd].nodeID == nodeID && cells[cid].nodeID != nodeID) { /* Do we already have a relationship with this node? */ - int pid = e->proxy_ind[cells[cid].nodeID]; - if (pid < 0) { + int proxy_id = e->proxy_ind[cells[cid].nodeID]; + if (proxy_id < 0) { if (e->nr_proxies == engine_maxproxies) error("Maximum number of proxies exceeded."); @@ -3854,16 +3683,23 @@ void engine_makeproxies(struct engine *e) { /* Store the information */ e->proxy_ind[cells[cid].nodeID] = e->nr_proxies; - pid = e->nr_proxies; + proxy_id = e->nr_proxies; e->nr_proxies += 1; + + /* Check the maximal proxy limit */ + if ((size_t)proxy_id > 8 * sizeof(long long)) + error( + "Created more than %zd proxies. cell.mpi.sendto will " + "overflow.", + 8 * sizeof(long long)); } /* Add the cell to the proxy */ - proxy_addcell_in(&proxies[pid], &cells[cid], proxy_type); - proxy_addcell_out(&proxies[pid], &cells[cjd], proxy_type); + proxy_addcell_in(&proxies[proxy_id], &cells[cid], proxy_type); + proxy_addcell_out(&proxies[proxy_id], &cells[cjd], proxy_type); /* Store info about where to send the cell */ - cells[cjd].mpi.sendto |= (1ULL << pid); + cells[cjd].mpi.sendto |= (1ULL << proxy_id); } } } diff --git a/src/engine.h b/src/engine.h index 50eef4b314da4c294a86e59e29542859f7f402f2..eb73dc32d0dd885424335ad598ec93c866a6ccda 100644 --- a/src/engine.h +++ b/src/engine.h @@ -98,6 +98,7 @@ enum engine_step_properties { #define engine_maxproxies 64 #define engine_tasksreweight 1 #define engine_parts_size_grow 1.05 +#define engine_max_proxy_centre_frac 0.2 #define engine_redistribute_alloc_margin 1.2 #define engine_default_energy_file_name "energy" #define engine_default_timesteps_file_name "timesteps" @@ -399,7 +400,7 @@ void engine_compute_next_stf_time(struct engine *e); void engine_compute_next_statistics_time(struct engine *e); void engine_recompute_displacement_constraint(struct engine *e); void engine_unskip(struct engine *e); -void engine_drift_all(struct engine *e); +void engine_drift_all(struct engine *e, const int drift_mpoles); void engine_drift_top_multipoles(struct engine *e); void engine_reconstruct_multipoles(struct engine *e); void engine_print_stats(struct engine *e); diff --git a/src/engine_drift.c b/src/engine_drift.c new file mode 100644 index 0000000000000000000000000000000000000000..7a842068b57813575c33dd670172059abb1e8fc0 --- /dev/null +++ b/src/engine_drift.c @@ -0,0 +1,297 @@ +/******************************************************************************* + * This file is part of SWIFT. + * Copyright (c) 2012 Pedro Gonnet (pedro.gonnet@durham.ac.uk) + * Matthieu Schaller (matthieu.schaller@durham.ac.uk) + * 2015 Peter W. Draper (p.w.draper@durham.ac.uk) + * Angus Lepper (angus.lepper@ed.ac.uk) + * 2016 John A. Regan (john.a.regan@durham.ac.uk) + * Tom Theuns (tom.theuns@durham.ac.uk) + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + * + ******************************************************************************/ + +/* Config parameters. */ +#include "../config.h" + +/* This object's header. */ +#include "engine.h" + +/** + * @brief Mapper function to drift *all* the #part to the current time. + * + * @param map_data An array of #cell%s. + * @param num_elements Chunk size. + * @param extra_data Pointer to an #engine. + */ +void engine_do_drift_all_part_mapper(void *map_data, int num_elements, + void *extra_data) { + + const struct engine *e = (const struct engine *)extra_data; + const int restarting = e->restarting; + struct space *s = e->s; + struct cell *cells_top; + int *local_cells_top; + + if (restarting) { + + /* When restarting, we loop over all top-level cells */ + cells_top = (struct cell *)map_data; + local_cells_top = NULL; + + } else { + + /* In any other case, we use the list of local cells with tasks */ + cells_top = s->cells_top; + local_cells_top = (int *)map_data; + } + + for (int ind = 0; ind < num_elements; ind++) { + + struct cell *c; + + /* When restarting, the list of local cells does not + yet exist. We use the raw list of top-level cells instead */ + if (restarting) + c = &cells_top[ind]; + else + c = &cells_top[local_cells_top[ind]]; + + if (c->nodeID == e->nodeID) { + + /* Drift all the particles */ + cell_drift_part(c, e, /* force the drift=*/1); + } + } +} + +/** + * @brief Mapper function to drift *all* the #gpart to the current time. + * + * @param map_data An array of #cell%s. + * @param num_elements Chunk size. + * @param extra_data Pointer to an #engine. + */ +void engine_do_drift_all_gpart_mapper(void *map_data, int num_elements, + void *extra_data) { + + const struct engine *e = (const struct engine *)extra_data; + const int restarting = e->restarting; + struct space *s = e->s; + struct cell *cells_top; + int *local_cells_top; + + if (restarting) { + + /* When restarting, we loop over all top-level cells */ + cells_top = (struct cell *)map_data; + local_cells_top = NULL; + + } else { + + /* In any other case, we use the list of local cells with tasks */ + cells_top = s->cells_top; + local_cells_top = (int *)map_data; + } + + for (int ind = 0; ind < num_elements; ind++) { + + struct cell *c; + + /* When restarting, the list of local cells does not + yet exist. We use the raw list of top-level cells instead */ + if (restarting) + c = &cells_top[ind]; + else + c = &cells_top[local_cells_top[ind]]; + + if (c->nodeID == e->nodeID) { + + /* Drift all the particles */ + cell_drift_gpart(c, e, /* force the drift=*/1); + } + } +} + +/** + * @brief Mapper function to drift *all* the multipoles to the current time. + * + * @param map_data An array of #cell%s. + * @param num_elements Chunk size. + * @param extra_data Pointer to an #engine. + */ +void engine_do_drift_all_multipole_mapper(void *map_data, int num_elements, + void *extra_data) { + + const struct engine *e = (const struct engine *)extra_data; + const int restarting = e->restarting; + struct space *s = e->s; + struct cell *cells_top; + int *local_cells_with_tasks_top; + + if (restarting) { + + /* When restarting, we loop over all top-level cells */ + cells_top = (struct cell *)map_data; + local_cells_with_tasks_top = NULL; + + } else { + + /* In any other case, we use the list of local cells with tasks */ + cells_top = s->cells_top; + local_cells_with_tasks_top = (int *)map_data; + } + + for (int ind = 0; ind < num_elements; ind++) { + + struct cell *c; + + /* When restarting, the list of local cells does not + yet exist. We use the raw list of top-level cells instead */ + if (restarting) + c = &cells_top[ind]; + else + c = &cells_top[local_cells_with_tasks_top[ind]]; + + cell_drift_all_multipoles(c, e); + } +} + +/** + * @brief Drift *all* particles and multipoles at all levels + * forward to the current time. + * + * @param e The #engine. + * @param drift_mpoles Do we want to drift all the multipoles as well? + */ +void engine_drift_all(struct engine *e, const int drift_mpoles) { + + const ticks tic = getticks(); + +#ifdef SWIFT_DEBUG_CHECKS + if (e->nodeID == 0) { + if (e->policy & engine_policy_cosmology) + message("Drifting all to a=%e", + exp(e->ti_current * e->time_base) * e->cosmology->a_begin); + else + message("Drifting all to t=%e", + e->ti_current * e->time_base + e->time_begin); + } +#endif + + if (!e->restarting) { + + /* Normal case: We have a list of local cells with tasks to play with */ + + if (e->s->nr_parts > 0) { + threadpool_map(&e->threadpool, engine_do_drift_all_part_mapper, + e->s->local_cells_top, e->s->nr_local_cells, sizeof(int), + /* default chunk */ 0, e); + } + if (e->s->nr_gparts > 0) { + threadpool_map(&e->threadpool, engine_do_drift_all_gpart_mapper, + e->s->local_cells_top, e->s->nr_local_cells, sizeof(int), + /* default chunk */ 0, e); + } + if (drift_mpoles && (e->policy & engine_policy_self_gravity)) { + threadpool_map(&e->threadpool, engine_do_drift_all_multipole_mapper, + e->s->local_cells_with_tasks_top, + e->s->nr_local_cells_with_tasks, sizeof(int), + /* default chunk */ 0, e); + } + + } else { + + /* When restarting, the list of local cells with tasks does not yet + exist. We use the raw list of top-level cells instead */ + + if (e->s->nr_parts > 0) { + threadpool_map(&e->threadpool, engine_do_drift_all_part_mapper, + e->s->cells_top, e->s->nr_cells, sizeof(struct cell), + /* default chunk */ 0, e); + } + if (e->s->nr_gparts > 0) { + threadpool_map(&e->threadpool, engine_do_drift_all_gpart_mapper, + e->s->cells_top, e->s->nr_cells, sizeof(struct cell), + /* default chunk */ 0, e); + } + if (e->policy & engine_policy_self_gravity) { + threadpool_map(&e->threadpool, engine_do_drift_all_multipole_mapper, + e->s->cells_top, e->s->nr_cells, sizeof(struct cell), + /* default chunk */ 0, e); + } + } + + /* Synchronize particle positions */ + space_synchronize_particle_positions(e->s); + +#ifdef SWIFT_DEBUG_CHECKS + /* Check that all cells have been drifted to the current time. */ + space_check_drift_point( + e->s, e->ti_current, + drift_mpoles && (e->policy & engine_policy_self_gravity)); + part_verify_links(e->s->parts, e->s->gparts, e->s->sparts, e->s->nr_parts, + e->s->nr_gparts, e->s->nr_sparts, e->verbose); +#endif + + if (e->verbose) + message("took %.3f %s.", clocks_from_ticks(getticks() - tic), + clocks_getunit()); +} + +/** + * @brief Mapper function to drift *all* top-level multipoles forward in + * time. + * + * @param map_data An array of #cell%s. + * @param num_elements Chunk size. + * @param extra_data Pointer to an #engine. + */ +void engine_do_drift_top_multipoles_mapper(void *map_data, int num_elements, + void *extra_data) { + + struct engine *e = (struct engine *)extra_data; + struct cell *cells = (struct cell *)map_data; + + for (int ind = 0; ind < num_elements; ind++) { + struct cell *c = &cells[ind]; + if (c != NULL) { + + /* Drift the multipole at this level only */ + if (c->grav.ti_old_multipole != e->ti_current) cell_drift_multipole(c, e); + } + } +} + +/** + * @brief Drift *all* top-level multipoles forward to the current time. + * + * @param e The #engine. + */ +void engine_drift_top_multipoles(struct engine *e) { + + const ticks tic = getticks(); + + threadpool_map(&e->threadpool, engine_do_drift_top_multipoles_mapper, + e->s->cells_top, e->s->nr_cells, sizeof(struct cell), 0, e); + +#ifdef SWIFT_DEBUG_CHECKS + /* Check that all cells have been drifted to the current time. */ + space_check_top_multipoles_drift_point(e->s, e->ti_current); +#endif + + if (e->verbose) + message("took %.3f %s.", clocks_from_ticks(getticks() - tic), + clocks_getunit()); +} diff --git a/src/engine_maketasks.c b/src/engine_maketasks.c index 5e26f65b0c7e777bdc2a2bdad38ea45eb97f4306..57a1a7c188aa1c2021877213c51785fbfc842dd7 100644 --- a/src/engine_maketasks.c +++ b/src/engine_maketasks.c @@ -80,8 +80,8 @@ void engine_addtasks_send_gravity(struct engine *e, struct cell *ci, /* Create the tasks and their dependencies? */ if (t_grav == NULL) { - /* Create a tag for this cell. */ - if (ci->mpi.tag < 0) cell_tag(ci); + /* Make sure this cell is tagged. */ + cell_ensure_tagged(ci); t_grav = scheduler_addtask(s, task_type_send, task_subtype_gpart, ci->mpi.tag, 0, ci, cj); @@ -139,8 +139,8 @@ void engine_addtasks_send_hydro(struct engine *e, struct cell *ci, /* Create the tasks and their dependencies? */ if (t_xv == NULL) { - /* Create a tag for this cell. */ - if (ci->mpi.tag < 0) cell_tag(ci); + /* Make sure this cell is tagged. */ + cell_ensure_tagged(ci); t_xv = scheduler_addtask(s, task_type_send, task_subtype_xv, ci->mpi.tag, 0, ci, cj); @@ -238,8 +238,8 @@ void engine_addtasks_send_timestep(struct engine *e, struct cell *ci, /* Create the tasks and their dependencies? */ if (t_ti == NULL) { - /* Create a tag for this cell. */ - if (ci->mpi.tag < 0) cell_tag(ci); + /* Make sure this cell is tagged. */ + cell_ensure_tagged(ci); t_ti = scheduler_addtask(s, task_type_send, task_subtype_tend, ci->mpi.tag, 0, ci, cj); @@ -770,7 +770,7 @@ void engine_make_hierarchical_tasks_stars(struct engine *e, struct cell *c) { void engine_make_self_gravity_tasks_mapper(void *map_data, int num_elements, void *extra_data) { - struct engine *e = ((struct engine **)extra_data)[0]; + struct engine *e = (struct engine *)extra_data; struct space *s = e->s; struct scheduler *sched = &e->sched; const int nodeID = e->nodeID; @@ -780,6 +780,7 @@ void engine_make_self_gravity_tasks_mapper(void *map_data, int num_elements, struct cell *cells = s->cells_top; const double theta_crit = e->gravity_properties->theta_crit; const double max_distance = e->mesh->r_cut_max; + const double max_distance2 = max_distance * max_distance; /* Compute how many cells away we need to walk */ const double distance = 2.5 * cells[0].width[0] / theta_crit; @@ -815,91 +816,50 @@ void engine_make_self_gravity_tasks_mapper(void *map_data, int num_elements, /* Skip cells without gravity particles */ if (ci->grav.count == 0) continue; - /* Is that cell local ? */ - if (ci->nodeID != nodeID) continue; - - /* If the cells is local build a self-interaction */ - scheduler_addtask(sched, task_type_self, task_subtype_grav, 0, 0, ci, NULL); - - /* Recover the multipole information */ - const struct gravity_tensors *const multi_i = ci->grav.multipole; - const double CoM_i[3] = {multi_i->CoM[0], multi_i->CoM[1], multi_i->CoM[2]}; - -#ifdef SWIFT_DEBUG_CHECKS - if (cell_getid(cdim, i, j, k) != cid) - error("Incorrect calculation of indices (i,j,k)=(%d,%d,%d) cid=%d", i, j, - k, cid); - - if (multi_i->r_max != multi_i->r_max_rebuild) - error( - "Multipole size not equal ot it's size after rebuild. But we just " - "rebuilt..."); -#endif + /* If the cell is local build a self-interaction */ + if (ci->nodeID == nodeID) { + scheduler_addtask(sched, task_type_self, task_subtype_grav, 0, 0, ci, + NULL); + } /* Loop over every other cell within (Manhattan) range delta */ - for (int x = -delta_m; x <= delta_p; x++) { - int ii = i + x; - if (ii >= cdim[0]) - ii -= cdim[0]; - else if (ii < 0) - ii += cdim[0]; - for (int y = -delta_m; y <= delta_p; y++) { - int jj = j + y; - if (jj >= cdim[1]) - jj -= cdim[1]; - else if (jj < 0) - jj += cdim[1]; - for (int z = -delta_m; z <= delta_p; z++) { - int kk = k + z; - if (kk >= cdim[2]) - kk -= cdim[2]; - else if (kk < 0) - kk += cdim[2]; + for (int ii = -delta_m; ii <= delta_p; ii++) { + int iii = i + ii; + if (!periodic && (iii < 0 || iii >= cdim[0])) continue; + iii = (iii + cdim[0]) % cdim[0]; + for (int jj = -delta_m; jj <= delta_p; jj++) { + int jjj = j + jj; + if (!periodic && (jjj < 0 || jjj >= cdim[1])) continue; + jjj = (jjj + cdim[1]) % cdim[1]; + for (int kk = -delta_m; kk <= delta_p; kk++) { + int kkk = k + kk; + if (!periodic && (kkk < 0 || kkk >= cdim[2])) continue; + kkk = (kkk + cdim[2]) % cdim[2]; /* Get the cell */ - const int cjd = cell_getid(cdim, ii, jj, kk); + const int cjd = cell_getid(cdim, iii, jjj, kkk); struct cell *cj = &cells[cjd]; -#ifdef SWIFT_DEBUG_CHECKS - const int iii = cjd / (cdim[1] * cdim[2]); - const int jjj = (cjd / cdim[2]) % cdim[1]; - const int kkk = cjd % cdim[2]; - - if (ii != iii || jj != jjj || kk != kkk) - error( - "Incorrect calculation of indices (iii,jjj,kkk)=(%d,%d,%d) " - "cjd=%d", - iii, jjj, kkk, cjd); -#endif - - /* Avoid duplicates of local pairs*/ - if (cid <= cjd && cj->nodeID == nodeID) continue; - - /* Skip cells without gravity particles */ - if (cj->grav.count == 0) continue; + /* Avoid duplicates, empty cells and completely foreign pairs */ + if (cid >= cjd || cj->grav.count == 0 || + (ci->nodeID != nodeID && cj->nodeID != nodeID)) + continue; /* Recover the multipole information */ - const struct gravity_tensors *const multi_j = cj->grav.multipole; - - /* Get the distance between the CoMs */ - double dx = CoM_i[0] - multi_j->CoM[0]; - double dy = CoM_i[1] - multi_j->CoM[1]; - double dz = CoM_i[2] - multi_j->CoM[2]; - - /* Apply BC */ - if (periodic) { - dx = nearest(dx, dim[0]); - dy = nearest(dy, dim[1]); - dz = nearest(dz, dim[2]); - } - const double r2 = dx * dx + dy * dy + dz * dz; + const struct gravity_tensors *multi_i = ci->grav.multipole; + const struct gravity_tensors *multi_j = cj->grav.multipole; + + if (multi_i == NULL && ci->nodeID != nodeID) + error("Multipole of ci was not exchanged properly via the proxies"); + if (multi_j == NULL && cj->nodeID != nodeID) + error("Multipole of cj was not exchanged properly via the proxies"); /* Minimal distance between any pair of particles */ - const double min_radius = - sqrt(r2) - (multi_i->r_max + multi_j->r_max); + const double min_radius2 = + cell_min_dist2_same_size(ci, cj, periodic, dim); /* Are we beyond the distance where the truncated forces are 0 ?*/ - if (periodic && min_radius > max_distance) continue; + if (periodic && min_radius2 > max_distance2) continue; /* Are the cells too close for a MM interaction ? */ if (!cell_can_use_pair_mm_rebuild(ci, cj, e, s)) { @@ -907,6 +867,54 @@ void engine_make_self_gravity_tasks_mapper(void *map_data, int num_elements, /* Ok, we need to add a direct pair calculation */ scheduler_addtask(sched, task_type_pair, task_subtype_grav, 0, 0, ci, cj); + +#ifdef SWIFT_DEBUG_CHECKS +#ifdef WITH_MPI + + /* Let's cross-check that we had a proxy for that cell */ + if (ci->nodeID == nodeID && cj->nodeID != engine_rank) { + + /* Find the proxy for this node */ + const int proxy_id = e->proxy_ind[cj->nodeID]; + if (proxy_id < 0) + error("No proxy exists for that foreign node %d!", cj->nodeID); + + const struct proxy *p = &e->proxies[proxy_id]; + + /* Check whether the cell exists in the proxy */ + int n = 0; + for (; n < p->nr_cells_in; n++) + if (p->cells_in[n] == cj) { + break; + } + if (n == p->nr_cells_in) + error( + "Cell %d not found in the proxy but trying to construct " + "grav task!", + cjd); + } else if (cj->nodeID == nodeID && ci->nodeID != engine_rank) { + + /* Find the proxy for this node */ + const int proxy_id = e->proxy_ind[ci->nodeID]; + if (proxy_id < 0) + error("No proxy exists for that foreign node %d!", ci->nodeID); + + const struct proxy *p = &e->proxies[proxy_id]; + + /* Check whether the cell exists in the proxy */ + int n = 0; + for (; n < p->nr_cells_in; n++) + if (p->cells_in[n] == ci) { + break; + } + if (n == p->nr_cells_in) + error( + "Cell %d not found in the proxy but trying to construct " + "grav task!", + cid); + } +#endif /* WITH_MPI */ +#endif /* SWIFT_DEBUG_CHECKS */ } } } @@ -936,26 +944,6 @@ void engine_make_hierarchical_tasks_mapper(void *map_data, int num_elements, } } -/** - * @brief Constructs the top-level tasks for the short-range gravity - * interactions (master function). - * - * - Create the FFT task and the array of gravity ghosts. - * - Call the mapper function to create the other tasks. - * - * @param e The #engine. - */ -void engine_make_self_gravity_tasks(struct engine *e) { - - struct space *s = e->s; - struct task **ghosts = NULL; - - /* Create the multipole self and pair tasks. */ - void *extra_data[2] = {e, ghosts}; - threadpool_map(&e->threadpool, engine_make_self_gravity_tasks_mapper, NULL, - s->nr_cells, 1, 0, extra_data); -} - /** * @brief Constructs the top-level tasks for the external gravity. * @@ -1796,6 +1784,8 @@ void engine_make_hydroloop_tasks_mapper(void *map_data, int num_elements, /* Get the cell index. */ const int cid = (size_t)(map_data) + ind; + + /* Integer indices of the cell in the top-level grid */ const int i = cid / (cdim[1] * cdim[2]); const int j = (cid / cdim[2]) % cdim[1]; const int k = cid % cdim[2]; @@ -1806,10 +1796,11 @@ void engine_make_hydroloop_tasks_mapper(void *map_data, int num_elements, /* Skip cells without hydro particles */ if (ci->hydro.count == 0) continue; - /* If the cells is local build a self-interaction */ - if (ci->nodeID == nodeID) + /* If the cell is local build a self-interaction */ + if (ci->nodeID == nodeID) { scheduler_addtask(sched, task_type_self, task_subtype_density, 0, 0, ci, NULL); + } /* Now loop over all the neighbours of this cell */ for (int ii = -1; ii < 2; ii++) { @@ -1838,12 +1829,113 @@ void engine_make_hydroloop_tasks_mapper(void *map_data, int num_elements, const int sid = sortlistID[(kk + 1) + 3 * ((jj + 1) + 3 * (ii + 1))]; scheduler_addtask(sched, task_type_pair, task_subtype_density, sid, 0, ci, cj); + +#ifdef SWIFT_DEBUG_CHECKS +#ifdef WITH_MPI + + /* Let's cross-check that we had a proxy for that cell */ + if (ci->nodeID == nodeID && cj->nodeID != engine_rank) { + + /* Find the proxy for this node */ + const int proxy_id = e->proxy_ind[cj->nodeID]; + if (proxy_id < 0) + error("No proxy exists for that foreign node %d!", cj->nodeID); + + const struct proxy *p = &e->proxies[proxy_id]; + + /* Check whether the cell exists in the proxy */ + int n = 0; + for (n = 0; n < p->nr_cells_in; n++) + if (p->cells_in[n] == cj) break; + if (n == p->nr_cells_in) + error( + "Cell %d not found in the proxy but trying to construct " + "hydro task!", + cjd); + } else if (cj->nodeID == nodeID && ci->nodeID != engine_rank) { + + /* Find the proxy for this node */ + const int proxy_id = e->proxy_ind[ci->nodeID]; + if (proxy_id < 0) + error("No proxy exists for that foreign node %d!", ci->nodeID); + + const struct proxy *p = &e->proxies[proxy_id]; + + /* Check whether the cell exists in the proxy */ + int n = 0; + for (n = 0; n < p->nr_cells_in; n++) + if (p->cells_in[n] == ci) break; + if (n == p->nr_cells_in) + error( + "Cell %d not found in the proxy but trying to construct " + "hydro task!", + cid); + } +#endif /* WITH_MPI */ +#endif /* SWIFT_DEBUG_CHECKS */ } } } } } +struct cell_type_pair { + struct cell *ci, *cj; + int type; +}; + +void engine_addtasks_send_mapper(void *map_data, int num_elements, + void *extra_data) { + struct engine *e = (struct engine *)extra_data; + struct cell_type_pair *cell_type_pairs = (struct cell_type_pair *)map_data; + + for (int k = 0; k < num_elements; k++) { + struct cell *ci = cell_type_pairs[k].ci; + struct cell *cj = cell_type_pairs[k].cj; + const int type = cell_type_pairs[k].type; + + /* Add the send task for the particle timesteps. */ + engine_addtasks_send_timestep(e, ci, cj, NULL); + + /* Add the send tasks for the cells in the proxy that have a hydro + * connection. */ + if ((e->policy & engine_policy_hydro) && (type & proxy_cell_type_hydro)) + engine_addtasks_send_hydro(e, ci, cj, /*t_xv=*/NULL, + /*t_rho=*/NULL, /*t_gradient=*/NULL); + + /* Add the send tasks for the cells in the proxy that have a gravity + * connection. */ + if ((e->policy & engine_policy_self_gravity) && + (type & proxy_cell_type_gravity)) + engine_addtasks_send_gravity(e, ci, cj, NULL); + } +} + +void engine_addtasks_recv_mapper(void *map_data, int num_elements, + void *extra_data) { + struct engine *e = (struct engine *)extra_data; + struct cell_type_pair *cell_type_pairs = (struct cell_type_pair *)map_data; + + for (int k = 0; k < num_elements; k++) { + struct cell *ci = cell_type_pairs[k].ci; + const int type = cell_type_pairs[k].type; + + /* Add the recv task for the particle timesteps. */ + engine_addtasks_recv_timestep(e, ci, NULL); + + /* Add the recv tasks for the cells in the proxy that have a hydro + * connection. */ + if ((e->policy & engine_policy_hydro) && (type & proxy_cell_type_hydro)) + engine_addtasks_recv_hydro(e, ci, NULL, NULL, NULL); + + /* Add the recv tasks for the cells in the proxy that have a gravity + * connection. */ + if ((e->policy & engine_policy_self_gravity) && + (type & proxy_cell_type_gravity)) + engine_addtasks_recv_gravity(e, ci, NULL); + } +} + /** * @brief Fill the #space's task list. * @@ -1880,13 +1972,16 @@ void engine_maketasks(struct engine *e) { } if (e->verbose) - message("Making stars tasks took %.3f %s.", + message("Making stellar feedback tasks took %.3f %s.", clocks_from_ticks(getticks() - tic2), clocks_getunit()); tic2 = getticks(); /* Add the self gravity tasks. */ - if (e->policy & engine_policy_self_gravity) engine_make_self_gravity_tasks(e); + if (e->policy & engine_policy_self_gravity) { + threadpool_map(&e->threadpool, engine_make_self_gravity_tasks_mapper, NULL, + s->nr_cells, 1, 0, e); + } if (e->verbose) message("Making gravity tasks took %.3f %s.", @@ -2012,7 +2107,7 @@ void engine_maketasks(struct engine *e) { sched->nr_tasks, sizeof(struct task), 0, e); if (e->verbose) - message("Linking stars tasks took %.3f %s (including reweight).", + message("Linking stars tasks took %.3f %s.", clocks_from_ticks(getticks() - tic2), clocks_getunit()); #ifdef WITH_MPI @@ -2026,32 +2121,34 @@ void engine_maketasks(struct engine *e) { /* Loop over the proxies and add the send tasks, which also generates the * cell tags for super-cells. */ + int max_num_send_cells = 0; + for (int pid = 0; pid < e->nr_proxies; pid++) + max_num_send_cells += e->proxies[pid].nr_cells_out; + struct cell_type_pair *send_cell_type_pairs = NULL; + if ((send_cell_type_pairs = (struct cell_type_pair *)malloc( + sizeof(struct cell_type_pair) * max_num_send_cells)) == NULL) + error("Failed to allocate temporary cell pointer list."); + int num_send_cells = 0; + for (int pid = 0; pid < e->nr_proxies; pid++) { /* Get a handle on the proxy. */ struct proxy *p = &e->proxies[pid]; - for (int k = 0; k < p->nr_cells_out; k++) - engine_addtasks_send_timestep(e, p->cells_out[k], p->cells_in[0], NULL); - - /* Loop through the proxy's outgoing cells and add the - send tasks for the cells in the proxy that have a hydro connection. */ - if (e->policy & engine_policy_hydro) - for (int k = 0; k < p->nr_cells_out; k++) - if (p->cells_out_type[k] & proxy_cell_type_hydro) - engine_addtasks_send_hydro(e, p->cells_out[k], p->cells_in[0], NULL, - NULL, NULL); - - /* Loop through the proxy's outgoing cells and add the - send tasks for the cells in the proxy that have a gravity connection. - */ - if (e->policy & engine_policy_self_gravity) - for (int k = 0; k < p->nr_cells_out; k++) - if (p->cells_out_type[k] & proxy_cell_type_gravity) - engine_addtasks_send_gravity(e, p->cells_out[k], p->cells_in[0], - NULL); + for (int k = 0; k < p->nr_cells_out; k++) { + send_cell_type_pairs[num_send_cells].ci = p->cells_out[k]; + send_cell_type_pairs[num_send_cells].cj = p->cells_in[0]; + send_cell_type_pairs[num_send_cells++].type = p->cells_out_type[k]; + } } + threadpool_map(&e->threadpool, engine_addtasks_send_mapper, + send_cell_type_pairs, num_send_cells, + sizeof(struct cell_type_pair), + /*chunk=*/0, e); + + free(send_cell_type_pairs); + if (e->verbose) message("Creating send tasks took %.3f %s.", clocks_from_ticks(getticks() - tic2), clocks_getunit()); @@ -2069,29 +2166,28 @@ void engine_maketasks(struct engine *e) { /* Loop over the proxies and add the recv tasks, which relies on having the * cell tags. */ + int max_num_recv_cells = 0; + for (int pid = 0; pid < e->nr_proxies; pid++) + max_num_recv_cells += e->proxies[pid].nr_cells_in; + struct cell_type_pair *recv_cell_type_pairs = NULL; + if ((recv_cell_type_pairs = (struct cell_type_pair *)malloc( + sizeof(struct cell_type_pair) * max_num_recv_cells)) == NULL) + error("Failed to allocate temporary cell pointer list."); + int num_recv_cells = 0; for (int pid = 0; pid < e->nr_proxies; pid++) { /* Get a handle on the proxy. */ struct proxy *p = &e->proxies[pid]; - - for (int k = 0; k < p->nr_cells_in; k++) - engine_addtasks_recv_timestep(e, p->cells_in[k], NULL); - - /* Loop through the proxy's incoming cells and add the - recv tasks for the cells in the proxy that have a hydro connection. */ - if (e->policy & engine_policy_hydro) - for (int k = 0; k < p->nr_cells_in; k++) - if (p->cells_in_type[k] & proxy_cell_type_hydro) - engine_addtasks_recv_hydro(e, p->cells_in[k], NULL, NULL, NULL); - - /* Loop through the proxy's incoming cells and add the - recv tasks for the cells in the proxy that have a gravity connection. - */ - if (e->policy & engine_policy_self_gravity) - for (int k = 0; k < p->nr_cells_in; k++) - if (p->cells_in_type[k] & proxy_cell_type_gravity) - engine_addtasks_recv_gravity(e, p->cells_in[k], NULL); + for (int k = 0; k < p->nr_cells_in; k++) { + recv_cell_type_pairs[num_recv_cells].ci = p->cells_in[k]; + recv_cell_type_pairs[num_recv_cells++].type = p->cells_in_type[k]; + } } + threadpool_map(&e->threadpool, engine_addtasks_recv_mapper, + recv_cell_type_pairs, num_recv_cells, + sizeof(struct cell_type_pair), + /*chunk=*/0, e); + free(recv_cell_type_pairs); if (e->verbose) message("Creating recv tasks took %.3f %s.", diff --git a/src/lock.h b/src/lock.h index b2dd2eac9d0ca5d7807907e31cf3fa31894f9aed..39601b0c52e414dad1a507b406c54640a254df30 100644 --- a/src/lock.h +++ b/src/lock.h @@ -34,6 +34,7 @@ #define lock_trylock(l) (pthread_spin_lock(l) != 0) #define lock_unlock(l) (pthread_spin_unlock(l) != 0) #define lock_unlock_blind(l) pthread_spin_unlock(l) +#define lock_static_initializer ((pthread_spinlock_t)0) #elif defined(PTHREAD_LOCK) #include <pthread.h> @@ -44,6 +45,7 @@ #define lock_trylock(l) (pthread_mutex_trylock(l) != 0) #define lock_unlock(l) (pthread_mutex_unlock(l) != 0) #define lock_unlock_blind(l) pthread_mutex_unlock(l) +#define lock_static_initializer PTHREAD_MUTEX_INITIALIZER #else #define swift_lock_type volatile int @@ -52,12 +54,12 @@ INLINE static int lock_lock(volatile int *l) { while (atomic_cas(l, 0, 1) != 0) ; - // while( *l ); return 0; } #define lock_trylock(l) ((*(l)) ? 1 : atomic_cas(l, 0, 1)) #define lock_unlock(l) (atomic_cas(l, 1, 0) != 1) #define lock_unlock_blind(l) atomic_cas(l, 1, 0) +#define lock_static_initializer 0 #endif #endif /* SWIFT_LOCK_H */ diff --git a/src/minmax.h b/src/minmax.h index 90dd87968a94d9601a87fd3b826000c166a98966..e4d7c8788ea1e43d1c296a212193049a94347949 100644 --- a/src/minmax.h +++ b/src/minmax.h @@ -71,4 +71,36 @@ max(_temp, _z); \ }) +/** + * @brief Minimum of four numbers + * + * This macro evaluates its arguments exactly once. + */ +#define min4(x, y, z, w) \ + ({ \ + const __typeof__(x) _x = (x); \ + const __typeof__(y) _y = (y); \ + const __typeof__(z) _z = (z); \ + const __typeof__(w) _w = (w); \ + const __typeof__(x) _temp1 = min(_x, _y); \ + const __typeof__(x) _temp2 = min(_z, _w); \ + min(_temp1, _temp2); \ + }) + +/** + * @brief Maximum of four numbers + * + * This macro evaluates its arguments exactly once. + */ +#define max4(x, y, z, w) \ + ({ \ + const __typeof__(x) _x = (x); \ + const __typeof__(y) _y = (y); \ + const __typeof__(z) _z = (z); \ + const __typeof__(w) _w = (w); \ + const __typeof__(x) _temp1 = max(_x, _y); \ + const __typeof__(x) _temp2 = max(_z, _w); \ + max(_temp1, _temp2); \ + }) + #endif /* SWIFT_MINMAX_H */ diff --git a/src/partition.c b/src/partition.c index ba29b645af4dc41369b2f01ccd48252af673a75b..9cc99a9c571057dda96f85a7ec6b14835b1f3b61 100644 --- a/src/partition.c +++ b/src/partition.c @@ -1793,11 +1793,11 @@ static int check_complete(struct space *s, int verbose, int nregions) { * @brief Check that the threadpool version of the weights construction is * correct by comparing to the old serial code. * - * @tasks the list of tasks - * @nr_tasks number of tasks - * @mydata additional values as passed to threadpool - * @ref_weights_v vertex weights to check - * @ref_weights_e edge weights to check + * @param tasks the list of tasks + * @param nr_tasks number of tasks + * @param mydata additional values as passed to threadpool + * @param ref_weights_v vertex weights to check + * @param ref_weights_e edge weights to check */ static void check_weights(struct task *tasks, int nr_tasks, struct weights_mapper_data *mydata, diff --git a/src/proxy.c b/src/proxy.c index 325ed78644b07a497374e40bfc8518edcb018593..4a67b4b3584c43b2df63f17303eba9ec5e742cb0 100644 --- a/src/proxy.c +++ b/src/proxy.c @@ -269,6 +269,93 @@ void proxy_cells_exchange_second(struct proxy *p) { #endif } +#ifdef WITH_MPI + +void proxy_cells_count_mapper(void *map_data, int num_elements, + void *extra_data) { + struct cell *cells = (struct cell *)map_data; + + for (int k = 0; k < num_elements; k++) { + if (cells[k].mpi.sendto) cells[k].mpi.pcell_size = cell_getsize(&cells[k]); + } +} + +struct pack_mapper_data { + struct space *s; + int *offset; + struct pcell *pcells; + int with_gravity; +}; + +void proxy_cells_pack_mapper(void *map_data, int num_elements, + void *extra_data) { + struct cell *cells = (struct cell *)map_data; + struct pack_mapper_data *data = (struct pack_mapper_data *)extra_data; + + for (int k = 0; k < num_elements; k++) { + if (cells[k].mpi.sendto) { + ptrdiff_t ind = &cells[k] - data->s->cells_top; + cells[k].mpi.pcell = &data->pcells[data->offset[ind]]; + cell_pack(&cells[k], cells[k].mpi.pcell, data->with_gravity); + } + } +} + +void proxy_cells_exchange_first_mapper(void *map_data, int num_elements, + void *extra_data) { + struct proxy *proxies = (struct proxy *)map_data; + + for (int k = 0; k < num_elements; k++) { + proxy_cells_exchange_first(&proxies[k]); + } +} + +struct wait_and_unpack_mapper_data { + struct space *s; + int num_proxies; + MPI_Request *reqs_in; + struct proxy *proxies; + int with_gravity; + swift_lock_type lock; +}; + +void proxy_cells_wait_and_unpack_mapper(void *unused_map_data, int num_elements, + void *extra_data) { + + // MATTHIEU: This is currently unused. Scalar (non-threadpool) version is + // faster but we still need to explore why this happens. + + struct wait_and_unpack_mapper_data *data = + (struct wait_and_unpack_mapper_data *)extra_data; + + for (int k = 0; k < num_elements; k++) { + int pid = MPI_UNDEFINED; + MPI_Status status; + int res; + + /* We need a lock to prevent concurrent calls to MPI_Waitany on + the same array of requests since this is not supported in the MPI + standard (v3.1). This is not really a problem since the threads + would block inside MPI_Waitany anyway. */ + lock_lock(&data->lock); + if ((res = MPI_Waitany(data->num_proxies, data->reqs_in, &pid, &status)) != + MPI_SUCCESS || + pid == MPI_UNDEFINED) + mpi_error(res, "MPI_Waitany failed."); + if (lock_unlock(&data->lock) != 0) { + error("Failed to release lock."); + } + + // message( "cell data from proxy %i has arrived." , pid ); + for (int count = 0, j = 0; j < data->proxies[pid].nr_cells_in; j++) + count += cell_unpack(&data->proxies[pid].pcells_in[count], + data->proxies[pid].cells_in[j], data->s, + data->with_gravity); + } +} + +#endif // WITH_MPI + /** * @brief Exchange the cell structures with all proxies. * @@ -294,13 +381,14 @@ void proxy_cells_exchange(struct proxy *proxies, int num_proxies, /* Run through the cells and get the size of the ones that will be sent off. */ + threadpool_map(&s->e->threadpool, proxy_cells_count_mapper, s->cells_top, + s->nr_cells, sizeof(struct cell), /*chunk=*/0, + /*extra_data=*/NULL); int count_out = 0; int offset[s->nr_cells]; for (int k = 0; k < s->nr_cells; k++) { offset[k] = count_out; - if (s->cells_top[k].mpi.sendto) - count_out += - (s->cells_top[k].mpi.pcell_size = cell_getsize(&s->cells_top[k])); + if (s->cells_top[k].mpi.sendto) count_out += s->cells_top[k].mpi.pcell_size; } if (s->e->verbose) @@ -316,19 +404,19 @@ void proxy_cells_exchange(struct proxy *proxies, int num_proxies, tic2 = getticks(); /* Pack the cells. */ - for (int k = 0; k < s->nr_cells; k++) - if (s->cells_top[k].mpi.sendto) { - cell_pack(&s->cells_top[k], &pcells[offset[k]], with_gravity); - s->cells_top[k].mpi.pcell = &pcells[offset[k]]; - } + struct pack_mapper_data data = {s, offset, pcells, with_gravity}; + threadpool_map(&s->e->threadpool, proxy_cells_pack_mapper, s->cells_top, + s->nr_cells, sizeof(struct cell), /*chunk=*/0, &data); if (s->e->verbose) message("Packing cells took %.3f %s.", clocks_from_ticks(getticks() - tic2), clocks_getunit()); /* Launch the first part of the exchange. */ + threadpool_map(&s->e->threadpool, proxy_cells_exchange_first_mapper, proxies, + num_proxies, sizeof(struct proxy), /*chunk=*/0, + /*extra_data=*/NULL); for (int k = 0; k < num_proxies; k++) { - proxy_cells_exchange_first(&proxies[k]); reqs_in[k] = proxies[k].req_cells_count_in; reqs_out[k] = proxies[k].req_cells_count_out; } diff --git a/src/runner.c b/src/runner.c index 23afc50a589b064a26947d1df6c561eb79dd8344..07f360520ac3036a1232ab309ef7acfefedebdb8 100644 --- a/src/runner.c +++ b/src/runner.c @@ -337,7 +337,7 @@ void runner_do_stars_ghost(struct runner *r, struct cell *c, int timer) { free(sid); } - if (timer) TIMER_TOC(timer_do_stars_ghost); + if (timer) TIMER_TOC(timer_dostars_ghost); } /** @@ -611,16 +611,16 @@ void runner_do_sort_ascending(struct entry *sort, int N) { } } +#ifdef SWIFT_DEBUG_CHECKS +#define RUNNER_CHECK_SORTS(TYPE) /** * @brief Recursively checks that the flags are consistent in a cell hierarchy. * - * Debugging function. + * Debugging function. Exists in two flavours: hydro & stars. * * @param c The #cell to check. * @param flags The sorting flags to check. - */ -#ifdef SWIFT_DEBUG_CHECKS -#define RUNNER_CHECK_SORTS(TYPE) \ + */ \ void runner_check_sorts_##TYPE(struct cell *c, int flags) { \ \ if (flags & ~c->TYPE.sorted) error("Inconsistent sort flags (downward)!"); \ @@ -709,9 +709,9 @@ void runner_do_hydro_sort(struct runner *r, struct cell *c, int flags, if (c->progeny[k] != NULL && c->progeny[k]->hydro.count > 0) { /* Only propagate cleanup if the progeny is stale. */ runner_do_hydro_sort(r, c->progeny[k], flags, - cleanup && (c->progeny[k]->hydro.dx_max_sort > - space_maxreldx * c->progeny[k]->dmin), - 0); + cleanup && (c->progeny[k]->hydro.dx_max_sort_old > + space_maxreldx * c->progeny[k]->dmin), + 0); dx_max_sort = max(dx_max_sort, c->progeny[k]->hydro.dx_max_sort); dx_max_sort_old = max(dx_max_sort_old, c->progeny[k]->hydro.dx_max_sort_old); diff --git a/src/runner_doiact_grav.h b/src/runner_doiact_grav.h index 5bcd57d643911703bc0f4e19f17fdb63a11a5c12..2ed2495154195d09af2825eab4c277150b73ec01 100644 --- a/src/runner_doiact_grav.h +++ b/src/runner_doiact_grav.h @@ -1714,7 +1714,7 @@ static INLINE void runner_do_grav_long_range(struct runner *r, struct cell *ci, const int periodic = e->mesh->periodic; const double dim[3] = {e->mesh->dim[0], e->mesh->dim[1], e->mesh->dim[2]}; const double theta_crit2 = e->gravity_properties->theta_crit2; - const double max_distance = e->mesh->r_cut_max; + const double max_distance2 = e->mesh->r_cut_max * e->mesh->r_cut_max; TIMER_TIC; @@ -1759,24 +1759,11 @@ static INLINE void runner_do_grav_long_range(struct runner *r, struct cell *ci, /* Skip empty cells */ if (multi_j->m_pole.M_000 == 0.f) continue; - /* Get the distance between the CoMs at the last rebuild*/ - double dx_r = CoM_rebuild_top[0] - multi_j->CoM_rebuild[0]; - double dy_r = CoM_rebuild_top[1] - multi_j->CoM_rebuild[1]; - double dz_r = CoM_rebuild_top[2] - multi_j->CoM_rebuild[2]; - - /* Apply BC */ - if (periodic) { - dx_r = nearest(dx_r, dim[0]); - dy_r = nearest(dy_r, dim[1]); - dz_r = nearest(dz_r, dim[2]); - } - const double r2_rebuild = dx_r * dx_r + dy_r * dy_r + dz_r * dz_r; - - const double max_radius = - sqrt(r2_rebuild) - (multi_top->r_max_rebuild + multi_j->r_max_rebuild); + /* Minimal distance between any pair of particles */ + const double min_radius2 = cell_min_dist2_same_size(ci, cj, periodic, dim); /* Are we beyond the distance where the truncated forces are 0 ?*/ - if (periodic && max_radius > max_distance) { + if (periodic && min_radius2 > max_distance2) { #ifdef SWIFT_DEBUG_CHECKS /* Need to account for the interactions we missed */ @@ -1790,6 +1777,19 @@ static INLINE void runner_do_grav_long_range(struct runner *r, struct cell *ci, continue; } + /* Get the distance between the CoMs at the last rebuild*/ + double dx_r = CoM_rebuild_top[0] - multi_j->CoM_rebuild[0]; + double dy_r = CoM_rebuild_top[1] - multi_j->CoM_rebuild[1]; + double dz_r = CoM_rebuild_top[2] - multi_j->CoM_rebuild[2]; + + /* Apply BC */ + if (periodic) { + dx_r = nearest(dx_r, dim[0]); + dy_r = nearest(dy_r, dim[1]); + dz_r = nearest(dz_r, dim[2]); + } + const double r2_rebuild = dx_r * dx_r + dy_r * dy_r + dz_r * dz_r; + /* Are we in charge of this cell pair? */ if (gravity_M2L_accept(multi_top->r_max_rebuild, multi_j->r_max_rebuild, theta_crit2, r2_rebuild)) { diff --git a/src/runner_doiact_stars.h b/src/runner_doiact_stars.h index b89c1a50409055b6cd1be4b8e560448bd30fc69c..3a920ab553f3e604619dfc4b6efe506e83e9ef52 100644 --- a/src/runner_doiact_stars.h +++ b/src/runner_doiact_stars.h @@ -247,7 +247,7 @@ void runner_dopair_subset_stars_density(struct runner *r, } /* loop over the parts in cj. */ } /* loop over the parts in ci. */ - TIMER_TOC(timer_dopair_subset_naive); + TIMER_TOC(timer_dopair_subset_stars_density); } /** @@ -948,7 +948,7 @@ void runner_dosub_subset_stars_density(struct runner *r, struct cell *ci, } /* otherwise, pair interaction. */ - if (gettimer) TIMER_TOC(timer_dosub_subset); + if (gettimer) TIMER_TOC(timer_dosub_subset_pair_stars_density); } /** @@ -1426,7 +1426,7 @@ void runner_dosub_pair_stars_density(struct runner *r, struct cell *ci, if (do_ci || do_cj) runner_dopair_branch_stars_density(r, ci, cj); } - if (gettimer) TIMER_TOC(TIMER_DOSUB_PAIR); + if (gettimer) TIMER_TOC(timer_dosub_pair_stars_density); } /** diff --git a/src/scheduler.c b/src/scheduler.c index f0f6ac6cb34ccbaf31069112b4a7f1955ba2c5ae..2ae8f6785434af021b52dd2d6586b4e2dc5d68bb 100644 --- a/src/scheduler.c +++ b/src/scheduler.c @@ -489,6 +489,12 @@ static void scheduler_splittask_hydro(struct task *t, struct scheduler *s) { double shift[3]; const int sid = space_getsid(s->space, &ci, &cj, shift); +#ifdef SWIFT_DEBUG_CHECKS + if (sid != t->flags) + error("Got pair task with incorrect flags: sid=%d flags=%lld", sid, + t->flags); +#endif + /* Should this task be split-up? */ if (cell_can_split_pair_hydro_task(ci) && cell_can_split_pair_hydro_task(cj)) { diff --git a/src/timers.h b/src/timers.h index 1b11d6e0884b988958e7c491878b317887c745ad..48ca1e2763302a24356d15ceeba7ed982a9f169e 100644 --- a/src/timers.h +++ b/src/timers.h @@ -93,8 +93,10 @@ enum { timer_dostars_ghost, timer_doself_subset_stars_density, timer_dopair_subset_stars_density, - timer_dosubpair_stars_density, + timer_dosub_pair_stars_density, timer_dosub_self_stars_density, + timer_dosub_subset_pair_stars_density, + timer_dosub_subset_self_stars_density, timer_logger, timer_do_stars_sort, timer_count, diff --git a/src/tools.c b/src/tools.c index ff33afbca3fdc97ede61ff09998d8dc27bf0154b..c0400aa7b42322fce276a5e788af7bcb9e7f3625 100644 --- a/src/tools.c +++ b/src/tools.c @@ -714,7 +714,7 @@ int compare_values(double a, double b, double threshold, double *absDiff, * * @return 1 if difference found, 0 otherwise */ -int compare_particles(struct part a, struct part b, double threshold) { +int compare_particles(struct part *a, struct part *b, double threshold) { #ifdef GADGET2_SPH @@ -722,117 +722,117 @@ int compare_particles(struct part a, struct part b, double threshold) { double absDiff = 0.0, absSum = 0.0, relDiff = 0.0; for (int k = 0; k < 3; k++) { - if (compare_values(a.x[k], b.x[k], threshold, &absDiff, &absSum, + if (compare_values(a->x[k], b->x[k], threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for x[%d] of " "particle %lld.", - relDiff, threshold, k, a.id); - message("a = %e, b = %e", a.x[k], b.x[k]); + relDiff, threshold, k, a->id); + message("a = %e, b = %e", a->x[k], b->x[k]); result = 1; } } for (int k = 0; k < 3; k++) { - if (compare_values(a.v[k], b.v[k], threshold, &absDiff, &absSum, + if (compare_values(a->v[k], b->v[k], threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for v[%d] of " "particle %lld.", - relDiff, threshold, k, a.id); - message("a = %e, b = %e", a.v[k], b.v[k]); + relDiff, threshold, k, a->id); + message("a = %e, b = %e", a->v[k], b->v[k]); result = 1; } } for (int k = 0; k < 3; k++) { - if (compare_values(a.a_hydro[k], b.a_hydro[k], threshold, &absDiff, &absSum, - &relDiff)) { + if (compare_values(a->a_hydro[k], b->a_hydro[k], threshold, &absDiff, + &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for a_hydro[%d] " "of particle %lld.", - relDiff, threshold, k, a.id); - message("a = %e, b = %e", a.a_hydro[k], b.a_hydro[k]); + relDiff, threshold, k, a->id); + message("a = %e, b = %e", a->a_hydro[k], b->a_hydro[k]); result = 1; } } - if (compare_values(a.rho, b.rho, threshold, &absDiff, &absSum, &relDiff)) { + if (compare_values(a->rho, b->rho, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for rho of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.rho, b.rho); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->rho, b->rho); result = 1; } - if (compare_values(a.density.rho_dh, b.density.rho_dh, threshold, &absDiff, + if (compare_values(a->density.rho_dh, b->density.rho_dh, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for rho_dh of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.density.rho_dh, b.density.rho_dh); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->density.rho_dh, b->density.rho_dh); result = 1; } - if (compare_values(a.density.wcount, b.density.wcount, threshold, &absDiff, + if (compare_values(a->density.wcount, b->density.wcount, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for wcount of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.density.wcount, b.density.wcount); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->density.wcount, b->density.wcount); result = 1; } - if (compare_values(a.density.wcount_dh, b.density.wcount_dh, threshold, + if (compare_values(a->density.wcount_dh, b->density.wcount_dh, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for wcount_dh of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.density.wcount_dh, b.density.wcount_dh); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->density.wcount_dh, b->density.wcount_dh); result = 1; } - if (compare_values(a.force.h_dt, b.force.h_dt, threshold, &absDiff, &absSum, + if (compare_values(a->force.h_dt, b->force.h_dt, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for h_dt of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.force.h_dt, b.force.h_dt); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->force.h_dt, b->force.h_dt); result = 1; } - if (compare_values(a.force.v_sig, b.force.v_sig, threshold, &absDiff, &absSum, - &relDiff)) { + if (compare_values(a->force.v_sig, b->force.v_sig, threshold, &absDiff, + &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for v_sig of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.force.v_sig, b.force.v_sig); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->force.v_sig, b->force.v_sig); result = 1; } - if (compare_values(a.entropy_dt, b.entropy_dt, threshold, &absDiff, &absSum, + if (compare_values(a->entropy_dt, b->entropy_dt, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for entropy_dt of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.entropy_dt, b.entropy_dt); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->entropy_dt, b->entropy_dt); result = 1; } - if (compare_values(a.density.div_v, b.density.div_v, threshold, &absDiff, + if (compare_values(a->density.div_v, b->density.div_v, threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for div_v of " "particle %lld.", - relDiff, threshold, a.id); - message("a = %e, b = %e", a.density.div_v, b.density.div_v); + relDiff, threshold, a->id); + message("a = %e, b = %e", a->density.div_v, b->density.div_v); result = 1; } for (int k = 0; k < 3; k++) { - if (compare_values(a.density.rot_v[k], b.density.rot_v[k], threshold, + if (compare_values(a->density.rot_v[k], b->density.rot_v[k], threshold, &absDiff, &absSum, &relDiff)) { message( "Relative difference (%e) larger than tolerance (%e) for rot_v[%d] " "of particle %lld.", - relDiff, threshold, k, a.id); - message("a = %e, b = %e", a.density.rot_v[k], b.density.rot_v[k]); + relDiff, threshold, k, a->id); + message("a = %e, b = %e", a->density.rot_v[k], b->density.rot_v[k]); result = 1; } } diff --git a/src/tools.h b/src/tools.h index 5ec2ca42810e1cb733e1aa674bd1c94ac5c569bd..22eba1519ebf80673cb2d8540791e6b4d092bab0 100644 --- a/src/tools.h +++ b/src/tools.h @@ -54,7 +54,7 @@ void gravity_n2(struct gpart *gparts, const int gcount, const struct gravity_props *gravity_properties, float rlr); int compare_values(double a, double b, double threshold, double *absDiff, double *absSum, double *relDiff); -int compare_particles(struct part a, struct part b, double threshold); +int compare_particles(struct part *a, struct part *b, double threshold); long get_maxrss(void); diff --git a/tests/testInteractions.c b/tests/testInteractions.c index 0241a46634bdd12fb8a89bc3912c3b160198c389..dae55a337642e1616e94119263ff8f1c2a617c89 100644 --- a/tests/testInteractions.c +++ b/tests/testInteractions.c @@ -197,10 +197,10 @@ int check_results(struct part serial_test_part, struct part *serial_parts, struct part vec_test_part, struct part *vec_parts, int count) { int result = 0; - result += compare_particles(serial_test_part, vec_test_part, ACC_THRESHOLD); + result += compare_particles(&serial_test_part, &vec_test_part, ACC_THRESHOLD); for (int i = 0; i < count; i++) - result += compare_particles(serial_parts[i], vec_parts[i], ACC_THRESHOLD); + result += compare_particles(&serial_parts[i], &vec_parts[i], ACC_THRESHOLD); return result; } diff --git a/tests/testPeriodicBC.c b/tests/testPeriodicBC.c index f49a59c650b46f4de902b050b5248b01e971bfbb..be83f20a58b17f9a5fdcf967cda9a678aab5b8a9 100644 --- a/tests/testPeriodicBC.c +++ b/tests/testPeriodicBC.c @@ -273,7 +273,7 @@ int check_results(struct part *serial_parts, struct part *vec_parts, int count, int result = 0; for (int i = 0; i < count; i++) - result += compare_particles(serial_parts[i], vec_parts[i], threshold); + result += compare_particles(&serial_parts[i], &vec_parts[i], threshold); return result; } diff --git a/tools/analyse_runtime.py b/tools/analyse_runtime.py index 5cf2bbce7f44d5ddca6c6f6c2f372ead835bd569..e1b09a9903c804a20788f165ff28c90142ae38b1 100755 --- a/tools/analyse_runtime.py +++ b/tools/analyse_runtime.py @@ -93,7 +93,7 @@ labels = [ "engine_split:", "space_init", "engine_init", - "engine_repartition_trigger" + "engine_repartition_trigger:" ] is_rebuild = [ 1, diff --git a/tools/combine_ics.py b/tools/combine_ics.py index 447af1cc3d6d9c09e6322648a0dc5035382e857c..ac5680f9c70e3c7f8280b14a80a09d8c3140ae99 100755 --- a/tools/combine_ics.py +++ b/tools/combine_ics.py @@ -1,7 +1,7 @@ #!/usr/bin/env python """ Usage: - combine_ics.py input_file.0.hdf5 merged_file.hdf5 + combine_ics.py input_file.0.hdf5 merged_file.hdf5 gzip_level This file combines Gadget-2 type 2 (i.e. hdf5) initial condition files into a single file that can be digested by SWIFT. @@ -11,6 +11,9 @@ the DM particles is handled. No unit conversions are applied nor are any scale-factors or h-factors changed. The script applies some compression and checksum filters to the output to save disk space. +The last argument `gzip_level` is used to specify the level of compression +to apply to all the fields in the file. Use 0 to cancel all coompression. +The default value is `4`. This file is part of SWIFT. Copyright (C) 2016 Matthieu Schaller (matthieu.schaller@durham.ac.uk) @@ -35,6 +38,11 @@ import sys import h5py as h5 import numpy as np +# Store the compression level +gzip_level = 4 +if sys.argc > 3: + gzip_level = sys.argv[3] + # First, we need to collect some information from the master file main_file_name = str(sys.argv[1])[:-7] print("Merging snapshots files with name", main_file_name) @@ -45,19 +53,19 @@ grp_header = master_file["/Header"] num_files = grp_header.attrs["NumFilesPerSnapshot"] tot_num_parts = grp_header.attrs["NumPart_Total"] -tot_num_parts_high_word = grp_header.attrs["NumPart_Total"] +tot_num_parts_high_word = grp_header.attrs["NumPart_Total_HighWord"] entropy_flag = grp_header.attrs["Flag_Entropy_ICs"] box_size = grp_header.attrs["BoxSize"] time = grp_header.attrs["Time"] # Combine the low- and high-words +tot_num_parts = tot_num_parts.astype(np.int64) for i in range(6): - tot_num_parts[i] += np.int64(tot_num_parts_high_word[i]) << 32 + tot_num_parts[i] += (np.int64(tot_num_parts_high_word[i]) << 32) # Some basic information print("Reading", tot_num_parts, "particles from", num_files, "files.") - # Check whether there is a mass table DM_mass = 0.0 mtable = grp_header.attrs.get("MassTable") @@ -105,7 +113,7 @@ def create_set(grp, name, size, dim, dtype): dtype=dtype, chunks=True, compression="gzip", - compression_opts=4, + compression_opts=gzip_level, shuffle=True, fletcher32=True, maxshape=(size,), @@ -117,7 +125,7 @@ def create_set(grp, name, size, dim, dtype): dtype=dtype, chunks=True, compression="gzip", - compression_opts=4, + compression_opts=gzip_level, shuffle=True, fletcher32=True, maxshape=(size, dim),