Benchmarking of thread scalability with MPI
To be tested:
- swift_mpi on the EAGLE_25 box compiled with "none" as external potential and with metis.
- Run on 4 nodes with 1 to 16 threads per rank.
- Run with -g to get the overheads from gparts.
- Total time broken down into:
  - engine_collect_timestep()
  - engine_launch()
  - engine_unskip()
  - engine_drift_all()
  - engine_rebuild()
  - engine_repartition()
  - engine_print_stats()
  - scheduler_reweight()
- engine_repartition() broken down into:
  - partition_repartition()
  - engine_redistribute()
  - engine_makeproxies()
- engine_rebuild() broken down into:
  - space_rebuild()
  - engine_maketasks()
  - engine_marktasks()
- space_rebuild() broken down into:
  - space_regrid()
  - space_parts_get_cell_index()
  - space_gparts_get_cell_index()
  - engine_exchange_strays()
  - space_parts_sort() <-- Note this is also called in engine_redistribute(). Only want this call for now.
  - space_gparts_sort() <-- Note this is also called in engine_redistribute(). Only want this call for now.
  - part_relink_gparts_to_parts() <-- Note this is also called in space_split(). Only want this top-level call for now.
  - part_relink_parts_to_gparts() <-- Note this is also called in space_split(). Only want this top-level call for now.
  - space_split()
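
One way the top-level breakdown above could be collected is with a small table of per-function wall-clock accumulators. This is a minimal sketch only, not SWIFT's own timing code: the names timer_id, timer_names, timer_total, wall_seconds(), timer_add(), timer_sum() and timer_report() are all hypothetical, and it assumes a POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical timer categories, one per top-level function in the list. */
enum timer_id {
  timer_collect_timestep,
  timer_launch,
  timer_unskip,
  timer_drift_all,
  timer_rebuild,
  timer_repartition,
  timer_print_stats,
  timer_reweight,
  timer_count
};

/* Labels used in the report, in the same order as the enum. */
static const char *timer_names[timer_count] = {
    "engine_collect_timestep", "engine_launch",      "engine_unskip",
    "engine_drift_all",        "engine_rebuild",     "engine_repartition",
    "engine_print_stats",      "scheduler_reweight"};

/* Accumulated seconds per category (zero-initialised static storage). */
static double timer_total[timer_count];

/* Wall-clock time in seconds, read from the POSIX monotonic clock. */
static double wall_seconds(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Add the time elapsed since `start` to category `id`. */
static void timer_add(enum timer_id id, double start) {
  assert(id >= 0 && id < timer_count);
  timer_total[id] += wall_seconds() - start;
}

/* Sum of all accumulated categories, in seconds. */
static double timer_sum(void) {
  double sum = 0.0;
  for (int i = 0; i < timer_count; i++) sum += timer_total[i];
  return sum;
}

/* Print each category with its share of the summed time. */
static void timer_report(FILE *out) {
  const double sum = timer_sum();
  for (int i = 0; i < timer_count; i++) {
    const double pct = (sum > 0.0) ? 100.0 * timer_total[i] / sum : 0.0;
    fprintf(out, "%-26s %12.6f s %6.2f %%\n", timer_names[i], timer_total[i],
            pct);
  }
}
```

In this sketch each top-level call would be bracketed by a wall_seconds() reading before it and a timer_add() with the matching category after it, and timer_report() would be called once at the end of the run. Because the accumulation happens only at the call site, a function that is also reached from elsewhere (for example the sorts inside engine_redistribute()) is only counted where the bracketing is placed. The same pattern could be repeated with a second enum for the nested breakdowns.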