Benchmarking of thread scalability with MPI
To be tested:
- swift_mpi on the EAGLE_25 box compiled with "none" as external potential and with metis.
- Run on 4 nodes with 1 to 16 threads per rank.
- Run with -g to get the overheads from gparts.
- Total time broken down into:
  - engine_collect_timestep()
  - engine_launch()
  - engine_unskip()
  - engine_drift_all()
  - engine_rebuild()
  - engine_repartition()
  - engine_print_stats()
  - scheduler_reweight()
- engine_repartition() broken down into:
  - partition_repartition()
  - engine_redistribute()
  - engine_makeproxies()
- engine_rebuild() broken down into:
  - space_rebuild()
  - engine_maketasks()
  - engine_marktasks()
- space_rebuild() broken down into:
  - space_regrid()
  - space_parts_get_cell_index()
  - space_gparts_get_cell_index()
  - engine_exchange_strays()
  - space_parts_sort() <-- Note this is also called in engine_redistribute(). Only want this call for now.
  - space_gparts_sort() <-- Note this is also called in engine_redistribute(). Only want this call for now.
  - part_relink_gparts_to_parts() <-- Note this is also called in space_split(). Only want this top-level call for now.
  - part_relink_parts_to_gparts() <-- Note this is also called in space_split(). Only want this top-level call for now.
  - space_split()
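
Once the timings are collected, the breakdown above could be post-processed along the following lines. This is only a sketch, not part of SWIFT: the "total" label, the helper names unaccounted_time() and parallel_efficiency(), and the input format (a plain mapping from function name to accumulated wall-clock time) are all assumptions made for illustration. The hierarchy itself is copied from the list above. For the functions that are called from more than one place, the timing passed in is assumed to be the single call requested above.

```python
# Breakdown hierarchy copied from the list above.
# "total" is a hypothetical label for the total run time.
BREAKDOWN = {
    "total": [
        "engine_collect_timestep", "engine_launch", "engine_unskip",
        "engine_drift_all", "engine_rebuild", "engine_repartition",
        "engine_print_stats", "scheduler_reweight",
    ],
    "engine_repartition": [
        "partition_repartition", "engine_redistribute", "engine_makeproxies",
    ],
    "engine_rebuild": ["space_rebuild", "engine_maketasks", "engine_marktasks"],
    "space_rebuild": [
        "space_regrid", "space_parts_get_cell_index",
        "space_gparts_get_cell_index", "engine_exchange_strays",
        "space_parts_sort", "space_gparts_sort",
        "part_relink_gparts_to_parts", "part_relink_parts_to_gparts",
        "space_split",
    ],
}


def unaccounted_time(timings):
    """Return, per parent, the time not covered by its listed children.

    `timings` maps a function name to its accumulated wall-clock time for
    one run (assumed format). Children missing from `timings` count as
    zero; parents missing from `timings` are skipped.
    """
    residual = {}
    for parent, children in BREAKDOWN.items():
        if parent not in timings:
            continue
        covered = sum(timings.get(child, 0.0) for child in children)
        residual[parent] = timings[parent] - covered
    return residual


def parallel_efficiency(time_by_threads):
    """Map threads-per-rank to efficiency relative to the smallest count.

    efficiency(n) = (t_ref * n_ref) / (t_n * n), where n_ref is the
    smallest thread count present and t_ref its measured time.
    """
    n_ref = min(time_by_threads)
    t_ref = time_by_threads[n_ref]
    return {n: (t_ref * n_ref) / (t * n) for n, t in time_by_threads.items()}
```

A large residual for a parent would suggest that the list of children is missing something worth timing. parallel_efficiency() can be applied either to the total time or to any single function, using one measurement per thread count from 1 to 16.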