doc/RTD/Innovation/TaskBasedParallelism/OMPScaling.png (12 KiB)
doc/RTD/Innovation/TaskBasedParallelism/TasksExample.png (9.43 KiB)
.. _GettingStarted:

Task Based Parallelism
=================================

One of the biggest problems faced by applications running on a shared-memory system is *load imbalance*, which occurs when the workload is not evenly distributed across the cores. The best-known paradigm for programming this type of parallel architecture is OpenMP, in which the programmer applies annotations to the code to indicate to the compiler which sections should be executed in parallel. If a ``for`` loop has been identified as a parallel section, the iterations of the loop are split between the available threads, each executing on a single core. Once all threads have terminated, the program becomes serial again and executes on a single thread. This technique is known as branch-and-bound (or fork-join) parallelism, shown in :ref:`branch_and_bound`. Unfortunately, this approach generally leads to low performance and poor scaling as the number of cores increases.
.. _branch_and_bound:

.. figure:: OMPScaling.png
   :scale: 40 %
   :align: center
   :figclass: align-center

   Figure 1: Branch-and-bound parallelism
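As a concrete illustration, here is a minimal C sketch of this pattern using OpenMP; the array names and sizes are arbitrary placeholders:

.. code-block:: c

   #include <stdio.h>

   #define N 1000000

   static double a[N], b[N];

   int main(void) {

     /* Parallel section: the loop iterations are split between the
      * available threads, each running on its own core. */
   #pragma omp parallel for
     for (int i = 0; i < N; i++) {
       a[i] = 2.0 * b[i];
     }

     /* Implicit barrier: from here on the program is serial again and
      * executes on a single thread. */
     printf("a[0] = %g\n", a[0]);
     return 0;
   }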
Another disadvantage of this form of shared-memory parallelism is that there is no implicit handling of concurrency issues between threads. *Race conditions* can occur when two threads attempt to modify the same data simultaneously, unless explicit *critical* regions are defined that prevent more than one thread from executing the same code at the same time. These regions degrade parallel performance even further.
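For example, a race-free (but serialised, and hence slow) accumulation might look as follows; the function and variable names are purely illustrative:

.. code-block:: c

   /* Shared accumulator: without the critical region, two threads could
    * read and update `sum` at the same time, losing updates. */
   double sum = 0.0;

   void accumulate(const double *x, int n) {
   #pragma omp parallel for
     for (int i = 0; i < n; i++) {
   #pragma omp critical
       {
         /* Only one thread at a time may execute this block. */
         sum += x[i];
       }
     }
   }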
A better way to exploit shared-memory systems is an approach called *task-based parallelism*. This method describes the entire computation in a way that is more inherently parallelisable: the simulation is divided into a set of computational tasks which are **dynamically** allocated to a number of processors. To ensure that the tasks are executed in the correct order, and to avoid *race conditions*, *dependencies* between tasks are identified and strictly enforced by a task scheduler. A Directed Acyclic Graph (DAG) illustrates how a set of computational tasks is linked by dependencies. Processors traverse the graph in topological order, selecting and executing tasks that have no unresolved dependencies, or waiting until such tasks become available, until all tasks have been completed. An example of a DAG can be seen in :ref:`DAG`: the figure represents tasks as circles, labelled A-E, and dependencies as arrows. Tasks B and C both depend on A, and D depends on B, whereas A and E are independent tasks. On a shared-memory system, tasks A and E could therefore be executed first. Once task A is finished, tasks B and C become available for execution, as their dependency on A has been resolved. Finally, task D can be executed once task B has completed.
.. _DAG:

.. figure:: TasksExample.png
   :scale: 40 %
   :align: center
   :figclass: align-center

   Figure 2: Tasks and Dependencies
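SWIFT implements its own task scheduler, but the same graph can also be expressed with standard OpenMP 4.0 task dependencies; here is a sketch of the DAG of Figure 2, where ``run_task`` is a hypothetical placeholder for the actual work:

.. code-block:: c

   extern void run_task(char name);

   void run_dag(void) {

     /* Dummy variables used only to express the dependencies. */
     int A_done = 0, B_done = 0;

   #pragma omp parallel
   #pragma omp single
     {
   #pragma omp task depend(out : A_done) /* Task A. */
       run_task('A');
   #pragma omp task depend(in : A_done) depend(out : B_done) /* Task B. */
       run_task('B');
   #pragma omp task depend(in : A_done) /* Task C. */
       run_task('C');
   #pragma omp task depend(in : B_done) /* Task D. */
       run_task('D');
   #pragma omp task /* Task E: independent, can run immediately. */
       run_task('E');
     }
   }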
The main advantages of using this approach are as follows:

* The order in which the tasks are processed is completely dynamic and adapts automatically to load imbalances.
* If the dependencies and conflicts are specified correctly, there is no need for the expensive explicit locking, synchronisation, or atomic operations that OpenMP relies on to deal with most concurrency problems.
* Each task has exclusive access to the data it is working on, thus improving cache locality and efficiency.
SWIFT extends the task-based approach by introducing the concept of *conflicts* between tasks. A conflict occurs when two tasks operate on the same data, but the order in which the operations occur does not matter. :ref:`task_conflicts` illustrates tasks with conflicts: there is a conflict between tasks B and C, and between tasks D and E. In a parallel setup, once task A has finished executing, if one processor selects task B then no other processor is allowed to execute task C until task B has completed, and vice versa. Without this modification, task-based models have to use dependencies to model conflicts between tasks, which introduces an artificial ordering between the tasks and imposes unnecessary constraints on the task scheduler.
.. _task_conflicts:

.. figure:: TasksExampleConflicts.png
   :scale: 40 %
   :align: center
   :figclass: align-center

   Figure 3: Tasks and Conflicts
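One way to picture how a conflict can be enforced without imposing an ordering is a per-data lock that a worker merely *tries* to take: if a conflicting task currently holds the data, the worker is free to pick a different task rather than wait. The following is an illustrative sketch only, not SWIFT's actual scheduler:

.. code-block:: c

   #include <pthread.h>

   /* Each task carries a lock on the data it touches; two tasks that
    * conflict share the same lock. */
   struct task {
     pthread_mutex_t *data_lock;
     void (*run)(struct task *);
   };

   /* Returns 1 if the task was executed, 0 if a conflicting task holds
    * the data, in which case the caller can try another task instead. */
   int try_task(struct task *t) {
     if (pthread_mutex_trylock(t->data_lock) != 0) return 0;
     t->run(t);
     pthread_mutex_unlock(t->data_lock);
     return 1;
   }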
.. toctree::
   :maxdepth: 1
.. _GettingStarted:

Task Graph Partition
=================================

.. toctree::
   :maxdepth: 1
.. _GettingStarted:

Vectorisation
=================================

.. toctree::
   :maxdepth: 1
.. _GettingStarted:

What makes SWIFT different?
===========================

SWIFT implements a host of techniques that not only make it faster on a single core but also give it better scaling on shared- and distributed-memory architectures than equivalent codes such as Gadget-2.

The following pages outline how SWIFT approaches parallelism:

.. toctree::
   :maxdepth: 1

   HeirarchicalCellDecomposition/index.rst
   TaskBasedParallelism/index.rst
   Caching/index.rst
   TaskGraphPartition/index.rst
   HybridParallelism/index.rst
   AsynchronousComms/index.rst
   Vectorisation/index.rst
.. _GettingStarted:

What is the Motivation for SWIFT?
=================================

SWIFT is designed with parallelism in mind from the very start: it takes a bottom-up approach, with a task-based parallel infrastructure at its very core.

.. toctree::
   :maxdepth: 1
.. _SWIFTGravity:

Gravity
=============================

SWIFT computes gravitational forces using a version of the TreePM method.

.. Matthieu - can you update this part? I know you have implemented a
   multipole method here with some SWIFT-specific modifications. Could
   you write a brief overview of the idea?
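For reference, in a TreePM scheme the potential of a point mass :math:`m` is split into a short-range part, computed with the tree, and a long-range part, computed on a mesh. With the splitting used e.g. in Gadget-2, where :math:`r_s` is the split scale:

.. math::

   \phi(r) = \underbrace{-\frac{G m}{r}\,\mathrm{erfc}\!\left(\frac{r}{2 r_s}\right)}_{\text{short range (tree)}}
   \;\underbrace{-\;\frac{G m}{r}\,\mathrm{erf}\!\left(\frac{r}{2 r_s}\right)}_{\text{long range (mesh)}}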
.. toctree::
   :maxdepth: 1
.. _SWIFTSPH:

Smoothed Particle Hydrodynamics
===============================

SWIFT simulates the universe using Smoothed Particle Hydrodynamics (SPH).
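At the heart of any SPH scheme is the smoothed density estimate, which sums over the neighbours of particle :math:`i`, with :math:`W` the smoothing kernel and :math:`h_i` the smoothing length:

.. math::

   \rho_i = \sum_j m_j \, W\!\left(|\mathbf{r}_i - \mathbf{r}_j|, h_i\right)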
.. toctree::
   :maxdepth: 1
.. _SWIFTPhysics:

What Physics does SWIFT Simulate?
=================================

.. toctree::
   :maxdepth: 1

   SPH/sph.rst
   Gravity/gravity.rst
doc/RTD/source/CitingSWIFT/SWIFT_logo.png (42.3 KiB)