Commit 86174363 authored by Matthieu Schaller

Updated the homepage information for computer scientists

parent af6812cd
1 merge request: !11 Updated homepage text for all 3 sections
# Computer Scientist
## Parallelisation strategy
SWIFT uses a hybrid MPI + threads parallelisation scheme with a
modified version of the publicly available lightweight tasking library
[QuickSched](https://gitlab.cosma.dur.ac.uk/swift/quicksched) as its
backbone. Communications between compute nodes are scheduled by the
library itself and use asynchronous calls to MPI to maximise the
overlap between communication and computation. The domain
decomposition itself is performed by splitting the graph of all the
compute tasks, using the METIS library, so as to minimise the number
of required MPI communications. The core calculations in SWIFT use
hand-written SIMD intrinsics to process multiple particles in parallel
and achieve maximal performance.
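
As a flavour of how such a scheme overlaps communication and computation, here is a minimal C sketch (not SWIFT's actual code) in which data exchanges with neighbouring nodes are posted as non-blocking MPI calls and local tasks are executed while the messages are in flight. `do_local_task`, `local_tasks_remaining`, and `enqueue_recv_task` are hypothetical stand-ins for a task scheduler such as QuickSched.

```c
#include <mpi.h>

#define N_NEIGHBOURS 4
#define BUF_SIZE 1024

/* Hypothetical hooks into a task scheduler (placeholders, not a real API). */
extern int local_tasks_remaining(void);
extern void do_local_task(void);
extern void enqueue_recv_task(int source);

void exchange_and_compute(double send[N_NEIGHBOURS][BUF_SIZE],
                          double recv[N_NEIGHBOURS][BUF_SIZE],
                          const int neighbour[N_NEIGHBOURS]) {
  MPI_Request reqs[2 * N_NEIGHBOURS];

  /* Post all receives and sends up front: MPI progresses them in the
   * background while the threads keep working on local tasks. */
  for (int i = 0; i < N_NEIGHBOURS; i++) {
    MPI_Irecv(recv[i], BUF_SIZE, MPI_DOUBLE, neighbour[i], 0, MPI_COMM_WORLD,
              &reqs[i]);
    MPI_Isend(send[i], BUF_SIZE, MPI_DOUBLE, neighbour[i], 0, MPI_COMM_WORLD,
              &reqs[N_NEIGHBOURS + i]);
  }

  /* Overlap phase: run purely local work while messages are in flight,
   * and poll the receives so that each completed message immediately
   * unlocks the tasks that depend on the remote data. */
  int completed = 0;
  while (completed < N_NEIGHBOURS || local_tasks_remaining()) {
    if (local_tasks_remaining()) do_local_task();

    int index, flag;
    MPI_Testany(N_NEIGHBOURS, reqs, &index, &flag, MPI_STATUS_IGNORE);
    if (flag && index != MPI_UNDEFINED) {
      enqueue_recv_task(neighbour[index]);
      completed++;
    }
  }

  /* Ensure the sends have drained before the buffers are reused. */
  MPI_Waitall(N_NEIGHBOURS, &reqs[N_NEIGHBOURS], MPI_STATUSES_IGNORE);
}
```

The key point is that `MPI_Testany` only polls: as long as independent local tasks remain, the cores never sit idle waiting for a message.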
## Strong- and weak-scaling
Cosmological simulations are typically very hard to scale to large
numbers of cores because information from every node is needed to
perform a given time-step. SWIFT uses smart domain decomposition,
vectorisation, and asynchronous communication to provide a 36.7x
speedup over the de-facto standard (the publicly available GADGET-2
code) and near-perfect weak scaling, even on problems larger than
those presented in the published astrophysics literature.
![SWIFT Scaling Plot](scalingplot.png)

The left panel ("Weak Scaling") shows how the run-time of a problem
changes when the number of threads is increased proportionally to the
number of particles in the system (i.e. a fixed 'load per thread').
The right panel ("Strong Scaling") shows how the run-time changes for
a fixed load as it is spread over more threads. The right panel shows
the 36.7x speedup that SWIFT offers over GADGET-2. This test uses a
representative problem: a snapshot of the
[EAGLE](http://adsabs.harvard.edu/abs/2014ApJS..210...14K) simulation
at late times, where the hierarchy of time-steps is very deep and
where most other codes struggle to extract any scaling or performance.
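
To make the two panels concrete, here is a tiny, self-contained helper (illustrative only, with made-up timings rather than SWIFT measurements) that computes the two quantities usually read off such plots: strong-scaling speedup for a fixed problem size and weak-scaling efficiency for a fixed load per thread.

```c
#include <stdio.h>

/* Strong scaling: same total work spread over p threads. Ideal value: p. */
static double strong_speedup(double t1, double tp) { return t1 / tp; }

/* Weak scaling: work per thread held fixed as p grows. Ideal value: 1.0
 * (i.e. a perfectly flat run-time curve). */
static double weak_efficiency(double t1, double tp) { return t1 / tp; }

int main(void) {
  /* Made-up example timings in seconds, not SWIFT measurements. */
  double t1_strong = 1000.0, t64_strong = 20.0;
  double t1_weak = 100.0, t64_weak = 105.0;

  printf("strong-scaling speedup on 64 threads: %.1fx (ideal 64x)\n",
         strong_speedup(t1_strong, t64_strong));
  printf("weak-scaling efficiency on 64 threads: %.2f (ideal 1.00)\n",
         weak_efficiency(t1_weak, t64_weak));
  return 0;
}
```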
## I/O performance
SWIFT uses the parallel HDF5 library to read and write snapshots
efficiently on distributed file systems. By carefully tuning the
Lustre parameters, SWIFT can write snapshots at the maximum disk
write speed of a given system.
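
As a rough illustration of this pattern, the sketch below shows the standard collective parallel-HDF5 write in C: every rank writes its own slice of one shared dataset, and Lustre striping hints are passed through MPI-IO. The file name, dataset name, and striping values are placeholders, not SWIFT's actual choices.

```c
#include <hdf5.h>
#include <mpi.h>

void write_snapshot(const double *buf, hsize_t n_local, hsize_t n_total,
                    hsize_t offset_local) {
  /* Lustre striping hints, forwarded to MPI-IO (values are examples). */
  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "striping_factor", "16");    /* number of OSTs */
  MPI_Info_set(info, "striping_unit", "4194304"); /* 4 MiB stripes */

  /* Open one file collectively across all ranks. */
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
  hid_t file = H5Fcreate("snapshot.hdf5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  /* One shared dataset; each rank selects its own hyperslab. */
  hid_t filespace = H5Screate_simple(1, &n_total, NULL);
  hid_t dset = H5Dcreate2(file, "Masses", H5T_NATIVE_DOUBLE, filespace,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset_local, NULL,
                      &n_local, NULL);
  hid_t memspace = H5Screate_simple(1, &n_local, NULL);

  /* Collective write: ranks cooperate so the file system sees large,
   * well-aligned requests. */
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

  H5Pclose(dxpl);
  H5Sclose(memspace);
  H5Sclose(filespace);
  H5Dclose(dset);
  H5Fclose(file);
  H5Pclose(fapl);
  MPI_Info_free(&info);
}
```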