Commit 44683d78 authored by Matthieu Schaller

Webpage, wiggles and new conclusion section title

parent 6f5b2239
Merge requests: !136 (Master), !80 (PASC paper)
@@ -166,13 +166,15 @@
 OpenMP\cite{ref:Dagum1998} and MPI\cite{ref:Snir1998}, and domain
 decompositions based on space-filling curves \cite{warren1993parallel}.
 The design and implementation of \swift \cite{gonnet2013swift,%
-theuns2015swift,gonnet2015efficient}, a large-scale cosmological
-simulation code built from scratch, provided the perfect
-opportunity to test some newer approaches, i.e.~task-based parallelism,
-fully asynchronous communication, and graph partition-based
-domain decompositions.
-This paper describes the results obtained with these parallelisation
-techniques.
+theuns2015swift,gonnet2015efficient}, a large-scale cosmological simulation
+code built from scratch, provided the perfect opportunity to test some newer
+approaches, i.e.~task-based parallelism, fully asynchronous communication, and
+graph partition-based domain decompositions. The code is open-source and
+available at the address \url{www.swiftsim.com} where all the test cases
+presented in this paper can also be found.
+
+This paper describes the results
+obtained with these parallelisation techniques.
 
 %#####################################################################################################
@@ -570,7 +572,8 @@
 algorithm described above in the case of 32 MPI ranks.
 Using 16 threads per node (no use of hyper-threading) with one MPI
 rank per node, a reasonable parallel efficiency is achieved when
 increasing the thread count from 1 (1 node) to 256 (16 nodes) even
-on a relatively small test case.
+on a relatively small test case. Wiggles are likely due to the way thread
+affinity is set by the operating system at run time.
 \label{fig:cosma}}
 \end{figure*}
@@ -669,7 +672,7 @@
 test are shown on Fig.~\ref{fig:JUQUEEN2}.
 
 %#####################################################################################################
-\section{Conclusions}
+\section{Discussion \& Conclusion}
 When running on the SuperMUC machine with 32 nodes (512 cores), each MPI rank
 contains approximately $1.6\times10^7$ particles in $2.5\times10^5$