SWIFT / SWIFTsim
Commit c1ecb201, authored Jan 22, 2016 by Matthieu Schaller
Aidan's fixes
Parent: 39c9a233
Changes: 1
theory/paper_pasc/pasc_paper.tex @ c1ecb201
...
...
@@ -518,14 +518,14 @@ This is illustrated in Figure~\ref{tasks}, where data is exchanged across
 two ranks for the density and force computations and the extra
 dependencies are shown in red.
-The communication itself is implemented using the non-blocking
-{\tt MPI\_Isend} and {\tt MPI\_Irecv} primitives to initiate
-communication, and {\tt MPI\_Test} to check if the communication was
-successful and resolve the communication task's dependencies. In the
-task-based scheme, strictly local tasks which do not rely on
-communication tasks are executed first. As the data from other ranks
-arrive, the corresponding non-local tasks are unlocked and are
-executed whenever a thread picks them up.
+The communication itself is implemented using the non-blocking
+{\tt MPI\_Isend} and {\tt MPI\_Irecv} primitives to initiate
+communication, and {\tt MPI\_Test} to check if the communication was
+successful and resolve the communication task's dependencies. In the
+task-based scheme, strictly local tasks which do not rely on
+communication tasks are executed first. As data from other ranks
+arrive, the corresponding non-local tasks are unlocked and are
+executed whenever a thread picks them up.
 One direct consequence of this approach is that instead of a single
 {\tt send}/{\tt recv} call between each pair of neighbouring ranks,
...
...
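The scheduling behaviour described in the hunk above (local tasks run first, non-local tasks are unlocked once a completion test succeeds) can be sketched as a toy model. This is only an illustration and not the SWIFT implementation: `request_t`, `request_test`, `task_t` and `run_tasks` are hypothetical names, and a poll counter stands in for the `MPI_Test` call so that the sketch runs without an MPI library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking request: it "completes"
 * after a fixed number of polls. Real code would hold an MPI_Request. */
typedef struct {
  int polls_left;
} request_t;

/* Plays the role that MPI_Test plays in the text: returns true once
 * the (simulated) transfer has completed. */
static bool request_test(request_t *r) {
  if (r->polls_left > 0) r->polls_left--;
  return r->polls_left == 0;
}

/* A task is either strictly local (wait_on == NULL) or depends on
 * data arriving from another rank (wait_on != NULL). */
typedef struct {
  const char *name;
  request_t *wait_on;
  bool done;
} task_t;

/* Repeatedly sweep the task list, executing every task whose
 * dependency is resolved. Stores the indices of the tasks in
 * execution order and returns the number of tasks executed. */
static int run_tasks(task_t *tasks, int n, int *order) {
  assert(tasks != NULL && order != NULL);
  int executed = 0;
  while (executed < n) {
    for (int i = 0; i < n; i++) {
      task_t *t = &tasks[i];
      if (t->done) continue;
      /* Non-local task: skip it while its data has not arrived. */
      if (t->wait_on != NULL && !request_test(t->wait_on)) continue;
      t->done = true; /* "execute" the task */
      order[executed++] = i;
    }
  }
  return executed;
}
```

In this model a non-local task listed first is still executed last if its request needs more than one poll, which mirrors the ordering described in the text: computation on local data proceeds while remote data is in flight.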
@@ -670,7 +670,7 @@ threads per node (i.e. one thread per physical core).
 Speed-up. \textit{Right panel:} Corresponding parallel efficiency.
 Using 16 threads per node (no use of hyper-threading) with one MPI rank
 per node, an almost perfect parallel efficiency is achieved when
-increasing the node count from 16 (512 cores) to 2,048 (32,768
+increasing the node count from 16 (256 cores) to 2,048 (32,768
 cores).
 \label{fig:superMUC}}
 \end{figure*}
...
...
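The corrected core count in the caption can be checked against the caption's own figures, assuming (as the caption states) 16 threads per node with one thread per physical core:

```latex
\[
  16~\text{nodes} \times 16~\text{cores per node} = 256~\text{cores},
  \qquad
  2{,}048~\text{nodes} \times 16~\text{cores per node} = 32{,}768~\text{cores}.
\]
```

Both ends of the range are then consistent with 16 cores per node, whereas the previous value of 512 would have implied 32 cores per node at the low end only.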