Commit 97702735, authored 9 years ago by Pedro Gonnet
added task-based domain decomposition section.
parent 4eef637d
2 merge requests: !136 Master, !80 PASC paper
Showing 2 changed files with 92 additions and 1 deletion:
theory/paper_pasc/biblio.bib (+20 −0)
theory/paper_pasc/pasc_paper.tex (+72 −1)
theory/paper_pasc/biblio.bib +20 −0
@@ -339,4 +339,24 @@ archivePrefix = "arXiv",
  pages = {24/1--24/27}
}
@article{ref:Karypis1998,
  title = {A fast and high quality multilevel scheme for partitioning irregular graphs},
  author = {Karypis, George and Kumar, Vipin},
  journal = {SIAM Journal on Scientific Computing},
  volume = {20},
  number = {1},
  pages = {359--392},
  year = {1998},
  publisher = {SIAM}
}
@article{devine2002zoltan,
  title = {Zoltan data management services for parallel dynamic applications},
  author = {Devine, Karen and Boman, Erik and Heaphy, Robert and Hendrickson, Bruce and Vaughan, Courtenay},
  journal = {Computing in Science \& Engineering},
  volume = {4},
  number = {2},
  pages = {90--96},
  year = {2002},
  publisher = {IEEE}
}
theory/paper_pasc/pasc_paper.tex +72 −1
@@ -326,11 +326,82 @@ which is usually not an option for existing large and complex codebases.
Since we were re-implementing \swift from scratch, this was not an issue.
The tree-based neighbour-finding described above was replaced with a more
task-friendly approach as described in \cite{gonnet2015efficient}.
Particle interactions are computed within, and between pairs of,
hierarchical {\em cells} containing one or more particles.
The dependencies between the tasks are set following
equations \eqn{rho}, \eqn{dvdt}, and \eqn{dudt}, i.e.~such that for any
cell, all the tasks computing the particle densities therein must have
completed before the particle forces can be computed, and all the
force computations must have completed before the particle velocities
may be updated.
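As a concrete illustration, this ordering can be enforced by having each
completed task ``unlock'' its dependents. The sketch below is a minimal
stand-in: its task structure and \texttt{addunlock} helper are hypothetical,
not \swift's actual scheduler interface.
\begin{verbatim}
#include <stdlib.h>

/* Hypothetical task record (fields assumed zero-initialised): a list
 * of tasks it unlocks on completion, and a count of unresolved
 * dependencies that must reach zero before the task may run. */
struct task {
  struct task **unlocks;
  int nr_unlocks;
  int wait;
};

/* Record that task a must complete before task b may run. */
static void addunlock(struct task *a, struct task *b) {
  a->unlocks = realloc(a->unlocks,
                       (a->nr_unlocks + 1) * sizeof *a->unlocks);
  a->unlocks[a->nr_unlocks++] = b;
  b->wait++;
}

/* For one cell: every density task unlocks every force task, and
 * every force task unlocks the kick, i.e. the velocity update. */
static void set_cell_dependencies(struct task **density, int n_density,
                                  struct task **force, int n_force,
                                  struct task *kick) {
  for (int i = 0; i < n_density; i++)
    for (int j = 0; j < n_force; j++)
      addunlock(density[i], force[j]);
  for (int j = 0; j < n_force; j++)
    addunlock(force[j], kick);
}
\end{verbatim}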
Due to the cache-friendly nature of the task-based computations,
and their ability to exploit symmetries in the particle interactions,
the task-based approach is already more efficient than the tree-based
neighbour search on a single core, and scales efficiently to all
cores of a shared-memory machine \cite{gonnet2015efficient}.
\subsection{Task-based domain decomposition}
Given a task-based description of a computation, partitioning it over
a fixed number of nodes is relatively straightforward: we create
a {\em cell hypergraph} in which:
\begin{itemize}
  \item Each {\em node} represents a single cell of particles, and
  \item Each {\em edge} represents a single task, connecting the
    cells used by that task.
\end{itemize}
Since in the particular case of \swift each task references at most
two cells, the cell hypergraph is just a regular {\em cell graph}.
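Concretely, since each task couples at most two cells, the weighted cell
graph can be assembled in a single sweep over the task list. The structures
below are a simplified sketch under assumed types, not \swift's actual cell
or task layout.
\begin{verbatim}
/* Hypothetical minimal types: a task touches one or two cells
 * (cj == NULL for a self-task) and carries an estimated cost. */
struct cell { int id; };
struct task { struct cell *ci, *cj; int cost; };

/* Accumulate the edge weights of the cell graph in a dense
 * nr_cells x nr_cells matrix (assumed zero-initialised); a real
 * code would use a sparse (CSR) representation instead. */
void build_cell_graph(const struct task *tasks, int nr_tasks,
                      int nr_cells, int *edge_weight) {
  for (int k = 0; k < nr_tasks; k++) {
    const struct task *t = &tasks[k];
    if (t->cj == NULL || t->ci == t->cj) continue; /* no edge */
    /* Pair task: add its cost to the symmetric edge ci--cj. */
    edge_weight[t->ci->id * nr_cells + t->cj->id] += t->cost;
    edge_weight[t->cj->id * nr_cells + t->ci->id] += t->cost;
  }
}
\end{verbatim}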
Any partition of the cell graph represents a partition of the
computation, i.e.~the nodes in each partition belong to a single
computational {\em rank} (to use the MPI terminology), and the
data belonging to each cell resides on the partition/rank to which
it has been assigned.
Any task spanning cells that belong to the same partition needs only
to be evaluated on that rank/partition, and tasks spanning two
partitions need to be evaluated on both ranks/partitions.
If we then weight each edge with the computational cost associated with
each task, then finding a {\em good} partitioning reduces to finding a
partition of the cell graph such that:
\begin{itemize}
  \item The weight of the edges within each partition is more or less
    equal, and
  \item The weight of the edges spanning two or more partitions is
    minimal.
\end{itemize}
\noindent where the first criterion provides good {\em load-balancing},
i.e.~each partition/rank should involve the same amount of work, and
the second criterion reduces the amount of duplicated work between
partitions/ranks.
Computing such a partition is a standard graph problem, and several
software libraries exist which provide good solutions\footnote{Computing
the optimal partition for more than two nodes is considered NP-hard.},
e.g.~METIS \cite{ref:Karypis1998} and Zoltan \cite{devine2002zoltan}.
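For illustration, the following toy program partitions a four-cell ring
graph over two ranks using METIS's C API; the graph, its edge weights, and
the choice of \texttt{METIS\_PartGraphKway} are assumptions made for the
example, not how \swift itself drives the library.
\begin{verbatim}
#include <stdio.h>
#include <metis.h>

int main(void) {
  idx_t nvtxs = 4, ncon = 1, nparts = 2;

  /* Adjacency in CSR form: the neighbours of cell i are
   * adjncy[xadj[i] .. xadj[i+1]-1].  Cells 0-1-2-3 form a ring. */
  idx_t xadj[]   = {0, 2, 4, 6, 8};
  idx_t adjncy[] = {1, 3, 0, 2, 1, 3, 0, 2};

  /* One weight per adjacency entry: the estimated cost of the
   * pair-task spanning that edge (each edge listed twice). */
  idx_t adjwgt[] = {5, 1, 5, 5, 5, 1, 1, 1};

  idx_t objval;   /* total weight of the edges cut by the partition */
  idx_t part[4];  /* output: the rank assigned to each cell */

  if (METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                          NULL, NULL, adjwgt, &nparts,
                          NULL, NULL, NULL, &objval, part) != METIS_OK)
    return 1;

  for (idx_t i = 0; i < nvtxs; i++)
    printf("cell %d -> rank %d\n", (int)i, (int)part[i]);
  printf("edge-cut = %d\n", (int)objval);
  return 0;
}
\end{verbatim}
Minimising the returned edge-cut matches the second criterion above, while
METIS's balance constraint on the partitions plays the role of the first.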
Note that this approach does not explicitly consider any geometric
constraints, or strive to partition the {\em amount} of data equitably.
The only criterion is the computational cost of each partition, for
which the task decomposition provides a convenient model.
We are therefore partitioning the {\em computation}, as opposed
to just the {\em data}.
Note also that the proposed partitioning scheme takes neither the
task hierarchy, nor the size of the data that needs to be exchanged
between partitions/ranks, into account.
This approach is therefore only reasonable in situations in which
the task hierarchy is wider than flat, i.e.~the length of the critical
path in the task graph is much smaller than the sum of all tasks,
and in which communication latencies are negligible.
\subsection{Asynchronous communications}
\subsection{Task-graph domain decomposition}
%#####################################################################################################