Commit b142eb83 authored by Pedro Gonnet's avatar Pedro Gonnet

tweaked section 3.1.

parent d55defab
paradigm in which a computation is broken down into a set of
{\em tasks} which can be executed concurrently.
In order to ensure that the tasks are executed in the right
order, e.g.~that data needed by one task is only used once it
has been produced by another task, {\em dependencies} between
tasks are specified and strictly enforced by a task scheduler.
Additionally, if two tasks require exclusive access to the same
resource, yet in no particular order, they are treated as
{\em conflicts} and the scheduler ensures that they are not executed
concurrently.
Computations described in this way then parallelize trivially:
each processor repeatedly grabs a task for which all dependencies
have been satisfied and executes it until there are no tasks left.
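The grab-a-ready-task loop can be sketched as follows. This is a minimal, single-threaded illustration with invented structure and function names; it is not SWIFT's or QuickSched's actual implementation, but shows how satisfied dependencies release further tasks:

```c
#include <assert.h>

#define NR_TASKS 4

/* A minimal task: a wait counter (unresolved dependencies) and the
   indices of the tasks it unlocks on completion. Hypothetical layout. */
struct task {
    int wait;              /* number of dependencies not yet satisfied */
    int nr_unlocks;        /* number of dependent tasks */
    int unlocks[NR_TASKS]; /* indices of tasks waiting on this one */
    int done;
};

/* Pick any task whose dependencies are all satisfied, or -1 if none. */
int next_ready(struct task *t, int n) {
    for (int i = 0; i < n; i++)
        if (!t[i].done && t[i].wait == 0) return i;
    return -1;
}

/* Repeatedly grab a ready task and "execute" it (here: record its
   index), decrementing the wait counters of its dependents. */
int run_all(struct task *t, int n, int *order) {
    int count = 0, i;
    while ((i = next_ready(t, n)) >= 0) {
        order[count++] = i;
        t[i].done = 1;
        for (int k = 0; k < t[i].nr_unlocks; k++)
            t[t[i].unlocks[k]].wait--; /* dependency satisfied */
    }
    return count;
}
```

In a parallel scheduler the `wait` counters would be decremented atomically and each processor would run this loop concurrently; the sequential version above only illustrates the ordering logic.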
The main advantages of using a task-based approach are
%
\begin{itemize}
\item The order in which the tasks are processed, and how they
are assigned to each processor, is completely
dynamic and adapts automatically to load imbalances.
\item If the dependencies and conflicts are specified correctly,
there is no need for expensive explicit locking, synchronisation,
or atomic operations to deal with most concurrency problems.
\item If each task operates exclusively on a restricted part of the
problem data, this can lead to high cache locality and efficiency.
\end{itemize}
%
Task-based parallelism is not a particularly new concept and therefore
techniques easier, we chose to implement our own task scheduler
in \swift, which has since been back-ported as the general-purpose
\qs task scheduler \cite{gonnet2013quicksched}.
This also allowed us to extend the scheduler with the concept of
task conflicts.
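A conflict can be modelled by letting each task name the resource it needs exclusive access to: a task may only start if no currently running task holds the same resource. The sketch below uses invented names and is not the QuickSched API:

```c
#include <assert.h>

/* Hypothetical conflict check: each task names one resource it needs
   exclusively; unlike a dependency, no execution order is implied. */
struct ctask {
    int resource; /* id of the exclusively-held resource */
};

/* Return 1 if task t may start given the resources currently held
   by running tasks, 0 if it conflicts with one of them. */
int may_start(const struct ctask *t, const int *held, int nr_held) {
    for (int i = 0; i < nr_held; i++)
        if (held[i] == t->resource) return 0; /* same resource: conflict */
    return 1;
}
```

A conflicting task is simply skipped and retried later, so neither of the two tasks blocks waiting on the other.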
Despite its advantages, and the variety of implementations,
task-based parallelism is rarely used in
practice (notable exceptions include the PLASMA project
\cite{ref:Bangerth2007} which uses Intel's TBB).
The main problem is that to effectively use task-based parallelism,
most computations need to be completely redesigned to fit the paradigm,
which is usually not an option for large and complex codebases.
Since we were re-implementing \swift from scratch, this was not an issue.
The tree-based neighbour-finding described above was replaced with a more
may be updated.
The task hierarchy is shown in Figure~\ref{tasks}, where the particles in each
cell are first sorted (round tasks) before the particle densities
are computed (first layer of square tasks).
Ghost tasks (triangles) are used to ensure that all density computations
on a cell of particles have completed before the force evaluation tasks
(second layer of square tasks) can be executed.
Once all the force tasks on a cell of particles have completed,
the integrator tasks (inverted triangles) update the particle positions
and velocities.
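The per-cell chain sort $\rightarrow$ density $\rightarrow$ ghost $\rightarrow$ force $\rightarrow$ integrator can be expressed with the same wait-counter mechanism. The real hierarchy has several density and force tasks per cell (self and pair interactions); this sketch, with invented names, collapses each stage to a single task:

```c
#include <assert.h>

/* One task per stage of the per-cell hierarchy (simplified):
   the ghost task gates the transition from density to force. */
enum task_type {
    task_sort = 0, task_density, task_ghost,
    task_force, task_kick, task_count
};

/* Run the chain in dependency order: each stage waits on its
   predecessor and releases its successor. Returns tasks executed. */
int run_cell_chain(int *order) {
    int wait[task_count], done[task_count] = {0}, n = 0;
    for (int t = 0; t < task_count; t++)
        wait[t] = (t == task_sort) ? 0 : 1; /* only sort starts ready */
    for (int pass = 0; pass < task_count; pass++)
        for (int t = 0; t < task_count; t++)
            if (!done[t] && wait[t] == 0) {
                order[n++] = t; /* "execute" this stage */
                done[t] = 1;
                if (t + 1 < task_count)
                    wait[t + 1]--; /* release the next stage */
            }
    return n;
}
```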
Due to the cache-friendly nature of the task-based computations,
and their ability to exploit symmetries in the particle interactions,