diff --git a/theory/paper_pasc/pasc_paper.tex b/theory/paper_pasc/pasc_paper.tex
index 4fe13a4907535dd38c3b1f6f3a64eb7df8a19b09..1ed99417a5d50ac24ab1670d923c396428ec66df 100644
--- a/theory/paper_pasc/pasc_paper.tex
+++ b/theory/paper_pasc/pasc_paper.tex
@@ -273,8 +273,23 @@ analysis).
 
 \section{Parallelisation strategy}
 
-{\em Some words on how we wanted to be fully hybrid, dynamic,
-and asynchronous.}
+One of the main concerns when developing \swift was to break
+with the branch-and-bound type parallelism inherent to parallel
+codes using OpenMP and MPI, and the constant synchronisation
+between computational steps it results in.
+
+If {\em synchronisation} is the main problem, then {\em asynchronicity}
+is the obvious solution.
+We therefore opted for a {\em task-based} approach for maximum
+single-node, or shared-memory, performance.
+This approach not only provides excellent load-balancing on a single
+node, but also a powerful model of the computation that
+can be used to partition the work equitably over a set of
+distributed-memory nodes using general-purpose graph partitioning
+algorithms.
+Finally, the necessary communication between nodes can itself be
+modelled in a task-based way, interleaving communication seamlessly
+with the rest of the computation.
 
 \subsection{Task-based parallelism}
 
@@ -501,16 +516,16 @@
 One direct consequence of this approach is that instead of a
 single {\tt send}/{\tt recv} call between each pair of
 neighbouring ranks, one such pair is generated for each particle
 cell. This type of communication, i.e.~several small messages instead of
-one large message, is usually discouraged since the sum of the latencies
-for the small messages is usually much larger than the latency of
-the single large message.
+one large message, is usually strongly discouraged since the sum of
+the latencies for the small messages is typically much larger than
+the latency of the single large message.
 This, however, is not a concern since nobody is actually waiting
 to receive the messages in order and the latencies are covered
 by local computations.
 A nice side-effect of this approach is that communication no longer
 happens in bursts involving all the ranks at the same time, but
-is more or less evenly spread over the entire computation, thus
-being less demanding of the communication infrastructure.
+is more or less evenly spread over the entire computation, and is
+therefore less demanding of the communication infrastructure.
 
 
@@ -547,8 +562,9 @@ removed the first and last ones, where i/o occurs.
 almost $1000$ across the simulation volume.
 \label{fig:ICs}}
 \end{figure}
-On all the machines, the code was compiled without switching on explicit
-vectorization nor any architecture-specific flags.
+On all the machines, the code was compiled out of the box,
+without any tuning or explicit vectorization, and without
+exploiting any other specific features of the underlying hardware.
 
 \subsection{x86 architecture: Cosma-5}