\subsection{Evaluating the forces using the Fast Multipole Method} \label{ssec:fmm_summary} The algorithmically challenging aspect of the \nbody problem is to evaluate for each particle in a system the potential and associated forces generated by all the other particles. Mathematically, this means evaluate \begin{equation} \phi(\mathbf{x}_a) = \sum_{b \neq a} G m_b\varphi(\mathbf{x}_a - \mathbf{x}_b)\qquad \forall~a\in N \label{eq:fmm:n_body} \end{equation} efficiently for large numbers of particles $N$. In the case of collisionless dynamics, the particles are a mere Monte-Carlo sampling of the underlying coarse-grained phase-space distribution which justifies the use of approximate method to evaluate Eq.~\ref{eq:fmm:n_body}. The \emph{Fast Multipole Method} (FMM) \citep{Greengard1987, Cheng1999}, popularized in astronomy and adapted specifically for gravity solvers by \cite{Dehnen2000, Dehnen2002} (see also \cite{Warren1995} for related ideas), is an $\mathcal{O}(N)$ method designed to solve Eq.~\ref{eq:fmm:n_body} by expanding the potential in Taylor series \emph{both} around $\mathbf{x}_a$ and $\mathbf{x}_b$ and grouping similar terms arising from nearby particles. For comparison, a \cite{Barnes1986} tree-code expands the potential only around $\mathbf{x}_b$. \subsubsection{Double expansion of the potential} \begin{figure} \includegraphics[width=\columnwidth]{cells.pdf} \caption{The basics of the FMM: The potential generated by a particle at position $\mathbf{x}_b$ on a particle at position at location $\mathbf{x}_a$ is replaced by a Taylor expansion of the potential around the distance vector $\mathbf{R}$ linking the two centres of mass ($\mathbf{z}_A$ and $\mathbf{z}_B$) of cell $A$ and $B$. The expansion converges towards the exact expression provided $|\mathbf{R}|>|\mathbf{r}_a + \mathbf{r}_b|$.} \label{fig:fmm:cells} \end{figure} In what follows, we use the compact multi-index notation of \cite{Dehnen2014} (repeated in appendix \ref{sec:multi_index_notation} for completeness) to simplify expressions and ease comparisons with other published work. $\mathbf{k}$, $\mathbf{m}$ and $\mathbf{n}$ are multi-indices and $\mathbf{r}$, $\mathbf{R}$, $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ are vectors, whilst $a$ and $b$ are particle indices. Note also that we make no assumption on the specific functional form of the potential $\varphi$.\\ For a single pair of particles $a$ and $b$ located in cell $A$ and $B$ with centres of mass $\mathbf{z}_A$ and $\mathbf{z}_B$ respectively, as shown on Fig.~\ref{fig:fmm:cells}, the potential generated by $b$ at the location of $a$ can be rewritten as \begin{align} \varphi(\mathbf{x}_a - \mathbf{x}_b) &= \varphi\left(\mathbf{x}_a - \mathbf{z}_A - \mathbf{x}_b + \mathbf{z}_B + \mathbf{z}_A - \mathbf{z}_B\right) \nonumber \\ &= \varphi\left(\mathbf{r}_a - \mathbf{r}_b + \mathbf{R}\right) \nonumber \\ &= \sum_\mathbf{k} \frac{1}{\mathbf{k}!} \left(\mathbf{r}_a - \mathbf{r}_b\right)^{\mathbf{k}} \nabla^{\mathbf{k}}\varphi(\mathbf{R}) \nonumber \\ &= \sum_\mathbf{k} \frac{1}{\mathbf{k}!} \sum_{\mathbf{n} < \mathbf{k}} \binom{\mathbf{k}}{\mathbf{n}} \mathbf{r}_a^{\mathbf{n}} \left(-\mathbf{r}_b\right)^{\mathbf{k} - \mathbf{n}} \nabla^{\mathbf{k}}\varphi(\mathbf{R})\nonumber \\ &= \sum_\mathbf{n} \frac{1}{\mathbf{n}!} \mathbf{r}_a^{\mathbf{n}} \sum_\mathbf{m} \frac{1}{\mathbf{m}!} \left(-\mathbf{r}_b\right)^\mathbf{m} \nabla^{\mathbf{n}+\mathbf{m}} \varphi(\mathbf{R}), \label{eq:fmm:expansion} \end{align} where we used the Taylor expansion of $\varphi$ around $\mathbf{R} \equiv \mathbf{z}_A - \mathbf{z}_B$ on the third line, used $\mathbf{r}_a \equiv \mathbf{x}_a - \mathbf{z}_A$, $\mathbf{r}_b \equiv \mathbf{x}_b - \mathbf{z}_B$ throughout and defined $\mathbf{m} \equiv \mathbf{k}-\mathbf{n}$ on the last line. Expanding the series only up to order $p$, we get \begin{equation} \varphi(\mathbf{x}_a - \mathbf{x}_b) \approx \sum_{\mathbf{n}}^{p} \frac{1}{\mathbf{n}!} \mathbf{r}_a^{\mathbf{n}} \sum_{\mathbf{m}}^{p -|\mathbf{n}|} \frac{1}{\mathbf{m}!} \left(-\mathbf{r}_b\right)^\mathbf{m} \nabla^{\mathbf{n}+\mathbf{m}} \varphi(\mathbf{R}), \label{eq:fmm:fmm_one_part} \end{equation} with the approximation converging as $p\rightarrow\infty$ towards the correct value provided $|\mathbf{R}|>|\mathbf{r}_a + \mathbf{r}_b|$. If we now consider all the particles within $B$ and combine their contributions to the potential at location $\mathbf{x}_a$ in cell $A$, we get \begin{align} \phi_{BA}(\mathbf{x}_a) &= \sum_{b\in B}G m_b\varphi(\mathbf{x}_a - \mathbf{x}_b) \label{eq:fmm:fmm_one_cell} \\ &\approx G\sum_{\mathbf{n}}^{p} \frac{1}{\mathbf{n}!} \mathbf{r}_a^{\mathbf{n}} \sum_{\mathbf{m}} ^{p -|\mathbf{n}|} \frac{1}{\mathbf{m}!} \sum_{b\in B} m_b\left(-\mathbf{r}_b\right)^\mathbf{m} \nabla^{\mathbf{n}+\mathbf{m}} \varphi(\mathbf{R}) \nonumber. \end{align} This last equation forms the basis of the FMM. The algorithm decomposes the equation into three separated sums evaluated at different stages.\\ \subsubsection{The FMM algorithm} In a first step, multipoles are constructed from the innermost sum. For each cell, we compute all the terms \begin{equation} \mathsf{M}_{\mathbf{m}}(\mathbf{z}_B) = \frac{1}{\mathbf{m}!} \sum_{b\in B} m_b\left(-\mathbf{r}_b\right)^\mathbf{m} \label{eq:fmm:P2M} \end{equation} up to order $p$. This is the first kernel of the method, commonly labelled as \textsc{P2M} (particle to multipole). In a second step, we compute the second kernel, \textsc{M2L} (multipole to local expansion), which corresponds to the interaction of a cell with another one: \begin{equation} \mathsf{F}_{\mathbf{n}}(\mathbf{z}_A) = G\sum_{\mathbf{m}}^{p -|\mathbf{n}|} \mathsf{M}_{\mathbf{m}}(\mathbf{z}_B) \mathsf{D}_{\mathbf{n}+\mathbf{m}}(\mathbf{R}), \label{eq:fmm:M2L} \end{equation} where $\mathsf{D}_{\mathbf{n}+\mathbf{m}}(\mathbf{R}) \equiv \nabla^{\mathbf{n}+\mathbf{m}} \varphi(\mathbf{R})$ is an order $n+m$ derivative of the potential. This is the computationally expensive step of the FMM algorithm as the number of operations in a naive implementation using cartesian coordinates scales as $\mathcal{O}(p^6)$. More advanced techniques \citep[e.g.][]{Dehnen2014} can bring the cost down to $\mathcal{O}(p^3)$, albeit at a considerable algebraic cost. For collisionless dynamics, high accuracy is not required and low values of $p$ are sufficient, which maintains the computational cost of the M2L kernel at a reasonable level. Finally, in the last step, the potential is propagated from the local expansion centre to the particles (L2P kernel) using \begin{equation} \phi_{BA}(\mathbf{x}_a) = \sum_{\mathbf{n}}^{p} \frac{1}{\mathbf{n}!} \mathbf{r}_a^{\mathbf{n}} \mathsf{F}_{\mathbf{n}}(\mathbf{z}_A) = \sum_{\mathbf{n}}^{p} \mathsf{X}_{\mathbf{n}}(\mathbf{r}_a) \mathsf{F}_{\mathbf{n}}(\mathbf{z}_A), \label{eq:fmm:L2P} \end{equation} where we introduced the tensors $\mathsf{X}_{\mathbf{n}}(\mathbf{r}_a) \equiv \frac{1}{\mathbf{n}!} \mathbf{r}_a^{\mathbf{n}}$ to simplify the notation. In summary, the potential generated by a cell $B$ on the particles in cell $A$ is obtained by the successive application of the P2M, M2L and L2P kernels. The P2M and L2P kernels are applied only once per particle, whilst one M2L calculation has to be performed for each pair of cells. The forces applied to the particles are obtained by the same procedure using an extra order in the Taylor expansion. For instance, for the acceleration along $x$, we have: \begin{equation} a_x(\mathbf{x}_a) = \sum_{\mathbf{n}}^{p-1} \mathsf{X}_{\mathbf{n}}(\mathbf{r}_a) \mathsf{F}_{\mathbf{n}+\left(1,0,0\right)}(\mathbf{z}_A). \label{eq:fmm:L2P_force} \end{equation} In practice, the multipoles can be constructed recursively from the leaves of the tree to the root and the local expansions from the root to the leaves by shifting the $\mathsf{M}$ and $\mathsf{F}$ tensors and adding their contributions to their parent or daughter cell's tensors respecitvely. The shifting formulas (M2M and L2L kernels) read: \begin{align} \mathsf{M}_{\mathbf{m}}(\mathbf{x} + \mathbf{y}) &= \sum_{\mathbf{n}}^{\mathbf{m}} \frac{\mathbf{y}^\mathbf{n}}{\mathbf{n}!}\mathsf{M}_{\mathbf{m} - \mathbf{n}}(\mathbf{x}), \label{eq:fmm:M2M} \\ \mathsf{F}_{\mathbf{n}}(\mathbf{x} + \mathbf{y}) &= \sum_{\mathbf{m}}^{p-|\mathbf{n}|} \frac{\mathbf{y}^\mathbf{m}}{\mathbf{m}!}\mathsf{F}_{\mathbf{m} + \mathbf{n}}(\mathbf{x}). \label{eq:fmm:L2L} \end{align} One final, useful expression that enters some interaction between tree-leaves is the P2M kernel that directly applies the potential due to a multipole expansion to a particle without using the expansion of the potential $\mathsf{F}$ at the centre of mass of the cell. This kernel is obtained by setting $\mathbf{r}_a$ to zero in eq.~\ref{eq:fmm:expansion}, re-defining $\mathbf{R}\equiv\mathbf{x}_{\rm a} - \mathbf{z}_{\rm B}$ and constructing the same $\mathsf{M}$ and $\mathsf{D}$ tensor as for the other kernels: \begin{align} \phi_{Ba}(\mathbf{x}_a) &= G\sum_{\mathbf{m}}^p \mathsf{M}_{\mathbf{m}} \mathsf{D}_{\mathbf{m}}(\mathbf{R}),\\ a_x(\mathbf{x}_a) &= G\sum_{\mathbf{m}}^p \mathsf{M}_{\mathbf{m}} \mathsf{D}_{\mathbf{m}+\left(1,0,0\right)}(\mathbf{R}). \label{eq:fmm:M2P} \end{align} A traditional tree-code uses solely that kernel to obtain the forces from the multipoles (or often just monopoles, i.e. setting $p=0$ throughout) to the particles. Similarly, the field tensor of a cell can receive the contribution from a single particle at a distance $\mathbf{R} \equiv \mathbf{z}_A - \mathbf{x}_b$ via the P2L kernel: \begin{equation} \mathsf{F}_{\mathbf{n}}(\mathbf{z}_A) = G m_b \mathsf{D}_{\mathbf{n}}(\mathbf{R}). \label{eq:fmm:P2L} \end{equation} The M2P and P2L kernels can be used to speed up the calculations involving only as single particle. All the kernels (Eqs.~\ref{eq:fmm:P2M}-\ref{eq:fmm:P2L}) are rather straightforward to evaluate as they are only made of additions and multiplications (provided $\mathsf{D}$ can be evaluated quickly), which are extremely efficient instructions on modern architectures (see Appendix \ref{sec:pot_derivatives} for the full expressions). However, the fully expanded sums can lead to rather large and prone to typo expressions. To avoid any mishaps, we use a \texttt{python} script to generate C code in which all the sums are unrolled and correct by construction. In \swift, we implemented the kernels up to order $p=5$, as it proved to be accurate enough for our purpose, but this could be extended to higher order easily. This implies storing $56$ numbers per cell for each $\textsf{M}$ and $\textsf{F}$ plus three numbers for the location of the centre of mass. For leaf-cells with large numbers of particles, as in \swift, this is a small memory overhead. One further small improvement consists in choosing $\mathbf{z}_A$ to be the centre of mass of cell $A$ rather than its geometrical centre. The first order multipoles ($\mathsf{M}_{100},\mathsf{M}_{010},\mathsf{M}_{001}$) then vanish by construction. This allows us to simplify some of the expressions and helps reduce, albeit by a small fraction, the memory footprint of the tree structure. \subsubsection{Computing the accelerations via a tree-walk} We define the maximal distance between a centre of mass of a cell $B$ and any particle in that cell as \begin{equation} \rho_B = \max_{b \in B}(|\mathbf{r}_b|) \end{equation} \\ \textcolor{red}{MORE WORDS HERE}