Add more motivation and explanation to the README

fb3c2ff3 · Peter W. Draper · f261af99 · fb3c2ff3
Commit fb3c2ff3 authored 5 years ago by Peter W. Draper
--- a/README.md
+++ b/README.md

 This package is a standalone part of [SWIFT](http://www.swiftsim.com) that
-aims to roughly simulate the MPI interactions that taking a single step
-of a simulation makes.
+aims to roughly simulate the MPI interactions that taking a single step of a
+SWIFT simulation makes.

-The interactions are captured from a run of SWIFT when configured using the
-configure option `--enable-mpiuse-reports`. When this is enabled each step of
-the simulation produces logs for each rank which record when the MPI
-interaction was started and when it completed. Other information such as the
-ranks involved, the size of the data exchanged, the MPI tags used and which
-SWIFT task types were used are also recorded.
+Within SWIFT, queues of cell-based tasks are run, with the task priorities
+and dependencies determining the order in which the tasks run. Tasks are only
+added to a queue when they are ready to run, that is, when they are not
+waiting for other tasks. This order also determines when the sends and recvs
+needed to update data on other ranks are initiated, as this happens when the
+associated task is queued. The sends and recvs are considered complete when
+`MPI_Test` reports completion, and this unlocks any dependencies they have. A
+step cannot complete until all its sends and recvs are also complete, so the
+performance of the MPI library and lower layers is critical. This seems to
+matter most, not when we have a lot of work, or very little, but for
+moderately busy steps, when the local work completes much sooner than the MPI
+exchanges.
+
+In SWIFT the enqueuing of tasks, and thus the initiation of sends and recvs
+(using `MPI_Isend` and `MPI_Irecv`), can happen from all the available
+threads. The polling with `MPI_Test` is done primarily using two queues, but
+these can steal work from other queues, and other queues can steal `MPI_Test`
+calls as well. Enqueuing and processing can happen at the same time.
+
+To keep things simple, this package uses three threads to simulate all of
+this: one thread that initiates the sends and recvs, and two threads that
+poll for completion of the sends and recvs. All three threads run at the same
+time.
+
+The sends and recvs themselves are captured from a run of SWIFT configured
+using the configure option `--enable-mpiuse-reports`. When this is enabled,
+each step of the simulation produces logs for each rank which record when
+each MPI interaction was started and when it completed. Other information
+such as the ranks involved, the size of the data exchanged, the MPI tags used
+and which SWIFT task types were used is also recorded.
+
+We read a concatenated log of all these outputs for a single step and use the
+relative times at which the interactions were started as a guide for when to
+start them. The completions are simply polled, in the order of their logged
+completion times, until completion really occurs. It is also possible to just
+start all the interactions as quickly as possible, for comparison.

 To use the program `swiftmpistepsim` you need to select the step of interest
 (for instance one whose run-time seems dominated by the MPI tasks) and then 
@@ -18,16 +48,16 @@ run using:
   mpirun -np <nranks> swiftmpistepsim <step-log> <output-log>
 ```
 which will output timings for the various MPI calls and record a log
-for the reproduction in the file `<output-log>`.
+for the reproduction in the file `<output-log>`. Note that you must use the
+same number of ranks as the original run of SWIFT.

-To simulate SWIFT we use three threads, which run simultaneously, one that
-injects the MPI commands, i.e. initiates the interaction using calls to
-`MPI_Isend` and `MPI_Irecv`, and two other threads that poll the MPI library
-using `MPI_Test` to discover when the exchanges have been completed.
+The verbose output and output log can be inspected to see what delays are
+driving the elapsed time for the step. Mainly these seem to be outlier
+`MPI_Test` calls that take tens of milliseconds.

-SWIFT itself uses more threads than this for the injection and polling phases,
-but it is not thought to make a large difference. A later development could
-explore that...
+The script `post-process.py` can be run on the output log to pair the sends
+and recvs across the ranks. This allows inspection of how well things like
+eager exchanges are working and what effect the size of the packets has.

 ---------------------------
-Peter W. Draper 18 Sep 2019.
+Peter W. Draper 24 Sep 2019.
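
As an illustration of the pairing step that the README attributes to `post-process.py`, a minimal Python sketch follows. It is not the script from this repository: the record fields (`rank`, `otherrank`, `kind`, `tag`, `size`) and the function name `pair_exchanges` are assumptions made for the example, as is the choice to match a send to a recv on source rank, destination rank and tag.

```python
from collections import defaultdict


def pair_exchanges(records):
    """Pair each logged send with the recv that mirrors it on the other rank.

    `records` is a list of dicts with the hypothetical keys
    rank, otherrank, kind ("send" or "recv") and tag.
    Returns (pairs, unmatched).
    """
    sends = defaultdict(list)
    recvs = defaultdict(list)
    for rec in records:
        if rec["kind"] == "send":
            # Key is (source rank, destination rank, tag).
            key = (rec["rank"], rec["otherrank"], rec["tag"])
            sends[key].append(rec)
        else:
            # A recv on rank B from rank A mirrors a send from A to B.
            key = (rec["otherrank"], rec["rank"], rec["tag"])
            recvs[key].append(rec)

    pairs, unmatched = [], []
    for key, send_list in sends.items():
        recv_list = recvs.pop(key, [])
        # Pair in logged order; anything left over has no partner.
        pairs.extend(zip(send_list, recv_list))
        unmatched.extend(send_list[len(recv_list):])
        unmatched.extend(recv_list[len(send_list):])
    # Recvs whose key never appeared among the sends.
    for recv_list in recvs.values():
        unmatched.extend(recv_list)
    return pairs, unmatched
```

With paired records in hand, quantities such as the delay between a send starting and its recv completing could be compared against the message size, which is the kind of inspection of eager exchanges and packet sizes that the README mentions.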