diff --git a/README.md b/README.md
index 36b4e1070ee1f9deeaf5ade6438924706b8cdc5f..ff7a86e302fa8210537a409a8a38584b4b0fb195 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,44 @@
 
 This package is a standalone part of [SWIFT](http://www.swiftsim.com) that
-aims to roughly simulate the MPI interactions that taking a single step
-of a simulation makes.
+aims to roughly simulate the MPI interactions that taking a single step of a
+SWIFT simulation makes.
 
-The interactions are captured from a run of SWIFT when configured using the
-configure option `--enable-mpiuse-reports`. When this is enabled each step of
-the simulation produces logs for each rank which record when the MPI
-interaction was started and when it completed. Other information such as the
-ranks involved, the size of the data exchanged, the MPI tags used and which
-SWIFT task types were used are also recorded.
+The actual process within SWIFT is that queues of cell-based tasks are ran,
+with their priorities and dependencies determining the order that the tasks
+are ran in. Tasks are only added to a queue when they are ready to run, that
+is they are not waiting for other tasks. This order also determines when the
+sends and recvs needed to update data on other ranks are initiated as this
+happens when the associated task is queued. The sends and recvs are considered
+to be complete when MPI_Test returns true and this unlocks any dependencies
+they have. Obviously a step cannot complete until all the sends and recvs are
+themselves also complete, so the performance of the MPI library and lower
+layers is critical. This seems to be most significant, not when we have a lot
+of work, or very little, but for intermediary busy steps, when the local work
+completes much sooner than the MPI exchanges.
+
+In SWIFT the enqueuing of tasks, thus send and recvs initiation (using
+MPI_Isend and MPI_Irecv) can happen from all the available threads, but the
+polling of MPI_Test is done primarily using two queues, but these can steal
+work from other queues, and other queues can steal MPI_Test calls as well.
+Enqueuing and processing can happen at the same time.
+
+To keep this simple this package uses three threads to simulate all this, a
+thread that does the task of initiating the sends and recvs and two threads
+that poll for completion of the sends and recvs. All threads run at the same
+time.
+
+The send and recvs themselves are captured from a run of SWIFT when configured
+using the configure option `--enable-mpiuse-reports`. When this is enabled
+each step of the simulation produces logs for each rank which record when the
+MPI interaction was started and when it completed. Other information such as
+the ranks involved, the size of the data exchanged, the MPI tags used and
+which SWIFT task types were used are also recorded.
+
+We read a concatenated log of all these outputs for a single step, and try to
+use the relative times that the interaction were started as a guide, the
+completions are just polled in time completion order until completion really
+occurs. It is also possible to just start all the interactions as quickly as
+possible for comparisons.
 
 To use the program `swiftmpistepsim` you need to select the step of interest
 (for instance one whose run-time seems dominated by the MPI tasks) and then
@@ -18,16 +48,16 @@ run using:
 mpirun -np <nranks> swiftmpistepsim <step-log> <output-log>
 ```
 which will output timings for the various MPI calls and record a log
-for the reproduction in the file `<output-log>`.
+for the reproduction in the file `<output-log>`. Note you must use the same
+numbers of ranks as the original run of SWIFT.
 
-To simulate SWIFT we use three threads, which run simultaneously, one that
-injects the MPI commands, i.e. initiates the interaction using calls to
-`MPI_Isend` and `MPI_Irecv`, and two other threads that poll the MPI library
-using `MPI_Test` to discover when the exchanges have been completed.
+The verbose output and output log can be inspected to see what delays are
+driving the elapsed time for the step. Mainly these seem to be outlier
+MPI_Test calls that take tens of milliseconds.
 
-SWIFT itself uses more threads than this for the injection and polling phases,
-but it is not thought to make a large difference. A later development could
-explore that...
+A script post-process.py can be ran on the output log to pair the sends and
+recvs across the ranks. This allows the inspection of how well things like
+eager exchanges are working and what effect the size of the packets has.
 
 ---------------------------
-Peter W. Draper 18 Sep 2019.
+Peter W. Draper 24 Sep 2019.