Logger performance

FYI: @lhausammann and @nnrw56

I have been running the PMillennium-1536 example with and without the logger switched on. It has 3.6B particles, and I am running it on 4 MPI ranks with 28 threads each. Both runs use the latest master and the Intel 2020 compiler with the same configuration, except that the logger run was built with --enable-logger and launched with the --logger runtime flag.

Overall verdict: The logger run is much slower and uses quite a bit more memory.

Over the first 8 steps, the run without the logger took 8000 s, while the run with the logger took 21600 s (about 2.7 times slower).

Key differences:

All the tasks are slower with the logger, likely because the particles are heavier and we are bandwidth-limited.
The run develops an imbalance, likely because of the logger task, and that triggers many redistributions.
The very first step takes 45 min to complete, with 90% of that time spent in the logger task, likely initialising things.
Index files are dumped almost every step, taking ~2 min per step.
About 15% of the time in the logger run is unaccounted for. I am not sure which function call does not report the time spent in it, but that needs fixing.
Checking the resident (RES) memory on the compute nodes, the run with the logger uses 15% more.
The logger run crashed twice because it ran out of memory while trying to rebalance. The default run never builds up more than 1% MPI imbalance after the 2nd step and hence never redistributes.
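A rough budget of the logger overhead can be built from the timings quoted above alone. This is a back-of-envelope sketch, not a measurement. It assumes index files are dumped on every one of the 8 steps (the report says "almost every step"), so that term is an upper bound. It also treats the quoted percentages as exact.

```python
# Back-of-envelope overhead budget for the first 8 steps, using only the
# timings quoted in this report. Assumption: index files are dumped on every
# step, so the index-dump term is an upper bound.

steps = 8
t_no_logger_s = 8000    # total wall-clock time without the logger
t_logger_s = 21600      # total wall-clock time with the logger

slowdown = t_logger_s / t_no_logger_s
overhead_s = t_logger_s - t_no_logger_s

first_step_s = 45 * 60                          # 45 min for the first step
first_step_logger_s = first_step_s * 90 // 100  # 90% of it in the logger task
index_dumps_max_s = steps * 2 * 60              # ~2 min per step, every step
unaccounted_s = t_logger_s * 15 // 100          # 15% of the logger run

# Overhead attributable to the two clearly identified logger costs.
explained_s = first_step_logger_s + index_dumps_max_s

print(f"slowdown:                  {slowdown:.1f}x")
print(f"total overhead:            {overhead_s} s")
print(f"first-step logger task:    {first_step_logger_s} s")
print(f"index dumps (upper bound): {index_dumps_max_s} s")
print(f"unaccounted time:          {unaccounted_s} s")
print(f"overhead not explained by the two items above: {overhead_s - explained_s} s")
```

On these numbers, the first-step logger task and the index dumps together account for at most about 3400 s of the 13600 s overhead. The remainder would presumably come from the slower tasks, the redistributions, and the unaccounted time.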

Let me know if there is any other useful information I can extract from these runs.

The log files (so far) are:

CPU balance log for the run with the logger: rank_cpu_balance.log

Edited Feb 09, 2021 by Matthieu Schaller
