Commit 0ee9968a authored by Jonathan Frawley

More results

parent a23ffff1
 #!/bin/bash
 #SBATCH --job-name="swiftaps"
-#SBATCH --ntasks=1
+#SBATCH --ntasks=2
 #SBATCH --ntasks-per-node=1
 #SBATCH --output=swiftaps.out
 #SBATCH --error=swiftaps.err
 ......
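The job name suggests this first script runs the benchmark under Intel Application Performance Snapshot (APS). The script body is truncated above; a minimal sketch of how the launch step might look, assuming the aps tool is available on the cluster and using an illustrative binary path that is not taken from the commit:

# Hypothetical completion of the truncated swiftaps script body:
module load intel_comp          # assumed module name; adjust for the local COSMA setup
aps ./swiftsim/examples/swift --cosmology --self-gravity --threads=64 p-mill-768.yml
aps --report=aps_result_*       # summarise the directory produced by the run; exact flag depends on the APS version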
 #!/bin/bash
 #SBATCH --job-name="swiftarm"
-#SBATCH --ntasks=1
+#SBATCH --ntasks=2
 #SBATCH --ntasks-per-node=1
 #SBATCH --output=swiftarm.out
 #SBATCH --error=swiftarm.err
 ......
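The second script presumably produces a summary like the one below via ARM (Allinea) Performance Reports, whose perf-report command wraps the normal launch line. A sketch of that wrapped launch, with an assumed module name and an abbreviated initial-conditions path, neither of which is taken from the commit:

# Hypothetical completion of the truncated swiftarm script body:
module load allinea               # assumed module name for the ARM tools on COSMA
perf-report mpirun -np 1 \
    ./swiftsim/examples/swift_mpi --cosmology --self-gravity -v 1 --threads=64 -n 1 \
    -P Restarts:enable:0 -P InitialConditions:file_name:/path/to/PMill-768.hdf5 p-mill-768.yml
# perf-report writes .txt and .html summaries like the report shown below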
Command: /cosma/home/ds007/dc-fraw1/performance_analysis_workshop/swift-cs-performance-workshop-2021/benchmark-slow/swiftsim/examples/swift_mpi --cosmology --self-gravity -v 1 --threads=64 -n 1 -P Restarts:enable:0 -PInitialConditions:file_name:/cosma5/data/do008/dc-fraw1/swift_initial_conditions/pmillenium/PMill-768.hdf5 p-mill-768.yml
Resources: 1 node (32 physical, 64 logical cores per node)
Memory: 503 GiB per node
Tasks: 1 process
Machine: b108.pri.cosma7.alces.network
Start time: Thu Jan 21 15:56:08 2021
Total time: 1714 seconds (about 29 minutes)
Full path: /cosma/home/ds007/dc-fraw1/performance_analysis_workshop/swift-cs-performance-workshop-2021/benchmark-slow/swiftsim/examples
Summary: swift_mpi is Compute-bound in this configuration
Compute: 94.7% |========|
MPI: 0.1% ||
I/O: 5.2% ||
This application run was Compute-bound. A breakdown of this time and advice for investigating further is in the CPU section below.
As very little time is spent in MPI calls, this code may also benefit from running at larger scales.
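Running at a larger scale only requires changing the resource requests in the batch scripts above; an illustrative sketch in which the counts are examples rather than values from the commit:

#SBATCH --ntasks=8              # e.g. 8 MPI ranks instead of 1
#SBATCH --ntasks-per-node=1     # one rank per node, SWIFT threads fill the cores
#SBATCH --cpus-per-task=64      # matches the --threads=64 passed to swift_mpi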
CPU:
A breakdown of the 94.7% CPU time:
Scalar numeric ops: 12.5% ||
Vector numeric ops: 32.9% |==|
Memory accesses: 54.6% |====|
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
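One concrete way to act on this advice is Linux perf on a compute node; a minimal sketch, assuming perf is installed and reusing the run line from the report (the event names are generic aliases and vary by CPU):

# check cache behaviour of a short test run
perf stat -e cycles,instructions,cache-references,cache-misses \
    ./swift_mpi --cosmology --self-gravity --threads=64 p-mill-768.yml
# then find the time-consuming loops
perf record -g ./swift_mpi --cosmology --self-gravity --threads=64 p-mill-768.yml
perf report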
MPI:
A breakdown of the 0.1% MPI time:
Time in collective calls: 100.0% |=========|
Time in point-to-point calls: 0.0% |
Effective process collective rate: 750 MB/s
Effective process point-to-point rate: 0.00 bytes/s
I/O:
A breakdown of the 5.2% I/O time:
Time in reads: 100.0% |=========|
Time in writes: 0.0% |
Effective process read rate: 99.5 MB/s
Effective process write rate: 0.00 bytes/s
Most of the time is spent in read operations with a low effective transfer rate. This may be caused by contention for the filesystem or inefficient access patterns. Use an I/O profiler to investigate which read calls are affected.
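Short of a full I/O profiler such as Darshan, a quick check of where the read time goes is strace's syscall summary; a sketch for a short test run, not taken from the workshop material:

# count and time file-related system calls, following all threads
# (the %file class needs a reasonably recent strace)
strace -f -c -e trace=%file,read,pread64 \
    ./swift_mpi --cosmology --self-gravity --threads=64 p-mill-768.yml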
Threads:
A breakdown of how multiple threads were used:
Computation: 96.4% |=========|
Synchronization: 3.6% ||
Physical core utilization: 165.5% |================|
System load: 161.8% |===============|
The system load is high. Check that other jobs or system processes are not running on the same nodes.
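Two simple checks for interference from other jobs, assuming standard Slurm tooling on COSMA:

# in the batch script: request whole nodes so no other job shares them
#SBATCH --exclusive
# from a login node while the job runs: see which jobs Slurm has placed on that node
squeue --nodelist=b108.pri.cosma7.alces.network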
Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage: 59.1 GiB
Peak process memory usage: 74.7 GiB
Peak node memory usage: 16.0% |=|
The peak node memory usage is very low. Larger problem sets can be run before scaling to multiple nodes.
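For context, the 74.7 GiB peak process footprint is about 15% of the 503 GiB available on the node (74.7 / 503 ≈ 0.149), in line with the reported 16% peak node usage once node-level overhead is counted, so the problem size could grow several-fold before a single node runs out of memory.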
Energy:
A breakdown of how energy was used:
CPU: not supported
System: not supported
Mean node power: not supported
Peak node power: 0.00 W
Energy metrics are not available on this system.
CPU metrics are not supported (no intel_rapl module)