# GPU Profiling
What is GPU profiling?

For the CPU version of the code we have VTune for profiling, as well as our tasking plots. For the GPU version of the code, however, we need different software to profile the MegaKernel™ and improve its performance.

## Getting ```nvvp```

The GUI profiling tool, the NVIDIA Visual Profiler (```nvvp```), can be downloaded [here](https://developer.nvidia.com/cuda-downloads). It isn't usually linked within your system (so typing ```nvvp``` in the terminal or looking in your menus isn't going to help), so you will need to look up the install location in the [Quick Start Guide](http://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html). Alternatively, you can use the ```cudatoolkit``` module on Piz Daint.

## Compiling

You can profile tests and the main kernel using the ```cudatoolkit``` module on Piz Daint. Once the module has loaded you will need to recompile your code (check that the ```-lineinfo``` flag is set for ```nvcc``` so that profiling information is collected). Then run ```nvprof ./binary```; this runs your code and should give you some basic profiling output.

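The recompile-and-profile step above might look roughly like the sketch below. The direct ```nvcc``` call and the ```SRC```/```BIN``` defaults are assumptions for illustration only (the project may well build through its own build system); on Piz Daint you would run ```module load cudatoolkit``` first. If the CUDA tools are not on the path, the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: SRC and BIN are illustrative defaults, adjust them to your test.
SRC="${SRC:-tests/testcuda.cu}"
BIN="${BIN:-./testcuda}"

# -lineinfo makes nvcc embed source line information for the profiler.
COMPILE_CMD="nvcc -lineinfo -o $BIN $SRC"
PROFILE_CMD="nvprof $BIN"

if command -v nvcc >/dev/null 2>&1 && command -v nvprof >/dev/null 2>&1; then
    # Toolkit available: recompile, then collect the basic profile.
    $COMPILE_CMD && $PROFILE_CMD
else
    # No CUDA toolkit on this machine: show the commands instead.
    echo "CUDA toolkit not found; would run:"
    echo "  $COMPILE_CMD"
    echo "  $PROFILE_CMD"
fi
```
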
#### ```test_27_cells```

To get this to work with ```test_27_cells``` you will need to remove the memory clear at the end of the test, as there are some as-yet undiagnosed problems with this.

## ```nvprof```

Put the following commands in a shell script, replacing ```<Binary> <Opt>``` with your binary and its options:

```
nvprof --export-profile timeline.prof <Binary> <Opt>
nvprof --metrics achieved_occupancy,executed_ipc -o metrics.prof <Binary> <Opt>
nvprof --source-level-analysis pc_sampling -o pcsampling.prof <Binary> <Opt>
nvprof --analysis-metrics -o analysis_metrics.prof <Binary> <Opt>
```

You can then use it as a submission script with ```sbatch```, or call

```
salloc -C gpu --res=<res> --time=H:MM:SS
```

to get allocated time on an actual node with a GPU.

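As a rough sketch, the four ```nvprof``` commands above could be wrapped in one script that takes the binary and its options as arguments. The function name ```profile_all``` and the ```DRY_RUN``` switch are made up for this example; with ```DRY_RUN=1``` the commands are printed rather than executed, which is useful for checking the script on a machine without a GPU.

```shell
#!/bin/sh
# Sketch: collect all four profiles for one binary.
# profile_all and DRY_RUN are hypothetical names used only in this example.
profile_all() {
    # With DRY_RUN=1 each command is echoed instead of being run.
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; else run=""; fi
    $run nvprof --export-profile timeline.prof "$@"
    $run nvprof --metrics achieved_occupancy,executed_ipc -o metrics.prof "$@"
    $run nvprof --source-level-analysis pc_sampling -o pcsampling.prof "$@"
    $run nvprof --analysis-metrics -o analysis_metrics.prof "$@"
}

# Only profile when a binary is given, e.g. ./profile.sh ./testcuda -p 8 -r 10
if [ "$#" -gt 0 ]; then
    profile_all "$@"
fi
```

Saved as, say, ```profile.sh```, this would be called as ```./profile.sh <Binary> <Opt>``` inside the allocation.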
## Using ```nvvp```

You will need to download these profile files from the cluster and store them somewhere on your own machine. They can then be analysed using the GUI tool from NVIDIA, ```nvvp```.

More information on running the profiler can be found in [these manual pages](http://docs.nvidia.com/cuda/profiler-users-guide/index.html#collecting-remote-data), but to get started use File -> Import, choose nvprof, and select the files. See below for an example.

![nvvp_timeline](/uploads/257388d4ba44f97543938db6de312ea9/nvvp_timeline.png)

![nvprof_load](/uploads/4f90914334860382d47b9d0e0b54b5ea/nvprof_load.png)

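Copying the ```.prof``` files from the cluster to your own machine could be done with ```scp```, for instance as in the sketch below. The helper name ```fetch_profiles```, the ```DRY_RUN``` switch, and the host and directory in the usage line are placeholders rather than real values.

```shell
#!/bin/sh
# Sketch: copy all .prof files from a cluster directory to the local machine.
# fetch_profiles and DRY_RUN are hypothetical names used only in this example.
# Usage: fetch_profiles <user@host> <remote directory> [local directory]
fetch_profiles() {
    remote="$1"
    remote_dir="$2"
    local_dir="${3:-.}"
    # With DRY_RUN=1 the scp command is echoed instead of being run.
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; else run=""; fi
    # The glob is quoted so that the local shell does not expand it.
    $run scp "$remote:$remote_dir/*.prof" "$local_dir"
}
```

For example, ```fetch_profiles <user>@<cluster> <run directory> ./profiles``` would place the files in ```./profiles```, ready for File -> Import in ```nvvp```.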
### PC Sampling
### GPU Profiling on CRAY

You can profile tests using the ```craype-accel-nvidia35``` module. Then use ```nvprof ./binary``` to run your profile. To get this to work with ```test_27_cells``` you will need to remove the memory clear at the end of the test, as there are some as-yet undiagnosed problems with this.

### Realtime Profiling on Piz Daint

To analyse the data on your local machine you can either use [this NVIDIA tool](https://github.com/NVIDIA/cuda-profiler/tree/master/one_hop_profiling) to do things in real time or (probably preferably) generate the profile files on Piz Daint with ```nvprof``` and copy them to your local machine.

To do that, run your code with the following:

```
nvprof --export-profile timeline.prof <Binary> <Opt>
nvprof --metrics achieved_occupancy,executed_ipc -o metrics.prof <Binary> <Opt>
```

### Testing a single task

On branch ```cuda_test``` you can edit and compile a test that runs a single task. To do so, copy the task you wish to test into the ```tests/testcuda.cu``` file and update ```do_test_pair``` or ```do_test```. You will also need to switch ```runPair``` on or off in ```main```.
The test binary can then be profiled as before, for example with ```nvprof --analysis-metrics -o analysis_metrics.prof ./testcuda -p 8 -r 10```.

### GPU Profiling Results
##### ```do_test_pair p=8 r=10``` initial kernel run

![Screen_Shot_2017-09-06_at_09.24.45](/uploads/48634931462502aac88a804f99dfa2c9/Screen_Shot_2017-09-06_at_09.24.45.png)