Deadtimes in `runner_do_unskip`

Looking at a recent threadpool task plot for a "small" timestep, we have up to 90% deadtime, i.e. time in which no productive work is being done by the threadpool worker.

Looking at the code, there is nothing blocking in the main loop, i.e. outside of the tic and threadpool_log lines surrounding the mapper function call. So where are these large gaps coming from?

There are two hypotheses here:

1. The loop does not actually work as I think it does, and one of the operations there is blocking for whatever reason.
2. The logging/timing does not actually work as I think it does, and the gaps are actually happening somewhere in the mapper itself.
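
As a sanity check on the logging side, the deadtime could be recomputed directly from the raw logged intervals rather than read off the plot. A minimal sketch, assuming each log entry boils down to a start/end tick pair for one mapper call on one thread (the `map_interval` struct and `deadtime_fraction` function are made up for illustration and are not what the threadpool actually stores):

```c
#include <stddef.h>

/* Hypothetical log entry: start and end tick of one mapper call on one
 * thread. Not the actual threadpool log layout. */
struct map_interval {
  long long tic;
  long long toc;
};

/* Fraction of the window [window_start, window_end] that is not covered by
 * any logged mapper call. Assumes the intervals belong to a single thread,
 * do not overlap, and lie entirely inside the window. */
static double deadtime_fraction(const struct map_interval *calls, size_t count,
                                long long window_start, long long window_end) {
  if (window_end <= window_start) return 0.0;

  /* Sum up the ticks spent inside the mapper. */
  long long busy = 0;
  for (size_t k = 0; k < count; k++) busy += calls[k].toc - calls[k].tic;

  return 1.0 - (double)busy / (double)(window_end - window_start);
}
```

If this number agrees with the plot, the plotting is at least consistent with the log, and the question becomes whether the tics themselves are taken where I think they are.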

In order to figure out what is going on, I need profiling, but I only want to profile certain events, i.e. what's going on in `threadpool_chomp` when it is called for `runner_do_unskip`. What I propose doing is linking against `libprofiler.so` and adding calls to `ProfilerStart` and `ProfilerStop` around the corresponding call to `threadpool_map`. This should give me line-by-line profiling information for the precise call stack that I want.
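
Roughly, the instrumentation would look like the sketch below. `ProfilerStart(const char *)` and `ProfilerStop(void)` are the gperftools calls from `<gperftools/profiler.h>`; everything else (`stand_in_map`, `profiled_map`, `sum_mapper`, the `WITH_GPERFTOOLS` macro and the mapper signature) is a made-up stand-in so the snippet builds on its own, not the real `threadpool_map` interface.

```c
#include <stddef.h>

#ifdef WITH_GPERFTOOLS
/* Real profiler: build with -DWITH_GPERFTOOLS and link with -lprofiler. */
#include <gperftools/profiler.h>
#else
/* Fallback stubs (assumption: only here so the sketch compiles without
 * gperftools installed). They just count how often they are called. */
static int stub_start_calls = 0;
static int stub_stop_calls = 0;
static int ProfilerStart(const char *fname) {
  (void)fname;
  stub_start_calls++;
  return 1;
}
static void ProfilerStop(void) { stub_stop_calls++; }
#endif

/* Hypothetical mapper signature, for illustration only. */
typedef void (*map_function)(void *map_data, int num_elements,
                             void *extra_data);

/* Hypothetical serial stand-in for threadpool_map. */
static void stand_in_map(map_function fun, void *map_data, int num_elements,
                         void *extra_data) {
  fun(map_data, num_elements, extra_data);
}

/* Profile only this one map call: samples are collected between
 * ProfilerStart and ProfilerStop and written to profile_file. */
static void profiled_map(const char *profile_file, map_function fun,
                         void *map_data, int num_elements, void *extra_data) {
  ProfilerStart(profile_file);
  stand_in_map(fun, map_data, num_elements, extra_data);
  ProfilerStop();
}

/* Toy mapper: sums an int array into *extra_data. */
static void sum_mapper(void *map_data, int num_elements, void *extra_data) {
  const int *values = (const int *)map_data;
  int *total = (int *)extra_data;
  for (int k = 0; k < num_elements; k++) *total += values[k];
}
```

In the real code only the two profiler calls would be added, around the existing `threadpool_map` call for `runner_do_unskip`, and the resulting profile file can then be inspected with `pprof`.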

One problem though: I don't have a 16-core machine to run this on. I can set up a branch that does all this and run it on up to four cores on my laptop, but that's about it. Anybody willing to run this on our dedicated machine and send me the results once it's done?

Cheers, Pedro
