Threadpool reduction
From my wish list:
There are now two places in the code where we perform a reduction of quantities using the threadpool. One of them is performance critical as it takes place at the end of the time-step to gather the minimum over all (local) top-level cells.
At the moment this is done in a rather ad-hoc way (see engine_collect_end_of_step_mapper()) by locking something (here the space that has nothing to do with this bit of code!) whenever the thread is done iterating over its chunk of data.
That seems both inelegant and inefficient to me.
Ideally we would have a mechanism to perform a reduction that locks only once per thread in the pool (or even less often), not once per chunk. It should also be generic, so that it does not have to be written out explicitly for each reduction.
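To make the idea concrete, here is a minimal sketch of what such a mechanism could look like. It uses plain pthreads and C11 atomics rather than our threadpool, and every name in it (struct reduction, reduction_run(), min_map(), ...) is hypothetical and made up for illustration. Each thread grabs chunks with an atomic counter, accumulates into a thread-local value, and takes the lock exactly once when it runs out of work. The reduction itself is described by a map function, a combine function and an identity value, so nothing in the worker is specific to taking a minimum.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define REDUCTION_MAX_ACC 64     /* Largest accumulator size supported (bytes). */
#define REDUCTION_MAX_THREADS 64 /* Largest number of worker threads supported. */

/* Hypothetical generic reduction descriptor (not an existing API). */
struct reduction {
  const char *data;     /* Input array, viewed as bytes. */
  size_t stride;        /* Size of one input element in bytes. */
  size_t count;         /* Number of input elements. */
  size_t chunk;         /* Number of elements handed out per grab. */
  atomic_size_t next;   /* Index of the next unclaimed element. */
  void (*map)(const void *chunk, size_t n, void *local);
  void (*combine)(void *global, const void *local);
  const void *identity; /* Neutral element of the reduction. */
  size_t acc_size;      /* Size of the accumulator in bytes. */
  void *global;         /* Where the final result is written. */
  pthread_mutex_t lock; /* Protects 'global' only. */
};

static void *reduction_worker(void *arg) {
  struct reduction *r = arg;

  /* Thread-local accumulator, initialised to the neutral element. */
  _Alignas(max_align_t) char local[REDUCTION_MAX_ACC];
  memcpy(local, r->identity, r->acc_size);

  /* Grab chunks until the data is exhausted. No lock is taken here. */
  for (;;) {
    const size_t first = atomic_fetch_add(&r->next, r->chunk);
    if (first >= r->count) break;
    size_t n = r->count - first;
    if (n > r->chunk) n = r->chunk;
    r->map(r->data + first * r->stride, n, local);
  }

  /* Single lock per thread: merge the local result into the global one. */
  pthread_mutex_lock(&r->lock);
  r->combine(r->global, local);
  pthread_mutex_unlock(&r->lock);
  return NULL;
}

static void reduction_run(struct reduction *r, int nr_threads) {
  pthread_t threads[REDUCTION_MAX_THREADS];
  assert(nr_threads > 0 && nr_threads <= REDUCTION_MAX_THREADS);
  assert(r->acc_size <= REDUCTION_MAX_ACC);

  atomic_init(&r->next, 0);
  pthread_mutex_init(&r->lock, NULL);
  memcpy(r->global, r->identity, r->acc_size);

  for (int i = 0; i < nr_threads; i++) {
    const int err = pthread_create(&threads[i], NULL, reduction_worker, r);
    assert(err == 0);
    (void)err;
  }
  for (int i = 0; i < nr_threads; i++) pthread_join(threads[i], NULL);
  pthread_mutex_destroy(&r->lock);
}

/* Example reduction: minimum of an array of long long. */
static void min_map(const void *chunk, size_t n, void *local) {
  const long long *v = chunk;
  long long *m = local;
  for (size_t i = 0; i < n; i++)
    if (v[i] < *m) *m = v[i];
}

static void min_combine(void *global, const void *local) {
  long long *g = global;
  const long long *l = local;
  if (*l < *g) *g = *l;
}

/* Small demo: all values are >= 1000 except one entry set to 17. */
long long demo_min_reduction(void) {
  enum { N = 10000 };
  static long long values[N];
  for (int i = 0; i < N; i++) values[i] = 1000 + (i * 7919LL) % 5003;
  values[4321] = 17;

  const long long identity = LLONG_MAX;
  long long result;
  struct reduction r = {.data = (const char *)values,
                        .stride = sizeof(long long),
                        .count = N,
                        .chunk = 128,
                        .map = min_map,
                        .combine = min_combine,
                        .identity = &identity,
                        .acc_size = sizeof(long long),
                        .global = &result};
  reduction_run(&r, 4);
  return result;
}
```

With 10000 elements and chunks of 128 there are 79 chunks, so a lock-per-chunk scheme would take the lock 79 times while this one takes it 4 times, once per thread. Going below one lock per thread would need something like a per-thread result slot that the calling thread combines after the join; that is not shown here.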