Following the results in the threadpool_task_plots branch, I've replaced the elaborate hand-crafted engine_barrier function with two pthread_barriers.
As a result, the runner threads should all start and synchronize faster.