Threadpool Load Balancing
Currently the threadpool uses a schedule similar to OpenMP's "dynamic" schedule with a fixed chunk size. This is probably fine if all the work chunks take equal time. When they do not, a fixed chunk size can leave some threads idle near the end of the map while others are still working through their last chunk.

It is possibly worth investigating a schedule similar to OpenMP's "guided" schedule, in which the chunk size starts large (as there is a lot of work to do) but decreases over time. Specifically, the chunk size is proportional to the amount of work left to do divided by the number of threads, and never drops below a minimum chunk size.

Possible options are:

tp->map_data_chunk = max((tp->map_data_size - task_ind) / (nr_threads * 2), min_chunk_size)

or, as a simpler alternative that may work in some cases:

tp->map_data_chunk = max(tp->map_data_chunk / 2, min_chunk_size)
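To make the two options concrete, here is a minimal sketch of both chunk-size rules as standalone C functions. The function names `guided_chunk` and `halving_chunk` are hypothetical and not part of the existing threadpool code; the parameters mirror the names used above (`map_data_size`, `task_ind`, `nr_threads`, `min_chunk_size`), and the use of `size_t` for all of them is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: guided-style chunk size.
 * Returns (remaining work) / (2 * nr_threads), but never less than
 * min_chunk_size. */
size_t guided_chunk(size_t map_data_size, size_t task_ind,
                    size_t nr_threads, size_t min_chunk_size) {
  assert(nr_threads > 0);

  /* Guard against unsigned underflow if task_ind has already moved
   * past the end of the data. */
  const size_t remaining =
      task_ind < map_data_size ? map_data_size - task_ind : 0;

  const size_t chunk = remaining / (nr_threads * 2);
  return chunk > min_chunk_size ? chunk : min_chunk_size;
}

/* Hypothetical helper: the simpler rule, which halves the previous
 * chunk size each time, again bounded below by min_chunk_size. */
size_t halving_chunk(size_t current_chunk, size_t min_chunk_size) {
  const size_t chunk = current_chunk / 2;
  return chunk > min_chunk_size ? chunk : min_chunk_size;
}
```

Note that both rules can return a chunk larger than the work actually left (for example when fewer than `min_chunk_size` elements remain), so the caller would still need to clamp the chunk to the remaining range. In a real threadpool the chunk size and `task_ind` are presumably read and updated by several threads at once, so the update would also need to be atomic or otherwise synchronised; that is not shown here.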