Optimal value for `queue_search_window`?

I recently changed how tasks are selected (see !75, merged). Tasks are now selected as follows:

1. Loosely sort the tasks according to their weight, which is the length of the critical path they lie on.
2. Run through that array with a window of size `queue_search_window` and try to lock the task with the largest data overlap with the task last executed by that thread. If that task can't be locked, drop it from the window, add the next task from the array, and repeat.

This approach works well in QuickSched and seems at least not to cause a performance regression in SWIFT, so I guess it's OK. The problem is that I don't know what the optimal value for `queue_search_window` should be. A small window may miss good tasks when the task weights are all more or less equal, whereas a large window may pick tasks with a really bad weight and increases the cost of accessing the list.
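For anyone who wants to experiment outside the code base, the selection steps described above can be sketched roughly as follows. This is only an illustration in Python, not the SWIFT or QuickSched implementation: `pick_task`, `overlap` and `try_lock` are hypothetical names, and the task list is assumed to be already (loosely) sorted by decreasing weight.

```python
def pick_task(tasks, window_size, overlap, try_lock):
    """Return the index of the task that was locked, or None.

    tasks       -- list assumed to be loosely sorted by decreasing weight
    window_size -- plays the role of queue_search_window
    overlap     -- hypothetical callable: data overlap of a task with the
                   task last executed by this thread
    try_lock    -- hypothetical callable: True if the task could be locked
    """
    # Start with the first window_size entries of the array.
    window = list(range(min(window_size, len(tasks))))
    next_idx = len(window)

    while window:
        # Candidate with the largest overlap; ties go to the earliest
        # entry in the window.
        best = max(window, key=lambda i: overlap(tasks[i]))
        if try_lock(tasks[best]):
            return best
        # Could not lock it: drop it and slide in the next task, if any.
        window.remove(best)
        if next_idx < len(tasks):
            window.append(next_idx)
            next_idx += 1

    return None
```

Note that in this sketch the window keeps sliding until the array is exhausted, so a small window only affects which task is preferred, not whether one is found.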

So here's the issue: Can somebody set up some benchmarks of systems of different sizes, e.g. 1M-100M particles, and check different values of `queue_search_window` on Cosma5? Ideally, I'd like to see the effect of the window size relative to the measurement noise, e.g. a plot with error bars.
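As a starting point for the error bars, repeated timings per window size could be reduced to a mean and a standard error along these lines. This is a sketch only: `summarize` and the layout of `timings` are assumptions, and how the runs are actually timed on Cosma5 is left open.

```python
import statistics


def summarize(timings):
    """Reduce repeated timings to (window, mean, standard error) rows.

    timings -- dict mapping a window size to a list of wall-clock times
               from repeated runs (assumed layout, not a SWIFT output format)
    """
    rows = []
    for window, samples in sorted(timings.items()):
        mean = statistics.mean(samples)
        if len(samples) > 1:
            # Standard error of the mean from the sample standard deviation.
            sem = statistics.stdev(samples) / len(samples) ** 0.5
        else:
            # A single run gives no estimate of the noise.
            sem = float("nan")
        rows.append((window, mean, sem))
    return rows
```

The rows can then be fed to any plotting tool that supports error bars, with one curve per system size.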

The stretch goal here is also to perhaps find a better strategy for the windowing, e.g. if the top task has weight W, make a window that contains tasks with weights at least 0.9 W? Or any other function of W? I'm open to any ideas here!
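To make the weight-relative idea concrete, here is one possible sketch of it. The function name `relative_window`, the default `fraction` and the optional `max_size` cap are all hypothetical, and the weights are assumed to be positive and roughly sorted in decreasing order.

```python
def relative_window(weights, fraction=0.9, max_size=None):
    """Number of leading tasks whose weight is at least fraction * W,
    where W is the weight of the top task.

    weights  -- task weights, assumed positive and loosely sorted descending
    fraction -- 0.9 reproduces the "at least 0.9 W" suggestion
    max_size -- optional hard cap on the window (an added assumption)
    """
    if not weights:
        return 0
    threshold = fraction * weights[0]
    size = 0
    for w in weights:
        # Stop at the first task below the threshold; since the sort is
        # only loose, later tasks above the threshold are ignored.
        if w < threshold:
            break
        size += 1
        if max_size is not None and size >= max_size:
            break
    return size
```

With nearly equal weights this gives a wide window, and with a steep weight distribution it shrinks towards a single task, which is the behaviour the fixed window cannot provide.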

Cheers, Pedro
