NUMA-aware affinity for workers
* This is not vectorisation-specific.
* There may be better trade-offs between HT and NUMA.
* Maybe print a warning when we require multiple NUMA nodes.
* Must also detect when Hyper-Threading is not present.
* Probably better as a configure flag, rather than conditional only upon the availability of libnuma.
* ~15-40% performance improvement on COSMA.
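
For reviewers, the kind of ordering discussed in the notes above can be sketched roughly as follows. This is an illustration only, not the code in this patch: `struct cpu_info`, `order_cpus` and `nodes_spanned` are hypothetical names, and the topology is passed in by hand where a real implementation would query libnuma or a similar library. The policy shown (all physical cores first, grouped by NUMA node, hyper-thread siblings last) is just one possible HT/NUMA trade-off.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of one logical CPU. In a real build these
 * fields would be filled in from the system topology. */
struct cpu_info {
  int cpu;     /* logical CPU id */
  int node;    /* NUMA node the CPU belongs to */
  int core;    /* physical core id within the node; HT siblings share it */
  int sibling; /* set by order_cpus: 1 if this is a secondary hyper-thread */
};

/* Sort key: primary threads before siblings, then by node, then by id. */
static int cmp_cpu(const void *a, const void *b) {
  const struct cpu_info *x = a, *y = b;
  if (x->sibling != y->sibling) return x->sibling - y->sibling;
  if (x->node != y->node) return x->node - y->node;
  return x->cpu - y->cpu;
}

/* Reorder cpus[] so that worker i can be pinned to cpus[i].cpu.
 * Returns 1 if any hyper-thread siblings were found, 0 otherwise. */
int order_cpus(struct cpu_info *cpus, int n) {
  assert(cpus != NULL && n >= 0);
  int have_ht = 0;
  for (int i = 0; i < n; i++) {
    cpus[i].sibling = 0;
    /* A CPU is a sibling if a lower-numbered CPU shares its node and core. */
    for (int j = 0; j < n; j++) {
      if (j != i && cpus[j].node == cpus[i].node &&
          cpus[j].core == cpus[i].core && cpus[j].cpu < cpus[i].cpu) {
        cpus[i].sibling = 1;
        have_ht = 1;
      }
    }
  }
  qsort(cpus, (size_t)n, sizeof(*cpus), cmp_cpu);
  return have_ht;
}

/* Count the distinct NUMA nodes used by the first n_workers entries, so the
 * caller can warn when the workers do not fit on a single node. */
int nodes_spanned(const struct cpu_info *cpus, int n_workers) {
  int count = 0;
  for (int i = 0; i < n_workers; i++) {
    int seen = 0;
    for (int j = 0; j < i; j++)
      if (cpus[j].node == cpus[i].node) seen = 1;
    if (!seen) count++;
  }
  return count;
}
```

A caller would run `order_cpus` once at start-up, pin worker `i` to `cpus[i].cpu`, and could print the suggested warning when `nodes_spanned` returns more than 1. A return value of 0 from `order_cpus` covers the case where Hyper-Threading is not present.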