Simplify, permit user control over affinity
This ensures we 'do the right thing' when the user imposes affinity through e.g. Intel MPI's I_MPI_PIN_DOMAIN or other mechanisms. It also no longer works with a shuffled cpuid array, which will hopefully have less surprising failure modes, and means we don't have to handle hyperthreading explicitly.
If we later want to be clever about placement, e.g. to maximise available cache, then we could replace libnuma with hwloc as discussed elsewhere.
Hopefully this gives us a reasonable affinity which is easily overridden using standard techniques.
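
For illustration only, the general idea of honouring an externally imposed affinity can be sketched as below. This is not the code in this change: it is a minimal Python sketch that assumes a Linux-style inherited affinity mask, and the helper names allowed_cpus and assign_workers are hypothetical. Workers are spread over the allowed CPUs in their natural order, with no shuffling and no hyperthreading-specific logic.

```python
import os


def allowed_cpus():
    """Return the sorted CPU ids this process is allowed to run on.

    On Linux this reflects any mask imposed from outside (an MPI launcher,
    taskset, cgroups). Elsewhere we assume every CPU is allowed.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    # Assumption: no affinity API available, so fall back to all CPUs.
    return list(range(os.cpu_count() or 1))


def assign_workers(cpus, n_workers):
    """Map worker index -> CPU id, round-robin over the allowed CPUs in order."""
    if not cpus:
        raise ValueError("empty affinity mask")
    return [cpus[i % len(cpus)] for i in range(n_workers)]


if __name__ == "__main__":
    cpus = allowed_cpus()
    # Each worker could then be pinned to its CPU, or simply left unpinned
    # inside the inherited mask.
    print(assign_workers(cpus, 4))
```

Because the assignment only ever draws from the inherited mask, restricting the process with standard tools changes the result without any special handling.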