Loop unrolling
Is there a reason we're not compiling with -funroll-loops? I just noticed that, for some functions, gcc generates much better code with this option (see https://godbolt.org/g/vXAlII).
Note that with icc this option doesn't seem to make a difference.
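
For anyone not familiar with what the flag changes, here is a minimal C sketch. It is not the code from the godbolt link, and the function names (sum_rolled, sum_unrolled4, sum_pragma) are made up for illustration. The second function is unrolled by hand to show roughly the kind of transformation -funroll-loops asks gcc to do. As far as I know the flag is not turned on by the plain -O levels. The third function assumes GCC 8 or later, where "#pragma GCC unroll" can request unrolling for a single loop instead of the whole build.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one element per loop iteration, one branch per element. */
uint32_t sum_rolled(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hand-unrolled by 4 (illustration only): fewer branches per element,
 * plus a remainder loop for the last n % 4 elements. */
uint32_t sum_unrolled4(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        s += v[i];
    return s;
}

/* Per-loop request, assuming GCC 8 or later. Other compilers may
 * ignore the pragma or warn about it; the result is the same. */
uint32_t sum_pragma(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

All three return the same value for the same input; only the generated code differs. Whether unrolling actually helps depends on the loop and the target, so it is worth comparing the output of -O2 and -O2 -funroll-loops on the real hot functions before changing the build flags.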