Loop unrolling
Is there any reason why we're not compiling with -funroll-loops? I just realized that for some functions, gcc produces much better code with this option (see https://godbolt.org/g/vXAlII). Note that with icc this option doesn't seem to make a difference.
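The function behind the Compiler Explorer link isn't reproduced here. As a hypothetical illustration, the sketch below unrolls a simple reduction by hand with a factor of 4, which is roughly the kind of transformation -funroll-loops lets gcc apply on its own. The function names are made up. The factor gcc actually picks is a heuristic that depends on the loop and the target.

```c
#include <stddef.h>

/* Plain reduction: one add plus one counter update and branch per element. */
double sum_simple(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4 (illustrative): one counter update and branch per four
   elements. A scalar tail loop handles n not divisible by 4. A single
   accumulator keeps the floating-point summation order identical to
   sum_simple, so both return the same result. */
double sum_unrolled4(const double *a, size_t n) {
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; ++i) /* remaining 0..3 elements */
        s += a[i];
    return s;
}
```

Without -ffast-math, gcc won't reassociate floating-point adds. That is why the sketch keeps one accumulator rather than splitting the sum into several partial sums.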