General vectorization improvements

By running the code through Vector Advisor (VA), we might get suggestions for simple changes to loops throughout the code that would then (auto-)vectorize better. I am opening this issue to keep track of these. We mostly care about loops where a significant amount of time is spent.

The kinds of problems I expect VA to flag are:

  • Type conversion issues
  • Ill-placed branches
  • Ill-formed exit conditions
  • etc.
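
To make the three categories above concrete, here is a minimal C sketch of what each might look like and what a simple fix could be. None of this is taken from our code: all function names are hypothetical, and whether a given compiler actually fails to vectorize the "slow" variants depends on the compiler and flags. A vectorization report (e.g. GCC's -fopt-info-vec-missed) or VA itself would have to confirm each case.

```c
#include <stddef.h>

/* Type conversion: the double literal 0.5 forces each float element to be
 * converted to double and back, which can reduce the usable vector width. */
void scale_slow(float *restrict out, const float *restrict in, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = in[i] * 0.5;
}

/* Fix: keep everything in single precision with a float literal. */
void scale_fast(float *restrict out, const float *restrict in, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = in[i] * 0.5f;
}

/* Ill-placed branch: the test on `periodic` does not depend on i,
 * yet it sits inside the loop body. */
void shift_slow(float *x, size_t n, int periodic, float box) {
  for (size_t i = 0; i < n; i++) {
    if (periodic) x[i] += box;
  }
}

/* Fix: hoist the loop-invariant test out of the loop. */
void shift_fast(float *x, size_t n, int periodic, float box) {
  if (!periodic) return;
  for (size_t i = 0; i < n; i++) x[i] += box;
}

/* Ill-formed exit condition: `a` may alias `n`, so the compiler may have
 * to reload *n on every iteration and cannot assume a fixed trip count. */
void fill_slow(int *a, const int *n) {
  for (int i = 0; i < *n; i++) a[i] = i;
}

/* Fix: read the bound once into a local, making the loop countable.
 * This assumes the caller never passes an `a` that overlaps `n`. */
void fill_fast(int *a, const int *n) {
  const int count = *n;
  for (int i = 0; i < count; i++) a[i] = i;
}
```

Each "fast" variant is meant to give the same result as its "slow" counterpart for non-overlapping inputs, so fixes of this kind should be safe to apply one loop at a time and check against existing tests.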

It'd be good to list the issues VA reports here and see which ones can be addressed with simple fixes.