General vectorization improvements

By running the code through Vector Advisor (VA), we might get suggestions for simple changes to loops throughout the code that would then (auto-)vectorize better. I am opening this issue to keep track of them. We mostly care about loops where a significant fraction of the run time is spent.

The kinds of suggestions I expect to receive from VA are:

Type conversion issues
Ill-placed branches
Ill-formed exit conditions
etc.
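
To make the three categories above concrete, here is a rough sketch in C of what each pattern and its simple fix might look like. None of these functions come from our code base; the names, signatures and loop bodies are made up for illustration. Whether a given compiler actually vectorizes either variant depends on the compiler and flags, so the vectorization report remains the reference.

```c
#include <stddef.h>

/* 1) Type conversion: the double literal 0.5 promotes each element to
 *    double before the multiply and converts the result back to float. */
void scale_mixed(float *restrict out, const float *restrict in, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = in[i] * 0.5;
}

/* Fix: use a float literal so the whole loop stays in single precision. */
void scale_single(float *restrict out, const float *restrict in, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = in[i] * 0.5f;
}

/* 2) Ill-placed branch: use_square does not change inside the loop,
 *    yet it is tested on every iteration. */
void accumulate_branchy(float *restrict acc, const float *restrict x,
                        size_t n, int use_square) {
  for (size_t i = 0; i < n; i++) {
    if (use_square)
      acc[i] += x[i] * x[i];
    else
      acc[i] += x[i];
  }
}

/* Fix: hoist the loop-invariant test and write two branch-free loops. */
void accumulate_hoisted(float *restrict acc, const float *restrict x,
                        size_t n, int use_square) {
  if (use_square) {
    for (size_t i = 0; i < n; i++) acc[i] += x[i] * x[i];
  } else {
    for (size_t i = 0; i < n; i++) acc[i] += x[i];
  }
}

/* 3) Ill-formed exit condition: the loop stops at the first zero in
 *    mask, so the trip count is not known when the loop starts.
 *    Returns the number of elements written. */
size_t double_until_stop_fused(float *restrict out, const float *restrict in,
                               const int *restrict mask, size_t n) {
  size_t i = 0;
  while (i < n && mask[i] != 0) {
    out[i] = 2.0f * in[i];
    i++;
  }
  return i;
}

/* Fix: find the stopping point first, then run a loop whose trip
 *    count is fixed before it starts. */
size_t double_until_stop_split(float *restrict out, const float *restrict in,
                               const int *restrict mask, size_t n) {
  size_t count = 0;
  while (count < n && mask[count] != 0) count++;
  for (size_t i = 0; i < count; i++) out[i] = 2.0f * in[i];
  return count;
}
```

In each pair both variants compute the same result; the second is simply the form that should be easier for the compiler to auto-vectorize.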

It'd be good to list these "issues" here and see whether simple fixes can be applied.
