General vectorization improvements
Running the code through Vector Advisor (VA) may give us suggestions for simple changes that would let loops throughout the code (auto-)vectorize better. I am opening this issue to keep track of them. We mostly care about loops where a significant amount of time is spent.
I expect the suggestions from VA to fall into categories such as:
- Type conversion issues
- Ill-placed branches
- Ill-formed exit conditions
- etc.
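
To make these categories concrete, here is a minimal sketch in C of what each pattern and its simple fix might look like. None of these functions come from our code base: all names are hypothetical, and the actual VA reports may point at different constructs. Whether a given compiler vectorizes either variant depends on the compiler and flags, so treat this as an illustration of the kind of change meant, not as a measured result.

```c
#include <assert.h>

/* 1. Type conversion: the double literal 0.5 promotes each float to
      double, and the product is converted back to float on the store. */
void scale_mixed(float *restrict out, const float *restrict in, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) out[i] = in[i] * 0.5;
}

/* Same arithmetic kept in single precision by using a float literal. */
void scale_float(float *restrict out, const float *restrict in, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) out[i] = in[i] * 0.5f;
}

/* 2. Ill-placed branch: `square` never changes inside the loop, yet it
      is tested on every iteration. */
void apply_branch_inside(float *restrict out, const float *restrict in,
                         int n, int square) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    if (square)
      out[i] = in[i] * in[i];
    else
      out[i] = in[i];
  }
}

/* Branch hoisted out of the loop: each loop body is now branch-free. */
void apply_branch_hoisted(float *restrict out, const float *restrict in,
                          int n, int square) {
  assert(n >= 0);
  if (square) {
    for (int i = 0; i < n; i++) out[i] = in[i] * in[i];
  } else {
    for (int i = 0; i < n; i++) out[i] = in[i];
  }
}

/* 3. Ill-formed exit condition: the bound is re-read through a pointer
      on every iteration. Since `out` and `count` have compatible types,
      the compiler may have to assume a store to out[i] changes *count,
      so the trip count is not known before the loop starts. */
void fill_reloaded_bound(int *out, const int *count) {
  for (int i = 0; i < *count; i++) out[i] = 2 * i;
}

/* Bound copied into a local once, giving a fixed trip count. */
void fill_fixed_bound(int *out, const int *count) {
  const int n = *count;
  assert(n >= 0);
  for (int i = 0; i < n; i++) out[i] = 2 * i;
}
```

In each pair the two functions compute the same values for non-overlapping arrays; only the form of the loop differs. A compiler's vectorization report (for example `-fopt-info-vec` with GCC) can be used to check whether a particular rewrite actually changed anything.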
It'd be good to list these "issues" here and see whether simple fixes can be applied.