Use the threadpool to parallelize operations in the particle-splitting code

Merged: Matthieu Schaller requested to merge threadpool_splitting into master

Implements #641 (closed).

All the loops over the global particle arrays have been parallelized. I have also added alignment information to help the compiler use faster memcpy implementations.
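
For readers unfamiliar with the pattern, here is a minimal sketch of the two ideas in this description: splitting a loop over a particle array across threads, and telling the compiler about the array's alignment so the per-particle memcpy can use aligned copies. It is not the code in this merge request. It uses plain pthreads instead of the SWIFT threadpool, and `struct particle`, `PART_ALIGN`, `NUM_THREADS`, `alloc_particles` and `split_particles` are all hypothetical names chosen for illustration. `__builtin_assume_aligned` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PART_ALIGN 64 /* Hypothetical alignment in bytes. */
#define NUM_THREADS 4 /* Hypothetical fixed number of worker threads. */

/* Hypothetical, simplified particle. Not the actual SWIFT particle struct. */
struct particle {
  double x[3];
  float mass;
  int split; /* 1 if this particle should be split into two. */
} __attribute__((aligned(PART_ALIGN)));

/* Work description handed to each thread. */
struct chunk {
  struct particle *parts;   /* Global array, with spare room for children. */
  size_t begin, end;        /* Half-open index range handled by this thread. */
  atomic_size_t *next_free; /* Shared index of the next free slot. */
};

/* Allocate a zeroed, aligned particle array (NULL on failure). */
struct particle *alloc_particles(size_t capacity) {
  void *ptr = NULL;
  if (posix_memalign(&ptr, PART_ALIGN, capacity * sizeof(struct particle)) != 0)
    return NULL;
  memset(ptr, 0, capacity * sizeof(struct particle));
  return (struct particle *)ptr;
}

/* Thread body: split every flagged particle in [begin, end). */
static void *split_chunk(void *arg) {
  struct chunk *c = (struct chunk *)arg;

  /* Alignment hint: promise the compiler that the array start is aligned. */
  struct particle *parts = __builtin_assume_aligned(c->parts, PART_ALIGN);

  for (size_t i = c->begin; i < c->end; i++) {
    if (!parts[i].split) continue;

    /* Reserve a unique slot past the original particles for the child. */
    const size_t j = atomic_fetch_add(c->next_free, 1);

    /* Halve the parent's mass, then copy it into the child's slot. */
    parts[i].mass *= 0.5f;
    parts[i].split = 0;
    memcpy(&parts[j], &parts[i], sizeof(struct particle));
  }
  return NULL;
}

/* Split all flagged particles in parallel and return the new count.
 * The caller must provide capacity for count + (number of flagged) entries. */
size_t split_particles(struct particle *parts, size_t count) {
  atomic_size_t next_free;
  atomic_init(&next_free, count);

  pthread_t threads[NUM_THREADS];
  struct chunk chunks[NUM_THREADS];
  const size_t per_thread = (count + NUM_THREADS - 1) / NUM_THREADS;

  for (int t = 0; t < NUM_THREADS; t++) {
    size_t begin = (size_t)t * per_thread;
    if (begin > count) begin = count;
    size_t end = begin + per_thread;
    if (end > count) end = count;
    chunks[t] = (struct chunk){parts, begin, end, &next_free};
    const int rc = pthread_create(&threads[t], NULL, split_chunk, &chunks[t]);
    assert(rc == 0);
    (void)rc;
  }
  for (int t = 0; t < NUM_THREADS; t++) pthread_join(threads[t], NULL);

  return atomic_load(&next_free);
}
```

Children are written only to slots at or beyond the original count, which no thread iterates over, so the threads never touch each other's particles. The order of the children in the array depends on thread scheduling. A real threadpool would reuse its worker threads rather than create them for every call.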
