Fixes #72 (closed).
This is a workaround for a limitation of parallel HDF5. The low-level MPI-IO implementations limit each write to 2GB per rank (irrespective of the total amount being written across all ranks).
The solution is to write the data in chunks of 2GB (in practice 2'000'000'000 bytes): each rank writes its first chunk, and the write is then repeated for the remaining chunks, if any, with each rank's position shifted by 2GB both in the file and in memory. Ranks whose data does not exceed the threshold simply write nothing in the later iterations. In realistic scenarios we won't need more than a handful of iterations.
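
The chunking arithmetic can be sketched as follows. This is only an illustration in plain Python, not the code in this PR: `plan_writes` and `CHUNK_BYTES` are hypothetical names, and the offsets are taken relative to the start of each rank's own region. It assumes every rank takes part in every iteration, with a zero-length write once its data is exhausted.

```python
# Hypothetical sketch of the per-rank write plan; not the PR's actual code.
CHUNK_BYTES = 2_000_000_000  # per-rank size of one write, as described above


def plan_writes(bytes_per_rank, chunk_bytes=CHUNK_BYTES):
    """Return one list per iteration holding an (offset, length) pair per rank.

    The offset is the shift applied to the rank's position, both in the file
    and in memory, relative to the start of that rank's data.
    """
    # The rank with the most data decides how many iterations are needed
    # (ceiling division, with at least one iteration).
    largest = max(bytes_per_rank, default=0)
    n_iterations = max(1, -(-largest // chunk_bytes))

    plan = []
    for iteration in range(n_iterations):
        shift = iteration * chunk_bytes
        step = []
        for total in bytes_per_rank:
            remaining = max(0, total - shift)
            # Ranks with nothing left still appear, but write zero bytes.
            step.append((shift, min(remaining, chunk_bytes)))
        plan.append(step)
    return plan
```

For example, `plan_writes([5_000_000_000, 1_000_000_000])` yields three iterations: the first rank writes 2GB, 2GB and then 1GB, while the second rank writes 1GB in the first iteration and nothing afterwards.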