Parallel io improvements
A set of improvements to the parallel HDF5 i/o, building on the discussion with the developers:
- Get rank 0 to create the file and the empty datasets.
- Then open the file in parallel on all the ranks.
- Write the data in parallel.
- If the library is recent enough, switch on the parallel write of the meta-data.
This is the monthly update on i/o...
@pdraper it'd be great to know what you think of it.
Building this shows some documentation is missing:
argument 'internal_units' of command @param is not found in the argument list of prepareArray(struct engine *e, hid_t grp, char *fileName, FILE *xmfFile, char *partTypeGroupName, struct io_props props, long long N_total, const struct unit_system *snapshot_units)
/cosma6/data/dp004/pdraper/swift/swiftsim-drift/src/parallel_io.c:244: warning: The following parameters of prepareArray(struct engine *e, hid_t grp, char *fileName, FILE *xmfFile, char *partTypeGroupName, struct io_props props, long long N_total, const struct unit_system *snapshot_units) are not documented:
  parameter 'grp'
  parameter 'fileName'
  parameter 'xmfFile'
  parameter 'partTypeGroupName'
  parameter 'N_total'
etc.
There is something wrong with the output naming as well. If I run EAGLE_6:
mpirun -np 8 ../swift_mpi -s -t 2 -n 256 -PSnapshots:delta_time:9e-06 eagle_6.yml
I see the following outputs:
cosma-j > ls -lrt
total 63136
-rwxr-xr-x 1 pdraper dphsprog      248 Feb 9 14:40 run.sh
-rw-r--r-- 1 pdraper dphsprog      556 Feb 9 14:40 README
-rwxr-xr-x 1 pdraper dphsprog       86 Feb 9 14:40 getIC.sh
lrwxrwxrwx 1 pdraper dphsprog       25 Feb 9 18:14 EAGLE_ICs_6.hdf5 -> ../../../EAGLE_ICs_6.hdf5
drwxr-xr-x 2 pdraper dphsprog     4096 Feb 9 18:14 restart
-rw-r--r-- 1 pdraper dphsprog     1926 Feb 9 18:14 eagle_6.yml
-rw-r--r-- 1 pdraper dphsprog      539 Feb 9 18:16 timesteps_2.txt
-rw-r--r-- 1 pdraper dphsprog      538 Feb 9 18:16 used_parameters.yml
-rw-r--r-- 1 pdraper dphsprog     2701 Feb 9 18:16 dependency_graph.dot
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0001.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0002.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0004.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0003.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0005.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0006.hdf5
-rw-r--r-- 1 pdraper dphsprog    28038 Feb 9 18:17 timesteps_16.txt
-rw-r--r-- 1 pdraper dphsprog      858 Feb 9 18:17 energy.txt
-rw-r--r-- 1 pdraper dphsprog     2904 Feb 9 18:17 eagle.xmf
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0008.hdf5
-rw-r--r-- 1 pdraper dphsprog    21536 Feb 9 18:17 eagle_0007.hdf5
-rw-r--r-- 1 pdraper dphsprog   292376 Feb 9 18:17 output.log
-rw-r--r-- 1 pdraper dphsprog 64085536 Feb 9 18:17 eagle_0000.hdf5
So all the intermediate dumps are truncated and the final snapshot has the wrong name. It seems that all the outputs are being written to eagle_0000.hdf5.
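For reference, the listing suggests one file per dump, named from a base name and a zero-padded four-digit counter. A minimal sketch of that expected naming is given here. The helper `snapshot_file_name` and its parameters are hypothetical and not taken from the SWIFT source; the sketch only assumes the format is `<base>_<NNNN>.hdf5`, as seen in the file names above.

```c
#include <stdio.h>

/* Hypothetical helper: builds "<base_name>_<NNNN>.hdf5" from an output
 * counter, following the pattern of the file names in the listing.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int snapshot_file_name(char *buf, size_t size, const char *base_name,
                       int output_count) {
  int n = snprintf(buf, size, "%s_%04d.hdf5", base_name, output_count);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

If the counter passed in were not advanced between dumps, or the data were always written through a handle opened on the first name, every dump would land in the `_0000` file, which would match the sizes seen above. That is only a guess at the cause.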
added 1 commit
- 3f4324e6 - Fixed missing documentation of the new parallel write function.
mentioned in commit b503350c