Parallel io improvements

Merged: Matthieu Schaller requested to merge parallel_io_improvements into master

A set of improvements to the parallel HDF5 i/o, building on the discussion with the developers.

  • Get rank 0 to create the file and empty datasets.

  • Then open the file in parallel on all ranks.

  • Write the data in parallel.

  • If the library is recent enough, switch on the parallel write of the meta-data.
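
For the parallel write step, each rank needs to know where its slice starts inside the shared dataset that rank 0 created. The following is a minimal sketch of that bookkeeping only, not the code in this merge request: it assumes a 1-D layout in which ranks write contiguous slices in rank order, and the function name `compute_offsets` is hypothetical. In an MPI code the same exclusive prefix sum would typically come from `MPI_Exscan`; a plain array is used here so that the example has no dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of SWIFT): given the number of elements
 * each rank will write, fill offsets[i] with the index in the shared
 * dataset at which rank i's slice starts (an exclusive prefix sum).
 * Returns the total number of elements, i.e. the size the dataset must
 * have when it is created up front. */
long long compute_offsets(const long long *counts, long long *offsets,
                          size_t n_ranks) {
  assert(counts != NULL && offsets != NULL);

  long long total = 0;
  for (size_t i = 0; i < n_ranks; ++i) {
    offsets[i] = total; /* start of rank i's slice */
    total += counts[i]; /* running size of the dataset */
  }
  return total;
}
```

With counts of 3, 0, 5 and 2 elements on four ranks, the offsets are 0, 3, 3 and 8 and the dataset holds 10 elements. A rank with nothing to write still gets a valid offset, which matters if the write is collective and every rank has to take part.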

Activity

  • This is the monthly update on i/o...

    @pdraper it'd be great to know what you think of it.

  • Building this shows some documentation is missing:

    argument 'internal_units' of command @param is not found in the argument list of prepareArray(struct engine *e, hid_t grp, char *fileName, FILE *xmfFile, char *partTypeGroupName, struct io_props props, long long N_total, const struct unit_system *snapshot_units)
    /cosma6/data/dp004/pdraper/swift/swiftsim-drift/src/parallel_io.c:244: warning: The following parameters of prepareArray(struct engine *e, hid_t grp, char *fileName, FILE *xmfFile, char *partTypeGroupName, struct io_props props, long long N_total, const struct unit_system *snapshot_units) are not documented:
      parameter 'grp'
      parameter 'fileName'
      parameter 'xmfFile'
      parameter 'partTypeGroupName'
      parameter 'N_total'
    

    etc.

  • There is something wrong with the output naming as well. If I run EAGLE_6:

    mpirun -np 8 ../swift_mpi -s -t 2 -n 256 -PSnapshots:delta_time:9e-06 eagle_6.yml

    I see the following outputs:

    cosma-j > ls -lrt
    total 63136
    -rwxr-xr-x 1 pdraper dphsprog      248 Feb  9 14:40 run.sh
    -rw-r--r-- 1 pdraper dphsprog      556 Feb  9 14:40 README
    -rwxr-xr-x 1 pdraper dphsprog       86 Feb  9 14:40 getIC.sh
    lrwxrwxrwx 1 pdraper dphsprog       25 Feb  9 18:14 EAGLE_ICs_6.hdf5 -> ../../../EAGLE_ICs_6.hdf5
    drwxr-xr-x 2 pdraper dphsprog     4096 Feb  9 18:14 restart
    -rw-r--r-- 1 pdraper dphsprog     1926 Feb  9 18:14 eagle_6.yml
    -rw-r--r-- 1 pdraper dphsprog      539 Feb  9 18:16 timesteps_2.txt
    -rw-r--r-- 1 pdraper dphsprog      538 Feb  9 18:16 used_parameters.yml
    -rw-r--r-- 1 pdraper dphsprog     2701 Feb  9 18:16 dependency_graph.dot
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0001.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0002.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0004.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0003.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0005.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0006.hdf5
    -rw-r--r-- 1 pdraper dphsprog    28038 Feb  9 18:17 timesteps_16.txt
    -rw-r--r-- 1 pdraper dphsprog      858 Feb  9 18:17 energy.txt
    -rw-r--r-- 1 pdraper dphsprog     2904 Feb  9 18:17 eagle.xmf
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0008.hdf5
    -rw-r--r-- 1 pdraper dphsprog    21536 Feb  9 18:17 eagle_0007.hdf5
    -rw-r--r-- 1 pdraper dphsprog   292376 Feb  9 18:17 output.log
    -rw-r--r-- 1 pdraper dphsprog 64085536 Feb  9 18:17 eagle_0000.hdf5

    So all the intermediate dumps are truncated and the final snapshot has the wrong name. It seems that all the outputs are being written to eagle_0000.hdf5.

  • added 1 commit

    • 3f4324e6 - Fixed missing documentation of the new parallel write function.


  • added 2 commits

    • aa3fc5f1 - Use e->snapshotOutputCount everywhere and not the static variable anymore.
    • 12bab1d7 - Documented the prepare_file() funtion.


  • Ok, it looks like the change made to the output counter when adding the restart files was not correctly propagated when I merged that into my branch. It should all be fixed now.

  • Good, this all seems to be working now, so will accept.

  • Peter W. Draper mentioned in commit b503350c


  • Remove the branch if you don't want to keep it.
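
The output-naming problem reported above (small `eagle_0001.hdf5` to `eagle_0008.hdf5` files, with all the data in `eagle_0000.hdf5`) is what one would expect if the step that creates the file and the step that writes the data read two different counters. The sketch below illustrates the idea behind commit aa3fc5f1, deriving the name from the single counter stored in the engine. It is an assumption-laden illustration, not the actual SWIFT code: `struct engine_sketch` and `snapshot_file_name` are hypothetical, and the `%s_%04d.hdf5` pattern is inferred from the file names in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the relevant part of SWIFT's struct engine. */
struct engine_sketch {
  int snapshotOutputCount; /* number of snapshots written so far */
};

/* Build the snapshot file name from the engine's counter. Every step of
 * the dump (file creation on rank 0, parallel open, parallel write) calls
 * this, so they all agree on the name. Returns the value of snprintf(). */
int snapshot_file_name(char *buf, size_t len, const char *base_name,
                       const struct engine_sketch *e) {
  assert(buf != NULL && base_name != NULL && e != NULL);
  return snprintf(buf, len, "%s_%04d.hdf5", base_name,
                  e->snapshotOutputCount);
}
```

The counter is then incremented once per snapshot, after the dump has completed, rather than inside any of the individual steps.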
