Skip to content
Snippets Groups Projects

Copy specified HDF5 group from the ICs to output snapshot files

Merged John Helly requested to merge ic_info into master
4 unresolved threads

This tries to address issues #772 (closed) and #556 (closed). On startup the named HDF5 group is copied from the initial conditions file into memory. The group is then written to each snapshot.

There is currently no check on the size of the group so if the group is large or we accidentally set InitialConditions:metadata_group_name to the root group or PartType1 etc we will have problems.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • mentioned in issue #772 (closed)

  • @pdraper any thoughts on this, especially regarding being a good cluster citizen and temporary files this generates?

  • Author Developer

    I don't think it does generate any temporary files. It uses the HDF5 core driver with backing_store=0 so the data is never written to disk. I'm not sure why HDF5 insists on having a unique file name in this case. I couldn't find anything about it in the documentation but in practice if I create a file on disk and then try to make in in-memory file image with the same name it fails. Might need to understand this a bit better before we can merge this.

  • Ah then I misunderstood. Thought it was creating a single empty file for the duration of the run.

  • All sounds fine to me, sadly POSIX has deprecated all the standard ways to make a temporary filename (they worry about the time between making the name and using it, that apparently makes it not worth having), so you need to roll your own as John has done. As for HDF5 still needing a non-existent filename, not surprised...

    Edited by Peter W. Draper
  • Author Developer

    It looks like the filename doesn't matter when creating a new file with the core driver but there's a problem when opening a file image stored in an array:

      hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
      const size_t increment = 1024*1024;
      const hbool_t backing_store = 0;
      H5Pset_fapl_core(fapl_id, increment, backing_store)
      H5Pset_file_image(fapl_id, data, len);
      H5Fopen(fname, H5F_ACC_RDONLY, fapl_id)

    If file 'filename' already exists, this happens:

    HDF5-DIAG: Error detected in HDF5 (1.10.3) thread 0:
      #000: H5F.c line 509 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: H5Fint.c line 1400 in H5F__open(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #002: H5Fint.c line 1546 in H5F_open(): unable to open file: time = Wed Oct 13 11:47:20 2021
    , name = 'abc.hdf5', tent_flags = 0
        major: File accessibilty
        minor: Unable to open file
      #003: H5FD.c line 734 in H5FD_open(): open failed
        major: Virtual File Layer
        minor: Unable to initialize object
      #004: H5FDcore.c line 642 in H5FD__core_open(): file already exists
        major: File accessibilty
        minor: File already exists

    So opening the file fails because the file exists! I guess it makes sense if backing_store=true because HDF5 would have to create a new file on disk in that case. Maybe they just forgot to disable the check when backing_store=false.

  • Is that an hdf5 bug then? would 1.10.6 or 1.12.0 work?

  • Is this then ready for review?

  • Author Developer

    Yes, I think so.

  • My only question is whether some of the functions shouldn't be moved to common_io.c.

    Otherwise, I'll hand over to Peter.

1140 1140 size_t Nspart = 0, Nbpart = 0, Nsink = 0;
1141 1141 double dim[3] = {0., 0., 0.};
1142 1142
1143 /* Prepare struct to store metadata from from ICs */
  • 1140 1140 size_t Nspart = 0, Nbpart = 0, Nsink = 0;
    1141 1141 double dim[3] = {0., 0., 0.};
    1142 1142
    1143 /* Prepare struct to store metadata from from ICs */
    1144 struct ic_info ics_metadata;
  • John Helly added 2 commits

    added 2 commits

    • d0a46460 - Fix typos in comments relating to ic_info struct
    • 31aa1735 - Declare struct ic_info at top of main()

    Compare with previous version

  • 33
    34 /**
    35 * @brief Generates a file name using the process ID and host name
    36 *
    37 * @param prefix Prefix to add to the file name
    38 * @param name Buffer to write the file name to
    39 * @param len Length of the buffer
    40 *
    41 */
    42 static void unique_filename(char *prefix, char *name, size_t len) {
    43
    44 /* Get our process ID */
    45 long pid = (long) getpid();
    46
    47 /* Get our host name. May be truncated or empty. */
    48 const int max_name_length = 500;
  • John Helly added 1 commit

    added 1 commit

    • 935fca07 - Change filename length to a macro in ic_info.c

    Compare with previous version

  • Matthieu Schaller approved this merge request

    approved this merge request

  • Matthieu Schaller unapproved this merge request

    unapproved this merge request

  • The jenkins tool actually complained with:

    hdf5_object_to_blob.c: In function 'unique_filename':
    hdf5_object_to_blob.c:61:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
       if((ret >= len) || (ret <= 0)) {
               ^~
    cc1: all warnings being treated as errors
  • Author Developer

    Ok, so that's awkward. snprintf expects to be given the length of the buffer as a size_t but it returns the number of characters it wrote as an int and then I'm supposed to compare them to check it worked.

  • John Helly added 1 commit

    added 1 commit

    • 648a3a4a - Be more careful with snprintf return value when generating filename

    Compare with previous version

  • Author Developer

    This version compiles with no warnings with gcc 11.1. It checks that the int return value is not negative then casts to size_t to compare with the buffer size.

    Edited by John Helly
  • Author Developer

    I see that in the rest of swift the usual approach is to ignore the return code from snprintf, so maybe I shouldn't be worrying about this...

  • Yes, I have been lazy.... I think we don't need to worry about this. The day these functions break, we likely have much more important problems to tackle...

  • Matthieu Schaller approved this merge request

    approved this merge request

  • 78 */
    79 void hdf5_object_to_blob(hid_t group_id, char *name, size_t *len, void **data) {
    80
    81 /* Set up property list */
    82 hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    83 const size_t increment = 1024*1024;
    84 const hbool_t backing_store = 0;
    85 if(H5Pset_fapl_core(fapl_id, increment, backing_store) < 0) {
    86 fprintf(stderr, "Unable to set core driver\n");
    87 abort();
    88 }
    89
    90 /* Generate a file name which (hopefully!) doesn't exist */
    91 const int max_name_length = 500;
    92 char filename[max_name_length];
    93 unique_filename("__SWIFT_HDF5_CORE_WRITE", filename, max_name_length);
    • Hmm, I'm concerned that read access to the current directory is implied here and may be needed. I think it would be safe to add "/tmp" to the front to make sure that is always true.

    • Please register or sign in to reply
  • A description of InitialConditions:metadata_group_name needs to be added to the parameter_example.yml file.

  • When the HDF5 group doesn't exist we get a big error report:

    HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 22758013430976:                                                                                               
      #000: ../../../src/H5G.c line 463 in H5Gopen2(): unable to open group                                                                                         
        major: Symbol table                                                                                                                                         
        minor: Can't open object                                                                                                                                    
      #001: ../../../src/H5Gint.c line 281 in H5G__open_name(): group not found                                                                                     
        major: Symbol table                                                                                                                                         
        minor: Object not found                                                                                                                                     
      #002: ../../../src/H5Gloc.c line 422 in H5G_loc_find(): can't find object                                                                                     
        major: Symbol table                                                                                                                                         
        minor: Object not found                                                                                                                                     
      #003: ../../../src/H5Gtraverse.c line 851 in H5G_traverse(): internal path traversal failed                                                                   
        major: Symbol table                                                                                                                                         
        minor: Object not found                                                                                                                                     
      #004: ../../../src/H5Gtraverse.c line 627 in H5G__traverse_real(): traversal operator failed                                                                  
        major: Symbol table                                                                                                                                         
        minor: Callback failed                                                                                                                                      
      #005: ../../../src/H5Gloc.c line 378 in H5G__loc_find_cb(): object 'ICs_parameters' doesn't exist                                                             
        major: Symbol table                                                                                                                                         
        minor: Object not found       

    which raises a couple of questions. Do we want to avoid this and just make a less noisy report, and if not, shouldn't the default be the empty string?

  • Author Developer

    We can probably avoid the error messages by checking for the group with H5Lexists() and maybe H5lget_info first. Or just disable error reporting and restore it afterwards.

  • Existence checking sounds better, then just issue a simple warning about not doing the copy. Could make it fatal, if you think the lack is important, but as I said I think for that outcome the default should be don't copy anything.

  • John Helly added 3 commits

    added 3 commits

    • 848fe53d - Check if metadata group exists in ICs file before trying to copy
    • 033e5094 - Add InitialConditions:metadata_group_name to parameter_example.yml
    • 9d3aaa25 - Only report copying of metadata from ICs on rank zero

    Compare with previous version

  • Author Developer

    This version checks that the supplied name is a valid link before trying to open it. That should suppress the HDF5 error messages if the group doesn't exist. If anything goes wrong it just writes a log message and continues.

  • John Helly added 2 commits

    added 2 commits

    • d6ec07cf - Add /tmp/ prefix to filenames for HDF5 file images
    • ec3df62c - Add checks in hdf5_object_to_blob.c and use error() instead of abort()

    Compare with previous version

  • Author Developer

    This version uses a filename in /tmp/ as suggested by Peter. From a bit of testing it doesn't seem to matter whether the directory exists or is readable or writable. The only failing case I've found is if the exact file name we're using exists when calling H5Fopen to open an existing file image. It now stops with an error message if this happens (which should be very unlikely.)

    In the more likely case where there just isn't a metadata group to copy it writes out a warning and continues.

  • Thanks for the changes. All looks ready to me, so accepting.

  • Peter W. Draper mentioned in commit 160ed6db

    mentioned in commit 160ed6db

  • Please register or sign in to reply
    Loading