The CHECKPOINT.h5 file

What is this file?

  • It is an HDF5 file containing the essential data given as input to, and generated by, DIRAC.

  • All data stored in this file is defined and documented in the DIRAC data schema, the source of which is found in utils/DIRACschema.txt.

What can I do with it?

  • One purpose is to make restarting and data curation after a run easier.

  • Another purpose is to facilitate communication with other programs.

  • Because the file is in HDF5 format, the h5py package makes it straightforward to import the data into Python for further processing and viewing.
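
As a minimal sketch of the Python route, assuming h5py is installed: the helper names list_labels and read_value are illustrative only and are not part of DIRAC. The first lists the labels of all datasets in a file, the second reads the data stored under one label.

```python
import h5py


def list_labels(path):
    """Return the full labels of all datasets stored in an HDF5 file."""
    labels = []

    def collect(name, obj):
        # visititems passes names relative to the root group, without a leading "/".
        if isinstance(obj, h5py.Dataset):
            labels.append("/" + name)

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return sorted(labels)


def read_value(path, label):
    """Read the complete contents of the dataset stored under `label`."""
    with h5py.File(path, "r") as f:
        return f[label][()]
```

For example, read_value("CHECKPOINT.h5", "/result/wavefunctions/scf/energy") would return whatever is stored under that label, provided the label is present in the file.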

Can I extend the schema if I have data that is not listed here?

  • The first question is whether this data is indeed essential. The CHECKPOINT.h5 file is not intended for large data sets, because it is saved automatically after a run. It is also not intended for highly specialized or intermediate data. If you want to use HDF5 for such data, consider writing a separate HDF5 file using the interface provided in the mh5 module. This is even easier, as you need not define everything as thoroughly as with the schema (see below).

  • In case a new data type is indeed a generally useful addition, please start by documenting it (type and description) and ask for a peer review by one of the developers before proceeding to the next step.

How is the schema processed?

  • The source text in utils/DIRACschema.txt is processed at run time by the Python functions read_schema and write_schema, which are found in utils/ and called by the DIRAC run script pam. This produces a new text file, schema_labels.txt, which is placed in the work directory. This is the file used by the actual DIRAC code; it contains the set of labels also found in CHECKPOINT.h5. To familiarize yourself with this, copy schema_labels.txt from the work directory and compare it to DIRACschema.txt.

  • Note that the hierarchical structure is defined by / characters, much as in a Unix directory path. This also means that / cannot be used inside a data label, because HDF5 would interpret it as a separator between levels.

  • In the Fortran code the generated labels are used directly. An example is found in gp/dircmo.F90: call checkpoint_write ('/result/wavefunctions/scf/energy',rdata=toterg), which writes the total energy (a single real number) with the appropriate label.

  • Note that data is classified as optional or required in the schema. This classification determines whether a restart is possible: for a restart, all required data must be present in the checkpoint file.
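
The label convention can be illustrated with a small Python sketch. The helpers join_label and split_label are hypothetical and are not the read_schema or write_schema functions from utils/; they only show how a label is built from its levels and why a / inside a single level name has to be rejected.

```python
def join_label(*parts):
    """Build a hierarchical label such as '/result/wavefunctions/scf/energy'."""
    for part in parts:
        # A "/" inside one component would silently create an extra level.
        if not part or "/" in part:
            raise ValueError(f"invalid label component: {part!r}")
    return "/" + "/".join(parts)


def split_label(label):
    """Split a full label into its components (groups first, data name last)."""
    if not label.startswith("/"):
        raise ValueError(f"label must start with '/': {label!r}")
    return label.strip("/").split("/")
```

Here join_label("result", "wavefunctions", "scf", "energy") rebuilds the label used in the Fortran example, while a component such as "a/b" raises a ValueError.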

How can the schema be extended?

  • Do NOT edit the schema_labels.txt file when extending the schema. All edits should be made in DIRACschema.txt.

  • Check first whether the data is optional or required. Be careful when defining new required data: restart files are invalid if this data is missing, which may prevent restarting from old checkpoint files.

  • If the data consists of a simple standard type (real, integer or string) that fits in an existing subdirectory, you can simply define it at the appropriate place and the scripts will automatically generate the label. After inspecting the generated label you can use it in calls in the Fortran code.

  • If the data is of composite type, you need to define its elements as well. This is done by creating a new subsection in the file. An example is the molecule data type, which is part of input and is defined in a separate section. Each section starts with a * and ends with *end.

  • You may also nest sections; see for instance the data type wavefunctions, which has the composite type scf as an element.

What happens at run time and on the Fortran side?

  • At the start of the run DIRAC checks whether CHECKPOINT.h5 is present and contains all required data. If it is, a restart will be attempted. Note that you can use the copy facilities of pam to place the file in the work directory.

  • During the run the only calls needed on the Fortran side are checkpoint_read and checkpoint_write. These subroutines are found in the module checkpoint and support reals, integers and strings. The intention is to keep the HDF5 interface simple and easily maintainable, so more complicated types should be split up into these standard types. There are two more public routines in this module (for opening and closing a checkpoint file), but these are already called at the appropriate places in DIRAC and should not be called elsewhere.

  • When the checkpoint_read routine is called, the data is located in the file and returned to the caller. Error handling is currently absent (still to do), so make sure the data is indeed present or the run may crash.

  • When the checkpoint_write routine is called, the data is stored in the file after checking that the label is indeed known in the schema. Data with an undefined label is not written and a warning is issued. This guarantees that all data placed in the checkpoint file is properly documented.

  • At the end of the run pam checks whether CHECKPOINT.h5 is present and contains all required data. If so, the file is copied to the directory from which pam was called and renamed following the same convention as used for the output file, but with the file extension .h5 instead of .out.
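
The two checks described above can be sketched in Python. This is an illustration under stated assumptions, not the code used by pam or DIRAC: the function names are hypothetical, the labels are passed in as plain collections of strings, and the output file name in the usage note is invented.

```python
from pathlib import Path


def missing_required(present_labels, required_labels):
    """Return the required labels that are absent, in sorted order."""
    return sorted(set(required_labels) - set(present_labels))


def checkpoint_name(output_file):
    """Derive the checkpoint file name from an output file name (.out -> .h5)."""
    return str(Path(output_file).with_suffix(".h5"))
```

An empty list from missing_required means every required label is present, which is the condition for a restart and for saving the file. checkpoint_name("job.out") gives "job.h5" for the hypothetical output file job.out.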