2 DISK FORMATS AND HDF5

Historically, the term “HDS” referred both to a subroutine library and to an on-disk data format. Both were developed by the UK Starlink project in the 1980s. Since then the subroutine library has been used extensively, mainly within the Starlink Software Collection but also outside it. However, the size of typical astronomical data sets has increased enormously since the 1980s (as has the typical astronomer’s desktop computing power), and the original disk format upon which the HDS library was based is no longer a good match to current needs. Solving this problem by upgrading the original disk format was deemed impractical, as documentation on the internals of the HDS disk format is sparse and the original developers are no longer available. Instead, it was decided to move to the popular and well-supported HDF5 library and its corresponding disk format (see https://support.hdfgroup.org/HDF5/).

However, given the extensive use of the HDS library within the Starlink Software Collection, simply changing all the HDS calls within Starlink application code to equivalent HDF5 calls would be prohibitively expensive, both in the effort involved and in the likelihood of introducing bugs.

The approach adopted was therefore to re-implement the routines within the HDS library using calls to the routines provided by the HDF5 library, using the HDF5 disk format in a way that mimics the original HDS disk format as closely as possible. In this way, no changes would be needed to application code, since all HDS routines would still be available and would behave in the same way. In addition, data files created by the new HDS library would be compatible with the wide range of publicly available tools that already exist for examining HDF5 files.

For further information about the motivation behind this change and the technicalities involved, see “Re-implementing the Hierarchical Data System using HDF5” (T. Jenness, Astronomy and Computing, Vol. 12, 2015; https://arxiv.org/abs/1502.04029).

Note that this document has not yet been fully updated to refer to the new library and disk format. If in doubt, send a query to the Starlink support mailing list or consult the code (held in various repositories within the Starlink project on GitHub).

2.1 Choosing the disk format version for new HDS files

The new HDF5-based disk format is called “HDS version 5” (the previous Starlink-specific disk format was version 4). Perhaps confusingly, the HDS library and data format have always had independent version numbers. So whereas the new HDF5-based data format is “HDS version 5”, the new library is “HDS version 6.0”.

An obvious requirement was that version 6.0 of the HDS library should be able to read and write both data formats (version 4 and version 5). This is necessary so that data files can be passed to and from sites that still use version 5 of the HDS library (i.e. sites that can only access files that use the Starlink-specific HDS disk format, version 4). However, when creating a new disk file, the HDS library (version 6) needs to know which disk format (version 4 or version 5) to use. By default, the new disk format (version 5) is used. This means that, by default, files created by version 6 of the HDS library cannot be read by Starlink systems that still use version 5 of the library. This default can be changed by setting the environment variable “HDS_VERSION” to “4”, or by using the HDS_TUNE routine to set the tuning parameter “VERSION” to 4.
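As an illustration of the environment-variable route, the following Python sketch builds an environment in which “HDS_VERSION” is set and runs a command under it. Only the variable name and the value “4” come from the text above; the helper names env_for_hds_format and run_with_hds_format are hypothetical, and accepting “5” as an explicit way of requesting the default format is an assumption.

```python
import os
import subprocess
import sys


def env_for_hds_format(version=4, base_env=None):
    """Return a copy of the environment with HDS_VERSION set.

    `version` is the disk format version (not the library version).
    Assumption: "5" is accepted as an explicit request for the default.
    """
    if version not in (4, 5):
        raise ValueError("HDS disk format version must be 4 or 5")
    env = dict(os.environ if base_env is None else base_env)
    env["HDS_VERSION"] = str(version)
    return env


def run_with_hds_format(command, version=4):
    """Run `command` (a list of arguments) with HDS_VERSION set for it."""
    return subprocess.run(command, env=env_for_hds_format(version), check=True)


if __name__ == "__main__":
    # Stand-in for a real application: a child process that simply
    # reports the value of HDS_VERSION it was given.
    child = [sys.executable, "-c",
             "import os; print(os.environ['HDS_VERSION'])"]
    run_with_hds_format(child, version=4)
```

Setting the variable only for the child process leaves the caller's own environment untouched, so the choice of disk format does not leak into unrelated commands run later in the same session.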

All data files created by version 6 of the HDS library still have the traditional extension “.sdf” (“Starlink Data File”), regardless of the disk format in use.
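Because the extension does not reveal the disk format, one way to tell the two apart is to look for the standard HDF5 format signature in the file. The sketch below is an illustration rather than part of the HDS library: the function name looks_like_hdf5 is hypothetical, and it assumes that a version 5 “.sdf” file is an ordinary HDF5 file carrying that signature. A file without the signature may be in the older Starlink-specific format, or may not be an HDS file at all; the sketch does not distinguish those cases.

```python
# Standard 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if the file at `path` carries the HDF5 signature.

    HDF5 permits the signature at byte 0 or at byte 512, 1024, 2048, ...
    (doubling each time), so each of those offsets is checked in turn
    up to `max_offset`.
    """
    size = len(HDF5_SIGNATURE)
    with open(path, "rb") as f:
        offset = 0
        while offset <= max_offset:
            f.seek(offset)
            chunk = f.read(size)
            if chunk == HDF5_SIGNATURE:
                return True
            if len(chunk) < size:
                # Reached end of file; later offsets cannot match either.
                return False
            offset = 512 if offset == 0 else offset * 2
    return False
```

For example, looks_like_hdf5("image.sdf") (with “image.sdf” standing for any data file) would be expected to return True for a file written in the version 5 disk format and False otherwise.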