One of the best things about standards is the large number you have to choose from. “Standard” ways of storing data proliferate in all areas of computing, and astronomy is no exception.
Given this, there is a natural desire to write software that can cope with more than one data format, but this can be a major undertaking. This document describes features provided by the NDF library (SUN/33) that help to make it a little easier.
There are two main ways of writing software that can read and write multiple data formats. Perhaps the most obvious is to incorporate a knowledge of the data model used by each format into the data access library and to have it make the appropriate calls (e.g. to different lower-level data access libraries) according to the format being used. This approach is generally quite efficient, but it often presents serious difficulties in practice.
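In code, the first approach amounts to a data access layer that dispatches every request to a format-specific back end. The sketch below is purely illustrative. The reader functions, the `READERS` table and `access` are hypothetical stand-ins, not part of the NDF library.

```python
from typing import Callable, Dict

# Hypothetical format-specific readers. In a real system each would call
# a different lower-level data access library.
def read_fits(path: str) -> dict:
    return {"format": "FITS", "path": path}

def read_native(path: str) -> dict:
    return {"format": "NDF", "path": path}

# The data access library itself must know about every supported format.
READERS: Dict[str, Callable[[str], dict]] = {
    "FITS": read_fits,
    "NDF": read_native,
}

def access(path: str, fmt: str) -> dict:
    """Dispatch directly to the reader for fmt; unknown formats fail."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return reader(path)
```

Every new format means changing `READERS` and the library that owns it. This is the maintenance burden described in the following paragraph.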
The main problem is that the data access software rapidly becomes extremely complex. Usually, a single person will maintain the data access library used by a suite of applications so, if multiple data formats are to be supported, that person must become expert in all the formats required. Since the resulting library will be the only means of access to these formats, it must be very sophisticated and anticipate every requirement (even if many of the features are, in fact, never used). Given the number and complexity of formats in use, the range of data models they present, and the rate at which they change, this is a near impossible task. It is not surprising, therefore, that few systems attempt to support more than a couple of formats in this way.
An alternative approach is to interpose format conversion software between the original data and the application. This is potentially less efficient, but modern computing equipment makes this less of a problem than it once was. The great advantage is that it decouples the problem of format conversion from that of data access. It also splits the provision of software for accessing each different data format into a series of separate tasks. This makes it possible to support a wide variety of formats relatively easily.
The NDF data access library follows this latter approach by allowing format conversion utilities to be added to it, thereby allowing it to access a range of “foreign” data (i.e. data which are not stored in the native NDF format). This has the following advantages:
This approach is practical mainly because of the relatively sophisticated formatting possibilities and data model presented by the NDF library, with its in-built extensibility. These make it possible to convert foreign data into NDF format and back again without losing information. The reverse, converting an NDF into a foreign format and back again, is not always possible without loss.
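Here is a minimal sketch of the conversion-based design. It assumes a toy dictionary-based "FITS-like" record and a `NativeDataset` class that stands in for an NDF. Both are hypothetical, as are `register` and `CONVERTERS`. Converter pairs are registered separately from any access code. Header items with no native counterpart are kept in a named extension, so the round trip from foreign to native and back loses nothing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

@dataclass
class NativeDataset:
    """Hypothetical stand-in for an NDF: a data array plus named extensions."""
    data: list
    extensions: Dict[str, dict] = field(default_factory=dict)

# A converter pair: (foreign -> native, native -> foreign).
Converter = Tuple[Callable[[dict], NativeDataset], Callable[[NativeDataset], dict]]

CONVERTERS: Dict[str, Converter] = {}

def register(fmt: str, to_native, from_native) -> None:
    """Add a converter pair without touching the data access code."""
    CONVERTERS[fmt] = (to_native, from_native)

# A toy "FITS-like" record: a data list plus a header of keywords.
def fits_to_native(rec: dict) -> NativeDataset:
    # Keywords with no native equivalent are kept in an extension,
    # so nothing is lost on the way in.
    return NativeDataset(data=list(rec["data"]),
                         extensions={"FITS": dict(rec["header"])})

def native_to_fits(ds: NativeDataset) -> dict:
    # Rebuild the foreign record from the data and the saved extension.
    return {"data": list(ds.data),
            "header": dict(ds.extensions.get("FITS", {}))}

register("FITS", fits_to_native, native_to_fits)
```

Converting a record such as `{"data": [1, 2, 3], "header": {"OBJECT": "M31"}}` to native form and back gives an identical record.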
To illustrate how this system works, suppose an application which uses the NDF library wants to access an existing NDF data structure, but the person running it only has data available in a foreign format. The following outlines the sequence of operations that might occur:
(e.g. mydata.fit might designate a dataset stored in FITS format).
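The input sequence might look like the context manager below. It identifies the format from the file extension, converts foreign data into a temporary native file, passes that file to the application, and deletes it afterwards. This is a sketch under stated assumptions. The extension table, `identify`, `open_for_reading` and the byte-copying converter are hypothetical. Only the `.sdf` extension, which native NDF (HDS) files use, reflects actual practice.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

# Hypothetical extension table; ".sdf" is the usual extension of native
# NDF (HDS) container files, the FITS entries are illustrative.
FORMATS = {".sdf": "NDF", ".fit": "FITS", ".fits": "FITS"}

def identify(path: str) -> str:
    """Guess the data format from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in FORMATS:
        raise ValueError(f"unrecognised format: {path}")
    return FORMATS[ext]

# Hypothetical input converter: it just copies bytes into the temporary
# native file; a real one would translate between data models.
def fits_to_ndf(src: str, dst: str) -> None:
    shutil.copyfile(src, dst)

INPUT_CONVERTERS = {"FITS": fits_to_ndf}

@contextmanager
def open_for_reading(path: str):
    """Yield the path of a native dataset equivalent to `path`."""
    fmt = identify(path)
    if fmt == "NDF":
        yield path                            # native data: no conversion
        return
    tmpdir = tempfile.mkdtemp()
    native = os.path.join(tmpdir, "converted.sdf")
    try:
        INPUT_CONVERTERS[fmt](path, native)   # convert on input
        yield native                          # application reads the NDF
    finally:
        shutil.rmtree(tmpdir)                 # discard the temporary copy
```

Native `.sdf` input bypasses conversion entirely, so the extra cost applies only to foreign data.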
A rather similar sequence of events might occur when creating a new dataset (e.g. as output from an application), except that the format conversion stage on input would not be required.
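For output, the same idea runs in reverse, with no input conversion step. The application writes to a temporary native file, and that file is converted to the requested foreign format only once writing is complete. This is again a hypothetical sketch. `create_for_writing`, `OUTPUT_CONVERTERS` and the copying converter are illustrative names, not NDF library calls.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

# Hypothetical output converter: copies the temporary native file to the
# requested foreign file; a real one would translate between data models.
def ndf_to_fits(src: str, dst: str) -> None:
    shutil.copyfile(src, dst)

OUTPUT_CONVERTERS = {".fit": ndf_to_fits, ".fits": ndf_to_fits}

@contextmanager
def create_for_writing(path: str):
    """Yield a native path to write to; convert it to `path` afterwards."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".sdf":
        yield path              # native output: write it directly
        return
    convert = OUTPUT_CONVERTERS.get(ext)
    if convert is None:
        raise ValueError(f"no output converter for: {path}")
    tmpdir = tempfile.mkdtemp()
    native = os.path.join(tmpdir, "new.sdf")
    try:
        yield native            # the dataset is new: nothing to convert in
        convert(native, path)   # convert on output once writing is done
    finally:
        shutil.rmtree(tmpdir)   # remove the temporary native copy
```

Conversion happens after the `yield`. If the application raises an exception while writing, the conversion is skipped and no partial foreign file is produced. The temporary directory is removed in either case.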
These steps are a direct analogue of the conversions that the NDF and HDS libraries perform transparently whenever an application attempts, for example, to access an integer data array as floating point, or to read data previously written on a machine which uses a different number representation. The only difference is that format conversion utilities are not a permanent part of the data access software. Instead, they are invoked as separate processes which communicate through files rather than via memory.¹ This makes it possible to add and remove them as required.
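The sketch below uses Python's `subprocess` module to show conversion utilities running as separate processes that communicate through files. The "conversion utility" is a small stand-in script, run with the current interpreter, that merely upper-cases text. `run_converter` and `convert_text` are hypothetical names. A real utility would be an independent program that translates between data formats.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in conversion utility, run as its own process. It reads the file
# named in argv[1] and writes the file named in argv[2]; a real utility
# would translate between data formats instead of upper-casing text.
CONVERTER_SOURCE = """
import sys
src, dst = sys.argv[1], sys.argv[2]
with open(src) as f:
    text = f.read()
with open(dst, "w") as f:
    f.write(text.upper())
"""

def run_converter(src: str, dst: str) -> None:
    """Invoke the converter as a separate process; files carry the data."""
    subprocess.run([sys.executable, "-c", CONVERTER_SOURCE, src, dst],
                   check=True)

def convert_text(text: str) -> str:
    """Pass `text` through the external converter via temporary files."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "in.dat")
        dst = os.path.join(d, "out.dat")
        with open(src, "w") as f:
            f.write(text)
        run_converter(src, dst)
        with open(dst) as f:
            return f.read()
```

The caller knows only a command line and two file names. A converter can therefore be replaced or removed without rebuilding the data access library.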
¹ With the file caching available on modern operating systems, this distinction is actually rather blurred.