
Adding Format Conversion Facilities to the NDF Data Access Library

1 INTRODUCTION

1.1 Philosophy
1.2 The Format Conversion Approach
1.3 How Format Conversion Operates

One of the best things about standards is the large number you have to choose from. “Standard” ways of storing data proliferate in all areas of computing, and astronomy is no exception.

Given this, there is a natural desire to write software that can cope with more than one data format, but this can be a major undertaking. This document describes features provided by the NDF library (SUN/33) that help to make it a little easier.

1.1 Philosophy

There are two main ways of writing software that can read and write multiple data formats. Perhaps the most obvious is to incorporate a knowledge of the data model used by each format into the data access library and to have it make the appropriate calls (e.g. to different lower-level data access libraries) according to the format being used. This approach is generally quite efficient, but it often presents serious difficulties in practice.

The main problem is that the data access software rapidly becomes extremely complex. Usually, a single person will maintain the data access library used by a suite of applications so, if multiple data formats are to be supported, that person must become expert in all the formats required. Since the resulting library will be the only means of access to these formats, it must be very sophisticated and anticipate every requirement (even if many of the features are, in fact, never used). Given the number and complexity of formats in use, the range of data models they present, and the rate at which they change, this is a near-impossible task. It is not surprising, therefore, that few systems attempt to support more than a couple of formats in this way.

An alternative approach is to interpose format conversion software between the original data and the application. This is potentially less efficient, but modern computing equipment makes the overhead less of a problem than it once was. The great advantage is that it decouples the problem of format conversion from that of data access. It also splits the provision of software for accessing each different data format into a series of separate tasks. This makes it possible to support a wide variety of formats relatively easily.

1.2 The Format Conversion Approach

The NDF data access library follows the latter approach: format conversion utilities can be added to it, enabling it to access a range of “foreign” data (i.e. data which are not stored in the native NDF format). This has the following advantages:

The full range of normal NDF data access operations can be supported – reading, writing, updating, deleting, reshaping, etc.
Format conversion utilities can be added to already-built software. Thus you can add the ability to read new data formats to standard applications, without having to re-build them.
Format conversion utilities can be written and added by anyone, so the problem of understanding and accessing a range of different formats can be shared. Particular problems can be tackled by whoever best understands them. This makes the NDF library capable of accessing data in formats completely unknown to its original author.
Because it is easy to add new format conversion utilities, they do not always need to be very sophisticated. Instead, they can be invented or adapted to tackle new problems as they arise. Many of the difficulties encountered when converting a complicated data format into another one with a different data model can be ignored, unless they happen to be relevant to the situation at hand.

This approach is possible mainly because the NDF library presents a relatively sophisticated data model and set of formatting possibilities, with in-built extensibility. This makes it possible to convert foreign data into NDF format and back again without losing information, whereas the opposite round trip (from NDF format into a foreign format and back) is not always possible without loss.
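The role that extensibility plays in a lossless round trip can be illustrated with a toy model. The sketch below is not the NDF data model or its API: `ToyNative`, `from_foreign`, `to_foreign` and the `"FOREIGN"` extension name are all hypothetical, and a foreign dataset is represented simply as a Python dictionary. It assumes that any foreign item with no native counterpart is stored in an extension, so that it can be restored on back-conversion.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNative:
    """Toy stand-in for an extensible native structure (not the real NDF)."""
    data: list
    title: str = ""
    extensions: dict = field(default_factory=dict)


def from_foreign(record):
    """Convert a foreign record (a dict) into the toy native structure."""
    known = {"data", "title"}
    # Items with no native counterpart are kept in an extension, not dropped.
    leftovers = {k: v for k, v in record.items() if k not in known}
    return ToyNative(
        data=list(record["data"]),
        title=record.get("title", ""),
        extensions={"FOREIGN": leftovers},
    )


def to_foreign(native):
    """Back-convert, restoring the items saved in the extension."""
    record = dict(native.extensions.get("FOREIGN", {}))
    record["data"] = list(native.data)
    record["title"] = native.title
    return record
```

In this model, a record such as `{"data": [1, 2, 3], "title": "M31", "observer": "xyz"}` survives conversion in both directions unchanged. A target format with no equivalent of `extensions` would have nowhere to keep the `observer` item, which is why the opposite round trip cannot always be guaranteed.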

1.3 How Format Conversion Operates

To illustrate how this system works, suppose an application which uses the NDF library wants to access an existing NDF data structure, but the person running it only has data available in a foreign format. The following outlines the sequence of operations that might occur:

(1): The NDF library will first obtain the name of the dataset to be accessed in the normal way, e.g. by prompting (alternatively, the application could obtain the name and pass it to the library, but the two methods are equivalent here).
(2): The library will check whether the data are stored in native NDF format. If so, it will access them directly. If not, it will next identify which foreign format they are stored in. This is done by inspecting the extension on the file name (for instance, the file name mydata.fit might designate a dataset stored in FITS format).
(3): The library will then look to see if a format conversion utility has been defined to convert from the foreign format into native NDF format. Assuming one has, it will invoke it, causing the data to be converted and written into a scratch object in native NDF format.
(4): The scratch object will then be accessed as normal. The application need not know that it hasn’t been given a normal NDF. In addition, all references which the application makes to the dataset name will use the original (foreign) file name, so the user usually need not be aware that conversion has occurred either.
(5): When the application releases the dataset, the library will check whether the scratch object has been modified. If it has, a format conversion utility will be sought, and invoked, to perform back-conversion of the modified data into the foreign format. The scratch object will then be deleted (in fact, this is optional; see §3.2).
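Steps (2) to (5) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NDF library's implementation: the `Dataset` class, the `FORMATS`, `FROM_FOREIGN` and `TO_FOREIGN` tables, and the use of `.sdf` to mark native files are all hypothetical, and the "converters" simply copy bytes so that the example can run.

```python
import os
import shutil
import tempfile

# Hypothetical table mapping file extensions to foreign format names.
FORMATS = {".fit": "FITS"}

# Hypothetical converter tables. Real converters would translate between
# data models; these stand-ins just copy bytes so the sketch can run.
FROM_FOREIGN = {"FITS": shutil.copyfile}
TO_FOREIGN = {"FITS": shutil.copyfile}

NATIVE_EXT = ".sdf"  # assumed extension for native files in this sketch


class Dataset:
    """Handle given to the application; it always looks native."""

    def __init__(self, name):
        self.name = name            # step 4: keep the name the user supplied
        self.modified = False
        ext = os.path.splitext(name)[1]
        if ext == NATIVE_EXT:       # step 2: native data are accessed directly
            self.fmt = None
            self.path = name
            return
        self.fmt = FORMATS.get(ext)  # step 2: identify the format by extension
        if self.fmt is None or self.fmt not in FROM_FOREIGN:
            raise ValueError(f"no converter defined for {name!r}")
        # Step 3: convert into a scratch object in native format.
        fd, self.path = tempfile.mkstemp(suffix=NATIVE_EXT)
        os.close(fd)
        FROM_FOREIGN[self.fmt](name, self.path)

    def read(self):
        with open(self.path, "rb") as f:
            return f.read()

    def write(self, data):
        with open(self.path, "wb") as f:
            f.write(data)
        self.modified = True

    def release(self):
        if self.fmt is None:
            return                  # native data: nothing to clean up
        if self.modified:           # step 5: back-convert before deletion
            TO_FOREIGN[self.fmt](self.path, self.name)
        os.remove(self.path)        # step 5: delete the scratch object
```

An application would call `Dataset("mydata.fit")`, use `read` and `write` exactly as it would for a native file, and finish with `release`. At no point does it need to handle the scratch file or the foreign format itself.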

A rather similar sequence of events might occur when creating a new dataset (e.g. as output from an application), except that the format conversion stage on input would not be required.

These steps are an exact analogue of the conversions that the NDF and HDS libraries perform transparently whenever an application attempts to (e.g.) access an integer data array as floating point, or to read data previously written on a machine which uses a different number representation. The only difference is that format conversion utilities are not a permanent part of the data access software, but are invoked as separate processes which communicate through files rather than via memory.¹ This makes it possible to add and remove them as required.

¹With the file caching available on modern operating systems, this distinction is actually rather blurred.
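The point that conversion utilities run as separate processes and communicate through files can also be sketched briefly. The example below is an assumption-laden illustration rather than the mechanism the NDF library uses: `run_converter` and `CONVERTER_SOURCE` are hypothetical names, and the "utility" is a tiny Python program that merely upper-cases text in place of a real format conversion.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for an external conversion utility: a tiny program
# that reads one file and writes another (here it just upper-cases the text).
CONVERTER_SOURCE = (
    "import sys\n"
    "with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:\n"
    "    dst.write(src.read().upper())\n"
)


def run_converter(command, in_path, out_path):
    """Run a conversion utility as a separate process.

    Only the two file names are handed to the child process; the data
    themselves travel through the files, not through the caller's memory.
    """
    subprocess.run(command + [in_path, out_path], check=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        foreign = os.path.join(tmp, "mydata.fit")
        scratch = os.path.join(tmp, "scratch.sdf")
        with open(foreign, "w") as f:
            f.write("pixel data")
        # The command is ordinary data, so it can be replaced or removed
        # without rebuilding the calling program.
        command = [sys.executable, "-c", CONVERTER_SOURCE]
        run_converter(command, foreign, scratch)
        with open(scratch) as f:
            return f.read()
```

Because the caller knows the utility only as a command to run, a different utility can be substituted by changing `command` alone, which mirrors the ability to add and remove converters as required.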
