Array Storage Forms

ARY
A Subroutine Library for Accessing
ARRAY Data Structures

3 Array Storage Forms

Note that at present, the ARY_ system provides full support only for the “primitive” and “simple” forms of the ARRAY data structure.

Some support is also provided for two additional forms:

SCALED: - the “scaled” form described in SGP/38. This form is the same as the “simple” form except that two extra scalar values are included that describe a linear scaling from the stored array values to the data values of interest to an external user. These two scalars are referred to as SCALE and ZERO. The external (unscaled) data values are derived from the stored (scaled) data values as follows:
unscaled = SCALE*scaled + ZERO
DELTA: - this form is not currently described in SGP/38. Delta form provides a lossless compression scheme designed for arrays of integers in which there is at least one pixel axis along which the array value changes only slowly. For further details, see §3.1.
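The scaling relation for the SCALED form can be illustrated with a minimal Python sketch. The function name `unscale` and the sample numbers are hypothetical and are not part of the ARY_ library; the sketch only applies the formula given above.

```python
def unscale(stored, scale, zero):
    """Apply unscaled = SCALE*scaled + ZERO to each stored value."""
    return [scale * s + zero for s in stored]


# Hypothetical stored (scaled) integers with SCALE = 0.5 and ZERO = 10.0.
stored = [0, 100, 200]
external = unscale(stored, 0.5, 10.0)
print(external)  # [10.0, 60.0, 110.0]
```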

The following points should be noted:

(1)

Scaled and delta arrays are “read-only”. An error will be reported if an attempt is made to map a scaled or delta array for WRITE or UPDATE access. When mapped for READ access, the pointer returned by ARY_MAP provides access to the original data values - that is, the mapped values are the result of (for scaled arrays) applying the scale and zero terms to the stored values, or (for delta arrays) uncompressing the compressed values.

Currently, the internal stored (i.e. scaled or compressed) data values cannot be accessed directly.

(2)

The result of copying a scaled or delta array (using ARY_COPY) will be an equivalent simple array.

(3)

Scaled and delta arrays cannot be created directly. Instead, a simple array must first be created (using ARY_NEW), and this can then be converted to a scaled or delta array as follows:

SCALED: - storing scale and zero values in the simple array using ARY_PTSZ<T>. A typical program would create a simple array, map it for write access, store the scaled data values in the mapped simple array, unmap the array, and then associate scale and zero values with the array, thus converting it to a scaled array.
DELTA: - copying the simple array using ARY_DELTA. The copy will be a compressed array stored in delta form. A typical program would create a simple array, map it for write access, store the uncompressed data values in the mapped simple array, unmap the array, and then copy it using ARY_DELTA. The original simple array could then be deleted if it is no longer needed.

(4)

Scaled and delta arrays cannot have complex data types. An error will be reported if an attempt is made to import an HDS structure describing a complex scaled or delta array, or to use ARY_PTSZ<T> or ARY_DELTA on an array with complex data values.

(5)

When applied to a scaled or delta array, the ARY_TYPE and ARY_FTYPE routines return the data type of the external (i.e. unscaled or uncompressed) values. In practice, this means that for a scaled array they return the data type of the SCALE and ZERO constants, rather than the data type of the array holding the stored (scaled) data values. For a delta array they return the data type of the original uncompressed values.

3.1 Delta Compressed Array Form

The DELTA storage form provides lossless compression for integer arrays. It uses two methods to achieve compression:

Differences between adjacent data values are stored, rather than the full data values themselves. For many forms of astronomical data, the differences between adjacent data values have a much smaller range than the data values themselves. This means that they can be represented in fewer bits. For instance, if the data values are _INTEGER, then the differences between adjacent values may fit into the range of a _WORD (-32767 to +32767) or even a _BYTE (-127 to +127). This use of a shorter data type usually provides the majority of the compression. However, it is not necessary for all differences to be small: if the difference between two adjacent data values is too large for the compressed data type, the second of the two data values will be stored explicitly using the full data type of the original uncompressed data. The more values that need to be stored in full in this way, the lower the compression will be.
In the above description, the term “adjacent” means “adjacent along a specified pixel axis”. The pixel axis along which differences are taken is referred to as the “compression axis”. It may be specified explicitly by the calling application when ARY_DELTA is called, or it may be left unspecified in which case ARY_DELTA will choose the axis that gives the best compression.
If the uncompressed array contains runs of more than three identical values along the compression axis, then the run of identical values is replaced by a single value (stored in full, not as a difference) and a repetition count.
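The two ideas above can be illustrated with a small Python sketch. It is not the ARY_DELTA implementation: the helper names are hypothetical, the row of values is made up, and the _BYTE limit of 127 is taken from the range quoted above. The sketch only shows which differences along one row would fit in a shorter data type and where runs of more than three identical values occur.

```python
def adjacent_differences(row):
    """Differences between each value and the one before it."""
    return [b - a for a, b in zip(row, row[1:])]


def long_runs(row, min_length=4):
    """Return (start, length) for each run of identical values that is
    at least min_length long (i.e. "more than three" by default)."""
    runs = []
    start = 0
    for i in range(1, len(row) + 1):
        if i == len(row) or row[i] != row[start]:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = i
    return runs


# Hypothetical row of _INTEGER values along the compression axis.
row = [1000, 1003, 1001, 1001, 1001, 1001, 1001, 5000, 5002]

diffs = adjacent_differences(row)
fits_in_byte = [abs(d) <= 127 for d in diffs]  # +/-127 as quoted in the text
runs = long_runs(row)

print(diffs)         # [3, -2, 0, 0, 0, 0, 3999, 2]
print(fits_in_byte)  # only the jump of 3999 would need storing in full
print(runs)          # [(2, 5)]: five copies of 1001 starting at index 2
```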

3.1.1 Creating a Delta Array

To create a DELTA array, first store the uncompressed integer values in a simple array, and then copy the array using ARY_DELTA. The copy produced by ARY_DELTA will be stored in DELTA form.

Arrays of floating point values may be compressed by first storing the floating point values in a SCALED array, and then using ARY_DELTA to create a delta compressed copy of the scaled array. Note, the scaled array must use an integer data type to store the internal (i.e. scaled) values. The use of the scaled array means that the compression is not lossless, since some information will have been lost in scaling the floating point values into integers.
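The loss of information mentioned above can be seen in a short Python sketch. The way SCALE and ZERO are chosen here (mapping the data range linearly onto a 16-bit integer range) is an assumption made for illustration only; the ARY_ library leaves the choice of scale and zero to the application, and all function names below are hypothetical.

```python
def choose_scale_zero(values, int_min=-32767, int_max=32767):
    """Pick SCALE and ZERO so the data range spans the integer range.
    This particular choice is an assumption of the sketch."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (int_max - int_min)
    if scale == 0.0:
        scale = 1.0  # constant array: any non-zero scale will do
    zero = lo - scale * int_min
    return scale, zero


def to_scaled(values, scale, zero):
    """Invert unscaled = SCALE*scaled + ZERO and round to an integer."""
    return [round((v - zero) / scale) for v in values]


def from_scaled(stored, scale, zero):
    """Recover external values from the stored integers."""
    return [scale * s + zero for s in stored]


values = [0.0, 0.1, 0.25, 1.0]
scale, zero = choose_scale_zero(values)
stored = to_scaled(values, scale, zero)
restored = from_scaled(stored, scale, zero)

# The rounding step is where information is lost: each restored value
# differs from the original by at most about half a scale step.
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(stored, max_error, scale / 2)
```

The integers in `stored` are what a delta compressed copy would then encode without further loss.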

3.1.2 The HDS Structure of a Delta Array

The HDS structure of a DELTA array is similar to that of a SIMPLE array, in that it will contain VARIANT, DATA and ORIGIN components. In addition, it can contain SCALE and ZERO terms, which, if present, are used to scale the uncompressed integers as in a SCALED array. Uncompression happens first, producing an array of uncompressed integers, which are then unscaled if required using SCALE and ZERO to produce the final uncompressed, unscaled array.

DELTA arrays cannot be used to hold complex values and so no IMAGINARY_DATA component will be present. Also, DELTA arrays have an implicit value of .TRUE. for their bad pixel flags, and so no BAD_PIXEL component will be present in the HDS structure.

Information is stored within a DELTA array that allows sub-sections of the compressed array to be uncompressed without needing to uncompress the whole array.

A DELTA array is stored in an HDS structure with type DELTA_ARRAY, and contains the following components:

DATA

- This is a one-dimensional integer array holding the differences between adjacent uncompressed integer data values along the compression axis. Its data type will be either _INTEGER, _WORD or _BYTE and is specified when ARY_DELTA is called to create the array. A few integer values (all near the maximum value allowed by the data type) are reserved for use as flags to indicate one of the following conditions (where “MAX” represents the largest positive integer value that can be represented using the data type of the DATA array):

The value MAX is reserved to indicate that the next element of the uncompressed array is good, but could not be expressed as a difference from the previous element because the difference would not fit into the available data range of the DATA array. Instead, the full uncompressed value is stored in the next element of the VALUE array.
The value (MAX-1) is reserved to indicate that the next element of the uncompressed array is good and is exactly equal to the following (N-1) elements. The full uncompressed value is stored in the next element of the VALUE array. The value of N is stored in the next element of the REPEAT array.
The value (MAX-2) is reserved to indicate that the next element of the uncompressed array is bad, as are the following (N-1) elements. The full uncompressed value of the next good value following the bad values is stored in the next element of the VALUE array. The value of N is stored in the next element of the REPEAT array.
The value (MAX-3) is reserved to indicate that the next element of the uncompressed array is bad, but the following element is good and its full uncompressed value is stored in the next element of the VALUE array.
The value (MAX-4) is reserved to indicate that the next N elements of the uncompressed array are good but cannot be expressed as differences from the previous element because the differences would not fit into the available data range of the DATA array. Instead, the full uncompressed values are stored in the next N elements of the VALUE array. The value of N is stored in the next element of the REPEAT array.
Any other value is taken to be (NEXT - PREVIOUS) - the difference between the next uncompressed value and the previous uncompressed value.

Notes:

(1): The “available data range” in DATA is reduced to leave room for the above flags.
(2): The first element in each row of pixels parallel to the compression axis is always represented using one of these flag values. This allows each row of pixel values to be uncompressed without reference to any earlier values.
(3): Repeated runs of good or bad values are always contained within a single row of pixels parallel to the compression axis. Runs of repeated values that cross the boundary between adjacent rows are split into two repeated runs - one for each row.
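The flag scheme described for the DATA component can be made concrete with a Python sketch that uncompresses a single row of pixels. It is an illustration written from the description above, not the ARY_ library code. Several things are assumptions: the function name `decode_row`, the use of `None` to stand for a bad pixel, and the behaviour when a bad pixel or bad run ends a row (the text does not say whether a following good value is stored, so the sketch simply stops once the row is complete). The three starting offsets are zero-based indices into the DATA, VALUE and REPEAT arrays for the start of the row.

```python
BAD = None  # stand-in for a bad pixel value (an assumption of this sketch)


def decode_row(data, value, repeat, zdim, max_int,
               i_data=0, i_value=0, i_repeat=0):
    """Uncompress one row of zdim pixels, starting at the given
    zero-based offsets into the DATA, VALUE and REPEAT arrays."""
    out = []
    prev = None
    while len(out) < zdim:
        code = data[i_data]
        i_data += 1
        if code == max_int:            # single value stored in full
            prev = value[i_value]
            i_value += 1
            out.append(prev)
        elif code == max_int - 1:      # run of N identical good values
            n = repeat[i_repeat]
            i_repeat += 1
            prev = value[i_value]
            i_value += 1
            out.extend([prev] * n)
        elif code == max_int - 2:      # N bad values, then a full good value
            n = repeat[i_repeat]
            i_repeat += 1
            out.extend([BAD] * n)
            if len(out) < zdim:        # assumption: nothing follows at row end
                prev = value[i_value]
                i_value += 1
                out.append(prev)
        elif code == max_int - 3:      # one bad value, then a full good value
            out.append(BAD)
            if len(out) < zdim:        # assumption: nothing follows at row end
                prev = value[i_value]
                i_value += 1
                out.append(prev)
        elif code == max_int - 4:      # N good values, all stored in full
            n = repeat[i_repeat]
            i_repeat += 1
            vals = value[i_value:i_value + n]
            i_value += n
            out.extend(vals)
            prev = vals[-1]
        else:                          # ordinary difference: NEXT - PREVIOUS
            prev = prev + code
            out.append(prev)
    return out


# Hypothetical _BYTE-sized DATA array (MAX = 127) holding two rows.
data = [127, 2, 126, 125, 3, 124,   # row 1 (14 pixels)
        123, 1]                     # row 2 (4 pixels)
value = [100, 101, 500, 90, 1000, 5000, 9000]
repeat = [5, 3, 3]

row1 = decode_row(data, value, repeat, zdim=14, max_int=127)
row2 = decode_row(data, value, repeat, zdim=4, max_int=127,
                  i_data=6, i_value=4, i_repeat=2)
```

Because every row begins with a flag value, `row2` can be produced from its own starting offsets without decoding `row1` first.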

FIRST_DATA

- This is an _INTEGER array with (NDIM-1) axes which have the same order and size as the axes of the uncompressed array, but omitting the compression axis (NDIM is the number of axes in the uncompressed array). It holds the zero-based index into the DATA array at which the first element of the corresponding row of values is stored.

For instance, if the uncompressed array is a cube with bounds (1:10,1:5,1:7), and the compression axis is axis number 2, then the FIRST_DATA array will be two-dimensional with bounds (1:10,1:7). Element (2,3) of this array (for instance) will hold the integer index of the DATA array element that gives the full value for pixel (2,1,3) in the uncompressed array. Elements (2,2,3), (2,3,3), (2,4,3) and (2,5,3) of the uncompressed array are then derived from the following values in the DATA array.
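The index bookkeeping in this example can be checked with a small Python sketch. Both helper names are hypothetical, and the sketch assumes the default origin of 1 on every axis, as in the cube above.

```python
def first_data_shape(shape, zaxis):
    """Shape of FIRST_DATA: the array shape with the (one-based)
    compression axis removed."""
    return shape[:zaxis - 1] + shape[zaxis:]


def row_start_pixel(first_data_index, zaxis, first_on_zaxis=1):
    """Pixel indices, in the uncompressed array, of the first element of
    the row that corresponds to a given FIRST_DATA element."""
    idx = tuple(first_data_index)
    return idx[:zaxis - 1] + (first_on_zaxis,) + idx[zaxis - 1:]


cube_shape = (10, 5, 7)   # bounds (1:10,1:5,1:7)
zaxis = 2                 # compression axis, one-based

print(first_data_shape(cube_shape, zaxis))  # (10, 7)
print(row_start_pixel((2, 3), zaxis))       # (2, 1, 3)
```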

FIRST_REPEAT

- This is an array with the same shape as the FIRST_DATA array. It holds the zero-based index of the first value of the REPEAT array to be used whilst uncompressing the corresponding row of pixels. This component will only be present in the DELTA_ARRAY structure if the REPEAT component is present. The data type of this array will be one of _INTEGER, _UWORD or _UBYTE, depending on the largest value stored in it.

FIRST_VALUE

- This is an array with the same shape as the FIRST_DATA array. It holds the zero-based index of the first value of the VALUE array to be used whilst uncompressing the corresponding row of pixels. The data type of this array will be one of _INTEGER, _UWORD or _UBYTE, depending on the largest value stored in it.

ORIGIN

- A one-dimensional _INTEGER array holding the pixel indices of the first element of the uncompressed array. This component is optional - an origin of (1,1,1...) is assumed if the component is not present in the DELTA_ARRAY structure.

REPEAT

- A one-dimensional integer array holding the number of repetitions for each value associated with an occurrence of (MAX-1), (MAX-2) or (MAX-4) in the DATA array. The data type of this array will be one of _INTEGER, _UWORD or _UBYTE, depending on the largest value stored in it. This array will not be present if there are no runs in the uncompressed data array.

SCALE

- An optional component giving a scale factor to apply to the uncompressed integer values. It can be of any data type. If present the uncompressed array is treated like a SCALED array. In particular, the data type of the uncompressed array will be the same as the data type of the SCALE component, if present. If not present, the data type of the uncompressed array is given by the data type of the VALUE array.

VALUE

- A one-dimensional array with the same data type as the uncompressed array (_INTEGER, _WORD, _UWORD, _BYTE or _UBYTE) prior to scaling by SCALE and ZERO. It holds full uncompressed integer values for the elements that are flagged with any of the special values listed under “DATA” above. Note, if SCALE and ZERO components are present in the DELTA array, the VALUE array holds internal scaled values, rather than external unscaled values.

VARIANT

- The storage form of the array. This will always be set to “DELTA”.

ZAXIS

- A scalar _INTEGER value giving the index of the compression axis - that is, the pixel axis index within the uncompressed array along which differences were taken. Care should be taken in the choice of ZAXIS since it can affect the degree of compression achieved. If ZAXIS is not specified when compressing an array, it defaults to the axis that gives the greatest compression. Note, the ZAXIS value is one-based, not zero-based.

ZDIM

- A scalar _INTEGER holding the length of the compression axis within the uncompressed array. The other dimensions of the uncompressed array are given by the shape of the FIRST_DATA array.

ZERO

- An optional component giving a zero offset to add to the uncompressed integer values. It can be of any data type. If present the uncompressed array is treated like a SCALED array.

ZRATIO

- A scalar _REAL holding the compression factor - that is, the ratio of the uncompressed array size to the compressed array size. This is approximate as it does not include the effects of the metadata needed to describe the extra components of a DELTA array (i.e. the space needed to hold the HDS component names, types, dimensions, etc).
