This section describes the components of a CAT catalogue. It is necessary to understand the structure of a CAT catalogue in order to use the CAT library effectively. An idealized computer-readable version of an astronomical catalogue, or similar tabular dataset, might comprise the following elements:
The CAT library is mostly concerned with the first two items. However, it also provides some simple facilities to retrieve and write the textual information of the third item. These latter facilities are provided so that the textual information in a catalogue can be displayed to a user or copied when a new catalogue is created from an old one. The routines for manipulating textual information are described in Section 7.9. They do not interact with any other items in a CAT catalogue and they are not mentioned again in this section.
The table in a CAT catalogue is very similar to a relation in the theory of relational databases, and has many of the same properties. Each row in the table must contain the same number of fields. Corresponding fields in different rows must be of the same type. The table may contain an arbitrary number of rows. In the formal theory of relational databases, no two rows may be identical. CAT relaxes this rule by permitting identical rows, though it is difficult to see what purpose such rows might serve.
The internal organization of a CAT catalogue (the way it is formatted on disk) is unknown to an application using the CAT library. The values in the catalogue are accessed purely through the subroutine interface to the CAT library.
This manual describes version 9.0 of the CAT library. The original specification for the library is described in the document The Starlink Subroutine Interface for Manipulating Catalogues (StarBase/ACD/3.4)[2]. Version 9.0 of CAT is a subset of this full implementation and some of the items present in it serve no apparent purpose. These items correspond to features which were in the original specification but which are not currently implemented. These items may be implemented in future versions and have been included so that applications written now will be compatible with future versions of the library.
Various symbolic constants are referred to throughout this section. These constants are defined in INCLUDE file CAT_PAR, which may be INCLUDEd in the subroutines of an application. See Section 3.2 for details of how to access this file.
In the CAT model, a catalogue comprises a number of components. In version 9.0 of CAT a catalogue may contain two sorts of component: columns and parameters.
A catalogue may contain an arbitrary number of columns and an arbitrary number of parameters. Columns and parameters are permanent entities which persist between invocations of applications accessing the catalogue through the CAT library (typically as items in a disk file). In addition to these permanent components, CAT may create three sorts of temporary component: expressions, selections and indices.
Unlike columns and parameters, expressions, selections and indices are ephemeral entities (footnote 10) which perish when the application that created them terminates.
Every component consists of a number of attributes. Each type of component (permanent or temporary; column, parameter, expression, selection or index) has a fixed set of attributes, each identified by name. The values of the attributes differ between components, and their totality defines the component. Additionally there are two special attributes which apply to the entire catalogue, rather than to a particular component. This hierarchy is illustrated in Figure 1. Subsequent sections describe these catalogue attributes and the attributes of columns, parameters, expressions, selections and indices.
Catalogues, columns, parameters, expressions, selections and indices are all identified by an identifier. Each identifier is an INTEGER number. The value of an identifier is unique (within a given invocation of an application) and is sufficient to identify the item to which it refers. The following rules apply when using CAT identifiers in applications:
An application can inquire what sort of item (catalogue, column, parameter etc.) an identifier represents using subroutine CAT_TIDTP. The various types of identifiers are represented using INTEGER codes, and symbolic constants are defined for these codes. They are listed in Table 2.
Type of identifier | CAT symbolic constant |
Catalogue | CAT__CITYP |
Column or field | CAT__FITYP |
Vector column element | CAT__FETYP |
Parameter | CAT__QITYP |
Expression | CAT__EITYP |
Selection | CAT__SITYP |
Index | CAT__IITYP |
Null identifier | CAT__NOID |
The catalogue to which a component (column, parameter etc.) belongs is referred to as the parent of that component. Subroutine CAT_TIDPR can be used to inquire the parent of an identifier. In CAT version 9.0 catalogues do not have parents. If CAT_TIDPR is used to try to find the parent of a catalogue then the null identifier is returned.
When CAT is asked to generate an identifier for an item which does not exist (such as the parent of a catalogue) it will return the ‘null identifier’. The meaning of this identifier is that the specified component does not exist. The symbolic constant for the null identifier is CAT__NOID.
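The relationship between identifiers, parents and the null identifier can be pictured with a small model. The following Python sketch is purely illustrative and is not the CAT interface: the class `Registry`, its methods and the value chosen for `NULL_ID` are all hypothetical (the real value of CAT__NOID is defined in CAT_PAR).

```python
# Illustrative model only; this is not the CAT API.
from itertools import count

NULL_ID = 0  # hypothetical stand-in for CAT__NOID


class Registry:
    """Hands out unique integer identifiers and records each item's kind and parent."""

    def __init__(self):
        self._next = count(1)
        self._items = {}  # identifier -> (kind, parent identifier)

    def new(self, kind, parent=NULL_ID):
        ident = next(self._next)
        self._items[ident] = (kind, parent)
        return ident

    def kind(self, ident):
        # Plays the role that CAT_TIDTP plays in the text.
        return self._items[ident][0]

    def parent(self, ident):
        # Plays the role that CAT_TIDPR plays in the text.
        return self._items[ident][1]


reg = Registry()
cat = reg.new("catalogue")           # catalogues have no parent
col = reg.new("column", parent=cat)  # a column's parent is its catalogue
```

In this model `reg.parent(col)` returns the catalogue identifier, whereas `reg.parent(cat)` returns `NULL_ID`, mirroring the behaviour described above for the parent of a catalogue.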
Attributes do not have their own identifiers. An attribute is specified by the identifier of the component of which it is a part and its name. This combination is unique for a given attribute. For example the ‘data type’ attribute of a column (see Section 6.7) is specified by the identifier of the column and the name of the attribute (‘DTYPE’ in this case). Each attribute has a data type associated with it. Families of subroutines (one per data type) are available to set and inquire the values of attributes:
CAT_TATTt
CAT_TIQAt
See Section 7.8.1 for details of using these subroutines.
All the attributes for a given component adopt values when the component is created. Some attributes are mandatory, in which case values must be supplied for them. For the remaining attributes values are optional, and if they are not supplied defaults are adopted.
Most attributes are immutable; they are specified once when the component is created and are fixed thereafter. A few, however, are mutable and may be changed at any stage during the life of the component. The immutable attributes of all the columns in a catalogue are frozen when values are first written to the table (footnote 11).
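The rules for mandatory, defaulted and immutable attributes can be summarised in a short model. The Python sketch below is an illustration under stated assumptions, not the CAT interface: the class `Column` and its methods are hypothetical, and the particular choice of which attributes are mandatory or mutable is assumed for the example only.

```python
# Illustrative model only; this is not the CAT API.
# Attribute names and default values follow Table 3; the MANDATORY and MUTABLE
# sets below are assumptions made for the purposes of the example.
DEFAULTS = {"UNITS": " ", "COMM": " ", "SCALEF": 1.0, "ZEROP": 0.0}
MANDATORY = {"NAME", "DTYPE"}
MUTABLE = {"UNITS", "COMM"}


class Column:
    def __init__(self, **attrs):
        missing = MANDATORY - set(attrs)
        if missing:
            raise ValueError(f"missing mandatory attributes: {sorted(missing)}")
        # Optional attributes that were not supplied adopt their defaults.
        self._attrs = {**DEFAULTS, **attrs}
        self._frozen = False

    def get(self, name):
        return self._attrs[name]

    def set(self, name, value):
        # Once the table holds data, only mutable attributes may change.
        if self._frozen and name not in MUTABLE:
            raise PermissionError(f"attribute {name} is immutable")
        self._attrs[name] = value

    def write_first_row(self):
        # Writing the first values to the table freezes the immutable attributes.
        self._frozen = True
```

For instance, after `c = Column(NAME="FLUX", DTYPE="_REAL")` and `c.write_first_row()`, a call to `c.set("COMM", "integrated flux")` succeeds while `c.set("NAME", "OTHER")` raises an error.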
In addition to its collection of column and parameter components, all with their individual attributes, a catalogue also has several attributes which apply directly to the entire catalogue, rather than to an individual component (see Figure 1 in Section 6.3). These attributes are described below.
NAME (data type: _CHAR, size = CAT__SZCNM) The name of the catalogue. It is specified when the catalogue is created and is mandatory and immutable. The NAME attribute is related to the file name of the catalogue as follows. It is the same as the file name, but without any preceding directory specification or trailing file type. Thus, if CATNAME is the NAME attribute then the corresponding file name is:
directory_specification/CATNAME.file_type
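The relation between the file name and the NAME attribute can be illustrated as follows. This Python sketch is not part of CAT; the function `catalogue_name` is hypothetical, and it assumes a Unix-style path with a single trailing file type.

```python
from pathlib import PurePosixPath


def catalogue_name(file_name):
    """Strip the directory specification and the trailing file type."""
    return PurePosixPath(file_name).stem
```

Under these assumptions `catalogue_name("/data/surveys/CATNAME.FIT")` gives `"CATNAME"`.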
CAT__SZCNF) The full path of the catalogue file. It is specified when the catalogue is created and is mandatory and immutable.
A column may contain either a single value for each row (as in standard relational database theory) or a one-dimensional array of values for each row. An array must be of fixed size, defined when the column is created. There is no upper limit to the number of elements which an array may contain. A single-valued column is called a scalar and a column containing an array is called a vector. The attributes of a column are listed in Table 3 and described below.
The attributes of an individual element of a vector column are somewhat different and are described in Section 6.8, below.
Attribute | Name | Data type | Mutable | Mandatory | Default |
Name | NAME | _CHAR | | | |
Genus | GENUS | _INTEGER | | | physical: CAT__GPHYS |
Expression | EXPR | _CHAR | | | ‘ ’ |
Data type | DTYPE | _INTEGER | | | |
Character size | CSIZE | _INTEGER | | | 20† |
Dimensionality | DIMS | _INTEGER | | | scalar: CAT__SCALR |
Size§ | SIZE | _INTEGER | | | 1 |
Null or locum | NULL | _INTEGER | | | HDS: CAT__NULLD |
Exception values | EXCEPT | _CHAR | | | ‘ ’ |
Scale factor | SCALEF | _DOUBLE | | | 1.0D0 |
Zero point | ZEROP | _DOUBLE | | | 0.0D0 |
Order | ORDER | _INTEGER | | | none: CAT__NOORD |
Units | UNITS | _CHAR | | | ‘ ’ |
External format | EXFMT | _CHAR | | | varies with data type |
Preferential display | PRFDSP | _LOGICAL | | | true |
Comments | COMM | _CHAR | | | ‘ ’ |
Modification date | DATE | _DOUBLE | | | 0.0D0 |
NAME (Data type: _CHAR, size = CAT__SZCMP) The name of the column. The rules for column names are as follows.
(CAT__SZCMP). This value is chosen for consistency with HDS and is adequate for FITS tables.
HD_NUMBER, HD_Number and hd_number would all refer to the same column.
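Because differently-cased spellings refer to the same column, an application that keeps its own table of column names needs to compare them without regard to case. The following Python sketch shows one way of doing so; the class `ColumnNames` is hypothetical and is not part of CAT.

```python
class ColumnNames:
    """Maps column names to identifiers, ignoring letter case."""

    def __init__(self):
        self._ids = {}

    def add(self, name, ident):
        key = name.upper()  # fold to upper case before storing
        if key in self._ids:
            raise ValueError(f"column {name} already exists")
        self._ids[key] = ident

    def lookup(self, name):
        return self._ids[name.upper()]
```

With this model, adding `HD_NUMBER` once allows it to be found as `HD_Number` or `hd_number`, and an attempt to add `hd_number` as a second column is rejected.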
CAT__GPHYS when the column is created.
EXPR (Data type: _CHAR, size = CAT__SZEXS) In version 9.0 of CAT the expression attribute is present, but not used. It is set to blank (‘ ’) when the column is created.
DTYPE (Data type: _INTEGER) The data type of values held in the column. The types permitted
are listed in Table 4. They are deliberately the same as the types permitted in HDS and include the
standard types of Fortran 77.
HDS Type | DEC Fortran Type | CAT symbolic constant | Description | Standard Fortran 77? |
_BYTE | BYTE | CAT__TYPEB | Signed byte | No |
_WORD | INTEGER*2 | CAT__TYPEW | Signed word | No |
_INTEGER | INTEGER | CAT__TYPEI | Signed integer | Yes |
_REAL | REAL | CAT__TYPER | Single precision | Yes |
_DOUBLE | DOUBLE PRECISION | CAT__TYPED | Double precision | Yes |
_LOGICAL | LOGICAL | CAT__TYPEL | Logical | Yes |
_CHAR[*n] | CHARACTER[*n] | CAT__TYPEC | Character string | Yes |
n is the number of elements in the character string; it is a positive integer. In a CAT CHARACTER column the size of the string is stored in attribute CSIZE.
_BYTE and _WORD correspond exactly to the DEC Fortran data types BYTE and INTEGER*2 respectively; equivalent types exist in most other implementations of Fortran. The non-standard data types typically are required to accommodate raw data generated by instruments. The ranges of the primitive numeric types will be defined by the particular implementation of Fortran on the computer being used (this table is adapted from SUN/92[11]. See in particular the table in Section 2.2, p3).
CAT__SCALR and for a vector to CAT__VECTR.
SIZE (Data type: _INTEGER; a single element array, see footnote 12) If the column is a vector this attribute contains the number of elements in the vector. If the column is a scalar it is set to one.
NULL (Data type: _INTEGER) A flag indicating whether or not null values are recognized in the column. Three cases are recognized:
(CAT__NULLD),
(CAT__NULLS),
(CAT__LOCUM).
The treatment of null values is discussed in Section 8.2, below.
EXCEPT (Data type: _CHAR, size = CAT__SZVAL) The value used to represent the null value, or the locum value generated if nulls are not supported in the column. See Section 8.2 for a full description.
SCALEF (Data type: _DOUBLE) The scale factor used to calculate the actual value of a scaled column from the scaled value stored. The actual value is computed according to the formula
\[ \text{actual value} = (\text{scale factor} \times \text{stored value}) + \text{zero point} \tag{1} \]
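The scaling can be illustrated with a short calculation. The Python sketch below assumes the usual linear form, in which the stored value is multiplied by the scale factor (attribute SCALEF) and the zero point (attribute ZEROP) is then added; the function `actual_value` is hypothetical and is not part of CAT. Its default arguments match the defaults listed for SCALEF and ZEROP in Table 3, so an unscaled column is returned unchanged.

```python
def actual_value(stored, scalef=1.0, zerop=0.0):
    """Convert a stored (scaled) value to the actual value, assuming a linear scaling."""
    return scalef * stored + zerop
```

For example, a stored value of 7 with a scale factor of 0.5 and a zero point of 10.0 corresponds to an actual value of 13.5.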
UNITS (Data type: _CHAR, size = CAT__SZUNI) The units in which values stored in the column are expressed. The UNITS attribute is used to identify, and control the appearance of, columns of angles (see Section 8.3). Apart from this exception the units are treated purely as comments and no attempts are made to automatically propagate and convert units in calculations and selections. Case sensitivity is irrelevant for units since they are treated purely as comments. The units attribute can be left completely blank; a blank units attribute implies that the units are unknown. If it were desired to distinguish a dimensionless quantity from one with unknown units, the string ‘DIMENSIONLESS’ could be put in the units attribute (footnote 13).
EXFMT (Data type: _CHAR, size = CAT__SZEXF) The format used to represent a datum extracted from a column for external display on a screen or in a text file. These formats are used solely for external display, not internal conversion. The external format specifier should be a valid Fortran 77 format specifier for the data type of the column.
PRFDSP (Data type: _LOGICAL) The preferential display flag; a logical flag which indicates to reporting applications whether, by default, the column is to be displayed or not. It is coded as follows:
.TRUE. – display the column by default,
.FALSE. – do not display the column by default.
COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the column. The comments may be up to eighty characters long (CAT__SZCOM).
DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used.
It is set to 0.0D0 when the column is created.
CAT treats vectors in quite a simple fashion. Values can only be GOT or PUT for individual vector elements; there are no routines for processing entire vectors. In order to access individual elements it is necessary to assign identifiers to them. Identifiers for vector elements, like those for scalar columns and parameters, are obtained using CAT_TIDNT. The name of a vector column element passed to CAT_TIDNT has the same syntax as the NAME attribute of the element, as described below. The attributes of a vector column element differ from those of the whole column; they are listed in Table 6.
Attribute | Name | Data type |
Name | NAME | _CHAR |
Data type | DTYPE | _INTEGER |
Character size | CSIZE | _INTEGER |
Base identifier | BASEID | _INTEGER |
Vector element | ELEM | _INTEGER |
All these attributes are created automatically when an identifier is obtained for the element; they are all mandatory and immutable.
The vector column to which a vector column element belongs is referred to as the column element’s
base column. The DTYPE and CSIZE attributes of a vector column element are necessarily identical to
the corresponding attributes for its base column. The details of the remaining attributes are as
follows.
NAME (data type: _CHAR, size = CAT__SZCMP) The name of the vector column element. That is, the name of the base column, followed by the number of the vector element, enclosed in square brackets. The number of the first element is one. Thus, the name of the fourth element of vector column FLUX would be FLUX[4].
BASEID (data type: _INTEGER) The identifier of the base column of the vector element.
ELEM (data type: _INTEGER) The sequence number of the element in the column vector. The first
element is numbered one. Thus, for example, if the name of the vector column element was FLUX[4]
the value of the ELEM attribute would be four.
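The naming convention for vector column elements is simple enough to parse mechanically. The following Python sketch splits an element name into the base column name and the element number; the function `split_element_name` is hypothetical and is not part of CAT, and the pattern it uses is an assumption based solely on the syntax described above.

```python
import re

# Base column name, then a positive element number in square brackets.
_ELEMENT = re.compile(r"^(?P<base>[^\[\]]+)\[(?P<elem>[0-9]+)\]$")


def split_element_name(name):
    """Return (base column name, element number) for a name such as 'FLUX[4]'."""
    match = _ELEMENT.match(name)
    if match is None:
        raise ValueError(f"{name!r} is not a vector element name")
    elem = int(match.group("elem"))
    if elem < 1:
        # Elements are numbered from one.
        raise ValueError("element numbers start at one")
    return match.group("base"), elem
```

Applied to `FLUX[4]` the function yields the base column name `FLUX` and the element number 4, which correspond to the base column and the ELEM attribute respectively.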
Parameters are items of information which apply to the entire catalogue. Examples are the equinox or epoch of the celestial coordinates in the catalogue. The attributes of a parameter are given in Table 7. In CAT version 9.0 parameters must be scalars. However, they have the attributes dimensionality and size to allow for the possibility of vector parameters in future versions of CAT.
Attribute | Name | Data type | Mandatory | Default |
Name | NAME | _CHAR | | |
Data type | DTYPE | _INTEGER | | |
Character size | CSIZE | _INTEGER | | CAT__SZVAL † |
Dimensionality | DIMS | _INTEGER | | scalar: CAT__SCALR |
Size§ | SIZE | _INTEGER | | 1 |
Units | UNITS | _CHAR | | ‘ ’ |
External format | EXFMT | _CHAR | | varies with data type |
Preferential display | PRFDSP | _LOGICAL | | true |
Comments | COMM | _CHAR | | ‘ ’ |
Value | VALUE | varies | | zero or ‘ ’ |
Modification date | DATE | _DOUBLE | | 0.0D0 |
All these attributes, except VALUE, are deliberately identical to the corresponding attributes for columns (see Table 3 and Section 6.7, above, for details).
VALUE (Data type: variable, corresponds to attribute DTYPE) The value of the parameter; it is mutable.
Expressions define a quantity computed from the existing scalar columns, vector column elements and parameters of a catalogue using some algebraic or logical (boolean) expression. An expression adopts a value for every row in the catalogue. It is similar to a column, except that its value is computed ‘on the fly’ from existing columns (and parameters), rather than being stored in the catalogue. Usually an expression will evaluate to a numeric value, but it may equally well evaluate to a LOGICAL or CHARACTER value. For example, if a catalogue contained columns x and y an expression might be ‘x + y’. The syntax for specifying expressions is described in Appendix B.
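The idea of a value computed ‘on the fly’ for every row can be shown with a toy table. The Python sketch below is illustrative only: the rows and the function `expression` are invented for the example, and a plain Python function stands in for the CAT expression ‘x + y’ rather than using the CAT expression parser.

```python
# A toy table with two columns, x and y (invented values).
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 12.5, "y": 0.5},
    {"x": 30.0, "y": -4.0},
]


def expression(row):
    # Stands in for the CAT expression 'x + y'.
    return row["x"] + row["y"]


# The expression adopts a value for every row; nothing is stored in the table.
values = [expression(row) for row in rows]
```

Here `values` holds one result per row, while `rows` itself is left unchanged.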
An expression has a set of attributes which are identical to those for a scalar column (see Section 6.7 and Table 3), but with the following exceptions.
CAT__GVIRT.
CAT__SCALR; expressions are always scalars and vector expressions are forbidden.
Selections define a set of rows selected from a catalogue according to some criteria. For example, if a catalogue contained column x then the selection criterion might be ‘x > 10.0’, that is, the selection would comprise the set of rows for which the field of column x was greater than 10.0. The syntax for specifying expressions is described in Appendix B.
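A selection can be pictured as the list of rows which satisfy the criterion, together with a count of those rows. The Python sketch below is illustrative only and does not use the CAT expression parser; the table is invented, and numbering the rows from one is an assumption made for the example.

```python
# A toy table with a single column, x (invented values).
rows = [{"x": 1.0}, {"x": 12.5}, {"x": 10.0}, {"x": 30.0}]

# Row numbers (counted from one, by assumption) which satisfy 'x > 10.0'.
selected = [number for number, row in enumerate(rows, start=1) if row["x"] > 10.0]

# The number of rows in the selection.
number_selected = len(selected)
```

Note that the row with x equal to 10.0 is excluded, since the criterion is a strict inequality.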
The attributes of a selection are listed in Table 8 and described below. With the exception of the comments attribute, COMM, they are all mandatory and immutable and are set automatically when the selection is created.
Attribute | Name | Data type |
Expression | EXPR | _CHAR |
Number of rows | NUMSEL | _INTEGER |
Comments | COMM | _CHAR |
Modification date | DATE | _DOUBLE |
EXPR (Data type: _CHAR, size = CAT__SZEXS) The expression which rows in the catalogue must satisfy in order to be included in the selection.
NUMSEL (Data type: _INTEGER) The number of rows in the selection.
COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the selection.
DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used.
It is set to 0.0D0 when the selection is created.
Indices are a mechanism for accessing the rows of a catalogue as though they were sorted into ascending or descending order on some column. For example, if an ascending index was created on REAL column DEC and the rows of the catalogue were accessed through this index the rows would appear in ascending order of DEC. In CAT version 9.0 indices are temporary entities which persist only for the duration of the application which generated them and perish when it terminates. Future versions of CAT will support permanent indices which persist after the application which generated them terminates.
Indices can be created from columns of any of the numeric data types. They should not be created from columns of data type CHARACTER or LOGICAL. If an index is created on a column which contains null values then the rows for which the column is null will appear after all the rows with a valid value. The order of such rows is unpredictable.
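The effect of an index, including the placement of null values, can be modelled as a list of row positions. The Python sketch below is illustrative only and says nothing about how CAT builds its indices: the function `build_index` is hypothetical, `None` stands in for a null value, and row positions are counted from zero. In this sketch null rows keep their original relative order, whereas in CAT their order is unpredictable.

```python
def build_index(values, descending=False):
    """Return row positions ordered by value, with null (None) rows last."""
    valid = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    valid.sort(key=lambda i: values[i], reverse=descending)
    # Null rows follow all rows with a valid value, whatever the sort direction.
    return valid + nulls


dec = [10.5, None, -3.0, 42.0]
ascending = build_index(dec)
descending = build_index(dec, descending=True)
```

Accessing the rows in the order given by `ascending` visits them in increasing order of the column, with the null row at the end; the table itself is never re-ordered.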
The attributes of an index are listed in Table 9 and described below. They are all mandatory and immutable and are set automatically when the index is created.
Attribute | Name | Data type |
Column identifier | COLID | _INTEGER |
Order | ORDER | _INTEGER |
Number of rows | NUMSEL | _INTEGER |
Comments | COMM | _CHAR |
Modification date | DATE | _DOUBLE |
CAT__ASCND – ascending,
CAT__DSCND – descending.
COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the index.
DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the index is created.
10. Future versions of CAT will support permanent indices.
11. This description extends the discussion in Section 4, which for simplicity omitted to mention mutable attributes.
12. An array is used instead of a scalar to allow the possibility of introducing multi-dimensional arrays in a future version of CAT.
13. Magnitudes, which properly are dimensionless, can, of course, have units of ‘MAGNITUDES’ or ‘MAG’ or whatever, if so desired.