6 Components of a CAT catalogue

 6.1 Provision for future enhancements
 6.2 Symbolic constants
 6.3 Catalogues, components and attributes
 6.4 Identifiers
 6.5 Attributes
 6.6 Catalogue attributes
 6.7 Columns
 6.8 Vector column elements
 6.9 Parameters
 6.10 Expressions
 6.11 Selections
 6.12 Indices

This section describes the components of a CAT catalogue. It is necessary to understand the structure of a CAT catalogue in order to use the CAT library effectively. An idealized computer-readable version of an astronomical catalogue, or similar tabular dataset, might comprise the following elements:

(1) the table of values which comprise the catalogue,
(2) a description of this table: the details of all the columns that it contains, the number of rows etc.,
(3) textual information about the catalogue; perhaps a short description of the catalogue or a copy of a published paper describing it. This information is intended to be read by a human rather than interpreted by a computer.

The CAT library is mostly concerned with the first two items. However, it also provides some simple facilities to retrieve and write the textual information of the third item. These latter facilities are provided so that the textual information in a catalogue can be displayed to a user or copied when a new catalogue is created from an old one. The routines for manipulating textual information are described in Section 7.9. They do not interact with any other items in a CAT catalogue and they are not mentioned again in this section.

The table in a CAT catalogue is very similar to a relation in the theory of relational databases, and has many of the same properties. Each row in the table must contain the same number of fields. Corresponding fields in different rows must be of the same type. The table may contain an arbitrary number of rows. In the formal theory of relational databases, no two rows may be identical. CAT relaxes this rule by permitting identical rows, though it is difficult to see what purpose such rows might serve.
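The table rules above (a fixed number of fields per row, a fixed type per column, identical rows permitted) can be illustrated with a minimal sketch. It is written in Python purely for illustration; the class `Table` and its members are hypothetical and are not part of the CAT library, which is called from Fortran.

```python
# Illustrative model only: this is NOT the CAT library interface.
from dataclasses import dataclass, field


@dataclass
class Table:
    column_types: tuple                      # one Python type per column
    rows: list = field(default_factory=list)

    def append(self, row):
        # Each row must contain the same number of fields.
        if len(row) != len(self.column_types):
            raise ValueError("wrong number of fields")
        # Corresponding fields in different rows must be of the same type.
        for value, expected in zip(row, self.column_types):
            if not isinstance(value, expected):
                raise TypeError(f"expected {expected.__name__}")
        # Unlike a formal relation, identical rows are accepted.
        self.rows.append(tuple(row))


table = Table(column_types=(str, float))
table.append(("star_a", 12.5))
table.append(("star_a", 12.5))   # an identical row is permitted
```

The only departure from a formal relation in this sketch is that `append` makes no attempt to reject duplicates.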

The internal organization of a CAT catalogue (the way it is formatted on disk) is unknown to an application using the CAT library. The values in the catalogue are accessed purely through the subroutine interface to the CAT library.

6.1 Provision for future enhancements

This manual describes version 9.0 of the CAT library. The original specification for the library is described in the document The Starlink Subroutine Interface for Manipulating Catalogues (StarBase/ACD/3.4)[2]. Version 9.0 of CAT is a subset of this full implementation and some of the items present in it serve no apparent purpose. These items correspond to features which were in the original specification but which are not currently implemented. These items may be implemented in future versions and have been included so that applications written now will be compatible with future versions of the library.

6.2 Symbolic constants

Various symbolic constants are referred to throughout this section. These constants are defined in INCLUDE file CAT_PAR, which may be INCLUDEd in the subroutines of an application. See Section 3.2 for details of how to access this file.

6.3 Catalogues, components and attributes

In the CAT model, a catalogue comprises a number of components. In version 9.0 of CAT a catalogue may contain two sorts of components, columns and parameters:

columns
define the individual columns (scalars or vectors) in the table,
parameters
provide single items of information which apply to the entire catalogue. Examples might be the epoch or equinox of celestial coordinates in the catalogue.

A catalogue may contain an arbitrary number of columns and an arbitrary number of parameters. Columns and parameters are permanent entities which persist in between invocations of applications accessing the catalogue through the CAT library (typically as items in a disk file). In addition to these permanent components additional sorts of temporary components may be created by CAT: expressions, selections and indices.

expressions
define a quantity computed from existing columns (and parameters) using some algebraic or logical (boolean) expression,
selections
define a set of rows selected from the catalogue according to some criteria,
indices
define an order for accessing rows in the catalogue equivalent to sorting the catalogue on a specified column.

Unlike columns and parameters, expressions, selections and indices are ephemeral entities (see footnote 10) which perish when the application using CAT which created them terminates.

Every component consists of a number of attributes. Each type of component (permanent or temporary; column, parameter, expression, selection or index) has a fixed set of attributes, each identified by name. The values of the attributes differ between components, and their totality defines the component. Additionally there are several special attributes which apply to the entire catalogue, rather than to a particular component. This hierarchy is illustrated in Figure 1. Subsequent sections describe these catalogue attributes and the attributes of columns, parameters, expressions, selections and indices.



Figure 1: The hierarchy of catalogues, components and attributes


6.4 Identifiers

Catalogues, columns, parameters, expressions, selections and indices are all identified by an identifier. Each identifier is an INTEGER number. The value of an identifier is unique (within a given invocation of an application) and is sufficient to identify the item to which it refers. The following rules apply when using CAT identifiers in applications:

An application can inquire what sort of item (catalogue, column, parameter etc.) an identifier represents using subroutine CAT_TIDTP. The various types of identifiers are represented using INTEGER codes, and symbolic constants are defined for these codes. They are listed in Table 2.


Type of identifier       CAT symbolic constant
-----------------------  ---------------------
Catalogue                CAT__CITYP
Column or field          CAT__FITYP
Vector column element    CAT__FETYP
Parameter                CAT__QITYP
Expression               CAT__EITYP
Selection                CAT__SITYP
Index                    CAT__IITYP
Null identifier          CAT__NOID

Table 2: The types of identifiers

The catalogue to which a component (column, parameter etc.) belongs is referred to as the parent of that component. Subroutine CAT_TIDPR can be used to inquire the parent of an identifier. In CAT version 9.0 catalogues do not have parents. If CAT_TIDPR is used to try to find the parent of a catalogue then the null identifier is returned.

6.4.1 The null identifier

When CAT is asked to generate an identifier for an item which does not exist (such as the parent of a catalogue) it will return the ‘null identifier’. The meaning of this identifier is that the specified component does not exist. The symbolic constant for the null identifier is CAT__NOID.
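The behaviour of identifiers, parents and the null identifier can be sketched with a small model. This is a Python illustration only: the `Registry` class is hypothetical, its methods merely play the roles that the text assigns to CAT_TIDTP and CAT_TIDPR, and the value used for `NOID` is an assumed stand-in for CAT__NOID (the real value is defined in CAT_PAR).

```python
# Illustrative model only; the names below are hypothetical, not CAT routines.
import itertools

NOID = 0  # assumed stand-in for CAT__NOID


class Registry:
    def __init__(self):
        self._next = itertools.count(1)   # identifiers are unique integers
        self._items = {}                  # identifier -> (kind, parent)

    def new(self, kind, parent=NOID):
        ident = next(self._next)
        self._items[ident] = (kind, parent)
        return ident

    def kind_of(self, ident):
        # Plays the role described for CAT_TIDTP.
        return self._items[ident][0]

    def parent_of(self, ident):
        # Plays the role described for CAT_TIDPR.
        return self._items[ident][1]


reg = Registry()
ci = reg.new("catalogue")             # a catalogue has no parent
fi = reg.new("column", parent=ci)     # a column belongs to its catalogue
```

In this model, asking for the parent of `ci` yields `NOID`, mirroring the rule that a catalogue has no parent.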

6.5 Attributes

Attributes do not have their own identifiers. An attribute is specified by the identifier of the component of which it is a part and its name. This combination is unique for a given attribute. For example the ‘data type’ attribute of a column (see Section 6.7) is specified by the identifier of the column and the name of the attribute (‘DTYPE’ in this case). Each attribute has a data type associated with it. Families of subroutines (one per data type) are available to set and inquire the values of attributes:

CAT_TATT<t> – set an attribute,
CAT_TIQA<t> – inquire the value of an attribute.

See Section 7.8.1 for details of using these subroutines.

All the attributes for a given component adopt values when the component is created. Some attributes are mandatory, in which case values must be supplied for them. For the remaining attributes values are optional, and if they are not supplied defaults are adopted.

Most attributes are immutable; they are specified once when the component is created and are fixed thereafter. A few, however, are mutable and may be changed at any stage during the life of the component. The immutable attributes of all the columns in a catalogue are frozen when values are first written to the table (see footnote 11).
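The rules for mandatory, defaulted, mutable and immutable attributes can be sketched as follows. This is a Python illustration, not the CAT interface: the `Component` class is hypothetical, and the choice of which attributes are mandatory or mutable in `SPEC` is an assumption made for the example rather than a statement about any particular CAT component.

```python
# Illustrative model only. The flags in SPEC are assumptions for the example.
class Component:
    # attribute name -> (mandatory, mutable, default)
    SPEC = {
        "NAME":   (True,  False, None),
        "SCALEF": (False, False, 1.0),
        "COMM":   (False, True,  " "),
    }

    def __init__(self, **values):
        self._values = {}
        for name, (mandatory, _mutable, default) in self.SPEC.items():
            if name in values:
                self._values[name] = values[name]
            elif mandatory:
                # Mandatory attributes must be supplied at creation.
                raise ValueError(f"{name} is mandatory")
            else:
                # Optional attributes adopt their default.
                self._values[name] = default

    def inquire(self, name):
        return self._values[name]

    def set(self, name, value):
        # Only mutable attributes may change after creation.
        if not self.SPEC[name][1]:
            raise PermissionError(f"{name} is immutable")
        self._values[name] = value


col = Component(NAME="FLUX")
col.set("COMM", "Integrated flux")
```

Every attribute has a value as soon as the component exists, so an inquiry never has to cope with an unset attribute.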

6.6 Catalogue attributes

In addition to its collection of column and parameter components, all with their individual attributes, a catalogue also has several attributes which apply directly to the entire catalogue, rather than to an individual component (see Figure 1 in Section 6.3). These attributes are described below.

NAME  (data type: _CHAR, size = CAT__SZCNM) The name of the catalogue. It is specified when the catalogue is created and is mandatory and immutable. The NAME attribute is the same as the file name of the catalogue, but without any preceding directory specification or trailing file type. Thus, if CATNAME is the NAME attribute then the corresponding file name is:

directory_specification/CATNAME.file_type

The file type corresponds to the format of the catalogue (FITS table, Small Text List etc). The various options are described in Appendix C.

DATE  (data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the catalogue is created.

BACK  (data type: _INTEGER) The back-end type of the catalogue. It will be one of CAT__BKFIT, CAT__BKSTL or CAT__BKTST. These symbolic constants are defined in INCLUDE file CAT_PAR.

PATH  (data type: _CHAR, size = CAT__SZCNF) The full path of the catalogue file. It is specified when the catalogue is created and is mandatory and immutable.
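The relationship between the NAME attribute and the catalogue file name described above can be sketched in a few lines. This is a Python illustration only; the function `catalogue_name`, the example path and the `.FIT` file type are hypothetical, and the sketch assumes a single trailing file type and a Unix-style directory specification.

```python
# Illustrative only: derives a NAME-like string from a file path.
from pathlib import PurePosixPath


def catalogue_name(path):
    """Return the file name without directory specification or file type."""
    return PurePosixPath(path).stem


# Hypothetical path, used purely as an example.
name = catalogue_name("/home/user/catalogues/CATNAME.FIT")
```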

6.7 Columns

A column may contain either a single value for each row (as in standard relational database theory) or a one-dimensional array of values for each row. An array must be of fixed size, defined when the column is created. There is no upper limit to the number of elements which an array may contain. A single-valued column is called a scalar and a column containing an array is called a vector. The attributes of a column are listed in Table 3 and described below.

The attributes of an individual element of a vector column are somewhat different and are described in Section 6.8, below.


Attribute             Name    Data type  Default
--------------------  ------  ---------  ---------------------
Name                  NAME    _CHAR
Genus                 GENUS   _INTEGER   physical: CAT__GPHYS
Expression            EXPR    _CHAR      ‘ ’
Data type             DTYPE   _INTEGER
Character size        CSIZE   _INTEGER   20†
Dimensionality        DIMS    _INTEGER   scalar: CAT__SCALR
Size§                 SIZE    _INTEGER   1
Null or locum         NULL    _INTEGER   HDS: CAT__NULLD
Exception values      EXCEPT  _CHAR      ‘ ’
Scale factor          SCALEF  _DOUBLE    1.0D0
Zero point            ZEROP   _DOUBLE    0.0D0
Order                 ORDER   _INTEGER   none: CAT__NOORD
Units                 UNITS   _CHAR      ‘ ’
External format       EXFMT   _CHAR      varies with data type
Preferential display  PRFDSP  _LOGICAL   true
Comments              COMM    _CHAR      ‘ ’
Modification date     DATE    _DOUBLE    0.0D0

† The size of character strings; other data types have CSIZE = 0.
§ SIZE is a single-element array, not a scalar.

Table 3: Attributes of columns

NAME  (data type: _CHAR, size = CAT__SZCMP) The name of the column. The rules for column names are as follows.

GENUS  (Data type: _INTEGER) In version 9.0 of CAT the genus attribute is present, but not used. It is set to CAT__GPHYS when the column is created.

EXPR  (Data type: _CHAR, size = CAT__SZEXS) In version 9.0 of CAT the expression attribute is present, but not used. It is set to blank (‘  ’) when the column is created.

DTYPE  (Data type: _INTEGER) The data type of values held in the column. The types permitted are listed in Table 4. They are deliberately the same as the types permitted in HDS and include the standard types of Fortran 77.

HDS type    DEC Fortran type  CAT symbolic constant  Description       Standard Fortran 77?
----------  ----------------  ---------------------  ----------------  --------------------
_BYTE       BYTE              CAT__TYPEB             Signed byte       No
_WORD       INTEGER*2         CAT__TYPEW             Signed word       No
_INTEGER    INTEGER           CAT__TYPEI             Signed integer    Yes
_REAL       REAL              CAT__TYPER             Single precision  Yes
_DOUBLE     DOUBLE PRECISION  CAT__TYPED             Double precision  Yes
_LOGICAL    LOGICAL           CAT__TYPEL             Logical           Yes
_CHAR[*n]   CHARACTER[*n]     CAT__TYPEC             Character string  Yes

n is the number of elements in the character string; it is a positive integer. In a CAT CHARACTER column the size of the string is stored in attribute CSIZE.

_BYTE and _WORD correspond exactly to the DEC Fortran data types BYTE and INTEGER*2 respectively; equivalent types exist in most other implementations of Fortran. The non-standard data types typically are required to accommodate raw data generated by instruments. The ranges of the primitive numeric types will be defined by the particular implementation of Fortran on the computer being used (this table is adapted from SUN/92[11]. See in particular the table in Section 2.2, p3).

Table 4: Permitted data types (adapted from SUN/92)

CSIZE  (Data type: _INTEGER) For a CHARACTER column, the size of the column, otherwise not used and by convention set to zero.

DIMS  (Data type: _INTEGER) The dimensionality of the column; a flag indicating whether it is a scalar or a vector. For a scalar column it is set to CAT__SCALR and for a vector to CAT__VECTR.

SIZE  (Data type: _INTEGER; a single element array, see footnote 12) If the column is a vector this attribute contains the number of elements in the vector. If the column is a scalar it is set to one.

NULL  (Data type: _INTEGER) A flag indicating whether or not null values are recognized in the column. Three cases are recognized:

The treatment of null values is discussed in Section 8.2, below.

EXCEPT  (Data type: _CHAR, size = CAT__SZVAL) The value used to represent the null value, or the locum value generated if nulls are not supported in the column. See Section 8.2 for a full description.

SCALEF  (Data type: _DOUBLE) The scale factor used to calculate the actual value of a scaled column from the scaled value stored. The actual value is computed according to the formula

actual value = (SCALEF × stored value) + ZEROP        (1)

ZEROP  (Data type: _DOUBLE) The zero point used to calculate the actual value of a scaled column from the scaled value stored. See above for the formula used.

ORDER  (Data type: _INTEGER) The order in which individual fields in the column are stored. The three possibilities, together with the corresponding symbolic constants, are listed in Table 5.

Column order  CAT symbolic constant
------------  ---------------------
ascending     CAT__ASCND
descending    CAT__DSCND
unordered     CAT__NOORD

Table 5: Alternatives for the ordering of columns
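Returning to the SCALEF and ZEROP attributes, formula (1) can be written as a short worked example. This is a Python sketch for illustration; the function name `actual_value` and the sample numbers are hypothetical, and the default arguments mirror the default attribute values of 1.0D0 and 0.0D0.

```python
# Illustrative only: applies formula (1) to a stored value.
def actual_value(stored, scalef=1.0, zerop=0.0):
    """actual value = (SCALEF x stored value) + ZEROP"""
    return scalef * stored + zerop


# A stored integer of 1234 with an assumed scale factor of 0.01 and an
# assumed zero point of 10.0.
value = actual_value(1234, scalef=0.01, zerop=10.0)

# With the default attributes the stored value is returned unchanged.
unscaled = actual_value(5)
```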

UNITS  (Data type: _CHAR, size = CAT__SZUNI) The units in which values stored in the column are expressed. The UNITS attribute is used to identify, and control the appearance of, columns of angles (see Section 8.3). Apart from this exception the units are treated purely as comments and no attempts are made to automatically propagate and convert units in calculations and selections. Case sensitivity is irrelevant for units since they are treated purely as comments. The units attribute can be left completely blank; a blank units attribute implies that the units are unknown. If it were desired to distinguish a dimensionless quantity from one with unknown units, the string ‘DIMENSIONLESS’ could be put in the units attribute (see footnote 13).

EXFMT  (Data type: _CHAR, size = CAT__SZEXF) The format used to represent a datum extracted from a column for external display on a screen or in a text file. These formats are used solely for external display, not internal conversion. The external format specifier should be a valid Fortran 77 format specifier for the data type of the column.

PRFDSP  (Data type: _LOGICAL) The preferential display flag; a logical flag which indicates to reporting applications whether, by default, the column is to be displayed or not. It is coded as follows:

.TRUE. – display the column by default,
.FALSE. – do not display the column by default.

COMM  (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the column. The comments may be up to eighty characters long (CAT__SZCOM).

DATE  (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the column is created.

6.8 Vector column elements

CAT treats vectors in quite a simple fashion. Values can only be GOT or PUT for individual vector elements; there are no routines for processing entire vectors. In order to access individual elements it is necessary to assign identifiers to them. Identifiers for vector elements, like those for scalar columns and parameters, are obtained using CAT_TIDNT. The name of a vector column element passed to CAT_TIDNT has the same syntax as the NAME attribute of the element, as described below. The attributes of a vector column element are different from those of the whole column; they are listed in Table 6.


Attribute        Name    Data type
---------------  ------  ---------
Name             NAME    _CHAR
Data type        DTYPE   _INTEGER
Character size   CSIZE   _INTEGER
Base identifier  BASEID  _INTEGER
Vector element   ELEM    _INTEGER

Table 6: Attributes of a vector column element

All these attributes are created automatically when an identifier is obtained for the element; they are all mandatory and immutable.

The vector column to which a vector column element belongs is referred to as the column element’s base column. The DTYPE and CSIZE attributes of a vector column element are necessarily identical to the corresponding attributes for its base column. The details of the remaining attributes are as follows.

NAME  (data type: _CHAR, size = CAT__SZCMP) The name of the vector column element. That is, the name of the base column, followed by the number of the vector element, enclosed in square brackets. The number of the first element is one. Thus, the name of the fourth element of vector column FLUX would be FLUX[4].

BASEID  (data type: _INTEGER) The identifier of the base column of the vector element.

ELEM  (data type: _INTEGER) The sequence number of the element in the column vector. The first element is numbered one. Thus, for example, if the name of the vector column element was FLUX[4] the value of the ELEM attribute would be four.
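The naming rule for vector column elements (base column name followed by a one-based element number in square brackets) can be sketched as a pair of helper functions. This is a Python illustration only; `split_element_name` and `element_name` are hypothetical helpers, not CAT routines, and the sketch assumes the base name itself contains no square brackets.

```python
# Illustrative only: composes and decomposes names such as FLUX[4].
import re

_ELEMENT = re.compile(r"^(?P<base>[^\[\]]+)\[(?P<elem>[0-9]+)\]$")


def split_element_name(name):
    """Return (base column name, element number) for a name like 'FLUX[4]'."""
    match = _ELEMENT.match(name)
    if match is None:
        raise ValueError(f"not a vector element name: {name!r}")
    elem = int(match.group("elem"))
    if elem < 1:
        # The first element is numbered one.
        raise ValueError("vector elements are numbered from one")
    return match.group("base"), elem


def element_name(base, elem):
    """Return the element name for a base column and element number."""
    return f"{base}[{elem}]"


base, elem = split_element_name("FLUX[4]")
```

Here `base` corresponds to the name of the base column and `elem` to the value that the ELEM attribute would hold.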

6.9 Parameters

Parameters are items of information which apply to the entire catalogue. Examples are the equinox or epoch of the celestial coordinates in the catalogue. The attributes of a parameter are given in Table 7. In CAT version 9.0 parameters must be scalars. However, they have the attributes dimensionality and size to allow for the possibility of vector parameters in future versions of CAT.


Attribute             Name    Data type  Default
--------------------  ------  ---------  ---------------------
Name                  NAME    _CHAR
Data type             DTYPE   _INTEGER
Character size        CSIZE   _INTEGER   CAT__SZVAL†
Dimensionality        DIMS    _INTEGER   scalar: CAT__SCALR
Size§                 SIZE    _INTEGER   1
Units                 UNITS   _CHAR      ‘ ’
External format       EXFMT   _CHAR      varies with data type
Preferential display  PRFDSP  _LOGICAL   true
Comments              COMM    _CHAR      ‘ ’
Value                 VALUE   varies     zero or ‘ ’
Modification date     DATE    _DOUBLE    0.0D0

† The size of character strings; other data types have CSIZE = 0.
§ SIZE is a single-element array, not a scalar.

Table 7: Attributes of parameters

All these attributes, except VALUE, are deliberately identical to the corresponding attributes for columns (see Table 3 and Section 6.7, above, for details).

VALUE  (Data type: variable, corresponds to attribute DTYPE) The value of the parameter; it is mutable.

6.10 Expressions

Expressions define a quantity computed from the existing scalar columns, vector column elements and parameters of a catalogue using some algebraic or logical (boolean) expression. An expression adopts a value for every row in the catalogue. It is similar to a column, except that its value is computed ‘on the fly’ from existing columns (and parameters), rather than being stored in the catalogue. Usually an expression will evaluate to a numeric value, but it may equally well evaluate to a LOGICAL or CHARACTER value. For example, if a catalogue contained columns x and y an expression might be ‘x + y’. The syntax for specifying expressions is described in Appendix B.
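The idea that an expression adopts a value for every row, computed on the fly rather than stored, can be sketched as follows. This is a Python illustration only: the rows, the column names `x` and `y` and the two functions are hypothetical stand-ins, and real expressions are written in the syntax of Appendix B rather than as Python functions.

```python
# Illustrative only: an expression is evaluated per row, not stored.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 3.5, "y": 0.5},
]


def sum_expression(row):
    # Stand-in for the algebraic expression 'x + y'.
    return row["x"] + row["y"]


def logical_expression(row):
    # Stand-in for a logical (boolean) expression such as 'x > y'.
    return row["x"] > row["y"]


sums = [sum_expression(row) for row in rows]
flags = [logical_expression(row) for row in rows]
```

Nothing is added to `rows` in this sketch; the expression values exist only for as long as the program holds them.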

An expression has a set of attributes which are identical to those for a scalar column (see Section 6.7 and Table 3), but with the following exceptions.

6.11 Selections

Selections define a set of rows selected from a catalogue according to some criteria. For example, if a catalogue contained column x then the selection criterion might be ‘x > 10.0’, that is, the selection would comprise the set of rows for which the field of column x was greater than 10.0. The syntax for specifying expressions is described in Appendix B.

The attributes of a selection are listed in Table 8 and described below. With the exception of the comments attribute, COMM, they are all mandatory and immutable and are set automatically when the selection is created.


Attribute          Name    Data type
-----------------  ------  ---------
Expression         EXPR    _CHAR
Number of rows     NUMSEL  _INTEGER
Comments           COMM    _CHAR
Modification date  DATE    _DOUBLE

Table 8: Attributes of a selection

EXPR  (Data type: _CHAR, size = CAT__SZEXS) The expression which rows in the catalogue must satisfy in order to be included in the selection.

NUMSEL  (Data type: _INTEGER) The number of rows in the selection.

COMM  (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the selection.

DATE  (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the selection is created.
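A selection and its NUMSEL attribute can be sketched as the set of row numbers satisfying a criterion. This is a Python illustration only; the `select` function and the sample rows are hypothetical, the lambda stands in for the criterion ‘x > 10.0’, and numbering rows from one is an assumption of the sketch.

```python
# Illustrative only: a selection modelled as a list of row numbers.
rows = [{"x": 4.0}, {"x": 12.5}, {"x": 10.0}, {"x": 30.0}]


def select(rows, criterion):
    """Return the numbers (counted from one) of rows satisfying criterion."""
    return [n for n, row in enumerate(rows, start=1) if criterion(row)]


# Stand-in for the selection criterion 'x > 10.0'.
selected = select(rows, lambda row: row["x"] > 10.0)
numsel = len(selected)   # corresponds to the NUMSEL attribute
```

The row with x equal to 10.0 is excluded because the criterion is a strict inequality.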

6.12 Indices

Indices are a mechanism for accessing the rows of a catalogue as though they were sorted into ascending or descending order on some column. For example, if an ascending index was created on REAL column DEC and the rows of the catalogue were accessed through this index the rows would appear in ascending order of DEC. In CAT version 9.0 indices are temporary entities which persist only for the duration of the application which generated them and perish when it terminates. Future versions of CAT will support permanent indices which persist after the application which generated them terminates.

Indices can be created from columns of any of the numeric data types. They should not be created from columns of data type CHARACTER or LOGICAL. If an index is created on a column which contains null values then the rows for which the column is null will appear after all the rows with a valid value. The order of such rows is unpredictable.
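The behaviour of an index, including the placement of null values after all valid values, can be sketched as a list of row numbers. This is a Python illustration only; `build_index` is a hypothetical helper, `None` stands in for a null value, and rows are assumed to be numbered from one. The text states that the order of the null rows is unpredictable; this sketch happens to keep them in catalogue order, which is an arbitrary choice.

```python
# Illustrative only: an index modelled as an ordering of row numbers.
def build_index(values, descending=False):
    """Return row numbers ordered by value, with null rows placed last."""
    valid = [n for n, v in enumerate(values, start=1) if v is not None]
    nulls = [n for n, v in enumerate(values, start=1) if v is None]
    valid.sort(key=lambda n: values[n - 1], reverse=descending)
    return valid + nulls


# Hypothetical DEC column; the second row is null.
dec = [12.0, None, -5.5, 40.25]
ascending = build_index(dec)
descending = build_index(dec, descending=True)
```

The catalogue rows themselves are not reordered; only the sequence in which they are visited changes.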

The attributes of an index are listed in Table 9 and described below. They are all mandatory and immutable and are set automatically when the index is created.


Attribute          Name    Data type
-----------------  ------  ---------
Column identifier  COLID   _INTEGER
Order              ORDER   _INTEGER
Number of rows     NUMSEL  _INTEGER
Comments           COMM    _CHAR
Modification date  DATE    _DOUBLE

Table 9: Attributes of an index

COLID  (Data type: _INTEGER) The identifier of the column from which the index was created.

ORDER  (Data type: _INTEGER) The order of the index. The possibilities are:

CAT__ASCND – ascending,
CAT__DSCND – descending.

NUMSEL  (Data type: _INTEGER) The number of rows in the index. In CAT version 9.0 the number of rows in the index is necessarily the number of rows in the catalogue. The attribute is present in order to allow indices to be created from selections in future versions of CAT.

COMM  (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the index.

DATE  (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the index is created.

10. Future versions of CAT will support permanent indices.

11. This description extends the discussion in Section 4, which for simplicity omitted to mention mutable attributes.

12. An array is used instead of a scalar to allow the possibility of introducing multi-dimensional arrays in a future version of CAT.

13. Magnitudes, which properly are dimensionless, can, of course, have units of ‘MAGNITUDES’ or ‘MAG’ or whatever, if so desired.