Components of a CAT catalogue

CAT
Catalogue and Table Manipulation Library

6 Components of a CAT catalogue

6.1 Provision for future enhancements
6.2 Symbolic constants
6.3 Catalogues, components and attributes
6.4 Identifiers
6.5 Attributes
6.6 Catalogue attributes
6.7 Columns
6.8 Vector column elements
6.9 Parameters
6.10 Expressions
6.11 Selections
6.12 Indices

This section describes the components of a CAT catalogue. It is necessary to understand the structure of a CAT catalogue in order to use the CAT library effectively. An idealized computer-readable version of an astronomical catalogue, or similar tabular dataset, might comprise the following elements:

(1): the table of values which comprise the catalogue,
(2): a description of this table; the details of all the columns that it contains, the number of rows etc,
(3): textual information about the catalogue; perhaps a short description of the catalogue or a copy of a published paper describing it. This information is intended to be read by a human rather than interpreted by a computer.

The CAT library is mostly concerned with the first two items. However, it also provides some simple facilities to retrieve and write the textual information of the third item. These latter facilities are provided so that the textual information in a catalogue can be displayed to a user or copied when a new catalogue is created from an old one. The routines for manipulating textual information are described in Section 7.9. They do not interact with any other items in a CAT catalogue and they are not mentioned again in this section.

The table in a CAT catalogue is very similar to a relation in the theory of relational databases, and has many of the same properties. Each row in the table must contain the same number of fields. Corresponding fields in different rows must be of the same type. The table may contain an arbitrary number of rows. In the formal theory of relational databases, no two rows may be identical. CAT relaxes this rule by permitting identical rows, though it is difficult to see what purpose such rows might serve.

The internal organization of a CAT catalogue (the way it is formatted on disk) is unknown to an application using the CAT library. The values in the catalogue are accessed purely through the subroutine interface to the CAT library.

6.1 Provision for future enhancements

This manual describes version 9.0 of the CAT library. The original specification for the library is described in the document The Starlink Subroutine Interface for Manipulating Catalogues (StarBase/ACD/3.4)[2]. Version 9.0 of CAT implements a subset of this full specification, and some of the items present in it serve no apparent purpose. These items correspond to features which were in the original specification but which are not currently implemented. They may be implemented in future versions and have been included so that applications written now will be compatible with future versions of the library.

6.2 Symbolic constants

Various symbolic constants are referred to throughout this section. These constants are defined in INCLUDE file CAT_PAR, which may be INCLUDEd in the subroutines of an application. See Section 3.2 for details of how to access this file.

6.3 Catalogues, components and attributes

In the CAT model, a catalogue comprises a number of components. In version 9.0 of CAT a catalogue may contain two sorts of permanent components, columns and parameters:

columns: define the individual columns (scalars or vectors) in the table,
parameters: provide single items of information which apply to the entire catalogue. Examples might be the epoch or equinox of celestial coordinates in the catalogue.

A catalogue may contain an arbitrary number of columns and an arbitrary number of parameters. Columns and parameters are permanent entities which persist between invocations of applications accessing the catalogue through the CAT library (typically as items in a disk file). In addition to these permanent components, three sorts of temporary component may be created by CAT: expressions, selections and indices.

expressions: define a quantity computed from existing columns (and parameters) using some algebraic or logical (boolean) expression,
selections: define a set of rows selected from the catalogue according to some criteria,
indices: define an order for accessing rows in the catalogue equivalent to sorting the catalogue on a specified column.

Unlike columns and parameters, expressions, selections and indices are ephemeral entities¹⁰ which perish when the application using CAT which created them terminates.

Every component consists of a number of attributes. Each type of component (permanent or temporary; column, parameter, expression, selection or index) has a fixed set of attributes, each identified by name. The values of the attributes differ between components, and their totality defines the component. Additionally there are a few special attributes which apply to the entire catalogue, rather than to a particular component (see Section 6.6). This hierarchy is illustrated in Figure 1. Subsequent sections describe these catalogue attributes and the attributes of columns, parameters, expressions, selections and indices.

Figure 1: The hierarchy of catalogues, components and attributes

6.4 Identifiers

Catalogues, columns, parameters, expressions, selections and indices are all identified by an identifier. Each identifier is an INTEGER number. The value of an identifier is unique (within a given invocation of an application) and is sufficient to identify the item to which it refers. The following rules apply when using CAT identifiers in applications:

an application should never set the value of a new identifier itself; the CAT library will always generate a new identifier,
an application should never modify the value of an existing identifier once CAT has allocated it,
an application never needs to know the actual value of an identifier.

An application can inquire what sort of item (catalogue, column, parameter etc.) an identifier represents using subroutine CAT_TIDTP. The various types of identifiers are represented using INTEGER codes, and symbolic constants are defined for these codes. They are listed in Table 2.

Type of identifier	CAT symbolic constant

Catalogue	`CAT__CITYP`
Column or field	`CAT__FITYP`
Vector column element	`CAT__FETYP`
Parameter	`CAT__QITYP`
Expression	`CAT__EITYP`
Selection	`CAT__SITYP`
Index	`CAT__IITYP`
Null identifier	`CAT__NOID`

Table 2: The types of identifiers

The catalogue to which a component (column, parameter etc.) belongs is referred to as the parent of that component. Subroutine CAT_TIDPR can be used to inquire the parent of an identifier. In CAT version 9.0 catalogues do not have parents. If CAT_TIDPR is used to try to find the parent of a catalogue then the null identifier is returned.

6.4.1 The null identifier

When CAT is asked to generate an identifier for an item which does not exist (such as the parent of a catalogue) it will return the ‘null identifier’. The meaning of this identifier is that the specified component does not exist. The symbolic constant for the null identifier is CAT__NOID.

6.5 Attributes

Attributes do not have their own identifiers. An attribute is specified by the identifier of the component of which it is a part and its name. This combination is unique for a given attribute. For example the ‘data type’ attribute of a column (see Section 6.7) is specified by the identifier of the column and the name of the attribute (‘DTYPE’ in this case). Each attribute has a data type associated with it. Families of subroutines (one per data type) are available to set and inquire the values of attributes:

CAT_TATT<t>: set an attribute,
CAT_TIQA<t>: inquire the value of an attribute.

See Section 7.8.1 for details of using these subroutines.

All the attributes for a given component adopt values when the component is created. Some attributes are mandatory, in which case values must be supplied for them. For the remaining attributes values are optional, and if they are not supplied defaults are adopted.

Most attributes are immutable; they are specified once when the component is created and are fixed thereafter. A few, however, are mutable and may be changed at any stage during the life of the component. The immutable attributes of all the columns in a catalogue are frozen when values are first written to the table¹¹.

6.6 Catalogue attributes

In addition to its collection of column and parameter components, all with their individual attributes, a catalogue also has several attributes which apply directly to the entire catalogue, rather than to an individual component (see Figure 1 in Section 6.3). These attributes are described below.

NAME (data type: _CHAR, size = CAT__SZCNM) The name of the catalogue. It is specified when the catalogue is created and is mandatory and immutable. The NAME attribute is the same as the file name of the catalogue, but without any preceding directory specification or trailing file type. Thus, if CATNAME is the NAME attribute then the corresponding file name is:

directory_specification/CATNAME.file_type

The file type corresponds to the format of the catalogue (FITS table, Small Text List etc). The various options are described in Appendix C.

DATE (data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the catalogue is created.

BACK (data type: _INTEGER) The back-end type of the catalogue. It will be one of CAT__BKFIT, CAT__BKSTL or CAT__BKTST. These symbolic constants are defined in INCLUDE file CAT_PAR.

PATH (data type: _CHAR, size = CAT__SZCNF) The full path of the catalogue file. It is specified when the catalogue is created and is mandatory and immutable.
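The relation between the NAME attribute and the file name described in this section can be illustrated with a minimal standalone sketch. It is written in Python purely for illustration and does not use the CAT library; the function name `catalogue_name` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def catalogue_name(file_name: str) -> str:
    """Drop the directory specification and the trailing file type.

    Hypothetical helper: it mimics the stated relation between a
    catalogue's file name and its NAME attribute, nothing more.
    """
    return PurePosixPath(file_name).stem


# Example path invented for this sketch.
print(catalogue_name("/data/surveys/CATNAME.FIT"))
```

The sketch assumes a Unix-style directory specification and a file type introduced by the last full stop in the file name.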

6.7 Columns

A column may contain either a single value for each row (as in standard relational database theory) or a one-dimensional array of values for each row. An array must be of fixed size, defined when the column is created. There is no upper limit to the number of elements which an array may contain. A single-valued column is called a scalar and a column containing an array is called a vector. The attributes of a column are listed in Table 3 and described below.

The attributes of an individual element of a vector column are somewhat different and are described in Section 6.8, below.

Attribute	Name	Data type	Mutable	Mandatory	Default

Name	NAME	_CHAR		∙
Genus	GENUS	_INTEGER			physical: `CAT__GPHYS`
Expression	EXPR	_CHAR			‘ ’
Data type	DTYPE	_INTEGER		∙
Character size	CSIZE	_INTEGER			20†
Dimensionality	DIMS	_INTEGER			scalar: `CAT__SCALR`
Size§	SIZE	_INTEGER			1
Null or locum	NULL	_INTEGER			HDS: `CAT__NULLD`
Exception values	EXCEPT	_CHAR			‘ ’
Scale factor	SCALEF	_DOUBLE			1.0D0
Zero point	ZEROP	_DOUBLE			0.0D0
Order	ORDER	_INTEGER			none: `CAT__NOORD`
Units	UNITS	_CHAR	∙		‘ ’
External format	EXFMT	_CHAR	∙		varies with data type
Preferential display	PRFDSP	_LOGICAL	∙		true
Comments	COMM	_CHAR	∙		‘ ’
Modification date	DATE	_DOUBLE	∙		0.0D0

†: The size of character strings; other data types have CSIZE = 0.
§: SIZE is a single-element array, not a scalar.

Table 3: Attributes of columns

NAME (data type: _CHAR, size = CAT__SZCMP) The name of the column. The rules for column names are as follows.

The name must be unique within the totality of parameters and columns for the catalogue. This condition is necessary in order that a component (parameter or column) may be identified unambiguously when its name is used in an expression.
A name may comprise up to fifteen characters (CAT__SZCMP). This value is chosen for consistency with HDS and is adequate for FITS tables.
The name can contain only: upper or lower case alphabetic characters (a-z, A-Z), numeric characters (0-9) and the underscore character (‘_’). Note that lower case alphabetic characters must be allowed in order to access existing FITS tables. However, corresponding upper and lower case characters are considered to be equivalent. Thus, for example, the names: HD_NUMBER, HD_Number and hd_number would all refer to the same column.
The first character must be a letter.
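The naming rules listed above (apart from uniqueness within the catalogue, which needs knowledge of the other components) can be checked mechanically. The following is a minimal standalone sketch in Python, not part of the CAT library; `is_valid_name` and `same_component` are hypothetical helper names, and the value 15 is taken from the text rather than read from CAT_PAR.

```python
import re

CAT__SZCMP = 15  # maximum name length as stated in the text (assumed here)

# First character a letter; the rest letters, digits or underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Check the length, character set and first-character rules."""
    return len(name) <= CAT__SZCMP and _NAME_RE.fullmatch(name) is not None


def same_component(a: str, b: str) -> bool:
    """Upper and lower case letters are considered equivalent."""
    return a.upper() == b.upper()


print(is_valid_name("HD_NUMBER"), is_valid_name("2MASS_ID"))
print(same_component("HD_Number", "hd_number"))
```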

GENUS (Data type: _INTEGER) In version 9.0 of CAT the genus attribute is present, but not used. It is set to CAT__GPHYS when the column is created.

EXPR (Data type: _CHAR, size = CAT__SZEXS) In version 9.0 of CAT the expression attribute is present, but not used. It is set to blank (‘ ’) when the column is created.

DTYPE (Data type: _INTEGER) The data type of values held in the column. The types permitted are listed in Table 4. They are deliberately the same as the types permitted in HDS and include the standard types of Fortran 77.

HDS Type	DEC Fortran Type	CAT symbolic constant	Description	Standard Fortran 77?

_BYTE	BYTE	`CAT__TYPEB`	Signed byte	No
_WORD	INTEGER*2	`CAT__TYPEW`	Signed word	No
_INTEGER	INTEGER	`CAT__TYPEI`	Signed integer	Yes
_REAL	REAL	`CAT__TYPER`	Single precision	Yes
_DOUBLE	DOUBLE PRECISION	`CAT__TYPED`	Double precision	Yes
_LOGICAL	LOGICAL	`CAT__TYPEL`	Logical	Yes
_CHAR[*n]	CHARACTER[*n]	`CAT__TYPEC`	Character string	Yes

n is the number of elements in the character string; it is a positive integer. In a CAT CHARACTER column the size of the string is stored in attribute CSIZE.

_BYTE and _WORD correspond exactly to the DEC Fortran data types BYTE and INTEGER*2 respectively; equivalent types exist in most other implementations of Fortran. The non-standard data types are typically required to accommodate raw data generated by instruments. The ranges of the primitive numeric types are defined by the particular implementation of Fortran on the computer being used. This table is adapted from SUN/92[11]; see in particular the table in Section 2.2, p3.

Table 4: Permitted data types (adapted from SUN/92)

CSIZE (Data type: _INTEGER) For a CHARACTER column, the size of the column, otherwise not used and by convention set to zero.

DIMS (Data type: _INTEGER) The dimensionality of the column; a flag indicating whether it is a scalar or a vector. For a scalar column it is set to CAT__SCALR and for a vector to CAT__VECTR.

SIZE (Data type: _INTEGER; a single element array¹²) If the column is a vector this attribute contains the number of elements in the vector. If the column is a scalar it is set to one.

NULL (Data type: _INTEGER) A flag indicating whether or not null values are recognized in the column. Three cases are recognized:

null values are present and are represented using the standard HDS null values (code: CAT__NULLD),
null values are present and are represented using a value specified when the column was created (code: CAT__NULLS),
null values are not present in the column (code: CAT__LOCUM).

The treatment of null values is discussed in Section 8.2, below.

EXCEPT (Data type: _CHAR, size = CAT__SZVAL) The value used to represent the null value, or the locum value generated if nulls are not supported in the column. See Section 8.2 for a full description.

SCALEF (Data type: _DOUBLE) The scale factor used to calculate the actual value of a scaled column from the scaled value stored. The actual value is computed according to the formula

$\mathrm{actual\ value} = (\mathrm{SCALEF} \times \mathrm{stored\ value}) + \mathrm{ZEROP}$	(1)
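As a worked illustration of equation (1), the following standalone Python sketch applies the scale factor and zero point to a stored value. It is not CAT code: `actual_value` and `stored_value` are hypothetical names, the inverse function is simply the algebraic rearrangement of equation (1) and is an addition not described in the text, and the defaults mirror those listed for SCALEF and ZEROP in Table 3.

```python
def actual_value(stored: float, scalef: float = 1.0, zerop: float = 0.0) -> float:
    """Equation (1): actual value = (SCALEF x stored value) + ZEROP."""
    return (scalef * stored) + zerop


def stored_value(actual: float, scalef: float = 1.0, zerop: float = 0.0) -> float:
    """Algebraic inverse of equation (1); an assumption of this sketch."""
    return (actual - zerop) / scalef


# Invented example values: a stored integer of 7 with SCALEF = 0.5, ZEROP = 10.
print(actual_value(7, scalef=0.5, zerop=10.0))
```

With the default values (SCALEF = 1, ZEROP = 0) the actual value equals the stored value, which corresponds to an unscaled column.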

ZEROP (Data type: _DOUBLE) The zero point used to calculate the actual value of a scaled column from the scaled value stored. See above for the formula used.

ORDER (Data type: _INTEGER) The order in which individual fields in the column are stored. The three possibilities, together with the corresponding symbolic constants, are listed in Table 5.

Column order	CAT symbolic constants

ascending	`CAT__ASCND`
descending	`CAT__DSCND`
unordered	`CAT__NOORD`

Table 5: Alternatives for the ordering of columns

UNITS (Data type: _CHAR, size = CAT__SZUNI) The units in which values stored in the column are expressed. The UNITS attribute is used to identify, and control the appearance of, columns of angles (see Section 8.3). Apart from this exception the units are treated purely as comments and no attempts are made to automatically propagate and convert units in calculations and selections. Case sensitivity is irrelevant for units since they are treated purely as comments. The units attribute can be left completely blank; a blank units attribute implies that the units are unknown. If it were desired to distinguish a dimensionless quantity from one with unknown units, the string ‘DIMENSIONLESS’ could be put in the units attribute¹³.

EXFMT (Data type: _CHAR, size = CAT__SZEXF) The format used to represent a datum extracted from a column for external display on a screen or in a text file. These formats are used solely for external display, not internal conversion. The external format specifier should be a valid Fortran 77 format specifier for the data type of the column.

PRFDSP (Data type: _LOGICAL) The preferential display flag; a logical flag which indicates to reporting applications whether, by default, the column is to be displayed or not. It is coded as follows:

.TRUE.: – display the column by default,
.FALSE.: – do not display the column by default.

COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the column. The comments may be up to eighty characters long (CAT__SZCOM).

DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the column is created.

6.8 Vector column elements

CAT treats vectors in quite a simple fashion. Values can only be GOT or PUT for individual vector elements; there are no routines for processing entire vectors. In order to access individual elements it is necessary to assign identifiers to them. Identifiers for vector elements, like those for scalar columns and parameters, are obtained using CAT_TIDNT. The name of a vector column element passed to CAT_TIDNT has the same syntax as the NAME attribute of the element, as described below. The attributes of a vector column element differ from those of the whole column; they are listed in Table 6.

Attribute	Name	Data type

Name	NAME	_CHAR
Data type	DTYPE	_INTEGER
Character size	CSIZE	_INTEGER
Base identifier	BASEID	_INTEGER
Vector element	ELEM	_INTEGER

Table 6: Attributes of a vector column element

All these attributes are created automatically when an identifier is obtained for the element; they are all mandatory and immutable.

The vector column to which a vector column element belongs is referred to as the column element’s base column. The DTYPE and CSIZE attributes of a vector column element are necessarily identical to the corresponding attributes for its base column. The details of the remaining attributes are as follows.

NAME (data type: _CHAR, size = CAT__SZCMP) The name of the vector column element. That is, the name of the base column, followed by the number of the vector element, enclosed in square brackets. The number of the first element is one. Thus, the name of the fourth element of vector column FLUX would be FLUX[4].

BASEID (data type: _INTEGER) The identifier of the base column of the vector element.

ELEM (data type: _INTEGER) The sequence number of the element in the column vector. The first element is numbered one. Thus, for example, if the name of the vector column element was FLUX[4] the value of the ELEM attribute would be four.
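The syntax of a vector column element name can be illustrated by splitting it into the base column name and the element number. This is a standalone Python sketch, not CAT code; `split_element_name` and `element_name` are hypothetical helpers, and the pattern for the base name assumes the column naming rules given in Section 6.7.

```python
import re

# Base column name, then a one-based element number in square brackets.
_ELEMENT_RE = re.compile(r"([A-Za-z][A-Za-z0-9_]*)\[([0-9]+)\]")


def split_element_name(name: str) -> tuple[str, int]:
    """Return (base column name, element number) for a name like FLUX[4]."""
    match = _ELEMENT_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a vector column element name: {name!r}")
    elem = int(match.group(2))
    if elem < 1:
        raise ValueError("the first element is numbered one")
    return match.group(1), elem


def element_name(base: str, elem: int) -> str:
    """Build the element name from the base column name and ELEM value."""
    return f"{base}[{elem}]"


print(split_element_name("FLUX[4]"))
print(element_name("FLUX", 4))
```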

6.9 Parameters

Parameters are items of information which apply to the entire catalogue. Examples are the equinox or epoch of the celestial coordinates in the catalogue. The attributes of a parameter are given in Table 7. In CAT version 9.0 parameters must be scalars. However, they have the attributes dimensionality and size to allow for the possibility of vector parameters in future versions of CAT.

Attribute	Name	Data type	Mandatory	Default

Name	NAME	_CHAR	∙
Data type	DTYPE	_INTEGER	∙
Character size	CSIZE	_INTEGER		`CAT__SZVAL`†
Dimensionality	DIMS	_INTEGER		scalar: `CAT__SCALR`
Size§	SIZE	_INTEGER		1
Units	UNITS	_CHAR		‘ ’
External format	EXFMT	_CHAR		varies with data type
Preferential display	PRFDSP	_LOGICAL		true
Comments	COMM	_CHAR		‘ ’
Value	VALUE	varies		zero or ‘ ’
Modification date	DATE	_DOUBLE		0.0D0

†: The size of character strings; other data types have CSIZE = 0.
§: SIZE is a single-element array, not a scalar.

Table 7: Attributes of parameters

All these attributes, except VALUE, are deliberately identical to the corresponding attributes for columns (see Table 3 and Section 6.7, above, for details).

VALUE (Data type: variable, corresponds to attribute DTYPE) The value of the parameter; it is mutable.

6.10 Expressions

Expressions define a quantity computed from the existing scalar columns, vector column elements and parameters of a catalogue using some algebraic or logical (boolean) expression. An expression adopts a value for every row in the catalogue. It is similar to a column, except that its value is computed ‘on the fly’ from existing columns (and parameters), rather than being stored in the catalogue. Usually an expression will evaluate to a numeric value, but it may equally well evaluate to a LOGICAL or CHARACTER value. For example, if a catalogue contained columns x and y an expression might be ‘x + y’. The syntax for specifying expressions is described in Appendix B.
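The idea that an expression adopts a value for every row, computed ‘on the fly’ rather than stored, can be sketched as follows. This is a standalone Python illustration, not CAT code: the table contents are invented, and a Python function stands in for an expression written in the CAT syntax of Appendix B.

```python
# A tiny invented table with two scalar columns, x and y.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 12.5, "y": 0.5},
    {"x": 20.0, "y": -4.0},
]


def evaluate(expression, table):
    """Compute the expression for every row; nothing is stored in the table."""
    return [expression(row) for row in table]


# Stand-in for the expression 'x + y'.
x_plus_y = evaluate(lambda row: row["x"] + row["y"], rows)
print(x_plus_y)
```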

An expression has a set of attributes which are identical to those for a scalar column (see Section 6.7 and Table 3), but with the following exceptions.

The GENUS attribute is always set to CAT__GVIRT.
The EXPR attribute is set to the algebraic expression used to compute the value of the expression.
The DIMS attribute is always CAT__SCALR; expressions are always scalars and vector expressions are forbidden.

6.11 Selections

Selections define a set of rows selected from a catalogue according to some criteria. For example, if a catalogue contained column x then the selection criterion might be ‘x > 10.0’, that is, the selection would comprise the set of rows for which the field of column x was greater than 10.0. The syntax for specifying expressions is described in Appendix B.
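A selection of this kind can be pictured as a list of row numbers satisfying the criterion. The sketch below is standalone Python, not CAT code; the column values are invented, the row numbers are assumed to start at one, and a Python function stands in for the criterion ‘x > 10.0’.

```python
# Invented values for column x, one per row.
x = [3.5, 10.0, 12.25, 47.0, 9.75]


def select(values, criterion):
    """Return the row numbers (assumed one-based) that satisfy the criterion."""
    return [row for row, value in enumerate(values, start=1) if criterion(value)]


selected = select(x, lambda value: value > 10.0)
numsel = len(selected)  # plays the role of the NUMSEL attribute
print(selected, numsel)
```

Note that the row holding exactly 10.0 is excluded, because the criterion is a strict inequality.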

The attributes of a selection are listed in Table 8 and described below. With the exception of the comments attribute, COMM, they are all mandatory and immutable and are set automatically when the selection is created.

Attribute	Name	Data type

Expression	EXPR	_CHAR
Number of rows	NUMSEL	_INTEGER
Comments	COMM	_CHAR
Modification date	DATE	_DOUBLE

Table 8: Attributes of a selection

EXPR (Data type: _CHAR, size = CAT__SZEXS) The expression which rows in the catalogue must satisfy in order to be included in the selection.

NUMSEL (Data type: _INTEGER) The number of rows in the selection.

COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the selection.

DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the selection is created.

6.12 Indices

Indices are a mechanism for accessing the rows of a catalogue as though they were sorted into ascending or descending order on some column. For example, if an ascending index was created on REAL column DEC and the rows of the catalogue were accessed through this index the rows would appear in ascending order of DEC. In CAT version 9.0 indices are temporary entities which persist only for the duration of the application which generated them and perish when it terminates. Future versions of CAT will support permanent indices which persist after the application which generated them terminates.

Indices can be created from columns of any of the numeric data types. They should not be created from columns of data type CHARACTER or LOGICAL. If an index is created on a column which contains null values then the rows for which the column is null will appear after all the rows with a valid value. The order of such rows is unpredictable.
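The behaviour described above (rows accessed in sorted order, with null rows placed after all valid rows) can be sketched in standalone Python. This is not CAT code: `index_order` is a hypothetical function, `None` stands in for a null value, row numbers are assumed to start at one, and the order chosen here for the null rows is arbitrary since the text states that it is unpredictable.

```python
def index_order(values, descending=False):
    """Return row numbers in index order, with null (None) rows last."""
    valid = [row for row, v in enumerate(values, start=1) if v is not None]
    nulls = [row for row, v in enumerate(values, start=1) if v is None]
    valid.sort(key=lambda row: values[row - 1], reverse=descending)
    return valid + nulls


# Invented values for a REAL column DEC; rows 2 and 5 are null.
dec = [30.5, None, -12.0, 5.25, None]
print(index_order(dec))
print(index_order(dec, descending=True))
```

The table itself is left untouched; only the order in which rows are visited changes, which is the sense in which an index is equivalent to sorting.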

The attributes of an index are listed in Table 9 and described below. They are all mandatory and immutable and are set automatically when the index is created.

Attribute	Name	Data type

Column identifier	COLID	_INTEGER
Order	ORDER	_INTEGER
Number of rows	NUMSEL	_INTEGER
Comments	COMM	_CHAR
Modification date	DATE	_DOUBLE

Table 9: Attributes of an index

COLID (Data type: _INTEGER) The identifier of the column from which the index was created.

ORDER (Data type: _INTEGER) The order of the index. The possibilities are:

CAT__ASCND: ascending,
CAT__DSCND: descending.

NUMSEL (Data type: _INTEGER) The number of rows in the index. In CAT version 9.0 the number of rows in the index is necessarily the number of rows in the catalogue. The attribute is present in order to allow indices to be created from selections in future versions of CAT.

COMM (Data type: _CHAR, size = CAT__SZCOM) Explanatory comments describing the index.

DATE (Data type: _DOUBLE) In version 9.0 of CAT the modification date is present, but not used. It is set to 0.0D0 when the index is created.

¹⁰Future versions of CAT will support permanent indices.

¹¹This description extends the discussion in Section 4, which for simplicity omitted to mention mutable attributes.

¹²An array is used instead of a scalar to allow the possibility of introducing multi-dimensional arrays in a future version of CAT.

¹³Magnitudes, which properly are dimensionless, can, of course, have units of ‘MAGNITUDES’ or ‘MAG’ or whatever, if so desired.
