C Catalogue formats

 C.1 FITS
 C.2 TST
 C.3 STL

CURSA can access catalogues held in three different formats: FITS tables, TST and STL. The restrictions and peculiarities associated with each of these formats are described below.

CURSA determines the type of a catalogue from the ‘file type’ component of the name of the file holding the catalogue. The file types for the various formats are included in the descriptions below. If a file name is specified without a file type then it is assumed to be a FITS table.
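
As an illustration of the rule above, the following Python sketch maps a file name to a catalogue format using the file types listed in this appendix. The function name catalogue_format and the treatment of unrecognised file types (an error is raised) are assumptions made for the example; they are not taken from CURSA itself.

```python
import os

# File types listed in this appendix, held in lower case so that
# mixed capitalisations such as .Fit or .Tab are also matched.
FORMAT_BY_FILE_TYPE = {
    ".fit": "FITS",
    ".fits": "FITS",
    ".gsc": "FITS",
    ".tab": "TST",
    ".txt": "STL",
}


def catalogue_format(file_name):
    """Return 'FITS', 'TST' or 'STL' for a catalogue file name.

    Hypothetical helper: it only mirrors the rule described in the text.
    """
    file_type = os.path.splitext(file_name)[1].lower()
    if file_type == "":
        # No file type given: the catalogue is assumed to be a FITS table.
        return "FITS"
    if file_type not in FORMAT_BY_FILE_TYPE:
        # Assumption: how CURSA treats other file types is not stated here.
        raise ValueError("unrecognised file type: " + file_type)
    return FORMAT_BY_FILE_TYPE[file_type]
```

For example, catalogue_format("perseus.FIT") and catalogue_format("perseus") both give "FITS", while catalogue_format("stars.Tab") gives "TST".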

C.1 FITS

File types: .FIT .fit .FITS .fits .GSC .gsc

Mixed capitalisations, such as .Fit, are also supported.

The .GSC and .gsc file types are provided in order to allow regions of the HST Guide Star Catalog to be accessed easily (see also Section 24).

CURSA can read both binary and formatted FITS tables. It can write only binary FITS tables. It should be able to handle most components of FITS tables, with the exception of variable length array columns. If a variable length array column is encountered a warning message will be reported and the column will be ignored.

If a column containing no data is encountered a warning message will be generated and the column will be ignored.

In common with other Starlink software, CURSA does not support the COMPLEX REAL and COMPLEX DOUBLE PRECISION data types. If it encounters COMPLEX columns in a FITS table it represents them as follows:

Usually the table component of a FITS file occurs in the first FITS extension to the file. When reading an existing FITS file CURSA will look for a table in the first extension. In cases where the table is located in an extension other than the first you may specify the required extension by giving its number inside curly brackets after the name of the file. For example, if the table occurred in the third extension of a FITS file called perseus.FIT you would specify:

  perseus.FIT{3}

The closing curly bracket is optional. When CURSA writes FITS tables the table is always written to the first extension.
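
The extension syntax described above can be sketched in Python as follows. This is only an illustration: the function name split_fits_spec and the regular expression are assumptions made for the example and do not describe how CURSA itself parses the name.

```python
import re

# File name, optionally followed by {n}; the closing bracket may be omitted.
_SPEC = re.compile(r"^(?P<name>[^{}]+?)(?:\{(?P<ext>\d+)\}?)?$")


def split_fits_spec(spec):
    """Split 'perseus.FIT{3}' into ('perseus.FIT', 3).

    Hypothetical helper. When no extension number is given the first
    extension is returned, matching the default described in the text.
    """
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("cannot interpret FITS specification: " + spec)
    ext = match.group("ext")
    return match.group("name"), (int(ext) if ext is not None else 1)
```

With this sketch, perseus.FIT{3} and perseus.FIT{3 both select the third extension, and perseus.FIT selects the first.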

C.1.1 Textual information

The textual information for a FITS table comprises the entire contents of the primary header and the appropriate table extension header of the FITS file containing the table. The entire contents of both headers are returned because this is the best way to present the maximum amount of information about the catalogue to the user in its full context. For example, a FITS table COMMENT keyword may be used to annotate other keywords and if only the COMMENT keywords were returned ‘out of context’ they would be difficult to understand, and perhaps even misleading.

In addition, CURSA invents two lines of textual information. The first precedes the primary header and serves to introduce it. The second is inserted between the primary header and the table extension header, and serves to introduce the table extension header.

C.2 TST

File types: .TAB .tab

Mixed capitalisations, such as .Tab, are also supported.

CURSA can read and write catalogues in the TST (Tab-Separated Table) format. The TST format is a standard for exchanging catalogue data and is commonly used to transfer subsets extracted from remote catalogues or archives across the Internet. Typically, when a client such as catremote (see Section 25) running on your local computer queries a remote catalogue or archive, the selected objects will be returned as a tab-separated table. In addition to CURSA, the TST format is also used by GAIA (see SUN/214[12]), SkyCat (see footnote 24) and Starbase (see Section 10.5). It is documented in SSN/75[9].

Compared to the other formats supported by CURSA, the TST format is somewhat deficient in the amount of metadata that it includes. In particular, the details stored for each column do not include its data type or units. Consequently, CURSA deduces a data type for each column by reading the values that it contains. This procedure usually works reasonably well, though occasionally it produces bizarre results. Unfortunately there is no similar simple trick which can replace the missing units. If you find that you need to fix up the column details in a TST catalogue, one approach is to use catcopy (see Section 14) to convert the catalogue to the STL format (see Appendices D and E) and then edit the STL column definitions as appropriate.

When CURSA writes a TST catalogue it saves the column data type, external format and units. These details are written in a format which CURSA can interpret if it subsequently reads the catalogue. Though this enhancement is specific to CURSA, it is entirely consistent with the TST format and does not affect the ability of external programs to read the catalogues. The format in which the additional information is stored is documented in SSN/75.
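
To illustrate how a data type might be deduced from the values in a column, and why the result is occasionally surprising, here is a minimal Python sketch. It is not CURSA's actual procedure, which is not described here: the function name deduce_column_type, the three type labels returned and the order in which the types are tried are all assumptions made for the example.

```python
def deduce_column_type(values):
    """Guess a type label for a column from its values, given as strings.

    Illustrative only. Empty fields are skipped, on the assumption that
    they represent missing values rather than data.
    """
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        # Nothing to go on: fall back to the most general type.
        return "CHARACTER"

    def all_parse(convert):
        # True if every value can be converted without error.
        try:
            for v in non_empty:
                convert(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "DOUBLE PRECISION"
    return "CHARACTER"
```

A column of identifiers such as 0123 and 0456 is deduced to be INTEGER by this sketch, even though the leading zeros suggest that the values were meant to be treated as text. Any scheme that works from the values alone is open to this kind of misjudgement.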

The TST format does not support vector columns. If a catalogue containing vector columns is written as a tab-separated table each vector element is written as a scalar column.

Unsurprisingly, given its provenance as a medium for transporting subsets extracted from remote catalogues across the Internet, the tab-separated table format is intended for use with relatively small catalogues and is unsuitable for very large ones. Currently CURSA sets no upper limit to the size of catalogue for which it can be used. However, if you attempt to read a catalogue containing more than 15,000 rows a warning message is issued. A large TST format catalogue may take a while to open for reading and CURSA may be unable to access a very large TST catalogue (see footnote 25).

C.2.1 Textual information

The textual information for a tab-separated table comprises the entire description of the table. This approach makes the maximum amount of information about the catalogue available to the user in its full context.

C.2.2 Null values

In a tab-separated table the values for adjacent fields in a given row are separated by a tab character. In tab-separated tables written by CURSA null values are represented by two adjacent tab characters. That is, no value is included for the null field.
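
The convention can be illustrated with a short Python sketch in which a null field is held as None. The function names split_tst_row and join_tst_row are hypothetical and are used only for this example.

```python
def split_tst_row(line):
    """Split one row of a tab-separated table into its fields.

    An empty field (two adjacent tab characters) is returned as None.
    """
    # Remove only the line terminator; tabs at the end of the row are
    # significant because they delimit a trailing empty field.
    fields = line.rstrip("\r\n").split("\t")
    return [None if field == "" else field for field in fields]


def join_tst_row(fields):
    """Build one row, writing no value at all for a null (None) field."""
    return "\t".join("" if field is None else str(field) for field in fields)
```

Here the row consisting of a value, two adjacent tab characters and a second value is read back as three fields, the middle one being null.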

C.3 STL

File types: .TXT .txt

Mixed capitalisations, such as .Txt, are also supported.

CURSA can read and write catalogues in the STL (Small Text List) format. Unlike the other formats which CURSA can access, the STL format is specific to CURSA. Nonetheless, the STL format exists in order to allow easy access to both private tables and versions of standard catalogues held as text files. It is usually straightforward to create an STL catalogue from a text file containing a private list or standard catalogue.

In the STL format both the table of values for the catalogue and the definitions of its columns, parameters etc. are held in simple ASCII text files. These files may be created and modified with a text editor. The information defining the catalogue is called the description of the catalogue and the file in which it is held is called the description file.

When you specify a small text list you give the name of the description file. The table of values comprising the catalogue may either be in the same file as the description or in a separate file. If the table of values occurs in a separate file then the name of this file is specified in the description file and CURSA places no restrictions on this name other than those imposed by the host operating system.

Appendix D is a simple tutorial introduction to STL descriptions. The basic format is described in full in Appendix E. In addition to the basic STL format there is a variant which allows STL format files to inter-operate with applications in the KAPPA image processing package (see SUN/95[5]). This variant is described in Appendix F.

CURSA can read STL format catalogues with either a free-format or a fixed-format table of values. However, CURSA can only write STL format catalogues with a free-format table. The KAPPA variant of the STL may be both read and written.

As its name implies, the Small Text List format is intended for use with relatively small catalogues and it is unsuitable for very large catalogues. Currently there is no upper limit to the size of catalogue for which it can be used. However, if you attempt to read a catalogue containing more than 15,000 rows a warning message is issued. A large STL format catalogue may take a while to open for reading and CURSA may be unable to access a very large STL catalogue (see footnote 26).

C.3.1 Textual information

The textual information for an STL catalogue comprises the entire contents of the description. This approach makes the maximum amount of information about the catalogue available to the user in its full context.

C.3.2 Null values

The STL format provides support for null values (see Section 5). A null value for a field in an STL table is indicated by inserting the string ‘<null>’ at the appropriate place in the input file. When CURSA reads this string it will interpret it as a null value. Actually, if CURSA encounters any value for a field which it cannot interpret given the data type of the column (such as a string containing alphabetic characters in a field for an INTEGER column) then the field is interpreted as null. However, when preparing STL files I recommend that you indicate nulls using the string ‘<null>’. This string is recognised as indicating a null value even for CHARACTER columns.

When CURSA writes an STL catalogue null fields in the table are represented by the string ‘<null>’.
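
The treatment of null fields described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a description of CURSA's code: the function names are hypothetical, surrounding blanks are stripped from every field, and any data type other than INTEGER or CHARACTER is simply treated as floating point.

```python
NULL_TOKEN = "<null>"


def parse_stl_field(text, data_type):
    """Interpret one field of an STL table; a null is returned as None."""
    token = text.strip()
    if token == NULL_TOKEN:
        # The explicit null string is recognised for every data type,
        # including CHARACTER.
        return None
    if data_type == "CHARACTER":
        return token
    try:
        if data_type == "INTEGER":
            return int(token)
        # Assumption: all remaining data types are read as floating point.
        return float(token)
    except ValueError:
        # A value which cannot be interpreted for the column's data type
        # is treated as null.
        return None


def format_stl_field(value):
    """Write a field, using the null string for a null (None) value."""
    return NULL_TOKEN if value is None else str(value)
```

For an INTEGER column both the string <null> and a value such as abc are read as null, but only the former survives unchanged if the catalogue is written out again.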

Null values are not permitted in the KAPPA variant of the STL format (see Appendix F).

24 http://archive.eso.org/skycat/

25 For information, the underlying reason for this behaviour is that CURSA attempts to memory-map work arrays to hold the columns of a TST catalogue and then reads the table into these arrays when an input catalogue is opened. For a very large catalogue CURSA may be unable to map the required arrays.

26 For information, the underlying reason for this behaviour is that CURSA attempts to memory-map work arrays to hold the columns of an STL catalogue and then reads the table into these arrays when an input catalogue is opened. For a very large catalogue CURSA may be unable to map the required arrays.