catpair is provided to identify ‘corresponding’ objects in two catalogues; objects are considered to
correspond if they have similar positions. An output catalogue is generated from the list of
corresponding objects.
In astronomical catalogues the ‘corresponding’ rows in two catalogues are usually rows which contain data for the same astronomical object. Traditionally in relational database systems corresponding rows are identified by having identical values for some field, such as a name. For example, two rows might be considered to correspond if a name field in both catalogues adopted the value ‘NGC 1305’ for both rows. This operation is usually called joining the two catalogues.
[Figure 7. Legend: ‘x’ marks an object in the primary dataset; a second symbol marks an object in the secondary dataset. Adjacent objects are pairs.]
In astronomical problems such joining by an exact match is relatively uncommon. A more common
case is where corresponding objects are identified by similar positions in both catalogues. This
situation is illustrated in Figure 7. The important point here is that, essentially because of
measurement errors, the corresponding positions are merely similar, not an exact match. This
circumstance makes establishing corresponding rows a much more complicated and problematic
process. In practice the positions used are almost always some type of two-dimensional
coordinates; usually celestial coordinates such as Right Ascension and Declination, or possibly
Cartesian coordinates of some sort. In principle one, three or higher dimensional coordinates
could be used though they are not important in practice. catpair only supports joining
based on two-dimensional coordinates, though the coordinates may be either Cartesian or
spherical-polar.
In CURSA this special sort of join based on an approximate match in two-dimensional coordinates is called pairing. Thus, in this usage, pairing is a special case of joining catalogues, albeit one which is important in astronomical practice.
catpair operates on two input catalogues, known as the primary and secondary catalogues.
To fix ideas, think of the primary as being a small list of target objects which you have
compiled, and the secondary as being a standard catalogue, such as the SAO star catalogue,
one of the Durchmusterungen or Dreyer’s New General Catalogue of non-stellar objects.
The final result of the pairing is a new catalogue containing the paired objects; the output
catalogue.
If you wish to pair several catalogues to create a single output catalogue you should invoke catpair
several times, creating intermediate paired catalogues as appropriate.
Pairing is a relatively complicated process and you must answer several prompts to fully
specify the operations to be performed. The following two sections, ‘Requirements’ and
‘Running catpair’, respectively describe the requirements for catpair and how to run it. You
should read at least these two sections. Subsequent sections describe various aspects of
the pairing process in greater detail. While it is not strictly necessary to read these latter
sections, they may help you to understand what catpair is doing and hence to use it more
effectively.
Obviously, before running catpair you must have a primary and a secondary catalogue. The secondary
catalogue must be sorted on the second column that is to be used for the pairing (usually this will be
the y or Declination coordinate). If your secondary is not sorted in this way then use catsort
(see Section 15, above) to create a suitably sorted secondary catalogue.
You need to know the names of the columns in both catalogues which contain the
coordinates which are to be used for the pairing (and whether they are Cartesian or
spherical-polar coordinates). If you are in doubt about the columns in the catalogues use
catheader (see Section 13, above) to obtain the details. If the coordinates are Cartesian
then the coordinates in both input catalogues must be in the same system, with the same
units (see footnote 14), zero point and orientation. That is, a given value for the coordinates
(say 23.5, 105.7) should correspond to the same position in both catalogues. If the coordinates
are spherical-polar they must always be in units of radians. The coordinates in the two
catalogues should be of the same type (equatorial, Galactic etc.) and if they are equatorial
they should have the same system, epoch and equinox.
Finally you need to specify the critical distance, which determines whether two objects, one in
each catalogue, are considered pairs or not. If the actual separation of the two objects is less
than or equal to this distance then they are considered pairs; if it is greater then they are not.
In catpair this critical distance may be either a constant, a column in the primary (so it varies
for different objects in the primary) or an expression based on columns in the primary. In practice
the value adopted for the critical distance is often derived from the errors associated with the
positions in the catalogues. If you do not already know the errors on the positions in your
catalogues, you could consult the textual information associated with the catalogue, which will
often contain these details. Again use catheader (see Section 13) to access this information.
To run catpair simply type:
  catpair
By default catpair writes a summary of the pairing options specified as textual information in the
output catalogue. This information is useful documentation of the pairing and you will usually want
to retain it. However, you can suppress it by specifying an extra item on the command line, as
follows:
  catpair text=none
There must be one or more spaces between ‘catpair’ and ‘text=none’. catpair has an option to
include in the output catalogue three special columns containing additional details for the paired
objects. These columns are described in Section 20.2.1, below. By default these additional columns are
not created. To include them in the output catalogue type:
  catpair spcol=true
You must answer a fairly long series of prompts in order to specify the behaviour of catpair. These
prompts are listed below, in the order in which they are issued by the program, together with a
corresponding explanation. In this list the prompts are identified by the corresponding ADAM
parameter name, which appears at the start of the prompt line.
PRIMARY
SECOND
OUTPUT
catpair will automatically create the output catalogue in toto.
CRDTYP (default ‘S’) Specify the type of coordinates which are to be used for the pairing. The
possibilities are either Cartesian coordinates (‘C’) or celestial spherical-polar coordinates
(‘S’) such as Right Ascension and Declination.
PCRD1
PCRD2
SCRD1
SCRD2
PDIST
If the pairing coordinates are Cartesian then a constant critical distance would typically be
specified as a simple decimal number, for example ‘23.0’. However, if they were celestial
coordinates then it could be specified as any of the forms in which an angle can be input: a
floating point number in radians, or a sexagesimal value in hours or degrees. In addition
a special format is available in catpair in which the separation is given as a floating
point number expressed in seconds of arc, immediately followed by the string ‘arcsec’.
For example, a separation of twenty-three minutes of arc could be entered as any of the
following values:
+00:23:00 | (sexagesimal degrees) |
1380.0arcsec | (seconds of arc) |
00:01:32.00 | (sexagesimal hours) |
6.6904288E-3 | (radians) |
Note that the sign is necessary in the value in sexagesimal degrees to ensure that the value is interpreted as degrees, not hours. The examples in sexagesimal hours and radians are not particularly sensible here.
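The equivalence of these forms can be checked with a few lines of arithmetic. The following Python sketch is purely illustrative and is not part of CURSA; the function names are hypothetical and the formatting routine only mimics the sexagesimal layout shown above.

```python
import math

def arcmin_to_forms(arcmin):
    """Express an angle given in minutes of arc in several other units."""
    degrees = arcmin / 60.0
    return {
        "arcsec": arcmin * 60.0,           # seconds of arc
        "degrees": degrees,                # decimal degrees
        "hours": degrees / 15.0,           # 15 degrees correspond to 1 hour
        "radians": math.radians(degrees),  # the unit CURSA uses internally
    }

def sexagesimal(value, signed=False):
    """Format decimal degrees or hours as [+]NN:MM:SS.ss (illustrative only)."""
    sign = "-" if value < 0 else ("+" if signed else "")
    total_centiseconds = round(abs(value) * 3600 * 100)
    whole_seconds, centiseconds = divmod(total_centiseconds, 100)
    minutes, seconds = divmod(whole_seconds, 60)
    units, minutes = divmod(minutes, 60)
    return f"{sign}{units:02d}:{minutes:02d}:{seconds:02d}.{centiseconds:02d}"

forms = arcmin_to_forms(23.0)
print(forms["arcsec"], forms["radians"])
print(sexagesimal(forms["degrees"], signed=True))  # leading sign marks degrees
print(sexagesimal(forms["hours"]))                 # no sign: hours
```

The explicit sign in the degrees form follows the convention described in the note above, where a signed sexagesimal value is read as degrees and an unsigned one as hours.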
PRTYP (default ‘C’) Select the ‘type of pairing’ required, that is, specify which set of rows from
the two input catalogues are to be retained in the output catalogue. Briefly, the options
are:
C
M
P
R
A
These options are described in greater detail in Section 20.4, below.
MULTP (default ‘yes’) Specify how multiple matches in the primary are to be handled. The options
are either to retain the single closest match or to retain all the matches. The treatment of multiple
matches is described in detail in Section 20.5, below.
MULTS (default ‘no’) Specify how multiple matches in the secondary are to be handled. The options
are either to retain the single closest match or to retain all the matches. The treatment of multiple
matches is described in detail in Section 20.5, below.
ALLCOL (default ‘yes’) Specify the set of columns to be retained in the output catalogue. The
options are to either retain all the columns from both input catalogues or to retain specified
columns from either input catalogue. If you are in doubt you should retain all the columns. This
alternative is the ‘safest’ and simplest, though it may result in the output catalogue containing
columns which you do not need and consequently using more disk space than is strictly
necessary.
If you choose to retain all the columns they are simply copied automatically from the input catalogue, without further intervention on your part. However, if you choose to specify the columns to retain you will subsequently be prompted for the names of the columns to be retained (and hence you must be prepared with this information). The details of specifying named input columns are described in Section 20.2.2, below.
If you choose to retain all the columns, the columns created in the output catalogue will have the
same names (and other attributes) as the corresponding columns in the input catalogue.
However, in the case where identically named columns in the primary and secondary catalogues
would cause the output catalogue to contain two identically named columns, the names of the
columns in the output catalogue are disambiguated by appending ‘_S’ to the name of the
column originating in the secondary.
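The naming rule can be illustrated with a short Python sketch. It is not catpair's own code: the function name is hypothetical, and the ordering of the output columns (primary first, then secondary) is an assumption made only for the illustration.

```python
def output_column_names(primary_columns, secondary_columns):
    """Return output column names when all columns are retained.

    A secondary column whose name clashes with a primary column gets
    '_S' appended; all other names are copied unchanged.
    Assumption: primary columns are listed first in the output.
    """
    primary_names = set(primary_columns)
    names = list(primary_columns)
    for name in secondary_columns:
        names.append(name + "_S" if name in primary_names else name)
    return names

print(output_column_names(["X", "Y", "MAG"], ["X", "Y", "NAME"]))
```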
PRMPAR (default ‘yes’) Specify whether the parameters of the primary are to be copied to the output
catalogue.
SECPAR (default ‘no’) Specify whether the parameters of the secondary are to be copied to the
output catalogue.
PTEXT (default ‘C’) Specify what textual information associated with the primary is to be copied to
the output catalogue. The options are: ‘A’ - all, ‘C’ - comments and history only and ‘N’ - none.
STEXT (default ‘N’) Specify what textual information associated with the secondary is to be copied
to the output catalogue. The options are: ‘A’ - all, ‘C’ - comments and history only and ‘N’ - none.
If catpair is invoked with the option spcol=true then three special columns giving details of the
pairing for each object will be included in the output catalogue. These columns are:
SEPN
PMULT
SMULT
Usually fields in columns PMULT and SMULT will have a value of one for paired objects. However, in
cases where there were multiple matches for the pair the values will be larger. See Section 20.5, below,
for a discussion of the handling of multiple matches.
If you choose to retain in the output catalogue only some of the columns in the two input catalogues
you will be prompted to supply the names of the columns required and hence you must be
prepared with this information. If you are not familiar with the details of the columns in your
input catalogues you can use catheader (see Section 13, above) to obtain the necessary
information.
Once you have indicated that you wish to retain only specified columns (by replying ‘NO’ to prompt
ALLCOL) you will be prompted to enter the names of columns to be retained from the primary
catalogue. Type the name of the first column required then hit return. For example, to retain column X
simply type:
  X
A corresponding column with the same name and other attributes will be created in the output
catalogue. Columns may also be retained with a name in the output catalogue which differs from the
name of the corresponding input column. In this case you type: the name of the input column,
a right chevron and the name required for the new output column. For example, if the
column is called X in the input catalogue and is to be called X_PRIM in the output catalogue you
would type:
  X > X_PRIM
An arbitrary number of spaces may appear on either side of the right chevron. A column with the specified new name will be created in the output catalogue, and all its other attributes will be the same as those of the corresponding column in the input catalogue.
Continue in this fashion until you have entered all the columns required from the primary. Then type:
  END
Next you will be prompted for the names of the columns required from the secondary. Proceed exactly
as for the primary and again type END when you have finished.
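To make the reply format concrete, the following Python sketch shows one way such a list of replies could be interpreted: a bare column name keeps its name, a right chevron introduces a new output name, and END closes the list. It is only an illustration of the format described above, not catpair's actual parser; the function names and the error handling are assumptions.

```python
def parse_column_spec(reply):
    """Split a reply such as 'X' or 'X > X_PRIM' into (input, output) names."""
    parts = reply.split(">")
    if len(parts) > 2:
        raise ValueError(f"too many chevrons in reply: {reply!r}")
    input_name = parts[0].strip()    # spaces around the chevron are ignored
    output_name = parts[-1].strip()  # same as input_name when no chevron given
    if not input_name or not output_name:
        raise ValueError(f"missing column name in reply: {reply!r}")
    return input_name, output_name

def read_column_list(replies):
    """Collect column specifications until the terminating END reply."""
    columns = []
    for reply in replies:
        if reply.strip() == "END":
            break
        columns.append(parse_column_spec(reply))
    return columns

print(read_column_list(["X", "Y   >Y_PRIM", "END"]))
```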
If you are retaining a large number of columns it is inconvenient (and, indeed, error-prone) to have to
supply all the column names interactively in response to prompts. In this case it is much more
convenient to run catpair from a script, and I strongly recommend that you do so. This option is
described in Section 20.2.3, below.
The handling of multiple columns with the same name in the output catalogue is rather different when column names are being specified than when all the columns are being copied automatically. A single column with the specified name is created in the output catalogue and values for all the appropriate columns in the input catalogue are written to the field of this column for the current row. This behaviour is adopted because there are cases, particularly in MOSAIC and ALLREJ pairing, where you might want fields for corresponding columns in the two input catalogues to be written to a single column in the output catalogue. In the case where fields are available from both the primary and secondary catalogues it is always the field from the secondary which is retained.
Often it is more convenient to run catpair from a prepared script rather than answering the prompts
interactively. This is easily achieved using Unix’s input redirection mechanism. Enter the
responses to the various prompts into a text file, in the correct order, using a text editor. Then
type:
  catpair < script_file
where script_file is the name of the file you have created. Figure 8 shows an annotated
example catpair script for pairing with Cartesian coordinates. This script is available as
file:
An example script showing pairing with spherical-polar coordinates is available as file:
It may be convenient to use these scripts as starting points for developing your own scripts.
prim | primary catalogue |
sec | secondary catalogue |
out | output catalogue |
C | the pairing coordinates are Cartesian |
X | column with x-coordinate for pairing in the primary |
Y | column with y-coordinate for pairing in the primary |
X | column with x-coordinate for pairing in the secondary |
Y | column with y-coordinate for pairing in the secondary |
10.0 | the critical distance |
C | COMMON pairing |
Y | include all the primary multiple matches |
Y | include all the secondary multiple matches |
N | specify the columns to retain |
X | } |
Y | } columns retained from the primary |
ROW | } |
END | end of list of columns from the primary |
X > X_SEC | } columns retained from the secondary |
Y > Y_SEC | } (note the renaming of these columns) |
ROW > ROW_SEC | } |
END | end of list of columns from the secondary |
Y | include primary parameters |
N | exclude secondary parameters |
N | exclude primary textual information |
N | exclude secondary textual information |
The column on the left (in a courier font) shows the entries in a catpair script file.
The column on the right (in a roman font) briefly describes the corresponding entry.
This section discusses the criteria used to determine whether two objects, one from each of the two input catalogues, ‘correspond’ or pair. The two objects pair if the difference in their two-dimensional coordinates is no greater than some specified critical distance, $d_{\mathrm{crit}}$. The formulæ differ for Cartesian and celestial coordinates.
If the two objects have Cartesian coordinates $(x_1, y_1)$ and $(x_2, y_2)$ then the criterion is simply that the Pythagorean distance between the two points should be less than or equal to $d_{\mathrm{crit}}$:
$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \le d_{\mathrm{crit}} \qquad (7)$$
If the two objects have celestial spherical-polar coordinates (in practice Right Ascension and Declination) $(\alpha_1, \delta_1)$ and $(\alpha_2, \delta_2)$ then the criterion is that the great circle distance between the two coordinates should be less than or equal to $d_{\mathrm{crit}}$:
$$\arccos\left[\sin\delta_1 \sin\delta_2 + \cos\delta_1 \cos\delta_2 \cos(\alpha_1 - \alpha_2)\right] \le d_{\mathrm{crit}} \qquad (8)$$
Equation 8 is the natural form for the great circle distance, simply derived by applying
spherical trigonometry to the two coordinates. In practice it has the disadvantage that
because of numerical errors it is inaccurate when the great circle distance is a small angle.
There are algebraically equivalent formulations which retain numerical accuracy for small
angles. In catpair the great circle distance is calculated using the appropriate SLA
routine (see footnote 15), which uses such a formulation in order to ensure accuracy for small angles.
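The loss of accuracy at small angles is easy to demonstrate. The Python sketch below compares the cosine form of the great circle distance with one algebraically equivalent formulation based on unit vectors and an arctangent. It is an illustration only: the function names are hypothetical, and the vector formulation is a standard choice that is not claimed to reproduce the internals of SLA_DSEP. All angles are in radians.

```python
import math

def cartesian_separation(x1, y1, x2, y2):
    """Pythagorean distance between two points in the plane."""
    return math.hypot(x1 - x2, y1 - y2)

def great_circle_cosine(ra1, dec1, ra2, dec2):
    """Great circle distance from the cosine formula (poor for small angles)."""
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.acos(max(-1.0, min(1.0, cos_sep)))  # clamp rounding overshoot

def _unit_vector(ra, dec):
    """Direction cosines of a spherical-polar position."""
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def great_circle_stable(ra1, dec1, ra2, dec2):
    """Great circle distance via atan2(|a x b|, a . b); accurate at all angles."""
    ax, ay, az = _unit_vector(ra1, dec1)
    bx, by, bz = _unit_vector(ra2, dec2)
    cross = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    sin_sep = math.sqrt(sum(c * c for c in cross))
    cos_sep = ax * bx + ay * by + az * bz
    return math.atan2(sin_sep, cos_sep)

def is_pair(separation, critical_distance):
    """Two objects pair if their separation does not exceed the critical distance."""
    return separation <= critical_distance

# Two positions separated by one nanoradian in Declination.
print(great_circle_cosine(0.0, 0.0, 0.0, 1e-9))
print(great_circle_stable(0.0, 0.0, 0.0, 1e-9))
```

For the tiny separation used here the cosine form cannot resolve the angle in double precision, whereas the arctangent form recovers it.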
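Three cases for the value of the critical distance are supported by catpair: a constant, a
column in the primary (so that it varies for different objects in the primary) and an expression
based on columns in the primary.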
A fourth case in which the critical distance is computed from an expression involving columns in both
catalogues is not supported in catpair. A special instance of this case which sometimes
arises is where both input catalogues have errors in their coordinates which vary with the
objects in the catalogues and thus are stored as columns, one in each catalogue. Objects are
considered to pair when their error circles overlap. Here the expression for the critical distance
would involve columns (containing the errors) from both catalogues and hence this case is not
supported.
Figure 9 illustrates the result of pairing two catalogues, with a set of corresponding rows in the
catalogues identified. There are a number of options for the set of rows to be included in an output
catalogue generated from such a pairing. The various alternatives available in catpair are described
below.
This section describes how multiple matches are handled by catpair. Multiple matches can arise
because the pairing techniques are matching objects with similar rather than identical positions and an
object in one catalogue can pair with several in the other catalogue. The terminology used in this
section is:
That is, any match is potentially a pair and the pairing algorithm must prescribe which matches are considered pairs. There are three cases for multiple matches:
catpair is unsuitable for handling the third case, and should not be used with catalogues where it
is likely to be important. There are, however, several options for handling the first two
cases:
The third option is not practical in a general purpose program such as catpair because it relies on
astronomical knowledge about the catalogues being paired. Either of the first two options may be
appropriate, depending on the details of the pairing being performed. catpair provides both options
separately for multiple matches in the primary and secondary, and you should choose the alternatives
appropriate for your work.
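The two choices offered by the MULTP and MULTS prompts, retaining only the single closest match or retaining all the matches, can be sketched in a few lines of Python. This is an illustration rather than catpair's implementation; the function name and the representation of a match as a (row, separation) tuple are assumptions.

```python
def select_matches(matches, keep_all):
    """Choose which of an object's matches are retained as pairs.

    matches:  list of (row_number, separation) tuples for one object
    keep_all: True to retain every match, False to retain only the closest
    """
    if keep_all or not matches:
        return list(matches)
    closest = min(matches, key=lambda match: match[1])
    return [closest]

candidates = [(12, 4.5), (13, 1.2), (17, 3.0)]
print(select_matches(candidates, keep_all=True))
print(select_matches(candidates, keep_all=False))
```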
[Figure legend: one symbol marks an object in the primary; ‘o’ marks an object in the secondary.]
For secondary objects to match with the primary object they must fall inside the square (strictly speaking the square should be a circle with a radius equal to the critical distance).
An example might help to illustrate the difference between multiple matches in the primary and secondary. Suppose the primary was a private list of target objects and the secondary was the NGC catalogue. Table 13 shows the equatorial coordinates for the triplet of galaxies NGC 3623, NGC 3627 and NGC 3628 (see footnote 16). Consider the following two cases.
This section describes the pairing algorithm used by catpair. Strictly speaking you should not need
to know the details of the algorithm in order to use catpair, but the information is provided for
reference and completeness. catpair uses an index join technique which is illustrated in
Figure 13. The secondary catalogue is sorted on the second coordinate to be used in the
pairing (see footnote 17). The algorithm is then as follows. Every entry in the primary catalogue is
examined sequentially and for each entry the critical distance
is used to compute the minimum and maximum values of the sorted coordinate which could pair with
the primary row. The rows in the secondary catalogue corresponding to these minimum and
maximum values are then identified (remember that the secondary is sorted on this column) to yield a
range of rows which might pair. All of these rows are then examined individually to check if they do
pair.
The advantages of this technique are that it is relatively straightforward and it does not require the primary catalogue to be sorted (though the secondary must). The main disadvantage is that the ranges in the secondary corresponding to subsequent rows in the primary may overlap, thus leading to multiple reads of rows in the secondary. The technique is most appropriate where a small primary is being paired with a large secondary; perhaps a small personal list of target objects is being paired with a large standard catalogue. However, it will certainly work if the primary and secondary are of similar size; it will merely take somewhat longer to execute than is strictly necessary.
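The index join described above can be sketched compactly in Python for the Cartesian case. This is a minimal illustration under stated assumptions, not catpair's source: catalogues are represented as lists of (x, y) tuples, the critical distance is a constant, and a binary search stands in for whatever mechanism catpair uses to locate rows in the sorted secondary.

```python
import bisect
import math

def pair_catalogues(primary, secondary, critical_distance):
    """Index join of two lists of (x, y) positions.

    The secondary must already be sorted in ascending order of y, the
    second pairing coordinate. Returns a list of
    (primary_row, secondary_row, separation) for every match.
    """
    sorted_y = [y for _, y in secondary]
    matches = []
    for p_row, (px, py) in enumerate(primary):
        # Range of secondary rows whose y could lie within the critical distance.
        first = bisect.bisect_left(sorted_y, py - critical_distance)
        last = bisect.bisect_right(sorted_y, py + critical_distance)
        # Examine each candidate row individually.
        for s_row in range(first, last):
            sx, sy = secondary[s_row]
            separation = math.hypot(sx - px, sy - py)
            if separation <= critical_distance:
                matches.append((p_row, s_row, separation))
    return matches

primary = [(0.0, 0.0), (10.0, 10.0)]
secondary = [(5.0, -1.0), (0.5, 0.5), (10.0, 9.0), (30.0, 10.0)]  # sorted on y
print(pair_catalogues(primary, secondary, critical_distance=2.0))
```

Note that the primary is scanned in its stored order and is never sorted, while each primary row triggers its own range lookup in the secondary; overlapping ranges are simply read again, which is the inefficiency mentioned above.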
14. catpair does not actually check that the units attribute is the same for the various columns holding the coordinates
because in CURSA units are treated purely as comments.
15. See SUN/67[32]. The actual routine used is SLA_DSEP.
16. These data were taken from NGC 2000.0 by R.W. Sinnott[27].
17. Spherical-polar coordinates must be sorted on Declination or latitude in order to avoid problems with the zero – twenty-four hour boundary.