Identify clumps of emission within a 1-, 2- or 3-dimensional NDF
A pixel mask identifying pixels as background, clump or edge pixels is written to the Quality array of
each output NDF (see parameters OUT and QOUT). Three quality bits are used: one is set if and
only if the pixel is contained within one or more clumps, another is set if and only if the pixel is
not contained within any clump, and the third is set if and only if the pixel is in a clump
but on the edge of the clump (i.e. has one or more neighbouring pixels that are not inside
a clump). These three quality bits have names associated with them which can be used
with the KAPPA applications SETQUAL, QUALTOBAD, REMQUAL and SHOWQUAL. The
names used are "CLUMP", "BACKGROUND" and "EDGE". For instance, to overlay the
outline of a set of 2D clumps held in NDF "fred" on a previously displayed 2D image, do
"qualtobad fred fred2 background" followed by "contour noclear mode=good fred2".
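To illustrate how the three quality categories relate, here is a minimal sketch that derives them from a 2D clump-label array such as the integer-index output of ClumpFind, FellWalker or Reinhold (0 meaning background). The bit values, the function name `quality_mask`, the use of all 8 neighbours, and the choice not to treat pixels beyond the array boundary as background are assumptions for illustration only; CUPID assigns the real bits and records their names in the output NDF. A label array also cannot represent overlapping (GaussClumps) clumps.

```python
# Hypothetical bit values: CUPID chooses the actual quality bits and records
# their names (CLUMP, BACKGROUND, EDGE) in the output NDF.
CLUMP, BACKGROUND, EDGE = 1, 2, 4


def quality_mask(labels):
    """Return a 2D quality array from a 2D clump-label array (0 = background)."""
    ny, nx = len(labels), len(labels[0])
    quality = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            if labels[y][x] == 0:
                quality[y][x] = BACKGROUND
                continue
            quality[y][x] = CLUMP
            # Edge pixel (assumed 8-neighbour test): at least one neighbour
            # inside the array lies outside every clump.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < ny and 0 <= xx < nx and labels[yy][xx] == 0:
                        quality[y][x] |= EDGE
    return quality
```

Every pixel gets exactly one of CLUMP or BACKGROUND, and EDGE is only ever combined with CLUMP, matching the description above.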
Information about each clump, including a minimal cut-out image of the clump and the clump
parameters, is written to the CUPID extension of the output NDF (see the section "Use of CUPID
Extension" below).
An output catalogue containing clump parameters can be created (see parameter OUTCAT).
The algorithm used to identify the clumps (GaussClumps, ClumpFind, etc.) can be specified (see parameter METHOD).
If BACKOFF is FALSE, a clump that sits on a high background level will have a larger reported width
than an identical clump sitting on a lower background level. The position of the centroid may also be
affected by the background level. This is usually undesirable, and so the default value for BACKOFF
is usually TRUE. The main reason you may want to set BACKOFF to FALSE is if you want to compare
clump properties found by FINDCLUMPS with those found by the IDL version of CLUMPFIND
(which includes the background in its calculations). For this reason, the dynamic default value
for BACKOFF is TRUE, unless METHOD is "ClumpFind" and the ClumpFind.IDLAlg
configuration parameter is non-zero, in which case the dynamic default for BACKOFF is
FALSE.
Note, the other reported clump properties such as total data value, peak data value, etc., are always based on the full clump data values, including background. []
If "def" (case-insensitive) or a null (!) value is supplied, a set of default configuration parameter values will be
used.
The supplied value should be either a comma-separated list of strings or the name of a text file preceded by an
up-arrow character "^", containing one or more comma-separated lists of strings. Each string is either a
"keyword=value" setting, or the name of a text file preceded by an up-arrow character "^". Such
text files should contain further comma-separated lists which will be read and interpreted in the same
manner (any blank lines or lines beginning with "#" are ignored). Within a text file, newlines can
be used as delimiters as well as commas. Settings are applied in the order in which they
occur within the list, with later settings overriding any earlier settings given for the same
keyword.
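The group syntax can be sketched as a recursive expansion followed by an in-order application of settings. This is a hypothetical Python illustration, not CUPID's own parser: `expand_group`, `apply_settings` and the `read_file` hook (which lets the example run without real files) are invented names, and values that themselves contain commas are not handled.

```python
def expand_group(text, read_file):
    """Expand a CONFIG-style group into an ordered list of settings.

    text: comma-separated items; an item starting with "^" names a text
    file whose contents are expanded in the same way.
    read_file: callable returning a file's contents (hypothetical hook).
    """
    settings = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("^"):
            lines = read_file(item[1:]).splitlines()
            # Blank lines and lines beginning with "#" are ignored; newlines
            # act as delimiters just like commas.
            body = ",".join(
                ln for ln in lines if ln.strip() and not ln.startswith("#")
            )
            settings.extend(expand_group(body, read_file))
        else:
            settings.append(item)
    return settings


def apply_settings(settings):
    """Apply keyword=value settings in order; later ones override earlier."""
    config = {}
    for setting in settings:
        key, _, value = setting.partition("=")
        config[key.strip()] = value.strip()
    return config
```

Because settings are applied in list order, a value given after a "^file" item overrides the same keyword set inside that file.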
Each individual setting should be of the form:
<keyword>=<value>
where <keyword> has the form "algorithm.param"; that is, the name of the algorithm, followed by a dot,
followed by the name of the parameter to be set. If the algorithm name is omitted, the
current algorithm given by parameter METHOD is assumed. The parameters available
for each algorithm are listed in the "Configuration Parameters" sections below.
Default values will be used for any unspecified parameters. Assigning the value "<def>"
(case-insensitive) to a keyword has the effect of resetting it to its default value. Unrecognised options
are ignored (that is, no error is reported). [current value]
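The keyword rules (optional algorithm prefix, "<def>" resets, silently ignored unknown names) can be sketched for a single algorithm as follows. The `resolve` function and its `defaults` table are hypothetical, the case-insensitive comparison of algorithm names is an assumption, and the real default values are those listed in the "Configuration Parameters" sections.

```python
def resolve(settings, method, defaults):
    """Return the effective configuration for one algorithm.

    settings: ordered (keyword, value) pairs as supplied by the user.
    method: the algorithm named by parameter METHOD, assumed when a
        keyword has no "algorithm." prefix.
    defaults: hypothetical {param: default} table for that algorithm.
    """
    config = dict(defaults)
    for keyword, value in settings:
        algorithm, dot, param = keyword.rpartition(".")
        if not dot:
            algorithm = method  # no prefix: the METHOD algorithm is assumed
        if algorithm.lower() != method.lower():
            continue  # belongs to another algorithm; not resolved here
        if param not in defaults:
            continue  # unrecognised option: ignored without error
        if value.strip().lower() == "<def>":
            config[param] = defaults[param]  # reset to the default
        else:
            config[param] = value
    return config
```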
"OUTCAT"). No catalogue will be produced if a null (!) value is
supplied. The created file will be a FITS file containing a binary table. The columns in this
catalogue will be the same as those created by the "OUTCAT" parameter, but the table
will also hold the contents of the FITS extension of the input NDF, and CADC-style
provenance headers. Note, an error will be reported if the current co-ordinate system of the
input NDF does not include a pair of celestial longitude and latitude axes. The default for
parameter SHAPE is changed to "Polygon" if a JSA-style catalogue is being created. [!]
"Algorithms:" section below. Can be one of:
GaussClumps
ClumpFind
Reinhold
FellWalker
Each algorithm has a collection of extra tuning values which are set via the CONFIG parameter. [current value]
"Use of CUPID Extension" below for further details about the information stored in the CUPID extension. Other applications
within the CUPID package can be used to display this information in various ways. The information
written to the DATA array of this NDF depends on the value of the METHOD parameter. If
METHOD is GaussClumps, the output NDF receives the sum of all the fitted Gaussian clump
models, including a global background level chosen to make the mean output value equal to
the mean input value. If METHOD is ClumpFind, FellWalker or Reinhold, each pixel in
the output is the integer index of the clump to which the pixel has been assigned. Bad
values are stored for pixels which are not part of any clump. The output NDF will inherit
the AXIS and WCS components (plus any extensions) from the input NDF.
"JSACAT"). No catalogue will be
produced if a null (!) value is supplied. The following columns are included in the output
catalogue:
Peak1: The position of the clump peak value on axis 1.
Peak2: The position of the clump peak value on axis 2.
Peak3: The position of the clump peak value on axis 3.
Cen1: The position of the clump centroid on axis 1.
Cen2: The position of the clump centroid on axis 2.
Cen3: The position of the clump centroid on axis 3.
Size1: The size of the clump along pixel axis 1.
Size2: The size of the clump along pixel axis 2.
Size3: The size of the clump along pixel axis 3.
Sum: The total data sum in the clump.
Peak: The peak value in the clump.
Volume: The total number of pixels falling within the clump.
There is also an optional column called "Shape" containing an STC-S description of the spatial
coverage of each clump. See parameter SHAPE.
The coordinate system used to describe the peak and centroid positions is determined by the value
supplied for parameter WCSPAR. If WCSPAR is FALSE, then positions are specified in the pixel
coordinate system of the input NDF. In addition, the clump sizes are specified in units of pixels, and
the clump volume is specified in units of cubic pixels (square pixels for 2D data). If WCSPAR is TRUE,
then positions are specified in the current coordinate system of the input NDF. In addition, the
clump sizes and volumes are specified in WCS units. Note, the sizes are still measured
parallel to the pixel axes, but are recorded in WCS units rather than pixel units. Celestial
coordinate positions are in units of degrees, sizes are in units of arc-seconds, and areas are in square
arc-seconds. Spectral coordinates are in the units displayed by the KAPPA command "ndftrace".
If the data has fewer than 3 pixel axes, then the columns describing the missing axes will not be present in the catalogue.
The catalogue inherits any WCS information from the input NDF.
The "size" of the clump on an axis is the RMS deviation of each pixel centre from the clump centroid,
where each pixel is weighted by the corresponding pixel data value. For a Gaussian profile, this "size"
value is equal to the standard deviation of the Gaussian. Optionally, the weights can be based
on the pixel data value after removal of the background (see parameter BACKOFF). If parameter
DECONV is set TRUE, the values stored for "Size..." and "Peak" are corrected to take account of the
smoothing introduced by the instrumental beam. These corrections reduce the "Size..." values and
increase the peak value. Beam sizes are specified by configuration parameters FWHMBeam and
VeloRes.
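A minimal sketch of the size definition for one axis, assuming the pixel-centre coordinates and data values of a clump are already available as plain lists. `clump_size` is a hypothetical name; the optional `background` argument mimics the effect of BACKOFF, and the DECONV beam correction is not attempted here.

```python
import math


def clump_size(coords, values, background=0.0):
    """Weighted centroid and RMS "size" of a clump along one axis.

    coords: pixel-centre positions along the axis.
    values: the corresponding data values.
    background: subtracted from each value before weighting (cf. BACKOFF);
        the default of 0.0 weights by the raw data values.
    """
    weights = [v - background for v in values]
    total = sum(weights)
    centroid = sum(w * x for w, x in zip(weights, coords)) / total
    variance = sum(w * (x - centroid) ** 2 for w, x in zip(weights, coords)) / total
    return centroid, math.sqrt(variance)
```

With values [3, 4, 3] at positions [0, 1, 2], the raw weights give a size of sqrt(0.6) (about 0.775), while subtracting a background of 2 gives sqrt(0.5) (about 0.707), which is why a high background inflates the reported width when BACKOFF is FALSE. Sampling a Gaussian finely enough recovers its standard deviation as the size.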
For the GaussClumps algorithm, the Sum and Volume values refer to the part of the Gaussian within the level defined by the GaussClumps.ModelLim configuration parameter.
The values used for configuration parameters and ADAM parameters are written to the history information of the output catalogue.
The KAPPA command "listshow" can be used to draw markers at the central positions of the clumps
described in a catalogue. For instance, the command "listshow fred plot=mark" will draw markers
identifying the positions of the clumps described in file fred.FIT, overlaying the markers on top of the
currently displayed image. Specifying "plot=STCS" instead of "plot=mark" will cause the spatial
outline of the clump to be drawn if it is present in the catalogue (see parameter SHAPE). [!]
’s Variance component. If the NDF has no Variance component, the suggested
default is based on the differences between neighbouring pixel values. Any pixel-to-pixel
correlation in the noise can result in this estimate being too low. The value supplied for this
parameter will be ignored if the RMS noise level is also given in the configuration file
specified by parameter CONFIG.
"None", the spatial shape of
each clump is not recorded in the output catalogue. Otherwise, the catalogue will have an
extra column named "Shape" holding an STC-S description of the spatial coverage of each
clump. "STC-S" is a textual format developed by the IVOA for describing regions within a
WCS; see http://www.ivoa.net/Documents/latest/STC-S.html for details. These STC-S
descriptions can be displayed by the KAPPA:LISTSHOW command, or using GAIA. Since STC-S
cannot describe regions within a pixel array, it is necessary to set parameter WCSPAR
to TRUE if using this option. An error will be reported if WCSPAR is FALSE. An error
will also be reported if the WCS in the input data does not contain a pair of celestial sky
axes.
Polygon: Each polygon will have, at most, 15 vertices. If the data is 2-dimensional, the polygon is a fit
to the clump’s outer boundary (the region containing all good data values). If the data is
3-dimensional, the spatial footprint of each clump is determined by rejecting the least significant 10%
of spatial pixels, where "significance" is measured by the number of spectral channels that contribute
to the spatial pixel. The polygon is then a fit to the outer boundary of the remaining spatial
pixels.
Ellipse: All data values in the clump are projected onto the spatial plane, and the "size" of the collapsed
clump is found at four different position angles, each separated by 45 degrees (see the OUTCAT
parameter for a description of clump "size"). The ellipse that generates the closest sizes at the four
position angles is then found and used as the clump shape.
Ellipse2: The above method for determining ellipses works well for clumps that are in fact elliptical,
but can generate extremely long thin ellipses for clumps that are far from elliptical. The "Ellipse2"
option uses a different method for determining the best ellipse, based on finding many marginal
profiles at one-degree intervals of azimuth, and using the longest marginal profile as the major axis.
The ellipse is centred at the clump centroid.
Ellipse3: The same as "Ellipse2" except that the ellipse is centred at the clump peak, rather than the
clump centroid, and the pixel data values are used as weights when forming the mean radial distance
at each azimuth angle.
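The 3D footprint rule used by "Polygon" can be sketched as below. `spatial_footprint` is a hypothetical helper; how CUPID rounds the 10% and breaks ties between equally significant spatial pixels is not stated here, so rounding down and an arbitrary tie order are assumptions. The subsequent polygon fit (at most 15 vertices) is not shown.

```python
from collections import Counter


def spatial_footprint(clump_pixels, reject_fraction=0.10):
    """Spatial pixels kept for the polygon fit of a 3D clump.

    clump_pixels: iterable of (x, y, v) pixel indices belonging to one clump.
    A spatial pixel's significance is the number of spectral channels that
    contribute to it; the least significant fraction is rejected.
    """
    counts = Counter((x, y) for x, y, _ in clump_pixels)
    ranked = sorted(counts, key=lambda xy: counts[xy])  # least significant first
    n_reject = int(len(ranked) * reject_fraction)  # rounding down: an assumption
    return set(ranked[n_reject:])
```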
In general, ellipses will outline the brighter, inner regions of each clump, and polygons will include
the fainter outer regions. The dynamic default is "Polygon" if a JSA-style catalogue (see parameter
JSACAT) is being created, and "None" otherwise. Note, if a JSA-style catalogue is being created, an
error will be reported if "Ellipse", "Ellipse2", "Ellipse3" or "None" is selected. []
"CUPID" in the output NDF and will add the following components to
it:
CLUMPS: This is an array of CLUMP structures, one for each clump identified by the selected
algorithm. Each such structure contains the same clump parameters that are written to the catalogue
via parameter OUTCAT. It also contains a component called MODEL which is an NDF containing a
section of the main input NDF which is just large enough to encompass the clump. Any pixels within
this section which are not contained within the clump are set bad. So, for instance, if the input array
"fred.sdf" is 2-dimensional, and an image of it has been displayed using KAPPA:DISPLAY, then the
outline of clump number 9 (say) in the output image "fred2.sdf" can be overlaid on the image by
doing:
contour noclear "fred2.more.cupid.clumps(9).model" mode=good labpos=\!
CONFIG: Lists the algorithm configuration parameters used to identify the clumps (see parameter CONFIG).
QUALITY_NAMES: Defines the textual names used to identify background and clump pixels within the Quality mask.
GaussClumps: Based on the algorithm described by Stutzki & Güsten (1990, ApJ 356, 513). This algorithm proceeds by fitting a Gaussian profile to the brightest peak in the data. It then subtracts the fit from the data and iterates, fitting a new Gaussian to the brightest peak in the residuals. This continues until the integrated data sum in the fitted Gaussians reaches the integrated data sum in the input array, or a series of consecutive fits are made which have peak values below a given multiple of the noise level. Each fitted Gaussian is taken to be a single clump and is added to the output catalogue. In this algorithm, clumps may overlap. Any input Variance component is used to scale the weight associated with each pixel value when performing the Gaussian fit. The most significant configuration parameters for this algorithm are GaussClumps.FwhmBeam and GaussClumps.VeloRes, which determine the minimum clump size.
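The fit-subtract-iterate loop and its termination tests can be illustrated with a greatly simplified 1D sketch. It is not the GaussClumps fit: every clump is assumed to have a known width `sigma`, so each "fit" reduces to reading the peak position and amplitude of the residuals, whereas the real algorithm performs a weighted least-squares fit. The arguments only loosely mirror the Thresh, Npad and MaxClumps configuration parameters, and discarding below-threshold fits from the result, as done here, is an assumption.

```python
import math


def gaussclumps_1d(data, sigma, rms, thresh=2.0, npad=3, max_clumps=None):
    """Greatly simplified 1D sketch of the GaussClumps iteration.

    Returns a list of (peak_index, amplitude) for the accepted clumps.
    """
    residual = list(data)
    data_sum = sum(data)
    fitted_sum = 0.0
    clumps = []
    n_low = 0  # consecutive fits below thresh * rms
    while max_clumps is None or len(clumps) < max_clumps:
        peak = max(range(len(residual)), key=residual.__getitem__)
        amp = residual[peak]
        # "Fit": a Gaussian of the assumed width centred on the brightest residual.
        model = [amp * math.exp(-0.5 * ((i - peak) / sigma) ** 2)
                 for i in range(len(residual))]
        residual = [r - m for r, m in zip(residual, model)]
        fitted_sum += sum(model)
        if amp < thresh * rms:
            n_low += 1  # faint fit: counts towards the Npad-style limit
            if n_low >= npad:
                break
        else:
            n_low = 0
            clumps.append((peak, amp))
        if fitted_sum >= data_sum:
            break  # the fitted Gaussians account for all of the data
    return clumps
```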
ClumpFind: Described by Williams et al. (1994, ApJ 428, 693). This algorithm works by first contouring the data at a multiple of the noise, then searching for peaks of emission, which locate the clumps, and then following them down to lower intensities. No a priori clump profile is assumed. In this algorithm, clumps never overlap. Clumps which touch an edge of the data array are not included in the final list of clumps.
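A simplified 1D sketch of the contour-descending idea behind ClumpFind, assuming explicit contour levels are supplied. New clumps are only started above the lowest level, and clumps touching either end of the array are dropped. Pixels in a contour run that are not yet assigned are simply given to the nearest already-assigned pixel; this nearest-pixel rule is a stand-in for the real assignment scheme, and `clumpfind_1d` is a hypothetical name.

```python
def clumpfind_1d(data, levels):
    """Return clump labels (0 = background) for 1D data."""
    levels = sorted(levels, reverse=True)  # work from the highest level down
    n = len(data)
    labels = [0] * n
    next_label = 1
    for k, level in enumerate(levels):
        lowest = k == len(levels) - 1
        i = 0
        while i < n:
            if data[i] < level:
                i += 1
                continue
            # Find the contiguous run [i, j) at or above this contour level.
            j = i
            while j < n and data[j] >= level:
                j += 1
            assigned = [m for m in range(i, j) if labels[m]]
            if not assigned:
                # A new peak; ignored if first seen at the lowest level.
                if not lowest:
                    for m in range(i, j):
                        labels[m] = next_label
                    next_label += 1
            else:
                # Extend existing clumps down to this level (simplified rule).
                for m in range(i, j):
                    if not labels[m]:
                        nearest = min(assigned, key=lambda a: abs(a - m))
                        labels[m] = labels[nearest]
            i = j
    # Clumps touching an edge of the array are not included.
    for edge_label in {labels[0], labels[-1]} - {0}:
        labels = [0 if lab == edge_label else lab for lab in labels]
    return labels
```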
Reinhold: Based on an algorithm developed by Kim Reinhold at JAC. See SUN/255 for more information on this algorithm. The edges of the clumps are first found by searching for peaks within a set of 1D profiles running through the data, and then following the wings of each peak down to the noise level or to a local minimum. A mask is thus produced in which the edges of the clumps are marked. These edges, however, tend to be quite noisy, and so need to be cleaned up before further use. This is done using a pair of cellular automata which first dilate the edge regions and then erode them. The volume between the edges is then filled with an index value associated with the peak position. Another cellular automaton is used to remove noise from the filled clumps.
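The dilate-then-erode clean-up can be illustrated on a 1D edge mask. This swaps in a plain morphological closing (any-neighbour dilation followed by all-neighbour erosion); the neighbour-count rules of the Reinhold cellular automata are not given here, and the function names are hypothetical.

```python
def dilate(mask):
    """A pixel becomes set if it or any immediate neighbour is set."""
    return [any(mask[max(i - 1, 0):i + 2]) for i in range(len(mask))]


def erode(mask):
    """A pixel stays set only if it and all its immediate neighbours are set."""
    return [all(mask[max(i - 1, 0):i + 2]) for i in range(len(mask))]


def clean_edges(mask):
    """Close small gaps in a noisy edge mask: dilate first, then erode."""
    return erode(dilate(mask))
```

For example, the broken edge region [0, 0, 1, 0, 1, 0, 0] becomes the contiguous region [False, False, True, True, True, False, False].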
FellWalker: Based on an algorithm which walks uphill along the line of greatest gradient until a significant peak is reached. It then assigns all pixels visited along the route to the clump associated with the peak. Such a walk is performed for every pixel in the data array which is above a specified background level. See SUN/255 for more information on this algorithm.
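A minimal 1D sketch of the uphill walk, assuming a fixed background level, that the steepest ascent is simply a step to the higher of the two neighbours, and that a walk reaching an already-assigned pixel joins that pixel's clump. The real FellWalker algorithm has further refinements (see SUN/255) that this sketch ignores, and `fellwalker_1d` is a hypothetical name.

```python
def fellwalker_1d(data, background):
    """Return clump labels (0 = background) for 1D data."""
    n = len(data)
    labels = [0] * n
    peak_label = {}  # peak index -> clump label
    for start in range(n):
        if data[start] <= background or labels[start]:
            continue
        route = [start]
        pos = start
        while True:
            if labels[pos]:  # reached a pixel already on an earlier route
                label = labels[pos]
                break
            neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < n]
            best = max(neighbours, key=lambda p: data[p])
            if data[best] <= data[pos]:  # no higher neighbour: a peak
                label = peak_label.setdefault(pos, len(peak_label) + 1)
                break
            pos = best  # step uphill
            route.append(pos)
        for p in route:  # every pixel visited joins the peak's clump
            labels[p] = label
    return labels
```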
"GCANGLE" - the spatial orientation angle (in degrees, positive from +ve GRID1 axis to +ve GRID2 axis). If set greater than 1, then additional columns will be included holding the initial estimates for the peak and background values, the number of fitting iterations used and the final chi-squared value for the fit.
"MaxClumps" clumps have been identified, or when one of the other termination criteria is met. [unlimited]
"MaxSkip" consecutive clumps cannot be fitted, the iterative fitting process is terminated. [10]
"Npad" consecutive clumps have been fitted all of which have peak values less than the threshold value specified by the "Thresh" parameter, or when one of the other termination criteria is met. [10]
"S0" which encourages the fitted Gaussian value to be below the corresponding value in the observed data at every point (see the Stutzki & Güsten paper). [1.0]
"Sa" which encourages the peak amplitude of each fitted Gaussian to be close to the corresponding maximum value in the observed data (see the Stutzki & Güsten paper). [1.0]
"Sc" which encourages the peak position of each fitted Gaussian to be close to the corresponding peak position in the observed data (see the Stutzki & Güsten paper). [1.0]
"Level1", in which case the contour levels are linearly spaced, starting at a lowest level given by "Tlow" and spaced by "DeltaT". Note, small values of DeltaT can result in noise spikes being interpreted as real peaks, whilst large values can result in some real peaks being missed and merged in with neighbouring peaks. The default value of two times the RMS noise level is usually considered to be optimal, although this obviously depends on the RMS noise level being correct. The value can be supplied either as an absolute data value, or as a multiple of the RMS noise using the syntax "[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [2*RMS]
’th data value at which to contour the data array (where <n> is an integer). Values should be given for "Level1", "Level2", "Level3", etc. Any number of contours can be supplied, but there must be no gaps in the progression of values for <n>. The values will be sorted into descending order before being used. If "Level1" is not supplied (the default), then contour levels are instead determined automatically using parameters "Tlow" and "DeltaT". Note, clumps found at higher contour levels are traced down to the lowest supplied contour level, but any new clumps which are initially found at the lowest contour level are ignored. That is, clumps must have peaks which exceed the second lowest contour level to be included in the returned catalogue. The values can be supplied either as absolute data values, or as multiples of the RMS noise using the syntax "[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). []
"PERSPECTRUM" is set TRUE). If a direct comparison with other implementations of the ClumpFind algorithm is required, a value of 5 should be used (for 3D data) or 20 (for 2D data). []
"Naxis" defines what is meant by an "adjacent" pixel in this sense. The supplied value must be at least 1 and must not exceed the number of pixel axes in the data. The default value equals the number of pixel axes in the data. If the data is 3-dimensional, any given pixel can be considered to be at the centre of a cube of neighbouring pixels. If "Naxis" is 1, only those pixels which are at the centres of the cube faces are considered to be adjacent to the central pixel. If "Naxis" is 2, pixels which are at the centre of any edge of the cube are also considered to be adjacent to the central pixel. If "Naxis" is 3, pixels which are at the corners of the cube are also considered to be adjacent to the central pixel. If the data is 2-dimensional, any given pixel can be considered to be at the centre of a square of neighbouring pixels. If "Naxis" is 1, only those pixels which are at the centres of the square edges are considered to be adjacent to the central pixel. If "Naxis" is 2, pixels which are at square corners are also considered to be adjacent to the central pixel. For one-dimensional data, a value of 1 is always used for "Naxis", and each pixel simply has 2 adjacent pixels, one on either side. Note, the supplied "Naxis" value is ignored if the ADAM parameter "PERSPECTRUM" is set TRUE. []
"Level1". See also "DeltaT". The value can be supplied either as an absolute data value, or as a multiple of the RMS noise using the syntax "[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [2*RMS]
"noise" value. The value can be supplied either as an absolute data value, or as a multiple of the RMS noise using the syntax "[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [2*RMS]
"[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [Noise+2*RMS]
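The "Naxis" adjacency rule for ClumpFind amounts to counting how many pixel indices a neighbour offset changes: cube (or square) faces change one index, cube edges two, and cube corners three. A sketch, with `neighbour_offsets` as a hypothetical helper:

```python
from itertools import product


def neighbour_offsets(ndim, naxis):
    """Offsets of the pixels treated as adjacent under the Naxis rule.

    An offset is adjacent if it changes at least 1 and at most `naxis`
    pixel indices, each by +/-1.
    """
    if not 1 <= naxis <= ndim:
        raise ValueError("Naxis must be between 1 and the number of pixel axes")
    return [off for off in product((-1, 0, 1), repeat=ndim)
            if 1 <= sum(1 for d in off if d) <= naxis]
```

For 3D data this yields 6, 18 or 26 adjacent pixels for Naxis = 1, 2 or 3; for 2D data it yields 4 or 8; and for 1D data the 2 pixels on either side.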
"[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [1.0*RMS]
"PERSPECTRUM" is set TRUE. [1]
"NOISE + 2*RMS", will not be included in the clump. The value of this parameter is the data increment between pixels, and can be supplied either as an absolute data value, or as a multiple of the RMS noise using the syntax "[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [1.0*RMS]
"[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [2.0*RMS]
"[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [Noise]
"[x]*RMS", where "[x]" is a numerical value (e.g. "3.2*RMS"). [2*RMS]
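Many of these configuration values accept either an absolute data value or the "[x]*RMS" form. Below is a minimal parser for just those two forms, assuming the RMS noise level is already known. `parse_level` is hypothetical, accepting lower-case "rms" is an assumption, and default expressions such as "Noise+2*RMS" are outside this sketch.

```python
import re

# A number (optional sign, decimals, exponent) followed by "*RMS".
_RMS_FORM = re.compile(
    r"\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*\*\s*RMS\s*",
    re.IGNORECASE,
)


def parse_level(text, rms):
    """Convert a value such as "3.2*RMS" or "0.75" to an absolute data value."""
    m = _RMS_FORM.fullmatch(text)
    if m:
        return float(m.group(1)) * rms  # multiple of the RMS noise
    return float(text)  # absolute data value
```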