Chapter 4
The SCUBA-2 Pipeline

 4.1 Pipeline overview
 4.2 The science pipeline
  4.2.1 Pipeline recipes
  4.2.2 REDUCE_SCAN
  4.2.3 REDUCE_SCAN_CHECKRMS
  4.2.4 REDUCE_SCAN_EXTENDED_SOURCES
  4.2.5 REDUCE_SCAN_FAINT_POINT_SOURCES
  4.2.6 REDUCE_SCAN_ISOLATED_SOURCE
  4.2.7 FAINT_POINT_SOURCES_JACKKNIFE
 4.3 Running the science pipeline
 4.4 Changing the defaults
  4.4.1 Changing ORAC-DR’s behaviour
  4.4.2 Changing the pipeline recipe
  4.4.3 Changing the configuration file
  4.4.4 Parameter-file options
 4.5 What to look out for
 4.6 Pipeline output
 4.7 Getting your data from CADC

4.1 Pipeline overview

SCUBA-2 data-reduction pipelines have been developed from the existing Orac-dr pipeline (Cavanagh et al., 2008[3]) used for ACSIS. Three distinct pipelines are currently used by SCUBA-2, but users will likely only need to run the science pipeline. The other two, the quick-look (QL) and summit pipelines, run in real time at the JCMT during data acquisition.

The manual for the SCUBA-2 pipeline can be found in SUN/264, and the pipeline software comes as part of the Starlink suite. Data-reduction tutorials are available online1.

4.2 The science pipeline

The science pipeline runs makemap on each observation with a recipe-specific configuration file, calibrates the resulting maps, and co-adds multiple observations into a group file. The individual recipes are described below.

4.2.1 Pipeline recipes

When a project is initially created and MSBs (Minimum Schedulable Blocks) are constructed using the JCMT Observing Tool, the PI can select a pipeline recipe to assign to the data. When the data are run through the science pipeline this recipe is called by default, although it can be overridden on the command line (see Section 4.4). The six main Orac-dr science recipes are described below.

Note: the “dimmconfig*” files can be found in:

  % ls $STARLINK_DIR/share/smurf

4.2.2 REDUCE_SCAN

Configuration file: dimmconfig_jsa_generic.lis

This recipe uses the configuration file dimmconfig_jsa_generic.lis for makemap, unless the source is identified as a calibrator, in which case dimmconfig_bright_compact.lis is used and FCFs are derived from the map. After all observations have been processed, the data are co-added and calibrated in mJy/beam using the default FCF. The noise and NEFD properties of the co-add are calculated and written to log files (log.noise and log.nefd respectively). Finally, the Cupid task findclumps is run using the FellWalker algorithm (Berry, 2015[2]) to create a source catalogue.
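
For reference, the catalogue step can also be reproduced by hand by running Cupid's findclumps on a reduced map. A minimal sketch is shown below; the input file name and the rms value are placeholders, and the pipeline's actual FellWalker settings may differ:

  % cupid
  % findclumps in=gs20140620_30_850_reduced out=clumps \
               outcat=clumps.FIT method=fellwalker rms=0.01

Here outcat receives the FITS source catalogue, while out receives an NDF identifying the pixels assigned to each clump.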

4.2.3 REDUCE_SCAN_CHECKRMS

Configuration file: dimmconfig_jsa_generic.lis

This recipe is the same as REDUCE_SCAN, but includes extra performance estimations determined by SCUBA2_CHECK_RMS (see Picard’s SCUBA2_CHECK_RMS). These extra metrics are written to a log file log.checkrms. Running SCUBA2_CHECK_RMS in the pipeline, rather than as a standalone Picard recipe, allows it to calculate results for co-added maps.
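
SCUBA2_CHECK_RMS can also be applied after the fact to maps you have already reduced by running the Picard recipe directly; a sketch, with an illustrative file name:

  % picard -log sf SCUBA2_CHECK_RMS gs20140620_30_850_reduced.sdf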

4.2.4 REDUCE_SCAN_EXTENDED_SOURCES

Configuration file: dimmconfig_bright_extended.lis

This is the recipe for processing extended sources. Multiple observations are co-added and the output map is calibrated in units of mJy/arcsec2. This recipe also runs a source-finder routine; the results are written as a FITS catalogue (with file extension .FIT) which can be read as a local catalogue into Gaia.
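
Since the catalogue is a standard FITS table, it can also be inspected outside Gaia, for instance with TOPCAT (distributed with Starlink); the catalogue name here is illustrative:

  % topcat mysources.FIT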

4.2.5 REDUCE_SCAN_FAINT_POINT_SOURCES

Configuration file: dimmconfig_blank_field.lis

This is the recipe for processing maps containing faint compact sources. The configuration file passed to makemap is dimmconfig_blank_field.lis and the map is calibrated in mJy/beam. The output map is further processed with a matched filter, and a signal-to-noise (S/N) map is computed to enhance point sources. A map is written out at each step. This recipe also runs a source-finder routine; the results are written as a FITS catalogue (with file extension .FIT) which can be read as a local catalogue into Gaia.
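
If you wish to experiment with the matched filter separately, the same operation can be applied to an existing reduced map with the Picard recipe SCUBA2_MATCHED_FILTER; the file name here is illustrative:

  % picard -log sf SCUBA2_MATCHED_FILTER s20140620_00030_850_reduced.sdf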

4.2.6 REDUCE_SCAN_ISOLATED_SOURCE

Configuration file: dimmconfig_bright_compact.lis

This is the recipe used for processing calibrator data. It can also be used for any map of a single bright, isolated source at the tracking position.

This reduction constrains the map to zero beyond a radius of 1 arcmin from the source centre; see Section 3.7.3.
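
This constraint is applied via the ast.zero_circle parameter in dimmconfig_bright_compact.lis. A minimal custom configuration applying the same mask might look like the following sketch (assuming ast.zero_circle takes a radius in degrees; 0.0167 deg corresponds to 1 arcmin):

  ^$STARLINK_DIR/share/smurf/dimmconfig.lis
  # Force the map to zero beyond 1 arcmin of the tracking centre
  ast.zero_circle = (0.0167)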

4.2.7 FAINT_POINT_SOURCES_JACKKNIFE

Configuration file: dimmconfig_blank_field.lis

This recipe uses a jack-knife method to remove residual low-spatial-frequency noise and create an optimal matched-filtered output map. The map-maker is run twice: first as a standard reduction using dimmconfig_blank_field.lis (calibrated in mJy/beam), and a second time with a fake source added to the time series. This creates a signal map and an effective PSF map. A jack-knife map is generated from two halves of the dataset, and the maps are ‘whitened’ by the removal of the residual 1/f noise. The whitened signal map is then processed with the matched filter, using the whitened PSF map as the PSF input. The data are calibrated in mJy/beam using a corrected FCF. See Section 7.1.2 for a more detailed description of this recipe and the files it produces.
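
The fake-source step relies on makemap's fakemap facility. A hand-rolled sketch of the second run could use a configuration like the following, where the PSF file name and scale factor are placeholders:

  ^$STARLINK_DIR/share/smurf/dimmconfig_blank_field.lis
  # Add an artificial source (stored in an NDF) to the raw time series
  fakemap = mypsf.sdf
  # Scale factor applied to the fake map before adding it
  fakescale = 1.0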

4.3 Running the science pipeline

Note: Data-reduction tutorials are available online.

Step 1: Initialise ORAC-DR

For 850-μm data, this is done by:

  % oracdr_scuba2_850 -cwd

For 450-μm data, this is done by:

  % oracdr_scuba2_450 -cwd

Step 2: Set environment variables

These ensure the data are read from and written to the right places. Many are set automatically when the pipeline is initialised but others must be set manually. Details of the optional variables are given in SUN/264 but the three main ones are:

  • STARLINK_DIR – Location of your Starlink installation.
  • ORAC_DATA_IN – The location where the data should be read from. If you are supplying a text file listing the raw data this should be the location of the files listed, unless they are given as full path names.
  • ORAC_DATA_OUT – The location where the data products should be written. Also used as the location for a user-specified configuration file.

Example: Setting ORAC_DATA_IN to be the current directory for C shells (csh, tcsh):

  % setenv ORAC_DATA_IN .

and for Bourne shells (sh, bash, zsh):

  % export ORAC_DATA_IN=.

Step 3: Run the pipeline

This is done by:

  % oracdr -files <list_of_files>

where <list_of_files> is a text file listing the raw data files of one or more observations, one per line, with full path names.
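
For example, a text file mylist.lis (path names illustrative) might contain:

  /mydata/raw/s8a20120725_00045_0001.sdf
  /mydata/raw/s8a20120725_00045_0002.sdf
  /mydata/raw/s8a20120725_00045_0003.sdf

and the pipeline would then be run as:

  % oracdr -files mylist.lis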

Tip:
If you run with -verbose on the command line you will obtain all messages from the Starlink engines (rather than just Orac-dr messages). This is particularly useful for understanding what is occurring during the map-making stage of the reduction, and is especially recommended for new users.

When executing the Orac-dr command, various graphical windows may appear showing the pipeline results, unless -nodisplay is specified. In addition, if the -log option includes x (as it does by default), a new Xwindow will appear containing the pipeline output, as shown in Figures 4.1–4.4.


Figure 4.1: The Xwindows output from the Orac-dr pipeline showing the initial log—here we see the pipeline is checking for the raw files.



Figure 4.2: The Xwindows output from the Orac-dr pipeline—here we see the data being reduced with the recipe REDUCE_SCAN_EXTENDED_SOURCES. The pipeline also reports the name of the observation being reduced and the duration of the observation.



Figure 4.3: The Xwindows output from the Orac-dr pipeline—here we see the FCF being applied and the graphics being created, along with a group file (a file containing co-added observations from a single night, if provided).



Figure 4.4: The Xwindows output from the Orac-dr pipeline—here we see the pipeline process has completed.


4.4 Changing the defaults

4.4.1 Changing ORAC-DR’s behaviour

Orac-dr’s behaviour can be changed on the command line. For help, simply type:

  % oracdr -help

To run the pipeline and obtain all messages from the Starlink engines (rather than just Orac-dr messages), run with the -verbose option (recommended):

  % oracdr -files <list_of_files> -verbose

To have the results sent to the screen (s) and written to a file (f), use the -log option. The file produced is usually called .oracdr_NNNN.log, where NNNN is the current process ID; it is a hidden file written to $ORAC_DATA_OUT.

  % oracdr -files <list_of_files> -log sf -verbose

4.4.2 Changing the pipeline recipe

You can override the recipe set in the header by giving a different one on the command line when starting Orac-dr. For example:

  % oracdr -files <list_of_files> -log sf REDUCE_SCAN_CHECKRMS

You can find out which recipe is set in the data header by examining the RECIPE keyword in the FITS header of any of your raw files. For example, both of these commands will return the same result (ensure KAPPA commands are available before running):

  % fitsval s8a20120725_00045_0003 RECIPE
  % fitslist s8a20120725_00045_0003 | grep RECIPE

4.4.3 Changing the configuration file

Although each recipe calls one of the standard configuration files, you can specify your own. To do so, create a recipe parameter file that sets the parameter MAKEMAP_CONFIG to your new configuration file. The first line must be the name of the recipe used in the reduction, enclosed in square brackets.

For example, to run the pipeline with REDUCE_SCAN_CHECKRMS and a configuration file called myconfig.lis, the recipe parameter file (mypars.ini) would look like this:

  [REDUCE_SCAN_CHECKRMS]
  MAKEMAP_CONFIG = myconfig.lis

Then run the pipeline, pointing to the parameter file via the -recpars option:

  % oracdr -files <list_of_files> -log sf -recpars mypars.ini REDUCE_SCAN_CHECKRMS
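
The configuration file itself, placed in $ORAC_DATA_OUT, typically inherits one of the standard dimmconfig files and overrides individual parameters. A sketch of what myconfig.lis might contain (the override shown is purely illustrative):

  ^$STARLINK_DIR/share/smurf/dimmconfig_jsa_generic.lis
  # Suppress scales larger than 300 arcsec in the time-series filtering
  flt.filt_edge_largescale = 300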

4.4.4 Parameter-file options

To supply both a new configuration file and a different set of clump-finding parameters, the parameter file mypars.ini would look like this (here written for the REDUCE_SCAN recipe):

  [REDUCE_SCAN]
  MAKEMAP_CONFIG = mynewconfig.lis
  FINDCLUMPS_CFG = myfellwalkerparams.lis

Other options we can change in the parameter file include changing the pixel size

  [REDUCE_SCAN]
  MAKEMAP_PIXSIZE = 2

changing the output units to mJy/beam

  [REDUCE_SCAN]
  CALUNITS = beam

and changing the output units to mJy/arcsec2

  [REDUCE_SCAN]
  CALUNITS = arcsec
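
These options can be combined in a single parameter file, for example (all values illustrative):

  [REDUCE_SCAN]
  MAKEMAP_CONFIG = mynewconfig.lis
  MAKEMAP_PIXSIZE = 2
  CALUNITS = arcsec
  FINDCLUMPS_CFG = myfellwalkerparams.lis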

4.5 What to look out for

Once the map-maker has completed you can open your output map using Gaia (see Figure 4.5). The excerpt in Chapter 5 shows the output written to the terminal as you run the map-maker. There are a number of clues in this output that indicate the status of the reduction.


Figure 4.5: Map of CRL 2688 produced with the Smurf task makemap using the iterative algorithm with default parameters.


The number of input files
The first thing to note is the number of input files; it is worth checking that this matches your expected number. The source name, UT date and scan number are also summarised.
Map dimension
Next, the basic dimensions of the data being processed are listed near the start of the first iteration. The example above has 4-arcsec pixels, the default at 850 μm.
Chunking
The map-maker then determines whether the raw data should be split and processed in more than one chunk. In this map the data are reduced in one continuous piece: Continuous chunk 1 / 1. Chunking, where the map-maker processes sub-sections of the time-series data independently, should be avoided if possible (see the text box on Chunking).
Quality statistics

At the beginning of the reduction, the main purpose of QUALITY flagging is to indicate how many bolometers are being used. In the example above you can see that, from a total of 5120 bolometers, 1842 were turned off during data acquisition (BADDA). In addition, 136 bolometers exceeded the acceptable noise threshold (NOISE), while tiny fractions of the data were flagged because the telescope was moving too slowly (STAT) or the samples were adjacent to a step that was removed (DCJUMP).

The total number of bad bolometers (BADBOL) is 1984, leaving 5120 − 1984 = 3136 nominally good bolometers. Accounting for the small numbers of additionally flagged samples, 3128.22 effective bolometers are available after initial cleaning2.

After each subsequent iteration a new ‘Quality’ report is produced, indicating how the flags have changed. An important flag that appears in the ‘Quality’ report following the first iteration is COM: the DIMM rejects bolometers (or portions of their time series) if they differ significantly from the common-mode (average) of the remaining bolometers.

You may note that compared with the initial report, the total number of samples with good ‘Quality’ (Total samples available for map) has dropped from 18634826 to 18273302 (about a 2 per cent decrease) as additional samples were flagged in each iteration.

Be aware that some large reductions may take many iterations to reach convergence, and you may find significantly fewer bolometers remaining, resulting in higher noise than expected.

Convergence

Convergence is controlled by the maptol parameter, and the normalized map change is reported after each iteration:

  smf_iteratemap: *** NORMALIZED MAP CHANGE: 0.10559 (mean) 2.81081 (max)

The number to watch is the mean value of the NORMALIZED MAP CHANGE: it must drop below your requested maptol for convergence to be achieved.

The default configuration file used in this example executes a maximum of five iterations (numiter = 5), but stops sooner if the mean normalized map change drops below maptol = 0.05. In this example it runs the full five iterations.
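
To iterate until convergence rather than for a fixed count, a custom configuration file can instead give a negative numiter, which makemap treats as an upper limit on the number of iterations while stopping as soon as maptol is reached. A sketch, with illustrative values:

  ^$STARLINK_DIR/share/smurf/dimmconfig_jsa_generic.lis
  # Up to 40 iterations, stopping when the mean map change drops below 0.01
  numiter = -40
  maptol = 0.01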

Tip:
You can interrupt the processing at any stage with a single Ctrl-C. The map-maker will complete the current iteration and then write out a final science map. Entering Ctrl-C twice will kill the process immediately.

4.6 Pipeline output

The pipeline will produce a group file for each object being processed. If the pipeline is given data from multiple nights, all those data will be included in the group co-add using inverse variance weighting.

The final maps in your output directory will have the suffix _reduced. Maps are made for individual observations, with names starting with an s for “SCUBA-2” (e.g. s20140620_00030_850_reduced.sdf). Group maps, which may contain co-added observations from a single night, are also produced; these have the prefix gs for “group SCUBA-2” followed by the date and scan number of the first input file (e.g. gs20140620_30_850_reduced.sdf).

Note: A group file is always created, even if only a single observation is being processed.

Additionally, PNG images are made of the reduced files at a variety of resolutions.

Another useful feature is that the pipeline generates log files recording various useful quantities. The standard log files from reducing science data include log.noise and log.nefd (see Section 4.2.2), with log.checkrms added when running REDUCE_SCAN_CHECKRMS.
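
An output directory might therefore contain files along these lines (names illustrative, with PNG previews and intermediate files omitted):

  % ls $ORAC_DATA_OUT
  gs20140620_30_850_reduced.sdf
  log.nefd
  log.noise
  s20140620_00030_850_reduced.sdf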

4.7 Getting your data from CADC

The JCMT Science Archive is hosted by The Canadian Astronomy Data Centre (CADC). Both raw data and data processed by the science pipeline are made available to PIs and co-Is through the CADC interface (https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/jcmt/).

To access proprietary data you will need to have your CADC username registered by the EAO and thereby associated with the project code. Please contact your Friend of Project or helpdesk@eaobservatory.org to register your account.

An important search option to be aware of is ‘Group Type’, where the options are Simple, Night, Project and Public. Simple (which becomes ‘obs’ on the results page) is an individual observation; Night is the group file from the pipeline (which may or may not include more than one observation; the ‘Group Members’ value will tell you); and Project is generated when an entire project has been run through the pipeline, with identical sources across the project co-added into master group files.

1https://www.eaobservatory.org/jcmt/science/reductionanalysis-tutorials/

2The fractional number is due to time-slices being removed during cleaning. The number of bolometers is then reconstructed from the number of remaining time-slices.