The iterative map-maker developed for Smurf [1] is a sophisticated algorithm that is able to reduce the large amounts of data collected by SCUBA-2 in a reasonable amount of time. The algorithm measures and removes correlated noise by iteratively solving for sources of noise along with the sky signal. This is much less computationally intensive than a full generalized least squares solution (e.g. [2]), which requires careful measurements of noise auto- and cross-power spectra and the inversion of very large matrices. Due to the iterative nature of the algorithm, however, the full data set must be accessed repeatedly. Since the data rate is high (tens of GB per hour of observation), reading the full data set (often many hours of data) from disk is slow. Re-reading the data on every iteration would take a considerable amount of time, so it is desirable to keep the data in memory once it has been read the first time. This limits the amount of data that can be reduced at once: longer observations with data sizes larger than the available memory must be reduced in chunks and combined after the fact, if caching and re-reading the data is to be avoided.
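To illustrate the general style of such a reduction, the sketch below alternates between estimating a common-mode (correlated) noise signal shared by all detectors and re-binning the cleaned time streams into a map. It is a toy example rather than the Smurf implementation; the array sizes, variable names, and fixed iteration count are assumptions made for brevity.

```c
/* Minimal sketch of an iterative map-maker: alternately estimate a
 * common-mode (correlated) noise signal and the sky map.  Not the
 * Smurf algorithm; all sizes and names here are illustrative. */
#include <stdio.h>

#define NDET   4      /* number of detectors (toy value)   */
#define NSAMP  8      /* samples per detector (toy value)  */
#define NPIX   3      /* pixels in the output map          */

int main(void) {
    /* data[d][t]: raw time stream; pix[d][t]: map pixel hit at each sample */
    double data[NDET][NSAMP] = {{0}};
    int    pix[NDET][NSAMP]  = {{0}};
    double map[NPIX] = {0.0}, com[NSAMP];

    /* ... fill data[] and pix[] from the observation here ... */

    for (int iter = 0; iter < 10; iter++) {
        /* 1. Estimate the correlated (common-mode) noise as the mean,
         *    over detectors, of the residual left after removing the
         *    current sky estimate from each time stream. */
        for (int t = 0; t < NSAMP; t++) {
            double sum = 0.0;
            for (int d = 0; d < NDET; d++)
                sum += data[d][t] - map[pix[d][t]];
            com[t] = sum / NDET;
        }

        /* 2. Re-estimate the map by binning the common-mode-subtracted
         *    data into pixels (simple unweighted average). */
        double num[NPIX]  = {0.0};
        int    hits[NPIX] = {0};
        for (int d = 0; d < NDET; d++) {
            for (int t = 0; t < NSAMP; t++) {
                num[pix[d][t]] += data[d][t] - com[t];
                hits[pix[d][t]]++;
            }
        }
        for (int p = 0; p < NPIX; p++)
            if (hits[p] > 0) map[p] = num[p] / hits[p];
    }

    for (int p = 0; p < NPIX; p++)
        printf("map[%d] = %g\n", p, map[p]);
    return 0;
}
```

Note that every iteration of the outer loop touches the full data[] array, which is why keeping the data resident in memory, rather than re-reading it from disk each time, matters so much in practice.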
The next generation of submillimetre instruments, such as SWCam [3], the short-wavelength camera for CCAT [4], will have 10 times as many detector elements as SCUBA-2 and will be sampled at kHz rates, compared to SCUBA-2's Hz-scale sampling. This large increase in data volume presents a serious problem for data reduction, both in terms of memory usage and processing time. Several approaches have been considered to account for this large increase in data volume:
The distributed-memory parallelized version of the iterative map-maker has been described elsewhere [6]; in this document we add more detail and discuss some practical considerations.
A note on terminology: a distributed-memory cluster consists of a number of connected nodes. Each node may contain a number of processors or cores. A single node can support multithreaded parallelization using memory shared between threads. The term process refers to a single executable that is run on a node; a process is able to spawn multiple threads to take advantage of a multicore node. The term system refers to the set of all processes involved in the calculation.
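For concreteness, the sketch below shows how this terminology maps onto code: each process is started under MPI, determines its rank within the system, and spawns threads on its own node. The hybrid MPI-plus-OpenMP combination and the final reduction shown here are illustrative assumptions, not a description of the actual map-maker; the calls used are standard MPI and OpenMP routines.

```c
/* Sketch of the process/thread structure described above: one MPI
 * process per node, each spawning shared-memory threads on its cores.
 * The MPI+OpenMP pairing is an assumption made for illustration. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int provided, rank, nproc;

    /* Request thread support so each process may spawn its own threads. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    /* Each process works on its share of the data with shared-memory
     * threads; here each thread simply reports itself. */
    #pragma omp parallel
    {
        printf("process %d of %d: thread %d of %d\n",
               rank, nproc, omp_get_thread_num(), omp_get_num_threads());
    }

    /* The processes then combine their partial results, e.g. by
     * summing partial maps; a scalar stands in for a map here. */
    double partial = (double)rank, total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("combined result: %g\n", total);

    MPI_Finalize();
    return 0;
}
```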
1. Message Passing Interface: a standard defining a set of routines to pass messages between processes.