### POL2NOISE

Analyse the noise in a POL2 vector catalogue

#### Description:

This script manipulates the noise estimates contained within a POL2 vector catalogue. Currently two options are available, selected by parameter MODE:
• " Display" - this displays the noise values stored in a specified column of the catalogue (e.g. " DQ" ) and compares them to the noise values estimated from the variations of the corresponding value column (e.g. " Q" ). This allows the noise values (" DQ" etc) to be verified.

• " Remodel" - This replaces all the noise values stored in a specified catalogue with values based on smoother models of the background and source noise. This helps to remove outlying noise estimates that are anomalously low or high because of the random nature of the MAPVAR errors created by the pol2map script.

#### Usage:

pol2noise cat column [mode] [perc] [presmooth] [style] [device]

#### Parameters:

The input vector catalogue. This should have been created by POL2MAP.
The name of the catalogue column to be used if parameter MODE is " Display" . Both the named column and the associated error column (" $<$X$>$" and " D$<$X$>$" ) must exist in the catalogue. The name must be one of " Q" , " U" , " I" or " PI" .
##### DEBIASTYPE = LOGICAL (Given)
Gives the type of bias estimator to use if paremeter MODE is " Remodel" , using the nomeclature of Montier at al " Polarization measurements analysis II. Best estimators of polarization fraction and angle" (A&A, 2018):
• " AS" : The asymptotic estimator. See section 2.3 of Montier et al. This estimator produces bad P and PI values if the squared PI value is less than the variance in PI.

• " MAS" : The modified asymptotic estimator. See section 2.5 of Montier et al. This estimator does not produces bad P and PI values, even if the squared PI value is less than the variance in PI.

• " None" : No de-biasing.

The graphics workstation to use. The default is the current graphics device as previously set using KAPPA:GDSET. If GDSET has not been used to establish a current graphics device, the user is prompted for a device. Use KAPPA:GDNAMES to see a list of available device names. []
An NDF that contains an exposure time map for the data in the supplied catalogue. Only used if parameter MODE is " Remodel" . For instance, the " iext" , " qext" or " uext" map that was created by POL2MAP at the same time as the catalogue could be supplied.
Controls the level of information to write to a text log file. Allowed values are as for " ILEVEL" . The log file to create is specified via parameter " LOGFILE. Note, the default value is " NONE" , which caused no log file to be created. Setting this parameter to another value (e.g. " ATASK" ) causes the log file to be produced. [" NONE" ]
Controls the level of information displayed on the screen by the script. It can take any of the following values (note, these values are purposefully different to the SUN/104 values to avoid confusion in their effects):
• " NONE" : No screen output is created

• " CRITICAL" : Only critical messages are displayed such as warnings.

• " PROGRESS" : Extra messages indicating script progress are also displayed.

• " ATASK" : Extra messages are also displayed describing each atask invocation. Lines starting with " $>$$>$$>$" indicate the command name and parameter values, and subsequent lines hold the screen output generated by the command.

• " DEBUG" : Extra messages are also displayed containing unspecified debugging information.

[" PROGRESS" ]

The name of the log file to create if GLEVEL is not NONE (which it is by default). The default log file name is " pol2noise.log" and is created in the current directory. Any file with the same name is over-written. Any old log file will be closed before the new one is opened. []
The operation to be performed on the input catalogue specified by parameter CAT:
• " DISPLAY" : Verify the noise estimates on a single quantity by comparing them to the local variations of the quantity. See parameters COLUMN, DEVICE, PER, PRESMOOTH.

• " REMODEL" : Replace the noise estimates in the catalogue using a smoother model. See parameters OUT, EXPTIME, DEBIASTYPE.

[" Display" ]

Controls the default level of information reported by Starlink atasks invoked within the executing script. The accepted values are the list defined in SUN/104 (" None" , " Quiet" , " Normal" , " Verbose" , etc). [" Normal" ]
The output FITS vector catalogue. Only used if parameter MODE is " Remodel" .
The percentile corresponding to the highest " D$<$X$>$" value to include in the scatter plot. Only used if parameter MODE is " Display" . In the range 20 to 100. A value below 100 causes the edge pixels, which usually have very high variances, to be excluded from the plot. A red contour is displayed over the " D$<$X$>$" map indicating the noise level corresponding to the value of PERC. [90]
Controls initial smoothing of the " D$<$X$>$" map in " Method 2" . If a value is supplied for PRESMOOTH, then the " D$<$X$>$" map read from the catalogue is first smoothed using a Gaussian kernel before being used. The value of PRESMOOTH gives the FWHM of the Gaussian kernel, in pixels. If a null (!) value is supplied for PRESMOOTH (which is the default value), the " D$<$X$>$" values read from the catalogue are used directly as the second noise estimate, without any smoothing. [!]
Should the temporary directory containing the intermediate files created by this script be retained? If not, it will be deleted before the script exits. If retained, a message will be displayed at the end specifying the path to the directory. [FALSE]
A group of attribute settings describing the plotting style to use for the annotated axes, etc.

A comma-separated list of strings should be given in which each string is either an attribute setting, or the name of a text file preceded by an up-arrow character " ^" . Such text files should contain further comma-separated lists which will be read and interpreted in the same manner. Attribute settings are applied in the order in which they occur within the list, with later settings overriding any earlier settings given for the same attribute.

Each individual attribute setting should be of the form:

$<$name$>$=$<$value$>$

where $<$name$>$ is the name of a plotting attribute, and $<$value$>$ is the value to assign to the attribute. Default values will be used for any unspecified attributes. All attributes will be defaulted if a null value (!)—the initial default—is supplied.

See section " Plotting Attributes" in SUN/95 for a description of the available attributes. Any unrecognised attributes are ignored (no error is reported). The appearance of the markers in the scatter plot is controlled by the attributes " Colour(Markers)" , " Width(Markers)" , etc. Likewise the appearance of the best fit line (and the contour lines) is controlled using " Colour(Curves)" , " Width(Curves)" , etc. [current value]

#### Display Mode

The " display" mode compares the error estimates stored in the error columns of the catalogue (e.g. the values in the DI column) with estimates of the errors derived from the value columns (e.g. the I column). This comparison is limited to regions where the true astronomical signal can be assumed to be zero (i.e. the background regions). This mode can only be used with the intensity-like values in the catalogue (e.g. I, Q, U and PI).

Two different methods are used for comparing the noise values. Ideally they should both show that the error columns stored in the catalogue accurately describe the noise in the data columns. However, in reality they will both give reports that depart from this ideal by differing amounts. Results for both methods are displayed on the graphics device specified by parameter DEVICE. The results from " Method 1" are displayed in the lower row of three pictures on the graphics device, and the results from " Method 2" are displayed in the upper row of three pictures. For convenience, the scatter plot produced by method 1 (top right picture) is overlayed in blue on top of the scatter plot produced by method 1 (lower right plot).

Method 1:

Firstly, a mask is created that is later used to identify background regions. This is based on the total intensity, I, regardless of the column being checked, since I is usually much brighter than Q, U or PI. The values from catalogue columns " I" and " DI" are extracted into a pair of 2-dimensional maps. A basic SNR map is then formed (I/DI) and the FellWalker algorithm within the Starlink CUPID package (see SUN/255) is used to identify contiguous clumps of pixels with SNR higher than 5 and to extend each such clump down to an SNR of 2.

Next, the values from the requested catalogue columns, " $<$X$>$" and " D$<$X$>$" , are extracted into a pair of 2-dimensional maps and masked to remove source regions (i.e. the clumps of high SNR identified by FellWalker).

The full range of " D$<$X$>$" values in the remaining background is divided into a set of bins, and each " $<$X$>$" value is then placed into a bin on the basis of its corresponding " D$<$X$>$" value. The "$<$X$>$" values in each bin should in principle have a mean value of zero since they are all background pixels. The standard deviation of the " $<$X$>$" values in each bin is found and plotted against the " D$<$X$>$" value associated with the bin (which varies across the map, being larger nearer the edges). Ideally the resulting best fit line should have a slope of unity and an offset of zero, indicating that the noise estimate " D$<$X$>$" associated with each bin is a good measure of the standard deviation of the " $<$X$>$" values in the bin.

The masked "$<$X$>$" and " D$<$X$>$" maps are displayed using a common scaling on the graphics device specified by parameter DEVICE. A scatter plot showing the standard deviation of the " $<$X$>$" values in each bin on the vertical axis and the RMS " D$<$X$>$" value in each bin on the horizontal axis is also displayed. The slope and offset of the best fitting straight line are displayed on standard output, together with the RMS residual of the fit. The upper data limit included in the scatter plot is displayed as a red contour on the two map.

The " D$<$X$>$" map may optionally be smoothed using a Gaussian kernel before being used - see parameter PRESMOOTH.

The size of each " D$<$X$>$" bin and the data included in the scatter plot can make a significant difference to the final slope and offset. The first bin (lowest D$<$X$>$) is centred on the peak of the D$<$X$>$ histogram. This histogram is usually heavily skewed with a very rapid rise at low D$<$X$>$ values followed by a long tail to higher D$<$X$>$ values. The bin width is 1.5 times the FWHM of the histogram peak, as determined solely from the D$<$X$>$ values below the peak. All bins have equal width, and the highest bin includes the D$<$X$>$ value corresponding to the value of parameter PERC. Any points below the first bin or above the last bin are excluded from the scatter plot. This binning scheme aims to reduce statistical bias at the low D$<$X$>$ end, which tends to cause the lowest D$<$X$>$ points in the scatter plot to be slightly higher than they should be. This bias is caused by there being few points at lower D$<$X$>$ to balance those with higher D$<$X$>$ value.

Method 2:

Firstly, the values from catalogue columns " I" and " DI" are extracted into a pair of 2-dimensional maps. A basic SNR map is then formed (I/DI) and significant spatial structures are identified and blanked out using the KAPPA:FFCLEAN command on the SNR map. The SNR map is used here, instead of simply using " I" , in order to flatten the noise level across the map, which helps FFLCEAN. Each blanked out region in this mask (i.e. each source area) is then enlarged slightly to remove any remaining nearby low-level source pixels.

Next, the values from catalogue columns " $<$X$>$" and " D$<$X$>$" are extracted into a pair of 2-dimensional maps and masked (using the mask described above) to remove source regions.

The first noise estimate measures the spatial variation in pixel value in the neighbourhood of each pixel in the masked " $<$X$>$" map. It is formed by first squaring the masked " $<$X$>$" map, then smoothing the squared map using a Gaussian smoothing kernel, then taking the square root of the smoothed map. Thus each pixel value in the final map represents the RMS of the nearby pixels in masked " $<$X$>$" map. The FWHM of the Gaussian smoothing kernel is chosen in order to maximise the correlation between the first and second estimates of the noise.

The " D$<$X$>$" map, which holds the second noise estimate, may optionally be smoothed using a Gaussian kernel before being used - see parameter PRESMOOTH.

The maps holding the two masked noise estimates are displayed using a common scaling on the graphics device specified by parameter DEVICE. A scatter plot of the values in these two maps is also displayed. The slope and offset of the best fitting straight line, based on the visible points in the scatter plot, are displayed on standard output, together with the RMS residual of the fit. The upper data limits to be included in the scatter plot can be controlled by parameter PERC, and are displayed as red contours on the two maps.

#### Remodel Mode

The " remodel" mode creates an output catalogue holding a copy of the input catalogue, and then calculates new values for all the error columns in the output catalogue. The new I, Q and U error values are first derived from a three component model of the noise in each quantity:

The " background component" is derived from the exposure time map (obtained using parameter EXPTIME). The background component is equal to " A$\ast$(exptime^B)" where A and B are constants determined by doing a linear fit between the log of the noise estimate in the catalogue (DQ, DU or DI) and the log of the exposure time (in practice, B is usually close to -0.5). The fit is limited to background areas in the signal map, but also excludes a thin rim around the edge of the map where the original noise estimates are subject to large inaccuracies. Since the exposure time map is usually very much smoother than the original noise estimates, the background component is also much smoother.

The " source component" represents the extra noise found in and around compact sources and caused by pointing errors, calibration errors, etc. The background component is first subtracted from the catalogue noise estimates and the residual noise values are then modelled using a collection of Gaussians. This modeling is done using the GaussClumps algorithm provided by the findclumps command in the Starlink CUPID package. The noise residuals are first divided into a number of " islands" , each island being a collection of contiguous pixels with noise residual significantly higher than zero (this is done using the FellWalker algorithm in CUPID). The GaussClumps algorithm is then used to model the noise residuals in each island. The resulting model is smoothed lightly using a Gaussian kernel of FWHM 1.2 pixels.

The " residual component" represents any noise not accounted for by the other two models. The noise residuals are first found by subtracting the other two components from the original catalogue noise estimates. Any strong outlier values are removed and the results are smoothed more heavily using a Gaussian kernel of FWHM 4 pixels.

The final model is the sum of the above three components. The new DI, DQ and DU values are found independently using the above method. The errors for the derived quantities (DPI, DP and DANG) are then found from DQ, DU and DI using the usual error popagation formulae. Finally new P and PI values are found using a specified form of de-biasing (see parameter DEBIASTYPE).

The results of the re-modelling are displayed on the graphics device specified by parameter DEVICE. A row of four pictures is created for each Stokes parametyer (I, Q and U). From left to right, these are:

• An image of the original error estimates in the supplied catalogue.

• An image of the re-modelled error estimates in the output catalogue.

• An image of the residuals between original and re-modelled error estimates.

• A scatter plot of re-modelled against original error estimates.

The images of the original and re-modelled error estimates use the same scaling. The image of the residuals is scaled between the 2nd and 98th percentiles.