Handling data quality


Introduction to ADAM Programming

12 Handling data quality

A data array may contain elements which are not of good quality. In the present context, this does not mean that the data are merely noisier than the observer had hoped, but rather that there are data elements whose values are fundamentally flawed.

Such bad values can arise in a variety of ways. For example, a bad pixel in a data array may be due to a dead element in a CCD chip during an observing run. Bad values may also be the result of data processing. The example considered below deals with bad values due to attempting to take the square root of a negative number. Two methods of dealing with data quality in NDFs are available, as follows:

Bad or ‘magic’ values: – bad data values are replaced with special bad values. Each data type has an associated bad value. For example, the bad value for real data on a VAX is defined as FFFFFFFF in hexadecimal, which is approximately -1.7014E38 (see SGP/38). However, this and other bad values are system dependent and programs must refer to them using symbolic constants. These symbolic constants are defined in the include file with logical name PRM_PAR (see also Section 14) and are of the form VAL__BADx, where x is one of R, D, I, W, UW, B or UB, corresponding to the HDS data types _REAL, _DOUBLE, _INTEGER, _WORD, _UWORD, _BYTE and _UBYTE respectively.
Quality arrays: – data quality can also be indicated by using a quality array associated with a data array. Non-zero quality values generally indicate that the associated data element is bad. However a quality array is not normally used merely to differentiate between good and bad data as it requires an extra array – unlike the bad value method. The advantage of a quality array is that different indicators of quality may be set. For example, IUE data have associated flags which are used to indicate one of a range of conditions which may apply to a data element; a pixel may be subject to microphonic noise or be saturated or coincide with a reseau mark etc. An application may wish to differentiate between these conditions. This topic is not considered further, but the reader is referred to SUN/33, Section 10, for a description of implementing such a scheme.

An NDF may use either of the above methods – or both – or indeed have no indication of data quality at all. When a data array is mapped, the bad value for the data type is automatically inserted into the mapped array in place of any bad data elements.

ADAM_EXAMPLES:SQROOT.FOR takes the square root of each element in the input data array. Such an application must consider what to do in the event that any of the input data are negative. The correct behaviour is to check for this condition and insert the bad value for the data type as follows:

        INCLUDE 'PRM_PAR'                      ! Defines VAL__BADR etc
        REAL IN(NELM), OUT(NELM)
  *   ........
        DO I=1,NELM
  *      Test if input value is negative (zero has a valid square root).
           IF(IN(I).LT.0.0)THEN
              OUT(I)=VAL__BADR
           ELSE
              OUT(I)=SQRT(IN(I))
           ENDIF
        ENDDO

However, if an application is going to consider bad pixels, it should also recognise the possibility of bad input pixel values. Such values should be propagated as bad (unless explicitly repaired in some fashion). So the test shown above should be amended as follows:

  *    Test if input is negative or is itself a bad value.
           IF ( IN(I).LT.0.0 .OR. IN(I).EQ.VAL__BADR ) THEN
              OUT(I)=VAL__BADR

The above example illustrates the only two operations which should be conducted with bad values, i.e. assignment and comparison. It is inappropriate to perform any arithmetic function such as taking the square root of a bad value.
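The fragments above can be gathered into one routine. The following is a minimal sketch, not the text of ADAM_EXAMPLES:SQROOT.FOR: the routine name SQRTBD and the argument BADVAL are hypothetical, and BADVAL stands in for VAL__BADR so that the sketch compiles without the PRM_PAR include file. It also counts the bad values it assigns, which is useful when the bad-pixel flag is set later.

```fortran
      SUBROUTINE SQRTBD (NELM, IN, BADVAL, OUT, NBAD)
!     Take the square root of each element of IN, writing BADVAL
!     wherever the input is negative or is itself bad.
!     BADVAL is a hypothetical argument standing in for VAL__BADR.
      INTEGER NELM, NBAD, I
      REAL IN(NELM), OUT(NELM), BADVAL

      NBAD = 0
      DO I = 1, NELM
!        Only comparison and assignment are applied to bad values.
         IF (IN(I).EQ.BADVAL .OR. IN(I).LT.0.0) THEN
            OUT(I) = BADVAL
            NBAD = NBAD + 1
         ELSE
            OUT(I) = SQRT(IN(I))
         ENDIF
      ENDDO
      END
```

In a real application the caller would pass VAL__BADR as BADVAL; on return, NBAD holds the number of output elements that were set bad.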

The bad-pixel flag.

Ideally all applications which process data should consider the possibility of bad data values. A simple application which divides each element in a data array by two may not itself give rise to new bad pixels, but it should trap the case where the input value is bad, as the output ought to contain the appropriate bad value, not the bad value divided by two!

However, many data arrays contain no bad values, and it is obviously undesirable that all applications be forced to check every data element as shown in the previous fragment of code. In order to address this problem, each array component (such as the main data or variance array) of an NDF has an associated logical bad-pixel flag. This is set to .FALSE. if there are definitely no bad pixels present, whereas .TRUE. indicates that bad pixels may or may not be present. (The uncertainty in the latter case arises because of the difficulty of keeping track of whether bad pixels have been set; certain operations may introduce bad pixels, but the NDF system cannot be sure whether this has in fact happened without checking each data value explicitly, which is too time-consuming a procedure to perform by default.)

Two common situations where it is useful to know whether input data contain bad values are as follows:

(1): An application may choose not to handle data which contain bad pixels. Such an application should check whether an input data array contains such values in order that the user may be informed of the difficulty (and probably the application aborted). An example is shown in Section 13.
(2): It is more efficient for a program to deal separately with the case where no bad pixels are present.

In both the cases cited above, an application should find the value of the bad-pixel flag for any data arrays of interest. The call below will cause the value of the logical variable BAD to be set according to whether there are bad pixels in the main data array of the NDF.

CALL NDF_BAD (NDF, 'Data', CHECK, BAD, STATUS)

The input logical argument CHECK requires some explanation. If CHECK is set to .FALSE. the value of the bad-pixel flag returned is as described above, i.e.
BAD=.FALSE. $\Rightarrow$ definitely no bad pixels present, whereas BAD=.TRUE. $\Rightarrow$ bad pixels may be present.
However, if CHECK is set to .TRUE. this forces an explicit (and time consuming) check for the presence of bad data values and the bad-pixel flag becomes set as follows:
BAD=.FALSE. $\Rightarrow$ no bad pixels present, and BAD=.TRUE. $\Rightarrow$ bad pixels are present.
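Conceptually, an explicit check of this kind amounts to scanning every element of the array, which is why its cost grows with the size of the data. The function below is a sketch of such a scan for a mapped _REAL array. It is not the NDF library's implementation; the names ANYBAD and BADVAL are hypothetical, with BADVAL standing in for VAL__BADR.

```fortran
      LOGICAL FUNCTION ANYBAD (NELM, A, BADVAL)
!     Return .TRUE. if any element of A equals BADVAL.
!     BADVAL is a hypothetical argument standing in for VAL__BADR.
      INTEGER NELM, I
      REAL A(NELM), BADVAL

      ANYBAD = .FALSE.
      DO I = 1, NELM
!        Stop at the first bad element; the worst case still
!        visits every element.
         IF (A(I).EQ.BADVAL) THEN
            ANYBAD = .TRUE.
            RETURN
         ENDIF
      ENDDO
      END
```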

An application which cannot deal with bad values should use the explicit check (i.e. CHECK=.TRUE.) so that it never gives up unnecessarily. However, an application which is using the check for efficiency purposes might choose merely to look at the value of the bad-pixel flag (i.e. CHECK=.FALSE.) as the time taken to do the explicit check might negate any efficiency advantage which is gained.
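The efficiency case can be sketched using the divide-by-two example mentioned earlier. The routine below assumes that the caller has already obtained the logical BAD from NDF_BAD; the routine name HALVE and the argument BADVAL (standing in for VAL__BADR) are hypothetical. When BAD is .FALSE. the per-element comparison is skipped altogether.

```fortran
      SUBROUTINE HALVE (BAD, NELM, IN, BADVAL, OUT)
!     Divide each element of IN by two, propagating bad values.
!     BAD is the bad-pixel flag for IN; BADVAL is a hypothetical
!     argument standing in for VAL__BADR.
      LOGICAL BAD
      INTEGER NELM, I
      REAL IN(NELM), OUT(NELM), BADVAL

      IF (BAD) THEN
!        Bad pixels may be present: test each element.
         DO I = 1, NELM
            IF (IN(I).EQ.BADVAL) THEN
               OUT(I) = BADVAL
            ELSE
               OUT(I) = IN(I) / 2.0
            ENDIF
         ENDDO
      ELSE
!        Definitely no bad pixels: no per-element test is needed.
         DO I = 1, NELM
            OUT(I) = IN(I) / 2.0
         ENDDO
      ENDIF
      END
```

Because this operation cannot create new bad pixels, the output array could reasonably inherit the input's bad-pixel flag.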

An application which is aware that it has created an output data array which contains bad values should indicate this by setting the bad-pixel flag to .TRUE.; conversely if it can be confident that an output data array contains no bad values, the flag can be set to .FALSE..

The value of the bad-pixel flag for an NDF array component is set by calling NDF_SBAD as shown in this extract from ADAM_EXAMPLES:SQROOT.FOR. The program counts how many bad values it has assigned in the output array and sets the bad-pixel flag accordingly.

  *   Set bad-pixel flag according to whether any bad pixels have been set.
  *   (NBAD is the number of bad pixels which have been set.)
        IF (NBAD.EQ.0) THEN
           CALL NDF_SBAD (.FALSE., NDF2, 'DATA', STATUS)
        ELSE
           CALL NDF_SBAD (.TRUE., NDF2, 'DATA', STATUS)
        ENDIF
