3 ERR – Error Reporting System

 3.1 Overview
 3.2 Inherited status checking
 3.3 Setting and defining status values
 3.4 Reporting errors
 3.5 Message tokens in error messages
 3.6 When to report an error
 3.7 The content of error messages
 3.8 Adding contextual information
 3.9 Deferred error reporting
 3.10 Error table limits
 3.11 Format of delivered messages
 3.12 Routines which perform “cleaning-up” operations
 3.13 Intercepting error messages
 3.14 Protecting tokens
 3.15 Reporting Status, Fortran I/O and operating system errors
 3.16 Incorporating foreign routines
 3.17 Converting existing subroutine libraries

3.1 Overview

Although the Message Reporting System could be used for reporting errors, there are a number of considerations which demand that separate facilities are available for this:

These considerations have led to the design and implementation of a set of subroutines which form the Error Reporting System. The subroutines have names of the form:

  ERR_name

where name indicates what the subroutine does. These subroutines work in conjunction with the Message System and allow error messages to incorporate message tokens.

3.2 Inherited status checking

The recommended method of indicating when errors have occurred within Starlink software is to use an integer status value in each subroutine argument list. This inherited status argument, say STATUS, should always be the last argument and every subroutine should check its value on entry. The principle is as follows:

Here is an example of the use of inherited status within a simple subroutine:

        SUBROUTINE ROUTN( VALUE, STATUS )
  
  *  Define the SAI__OK global constant.
        INCLUDE 'SAE_PAR'
        INTEGER STATUS
  
        ...
  
  *  Check the inherited global status.
        IF ( STATUS .NE. SAI__OK ) RETURN
  
        <application code>
  
        END

If an error occurs within the “application code” of such a subroutine, then STATUS is set to a value which is not SAI__OK, an error is reported (see below) and the subroutine aborts.

Note that it is often useful to use a status argument and inherited status checking in subroutines which “cannot fail”. This prevents them from executing, and possibly producing a run-time error, if their arguments contain rubbish after a previous error. Every piece of software that calls such a routine is then saved from making an extra status check. Furthermore, if the routine is later upgraded it may acquire the potential to fail, and so a status argument will subsequently be required. If a status argument is included initially, existing code which calls the routine will not need to be changed (see further discussion of this in §3.17).
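As a minimal sketch of this point, the routine below (SWAPR, a hypothetical name) exchanges two values. It cannot fail, yet it still checks the inherited status on entry, so it does nothing when called after an earlier error. The PARAMETER statement is only a stand-in for INCLUDE 'SAE_PAR' so that the fragment compiles on its own; the value shown is an assumption made for illustration.

```fortran
!  Exchange the values of A and B. This routine cannot fail, but it
!  still takes an inherited status argument.
      SUBROUTINE SWAPR( A, B, STATUS )
      REAL A, B, TEMP
      INTEGER STATUS

!  Stand-in for INCLUDE 'SAE_PAR' (value assumed for illustration).
      INTEGER SAI__OK
      PARAMETER ( SAI__OK = 0 )

!  Check the inherited global status.
      IF ( STATUS .NE. SAI__OK ) RETURN

      TEMP = A
      A = B
      B = TEMP
      END
```

A caller can then place SWAPR anywhere in a sequence of calls without an extra status check: if STATUS is already bad, A and B are simply left alone.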

3.3 Setting and defining status values

The use of the global constants SAI__OK and SAI__ERROR for setting status values is recommended in general applications. These global constants may be defined in each subroutine by including the file SAE_PAR at the beginning of the subroutine, prior to the declaration of any subroutine arguments or local variables. When writing subroutine libraries, however, it is useful to have a larger number of globally unique error codes available and to define symbolic constants for these in a separate include file. The naming convention:

  fac__ecode

should be used for the names of error codes defined in this way, where fac is the three-character facility prefix and ecode is an error code name of up to five alphanumeric characters. Note the double underscore used in this naming convention. The include file should be referred to by the name fac_ERR, e.g.

        INCLUDE 'SGS_ERR'

where the facility name is SGS, the Starlink Simple Graphics System, in this case. These symbolic constants should be defined at the beginning of every subroutine which requires them, prior to the declaration of any subroutine arguments or local variables.

The purpose of error codes is to enable the status argument to indicate that an error has occurred by having a value which is not equal to SAI__OK. By using a set of predefined error codes the calling module is able to test the returned status to distinguish between error conditions which may require different action. It is not generally necessary to define a very large number of error codes which would allow a unique value to be used every time an error report is made. It is sufficient to be able to distinguish the important classes of error which may occur. Examples of existing software can be consulted as a guide in this matter.

The Starlink utility MESSGEN (see SUN/185) should be used on UNIX to generate a set of globally unique error codes for a package. It may be used to create the Fortran include file and/or a C header file defining symbolic names for the error codes, and/or the “facility error message file”, which can be used to associate a simple message with each error code (see §3.15). There is an alternative but compatible method of calculating the set of error codes for a package described in Appendix G.

Software from outside a package which defines a set of error codes may use that package’s codes to test for specific error conditions arising within that package. However, with the exception of the SAI__ codes, it should not assign these values to the status argument. To do so could cause confusion about which package detected the error.

3.4 Reporting errors

The subroutine used to report errors is ERR_REP. It has a calling sequence of the form

        CALL ERR_REP( PARAM, TEXT, STATUS )

Here, the argument PARAM is the error message name, TEXT is the error message text and STATUS is the inherited status. These arguments are broadly similar to those used in the Message System subroutine MSG_OUT.

The error message name PARAM should be a globally unique identifier for the error report. It is recommended that it has the form:

  routn_message

in the general case of subroutines within an application, or:

  fac_routn_message

in the case of routines within a subroutine library. In the former case, routn is the name of the application routine from which ERR_REP is being called and message is a sequence of characters uniquely identifying the error report within that subroutine. In the latter case, fac_routn is the full name of the subroutine from which ERR_REP is being called (see the Starlink Application Programming Standard, SGP/16, for a discussion of the recommended subroutine naming convention), and message is a sequence of characters unique within that subroutine. These naming conventions are designed to ensure that each individual error report made within a complete software system has a unique error name associated with it.

Here is a simple example of error reporting where part of the application code of the previous example detects an invalid value of some kind, sets STATUS, reports the error and then aborts:

        IF ( <value invalid> ) THEN
           STATUS = SAI__ERROR
           CALL ERR_REP( 'ROUTN_BADV', 'Value is invalid.', STATUS )
           GO TO 999
        END IF
  
        ...
  
   999  CONTINUE
        END

In the event of an invalid value, the Error System would produce a message like:

        !! Value is invalid.

Note that when the message is output to the user, the Error System precedes the given text with exclamation marks. For more information on this, see §3.11.

The sequence of three operations:

(1)
Set STATUS to an error value.
(2)
Report an error.
(3)
Abort.

is the standard response to an error condition and should be adopted by all software which uses the Error System.

Note that the behaviour of the STATUS argument in ERR_REP differs somewhat from that in MSG_OUT in that ERR_REP will execute regardless of the input value of STATUS. Although the Starlink convention is for subroutines not to execute if their status argument indicates a previous error, the Error System subroutines obviously cannot behave in this way if their purpose is to report these errors.

On exit from ERR_REP the value of STATUS remains unchanged, with three exceptions:

3.5 Message tokens in error messages

Message tokens can be used in the error text presented to ERR_REP in the same manner as their use in calls to MSG_OUT, MSG_OUTIF and MSG_LOAD. Here is an example where two values, LOWER and UPPER, are in conflict:

  *  Check if LOWER and UPPER are in conflict.
        IF ( LOWER .GT. UPPER ) THEN
  
  *     Construct and report the error message.
           STATUS = SAI__ERROR
           CALL MSG_SETI( 'LO', LOWER )
           CALL MSG_SETI( 'UP', UPPER )
           CALL ERR_REP( 'BOUND_ERR',
       :                'LOWER(^LO) is greater than UPPER(^UP).', STATUS )
           GO TO 999
        END IF

If the value of LOWER is 50 and the value of UPPER is 10, then the user might receive a message like:

  !! LOWER(50) is greater than UPPER(10).

After a call to ERR_REP, all message tokens are left undefined.

3.6 When to report an error

In the following example, part of an application makes a series of subroutine calls:

        CALL ROUTN1( A, B, STATUS )
        CALL ROUTN2( C, STATUS )
        CALL ROUTN3( T, Z, STATUS )
  
  *  Check the global status.
        IF ( STATUS .NE. SAI__OK ) GO TO 999
  
        ...
  
  999   CONTINUE
        END

Each of these subroutines uses the inherited status strategy and makes error reports by calling ERR_REP. If an error occurs within any of the subroutines, STATUS will be set to an error value by that routine and inherited status checking by all subsequent routines will cause them not to execute. Thus, it becomes unnecessary to check for an error after each subroutine call, and a single check at the end of a sequence of calls is all that is required to correctly handle any error condition that may arise. Because an error report will already have been made by the subroutine that failed, it is usually sufficient simply to abort if an error arises in a sequence of subroutine calls.

It is important to distinguish the case where a called subroutine sets STATUS and makes its own error report, as above, from the case where STATUS is set explicitly as a result of a directly detected error, as in the previous example. If the error reporting strategy is to function correctly, then responsibility for reporting the error must lie with the routine which modifies the status argument. The golden rule is therefore:

If STATUS is explicitly set to an error value, then an accompanying call to ERR_REP must be made.

Unless there are good documented reasons why this cannot be done, subroutines which return an error status value and do not make an accompanying error report should be regarded as containing a bug.
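The following self-contained sketch illustrates both points: a single status check after a sequence of calls, and the rule that the routine which sets STATUS also makes the report. ROOT and ROOTS3 are hypothetical routines. The small ERR_REP shown here is only a stand-in that imitates the message prefix so that the sketch compiles without the Starlink libraries; it should be removed when linking against the real Error System. The PARAMETER values replace INCLUDE 'SAE_PAR' and are placeholders.

```fortran
!  Stand-in for the real ERR_REP (illustration only).
      SUBROUTINE ERR_REP( PARAM, TEXT, STATUS )
      CHARACTER * ( * ) PARAM, TEXT
      INTEGER STATUS
      WRITE ( *, '( 2A )' ) '!! ', TEXT
      END

!  Replace X by its square root and count the call in NDONE.
!  The routine fails if X is negative.
      SUBROUTINE ROOT( X, NDONE, STATUS )
      REAL X
      INTEGER NDONE, STATUS
      INTEGER SAI__OK, SAI__ERROR
      PARAMETER ( SAI__OK = 0, SAI__ERROR = 1 )

!  Check the inherited global status.
      IF ( STATUS .NE. SAI__OK ) RETURN

      IF ( X .LT. 0.0 ) THEN

!     STATUS is set explicitly here, so a report must accompany it.
         STATUS = SAI__ERROR
         CALL ERR_REP( 'ROOT_NEG', 'Value is negative.', STATUS )
         GO TO 999
      END IF

      X = SQRT( X )
      NDONE = NDONE + 1

 999  CONTINUE
      END

!  Make three calls, then check the status once.
      SUBROUTINE ROOTS3( A, B, C, TOTAL, NDONE, STATUS )
      REAL A, B, C, TOTAL
      INTEGER NDONE, STATUS
      INTEGER SAI__OK
      PARAMETER ( SAI__OK = 0 )

      IF ( STATUS .NE. SAI__OK ) RETURN

      NDONE = 0
      CALL ROOT( A, NDONE, STATUS )
      CALL ROOT( B, NDONE, STATUS )
      CALL ROOT( C, NDONE, STATUS )

!  No new report is made here: the routine that failed has already
!  reported the error, so it is sufficient to abort.
      IF ( STATUS .NE. SAI__OK ) GO TO 999

      TOTAL = A + B + C

 999  CONTINUE
      END
```

If the second value passed to ROOTS3 is negative, the second call to ROOT reports the error, the third call returns without executing, and TOTAL is left unchanged.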

3.7 The content of error messages

The purpose of an error message is to be informative and it should therefore provide as much relevant information about the context of the error as possible. It must also avoid the danger of being misleading, or of containing too much irrelevant information which might be confusing to a user. Particular care is necessary when reporting errors from within subroutines which might be called by a wide variety of software. Such reports must not make unjustified assumptions about what sort of application might be calling them. For example, in a routine that adds two arrays, the report:

  !! Error adding two arrays.

would be preferable to:

  !! Error adding two images.

if the same routine could be called to add two spectra!

The name of the routine which called ERR_REP to make an error report can often be a vital piece of information when trying to understand what went wrong. However, the error report is intended for the user, not the programmer, and so the name of an obscure internal routine is more likely to confuse than to clarify the nature of the error. A good rule of thumb is to include the names of routines in error reports only if those names also appear in documentation – so that the function they perform can be discovered without delving into the code. An example of this appears in the next section.

3.8 Adding contextual information

Instead of simply aborting when a status value is set by a called subroutine, it is also possible for an application to add further information about the circumstances surrounding the error. In the following example, an application makes several calls to a subroutine which might return an error status value. In each case, it reports a further error message so that it is clear which operation was being performed when the lower-level error occurred:

  *  Smooth the sky values.
        CALL SMOOTH( NX, NY, SKY, STATUS )
        IF ( STATUS .NE. SAI__OK ) THEN
           CALL ERR_REP( 'SKYOFF_SKY',
       :                'SKYOFF: Failed to smooth sky values.', STATUS )
           GO TO 999
        END IF
  
  *  Smooth the object values.
        CALL SMOOTH( NX, NY, OBJECT, STATUS )
        IF ( STATUS .NE. SAI__OK ) THEN
           CALL ERR_REP( 'SKYOFF_OBJ',
       :                'SKYOFF: Failed to smooth object values.',
       :                STATUS )
           GO TO 999
        END IF
  
        ...
  
  999   CONTINUE
        END

Notice how an additional error report is made in each case, but because the original status value contains information about the precise nature of the error which occurred within the subroutine SMOOTH, it is left unchanged.

If the first call to subroutine SMOOTH were to fail, say because it could not find any valid pixels in the image it was smoothing, then the error message the user would receive might be:

  !! Image contains no valid pixels to smooth.
  !  SKYOFF: Failed to smooth sky values.

The first part of this message originates from within the subroutine SMOOTH, while the second part qualifies the earlier report, making it clear how the error has arisen. Since SKYOFF is the name of an application known to the user, it has been included in the contextual error message.

This technique can often be very useful in simplifying error diagnosis, but it should not be overdone; the practice of reporting errors at every level in a program hierarchy tends to produce a flood of redundant messages. As an example of good practice for a subroutine library, an error report made when an error is first detected, followed by a further contextual error report from the “top-level” routine which the user actually called, normally suffices to produce very helpful error messages.

3.9 Deferred error reporting

The action of the subroutine ERR_REP is to report an error to the Error System but the Error System has the capacity to defer the output of that message to the user. This allows the final delivery of error messages to be controlled within applications software, and this control is achieved using the subroutines ERR_MARK, ERR_RLSE, ERR_FLUSH and ERR_ANNUL. This section describes the function of these subroutines and how they are used.

Subroutine ERR_MARK has the effect of ensuring that all subsequent error messages are deferred by the Error System and stored in an “error table” instead of being delivered immediately to the user. ERR_MARK also starts a new “error context” which has its own table of error messages and message tokens which are independent of those in the previous error context. A return to the previous context can later be made by calling ERR_RLSE. When ERR_RLSE is called, the new error context created by ERR_MARK ceases to exist and any error messages stored in it are transferred to the previous context. Calls to ERR_MARK and ERR_RLSE can be nested if required but should always occur in matching pairs. In this way, no existing error messages can be lost through the deferral mechanism.

The system starts at base-level context (level 1) – at this level, error messages are output to the user immediately. If a call to ERR_RLSE returns the system to base-level context, any messages still stored in the error table will be automatically delivered to the user.

The purpose of deferred error reporting can be illustrated by the following example. Consider a subroutine, say HELPER, which detects an error during execution. The subroutine HELPER reports the error that has occurred, giving as much contextual information about the error as it can. It also returns an error status value, enabling the software that called it to react to the failure appropriately. However, what may be considered an “error” at the level of subroutine HELPER, e.g. an “end of file” condition, may be considered by the calling module to be a case which can be handled without informing the user, e.g. by simply terminating its input sequence. Thus, although the subroutine HELPER will always report the error condition, it is not always necessary for the associated error message to reach the user. The deferral of error reporting enables application programs to handle such error conditions internally.

Here is a schematic example of what subroutine HELPER might look like:

        SUBROUTINE HELPER( LINE, STATUS )
  
        ...
  
  *  Check if a Fortran I/O error has occurred.
        IF ( IOSTAT .NE. 0 ) THEN
  
  *     Set STATUS and report the error.
           IF ( IOSTAT .LT. 0 ) THEN
  
  *        Report an end-of-file error.
              STATUS = <end-of-file error code>
              CALL ERR_REP( 'HELPER_EOF',
       :         'Fortran I/O error: end of input file reached', STATUS )
           ELSE
  
  *        Report a Fortran I/O error.
              STATUS = SAI__ERROR
              CALL ERR_REP( 'HELPER_FIOER',
       :         'Fortran I/O error encountered during data input',
       :         STATUS )
           END IF
  
  *     Abort.
           GO TO 999
        END IF
  
        ...
  
   999  CONTINUE
        END

Suppose HELPER is called and reports an error, returning with STATUS set. At this point, the error message may, or may not, have been received by the user – this will depend on the environment in which the routine is running, and on whether the software which called HELPER took any action to defer the error report. HELPER itself does not need to take action (indeed it should not take action) to ensure delivery of the message to the user; its responsibility ends when it aborts, and responsibility for handling the error condition then passes to the software which called it.

Now suppose that the subroutine HELPED calls HELPER and wishes to defer any messages from HELPER so that it can decide how to handle error conditions itself, rather than troubling the user with spurious messages. It can do this by calling the routine ERR_MARK before it calls HELPER.

The operation of error message deferral can be illustrated by a simple example:

        SUBROUTINE HELPED( STATUS )
  
        ...
  
  *  Create a new error context.
        CALL ERR_MARK
  
        <any error messages from HELPER are now deferred>
  
        CALL HELPER( LINE, STATUS )
  
        ...

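By calling ERR_MARK before calling HELPER, subroutine HELPED ensures that any error messages reported by HELPER are deferred, i.e. held in the error table. HELPED can then handle the error condition itself in one of two ways: it can call ERR_ANNUL, which discards the deferred messages in the current error context and resets STATUS to SAI__OK, or it can call ERR_FLUSH, which delivers the deferred messages to the user and resets STATUS to SAI__OK.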

Here is the previous example, elaborated to demonstrate the use of ERR_ANNUL. It shows how an “end of file” condition from HELPER might be detected, annulled, and stored by HELPED in a logical variable EOF for later use:

  *  Initialise end-of-file flag, EOF.
        EOF = .FALSE.
  
  *  Create a new error context.
        CALL ERR_MARK
  
  *  Read line of data.
        CALL HELPER( LINE, STATUS )
  
  *  Trap end-of-file error status and annul any reported error messages
  *  for the current error context.
        IF ( STATUS .EQ. <end-of-file error status> ) THEN
           CALL ERR_ANNUL( STATUS )
           EOF = .TRUE.
        END IF
  
  *  Release the current error context.
        CALL ERR_RLSE
  
  *  Abort application on error.
        IF ( STATUS .NE. SAI__OK ) GO TO 999
  
        ...
  
  999   CONTINUE
        END

Note that the routine chooses only to handle “end of file” error conditions; any other error condition will not be annulled and will subsequently cause an abort when STATUS is checked after the call to ERR_RLSE.

Here is an example showing how both ERR_FLUSH and ERR_ANNUL may be used during the process of acquiring a value from the user via a call to the subroutine RDPAR:

  *  Create a new error context.
        CALL ERR_MARK
  
  *  Loop to get DSCALE parameter value.
        DO WHILE ( .TRUE. )
           CALL RDPAR( 'DSCALE', DSCALE, STATUS )
  
  *     Check the returned global status.
           IF ( STATUS .EQ. SAI__OK ) THEN
  
  *        Success, so continue with the application.
              GO TO 10
           ELSE IF ( STATUS .EQ. <abort status> ) THEN
  
  *        User wanted to abort, so abort the application.
              CALL ERR_RLSE
              GO TO 999
           ELSE IF ( STATUS .EQ. <null status> ) THEN
  
  *        User entered "null", so annul the error and supply a default.
              CALL ERR_ANNUL( STATUS )
              DSCALE = 1.0
              GO TO 10
           ELSE
  
  *        An error has occurred, so ensure the user knows about it before
  *        trying again.
              CALL ERR_FLUSH( STATUS )
              CALL CNPAR( 'DSCALE', STATUS )
           END IF
        END DO
  
   10   CONTINUE
  
  *  Release the current error context.
        CALL ERR_RLSE
  
        ...
  
   999  CONTINUE
        END

Note how ERR_FLUSH is used to ensure that any error messages are output to the user before trying again to get a new value. In effect, it passes responsibility for the error condition to the user. This interactive situation is typical of how ERR_FLUSH should be used; it is not needed very often during normal error reporting, and it is certainly not required as a regular means of ensuring the delivery of error messages following calls to ERR_REP – this should be left to the Error System itself when returned to the base-level context.

Note that if ERR_FLUSH cannot output the error message to the user, then it will return the error status ERR__OPTER. This allows critical applications to attempt to recover in the event of the failure of the Error System.

Finally, as a safety feature, if ERR_FLUSH is called when no errors have been reported, it outputs the message

  !! No error to report (improper use of EMS).

This is to highlight problems where the inherited status has been set by some item of software, but no accompanying error message has been reported.

3.10 Error table limits

The error table can contain up to 32 error messages, normally reported at different levels within the hierarchy of a structured program. If an attempt is made to defer the reporting of more than 32 error messages, then the last reported error message will be replaced by the message:

  !! Error message stack overflow (EMS fault).

There are up to 256 context levels available in the Error System, the initial (base-level) error context level being 1. The current error context level may be inquired using a call to ERR_LEVEL. If an attempt is made to mark a context level beyond 256, the error message:

  !! Error context stack overflow (EMS fault).

is placed on the error stack at context level 256 and any subsequent error reports will be placed at context level 256. A bug report should be made if either of the “EMS fault” error messages is reported from software.
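One possible use of ERR_LEVEL is to check, while debugging, that a routine leaves the error context level as it found it, i.e. that its calls to ERR_MARK and ERR_RLSE are correctly paired. The sketch below assumes the single-argument form CALL ERR_LEVEL( LEVEL ). LEAKY and LEVCHK are hypothetical routines, and DEMOLV together with the ERR_MARK, ERR_RLSE and ERR_LEVEL shown here are stand-ins that only count the context level, so that the sketch compiles without the Starlink libraries.

```fortran
!  Stand-in context counter (illustration only). STEP is +1, -1 or 0.
      SUBROUTINE DEMOLV( STEP, LEVEL )
      INTEGER STEP, LEVEL, CURLEV
      SAVE CURLEV
      DATA CURLEV / 1 /
      CURLEV = MAX( 1, CURLEV + STEP )
      LEVEL = CURLEV
      END

!  Stand-ins for the real Error System routines.
      SUBROUTINE ERR_MARK
      INTEGER L
      CALL DEMOLV( 1, L )
      END

      SUBROUTINE ERR_RLSE
      INTEGER L
      CALL DEMOLV( -1, L )
      END

      SUBROUTINE ERR_LEVEL( LEVEL )
      INTEGER LEVEL
      CALL DEMOLV( 0, LEVEL )
      END

!  A routine with a bug: the early exit skips the call to ERR_RLSE.
      SUBROUTINE LEAKY( SKIP )
      LOGICAL SKIP
      CALL ERR_MARK
      IF ( SKIP ) RETURN
      CALL ERR_RLSE
      END

!  Return in NLEAK the change in context level caused by LEAKY.
!  Zero means that ERR_MARK and ERR_RLSE were correctly paired.
      SUBROUTINE LEVCHK( SKIP, NLEAK )
      LOGICAL SKIP
      INTEGER NLEAK, LEVIN, LEVEND
      CALL ERR_LEVEL( LEVIN )
      CALL LEAKY( SKIP )
      CALL ERR_LEVEL( LEVEND )
      NLEAK = LEVEND - LEVIN
      END
```

A non-zero value of NLEAK points to an unmatched ERR_MARK, which would eventually lead to the context stack overflow described above if the routine were called repeatedly.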

3.11 Format of delivered messages

When messages are delivered to the user, the Error System prefixes the given text with exclamation marks to call attention to the message and to distinguish between error messages and normal informational messages output using MSG. When a sequence of deferred messages is flushed, the first will be prefixed by ‘!! ’ and the remainder by ‘!  ’.

By default, messages are split so that output lines do not exceed 79 characters – the split is made on word boundaries if possible. The maximum output line size can be altered using tuning parameters (see §4). If a message has to be split for delivery by the Error System, text on continuation lines is indented by three spaces, e.g.

  !! The first line of an error message ...
  !     and its continuation onto another line.
  !  A second contextual error message.
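The splitting rule can be sketched as follows. This is an illustration of the layout described above and not the Error System's own code: SPLTMS is a hypothetical routine, the maximum line length is passed as an argument rather than read from a tuning parameter, and the Fortran 90 intrinsic LEN_TRIM is used for brevity.

```fortran
!  Write TEXT as an error message, split into lines of at most WIDTH
!  characters. NLINES returns the number of lines written.
      SUBROUTINE SPLTMS( TEXT, WIDTH, NLINES )
      CHARACTER * ( * ) TEXT
      INTEGER WIDTH, NLINES
      INTEGER IPOS, IEND, NEXT, NPRE, LENGTH, J
      CHARACTER * 6 PREFIX

!  The first line is prefixed by '!! ' (three characters).
      PREFIX = '!!'
      NPRE = 3
      NLINES = 0
      IPOS = 1
      LENGTH = LEN_TRIM( TEXT )

!  Loop until all of the text has been written.
 10   CONTINUE
      IF ( IPOS .LE. LENGTH ) THEN

!     Find the last character that fits on this line.
         IEND = MIN( LENGTH, IPOS + MAX( 1, WIDTH - NPRE ) - 1 )
         NEXT = IEND + 1

!     If text remains, move the split back to a word boundary when
!     there is one; otherwise the word itself is split.
         IF ( IEND .LT. LENGTH ) THEN
            DO 20 J = IEND + 1, IPOS + 1, -1
               IF ( TEXT( J : J ) .EQ. ' ' ) THEN
                  IEND = J - 1
                  NEXT = J + 1
                  GO TO 30
               END IF
 20         CONTINUE
 30         CONTINUE
         END IF

         WRITE ( *, '( 2A )' ) PREFIX( : NPRE ), TEXT( IPOS : IEND )
         NLINES = NLINES + 1

!     Continuation lines are prefixed by '!' and five blanks.
         PREFIX = '!'
         NPRE = 6
         IPOS = NEXT
         GO TO 10
      END IF
      END
```

Called with WIDTH set to 79, a message of 74 characters is written on one line; with WIDTH set to 40, the same message is split at word boundaries and the continuation lines are indented.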

3.12 Routines which perform “cleaning-up” operations

If a subroutine performs “cleaning-up” operations which must execute even if the inherited status has been set, then a different sequence of status checking must usually be performed. The deferral of error messages may also be involved.

Normally, the effect required is that a cleaning-up routine called with its status argument set to SAI__OK will behave like any other routine, setting the status value and reporting an error if it fails. However, if the value of status has been set to an error condition because of a previous error, it must still attempt to execute, even if there is a good chance that it will not succeed. In this latter case, an error report is not normally required from the cleaning-up routine.

The following is a typical example:

        CALL ALLOC( NBYTES, PNTR, STATUS )
  
        <application code>
  
        CALL DEALL( NBYTES, PNTR, STATUS )

Here, ALLOC allocates some memory for use by the “application code” and DEALL de-allocates it at the end. The following error conditions may arise:

The solution is to write DEALL so that it saves the value of STATUS on entry and restores it again on exit. To preserve the associated error messages, calls to ERR_MARK and ERR_RLSE are also required. For example:

        SUBROUTINE DEALL( NBYTES, PNTR, STATUS )
  
        ...
  
  *  Save the initial status value and set a new value for this routine.
        ISTAT = STATUS
        STATUS = SAI__OK
  
  *  Create a new error context.
        CALL ERR_MARK
  
        <clean-up code>
  
  *  If the initial status was bad, then ignore all internal errors.
        IF ( ISTAT .NE. SAI__OK ) THEN
           CALL ERR_ANNUL( STATUS )
           STATUS = ISTAT
        END IF
  
  *  Release the current error context.
        CALL ERR_RLSE
  
        END

Note how a new error context is used to constrain ERR_ANNUL to annulling only errors arising from the “clean-up code”, and not the pre-existing error condition which is to be preserved.

Two routines are provided to “wrap up” these clean-up calls: ERR_BEGIN and ERR_END, which begin and end what is effectively a new error reporting environment. ERR_BEGIN will begin a new error context and return the status value set to SAI__OK. A call to ERR_END will annul the current error context if the previous context contains undelivered error messages. It will then release the current error context. ERR_END returns the status of the last reported error message pending delivery to the user after the current error context has been released. If there are no error messages pending output, then the status is returned set to SAI__OK. This behaviour is exactly that represented by the code in the previous example. Here is the previous example re-written using calls to ERR_BEGIN and ERR_END:

        SUBROUTINE DEALL( NBYTES, PNTR, STATUS )
  
        ...
  
  *  Begin a new error reporting environment.
        CALL ERR_BEGIN( STATUS )
  
        <clean-up code>
  
  *  End the current error reporting environment.
        CALL ERR_END( STATUS )
  
        END

Like ERR_MARK and ERR_RLSE, ERR_BEGIN and ERR_END should always occur in pairs and can be nested if required. If ERR_BEGIN is called with STATUS set to an error value, then a check is made to determine if there are any error messages pending output at the current error context; if there are not then the status has been set without making an error report. In these cases ERR_BEGIN will make the error report:

  !! Status set with no error report (improper use of EMS).

using the given status value before marking a new error context.

Any code which attempts to execute when the inherited status is set to an error value should be regarded as “cleaning-up”.

3.13 Intercepting error messages

It may sometimes be convenient within an application to obtain access to any error messages within the current context via a character variable, instead of the error output stream. The Error System provides subroutine ERR_LOAD to do this; it has the calling sequence:

        CALL ERR_LOAD( PARAM, PARLEN, OPSTR, OPLEN, STATUS )

The behaviour of ERR_LOAD is the same as ERR_FLUSH except that, instead of delivering deferred error messages from the current error context to the user, the error messages are returned, one by one, through character variables in a series of calls to ERR_LOAD.

On the first call of this routine, the error table for the current context is copied into a holding area, the current error context is annulled and the first message in the holding area is returned. Thereafter, each time the routine is called, the next message from the holding area is returned. The argument PARAM is the returned message name and PARLEN the length of the message name in PARAM. OPSTR is the returned error message text and OPLEN is the length of the error message in OPSTR.

The status associated with the returned message is returned in STATUS until there are no more messages to return – then STATUS is set to SAI__OK, PARAM and OPSTR are set to blanks and PARLEN and OPLEN to 1. As for ERR_FLUSH, a warning message is generated if there are no messages initially. The status returned with the warning message is EMS__NOMSG.

After STATUS has been returned SAI__OK, the whole process is repeated for subsequent calls.

The symbolic constants ERR__SZPAR and ERR__SZMSG are provided for declaring the lengths of character variables which are to receive message names and error messages in this way. These constants are defined in the include file ERR_PAR (see §6).
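A typical way of using ERR_LOAD is to call it in a loop until STATUS is returned set to SAI__OK, as in the sketch below. GETERR is a hypothetical routine which copies the pending messages into a character array. The ERR_LOAD shown here is only a stand-in which returns two canned messages, so that the sketch compiles without the Starlink libraries. The PARAMETER values replace INCLUDE 'SAE_PAR' and INCLUDE 'ERR_PAR' and are assumptions made for illustration.

```fortran
!  Stand-in for the real ERR_LOAD (illustration only): two canned
!  messages are returned, followed by SAI__OK.
      SUBROUTINE ERR_LOAD( PARAM, PARLEN, OPSTR, OPLEN, STATUS )
      CHARACTER * ( * ) PARAM, OPSTR
      INTEGER PARLEN, OPLEN, STATUS
      INTEGER NCALL
      SAVE NCALL
      DATA NCALL / 0 /

      NCALL = NCALL + 1
      IF ( NCALL .EQ. 1 ) THEN
         PARAM = 'DEMO_FIRST'
         PARLEN = 10
         OPSTR = 'First message.'
         OPLEN = 14
         STATUS = 1
      ELSE IF ( NCALL .EQ. 2 ) THEN
         PARAM = 'DEMO_SECOND'
         PARLEN = 11
         OPSTR = 'Second message.'
         OPLEN = 15
         STATUS = 1
      ELSE
         PARAM = ' '
         PARLEN = 1
         OPSTR = ' '
         OPLEN = 1
         STATUS = 0
         NCALL = 0
      END IF
      END

!  Copy the pending error messages into BUF (at most MAXBUF of them).
      SUBROUTINE GETERR( MAXBUF, BUF, NBUF, STATUS )
      INTEGER MAXBUF, NBUF, STATUS
      CHARACTER * ( * ) BUF( MAXBUF )

!  Stand-ins for INCLUDE 'SAE_PAR' and INCLUDE 'ERR_PAR'.
      INTEGER SAI__OK, ERR__SZPAR, ERR__SZMSG
      PARAMETER ( SAI__OK = 0, ERR__SZPAR = 15, ERR__SZMSG = 200 )

      CHARACTER * ( ERR__SZPAR ) PARAM
      CHARACTER * ( ERR__SZMSG ) OPSTR
      INTEGER PARLEN, OPLEN

!  Load messages one by one until none are left.
      NBUF = 0
      CALL ERR_LOAD( PARAM, PARLEN, OPSTR, OPLEN, STATUS )
 10   CONTINUE
      IF ( STATUS .NE. SAI__OK ) THEN
         IF ( NBUF .LT. MAXBUF ) THEN
            NBUF = NBUF + 1
            BUF( NBUF ) = OPSTR( : OPLEN )
         END IF
         CALL ERR_LOAD( PARAM, PARLEN, OPSTR, OPLEN, STATUS )
         GO TO 10
      END IF
      END
```

Because the loop ends with STATUS set to SAI__OK, a caller which still needs the original error status should save it before the first call to ERR_LOAD.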

3.14 Protecting tokens

As a general rule, message tokens should be assigned, using calls to the MSG_SETx and MSG_FMTx routines, immediately prior to the call in which they are to be used. However, this is not always convenient; e.g. within an iteration or a block IF statement where the same tokens may be used in one of several potential message reports. Under these circumstances, it is important to protect the values of assigned message tokens when subroutines which may fail are called – when a subroutine fails it must be assumed that it will make an accompanying error report using ERR_REP within the existing error reporting context, thereby annulling any currently defined message tokens. The only sure way of protecting against such behaviour is to bracket the subroutine call which may fail with calls to ERR_MARK and ERR_RLSE. The same precautions are needed when any subroutine is called which may in turn call any of MSG_OUT, MSG_OUTIF, MSG_LOAD, ERR_REP or ERR_LOAD (all of which annul tokens).

It is not good practice to assign message tokens which are to be used in another subroutine.

Here is an example of assigning message tokens outside a block IF statement to be used by ERR_REP and MSG_OUT calls within the IF block. The code is a fragment of a routine for re-scaling a single array to a mean of unity. If the call to the subroutine MEAN fails, any assigned message tokens in the current error reporting context may be annulled; hence the need to bracket this call by calls to ERR_MARK and ERR_RLSE.

  *  Get the data arrays.
        CALL GETDAT( X, Y, QUAL, NDATA, STATUS )
  
  *  Check the returned status.
        IF ( STATUS .EQ. SAI__OK ) THEN
  
  *     The data have been obtained successfully, assign the token value
  *     and inform the user of the number of data obtained.
           CALL MSG_SETI( 'NDATA', NDATA )
  
           IF ( NDATA .LE. 0 ) THEN
  
  *        No data exist, report an error message and abort.
              STATUS = SAI__ERROR
              CALL ERR_REP( 'NDATA_INVAL',
       :                    'Cannot use this number of data (^NDATA).',
       :                    STATUS )
           ELSE
  
  *        Get the mean of the data.
              CALL ERR_MARK
              CALL MEAN( NDATA, Y, QUAL, YMEAN, STATUS )
              CALL ERR_RLSE
  
  *        Check the returned status.
              IF ( STATUS .EQ. SAI__OK ) THEN
  
  *           Deliver the number of data and their mean to the user.
                 CALL MSG_SETR( ’MEAN’, YMEAN )
  
                 IF ( NDATA .EQ. 1 ) THEN
                    CALL MSG_OUT( ’ ’,
       :               ’^NDATA data value (^MEAN) will be used.’,
       :               STATUS )
                 ELSE
                    CALL MSG_OUT( ’ ’,
       :               ’^NDATA data values with a mean of ^MEAN’ //
       :               ’ will be used.’, STATUS )
                 END IF
              ELSE
  
  *           Failed to calculate a mean value for the data (the quality
  *           flags were probably all bad). Report an error and abort.
                 IF ( NDATA .EQ. 1 ) THEN
                    CALL MSG_SETC( ’VALUE’, ’value’ )
                 ELSE
                    CALL MSG_SETC( ’VALUE’, ’values’ )
                 END IF
  
                 CALL ERR_REP( ’BAD_DATA’,
       :            ’No mean available for ^NDATA ^VALUE,’ //
       :            ’ cannot rescale the data.’, STATUS )
              END IF
           END IF
        END IF

3.15 Reporting Status, Fortran I/O and operating system errors

Some of the lower-level Starlink libraries cannot use ERR (or EMS) to make error reports; furthermore, some items of software may need to perform Fortran I/O operations or calls to operating system routines. Any of these may fail but will not have made error reports through ERR_REP. For this reason it is sometimes useful to convert the given error code into a message which can be displayed as part of a message at a higher level where some context information can be added.

Three subroutines exist to enable a message token to be built from the error code returned under these circumstances. These subroutines are:

  ERR_FACER( TOKEN, STATUS )

where STATUS is a standard Starlink facility status value,

  ERR_FIOER( TOKEN, IOSTAT )

where IOSTAT is a Fortran I/O status code, and

  ERR_SYSER( TOKEN, SYSTAT )

where SYSTAT is a status value returned from an operating system routine.

Each of the above routines will assign the message associated with the given error code to the specified token, appending the message if the token is already defined. The error code argument is never altered by these routines. It is important that the correct routine is called, otherwise the wrong message or, at best, only an error number will be obtained.

ERR_FACER is not likely to be useful for applications programmers because suitable error reports will probably have been made by higher-level facilities called directly by the application – it is really provided for completeness. The other two routines will be more useful.
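
Should ERR_FACER be needed, it is used in the same way as the other two routines. The following is a minimal sketch only. It assumes a hypothetical low-level routine, LOWLEV, which returns a facility status value in LSTAT (SAI__OK on success) without making an error report. The routine names, the argument VALUE, the token name FACMSG and the error name ROUTN_LOWERR are all illustrative:

        SUBROUTINE ROUTN( VALUE, STATUS )
  
  *  Define the SAI__OK global constant.
        INCLUDE 'SAE_PAR'
  
        REAL VALUE
        INTEGER STATUS
        INTEGER LSTAT
  
  *  Check the inherited global status.
        IF ( STATUS .NE. SAI__OK ) RETURN
  
  *  Call the hypothetical low-level routine LOWLEV, which returns a
  *  facility status value in LSTAT but makes no error report.
        LSTAT = SAI__OK
        CALL LOWLEV( VALUE, LSTAT )
  
  *  Check the returned facility status.
        IF ( LSTAT .NE. SAI__OK ) THEN
  
  *     Build a message token from the facility status, set STATUS and
  *     report the error with some added context.
           CALL ERR_FACER( 'FACMSG', LSTAT )
           STATUS = LSTAT
           CALL ERR_REP( 'ROUTN_LOWERR',
       :                 'Error returned by LOWLEV: ^FACMSG', STATUS )
        END IF
  
        END

Here the facility status is passed on as the inherited status value. Setting STATUS to SAI__ERROR instead would be equally consistent with the other examples in this section.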

Here is an example of using ERR_FIOER. It is a section of code that writes a character variable to a formatted sequential file, given the Fortran logical unit of the file:

  *  Write the character variable STR.
        WRITE( UNIT, ’(A)’, IOSTAT = IOSTAT ) STR
  
  *  Check the Fortran I/O status.
        IF ( IOSTAT .NE. 0 ) THEN
  
  *     Fortran write error, so set STATUS.
           STATUS = SAI__ERROR
  
  *     Define the I/O status and logical unit message tokens and attempt
  *     to obtain the file name.
           CALL ERR_FIOER( ’MESSAGE’, IOSTAT )
           CALL MSG_SETI( ’UNIT’, UNIT )
           INQUIRE( UNIT, NAME = FNAME, IOSTAT = IOS )
  
  *     Check the returned I/O status from the INQUIRE statement and act.
           IF ( IOS .EQ. 0 ) THEN
  
  *        Define the file name message token.
              CALL MSG_SETC( ’FNAME’, FNAME )
  
  *        Report the error.
              CALL ERR_REP( ’PUTSTR_WRERR’,
       :                ’Error writing to file ^FNAME on ’ //
       :                ’unit ^UNIT: ^MESSAGE’, STATUS )
           ELSE
  
  *        No file name has been found so just report the error.
              CALL ERR_REP( ’PUTSTR_WRERR’,
       :                ’Error writing to unit ^UNIT: ^MESSAGE’, STATUS )
           END IF
  
           GO TO 999
        END IF
  
        ...
  
   999  CONTINUE
        END

Here, the name of the file being written is also obtained in order to construct a comprehensive error message, which might be something like:

  !! Error writing to file BLOGGS.DAT on unit 17: Disk quota exceeded.

Note that the I/O status values used in Fortran do not have universally defined meanings except for zero (meaning no error), but by using ERR_FIOER it is still possible to make high quality error reports about Fortran I/O errors in a portable manner.

In a similar way, the subroutine ERR_SYSER may be used to assign an operating system message associated with the system status flag SYSTAT to the named message token. Of course, software that calls operating system routines directly cannot be portable, but ERR_SYSER provides a convenient interface for reporting errors that occur in such routines in a form that can be easily changed if necessary. For example:

        IF ( <system error condition> ) THEN
  
  *     Operating system error, so set STATUS.
           STATUS = SAI__ERROR
  
  *     Report the error and abort.
           CALL ERR_SYSER( ’ERRMSG’, SYSTAT )
           CALL ERR_REP( ’ROUTN_SYSER’, ’System error: ^ERRMSG’, STATUS )
           GO TO 999
        END IF
  
        ...
  
   999  CONTINUE
        END

Fortran I/O and operating system error messages, obtained through calls to ERR_FIOER and ERR_SYSER respectively, will differ depending upon which operating system (or even flavour of operating system) an application is run on.

Because of the necessary generality of these messages (and those from ERR_FACER), many will appear rather vague and unhelpful without additional contextual information. This is particularly true of UNIX implementations. It is very important to provide additional contextual information when using these routines in order to avoid obfuscating rather than clarifying the nature of an error. This can be done either as part of the error message which includes the message token set by ERR_FACER, ERR_FIOER or ERR_SYSER, or by making a further error report. The examples in this section provide a good illustration of how this can be done.
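
The second approach, making a further error report, might be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation. SYSDEL is a hypothetical interface to an operating system routine which is assumed to return zero in SYSTAT on success. The routine name DELFIL and the error names are also invented for the example:

        SUBROUTINE DELFIL( FNAME, STATUS )
  
  *  Define the SAI__OK and SAI__ERROR global constants.
        INCLUDE 'SAE_PAR'
  
        CHARACTER * ( * ) FNAME
        INTEGER STATUS
        INTEGER SYSTAT
  
  *  Check the inherited global status.
        IF ( STATUS .NE. SAI__OK ) RETURN
  
  *  Call the hypothetical system interface routine SYSDEL, assumed to
  *  return zero in SYSTAT on success.
        CALL SYSDEL( FNAME, SYSTAT )
  
        IF ( SYSTAT .NE. 0 ) THEN
  
  *     Operating system error, so set STATUS.
           STATUS = SAI__ERROR
  
  *     Report the operating system message.
           CALL ERR_SYSER( 'ERRMSG', SYSTAT )
           CALL ERR_REP( 'DELFIL_SYSER', 'System error: ^ERRMSG',
       :                 STATUS )
  
  *     Make a further report saying what was being attempted.
           CALL MSG_SETC( 'FNAME', FNAME )
           CALL ERR_REP( 'DELFIL_NODEL',
       :                 'Unable to delete the file ^FNAME.', STATUS )
        END IF
  
        END

The system message on its own may say little about what went wrong. The second report tells the user which operation failed and on which file.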

3.16 Incorporating foreign routines

Sometimes “foreign” subroutines must be called which do not use the Starlink error status conventions (e.g. because they must adhere to some standard interface definition like GKS). Unless they are unusually robust, such routines must normally be prevented from executing under error conditions, either by performing a status check immediately beforehand, or by enclosing them within an appropriate IF...END IF block. Depending on the form of error indication that such foreign routines use, it may also be necessary to check afterwards whether they have succeeded or not. If such a routine fails, then for compatibility with other Starlink software a status value should be set and an error report made on its behalf.

For example, the following code makes a GKS inquiry and checks the success of that inquiry:

        IF ( STATUS .EQ. SAI__OK ) THEN
  
  *     Inquire the GKS workstation colour facilities available.
           CALL GQCF( WTYPE, ERRIND, NCOLI, COLA, NPCI )
  
  *     Check if a GKS error has occurred.
           IF ( ERRIND .NE. GKS__OK ) THEN
  
  *        An error has occurred, so report it and abort.
              STATUS = SAI__ERROR
              CALL MSG_SETI( ’ERRIND’, ERRIND )
              CALL ERR_REP( ’ROUTN_GQCFERR’,
       :              ’Error no. ^ERRIND occurred in GKS routine GQCF ’ //
       :              ’(enquire workstation colour facilities).’,
       :              STATUS )
              GO TO 999
           END IF
        END IF
  
        ...
  
   999  CONTINUE
        END

In some cases, it may be possible to obtain a textual error message from the error flag, by means of a suitable inquiry routine, which could be used as the basis of the error report.

It will be obvious from this example how convenient the inherited error status strategy is, and how much extra work is involved in obtaining the same degree of robustness and quality of error reporting from routines which do not use it. It is worth bearing this in mind if you are involved in importing foreign subroutine libraries for use with Starlink software: the provision of a few simple routines for automating error reporting, or an extra layer of subroutine calls where inherited status checking and error reporting can be performed, can make the final product vastly easier to use. Starlink staff will be pleased to offer advice on this matter if consulted.
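
As an illustration of the "extra layer" approach, the GKS inquiry above could be wrapped in a routine which follows the inherited status conventions. This is a sketch only. The wrapper name WGQCF and the error name WGQCF_ERR are hypothetical, and the GKS success value is assumed here to be zero and is defined locally so that the sketch stands alone. In practice GKS__OK would be taken from the appropriate include file:

        SUBROUTINE WGQCF( WTYPE, NCOLI, COLA, NPCI, STATUS )
  
  *  Define the SAI__OK and SAI__ERROR global constants.
        INCLUDE 'SAE_PAR'
  
        INTEGER WTYPE
        INTEGER NCOLI
        INTEGER COLA
        INTEGER NPCI
        INTEGER STATUS
        INTEGER ERRIND
  
  *  GKS success value, assumed here to be zero. In practice it would
  *  be taken from the appropriate GKS include file.
        INTEGER GKS__OK
        PARAMETER ( GKS__OK = 0 )
  
  *  Check the inherited global status.
        IF ( STATUS .NE. SAI__OK ) RETURN
  
  *  Inquire the GKS workstation colour facilities available.
        CALL GQCF( WTYPE, ERRIND, NCOLI, COLA, NPCI )
  
  *  If a GKS error has occurred, set STATUS and report it.
        IF ( ERRIND .NE. GKS__OK ) THEN
           STATUS = SAI__ERROR
           CALL MSG_SETI( 'ERRIND', ERRIND )
           CALL ERR_REP( 'WGQCF_ERR',
       :        'Error no. ^ERRIND occurred in GKS routine GQCF ' //
       :        '(inquire workstation colour facilities).', STATUS )
        END IF
  
        END

A caller then needs only a single line, CALL WGQCF( WTYPE, NCOLI, COLA, NPCI, STATUS ), followed by the usual check of STATUS.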

3.17 Converting existing subroutine libraries

When converting existing subroutine libraries to use the inherited status conventions and the Error System, it is conceivable that an existing subroutine which does not have a status argument will acquire the potential to fail and report an error, either from within itself or from packages layered beneath it. Ideally, the argument list of the subroutine should be changed to include a status argument. However, it may be inconvenient to modify the argument list of a commonly used subroutine (e.g. because of the amount of existing code which would have to be changed), and so an alternative method is needed to determine if status has been set during the call so that the appropriate action can be taken by the caller. The subroutine ERR_STAT is provided for recovering the last reported status value under these conditions. Here is an example of the use of ERR_STAT, called from a subroutine which follows the error reporting conventions:

  *  Call subroutine NOSTAT.
        CALL NOSTAT( GIVARG, RETARG )
        CALL ERR_STAT( STATUS )

Here, the calls to NOSTAT and ERR_STAT are equivalent to one subroutine call with a status argument. The use of ERR_STAT to return the current error status relies upon the use of ERR_REP to report errors by the conventions described in this document. In particular, foreign packages must be incorporated in the recommended way, as described in §3.16, for ERR_STAT to be reliable.
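
For this to work, a routine such as NOSTAT must report its failures through ERR_REP, using a local status value in place of the missing argument. A minimal sketch of how the inside of such a routine might look is given below. The argument types, the square-root operation and the error name NOSTAT_NEG are assumptions made purely for illustration:

        SUBROUTINE NOSTAT( GIVARG, RETARG )
  
  *  Define the SAI__OK and SAI__ERROR global constants.
        INCLUDE 'SAE_PAR'
  
        REAL GIVARG
        REAL RETARG
        INTEGER LSTAT
  
  *  There is no status argument, so use a local status value.
        LSTAT = SAI__OK
  
        IF ( GIVARG .LT. 0.0 ) THEN
  
  *     Failure: return a safe value, set the local status and report
  *     the error so that ERR_STAT can recover the status value.
           RETARG = 0.0
           LSTAT = SAI__ERROR
           CALL MSG_SETR( 'GIVARG', GIVARG )
           CALL ERR_REP( 'NOSTAT_NEG',
       :        'Cannot take the square root of ^GIVARG.', LSTAT )
        ELSE
           RETARG = SQRT( GIVARG )
        END IF
  
        END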

Finally, it is emphasised that ERR_STAT is only for use where there is no other choice than to use this mechanism to determine the last reported error status.

1 For historical reasons there are still some routines in ADAM which set a status value without making an accompanying error report – these are gradually being corrected. If such a routine is used before it has been corrected, then the strategy outlined here is recommended. It is advisable not to complicate new code by attempting to make an error report on behalf of the faulty subroutine. If it is appropriate, please ensure that the relevant support person is made aware of the problem.