Although the Message Reporting System could be used for reporting errors, there are a number of considerations which demand that separate facilities are available for this:
This can lead to several error reports arising from a single failure.
These considerations have led to the design and implementation of a set of subroutines which form the Error Reporting System. The subroutines have names of the form:
      ERR_name
where name indicates what the subroutine does. These subroutines work in conjunction with the Message System and allow error messages to incorporate message tokens.
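The recommended method of indicating when errors have occurred within Starlink software is to use an integer status value in each subroutine argument list. This inherited status argument, say STATUS, should always be the last argument and every subroutine should check its value on entry. The principle is as follows:
The subroutine returns without action if STATUS is input with a value other than SAI__OK.
The subroutine leaves STATUS unchanged if it completes successfully.
The subroutine sets STATUS to an appropriate error value and reports an error if it fails to complete successfully.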
Here is an example of the use of inherited status within a simple subroutine:
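A minimal sketch of such a subroutine is given below. The routine name ROUTN, its VALUE argument and the trivial "application code" are hypothetical, chosen only to show where the status check goes.

```fortran
      SUBROUTINE ROUTN( VALUE, STATUS )
      IMPLICIT NONE

*  Define the SAI__OK global constant.
      INCLUDE 'SAE_PAR'

*  Arguments (VALUE is a hypothetical placeholder argument).
      INTEGER VALUE
      INTEGER STATUS

*  Check the inherited status: return at once after an earlier error.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Application code.  It is reached only if STATUS was SAI__OK on
*  entry (the increment is a stand-in for real work).
      VALUE = VALUE + 1

      END
```

The status check is the first executable statement, so nothing in the routine can run on arguments left undefined by an earlier failure.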
If an error occurs within the “application code” of such a subroutine, then STATUS is set to a value which is not SAI__OK, an error is reported (see below) and the subroutine aborts.
Note that it is often useful to use a status argument and inherited status checking in subroutines which “cannot fail”. This prevents them executing, possibly producing a run-time error, if their arguments contain rubbish after a previous error. Every piece of software that calls such a routine is then saved from making an extra status check. Furthermore, if the routine is later upgraded it may acquire the potential to fail, and so a status argument will subsequently be required. If a status argument is included initially, existing code which calls the routine will not need to be changed (see further discussion of this in §3.17).
The use of the global constants SAI__OK and SAI__ERROR for setting status values is recommended in general applications. These global constants may be defined in each subroutine by including the file SAE_PAR at the beginning of the subroutine, prior to the declaration of any subroutine arguments or local variables. When writing subroutine libraries, however, it is useful to have a larger number of globally unique error codes available and to define symbolic constants for these in a separate include file. The naming convention:
      fac__ecode
should be used for the names of error codes defined in this way, where fac is the three-character facility prefix and ecode is up to five alphanumeric characters of error code name. Note the double underscore used in this naming convention. The include file should be referred to by the name fac_ERR, e.g.
      INCLUDE 'SGS_ERR'
where the facility name is SGS, the Starlink Simple Graphics System, in this case. These symbolic constants should be defined at the beginning of every subroutine which requires them, prior to the declaration of any subroutine arguments or local variables.
The purpose of error codes is to enable the status argument to indicate that an error has occurred by having a value which is not equal to SAI__OK. By using a set of predefined error codes the calling module is able to test the returned status to distinguish between error conditions which may require different action. It is not generally necessary to define a very large number of error codes which would allow a unique value to be used every time an error report is made. It is sufficient to be able to distinguish the important classes of error which may occur. Examples of existing software can be consulted as a guide in this matter.
The Starlink utility MESSGEN (see SUN/185) should be used on UNIX to generate a set of globally unique error codes for a package. It may be used to create the Fortran include file and/or a C header file defining symbolic names for the error codes, and/or the “facility error message file”, which can be used to associate a simple message with each error code (see §3.15). There is an alternative but compatible method of calculating the set of error codes for a package described in Appendix G.
Software from outside a package which defines a set of error codes may use that package’s codes to test for specific error conditions arising within that package. However, with the exception of the SAI__ codes, it should not assign these values to the status argument. To do so could cause confusion about which package detected the error.
The subroutine used to report errors is ERR_REP. It has a calling sequence of the form:
      CALL ERR_REP( PARAM, TEXT, STATUS )
Here, the argument PARAM is the error message name, TEXT is the error message text and STATUS is the inherited status. These arguments are broadly similar to those used in the Message System subroutine MSG_OUT.
The error message name PARAM should be a globally unique identifier for the error report. It is recommended that it has the form:
      routn_message
in the general case of subroutines within an application, or:
      fac_routn_message
in the case of routines within a subroutine library. In the former case, routn is the name of the application routine from which ERR_REP is being called and message is a sequence of characters uniquely identifying the error report within that subroutine. In the latter case, fac_routn is the full name of the subroutine from which ERR_REP is being called (see the Starlink Application Programming Standard, SGP/16, for a discussion of the recommended subroutine naming convention), and message is a sequence of characters unique within that subroutine. These naming conventions are designed to ensure that each individual error report made within a complete software system has a unique error name associated with it.
Here is a simple example of error reporting where part of the application code of the previous example detects an invalid value of some kind, sets STATUS, reports the error and then aborts:
In the event of an invalid value, the Error System would produce a message like:
Note that when the message is output to the user, the Error System precedes the given text with exclamation marks. For more information on this, see §3.11.
The sequence of three operations (set STATUS to an error value, report the error with ERR_REP, then abort) is the standard response to an error condition and should be adopted by all software which uses the Error System.
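The standard response might be coded along the following lines. This is a sketch only: the routine name ROUTN, the test for a negative value, the message name ROUTN_BADV and the message text are all hypothetical.

```fortran
      SUBROUTINE ROUTN( VALUE, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER VALUE
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Detect the invalid value (negative here, purely for illustration).
      IF ( VALUE .LT. 0 ) THEN

*  1. Set STATUS to an error value.
         STATUS = SAI__ERROR

*  2. Report the error.
         CALL ERR_REP( 'ROUTN_BADV', 'Value is invalid.', STATUS )

*  3. Abort.
         RETURN
      END IF

*  Remaining application code would follow here.
      VALUE = VALUE * 2

      END
```

With this text the user would be expected to see something like "!! Value is invalid.", the leading exclamation marks being supplied by the Error System rather than by the caller.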
Note that the behaviour of the STATUS argument in ERR_REP differs somewhat from that in MSG_OUT in that ERR_REP will execute regardless of the input value of STATUS. Although the Starlink convention is for subroutines not to execute if their status argument indicates a previous error, the Error System subroutines obviously cannot behave in this way if their purpose is to report these errors.
On exit from ERR_REP the value of STATUS remains unchanged, with three exceptions:
One of these arises when ERR_REP is called with STATUS set to SAI__OK: in this case an additional error message to this effect is stacked for output to the user and STATUS is returned set to ERR__BADOK.
Message tokens can be used in the error text presented to ERR_REP in the same manner as their use in calls to MSG_OUT, MSG_OUTIF and MSG_LOAD. Here is an example where two values, LOWER and UPPER, are in conflict:
If the value of LOWER is 50 and the value of UPPER is 10, then the user might receive a message like:
After a call to ERR_REP, all message tokens are left undefined.
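The LOWER and UPPER case could be written roughly as follows. The routine name CHKBND, the token names LO and UP, the message name and the message wording are assumptions made for this sketch; MSG_SETI is the Message System routine for assigning an integer value to a token.

```fortran
      SUBROUTINE CHKBND( LOWER, UPPER, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER LOWER
      INTEGER UPPER
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

      IF ( LOWER .GT. UPPER ) THEN
         STATUS = SAI__ERROR

*  Assign the tokens immediately before the report that uses them.
         CALL MSG_SETI( 'LO', LOWER )
         CALL MSG_SETI( 'UP', UPPER )
         CALL ERR_REP( 'CHKBND_RANGE',
     :        'LOWER(^LO) is greater than UPPER(^UP).', STATUS )
      END IF

      END
```

With LOWER set to 50 and UPPER set to 10, the delivered text would be along the lines of "!! LOWER(50) is greater than UPPER(10)."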
In the following example, part of an application makes a series of subroutine calls:
Each of these subroutines uses the inherited status strategy and makes error reports by calling ERR_REP. If an error occurs within any of the subroutines, STATUS will be set to an error value by that routine and inherited status checking by all subsequent routines will cause them not to execute. Thus, it becomes unnecessary to check for an error after each subroutine call, and a single check at the end of a sequence of calls is all that is required to correctly handle any error condition that may arise. Because an error report will already have been made by the subroutine that failed, it is usually sufficient simply to abort if an error arises in a sequence of subroutine calls.
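The calling pattern described above can be sketched as follows. DOWORK, ROUTN1, ROUTN2 and ROUTN3 are hypothetical names; each called routine is assumed to check STATUS on entry and to report its own errors with ERR_REP.

```fortran
      SUBROUTINE DOWORK( STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  If one of these calls fails, the later ones return immediately
*  because they find STATUS already set on entry.
      CALL ROUTN1( STATUS )
      CALL ROUTN2( STATUS )
      CALL ROUTN3( STATUS )

*  A single check after the whole sequence is sufficient.  No call
*  to ERR_REP is made here: the failing routine has already reported.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Code relying on all three calls having succeeded would follow.

      END
```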
It is important to distinguish the case where a called subroutine sets STATUS and makes its own error report, as above, from the case where STATUS is set explicitly as a result of a directly detected error, as in the previous example. If the error reporting strategy is to function correctly, then responsibility for reporting the error must lie with the routine which modifies the status argument. The golden rule is therefore:
If STATUS is explicitly set to an error value, then an accompanying call to ERR_REP must be made.
Unless there are good documented reasons why this cannot be done, subroutines which return a status value and do not make an accompanying error report should be regarded as containing a bug [1].
The purpose of an error message is to be informative and it should therefore provide as much relevant information about the context of the error as possible. It must also avoid the danger of being misleading, or of containing too much irrelevant information which might be confusing to a user. Particular care is necessary when reporting errors from within subroutines which might be called by a wide variety of software. Such reports must not make unjustified assumptions about what sort of application might be calling them. For example, in a routine that adds two arrays, the report:
would be preferable to:
if the same routine could be called to add two spectra!
The name of the routine which called ERR_REP to make an error report can often be a vital piece of information when trying to understand what went wrong. However, the error report is intended for the user, not the programmer, and so the name of an obscure internal routine is more likely to confuse than to clarify the nature of the error. A good rule of thumb is to include the names of routines in error reports only if those names also appear in documentation – so that the function they perform can be discovered without delving into the code. An example of this appears in the next section.
Instead of simply aborting when a status value is set by a called subroutine, it is also possible for an application to add further information about the circumstances surrounding the error. In the following example, an application makes several calls to a subroutine which might return an error status value. In each case, it reports a further error message so that it is clear which operation was being performed when the lower-level error occurred:
Notice how an additional error report is made in each case, but because the original status value contains information about the precise nature of the error which occurred within the subroutine SMOOTH, it is left unchanged.
If the first call to subroutine SMOOTH were to fail, say because it could not find any valid pixels in the image it was smoothing, then the error message the user would receive might be:
The first part of this message originates from within the subroutine SMOOTH, while the second part qualifies the earlier report, making it clear how the error has arisen. Since SKYOFF is the name of an application known to the user, it has been included in the contextual error message.
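A sketch of how SKYOFF might add this context is given below. The argument list of SMOOTH, the array arguments and the message names and texts are assumptions made for illustration.

```fortran
      SUBROUTINE SKYOFF( NX, NY, SKY, OBJ, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER NX
      INTEGER NY
      REAL SKY( NX, NY )
      REAL OBJ( NX, NY )
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Smooth the sky image, adding context if SMOOTH fails.  STATUS is
*  left as SMOOTH set it.
      CALL SMOOTH( NX, NY, SKY, STATUS )
      IF ( STATUS .NE. SAI__OK ) THEN
         CALL ERR_REP( 'SKYOFF_SKY',
     :        'SKYOFF: Failed to smooth the sky image.', STATUS )
         RETURN
      END IF

*  Smooth the object image in the same way.
      CALL SMOOTH( NX, NY, OBJ, STATUS )
      IF ( STATUS .NE. SAI__OK ) THEN
         CALL ERR_REP( 'SKYOFF_OBJ',
     :        'SKYOFF: Failed to smooth the object image.', STATUS )
         RETURN
      END IF

      END
```

If the first call failed, the user would see whatever SMOOTH reported, prefixed by "!!", followed by the SKYOFF line prefixed by "!".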
This technique can often be very useful in simplifying error diagnosis, but it should not be overdone; the practice of reporting errors at every level in a program hierarchy tends to produce a flood of redundant messages. As an example of good practice for a subroutine library, an error report made when an error is first detected, followed by a further contextual error report from the “top-level” routine which the user actually called, normally suffices to produce very helpful error messages.
The action of the subroutine ERR_REP is to report an error to the Error System but the Error System has the capacity to defer the output of that message to the user. This allows the final delivery of error messages to be controlled within applications software, and this control is achieved using the subroutines ERR_MARK, ERR_RLSE, ERR_FLUSH and ERR_ANNUL. This section describes the function of these subroutines and how they are used.
Subroutine ERR_MARK has the effect of ensuring that all subsequent error messages are deferred by the Error System and stored in an “error table” instead of being delivered immediately to the user. ERR_MARK also starts a new “error context” which has its own table of error messages and message tokens which are independent of those in the previous error context. A return to the previous context can later be made by calling ERR_RLSE. When ERR_RLSE is called, the new error context created by ERR_MARK ceases to exist and any error messages stored in it are transferred to the previous context. Calls to ERR_MARK and ERR_RLSE can be nested if required but should always occur in matching pairs. In this way, no existing error messages can be lost through the deferral mechanism.
The system starts at base-level context (level 1) – at this level, error messages are output to the user immediately. If a call to ERR_RLSE returns the system to base-level context, any messages still stored in the error table will be automatically delivered to the user.
The purpose of deferred error reporting can be illustrated by the following example. Consider a subroutine, say HELPER, which detects an error during execution. The subroutine HELPER reports the error that has occurred, giving as much contextual information about the error as it can. It also returns an error status value, enabling the software that called it to react to the failure appropriately. However, what may be considered an “error” at the level of subroutine HELPER, e.g. an “end of file” condition, may be considered by the calling module to be a case which can be handled without informing the user, e.g. by simply terminating its input sequence. Thus, although the subroutine HELPER will always report the error condition, it is not always necessary for the associated error message to reach the user. The deferral of error reporting enables application programs to handle such error conditions internally.
Here is a schematic example of what subroutine HELPER might look like:
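One possible shape for such a routine is sketched below, assuming that HELPER reads a record from a Fortran unit. The argument list, the include file HLP_ERR and the error code HLP__EOF are hypothetical; ERR_FIOER is described later in this document.

```fortran
      SUBROUTINE HELPER( UNIT, LINE, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'

*  HLP_ERR is a hypothetical include file defining HLP__EOF.
      INCLUDE 'HLP_ERR'

      INTEGER UNIT
      CHARACTER * ( * ) LINE
      INTEGER STATUS
      INTEGER IOS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Read the next record; a negative IOSTAT value means end of file.
      READ ( UNIT, '( A )', IOSTAT = IOS ) LINE

      IF ( IOS .LT. 0 ) THEN
         STATUS = HLP__EOF
         CALL ERR_REP( 'HELPER_EOF',
     :        'HELPER: End of file reached.', STATUS )
      ELSE IF ( IOS .GT. 0 ) THEN
         STATUS = SAI__ERROR
         CALL ERR_FIOER( 'MSG', IOS )
         CALL ERR_REP( 'HELPER_READ',
     :        'HELPER: Error reading file - ^MSG', STATUS )
      END IF

*  No attempt is made to deliver the message; that is left to the
*  caller and to the Error System.
      END
```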
Suppose HELPER is called and reports an error, returning with STATUS set. At this point, the error message may, or may not, have been received by the user – this will depend on the environment in which the routine is running, and on whether the software which called HELPER took any action to defer the error report. HELPER itself does not need to take action (indeed it should not take action) to ensure delivery of the message to the user; its responsibility ends when it aborts, and responsibility for handling the error condition then passes to the software which called it.
Now suppose that the subroutine HELPED calls HELPER and wishes to defer any messages from HELPER so that it can decide how to handle error conditions itself, rather than troubling the user with spurious messages. It can do this by calling the routine ERR_MARK before it calls HELPER.
The operation of error message deferral can be illustrated by a simple example:
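A minimal sketch of the deferral pattern follows. The argument lists of HELPED and HELPER are hypothetical; ERR_MARK and ERR_RLSE take no arguments.

```fortran
      SUBROUTINE HELPED( UNIT, LINE, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER UNIT
      CHARACTER * ( * ) LINE
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Start a new error context: reports made by HELPER are now held
*  in the error table rather than delivered to the user.
      CALL ERR_MARK

      CALL HELPER( UNIT, LINE, STATUS )

*  Any error condition would be examined and handled at this point.

*  Return to the previous context.  Messages still pending are
*  transferred to it, so nothing is lost.
      CALL ERR_RLSE

      END
```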
By calling ERR_MARK before calling HELPER, subroutine HELPED ensures that any error messages reported by HELPER are deferred, i.e. held in the error table. HELPED can then handle the error condition itself in one of two ways:
By calling ERR_ANNUL, which “annuls” the error, deleting any deferred error messages in the current context and resetting STATUS to SAI__OK. This effectively causes the error condition to be ignored. For instance, it might be used if an “end of file” condition was expected, but was to be ignored and some appropriate action taken instead. (A call to ERR_REP could also be used after ERR_ANNUL to replace the initial error condition with another more appropriate one, although this is not often done.)
By calling ERR_FLUSH, which “flushes out” the error, delivering any deferred error messages in the current context to the user and resetting STATUS to SAI__OK. This notifies the user that a problem has occurred, but allows the application to continue anyway. For instance, it might be used if a series of files were being read: if one of these files could not be accessed, then the user could be informed of this by calling ERR_FLUSH before going on to process the next file.
Here is the previous example, elaborated to demonstrate the use of ERR_ANNUL. It shows how an “end of file” condition from HELPER might be detected, annulled, and stored by HELPED in a logical variable EOF for later use:
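A sketch of this use of ERR_ANNUL is given below. It assumes that HELPER signals end of file with a dedicated status value; the code HLP__EOF and its include file HLP_ERR are hypothetical, as are the argument lists.

```fortran
      SUBROUTINE HELPED( UNIT, LINE, EOF, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'

*  HLP_ERR is a hypothetical include file defining HLP__EOF.
      INCLUDE 'HLP_ERR'

      INTEGER UNIT
      CHARACTER * ( * ) LINE
      LOGICAL EOF
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

      EOF = .FALSE.

*  Defer any error reports made by HELPER.
      CALL ERR_MARK
      CALL HELPER( UNIT, LINE, STATUS )

*  Handle only the end-of-file condition: delete its messages,
*  reset STATUS and remember what happened.
      IF ( STATUS .EQ. HLP__EOF ) THEN
         CALL ERR_ANNUL( STATUS )
         EOF = .TRUE.
      END IF

*  Release the error context, then abort on any other error.
      CALL ERR_RLSE
      IF ( STATUS .NE. SAI__OK ) RETURN

      END
```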
Note that the routine chooses only to handle “end of file” error conditions; any other error condition will not be annulled and will subsequently cause an abort when STATUS is checked after the call to ERR_RLSE.
Here is an example showing how both ERR_FLUSH and ERR_ANNUL may be used during the process of acquiring a value from the user via a call to the subroutine RDPAR:
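The following sketch shows one way such a loop might be arranged. The argument list of RDPAR, the parameter name VALUE, the retry limit and the routine name GETVAL are all assumptions made for illustration.

```fortran
      SUBROUTINE GETVAL( VALUE, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      REAL VALUE
      INTEGER STATUS

*  Maximum number of attempts (an arbitrary choice for this sketch).
      INTEGER MXTRY
      PARAMETER ( MXTRY = 5 )
      INTEGER ITRY

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Defer error reports made by RDPAR.
      CALL ERR_MARK

      DO 10 ITRY = 1, MXTRY
         CALL RDPAR( 'VALUE', VALUE, STATUS )
         IF ( STATUS .EQ. SAI__OK ) GO TO 20

         IF ( ITRY .LT. MXTRY ) THEN

*  Tell the user what went wrong and try again.  ERR_FLUSH resets
*  STATUS unless the message could not be delivered.
            CALL ERR_FLUSH( STATUS )
            IF ( STATUS .NE. SAI__OK ) GO TO 20
         ELSE

*  Give up: replace the last report with a more appropriate one.
            CALL ERR_ANNUL( STATUS )
            STATUS = SAI__ERROR
            CALL ERR_REP( 'GETVAL_FAIL',
     :           'Unable to obtain a valid value.', STATUS )
         END IF
 10   CONTINUE

 20   CONTINUE
      CALL ERR_RLSE

      END
```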
Note how ERR_FLUSH is used to ensure that any error messages are output to the user before trying again to get a new value. In effect, it passes responsibility for the error condition to the user. This interactive situation is typical of how ERR_FLUSH should be used; it is not needed very often during normal error reporting, and it is certainly not required as a regular means of ensuring the delivery of error messages following calls to ERR_REP – this should be left to the Error System itself when returned to the base-level context.
Note that if ERR_FLUSH cannot output the error message to the user, then it will return the error status ERR__OPTER. This allows critical applications to attempt to recover in the event of the failure of the Error System.
Finally, as a safety feature, if ERR_FLUSH is called when no errors have been reported, it outputs the message
This is to highlight problems where the inherited status has been set by some item of software, but no accompanying error message has been reported.
The error table can contain up to 32 error messages, normally reported at different levels within the hierarchy of a structured program. If an attempt is made to defer the reporting of more than 32 error messages, then the last reported error message will be replaced by the message:
There are up to 256 context levels available in the Error System, the initial (base-level) error context level being 1. The current error context level may be inquired using a call to ERR_LEVEL. If an attempt is made to mark a context level beyond 256, the error message:
is placed on the error stack at context level 256 and any subsequent error reports will be placed at context level 256. A bug report should be made if either of the “EMS fault” error messages is reported from software.
When messages are delivered to the user, the Error System prefixes the given text with exclamation marks to call attention to the message and to distinguish between error messages and normal informational messages output using MSG. When a sequence of deferred messages is flushed, the first will be prefixed by ‘!!’ and the remainder by ‘!’.
By default, messages are split so that output lines do not exceed 79 characters – the split is made on word boundaries if possible. The maximum output line size can be altered using tuning parameters (see §4). If a message has to be split for delivery by the Error System, text on continuation lines is indented by three spaces, e.g.
If a subroutine performs “cleaning-up” operations which must execute even if the inherited status has been set, then a different sequence of status checking must usually be performed. The deferral of error messages may also be involved.
Normally, the effect required is that a cleaning-up routine called with its status argument set to SAI__OK will behave like any other routine, setting the status value and reporting an error if it fails. However, if the value of status has been set to an error condition because of a previous error, it must still attempt to execute, even if there is a good chance that it will not succeed. In this latter case, an error report is not normally required from the cleaning-up routine.
The following is a typical example:
Here, ALLOC allocates some memory for use by the “application code” and DEALL de-allocates it at the end. The following error conditions may arise:
The solution is to write DEALL so that it saves the value of STATUS on entry and restores it again on exit. To preserve the associated error messages, calls to ERR_MARK and ERR_RLSE are also required. For example:
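A sketch of DEALL written in this way follows. The argument PNTR and the routine FREMEM, which stands for whatever actually releases the memory, are hypothetical.

```fortran
      SUBROUTINE DEALL( PNTR, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER PNTR
      INTEGER STATUS
      INTEGER ISTAT

*  Save the incoming status and start a new error context, so that
*  messages already pending are left untouched.
      ISTAT = STATUS
      CALL ERR_MARK
      STATUS = SAI__OK

*  Clean-up code: it runs whatever the incoming status was.
*  FREMEM is a hypothetical routine which releases the memory.
      CALL FREMEM( PNTR, STATUS )

*  If an error was already pending on entry, discard any new error
*  from the clean-up code and restore the original status.
      IF ( ISTAT .NE. SAI__OK ) THEN
         CALL ERR_ANNUL( STATUS )
         STATUS = ISTAT
      END IF

*  Release the error context.
      CALL ERR_RLSE

      END
```

If DEALL is entered with STATUS set to SAI__OK, any failure in the clean-up code is reported and returned in the usual way.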
Note how a new error context is used to constrain ERR_ANNUL to annulling only errors arising from the “clean-up code”, and not the pre-existing error condition which is to be preserved.
Two routines are provided to “wrap up” these clean-up calls: ERR_BEGIN and ERR_END, which begin and end what is effectively a new error reporting environment. ERR_BEGIN will begin a new error context and return the status value set to SAI__OK. A call to ERR_END will annul the current error context if the previous context contains undelivered error messages. It will then release the current error context. ERR_END returns the status of the last reported error message pending delivery to the user after the current error context has been released. If there are no error messages pending output, then the status is returned set to SAI__OK. This behaviour is exactly that represented by the code in the previous example. Here is the previous example re-written using calls to ERR_BEGIN and ERR_END:
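A sketch of the shorter form is given below, again assuming a hypothetical routine FREMEM which performs the clean-up and a hypothetical argument PNTR.

```fortran
      SUBROUTINE DEALL( PNTR, STATUS )
      IMPLICIT NONE
      INTEGER PNTR
      INTEGER STATUS

*  Begin a new error reporting environment; STATUS is returned set
*  to SAI__OK so that the clean-up code will execute.
      CALL ERR_BEGIN( STATUS )

*  Clean-up code (FREMEM is a hypothetical routine).
      CALL FREMEM( PNTR, STATUS )

*  End the environment.  STATUS is returned as the status of the
*  last error message still pending, or SAI__OK if there is none.
      CALL ERR_END( STATUS )

      END
```

No local copy of the status is needed, and SAE_PAR need not be included because the routine never refers to SAI__OK itself.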
Like ERR_MARK and ERR_RLSE, ERR_BEGIN and ERR_END should always occur in pairs and can be nested if required. If ERR_BEGIN is called with STATUS set to an error value, then a check is made to determine if there are any error messages pending output at the current error context; if there are not then the status has been set without making an error report. In these cases ERR_BEGIN will make the error report:
using the given status value before marking a new error context.
Any code which attempts to execute when the inherited status is set to an error value should be regarded as “cleaning-up”.
It may sometimes be convenient within an application to obtain access to any error messages within the current context via a character variable, instead of the error output stream. The Error System provides subroutine ERR_LOAD to do this; it has the calling sequence:
The behaviour of ERR_LOAD is the same as ERR_FLUSH except that, instead of delivering deferred error messages from the current error context to the user, the error messages are returned, one by one, through character variables in a series of calls to ERR_LOAD.
On the first call of this routine, the error table for the current context is copied into a holding area, the current error context is annulled and the first message in the holding area is returned. Thereafter, each time the routine is called, the next message from the holding area is returned. The argument PARAM is the returned message name and PARLEN the length of the message name in PARAM. OPSTR is the returned error message text and OPLEN is the length of the error message in OPSTR.
The status associated with the returned message is returned in STATUS until there are no more messages to return; STATUS is then set to SAI__OK, PARAM and OPSTR are set to blanks, and PARLEN and OPLEN are set to 1. As for ERR_FLUSH, a warning message is generated if there are no messages initially. The status returned with the warning message is EMS__NOMSG.
After STATUS has been returned SAI__OK, the whole process is repeated for subsequent calls.
The symbolic constants ERR__SZPAR and ERR__SZMSG are provided for declaring the lengths of character variables which are to receive message names and error messages in this way. These constants are defined in the include file ERR_PAR (see §6).
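A loop which retrieves every pending message might look like the sketch below. The routine name SHOWER is hypothetical, and writing to standard output stands in for whatever the application really wants to do with the text. Because the routine handles the error itself, STATUS is SAI__OK on return.

```fortran
      SUBROUTINE SHOWER( STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INCLUDE 'ERR_PAR'
      INTEGER STATUS
      CHARACTER * ( ERR__SZPAR ) PARAM
      CHARACTER * ( ERR__SZMSG ) OPSTR
      INTEGER PARLEN
      INTEGER OPLEN

*  There is nothing to retrieve unless an error is pending.
      IF ( STATUS .EQ. SAI__OK ) RETURN

*  Fetch messages one at a time until STATUS is returned SAI__OK.
 10   CONTINUE
      CALL ERR_LOAD( PARAM, PARLEN, OPSTR, OPLEN, STATUS )
      IF ( STATUS .NE. SAI__OK ) THEN
         WRITE ( *, * ) PARAM( : PARLEN ), ': ', OPSTR( : OPLEN )
         GO TO 10
      END IF

      END
```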
As a general rule, message tokens should be assigned, using calls to the MSG_SETx and MSG_FMTx routines, immediately prior to the call in which they are to be used. However, this is not always convenient; e.g. within an iteration or a block IF statement where the same tokens may be used in one of several potential message reports. Under these circumstances, it is important to protect the values of assigned message tokens when subroutines which may fail are called – when a subroutine fails it must be assumed that it will make an accompanying error report using ERR_REP within the existing error reporting context, thereby annulling any currently defined message tokens. The only sure way of protecting against such behaviour is to bracket the subroutine call which may fail with calls to ERR_MARK and ERR_RLSE. The same precautions are needed when any subroutine is called which may in turn call any of MSG_OUT, MSG_OUTIF, MSG_LOAD, ERR_REP or ERR_LOAD (all of which annul tokens).
It is not good practice to assign message tokens which are to be used in another subroutine.
Here is an example of assigning message tokens outside a block IF statement to be used by ERR_REP and MSG_OUT calls within the IF block. The code is a fragment of a routine for re-scaling a single array to a mean of unity. If the call to the subroutine MEAN fails, any assigned message tokens in the current error reporting context may be annulled; hence the need to bracket this call by calls to ERR_MARK and ERR_RLSE.
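A sketch of such a fragment follows. The routine name RESCAL, the argument list of MEAN, the token name and the message names and texts are assumptions made for illustration.

```fortran
      SUBROUTINE RESCAL( NAME, N, ARRAY, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      CHARACTER * ( * ) NAME
      INTEGER N
      REAL ARRAY( N )
      INTEGER STATUS
      REAL AVG
      INTEGER I

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Assign the token once, for use in any branch of the IF below.
      CALL MSG_SETC( 'NAME', NAME )

*  MEAN may fail and report an error, which would annul the token.
*  A new error context protects it.
      CALL ERR_MARK
      CALL MEAN( N, ARRAY, AVG, STATUS )
      CALL ERR_RLSE

      IF ( STATUS .NE. SAI__OK ) THEN
         CALL ERR_REP( 'RESCAL_MEAN',
     :        'Failed to find the mean of array ^NAME.', STATUS )
      ELSE IF ( AVG .EQ. 0.0 ) THEN
         STATUS = SAI__ERROR
         CALL ERR_REP( 'RESCAL_ZERO',
     :        'Array ^NAME has zero mean; cannot rescale.', STATUS )
      ELSE
         DO 10 I = 1, N
            ARRAY( I ) = ARRAY( I ) / AVG
 10      CONTINUE
         CALL MSG_OUT( 'RESCAL_DONE',
     :        'Array ^NAME rescaled to unit mean.', STATUS )
      END IF

      END
```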
Some of the lower-level Starlink libraries cannot use ERR (or EMS) to make error reports; furthermore, some items of software may need to perform Fortran I/O operations or calls to operating system routines. Any of these may fail but will not have made error reports through ERR_REP. For this reason it is sometimes useful to convert the given error code into a message which can be displayed as part of a message at a higher level where some context information can be added.
Three subroutines exist to enable a message token to be built from the error code returned under these circumstances. These subroutines are:
where STATUS is a standard Starlink facility status value,
where IOSTAT is a Fortran I/O status code, and
where SYSTAT is a status value returned from an operating system routine.
Each of the above routines will assign the message associated with the given error code to the specified token, appending the message if the token is already defined. The error code argument is never altered by these routines. It is important that the correct routine is called, otherwise the wrong message or, at best, only an error number will be obtained.
ERR_FACER is not likely to be useful for applications programmers because suitable error reports will probably have been made by higher-level facilities called directly by the application – it is really provided for completeness. The other two routines will be more useful.
Here is an example of using ERR_FIOER. It is a section of code that writes a character variable to a formatted sequential file, given the Fortran logical unit of the file:
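Such a section of code might be sketched as follows. The routine name WRLINE, the token names, the length chosen for the file name buffer and the message wording are assumptions.

```fortran
      SUBROUTINE WRLINE( UNIT, TEXT, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER UNIT
      CHARACTER * ( * ) TEXT
      INTEGER STATUS
      INTEGER IOS
      INTEGER IOS2
      CHARACTER * ( 128 ) FNAME

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Write the text, trapping any Fortran I/O error.
      WRITE ( UNIT, '( A )', IOSTAT = IOS ) TEXT

      IF ( IOS .NE. 0 ) THEN
         STATUS = SAI__ERROR

*  Find the name of the file to give the report some context.
         FNAME = ' '
         INQUIRE ( UNIT = UNIT, NAME = FNAME, IOSTAT = IOS2 )
         IF ( IOS2 .NE. 0 .OR. FNAME .EQ. ' ' ) FNAME = '<unknown>'

*  Build the tokens and make the report.
         CALL MSG_SETC( 'FILE', FNAME )
         CALL MSG_SETI( 'UNIT', UNIT )
         CALL ERR_FIOER( 'MESSAGE', IOS )
         CALL ERR_REP( 'WRLINE_WRITE',
     :        'Error writing to file ^FILE on unit ^UNIT - ^MESSAGE',
     :        STATUS )
      END IF

      END
```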
Here, the name of the file being written is also obtained in order to construct a comprehensive error message, which might be something like:
Note that the I/O status values used in Fortran do not have universally defined meanings except for zero (meaning no error), but by using ERR_FIOER it is still possible to make high quality error reports about Fortran I/O errors in a portable manner.
In a similar way, the subroutine ERR_SYSER may be used to assign an operating system message associated with the system status flag SYSTAT to the named message token. Of course, software that calls operating system routines directly cannot be portable, but ERR_SYSER provides a convenient interface for reporting errors that occur in such routines in a form that can be easily changed if necessary. For example:
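A sketch of the pattern is given below. SYSCAL is a hypothetical operating system routine, and the assumption that it returns zero on success is made purely for illustration; the real test depends on the system routine being called.

```fortran
      SUBROUTINE DOSYS( STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER STATUS
      INTEGER SYSTAT

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Call the operating system routine (SYSCAL is hypothetical and is
*  assumed here to return zero on success).
      CALL SYSCAL( SYSTAT )

      IF ( SYSTAT .NE. 0 ) THEN
         STATUS = SAI__ERROR

*  Turn the system status into a message token and report it with
*  some context of our own.
         CALL ERR_SYSER( 'ERRMSG', SYSTAT )
         CALL ERR_REP( 'DOSYS_FAIL',
     :        'DOSYS: System call failed - ^ERRMSG', STATUS )
      END IF

      END
```

Keeping the system-dependent test and the call to ERR_SYSER together in one small routine limits the amount of code to be changed when porting.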
Fortran I/O and operating system error messages, obtained through calls to ERR_FIOER and ERR_SYSER respectively, will differ depending upon which operating system (or even flavour of operating system) an application is run on.
Because of the necessary generality of these messages (and those from ERR_FACER), many will appear rather vague and unhelpful without additional contextual information. This is particularly true of UNIX implementations. It is very important to provide additional contextual information when using these routines in order to avoid obfuscating rather than clarifying the nature of an error. This can be done either as part of the error message which includes the message token set by ERR_FACER, ERR_FIOER or ERR_SYSER, or by making a further error report. The examples in this section provide a good illustration of how this can be done.
Sometimes “foreign” subroutines must be called which do not use the Starlink error status conventions (e.g. because they must adhere to some standard interface definition like GKS). Unless they are unusually robust, such routines must normally be prevented from executing under error conditions, either by performing a status check immediately beforehand, or by enclosing them within an appropriate IF...END IF block. Depending on the form of error indication that such foreign routines use, it may also be necessary to check afterwards whether they have succeeded or not. If such a routine fails, then for compatibility with other Starlink software a status value should be set and an error report made on its behalf.
For example, the following code makes a GKS inquiry and checks the success of that inquiry:
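One way of wrapping such an inquiry is sketched below. It assumes the GKS Fortran binding routine GQCF (inquire colour facilities) and the GKS convention that an error indicator of zero means success. The wrapper name GETCOL, the message name and the message text are hypothetical.

```fortran
      SUBROUTINE GETCOL( WTYPE, NCOLI, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER WTYPE
      INTEGER NCOLI
      INTEGER STATUS
      INTEGER ERRIND
      INTEGER COLA
      INTEGER NPCI

*  Check the inherited status, since GKS itself will not.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Make the GKS inquiry.
      CALL GQCF( WTYPE, ERRIND, NCOLI, COLA, NPCI )

*  Set STATUS and report on behalf of GKS if the inquiry failed.
      IF ( ERRIND .NE. 0 ) THEN
         STATUS = SAI__ERROR
         CALL MSG_SETI( 'ERRIND', ERRIND )
         CALL ERR_REP( 'GETCOL_INQ',
     :        'Error no. ^ERRIND in GKS colour inquiry (GQCF).',
     :        STATUS )
      END IF

      END
```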
In some cases, it may be possible to obtain a textual error message from the error flag, by means of a suitable inquiry routine, which could be used as the basis of the error report.
It will be obvious from this example how convenient the inherited error status strategy is, and how much extra work is involved in obtaining the same degree of robustness and quality of error reporting from routines which do not use it. It is worth bearing this in mind if you are involved in importing foreign subroutine libraries for use with Starlink software: the provision of a few simple routines for automating error reporting, or an extra layer of subroutine calls where inherited status checking and error reporting can be performed, can make the final product vastly easier to use. Starlink staff will be pleased to offer advice on this matter if consulted.
When converting existing subroutine libraries to use the inherited status conventions and the Error System, it is conceivable that an existing subroutine which does not have a status argument will acquire the potential to fail and report an error, either from within itself or from packages layered beneath it. Ideally, the argument list of the subroutine should be changed to include a status argument. However, it may be inconvenient to modify the argument list of a commonly used subroutine (i.e. because of the amount of existing code which would have to be changed), and so an alternative method is needed to determine if status has been set during the call so that the appropriate action can be taken by the caller. The subroutine ERR_STAT is provided for recovering the last reported status value under these conditions. Here is an example of the use of ERR_STAT, called from a subroutine which follows the error reporting conventions:
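A sketch of this use of ERR_STAT follows. NOSTAT stands for the existing routine without a status argument; its VALUE argument and the calling routine's name CALLER are hypothetical.

```fortran
      SUBROUTINE CALLER( VALUE, STATUS )
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      REAL VALUE
      INTEGER STATUS

*  Check the inherited status.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Call the routine which has no status argument.
      CALL NOSTAT( VALUE )

*  Recover the last reported status value.
      CALL ERR_STAT( STATUS )

*  Abort if NOSTAT, or anything beneath it, reported an error.
      IF ( STATUS .NE. SAI__OK ) RETURN

*  Code relying on NOSTAT having succeeded would follow here.

      END
```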
Here, the calls to NOSTAT and ERR_STAT are equivalent to one subroutine call with a status argument. The use of ERR_STAT to return the current error status relies upon the use of ERR_REP to report errors by the conventions described in this document. In particular, foreign packages must be incorporated in the recommended way, as described in §3.16, for ERR_STAT to be reliable.
Finally, it is emphasised that ERR_STAT is only for use where there is no other choice than to use this mechanism to determine the last reported error status.
[1] For historical reasons there are still some routines in ADAM which set a status value without making an accompanying error report; these are gradually being corrected. If such a routine is used before it has been corrected, then the strategy outlined here is recommended. It is advisable not to complicate new code by attempting to make an error report on behalf of the faulty subroutine. If it is appropriate, please ensure that the relevant support person is made aware of the problem.