One of the most useful routines within GRP is GRP_GROUP. This routine appends names to the end of a previously created group using a “group expression” obtained from the environment via a named parameter which can be of any type. The routine GRP_GRPEX also performs this function, except that the group expression is provided by the calling application, rather than being obtained through the parameter system.
This section describes the syntax of group expressions.
Group expressions may contain several “delimiter” characters (usually a comma although this can be changed, see section 5.7) and the substrings delimited by these characters are referred to as “elements”. If there are no delimiters in a group expression, then the group expression consists of a single element. For instance, the group expression:
consists of the three elements NEW_FILE, A_2RAWFLAT and ^LIST.DAT. Note, delimiter characters are ignored if they occur within matching “nesting characters” (see section 4.5). For instance, nesting prevents the group expression:
being split into three elements instead of two (i.e. the first comma does not act as a delimiter because it occurs within a nest formed by matching parentheses).
Each element of a group expression may be a literal name (eg NEW_FILE in the previous example), or an “indirection element” or a “modification element”. An indirection element specifies a text file from which further names are to be read (eg ^LIST.DAT in the previous example). A modification element specifies an existing group of names which are to be used as the basis for the new names (eg A_2RAWFLAT in the previous example). These are described in more detail below.
Each element in a group expression will give rise to one or more names (depending on whether the element consists of a literal name, an indirection element or a modification element). These names may be edited before being stored in a group by including certain “editing strings” within the text of the element. The general format of an element with editing strings included is:
The kernel string can be a single element, or can be a full group expression. Processing of the element proceeds as follows:
The names which result from this processing are then added to a group. If there is no ambiguity about where the kernel starts and finishes (for instance if the prefix and suffix are both omitted, and the kernel consists of a single element) then the kernel does not need to be enclosed within kernel delimiters. The contents of the kernel can be any group expression. In particular, the kernel can contain other nested kernels with their own associated editing strings.
Let’s look at some examples:
This will give rise to the three names A_TOM_B, A_DICK_B and A_HARRY_B.
This will read names from the text file FILE.LIS (see the description of indirection elements below), and replace all occurrences of the string “_OLD” within the names with the string “_NEW”.
This is a complex example and needs looking at carefully. Looking at it at the highest level, it can be thought of as:
where kernel is the group expression:
The first and third elements in this inner group expression are simple literal names and give rise to the two names A and C. The second element specifies that the three names ONE, TWO and THREE are to be edited by replacement of the letter T by the letter Z, and the addition of the prefix B_. After editing, these three names become B_ONE, B_ZWO and B_ZHREE. So the total group specified by this inner kernel is:
We can now go back and look at the full group expression in the form:
The first element specifies the single name WW. The second element specifies that each of the names arising from the expansion of the inner kernel (i.e. the names listed above) should be edited by replacing _Z with _Y, and then appending the suffix KK. Thus the final group contains:
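The expansion traced above can be reproduced with a short Python sketch. It is not GRP code: the helper name edit_names is hypothetical, and the order of operations (substitution first, then prefix and suffix) is taken from the description of modification elements later in this section.

```python
def edit_names(names, prefix="", suffix="", old=None, new=""):
    """Apply an optional substitution, then add the prefix and suffix."""
    edited = []
    for name in names:
        if old:  # substitution is only applied when an "old" string is given
            name = name.replace(old, new)
        edited.append(prefix + name + suffix)
    return edited

# Inner kernel: the literal A, the edited names ONE/TWO/THREE, the literal C.
inner = (["A"]
         + edit_names(["ONE", "TWO", "THREE"], prefix="B_", old="T", new="Z")
         + ["C"])

# Full expression: the literal WW, then the inner kernel edited a second time.
final = ["WW"] + edit_names(inner, suffix="KK", old="_Z", new="_Y")
print(final)
```

Under these assumptions the sketch prints the names WW, AKK, B_ONEKK, B_YWOKK, B_YHREEKK and CKK, in that order.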
An indirection element consists of an “indirection character” (usually “^” (up arrow) although this can be changed, see section 5.7) followed by the name of a text file. For instance, the group expression ^raw_data would cause GRP to search for a file called raw_data.
The specified file is read to obtain further names to be added to the group. Each line in the file is processed as if it were a separate group expression, and so may contain any combination of literal names, modification elements or further indirection elements. It is thus possible to get several levels of indirection, in which a literal name is specified within a text file, which is itself specified within an indirection element contained within another text file, etc. GRP imposes a limit of 7 levels of indirection, primarily to safeguard against “run-away” indirection, which happens (for instance) when a file specifies itself within an indirection element.
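The recursive reading of indirection files, and the way a depth limit stops run-away indirection, might be sketched as follows in Python. This is an illustration only, not the GRP implementation: the function name expand, the file names, and the exact way levels are counted are assumptions, and the sketch splits on commas without handling nesting, editing or comments.

```python
import os
import tempfile

MAX_DEPTH = 7        # limit quoted in the text; how GRP counts levels is assumed
INDIRECTION = "^"    # default indirection character

def expand(expression, depth=0):
    """Expand literal names and indirection elements into a flat list."""
    names = []
    for element in expression.split(","):
        element = element.strip()
        if element.startswith(INDIRECTION):
            if depth >= MAX_DEPTH:
                raise RuntimeError("too many levels of indirection")
            with open(element[1:]) as handle:
                for line in handle:
                    # Each line is treated as a separate group expression.
                    names.extend(expand(line.rstrip("\n"), depth + 1))
        else:
            names.append(element)
    return names

# Hypothetical files: outer.lis refers to inner.lis, loop.lis refers to itself.
with tempfile.TemporaryDirectory() as tmp:
    inner = os.path.join(tmp, "inner.lis")
    outer = os.path.join(tmp, "outer.lis")
    loop = os.path.join(tmp, "loop.lis")
    with open(inner, "w") as f:
        f.write("C\nD\n")
    with open(outer, "w") as f:
        f.write("B\n^" + inner + "\n")
    with open(loop, "w") as f:
        f.write("^" + loop + "\n")

    names = expand("A,^" + outer + ",E")
    try:
        expand("^" + loop)
        runaway_caught = False
    except RuntimeError:
        runaway_caught = True
```

Here names ends up holding A, B, C, D and E, while the self-referencing file is abandoned with an error once the limit is reached instead of recursing indefinitely.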
Indirection elements are always considered to be case sensitive, even if the group has been designated case insensitive. This is because file names on certain operating systems (eg UNIX) are always considered case sensitive, and so problems would arise while accessing indirection files if GRP were to consider them case insensitive.
The file name can contain shell meta-characters (references to environment variables for instance) which will be expanded before the file is used.
A modification element causes GRP to generate a set of names by copying the names from another group. These new names can then be modified using the facilities for editing names described above. The application specifies which group is to be used as the basis for the new names. A special character (usually a “*” character, but this can be changed if required) is used as a token to represent all the names in the basis group. Thus:
would cause all the names in the basis group to be modified by replacing the string _DS with the string _BK. The basis names can also be modified by the addition of a prefix and suffix. Following the description of name editing given above, you may expect the format to be (for instance):
in which the token character takes on the role of the kernel. This does in fact work, but in this case the opening and closing kernel delimiters (“{” and “}”) can be omitted because there is no ambiguity about where the kernel starts and finishes. Thus a simpler form would be:
The addition of a prefix and suffix can be combined with substitution as usual. For instance, the element:
would cause all occurrences of the letter C within the names of the basis group to be replaced with D, followed by the addition of the prefix A and the suffix B.
If a “null” group is specified as the basis group (i.e. the group identifier is given as GRP__NOID), then there are no names on which to base the new names and the token character is treated as a literal name. That is, if the user gave the group expression
and the application had specified a null group as the basis for modification elements, then the specified editing would be applied to the token character itself, treated as a literal name, resulting in the single literal name “A_2” being added to the group.
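The behaviour of a modification element, with and without a basis group, can be sketched in Python as below. The function expand_modification is hypothetical, a null basis group (GRP__NOID) is modelled as None, and the token character is assumed to be “*”; none of this is GRP's own code.

```python
def expand_modification(basis, prefix="", suffix="", old=None, new="", token="*"):
    """Generate names from a basis group, or from the bare token if there is none."""
    # With a null basis group the token character is used as a literal name.
    source = [token] if basis is None else list(basis)
    result = []
    for name in source:
        if old:  # substitution happens before the prefix and suffix are added
            name = name.replace(old, new)
        result.append(prefix + name + suffix)
    return result

# Replace C with D, then add the prefix A and the suffix B.
basis = ["CAT", "COCOA"]  # hypothetical basis group
modified = expand_modification(basis, prefix="A", suffix="B", old="C", new="D")

# Same editing with a null basis group: the token itself is edited.
literal = expand_modification(None, prefix="A", suffix="B", old="C", new="D")
```

With the hypothetical basis group above, modified holds ADATB and ADODOAB in the same order as the basis names, and literal holds the single name A*B.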
There is sometimes a clash of interests to be resolved when deciding on the best choice for the character which delimits elements within a group expression. The default delimiter character is the comma, but this character can sometimes be useful within an element, for instance when specifying a set of indices. For instance, if the user gave the group expression:
in which each element is a literal name corresponding to an array element, it would be wrong to split this up using the commas as delimiters into the four strings “A(1”, “2)”, “B(3” and “10)”.
To get round this particular problem, GRP ignores delimiters which occur within matching “nesting characters”. There are two nesting characters, the “open nest” character (usually set to “(”) and the “close nest” character (usually set to “)”). Thus in the above example, the commas occurring within the parentheses would not be treated as delimiters, resulting in the group expression being split into the two elements A(1,2) and B(3,10). The characters to use as the opening and closing nest characters may be set by the calling application (see section 5.7).
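A minimal Python sketch of this splitting rule is given below. The function name split_elements is hypothetical and the sketch only models the delimiter and the two nesting characters, using the default values mentioned in the text.

```python
def split_elements(expression, delimiter=",", open_nest="(", close_nest=")"):
    """Split on delimiters, ignoring any that lie inside matching nest characters."""
    elements, current, depth = [], [], 0
    for char in expression:
        if char == open_nest:
            depth += 1
        elif char == close_nest and depth > 0:
            depth -= 1
        if char == delimiter and depth == 0:
            # A delimiter outside any nest ends the current element.
            elements.append("".join(current))
            current = []
        else:
            current.append(char)
    elements.append("".join(current))
    return elements

elements = split_elements("A(1,2),B(3,10)")
print(elements)
```

The group expression is split into the two elements A(1,2) and B(3,10), because only the comma outside the parentheses is seen at nesting depth zero.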
GRP allows a group expression to be flagged by terminating it with a “flag” character (usually a minus sign although this can be changed, see section 5.7). If the last character in the group expression is a flag character, then the FLAG argument of routine GRP_GROUP is returned true. The flag character is stripped off the group expression before it is split up into elements, so the flag character itself does not get included in any of the names stored in the group.
A typical use of this facility might be to allow the user to request a further prompt for more names. For instance, in the example of section 1.2, the user may wish to specify more input file names than will fit on a single line. To allow this, the call to GRP_GROUP would be replaced with the following:
The user could then request a further prompt by appending a minus sign to the end of the group expression, as follows:
The names obtained at each prompt are appended to the end of the group, which expands as necessary.
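The prompting loop described here can be imitated in Python as follows. This is a sketch under stated assumptions, not a call to GRP: the prompt responses are supplied as a list, the helper names strip_flag and collect_names and the file names are hypothetical, and trailing white space is assumed to be ignored when looking for the flag character.

```python
def strip_flag(expression, flag="-"):
    """Return the expression without a trailing flag character, plus the flag state."""
    trimmed = expression.rstrip()
    if trimmed.endswith(flag):
        return trimmed[:-1], True
    return trimmed, False

def collect_names(responses):
    """Append names from successive responses until one arrives unflagged."""
    group = []
    for response in responses:
        expression, flagged = strip_flag(response)
        group.extend(name.strip() for name in expression.split(","))
        if not flagged:
            break  # no flag character, so no further prompt is wanted
    return group

# The first response ends with the flag character, so a second one is read.
group = collect_names(["FILE1,FILE2-", "FILE3,FILE4", "NOT_READ"])
print(group)
```

The flag character is removed before the expression is split, so the group holds FILE1, FILE2, FILE3 and FILE4 and the third response is never consumed.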
Note, if the final element in a group expression is an indirection element, the flag character may be placed at the end of the last record in the indicated text file. For instance, instead of giving:
where the file LIST.DAT contains the single record
a user could “hard-wire” a flag character on to the end of LIST.DAT so that it contains:
It is often useful to mix comments with names, particularly within a text file. All group expressions (whether obtained from the environment or from a text file or as an argument) are truncated if a “comment” character is found (usually “#” but this can be changed, see section 5.7). Anything occurring after such a character is ignored. In a text file, the comment is assumed to extend from the comment character to the end of the line, so a new group expression may be given on the next line. Note, blank lines are not ignored. Each blank line within a text file will result in a blank name being added to the group.
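The treatment of comments and blank lines might be sketched in Python like this. The function names_from_lines is hypothetical, white space around each name is assumed to be discarded, and the text does not say how a line holding nothing but a comment is handled, so the sketch makes no special provision for that case.

```python
def names_from_lines(lines, comment="#"):
    """Turn the lines of a text file into names, dropping trailing comments."""
    names = []
    for line in lines:
        # Truncate the line at the first comment character, if there is one.
        expression = line.split(comment, 1)[0]
        # A blank line still contributes one (blank) name.
        names.extend(part.strip() for part in expression.split(","))
    return names

commented = names_from_lines(["A,B  # first two names", "", "C"])
print(commented)
```

The comment text is discarded, and the empty second line appears in the result as a blank name between B and C.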
It is possible to specify that a given character be used as an “escape character” within group expressions. This facility is normally suppressed, but an application can choose to switch it on by assigning a value to the ESCAPE control character associated with a group (see section 5.7). If this is done, any special meaning associated with a character within a group expression is ignored if the character is preceded by an escape character. The escape characters themselves are not included in the resulting names if they precede any of the other “special” control characters. Note, escape characters which do not precede another control character are included in the resulting names.
For instance, the group expression:
would normally result in an error because the “|” character would be taken as the start of an incomplete specification for some editing to apply to the preceding text (assuming the application has not changed the default editing behaviour). If, in fact, the user wants this string to be accepted as a literal string (maybe representing a Unix piping operation for instance), then the “|” should be escaped. Assuming the application chooses to use the backslash character “\” as the escape character, then this can be done by entering the following group expression:
The “\” character results in the “|” character being treated as part of the required string, rather than as the start of an editing specification. The string returned to the application is then “* | A” (note, the escape character has been removed). Any escape characters which do not precede special characters are included literally in the returned string. So, for instance, if the group expression was:
the string “\* | A” would be returned to the application.
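The rule for removing escape characters can be sketched in Python as below. The function remove_escapes and the input strings are hypothetical, and the set of characters treated as special is an assumption made for this illustration (only the comma and “|”); a real group has more control characters than this.

```python
def remove_escapes(text, escape="\\", special=",|"):
    """Drop escape characters that precede a special character; keep all others."""
    result = []
    i = 0
    while i < len(text):
        char = text[i]
        if char == escape and i + 1 < len(text) and text[i + 1] in special:
            # The escape is consumed and the special character becomes literal text.
            result.append(text[i + 1])
            i += 2
        else:
            # Ordinary characters, and escapes before non-special ones, are kept.
            result.append(char)
            i += 1
    return "".join(result)

piped = remove_escapes(r"ls \| wc")
mixed = remove_escapes(r"\* \| A")
print(piped)
print(mixed)
```

Under these assumptions the first string is returned as “ls | wc”, while in the second the backslash before “|” is removed and the one before “*” is kept, because “*” is not in the special set used here.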
All escape characters within a section of a group expression can be ignored by using the special strings “<!!” and “!!>” to mark the start and end of the section.
Names are stored within a group in the order in which they are specified in the group expression. For instance, if the file F1.DAT contained the following two records:
and the file F2.DAT contained the following three records:
then the group expression:
would result in the names being added to the group in the following order
The contents of the two indirection files have been inserted at the position at which the corresponding indirection element occurred. Names resulting from the expansion of modification elements are similarly inserted into the list at the position at which the modification element occurred. The modified names are stored in the same order as the names within the group upon which the modification was based. For example, if the above group is used as the basis for modification, then the group expression:
would give rise to the group:
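The ordering rule can be illustrated with a small Python sketch in which the indirection files are held in a dictionary. The record contents, the group expression and the function expand_in_order are all hypothetical; only the record counts (two in F1.DAT, three in F2.DAT) follow the text.

```python
def expand_in_order(expression, files, indirection="^"):
    """Expand indirection elements in place, preserving the order of appearance."""
    names = []
    for element in expression.split(","):
        if element.startswith(indirection):
            # The file's names are inserted where the indirection element occurred.
            for record in files[element[1:]]:
                names.extend(expand_in_order(record, files, indirection))
        else:
            names.append(element)
    return names

# Hypothetical file contents, one list entry per record.
files = {"F1.DAT": ["A,B", "C"], "F2.DAT": ["D", "E", "F"]}
ordered = expand_in_order("X,^F1.DAT,Y,^F2.DAT,Z", files)

# Names derived from a basis group keep the order of the basis names.
suffixed = [name + "_NEW" for name in ordered]
print(ordered)
print(suffixed)
```

The names from each file appear between the literal names that surround the corresponding indirection element, and the modified names follow the same sequence as the group they were based on.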