TUNING

HDS
Hierarchical Data System

15 TUNING

15.1 Setting HDS Tuning Parameters
15.2 Tuning Parameters Available
15.3 Tuning in Practice

15.1 Setting HDS Tuning Parameters

HDS has a number of internal integer tuning parameters which control various aspects of its behaviour, some of which may have an important effect on its performance. Each of these parameters has a default value, which may be over-ridden by either of two mechanisms.

Environment Variables.

By defining appropriate environment variables it is possible to set new default values for HDS tuning parameters. The values of these environment variables are read when HDS starts up (typically when the first HDS routine is called), and HDS then attempts to interpret each value as an integer. If this succeeds, the default value of the tuning parameter is set to the new value; if not, it remains unchanged. HDS also constrains any new value to the range allowed for that parameter (see §15.2).

For example, the UNIX C shell environment variable definition:

setenv HDS_INALQ 10

could be used to increase the default size of all newly created HDS files to 10 blocks.⁸

The name of the environment variable is constructed by prefixing the string ‘HDS_’ to the tuning parameter name. All such environment variables must be specified using upper case.

It should be recognised that this ability to set tuning parameter values via environment variables can be dangerous. It is provided mainly to encourage experimentation and to overcome “one-off” tuning problems, but it carries a risk of disrupting normal program behaviour. In particular, you should not expect that all HDS programs will necessarily continue to work with all possible settings of their tuning parameters, and software developers are urged not to write programs which depend on non-default settings of HDS tuning parameters, as this may give rise to conflicts with other software. If a tuning parameter setting really is critical, then it should be set by the software itself (see below), so as to prohibit outside interference.

Calling HDS_TUNE.

Tuning parameter values may also be set directly from within an item of software by means of calls to the routine HDS_TUNE. This allows programs to over-ride the default settings (or those established via environment variables). To modify the ‘MAP’ tuning parameter, for example, the following call might be used:

CALL HDS_TUNE( 'MAP', 0, STATUS )

This would have the effect of disabling file mapping in favour of reading and writing as the preferred method of accessing data in container files. The related routine HDS_GTUNE may be used to determine the current setting of a tuning parameter (see §15.3 for an example of its use).

15.2 Tuning Parameters Available

HDS currently uses the following tuning parameters to control its behaviour.

Parameters which control the top level HDS library

VERSION - Data Format for New Files:

This determines the data format that is used when a new container file is created. It may take the value 3, 4 or 5, and currently defaults to 5. Version 4 is a Starlink-specific data format that is a development of the original HDS data format created by the Starlink project in the 1980s. Version 5 uses the HDF5 data format on disk to mimic the facilities of version 4. Version 5 files cannot be accessed by versions of the HDS library prior to version 6, but can be examined using various publicly available HDF5 tools - see section “DISK FORMATS AND HDF5”. Version 3 refers to the Starlink-specific data format as it was prior to the introduction of 64-bit mode.

VERSION = 5: Use the HDS data format 5 library, which is based on HDF5.
VERSION = 4: Use the HDS data format 4 library with its “64BIT” tuning parameter set to 1.
VERSION = 3: Use the same HDS library as data format 4, but with its “64BIT” tuning parameter set to 0.

V4LOCKERROR - controls error reporting:

If non-zero, an error is reported if a thread lock function (datLock, datUnlock or datLocked) is used on a locator for an object stored using disk format version 4. Otherwise, the function returns without action. The default is to return without action. This facility is intended for debugging purposes.

Parameters used with both HDS version 4 and version 5 files

MAP - Use file mapping if available?

This value controls the method by which HDS performs I/O operations on the values of primitive objects and may take the following values:

MAP = 1: Use “file mapping” (if supported) as the preferred method of accessing primitive data.
MAP = 0: Use read/write operations (if supported) as the preferred data access method.
MAP = -1: Use whichever method is normally faster for sequential access to all elements of a large array of data.
MAP = -2: Use whichever method is normally faster for sparse random access to a large array of data.
MAP = -3: Use whichever method normally makes the smaller demand on system memory resources (normally this means a request to minimise use of address space or swap file space, but the precise interpretation is operating system dependent). This is normally the appropriate option if you intend to use HDS arrays as temporary workspace.

HDS converts all other values to one. The value may be changed at any time.

A subsequent call to HDS_GTUNE, specifying the ‘MAP’ tuning parameter, will return 0 or 1 to indicate which option was actually chosen. This may depend on the capabilities of the host operating system and the particular implementation of HDS in use. The default value for this tuning parameter is also system dependent (see §F.3).
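As a minimal sketch of this query, the routine below requests one of the “whichever is faster” options and then reads the setting back. The routine name SHOMAP is hypothetical, and the standard Starlink include file SAE_PAR is assumed to be available for the SAI__OK constant.

```fortran
      SUBROUTINE SHOMAP( STATUS )
*  Hypothetical example routine: request the access method that is
*  normally faster for sequential access, then report which method
*  HDS actually selected. Note that MAP is left at the new setting.
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INTEGER STATUS
      INTEGER MAPVAL

      IF ( STATUS .NE. SAI__OK ) RETURN

*  Ask for the method suited to sequential access (MAP = -1).
      CALL HDS_TUNE( 'MAP', -1, STATUS )

*  Read the setting back: 1 means file mapping, 0 means read/write.
      CALL HDS_GTUNE( 'MAP', MAPVAL, STATUS )
      IF ( STATUS .EQ. SAI__OK ) THEN
         IF ( MAPVAL .EQ. 1 ) THEN
            WRITE ( *, * ) 'HDS will use file mapping'
         ELSE
            WRITE ( *, * ) 'HDS will use read/write access'
         END IF
      END IF
      END
```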

Typically, file mapping has the following plus and minus points:

It allows large arrays accessed via the HDS mapping routines to be sparsely accessed in an efficient way. In this case, only those regions of the array actually accessed will need to be read/written, as opposed to reading the entire array just to access a small fraction of it. This might be useful, for instance, if a 1-dimensional profile through a large image were being generated.
It allows HDS container files to act as “backing store” for the virtual memory associated with objects accessed via the mapping routines. The operating system can then use HDS files, rather than its own backing (swap) file, to implement virtual memory management. This means that you do not need to have a large system backing file available in order to access large datasets.
For the same reason, temporary objects created with DAT_TEMP and mapped to provide temporary workspace make no additional demand on the system backing file.
On some operating systems file mapping may be less efficient in terms of elapsed time than direct read/write operations. Conversely, on some operating systems it may be more efficient.
Despite the memory efficiency of file mapping, there may be a significant efficiency penalty when large arrays are mapped to provide workspace. This is because the scratch data will often be written back to the container file when the array is unmapped (despite the fact that the file is about to be deleted). This can take a considerable time and cannot be prevented as the operating system has control over this process.
Unfortunately, on some operating systems, this process appears to occur even when normal system calls are used to allocate memory because file mapping is used implicitly. In this case, HDS’s file mapping is at no particular disadvantage.
Not all operating systems support file mapping and it generally requires system-specific programming techniques, making it more trouble to implement on a new operating system.
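The MAP = -3 option and the workspace points above might be combined as in the following sketch, which obtains a temporary _REAL array for use as workspace. The routine name GETWRK and its argument list are hypothetical. The sketch assumes that the MAP setting only needs to be in force while the array is being mapped, so it restores the original value straight afterwards.

```fortran
      SUBROUTINE GETWRK( NEL, LOC, PNTR, STATUS )
*  Hypothetical helper: obtain NEL _REAL values of temporary
*  workspace, asking HDS to minimise the demand on system memory.
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INCLUDE 'DAT_PAR'
      INTEGER NEL
      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER PNTR, STATUS
      INTEGER OLDMAP, EL
      INTEGER DIMS( 1 )

      IF ( STATUS .NE. SAI__OK ) RETURN

*  Remember the current MAP setting so that it can be restored.
      CALL HDS_GTUNE( 'MAP', OLDMAP, STATUS )
      IF ( STATUS .EQ. SAI__OK ) THEN

*  Prefer whichever access method makes the smaller memory demand.
         CALL HDS_TUNE( 'MAP', -3, STATUS )

*  Create a temporary 1-dimensional array and map it for writing.
         DIMS( 1 ) = NEL
         CALL DAT_TEMP( '_REAL', 1, DIMS, LOC, STATUS )
         CALL DAT_MAPV( LOC, '_REAL', 'WRITE', PNTR, EL, STATUS )

*  Restore the old setting, even if an error occurred above.
         CALL ERR_BEGIN( STATUS )
         CALL HDS_TUNE( 'MAP', OLDMAP, STATUS )
         CALL ERR_END( STATUS )
      END IF
      END
```

The caller would use the workspace through PNTR and release it with DAT_ANNUL on LOC when finished.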

Using read/write access has the following advantages and disadvantages:

On some operating systems it may be more efficient than file mapping in terms of elapsed time in cases where an array of data will be accessed in its entirety (the normal situation). This is generally not true of UNIX systems, however.
It is an inefficient method of accessing a small subset of a large array because it requires the entire array to be read/written. The solution to this problem is to explicitly access the required subset using (e.g.) DAT_SLICE, although this complicates the software somewhat.
It makes demands on the operating system’s backing file which the file mapping technique avoids (see above). As a result, there is little point in creating scratch arrays with DAT_TEMP for use as workspace unless file mapping is available (because the system backing file will be used anyway).
If an object is accessed several times simultaneously using HDS mapping routines, then modifications made via one mapping may not be consistently reflected in the other mapping (modifications will only be updated in the container file when the object is unmapped, so the two mappings may get out of step in the meantime). Conversely, if file mapping is in use and a primitive object is mapped in its entirety without type conversion, then this behaviour does not occur (all mappings remain consistent). It may occur, however, if a slice is being accessed or if type conversion is needed.
It is debatable which behaviour is preferable. The best policy is to avoid the problem entirely by not utilising multiple access to the same object while modifications are being made.
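The DAT_SLICE approach mentioned above, for reaching a small subset of a large array when read/write access is in use, might look like the following sketch. It maps a single row of a 2-dimensional _REAL array, as would be needed to extract a 1-dimensional profile from an image. The routine name MAPROW and its argument list are hypothetical.

```fortran
      SUBROUTINE MAPROW( LOC, NX, J, SLOC, PNTR, EL, STATUS )
*  Hypothetical helper: map row J of a 2-dimensional _REAL array
*  (first dimension NX) for reading, without accessing the rest of
*  the array.
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INCLUDE 'DAT_PAR'
      CHARACTER * ( * ) LOC
      INTEGER NX, J
      CHARACTER * ( DAT__SZLOC ) SLOC
      INTEGER PNTR, EL, STATUS
      INTEGER DIML( 2 ), DIMU( 2 )

      IF ( STATUS .NE. SAI__OK ) RETURN

*  Lower and upper bounds of the slice: all of the first dimension,
*  element J only of the second.
      DIML( 1 ) = 1
      DIMU( 1 ) = NX
      DIML( 2 ) = J
      DIMU( 2 ) = J

*  Obtain a locator to the slice and map it as a vector of EL values.
      CALL DAT_SLICE( LOC, 2, DIML, DIMU, SLOC, STATUS )
      CALL DAT_MAPV( SLOC, '_REAL', 'READ', PNTR, EL, STATUS )
      END
```

When the caller has finished with the data, it should unmap the slice with DAT_UNMAP and annul SLOC with DAT_ANNUL.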

SHELL - Preferred shell:

This parameter determines which UNIX shell should be used to interpret container file names which contain “special” characters representing pattern-matching, environment variable substitution, etc. Each shell typically has its own particular way of interpreting these characters, so users of HDS may wish to select the same shell as they normally use for entering commands. The following values are allowed:

SHELL = 2: Use the “tcsh” shell (if available). If this is not available, then use the same shell as when SHELL = 1.
SHELL = 1: Use the “csh” shell (C shell on traditional UNIX systems). If this is not available, then use the same shell as when SHELL = 0.
SHELL = 0 (the default): Use the “sh” shell. This normally means the Bourne Shell on traditional UNIX systems, but on systems which support it, the similar POSIX “sh” shell may be used instead.
SHELL = -1: Don’t use any shell for interpreting single file names (all special characters are to be interpreted literally). When performing “wild-card” searches for multiple files (with HDS_WILD), use the same shell as when SHELL = 0.

HDS converts all other values to zero.

Parameters used only with HDS version 4 files

INALQ - Initial File Allocation Quantity:

This value determines how many blocks⁹ are to be allocated when a new container file is created. The default value of 2 is the minimum value allowed; the first block contains header information and the second contains the top-level object. Note that the host operating system may impose further restrictions on allowable file sizes, so the actual size of a file may not match the value specified exactly.

The value of this parameter reverts to its default value (or the value specified by the HDS_INALQ environment variable) after each file is created, so if it is being set from within a program, it must be set every time that it is required.

If a file is to be extended frequently (through the creation of new objects within it), then this parameter may provide a worthwhile efficiency gain by allowing a file of a suitable size to be created initially. On most UNIX systems, however, the benefits are minimal.

MAXWPL - Maximum Size of the “Working Page List”:

This value specifies how many blocks are to be allocated to the memory cache which HDS uses to hold information about the structure of HDS files and objects and to buffer its I/O operations when obtaining this information. The default value is 32 blocks; this value cannot be decreased. Modifications to this value will only have an effect if made before HDS becomes active (i.e. before any call is made to another HDS routine).

There will not normally be any need to increase this value unless excessively complex data structures are being accessed with very large numbers of locators simultaneously active.

NBLOCKS - Size of the internal “Transfer Buffer”:

When HDS has to move large quantities of data from one location to another, it often has to store an intermediate result. In such cases, rather than allocate a large buffer to hold all the intermediate data, it uses a smaller buffer and performs the transfer in pieces. This parameter specifies the maximum size in blocks which this transfer buffer may have and is constrained to be no less than the default, which is 32 blocks.

The value should not be too small, or excessive time will be spent in loops which repeatedly refill the buffer. Conversely, too large a value will make excessive demands on memory. In practice there is a wide range of acceptable values, so this tuning parameter will almost never need to be altered.

NCOMP - Optimum number of structure components:

This value may be used to specify the expected number of components which will be stored in an HDS structure. HDS does not limit the number of structure components, but when a structure is first created, space is set aside for creation of components in future. If more than the expected number of components are subsequently created, then HDS must eventually re-organise part of the container file to obtain the space needed. Conversely, if fewer components are created, then some space in the file will remain unused. The value is constrained to be at least one, the default being 6 components.

The value of this parameter is used during the creation of the first component in every new structure. It reverts to its default value (or the value specified by the HDS_NCOMP environment variable) afterwards, so if it is being set from within a program, it must be set every time it is needed.
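Because both INALQ (described earlier) and NCOMP revert to their defaults after use, a program that relies on them has to set them immediately before the operation they affect. The sketch below illustrates this, and is only relevant when version 4 format files are being created. The routine name NEWBIG and the object names 'DATA', 'EXAMPLE' and 'FIRST' are illustrative assumptions.

```fortran
      SUBROUTINE NEWBIG( FILE, NBLOCK, NCOMP, LOC, STATUS )
*  Hypothetical helper: create a container file with an initial
*  allocation of NBLOCK blocks, whose top-level structure is expected
*  to hold NCOMP components. All object names are illustrative.
      IMPLICIT NONE
      INCLUDE 'SAE_PAR'
      INCLUDE 'DAT_PAR'
      CHARACTER * ( * ) FILE
      INTEGER NBLOCK, NCOMP
      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER STATUS
      INTEGER DIMS( 1 )

      IF ( STATUS .NE. SAI__OK ) RETURN

*  INALQ reverts once a file has been created, so set it immediately
*  before each call to HDS_NEW. The top-level object is a scalar
*  structure (NDIM = 0), so DIMS is not used.
      CALL HDS_TUNE( 'INALQ', NBLOCK, STATUS )
      CALL HDS_NEW( FILE, 'DATA', 'EXAMPLE', 0, DIMS, LOC, STATUS )

*  NCOMP is used when the first component of a new structure is
*  created, so set it just before that component is made.
      CALL HDS_TUNE( 'NCOMP', NCOMP, STATUS )
      CALL DAT_NEW( LOC, 'FIRST', '_INTEGER', 0, DIMS, STATUS )
      END
```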

SYSLCK - System wide lock flag:

This parameter is present for historical reasons and has no effect on UNIX systems.

WAIT - Wait for locked files?

This parameter is present for historical reasons and currently has no effect on UNIX systems, where HDS file locking is not implemented.

64BIT - Use 64-bit (HDS version 4) files?

This value can be used to select whether new files are created in the 64-bit (HDS version 4) format. If 64BIT=0 then files are created in the previous (HDS version 3) format.

This parameter is normally overridden by the VERSION parameter of the top level library.

Parameters used only with HDS version 5 files

LOCKCHECK - controls lock checking:

If non-zero, an error is reported if a V5 HDS object has not been locked appropriately by the current thread before use. If zero, no such check is performed. This is provided so that old software which pre-dates the datLock function (and which presumably is explicitly designed to avoid conflicting simultaneous access to an HDS object from separate threads) can be used without change. Use of this tuning parameter should be seen as a temporary stop-gap measure until such time as the software can be changed to use the datLock function correctly.

15.3 Tuning in Practice

Normally, a single application which wished to tune HDS itself (rather than accepting the default settings, or those specified by environment variables) would do so via calls to HDS_TUNE at the start, and would thus establish a default “tuning profile” to apply throughout the rest of the program. Similarly, a software environment can initially tune HDS to obtain the required default behaviour for the applications it will later invoke.

Sometimes, however, it may be necessary to modify a tuning parameter to improve performance locally while not affecting behaviour of other parts of a program (or other applications in a software environment). The routine HDS_GTUNE may therefore be used to determine the current setting of an HDS tuning parameter, so that it may later be returned to its original value. For instance, if the ‘MAP’ parameter were to be set locally to allow sparse access to a large array of data, the following technique might be used:

      ...
      INTEGER OLDMAP

*  Obtain the original setting of the MAP parameter.
      CALL HDS_GTUNE( 'MAP', OLDMAP, STATUS )
      IF ( STATUS .EQ. SAI__OK ) THEN

*  Set a new value.
         CALL HDS_TUNE( 'MAP', -2, STATUS )

         <map the array>

*  Return to the old tuning setting.
         CALL ERR_BEGIN( STATUS )
         CALL HDS_TUNE( 'MAP', OLDMAP, STATUS )
         CALL ERR_END( STATUS )
      END IF

Notice how great care has been taken over handling error conditions. In a large software system it could prove disastrous if a tuning parameter remained set to an incorrect value (perhaps causing gross inefficiencies elsewhere) simply because HDS_TUNE did not execute after an unexpected error had caused STATUS to be set to an error value.

⁸An HDS block is 512 bytes.

⁹An HDS block is 512 bytes.
