15 TUNING
15.1 Setting HDS Tuning Parameters
HDS has a number of internal integer tuning parameters whose values control various aspects of its
behaviour and some of which may have an important effect on its performance. Each of these
parameters has a default value, but may be over-ridden by either of two mechanisms.
- Environment Variables.
By defining appropriate environment variables it is possible to set new default values
for HDS tuning parameters. The translation of these environment variables is picked up
when HDS starts up (typically when the first HDS routine is called) and an attempt is
then made to interpret the resulting value as an integer. If successful, the default value of
the tuning parameter is set to the new value. If not, it remains unchanged. HDS applies
sensible constraints to any new values supplied.
For example, the UNIX C shell environment variable definition:
    setenv HDS_INALQ 10
could be used to increase the default size of all newly created HDS files to 10
blocks.
The name of the environment variable is constructed by prefixing the string ‘HDS_’ to the
tuning parameter name. All such environment variables must be specified using upper
case.
It should be recognised that this ability to set tuning parameter values via environment variables
can be dangerous. It is provided mainly to encourage experimentation and to overcome
“one-off” tuning problems, but it carries a risk of disrupting normal program behaviour. In
particular, you should not expect that all HDS programs will necessarily continue to work with
all possible settings of their tuning parameters, and software developers are urged not to write
programs which depend on non-default settings of HDS tuning parameters, as this
may give rise to conflicts with other software. If a tuning parameter setting really is
critical, then it should be set by the software itself (see below), so as to prohibit outside
interference.
- Calling HDS_TUNE.
Tuning parameter values may also be set directly from within an item of software by means of
calls to the routine HDS_TUNE. This allows programs to over-ride the default settings (or those
established via environment variables). To modify the ‘MAP’ tuning parameter, for example, the
following call might be used:
      CALL HDS_TUNE( 'MAP', 0, STATUS )
This would have the effect of disabling file mapping in favour of reading and writing as the
preferred method of accessing data in container files. The related routine HDS_GTUNE may be
used to determine the current setting of a tuning parameter (see §15.3 for an example of its
use).
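As a rough illustration of the environment-variable mechanism described above, the following C sketch builds the variable name from a parameter name and interprets its translation as an integer. It is not taken from the HDS source: the function names build_env_name, parse_override and default_from_env are hypothetical, and both the rejection of trailing non-numeric characters and the use of a simple lower bound as the "sensible constraint" are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Build "HDS_<PARAM>" in upper case (hypothetical helper).
   Returns 0 on success, -1 if buf is too small. */
int build_env_name(const char *param, char *buf, size_t size)
{
    const char prefix[] = "HDS_";
    size_t plen = sizeof prefix - 1;
    size_t n = strlen(param);
    size_t i;

    if (plen + n + 1 > size) return -1;
    memcpy(buf, prefix, plen);
    for (i = 0; i < n; i++)
        buf[plen + i] = (char) toupper((unsigned char) param[i]);
    buf[plen + n] = '\0';
    return 0;
}

/* Interpret text as an integer override. If it cannot be interpreted,
   the current value is kept. The lower bound is an assumed constraint. */
int parse_override(const char *text, int current, int minimum)
{
    char *end;
    long v;

    if (text == NULL) return current;            /* variable not defined */
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || v > INT_MAX || v < INT_MIN)
        return current;                          /* not a usable integer */
    while (isspace((unsigned char) *end)) end++;
    if (*end != '\0') return current;            /* trailing junk (assumed rejected) */
    return (v < minimum) ? minimum : (int) v;
}

/* Look up the override for one parameter in the process environment. */
int default_from_env(const char *param, int current, int minimum)
{
    char name[64];

    if (build_env_name(param, name, sizeof name) != 0) return current;
    return parse_override(getenv(name), current, minimum);
}
```

With HDS_INALQ set to 10 in the environment, default_from_env("INALQ", 2, 2) would return 10 in this sketch. With the variable undefined, or set to a non-numeric string, the built-in value of 2 is kept.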
15.2 Tuning Parameters Available
HDS currently uses the following tuning parameters to control its behaviour.
Parameters which control the top level HDS library
- VERSION - Data Format for New Files:
This determines the data format that is used when a new container file is created. It may
take the value 3, 4 or 5, and currently defaults to 5. Version 4 is a Starlink-specific data
format that is a development of the original HDS data format created by the Starlink
project in the 1980s. Version 5 uses the HDF5 data format on disk to mimic the facilities
of version 4. Such files cannot be accessed by versions of the HDS library prior to version
6, but can be examined using various publicly available HDF5 tools - see section “DISK
FORMATS AND HDF5”. Version 3 refers to the Starlink-specific data format as it was
prior to the introduction of 64-bit mode.
- VERSION=5: Use the HDS data format 5 library, which is based on HDF5.
- VERSION=4: Use the HDS data format 4 library with its “64BIT” tuning parameter set to 1.
- VERSION=3: Use the same HDS library as data format 4, but with its “64BIT” tuning parameter
set to 0.
- V4LOCKERROR - Controls error reporting:
If non-zero, an error is reported if a thread lock function (datLock, datUnlock or datLocked) is
used on a locator for an object stored using disk format version 4. Otherwise, the function
returns without action. The default is to return without action. This facility is intended for
debugging purposes.
Parameters used with both HDS version 4 and version 5 files
- MAP - Use file mapping if available?
This value controls the method by which HDS performs I/O operations on the values of
primitive objects and may take the following values:
- MAP=1: Use “file mapping” (if supported) as the preferred method of accessing primitive
data.
- MAP=0: Use read/write operations (if supported) as the preferred data access method.
- MAP=-1: Use whichever method is normally faster for sequential access to all elements of a
large array of data.
- MAP=-2: Use whichever method is normally faster for sparse random access to a large array
of data.
- MAP=-3: Use whichever method normally makes the smaller demand on system memory
resources (normally this means a request to minimise use of address space or swap
file space, but the precise interpretation is operating system dependent). This is
normally the appropriate option if you intend to use HDS arrays as temporary
workspace.
HDS converts all other values to one. The value may be changed at any time.
A subsequent call to HDS_GTUNE, specifying the ‘MAP’ tuning parameter, will
return 0 or 1 to indicate which option was actually chosen. This may depend on the
capabilities of the host operating system and the particular implementation of HDS
in use. The default value for this tuning parameter is also system dependent (see
§F.3).
Typically, file mapping has the following plus and minus points:
- It allows large arrays accessed via the HDS mapping routines to be sparsely accessed
in an efficient way. In this case, only those regions of the array actually accessed will
need to be read/written, as opposed to reading the entire array just to access a small
fraction of it. This might be useful, for instance, if a 1-dimensional profile through a
large image were being generated.
- It allows HDS container files to act as “backing store” for the virtual memory
associated with objects accessed via the mapping routines. The operating system can
then use HDS files, rather than its own backing (swap) file, to implement virtual
memory management. This means that you do not need to have a large system
backing file available in order to access large datasets.
- For the same reason, temporary objects created with DAT_TEMP and mapped to
provide temporary workspace make no additional demand on the system backing
file.
- On some operating systems file mapping may be less efficient in terms of elapsed
time than direct read/write operations. Conversely, on some operating systems it
may be more efficient.
- Despite the memory efficiency of file mapping, there may be a significant efficiency
penalty when large arrays are mapped to provide workspace. This is because the
scratch data will often be written back to the container file when the array is
unmapped (despite the fact that the file is about to be deleted). This can take a
considerable time and cannot be prevented as the operating system has control over
this process.
Unfortunately, on some operating systems, this process appears to occur even when
normal system calls are used to allocate memory because file mapping is used
implicitly. In this case, HDS’s file mapping is at no particular disadvantage.
- Not all operating systems support file mapping and it generally requires
system-specific programming techniques, making it more trouble to implement on
a new operating system.
Using read/write access has the following advantages and disadvantages:
- SHELL - Preferred shell:
This parameter determines which UNIX shell should be used to interpret container file names
which contain “special” characters representing pattern-matching, environment variable
substitution, etc. Each shell typically has its own particular way of interpreting these characters,
so users of HDS may wish to select the same shell as they normally use for entering commands.
The following values are allowed:
- SHELL=2: Use the “tcsh” shell (if available). If this is not available, then use the same
shell as when SHELL=1.
- SHELL=1: Use the “csh” shell (C shell on traditional UNIX systems). If this is not available,
then use the same shell as when SHELL=0.
- SHELL=0 (the default): Use the “sh” shell. This normally means the Bourne Shell on
traditional UNIX systems, but on systems which support it, the similar POSIX “sh” shell
may be used instead.
- SHELL=-1: Don’t use any shell for interpreting single file names (all special characters are
to be interpreted literally). When performing “wild-card” searches for multiple files (with
HDS_WILD), use the same shell as when SHELL=0.
HDS converts all other values to zero.
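The fallback rules stated for MAP and SHELL can be summarised in a few lines of C. This is only an illustration of the documented rule that unrecognised values are replaced (by one for MAP, by zero for SHELL). The function names are hypothetical, and the accepted ranges are an assumption that the options run from 1 down to -3 for MAP and from 2 down to -1 for SHELL.

```c
/* Accepted MAP options are assumed to run from -3 to 1; anything else becomes 1. */
int normalise_map(int value)
{
    return (value >= -3 && value <= 1) ? value : 1;
}

/* Accepted SHELL options are assumed to run from -1 to 2; anything else becomes 0. */
int normalise_shell(int value)
{
    return (value >= -1 && value <= 2) ? value : 0;
}
```

Note that this covers only the replacement of out-of-range values. It does not model the later, system-dependent choice between mapping and read/write access that HDS_GTUNE reports as 0 or 1.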
Parameters used only with HDS version 4 files
- INALQ - Initial File Allocation Quantity:
This value determines how many blocks are to be allocated when a new container file is
created. The default value of 2 is the
minimum value allowed; the first block contains header information and the second
contains the top-level object. Note that the host operating system may impose further
restrictions on allowable file sizes, so the actual size of a file may not match the value
specified exactly.
The value of this parameter reverts to its default value (or the value specified by the
HDS_INALQ environment variable) after each file is created, so if it is being set from
within a program, it must be set every time that it is required.
If a file is to be extended frequently (through the creation of new objects within it), then
this parameter may provide a worthwhile efficiency gain by allowing a file of a suitable
size to be created initially. On most UNIX systems, however, the benefits are minimal.
- MAXWPL - Maximum Size of the “Working Page List”:
This value specifies how many blocks are to be allocated to the memory cache which HDS
uses to hold information about the structure of HDS files and objects and to buffer its
I/O operations when obtaining this information. The default value is 32 blocks; this value
cannot be decreased. Modifications to this value will only have an effect if made before
HDS becomes active (i.e. before any call is made to another HDS routine).
There will not normally be any need to increase this value unless excessively complex
data structures are being accessed with very large numbers of locators simultaneously
active.
- NBLOCKS - Size of the internal “Transfer Buffer”:
When HDS has to move large quantities of data from one location to another, it often has
to store an intermediate result. In such cases, rather than allocate a large buffer to hold
all the intermediate data, it uses a smaller buffer and performs the transfer in pieces. This
parameter specifies the maximum size in blocks which this transfer buffer may have and
is constrained to be no less than the default, which is 32 blocks.
The value should not be too small, or excessive time will be spent in loops which
repeatedly refill the buffer. Conversely, too large a value will make excessive demands on
memory. In practice there is a wide range of acceptable values, so this tuning parameter
will almost never need to be altered.
- NCOMP - Optimum number of structure components:
This value may be used to specify the expected number of components which will be
stored in an HDS structure. HDS does not limit the number of structure components, but
when a structure is first created, space is set aside for creation of components in future. If
more than the expected number of components are subsequently created, then HDS must
eventually re-organise part of the container file to obtain the space needed. Conversely, if
fewer components are created, then some space in the file will remain unused. The value
is constrained to be at least one, the default being 6 components.
The value of this parameter is used during the creation of the first component in every
new structure. It reverts to its default value (or the value specified by the HDS_NCOMP
environment variable) afterwards, so if it is being set from within a program, it must be
set every time it is needed.
- SYSLCK - System wide lock flag:
This parameter is present for historical reasons and has no effect on UNIX systems.
- WAIT - Wait for locked files?
This parameter is present for historical reasons and currently has no effect on UNIX
systems, where HDS file locking is not implemented.
- 64BIT - Use 64-bit (HDS version 4) files?
This value can be used to select whether new files are created in the 64-bit (HDS version
4) format. If 64BIT=0 then files are created in the previous (HDS version 3) format.
This parameter is normally overridden by the VERSION parameter of the top level library.
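The piecewise transfer described under NBLOCKS can be sketched in C as follows. This is not HDS code: copy_in_pieces is a hypothetical helper, and the block size of 512 bytes is an assumption made for the example. It shows a bounded intermediate buffer being refilled until all the data have been moved, with the return value counting how many refills were needed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_BYTES 512   /* assumed block size, for illustration only */

/* Copy nbytes from src to dst through an intermediate buffer of at most
   nblocks blocks. Returns the number of buffer refills, or -1 on failure. */
long copy_in_pieces(void *dst, const void *src, size_t nbytes, size_t nblocks)
{
    size_t bufsize = nblocks * BLOCK_BYTES;
    size_t done = 0;
    long passes = 0;
    unsigned char *buffer;

    if (bufsize == 0) return -1;
    buffer = (unsigned char *) malloc(bufsize);
    if (buffer == NULL) return -1;

    while (done < nbytes) {
        /* Move at most one buffer-full per pass. */
        size_t chunk = nbytes - done;
        if (chunk > bufsize) chunk = bufsize;
        memcpy(buffer, (const unsigned char *) src + done, chunk);
        memcpy((unsigned char *) dst + done, buffer, chunk);
        done += chunk;
        passes++;
    }

    free(buffer);
    return passes;
}
```

Under the assumed block size, a 32-block buffer holds 16384 bytes, so moving 100000 bytes takes 7 passes, whereas a one-block buffer would need 196. This is the trade-off between loop overhead and memory demand that the text describes.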
Parameters used only with HDS version 5 files
- LOCKCHECK - Controls lock checking:
If non-zero, an error is reported if a V5 HDS object has not been locked appropriately
by the current thread before use. If zero, no such check is performed. This is provided
so that old software which pre-dates the datLock function (and which presumably is
explicitly designed to avoid conflicting simultaneous access to an HDS object from
separate threads) can be used without change. Use of this tuning parameter should be
seen as a temporary stop-gap measure until such time as the software can be changed to
use the datLock function correctly.
15.3 Tuning in Practice
Normally, a single application which wished to tune HDS itself (rather than accepting the default
settings, or those specified by environment variables) would do so via calls to HDS_TUNE at the start,
and would thus establish a default “tuning profile” to apply throughout the rest of the program.
Similarly, a software environment can initially tune HDS to obtain the required default behaviour for
the applications it will later invoke.
Sometimes, however, it may be necessary to modify a tuning parameter to improve performance
locally while not affecting behaviour of other parts of a program (or other applications in a software
environment). The routine HDS_GTUNE may therefore be used to determine the current setting of an
HDS tuning parameter, so that it may later be returned to its original value. For instance, if the ‘MAP’
parameter were to be set locally to allow sparse access to a large array of data, the following technique
might be used:
      ...
      INTEGER OLDMAP
*  Obtain the original setting of the MAP parameter.
      CALL HDS_GTUNE( 'MAP', OLDMAP, STATUS )
      IF ( STATUS .EQ. SAI__OK ) THEN
*  Set a new value.
         CALL HDS_TUNE( 'MAP', -2, STATUS )
         <map the array>
*  Return to the old tuning setting.
         CALL ERR_BEGIN( STATUS )
         CALL HDS_TUNE( 'MAP', OLDMAP, STATUS )
         CALL ERR_END( STATUS )
      END IF
Notice how great care has been taken over handling error conditions. In a large software system it
could prove disastrous if a tuning parameter remained set to an incorrect value (perhaps causing gross
inefficiencies elsewhere) simply because HDS_TUNE did not execute after an unexpected error had
caused STATUS to be set to an error value.
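The reason the restoring call is bracketed by ERR_BEGIN and ERR_END can be seen in a small self-contained C model of the inherited status convention. Nothing here is the real HDS or ERR interface: map_setting, tune_get, tune_set, deferred_begin, deferred_end and with_local_map are hypothetical stand-ins, and the rule that an earlier error takes precedence when the deferred section ends is a simplifying assumption.

```c
#define STATUS_OK 0

/* Hypothetical stand-in for the MAP setting held inside the library. */
int map_setting = 1;

/* Inherited status convention: do nothing if *status already holds an error. */
void tune_set(int value, int *status)
{
    if (*status != STATUS_OK) return;
    map_setting = value;
}

void tune_get(int *value, int *status)
{
    if (*status != STATUS_OK) return;
    *value = map_setting;
}

/* Simplified model of ERR_BEGIN: remember the incoming status and clear it. */
int deferred_begin(int *status)
{
    int saved = *status;
    *status = STATUS_OK;
    return saved;
}

/* Simplified model of ERR_END: an earlier error takes precedence (assumption). */
void deferred_end(int saved, int *status)
{
    if (saved != STATUS_OK) *status = saved;
}

/* Change the setting locally, simulate the work, then restore the old value.
   If protect is zero the restoring call is made without the bracketing pair. */
int with_local_map(int local_value, int simulated_error, int protect)
{
    int status = STATUS_OK;
    int old = 0;

    tune_get(&old, &status);
    if (status == STATUS_OK) {
        tune_set(local_value, &status);
        if (status == STATUS_OK) status = simulated_error; /* stands in for <map the array> */

        if (protect) {
            int saved = deferred_begin(&status);
            tune_set(old, &status);       /* always runs: status was cleared */
            deferred_end(saved, &status);
        } else {
            tune_set(old, &status);       /* silently skipped if status is bad */
        }
    }
    return status;
}
```

In this model, a failure during the simulated work leaves map_setting at the local value when the restoring call is unprotected, but returns it to the original value when the call is bracketed. In both cases the caller still receives the original error status.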
Copyright © 2019 East Asian Observatory