It will take some work to apply the parallel algorithm described here, and implemented in the prototype, to makemap. Some of the issues that will have to be considered are listed below:
Do we need to dynamically determine the amount of memory on each node?
makemap measures the total available memory and decides how much data to process
at one time based on the result. But in a typical cluster queueing system, one requests a
specific amount of memory per node, so this step might not be needed.
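If the dynamic measurement were retained, each process could measure the memory on its own node and all processes could then plan around the most constrained one. A minimal sketch, assuming a glibc-style sysconf interface; the variable names are illustrative and not taken from makemap:

    #include <mpi.h>
    #include <unistd.h>

    int main( int argc, char *argv[] ) {
       long long local_mem, min_mem;

       MPI_Init( &argc, &argv );

       /* Physical memory on this node, in bytes. */
       local_mem = (long long) sysconf( _SC_PHYS_PAGES )
                 * (long long) sysconf( _SC_PAGESIZE );

       /* Agree on the smallest value across all nodes, so that every
          process picks a chunk size the most constrained node can hold. */
       MPI_Allreduce( &local_mem, &min_mem, 1, MPI_LONG_LONG, MPI_MIN,
                      MPI_COMM_WORLD );

       MPI_Finalize();
       return 0;
    }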
A related question is how the load should be balanced. Do we allow for heterogeneous clusters of nodes? One could imagine determining the available memory, number of processors and processor speed on each node and balancing the data accordingly. But it would be much simpler to assume the method will only be run on homogeneous clusters and to divide the data evenly between the nodes.
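Under the homogeneous assumption, the division of data reduces to a few lines of arithmetic. A sketch, with hypothetical names, that splits ntslice time slices as evenly as possible between the ranks (the first ntslice % size ranks receive one extra slice):

    #include <mpi.h>
    #include <stddef.h>

    /* Return in [*lo, *hi) the half-open range of time slices handled
       by this rank, dividing ntslice slices as evenly as possible. */
    static void even_split( size_t ntslice, int rank, int size,
                            size_t *lo, size_t *hi ) {
       size_t base = ntslice / (size_t) size;
       size_t rem = ntslice % (size_t) size;
       size_t r = (size_t) rank;
       *lo = r * base + ( r < rem ? r : rem );
       *hi = *lo + base + ( r < rem ? 1 : 0 );
    }

    int main( int argc, char *argv[] ) {
       int rank, size;
       size_t lo, hi;
       MPI_Init( &argc, &argv );
       MPI_Comm_rank( MPI_COMM_WORLD, &rank );
       MPI_Comm_size( MPI_COMM_WORLD, &size );
       even_split( 100000, rank, size, &lo, &hi );  /* e.g. 100000 slices */
       /* ... this rank processes slices lo to hi-1 ... */
       MPI_Finalize();
       return 0;
    }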
Communication is also needed any time a return status is checked, since if one process is going to abort, then the whole system should abort. It should be sufficient to precede each status test with an MPI_Allreduce call on the status variable using the operator MPI_LOR (“logical or”).
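A minimal sketch of this pattern, assuming a C integer status in which zero means success (makemap's actual status handling may differ); note that the combined value is only a true/false flag, not the original error code:

    #include <mpi.h>

    /* Replace the local status with the logical OR of the status
       values on every rank, so that an error raised on any one rank
       is seen by all of them before the next status test. */
    static int global_status( int status, MPI_Comm comm ) {
       int combined = 0;
       MPI_Allreduce( &status, &combined, 1, MPI_INT, MPI_LOR, comm );
       return combined;
    }

Each rank would call this before its existing status test; if any rank has a bad status, every rank sees a non-zero result and the whole system can shut down together.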
For certain models it may be more efficient to consider communicators other than the default MPI_COMM_WORLD, which consists of all processes in the system. For example, one might like to calculate the common mode on a per-array basis rather than a single mode for all detectors. If the detectors are split across nodes such that each node has detectors belonging to only one array, a communicator can be created for each array (consisting of the nodes that contain data from that array), and then the same form of MPI_Allreduce can be used, with each node specifying the appropriate communicator. This type of sub-division could help with the communication overhead.
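One way to build such per-array communicators is MPI_Comm_split, using the array index as the colour. A sketch, assuming each rank holds detectors from exactly one array; the array_index and local_mean variables are hypothetical stand-ins for values the data distribution would supply:

    #include <mpi.h>

    int main( int argc, char *argv[] ) {
       int rank, narray, array_index;
       double local_mean = 0.0, array_mean;
       MPI_Comm array_comm;

       MPI_Init( &argc, &argv );
       MPI_Comm_rank( MPI_COMM_WORLD, &rank );

       /* Hypothetical mapping from rank to array; in practice this
          would come from how the detectors were distributed. */
       array_index = rank % 4;

       /* Ranks passing the same colour (array_index) are grouped
          into the same new communicator. */
       MPI_Comm_split( MPI_COMM_WORLD, array_index, rank, &array_comm );

       /* The same form of MPI_Allreduce as before, restricted to the
          ranks holding this array, e.g. to average a per-rank value. */
       MPI_Comm_size( array_comm, &narray );
       MPI_Allreduce( &local_mean, &array_mean, 1, MPI_DOUBLE, MPI_SUM,
                      array_comm );
       array_mean /= narray;

       MPI_Comm_free( &array_comm );
       MPI_Finalize();
       return 0;
    }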