PERIOD is a menu-driven package. On entering PERIOD, you will be confronted with the following menu options, which are described in greater detail below. Any one of these commands can be entered by typing anything from the shortest unambiguous string up to the full command name. For example, P would be ambiguous, but PE would not.
INPUT
As described in section 3.1, this option allows you to input ASCII data into PERIOD. The routine determines the number of columns in the input file and then prompts the user for the columns that refer to the x-axis, y-axis and y-axis errors (if desired, see section 3). For example, if the user is inputting radial velocity data, the x-axis would most probably be HJDs, the y-axis the heliocentric radial velocities, and there would most likely be errors associated with each radial velocity value. Note that the x-axis values must be in ascending order; otherwise INPUT will report a warning and either sort the data (if requested to do so) or abort. Note also that the y-axis errors are used by all options in the main PERIOD menu, but only by the CHISQ periodicity-finding option in the PERIOD_PERIOD sub-menu.
OGIP
As described in section 3.2, this option allows you to input data from an OGIP FITS table into PERIOD. The routine displays some information about the file requested and allows you to choose which of its available tables is to be examined. You then select which of the columns in the file refers to the x-axis, y-axis and y-axis errors (if desired, see section 3).
FAKE
Allows you to create fake data with which to test or experiment with PERIOD. Two options are catered for: periodic data or chaotic data. The periodic data are created by summing a user-specified number of sine curves of the form:
The chaotic data are created using a simple logistic equation of the form:
(see, for example, Scargle 1990ab).
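The two FAKE equations are not reproduced in this copy of the text, so the sketch below does not claim to use PERIOD's exact forms. It assumes each sine component has the same shape as the curve used by FIT and SINE, A sin(2π(t - t0)/P), and it uses the common logistic map x[n+1] = r x[n] (1 - x[n]). The function names and the default r = 3.9 (which lies in the chaotic regime of the map) are illustrative choices only.

```python
import math

def fake_periodic(times, components, zeropt=0.0):
    """Sum of sine curves; components is a list of (amplitude, period) pairs.

    Assumed form: y(t) = sum_i A_i * sin(2*pi*(t - zeropt) / P_i).
    """
    return [
        sum(a * math.sin(2.0 * math.pi * (t - zeropt) / p) for a, p in components)
        for t in times
    ]

def fake_chaotic(n, r=3.9, x0=0.5):
    """Logistic map x[k+1] = r * x[k] * (1 - x[k]), started from x0 (assumed form)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

times = [0.1 * k for k in range(100)]
y = fake_periodic(times, [(1.0, 2.5), (0.3, 0.7)])
chaos = fake_chaotic(100)
```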
NOISE
Using this option, it is possible to add noise to data or to randomize data. The latter operation is carried out by specifying the [N]ew dataset option, which constructs an artificial dataset with the same mean value and the same standard deviation as the original. Selecting the [O]ld dataset option allows you to apply noise to data, create errorbars on the data points, and/or add noise to the data sampling (so that, for instance, an evenly sampled dataset becomes unevenly sampled). This routine is useful not only for creating realistic artificial datasets (in conjunction with FAKE), but also for investigating the effects of noise on a period detection.
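A minimal sketch of the two NOISE modes, assuming Gaussian deviates throughout (the distribution PERIOD actually uses is not stated here). The [N]ew-style function rescales its deviates so that the mean and sample standard deviation match the original exactly; the [O]ld-style function adds noise to the y values, attaches constant errorbars and optionally jitters the sampling times. All function and parameter names are hypothetical.

```python
import random
import statistics

def randomize_new(y, seed=None):
    """[N]ew-style: random data with exactly the mean and sample std of y."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in y]
    zm, zs = statistics.mean(z), statistics.stdev(z)
    mu, sigma = statistics.mean(y), statistics.stdev(y)
    # standardise the deviates, then map them onto the original mean and std
    return [mu + sigma * (v - zm) / zs for v in z]

def add_noise_old(x, y, sigma_y, sigma_x=0.0, seed=None):
    """[O]ld-style: add noise to y, attach errorbars, optionally jitter the times."""
    rng = random.Random(seed)
    y_new = [v + rng.gauss(0.0, sigma_y) for v in y]
    errors = [sigma_y] * len(y)
    x_new = [t + rng.gauss(0.0, sigma_x) for t in x] if sigma_x > 0 else list(x)
    # jittered times may no longer be ascending, so re-sort all three columns together
    order = sorted(range(len(x_new)), key=x_new.__getitem__)
    return ([x_new[i] for i in order], [y_new[i] for i in order],
            [errors[i] for i in order])
```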
DETREND
This option removes the DC bias from data, which, if not removed, gives rise to significant power at 0 Hz. There are two options. If the data show no long-term trends, it is best simply to subtract the mean and divide by the standard deviation (the [M] option). This gives a dataset with a mean of zero and a standard deviation of one. Otherwise, it is best to subtract a low-order polynomial fit to the data (the [P] option), since if long-term trends are not removed, a Fourier transform will inject a significant amount of power at the frequency of the long-term variations.
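The two DETREND modes can be sketched as follows. This is an illustration, not PERIOD's code: it assumes the sample standard deviation for [M] and an ordinary least-squares polynomial (solved through its normal equations) for [P]. The x values are centred before fitting because raw HJDs would make the normal equations badly conditioned. Function names are hypothetical.

```python
import statistics

def detrend_mean(y):
    """[M]-style detrending: subtract the mean and divide by the standard deviation."""
    mu = statistics.mean(y)
    sigma = statistics.stdev(y)  # sample standard deviation (an assumption)
    return [(v - mu) / sigma for v in y]

def polyfit(x, y, order):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[order]*x**order."""
    n = order + 1
    # Normal equations A c = b, with A[j][k] = sum(x**(j+k)) and b[j] = sum(y * x**j)
    a = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    # Back substitution
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (b[r] - sum(a[r][k] * c[k] for k in range(r + 1, n))) / a[r][r]
    return c

def detrend_poly(x, y, order=1):
    """[P]-style detrending: subtract a low-order polynomial fit."""
    x0 = statistics.mean(x)  # centre x to keep the normal equations well conditioned
    xs = [xi - x0 for xi in x]
    c = polyfit(xs, y, order)
    return [yi - sum(ck * xi ** k for k, ck in enumerate(c)) for xi, yi in zip(xs, y)]
```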
WINDOW
One of the main problems with the classical periodogram¹ (see Scargle 1982 for a definition) is spectral leakage, of which there are several forms. Leakage to nearby frequencies (sidelobes) is due to the finite total interval over which the data are sampled. Leakage to distant frequencies is due to the finite size of the interval between samples. The WINDOW option sets all the y-axis data points to unity. A discrete Fourier transform of the resulting data (using, for example, the FT option, see below) yields the window function (or spectral window), which shows the effects of spectral leakage.
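The following sketch sets y to unity and computes a classical DFT power spectrum normalised by 1/N² (one common convention; PERIOD's own normalisation may differ). The function names are hypothetical. For 16 evenly spaced samples with unit spacing, the window equals 1 at zero frequency and again at the sampling frequency 1/Δt = 1, which is the distant leakage caused by the finite sampling interval.

```python
import math

def window(y):
    """WINDOW: replace every y value by unity."""
    return [1.0] * len(y)

def dft_power(t, y, freqs):
    """Classical periodogram |sum y*exp(2*pi*i*f*t)|**2 / N**2 for arbitrary spacing."""
    n = len(t)
    power = []
    for f in freqs:
        re = sum(yi * math.cos(2.0 * math.pi * f * ti) for ti, yi in zip(t, y))
        im = sum(yi * math.sin(2.0 * math.pi * f * ti) for ti, yi in zip(t, y))
        power.append((re * re + im * im) / n ** 2)
    return power

# 16 evenly spaced samples with unit spacing
t = [float(k) for k in range(16)]
freqs = [k / 16.0 for k in range(17)]  # 0 ... 1 in steps of 1/16
w = dft_power(t, window(t), freqs)
# w[0] and w[16] are 1; every other grid frequency gives ~0
```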
OPEN
It is possible to store the fits calculated by SINE and PEAKS in a log file. This option opens a new log file (if it does not already exist), or else re-opens an old log file and skips over the existing entries.
CLOSE
This option closes the currently open log file.
PERIOD
This is where all the work is done. You will be confronted by the following sub-menu:
SELECT
– Selects input and output slots for processing, as described in section 3. The input slots should contain the time-series; the output slots will contain, for example, the power spectra. SELECT must be run every time a periodicity-finding option is about to be executed; although tedious, this prevents one from accidentally overwriting slots.
FREQ
– Sets the frequency search parameters. You can select the minimum frequency, maximum frequency and frequency interval. Generally, there is no restriction on the number of frequencies to be stepped through in the processing. Alternatively, default values can be accepted by entering 0's. Note that the default values are set on entering the PERIOD package, and thus the FREQ option need not be run if the default frequencies are required. The default values are calculated as follows: minimum frequency = 0 (i.e. infinite period), maximum frequency = 1 / (2 × Smallest Data Interval) (i.e. Nyquist), frequency interval = 1 / (4 × Total Time Interval).
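The default frequency grid follows directly from the sampling times, as in this sketch. The function name is hypothetical, and the times are assumed to be sorted and free of duplicates, since a zero interval would make the maximum frequency infinite.

```python
def default_freqs(t):
    """Default FREQ parameters computed as described above (t in ascending order)."""
    smallest = min(b - a for a, b in zip(t, t[1:]))
    total = t[-1] - t[0]
    fmin = 0.0                       # infinite period
    fmax = 1.0 / (2.0 * smallest)    # Nyquist frequency for the smallest data interval
    df = 1.0 / (4.0 * total)         # a quarter of the natural resolution 1/T
    return fmin, fmax, df

fmin, fmax, df = default_freqs([0.0, 0.5, 1.5, 2.0, 4.0])
nfreq = int(round((fmax - fmin) / df)) + 1
# fmin = 0.0, fmax = 1.0, df = 0.0625, nfreq = 17
```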
CHISQ
– This is a straightforward technique in which the input data are folded on a series of trial periods. At each trial period, the data are fitted with a sine curve. The resulting reduced χ² values are plotted as a function of trial frequency, and the minima in the plot suggest the most likely periods. See Horne, Wade and Szkody (1986) for an example of the use of this method, which is ideally suited to the study of radial velocity data or any other sinusoidal variations. Note that windowed data cannot be processed by this option since no sine fit is possible.
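PERIOD's fitting code is not described here, so this is only a sketch of the idea. It assumes a weighted linear least-squares fit of y = c0 + c1 sin(2πft) + c2 cos(2πft), which is equivalent to a sine curve with free offset, amplitude and phase, and it divides χ² by N - 3 for the three fitted parameters. Function names are hypothetical. The trial frequencies must be non-zero, because at f = 0 the sine column vanishes and the fit is singular.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a @ c = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    sol = []
    for i in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][i] = b[r]          # replace column i by the right-hand side
        sol.append(det(m) / d)
    return sol

def chisq_periodogram(t, y, err, freqs):
    """Reduced chi-squared of a weighted sine fit at each (non-zero) trial frequency."""
    w = [1.0 / e ** 2 for e in err]
    result = []
    for f in freqs:
        basis = [(1.0, math.sin(2 * math.pi * f * ti), math.cos(2 * math.pi * f * ti))
                 for ti in t]
        # weighted normal equations for the three coefficients
        a = [[sum(wi * bi[j] * bi[k] for wi, bi in zip(w, basis)) for k in range(3)]
             for j in range(3)]
        b = [sum(wi * bi[j] * yi for wi, bi, yi in zip(w, basis, y)) for j in range(3)]
        c = solve3(a, b)
        chi2 = sum(wi * (yi - sum(cj * bj for cj, bj in zip(c, bi))) ** 2
                   for wi, bi, yi in zip(w, basis, y))
        result.append(chi2 / (len(t) - 3))  # 3 fitted parameters
    return result
```

The minimum of the returned list marks the most likely trial frequency.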
CLEAN
– The CLEAN algorithm was originally developed for use in aperture synthesis and was later applied to one-dimensional data by Roberts, Lehár and Dreher (1987). An adapted version of Lehár's code is used here, and is particularly useful for unequally spaced data. The algorithm essentially deconvolves the spectral window from the discrete Fourier power spectrum (or dirty spectrum). This produces a CLEAN spectrum, which is largely free of the many effects of spectral leakage. In order to prevent small errors from destabilizing the CLEAN procedure, the user is prompted for two parameters: the loop gain and the number of iterations. Briefly, with each iteration, some fraction (governed by the loop gain) of the window function is removed from the dirty spectrum. For convergence, the loop gain must lie between 0 and 2, typical values being between 0.1 and 1. Values at the bottom of this range require more iterations, but should provide more stability. Hence, the number of iterations should be large if the loop gain is small, typical values lying between 1 and 100. Note that increasing the number of iterations (cleans) produces a less noisy spectrum but, in general, decreases the amplitude of the peaks, sometimes by a substantial amount. See Roberts, Lehár and Dreher (1987) for further details on choosing these parameters.
FT
– This option performs a classical discrete Fourier transform on the data and sums the mean-square amplitudes of the result to form a power spectrum (see, for example, Deeming 1975). This discrete Fourier transform is defined for arbitrary data spacing and is equal to the convolution of the true Fourier transform with a spectral window. Hence, the effects of data spacing, such as aliasing, are all contained in the spectral window, which can be generated using the WINDOW option (see above). This spectral window should be analysed in conjunction with the discrete Fourier transform generated here in order to estimate the effects of aliasing.
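A minimal sketch of a classical DFT power spectrum for arbitrary spacing, normalised by 1/N² (one common convention; PERIOD's normalisation may differ, and the function name is hypothetical). The usage example deliberately uses evenly spaced data and a cosine whose frequency lies exactly on the grid, so that the result can be checked by hand.

```python
import math

def ft_power(t, y, freqs):
    """Classical discrete Fourier power spectrum, defined for arbitrary time spacing."""
    n = len(t)
    power = []
    for f in freqs:
        re = sum(yi * math.cos(2.0 * math.pi * f * ti) for ti, yi in zip(t, y))
        im = sum(yi * math.sin(2.0 * math.pi * f * ti) for ti, yi in zip(t, y))
        power.append((re * re + im * im) / n ** 2)
    return power

t = [float(k) for k in range(16)]
y = [math.cos(2.0 * math.pi * 0.25 * ti) for ti in t]
freqs = [k / 16.0 for k in range(1, 8)]
p = ft_power(t, y, freqs)
# p[3] (f = 0.25) is 0.25; the other grid frequencies give ~0
```

With unevenly spaced times the same function still applies, but the power then leaks into aliases according to the spectral window described above.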
PDM
– The phase dispersion minimization (PDM) technique is simply an automated version of the classical method of distinguishing between possible periods, in which the period producing the least observational scatter about the mean light curve (or, for example, radial velocity curve) is chosen. This technique (which is described in detail by Stellingwerf 1978) is well suited to cases in which only a few observations are available over a limited period of time, especially if the light curve is highly non-sinusoidal. The data are first folded on a series of trial frequencies. For each trial frequency, the full phase interval (0,1) is divided into a user-specified number of bins. The width of each bin is also specified by the user, so that a point may fall in no bin at all (if a bin width narrower than the bin spacing is selected) or in more than one bin (if a bin width wider than the bin spacing is selected). The variance of each of these bins (or samples) is then calculated. This gives a measure of the scatter around the mean light curve defined by the means of the data in each sample. The PDM statistic is then calculated by dividing the overall variance of all the samples by the variance of the original (unbinned) dataset. This process is repeated for each trial frequency. Note that windowed data cannot be passed to this option since their variance is zero. If the trial period is not a true period, the overall sample variance will be approximately equal to the variance of the original dataset (i.e. the PDM statistic will be approximately 1). If the trial period is a correct period, the PDM statistic will reach a local minimum compared with neighbouring periods, ideally near zero.
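The PDM statistic at one trial frequency can be sketched as below. The binning details are assumptions: bin centres at (j + 0.5)/nbins, membership by circular phase distance of at most width/2, and a pooled bin variance Σ(n_j - 1)s_j² / Σ(n_j - 1), which is our reading of Stellingwerf's definition rather than PERIOD's exact code. The function name is hypothetical. As noted above, constant (windowed) data would cause a division by zero.

```python
import statistics

def pdm_theta(t, y, freq, nbins=10, width=None):
    """PDM statistic theta = (pooled bin variance) / (variance of all the data)."""
    if width is None:
        width = 1.0 / nbins              # bins just touch when width equals the spacing
    phases = [(ti * freq) % 1.0 for ti in t]
    num, dof = 0.0, 0
    for j in range(nbins):
        centre = (j + 0.5) / nbins
        members = []
        for ph, yi in zip(phases, y):
            d = abs(ph - centre)
            d = min(d, 1.0 - d)          # phase wraps around at 1
            if d <= width / 2.0:         # a point may land in 0, 1 or several bins
                members.append(yi)
        if len(members) > 1:
            num += (len(members) - 1) * statistics.variance(members)
            dof += len(members) - 1
    s2 = num / dof                       # pooled variance of the samples
    return s2 / statistics.variance(y)   # zero variance (windowed data) divides by zero

# toy data whose values depend only on phase at frequency 1
t = [0.1 * k + 0.05 for k in range(40)]
y = [float(k % 10) for k in range(40)]
theta_true = pdm_theta(t, y, 1.0)   # 0: every bin holds identical values
theta_half = pdm_theta(t, y, 0.5)   # larger: bins now mix neighbouring values
```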
SCARGLE
– By redefining the classical periodogram (i.e. the discrete Fourier periodogram) in such a manner as to make it invariant to a shift of the origin of time, Lomb (1976) and Scargle (1982) developed a novel type of periodogram analysis, quite powerful for finding, and testing the significance of, weak periodic signals in otherwise random, unevenly sampled data. Horne and Baliunas (1986) have elaborated on the method, and Press and Rybicki (1989) present a fast implementation of the algorithm, a modified version of which is used here. This implementation uses FFTs to increase the speed of computation (although it is in no way equivalent to conventional FFT periodogram analysis). Note that windowed data cannot be passed to this option, since it needs to calculate the variance (which is zero) to normalize the power of the periodogram.
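PERIOD uses a modified version of the FFT-based Press and Rybicki (1989) algorithm. The sketch below instead evaluates the Lomb-Scargle formula directly, which is slower (proportional to N times the number of frequencies) but short enough to read. It normalises by 2σ² in the manner of Horne and Baliunas (1986), and the function name is hypothetical. Trial frequencies must be positive because the time offset τ is divided by the angular frequency.

```python
import math
import statistics

def lomb_scargle(t, y, freqs):
    """Direct Lomb-Scargle periodogram, normalised by twice the data variance."""
    mean = statistics.mean(y)
    var = statistics.variance(y)          # zero for windowed data
    dy = [yi - mean for yi in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # tau makes the result invariant to a shift of the time origin
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum(d * ci for d, ci in zip(dy, c))
        ys = sum(d * si for d, si in zip(dy, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power
```

Shifting every time by a constant leaves the returned powers unchanged, which is the property that distinguishes this periodogram from the classical one.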
STRING
– The string-length method is an intuitively simple method, described in detail by Dworetsky (1983) and Friend et al. (1990). The data are folded on a series of trial periods, and at each period the sum of the lengths of the line segments joining successive points (the string-length) is calculated. Minima in a plot of string-length versus trial frequency indicate possible periods. The string-length method is especially useful when only a small number (about 20 or more) of randomly spaced observations of periodic phenomena are available. Note that windowed data cannot be passed to this option due to the y-data scaling process (see Dworetsky 1983).
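A sketch of the string-length at a single trial frequency. It assumes the y values are scaled to the range [-0.25, 0.25], which is our reading of Dworetsky's normalisation, and that the string is closed by joining the last phase-sorted point back to the first one cycle later. The function name is hypothetical. Constant (windowed) data make the scaling divide by zero, which is why such data are rejected.

```python
import math

def string_length(t, y, freq):
    """Sum of segment lengths joining successive points in the folded light curve."""
    ymin, ymax = min(y), max(y)
    # scale y to [-0.25, 0.25]; ymax == ymin (windowed data) divides by zero
    scaled = [(yi - ymin) / (2.0 * (ymax - ymin)) - 0.25 for yi in y]
    pts = sorted(((ti * freq) % 1.0, yi) for ti, yi in zip(t, scaled))
    total = 0.0
    for (p1, y1), (p2, y2) in zip(pts, pts[1:]):
        total += math.hypot(p2 - p1, y2 - y1)
    # close the string: join the last point back to the first, one cycle later
    (pl, yl), (pf, yf) = pts[-1], pts[0]
    total += math.hypot(pf + 1.0 - pl, yf - yl)
    return total

# toy data whose values depend only on phase at frequency 1
t = [0.1 * k + 0.05 for k in range(40)]
y = [float(k % 10) for k in range(40)]
sl_true = string_length(t, y, 1.0)   # shorter string at the true frequency
sl_half = string_length(t, y, 0.5)
```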
PEAKS
– This option should be run once a periodogram has been obtained. It finds the highest peak in the periodogram (or the lowest trough if it is a string-length, PDM or reduced χ² plot) between user-specified frequencies. The resulting period is calculated, along with an error. Errors on period detections are notoriously difficult to estimate. The estimate used in the previous version of PERIOD (v3.0) employed a formula derived by Kovacs (1981). The derivation assumed a single signal, Gaussian noise and even data spacing. This is clearly not the case with most astronomical datasets, and the formula is hence of little use (see Horne and Baliunas 1986). Schwarzenberg-Czerny (1991) presents a detailed account of the accuracy of period determinations and advises a post-mortem analysis by measuring the widths and heights of peaks in a periodogram. Although this is virtually impossible to automate, it is possible to do it manually from within PERIOD using the fitting routines of QDP/PLT (see above). Therefore, for the sake of generality and to avoid uncertainties, version 4.0 of PERIOD now only outputs an error derived by calculating the half-size of a single frequency bin, centred on the peak (or trough) in a periodogram, and then converting to period units. This error gives an indication of the accuracy to which a peak can be located in a periodogram (due to the frequency sampling). Clearly, with a larger frequency search interval it is more difficult to locate a peak precisely, and this is reflected in the error estimate. However, this error estimate does not take into account the fact that the peak (or trough) may not represent the true period (which can be shifted due to a number of effects), and it should therefore be regarded as a minimum error and not a formal error.
If the significance calculation is enabled (with the SIG command, see below), two false alarm probabilities are quoted alongside the period. The first (FAP1) is the probability that, given the frequency search parameters, there is no periodic component present in the data with this period. The second (FAP2) is the probability that the period is not actually equal to the quoted value but is equal to some other value. Note that FAP1 is only output if the whole frequency range is specified to be analysed in PEAKS (see below). One-sigma errors on both significance values are also given. If the significance values are zero, these errors are displayed as -1, implying that the false alarm probabilities lie between 0.00 and 0.01 with 95% confidence. Clearly, the lower a significance value and its error, the more likely the quoted period is a correct one. If both the significances and errors are displayed as -1, the input periodogram has not been subjected to a significance calculation (i.e. the significance calculation has been disabled). Note that the results can be written to a log file if one is open. For more information on the SIG option, see below. For useful discussions on errors and significances of period determinations, see Schwarzenberg-Czerny (1991) and Nemec and Nemec (1985).
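The peak search and the bin-based minimum error can be sketched as follows. The conversion from a half frequency bin to period units is assumed here to be the first-order relation ΔP = Δf / f² (from P = 1/f); PERIOD's exact conversion may differ. The function name is hypothetical, the frequency grid is assumed evenly spaced, and f = 0 must be excluded because its period is infinite.

```python
def find_peak(freqs, power, fmin, fmax, trough=False):
    """Return (frequency, period, minimum period error) of the extremum in [fmin, fmax]."""
    idx = [i for i, f in enumerate(freqs) if fmin <= f <= fmax]
    pick = min if trough else max         # troughs for CHISQ, STRING and PDM
    best = pick(idx, key=lambda i: power[i])
    df = freqs[1] - freqs[0]              # assumes an evenly spaced frequency grid
    f = freqs[best]
    # half a frequency bin, converted to period units via dP = dF / F**2
    return f, 1.0 / f, 0.5 * df / f ** 2

freqs = [0.1, 0.2, 0.3, 0.4, 0.5]
power = [1.0, 5.0, 2.0, 7.0, 3.0]
f, period, err = find_peak(freqs, power, 0.1, 0.35)
```

Because the error scales with the bin width, a coarser frequency grid gives a larger error, as described above.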
SIG
– This option works as a switch, turning the significance calculation either on or off. The default on entering PERIOD is for the significance calculation to be disabled. This means that no significance values are calculated or attached to period determinations. Typing SIG enables the significance calculation. You are first prompted for the number of permutations in the sample. To ensure reliable significance values, the minimum number of permutations is set to 100. You are then prompted for a seed for the random number generator; this number determines the starting point in a pseudo-random number sequence of very long period. Therefore, entering the same seed on two calls to SIG will result in the same sequence of random numbers. If SIG is already enabled, you can disable the significance calculation by typing SIG again.
With the significance calculation enabled, every time a period-finding option is run (CHISQ, FT, SCARGLE, CLEAN, STRING, PDM), a Fisher randomization test is performed (see, for example, Nemec and Nemec 1985). This consists of calculating the periodogram as usual and loading the specified output slot. The y-axis data are then shuffled to form a new, randomized time-series. The periodogram of this dataset is then calculated (but not stored in the output slot, which will always contain the periodogram of the real time-series). This randomization and periodogram calculation loop is then performed for the number of permutations specified by the user. This can take a considerable amount of time, depending on the number of data points in the time-series, the frequency search parameters and the number of permutations.
Once the loop is complete, you should enter the PEAKS option to view the resulting significances. Two significance estimates are given in PEAKS. The first, denoted FAP1, represents the proportion of permutations (i.e. shuffled time-series) that contained a trough lower than (in the case of the CHISQ, STRING and PDM options) or a peak higher than (in the case of the FT, SCARGLE and CLEAN options) that of the periodogram of the unrandomized dataset at any frequency. This therefore represents the probability that, given the frequency search parameters, no periodic component is present in the data with this period, and it is only output in PEAKS if the whole frequency range is specified to be analysed. The second significance, denoted FAP2, represents the proportion of permutations that, at the frequency given by the period output by PEAKS, contained troughs lower than (or peaks higher than) the peak or trough in the periodogram of the real dataset. This therefore represents the probability that the period is not actually equal to the quoted value but is equal to some other value, and it is quoted for any frequency range specified in PEAKS. Standard errors on both of these false alarm probabilities are also given (see Nemec and Nemec 1985).
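A minimal sketch of the Fisher randomization test for a peak-type periodogram, using a classical DFT as the example period-finding method. The strict "higher than" comparison and the binomial standard errors sqrt(p(1 - p)/N) are assumptions for illustration and are not necessarily the exact prescription PERIOD takes from Nemec and Nemec (1985). Function names are hypothetical. For trough-type methods (CHISQ, STRING, PDM), the comparisons would be reversed.

```python
import math
import random

def dft_power(t, y, freqs):
    """Classical periodogram, used here as the example period-finding method."""
    n = len(t)
    out = []
    for f in freqs:
        re = sum(yi * math.cos(2 * math.pi * f * ti) for ti, yi in zip(t, y))
        im = sum(yi * math.sin(2 * math.pi * f * ti) for ti, yi in zip(t, y))
        out.append((re * re + im * im) / n ** 2)
    return out

def fisher_fap(t, y, freqs, nperm=100, seed=1):
    """Return (peak frequency, FAP1, err1, FAP2, err2) from shuffled-y periodograms."""
    rng = random.Random(seed)             # same seed -> same sequence of shuffles
    real = dft_power(t, y, freqs)
    ipeak = max(range(len(freqs)), key=real.__getitem__)
    count1 = count2 = 0
    shuffled = list(y)
    for _ in range(nperm):
        rng.shuffle(shuffled)             # keep the times, permute the y values
        p = dft_power(t, shuffled, freqs)
        if max(p) > real[ipeak]:          # FAP1: a higher peak at any frequency
            count1 += 1
        if p[ipeak] > real[ipeak]:        # FAP2: a higher peak at the detected frequency
            count2 += 1
    fap1, fap2 = count1 / nperm, count2 / nperm
    err1 = math.sqrt(fap1 * (1 - fap1) / nperm)   # binomial standard errors
    err2 = math.sqrt(fap2 * (1 - fap2) / nperm)
    return freqs[ipeak], fap1, err1, fap2, err2
```

Every permutation that counts towards FAP2 also counts towards FAP1, so FAP2 can never exceed FAP1.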
It is perhaps worth mentioning here that significance estimates of period detections are notoriously unreliable. The methods used in the previous version of PERIOD (v3.0) suffered from a number of problems. For example, the F-test used with the PDM method (Stellingwerf 1978) has been shown to be incorrect (see, for example, Heck, Manfroid and Mersch 1985). Similarly, the theoretical minimum string-lengths quoted by Dworetsky (1983) are misleading, since they are based on evenly spaced functions, and it is possible to obtain values below this minimum even for pure noise data with certain data spacings. The well-known SCARGLE false alarm probabilities are also incorrect, since the Horne and Baliunas (1986) equation for the number of independent frequencies has been shown to be incorrect (Christian Knigge (Oxford), private communication). Even if it were correct, it would be inappropriate to apply the Horne and Baliunas formula in a general way, since it is a poor approximation for small datasets. The only reliable method of estimating significances from such non-parametric tests is some form of Monte Carlo or randomization method. As described above, one such method (Fisher randomization) has been implemented in this version of PERIOD (v4.0), following the prescription described by Nemec and Nemec (1985).
HELP
– This command provides on-line help for PERIOD. Detailed information about individual commands can be obtained by typing HELP 'COMMAND' (e.g. HELP PEAKS).
QUIT (or EXIT)
– This quits the PERIOD_PERIOD sub-menu and returns the user to the main PERIOD menu.
Returning to the main PERIOD menu:
FIT
Folds the data on a given period and zero point and then fits the data with a sine curve. The sine curve has the form: Y = GAMMA + (AMPLITUDE * SIN( ((2.0*PI)/PERIOD) * (X - ZEROPT) )). The routine outputs the fit parameters (which can be written to a log file) and the resulting sine curve.
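As a simplified sketch, if PERIOD and ZEROPT are held fixed, the FIT model is linear in GAMMA and AMPLITUDE and reduces to an ordinary least-squares regression of Y on SIN(((2.0*PI)/PERIOD) * (X - ZEROPT)). Whether PERIOD also refines the period or zero point is not stated here, and the function name is hypothetical.

```python
import math

def fit_sine(x, y, period, zeropt):
    """Least-squares GAMMA and AMPLITUDE for Y = GAMMA + AMPLITUDE*SIN(2*PI/PERIOD*(X-ZEROPT))."""
    s = [math.sin(2.0 * math.pi / period * (xi - zeropt)) for xi in x]
    n = len(x)
    sum_s, sum_ss = sum(s), sum(si * si for si in s)
    sum_y, sum_sy = sum(y), sum(si * yi for si, yi in zip(s, y))
    det = n * sum_ss - sum_s ** 2          # zero only if all sine values are equal
    amplitude = (n * sum_sy - sum_s * sum_y) / det
    gamma = (sum_y - amplitude * sum_s) / n
    return gamma, amplitude

x = [0.0, 0.7, 1.9, 3.1, 4.0, 5.6, 6.3]
y = [3.0 + 2.0 * math.sin(2.0 * math.pi / 5.0 * (xi - 1.0)) for xi in x]
gamma, amplitude = fit_sine(x, y, 5.0, 1.0)   # recovers 3.0 and 2.0 up to rounding
```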
FOLD
Folds the data on a given period and zero point. Hence, this option transforms the data onto a phase scale, where one phase unit is equal to one period and phase zero is defined by the zero point. If the zero point is not known, the data can be folded by taking the first data point as the zero point. This option is useful for checking whether derived periods actually give sensible results when applied to the data. In addition to normal folding, it is also possible to phase bin the data, which folds the data and then averages all the data points falling into each bin.
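Folding and phase binning can be sketched as below, with the first data point used as the zero point when none is given, as described above. The function names and the choice of equal-width bins reported at their centres are illustrative assumptions.

```python
def fold(x, y, period, zeropt=None):
    """Return (phase, y) pairs sorted by phase; one phase unit equals one period."""
    if zeropt is None:
        zeropt = x[0]                     # default: first data point defines phase zero
    pairs = [(((xi - zeropt) / period) % 1.0, yi) for xi, yi in zip(x, y)]
    return sorted(pairs)

def phase_bin(x, y, period, nbins, zeropt=None):
    """Fold, then average the y values falling into each of nbins equal phase bins."""
    sums = [0.0] * nbins
    counts = [0] * nbins
    for phase, yi in fold(x, y, period, zeropt):
        j = min(int(phase * nbins), nbins - 1)   # guard against phase rounding to 1.0
        sums[j] += yi
        counts[j] += 1
    # (bin centre, mean value) for every non-empty bin
    return [((j + 0.5) / nbins, sums[j] / counts[j]) for j in range(nbins) if counts[j]]

binned = phase_bin([10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5], 2.0, 2)
# binned == [(0.25, 2.0), (0.75, 3.0)]
```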
SINE
Adds a sine curve to data, subtracts it from data, or multiplies or divides data by it. The sine curve has the form: Y = GAMMA + (AMPLITUDE * SIN( ((2.0*PI)/PERIOD) * (X - ZEROPT) )). This option is useful for removing known periods from data, or adding them to data, thus enabling the detection of other periods or testing how well they can be detected.
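A short sketch of the four SINE operations using the same curve as FIT. The operation names and function signature are hypothetical; note that "divide" fails wherever the curve passes through zero.

```python
import math
import operator

OPS = {"add": operator.add, "subtract": operator.sub,
       "multiply": operator.mul, "divide": operator.truediv}

def apply_sine(x, y, gamma, amplitude, period, zeropt, op="subtract"):
    """Combine data with Y = GAMMA + AMPLITUDE*SIN(2*PI/PERIOD*(X-ZEROPT)), point by point."""
    func = OPS[op]
    return [func(yi, gamma + amplitude * math.sin(2.0 * math.pi / period * (xi - zeropt)))
            for xi, yi in zip(x, y)]

x = [0.0, 0.5, 1.3, 2.2]
curve = [1.0 + 2.0 * math.sin(2.0 * math.pi / 4.0 * (xi - 0.2)) for xi in x]
residual = apply_sine(x, curve, 1.0, 2.0, 4.0, 0.2, op="subtract")  # removes the known period
```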
PLT
This routine calls PGPLOT routines to display the graphs of the slots requested. The layout of the displays is fixed, but output file types such as landscape PostScript files can be created. This represents slightly less functionality than the original XANADU-based QDP PLT routine, but no QDP PLT routine is currently available for Linux.
In order to receive on-line help, simply type HELP at the PERIOD-PLT prompt. To exit PERIOD-PLT and return to the PERIOD menu, type EXIT.
STATUS
Returns information on the data slots or on the stored fits in the log file. This command is useful in order to check which slots contain which datasets and also as a means of obtaining some elementary statistics on the stored data. You can also use this option to check the fits from the SINE and PEAKS options stored in the log file without having to exit the package and read the log file.
OUTPUT
Writes any selected slot to an ASCII file on disk. This is the only way of saving data created by PERIOD (it does not write to FITS files), and it should therefore be run before QUITting in order to store, say, a power spectrum.
HELP
This command provides on-line help for PERIOD. Detailed information about individual commands can be obtained by typing HELP 'COMMAND' (e.g. HELP PERIOD).
QUIT (or EXIT)
This option quits a PERIOD session. However, it does provide a last chance to stay in the package. This is essential to prevent accidental exit, since any data files created using PERIOD will be lost on exit from the package unless the data are first saved with OUTPUT.
¹ Throughout the PERIOD package and this document, the terms power spectrum and periodogram are used interchangeably, although strictly speaking the power spectrum is a theoretical quantity defined as an integral over continuous time, of which the periodogram is merely an estimate based on a finite amount of discrete data (Scargle 1982).