Many practical problems require the fitting of a probability distribution to a data sample, and in many fields of application the available data consist of not just a single sample but a set of samples drawn from similar probability distributions. It is natural to wonder whether the distribution for one sample can be more accurately estimated by using information not just from that sample but also from the other related samples. In the environmental sciences the data samples are typically measurements of the same kind of data made at different sites, and the process of using data from several sites to estimate the frequency distribution is known as regional frequency analysis. We have developed an approach to regional frequency analysis that is statistically efficient and reasonably straightforward to implement. Our aim in this monograph is to present a complete description of our approach: the specification of all necessary computations, a description of the theoretical statistical background, an assessment of the method's performance in plausible practical situations, recommendations to assist with the subjective decisions that are inevitable in any statistical analysis, and consideration of how to overcome some of the difficulties often encountered in practice. The technical level of exposition is intended to be comprehensible to practitioners with no more than a basic knowledge of probability and statistics, including an understanding of the concepts defined in Sections 2.1–2.3.
In 1989 the U.S. Army Corps of Engineers was charged with the responsibility of conducting a national study of water management during periods of drought. One of the results of the study is the National Drought Atlas (Willeke et al., 1995), which contains analyses of data on monthly precipitation, streamflow, reservoir levels, and the Palmer Drought Index for over 1,000 measuring sites in the continental United States. Analysis of the precipitation data used regional frequency analysis and was based on L-moments. Precipitation data were available as totals, in inches, for durations of 1, 2, 3, 6, 12, 24, 36 and 60 months starting in each calendar month January through December. Though regions could in principle have been defined separately for each combination of duration and starting month, this would have led to an atlas that would have been excessively large and difficult to use. It was therefore decided to construct a single set of regions, based on the data for annual precipitation totals, and to use these regions when fitting regional frequency distributions to the data for all durations and starting months.
Here we describe the analysis of the data for annual precipitation totals (though data for other durations and starting months affect some parts of the analysis). The analysis illustrates the steps involved in a large-scale regional frequency analysis exercise and shows how some of the commonly occurring problems in regional frequency analysis may be overcome.
In this monograph we have concentrated on regional frequency analysis using the index-flood procedure defined in Section 1.3 and the comparison of this method with at-site estimation. Several other regional frequency analysis procedures have been proposed; here we briefly describe them. For simplicity we consider them in the context of estimating a frequency distribution parametrized by its mean, its dispersion divided by its mean (typically L-CV), and one or more shape parameters (typically L-skewness). The estimators that each regional frequency analysis procedure uses for the parameters are summarized in Table 8.1. We are concerned with the question of which data are used in the analysis: at-site, regional, or some combination of the two. We do not consider the question of which statistical methods to apply to the data. We believe that methods based on L-moments are the best currently available; other approaches are reviewed by Cunnane (1988).
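As a concrete illustration of these quantities (a sketch added here, not part of the original text), the following Python fragment computes the sample mean, L-CV, and L-skewness of a single data sample from the standard unbiased probability-weighted moments; the function and variable names are illustrative.

```python
import numpy as np

def sample_lmoment_ratios(x):
    """Return (mean, L-CV, L-skewness) of a 1-D data sample (length >= 3).

    Uses the unbiased sample probability-weighted moments b0, b1, b2, from
    which the first three L-moments are l1 = b0, l2 = 2*b1 - b0,
    l3 = 6*b2 - 6*b1 + b0.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)          # ranks of the ordered sample
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2 / l1, l3 / l2      # mean, L-CV, L-skewness
```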
At-site estimation
For reference, we note here that at-site estimation involves the use of at-site estimates for all of the parameters of the distribution.
Regional shape estimation
If the mean and dispersion are estimated from at-site statistics, and the shape parameters are estimated by averaging the at-site shape measures for the sites in a region, we call the resulting procedure a “regional shape estimation” procedure. It is intermediate between pure at-site estimation and the index-flood procedure. It is discussed in more detail in Section 8.2.
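A minimal sketch of the distinction, reusing the sample_lmoment_ratios function above (an illustration under the stated parametrization, not the book's exact algorithm): the mean and L-CV come from each site's own data, while the shape measure, here L-skewness, is replaced by a record-length-weighted average over the sites in the region.

```python
def regional_shape_estimates(samples):
    """samples: list of 1-D arrays, one per site in a region.

    Returns one (mean, L-CV, L-skewness) triple per site, with the first two
    parameters estimated at-site and the shape parameter averaged regionally.
    """
    atsite = [sample_lmoment_ratios(x) for x in samples]
    n = [len(x) for x in samples]
    # record-length-weighted regional average of the at-site L-skewness values
    t3_regional = sum(ni * t3 for ni, (_, _, t3) in zip(n, atsite)) / sum(n)
    return [(l1, t, t3_regional) for (l1, t, _) in atsite]
```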
Choosing a distribution for regional frequency analysis
General framework
In regional frequency analysis a single frequency distribution is fitted to data from several sites. In general, the region will be slightly heterogeneous, and there will be no single “true” distribution that applies to each site. The aim is therefore not to identify a “true” distribution but to find a distribution that will yield accurate quantile estimates for each site.
The chosen distribution need not be the distribution that gives the closest approximation to the observed data. Even when a distribution can be found that gives a close fit to the observed data, there is no guarantee that future values will match those of the past, particularly when the data arise from a physical process that can give rise to occasional outlying values far removed from the bulk of the data. As noted in Section 1.2, it is preferable to use a robust approach based on a distribution that will yield reasonably accurate quantile estimates even when the true at-site frequency distributions deviate from the fitted regional frequency distribution.
There may be a particular range of return periods for which quantile estimates are required. In analyses of extreme events such as floods or droughts, quantile estimates in one tail of the distribution will be of particular interest. In other examples, quantiles far into the tails of the distribution may be of little interest. These considerations may affect the choice of regional frequency distribution. If only quantiles in the upper tail are of interest, for example, then it need not matter if a distribution that can take negative values is fitted to data that can only be positive.
Frequency analysis is the estimation of how often a specified event will occur. Estimation of the frequency of extreme events is often of particular importance. Because there are numerous sources of uncertainty about the physical processes that give rise to observed events, a statistical approach to the analysis of data is often desirable. Statistical methods acknowledge the existence of uncertainty and enable its effects to be quantified. Procedures for statistical frequency analysis of a single set of data are well established. It is often the case, however, that many related samples of data are available for analysis. These may, for example, be meteorological or environmental observations of the same variable at different measuring sites, or industrial measurements made on samples of similar products. If event frequencies are similar for the different observed quantities, then more accurate conclusions can be reached by analyzing all of the data samples together than by using only a single sample. In environmental applications this approach is known as regional frequency analysis, because the data samples analyzed are typically observations of the same variable at a number of measuring sites within a suitably defined “region.” The principles of regional frequency analysis, however, apply whenever multiple samples of similar data are available.
Suppose that observations are made at regular intervals at some site of interest. Let Q be the magnitude of the event that occurs at a given time at this site. We regard Q as a random quantity (a random variable), potentially taking any value between zero and infinity.
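For reference (a standard relation, not spelled out in this excerpt): writing F(x) = Pr(Q ≤ x) for the cumulative distribution function of Q and x(F) for its inverse, the quantile function, the quantile corresponding to return period T for a quantity observed once per time interval is

$$ x_T = x\!\left(1 - \frac{1}{T}\right), $$

so that, for annual data, the "100-year event" is the quantile with nonexceedance probability 0.99.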
The first essential of any statistical data analysis is to check that the data are appropriate for the analysis. For frequency analysis, the data collected at a site must be a true representation of the quantity being measured and must all be drawn from the same frequency distribution. An initial screening of the data should aim to verify that these requirements are satisfied.
The exact nature of the problems that may affect the data depends on the kind of data that were measured. For environmental data for which a frequency analysis is being attempted, two kinds of error are particularly important and plausible.
First, data values may be incorrect. Incorrect recording or transcription of data values is easily done and casts doubt on any subsequent frequency analysis of the data.
Second, the circumstances under which the data were collected may have changed over time. The measuring device may have been moved to a different location, or trends over time may have arisen from changes in the environment of the measuring device. In either case the frequency distribution from which the data were sampled is not constant over time, and frequency analysis of the data will not be a valid basis for estimating the probability distribution of future measurements at the site.
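As an illustration of the kind of automated check that can support, though not replace, this screening (an illustrative sketch with arbitrary thresholds, not a procedure prescribed by the text), the fragment below flags the two symptoms just described: physically impossible values, and a marked shift in level between the first and second halves of the record.

```python
import numpy as np

def screen_site(x, lower=0.0, upper=None):
    """Return a list of warnings for one site's record x (1-D array)."""
    x = np.asarray(x, dtype=float)
    warnings = []
    # gross value errors: values outside the physically possible range
    if np.any(x < lower) or (upper is not None and np.any(x > upper)):
        warnings.append("values outside the physically plausible range")
    # possible change over time: shift in mean between record halves,
    # measured against the record's standard deviation (illustrative threshold)
    half = len(x) // 2
    if abs(x[:half].mean() - x[half:].mean()) > 0.5 * x.std(ddof=1):
        warnings.append("level shift between first and second halves of record")
    return warnings
```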
Even when the data are reputed to be reliable, it is still important to check for errors. A sobering example was provided by Wallis, Lettenmaier, and Wood (1991), who compiled a set of daily precipitation and temperature records for 1009 sites in the United States from data supplied by the National Climatic Data Center (NCDC).
Of all the stages in a regional frequency analysis involving many sites, the identification of homogeneous regions is usually the most difficult and requires the greatest amount of subjective judgement. The aim is to form groups of sites that approximately satisfy the homogeneity condition, that the sites' frequency distributions are identical apart from a site-specific scale factor. This is usually achieved by partitioning the sites into disjoint groups. An alternative approach is to define for each site of interest a region containing those sites whose data can advantageously be used in the estimation of the frequency distribution at the site of interest. This is the basis of the “region of influence” approach to the formation of regions, discussed in Section 8.1.
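One simple numerical symptom of heterogeneity (a simplified illustration in the spirit of, but not identical to, the formal heterogeneity measures treated later in the book) is the record-length-weighted dispersion of the at-site L-CV values about their regional average; in a homogeneous region this dispersion should be no larger than sampling variability alone would produce.

```python
def lcv_dispersion(samples):
    """Weighted standard deviation of at-site L-CV about the regional average.

    samples: list of 1-D arrays, one per site; reuses sample_lmoment_ratios
    from the earlier sketch.  Values that are large relative to those obtained
    by simulating a homogeneous region suggest the proposed region is
    heterogeneous.
    """
    n = [len(x) for x in samples]
    t = [sample_lmoment_ratios(x)[1] for x in samples]     # at-site L-CVs
    t_bar = sum(ni * ti for ni, ti in zip(n, t)) / sum(n)
    return (sum(ni * (ti - t_bar) ** 2 for ni, ti in zip(n, t)) / sum(n)) ** 0.5
```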
Which data to use?
Formation of regions is difficult because the at-site frequency distribution of the quantity of interest, Q, is not observed directly. The available data for region formation are quantities calculated from the at-site measurements of Q, which we call at-site statistics, and other site descriptors that we call site characteristics. In environmental applications the site characteristics would typically include the geographical location of the site, its elevation, and other physical properties associated with the site. Other site characteristics may be based on estimates rather than direct measurements, but are sufficiently accurate to be treated as though they were deterministic quantities.
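One common way of forming candidate regions from site characteristics alone (an illustrative sketch; the choice of characteristics, their scaling, and the clustering method all involve the subjective judgement discussed above) is to standardize the characteristics and apply a cluster analysis such as Ward's method:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_sites(characteristics, n_regions):
    """characteristics: (n_sites, n_characteristics) array, e.g. latitude,
    longitude, elevation, mean annual precipitation.
    Returns a candidate-region label (1..n_regions) for each site."""
    X = np.asarray(characteristics, dtype=float)
    # standardize so that no single characteristic dominates the distances
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = linkage(X, method="ward")     # Ward's hierarchical clustering
    return fcluster(Z, t=n_regions, criterion="maxclust")
```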
Remote sensing methods, primarily from satellites, can contribute to the study of the sea-surface microlayer in several ways. An overview is given of the ocean parameters which remote sensing techniques can measure, and the spatial and temporal sampling capabilities of sensors which are useful for microlayer studies are described and explained. Infrared sensors measure the temperature of the sea-surface thermal skin, but to define the difference between radiometric and in-water measurements of temperature requires an understanding of the physical processes which control the thermal structure of the surface microlayer. The skin temperature deviation has an effect on the interpretation of global datasets of sea-surface temperature. Satellite images are also able to identify circumstances where local meteorological conditions introduce heterogeneity into the surface microlayer temperature. Imaging radars are very effective in measuring sea-surface roughness and detecting its spatial variability. The effects of surface wind fields, waves, swell, slicks and other dynamical features on the radar backscatter cross section enable radars to image a variety of physical processes which affect the surface microlayer. It is proposed that, by combining data from more than one remote sensing technique, air–sea fluxes may be estimated and surface microlayer processes related to the wider spatial context of mesoscale variability.
The last 15 years have seen a considerable increase in our understanding of the chemical composition of the sea-surface microlayer. However, many new developments in methods of chemical analysis have yet to be applied systematically to the microlayer. The development of continuous microlayer samplers coupled to UV absorbance or fluorescence detectors now allows much greater temporal and spatial resolution to be achieved in field measurements, and will have great application with the development of new chemical sensing technology. These techniques, and a greater range of studied environments, indicate that microlayer enrichment of the major classes of organic compounds (protein, carbohydrate) or of organic parameters such as dissolved organic carbon is less than earlier studies suggested. Enrichment factors larger than 1.5 are relatively rare, and depletion is often observed. Improvements in the analysis of specific organic compounds have been made, including better sample blanks, greater sensitivity, and improved identification of individual compounds. Recently reported concentrations of PCB and organochlorine insecticides in low latitude regions are extremely low, although microlayer samples enriched in these components are found at high latitudes. Organotin species have also been reported in microlayers. For trace elements that are passively associated with surface-layer organic material, such as trace metals, a confusing picture exists.
Laser spectroscopy and laser probes now provide previously unobtainable information on the physico-chemical structure and processes of the ocean microlayer. Methods range from laboratory techniques to in situ ocean microlayer probes.
In the laboratory, laser-induced fluorescence from water-soluble dye molecules is used to track aqueous layer interfacial movement. With these techniques, researchers infer turbulence in the upper several millimetres of a water surface. It is also possible to measure timescales of interfacial layer concentration fluctuations and interfacial layer penetration depths, thus providing estimates of gas transfer velocities. Laser-induced fluorescence methods are presently limited to laboratory studies.
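The link from these measured quantities to a gas transfer velocity k is usually made through standard film or surface-renewal models (not reproduced in the abstract above): with molecular diffusivity D, effective penetration (film) depth δ, and mean surface-renewal timescale τ,

$$ k \approx \frac{D}{\delta} \quad \text{(film model)}, \qquad k \approx \sqrt{D/\tau} \quad \text{(surface-renewal model)}. $$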
Two-dimensional scanning laser slope gauges provide in situ measurement of ocean slope. Ocean slope is measured from the refraction of a vertical laser beam upon passing from the ocean into the air. By rapid scanning of the laser beam through a geometric pattern on the ocean surface, the researcher determines ocean wave slope at a variety of surface positions. This measurement is performed on a timescale during which the ocean surface is essentially frozen in time. From the set of slopes, an estimate is obtained of the two-dimensional capillary–gravity wave spectrum for a given instant of time and a given region of ocean surface.
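As a schematic illustration of the final step (not the instrument's actual processing chain; the grid spacing, detrending and normalization are illustrative choices), a two-dimensional slope spectrum can be estimated from a regularly gridded snapshot of slope samples with a discrete Fourier transform:

```python
import numpy as np

def slope_spectrum(slopes, dx, dy):
    """Periodogram estimate of the 2-D wavenumber spectrum of a slope field.

    slopes: (ny, nx) array of slope samples on a regular grid (one snapshot);
    dx, dy: grid spacing in metres.  Returns (kx, ky, spectrum), scaled so
    that the sum of the spectrum over all wavenumber bins equals the variance
    of the slope field.
    """
    ny, nx = slopes.shape
    field = slopes - slopes.mean()                 # remove the mean slope
    S = np.fft.fftshift(np.fft.fft2(field))
    spectrum = np.abs(S) ** 2 / (nx * ny) ** 2
    kx = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(nx, d=dx))   # rad per metre
    ky = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(ny, d=dy))
    return kx, ky, spectrum
```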
Nonlinear spectroscopic processes such as reflected second harmonic generation and reflected sum frequency generation provide non-intrusive in situ spectroscopic probes of the ocean surface.