If a stationary process has a purely continuous spectrum, it is natural to estimate its spectral density function (sdf) since this function is easier to interpret than the integrated spectrum. Estimation of the sdf has occupied our attention in Chapters 6, 7 and 9. However, if we are given a sample of a time series drawn from a process with a purely discrete spectrum (i.e., a ‘line’ spectrum for which the integrated spectrum is a step function), our estimation problem is quite different: we must estimate the location and magnitude of the jumps in the integrated spectrum. This requires estimation techniques that differ – to some degree at least – from what we have already studied. It is more common, however, to come across processes whose spectra are a mixture of lines and an sdf stemming from a so-called ‘background continuum.’ In Section 4.4 we distinguished two cases. If the sdf for the continuum is that of white noise, we said that the process has a discrete spectrum – as opposed to a purely discrete spectrum, which has only a line component; on the other hand, if the sdf for the continuum differs from that of white noise (sometimes called ‘colored’ noise), we said that the process has a mixed spectrum (see Figures 142 and 143).
In this chapter we shall use some standard concepts from tidal analysis to motivate and illustrate these models. We shall begin with a discrete parameter harmonic process that has a purely discrete spectrum.
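For concreteness (using generic notation that need not match the chapter's), a real-valued discrete parameter harmonic process can be written as

$$X_t = \mu + \sum_{l=1}^{L} D_l \cos(2\pi f_l t + \phi_l) + \epsilon_t,$$

where the phases $\phi_l$ are independent and uniformly distributed over $(-\pi, \pi]$ and $\epsilon_t$ is a background noise process. With $\epsilon_t$ absent the spectrum is purely discrete (lines of size $D_l^2/4$ at $\pm f_l$); with $\epsilon_t$ white the spectrum is discrete in the sense above; with $\epsilon_t$ colored the spectrum is mixed.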
In Chapter 6 we introduced the important concept of tapering a time series as a way of obtaining a spectral estimator with acceptable bias properties. While tapering does reduce bias due to leakage, there is a price to pay in that the sample size is effectively reduced. When we also smooth across frequencies, this reduction translates into a loss of information in the form of an increase in variance (recall the Ch factor in Equation (248b) and Table 248). This inflated variance is acceptable in some practical applications, but in other cases it is not. The loss of information inherent in tapering can often be avoided either by prewhitening (see Sections 6.5 and 9.10) or by using Welch's overlapped segment averaging (WOSA – see Section 6.17).
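As a rough illustration of the WOSA idea (this is only a sketch, not code from the book; the series x, the sampling interval and the block parameters below are arbitrary placeholders), scipy's welch routine averages Hanning-tapered, 50%-overlapped block periodograms:

```python
# Sketch of Welch's overlapped segment averaging (WOSA).
# The series x, sampling interval dt and block length are placeholder choices.
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)    # stand-in for an observed time series
dt = 1.0                         # sampling interval

# Hanning-tapered blocks of 256 points, 50% overlap, periodograms averaged
f, sdf_est = signal.welch(x, fs=1.0 / dt, window='hann',
                          nperseg=256, noverlap=128, detrend='constant')
```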
In this chapter we discuss another approach to recovering information lost due to tapering. This approach was introduced in a seminal paper by Thomson (1982) and involves the use of multiple orthogonal tapers. As we shall see, multitaper spectral estimation has a number of interesting points in its favor:
In contrast to either prewhitening, which typically requires the careful design of a prewhitening filter, or the conventional use of WOSA (i.e., a Hanning data taper with 50% overlap of blocks), which can still suffer from leakage for spectra with very high dynamic ranges, the multitaper scheme can be used in a fairly ‘automatic’ fashion. Hence it is useful in situations where thousands – or millions – of individual time series must be processed, so that the sheer volume of data precludes a careful analysis of individual series (this occurs routinely in exploration geophysics).
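A minimal sketch of the basic multitaper recipe (taper the series with K orthogonal Slepian sequences, form K eigenspectra, average them) is given below; the time series, the time-bandwidth product NW and the number of tapers K are illustrative choices, and this simple unweighted average is only the starting point of Thomson's method, not its full adaptive form.

```python
# Minimal multitaper sketch: average of K eigenspectra from DPSS (Slepian) tapers.
# x, dt, NW and K are illustrative placeholders.
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)        # stand-in for an observed time series
dt = 1.0                             # sampling interval
NW, K = 4.0, 7                       # time-bandwidth product, number of tapers

tapers = dpss(len(x), NW, Kmax=K)    # shape (K, N); rows are orthonormal tapers
eigenspectra = dt * np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
sdf_est = eigenspectra.mean(axis=0)  # simple unweighted multitaper estimate
freqs = np.fft.rfftfreq(len(x), d=dt)
```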
In the previous chapter we produced representations for various deterministic functions and sequences in terms of linear combinations of sinusoids with different frequencies (for mathematical convenience we actually used complex exponentials instead of sinusoids directly). These representations allow us to easily define various energy and power spectra and to attach a physical meaning to them. For example, subject to square integrability conditions, we found that periodic functions are representable (in the mean square sense) by sums of sinusoids over a discrete set of frequency components, while nonperiodic functions are representable (also in the mean square sense) by an integral of sinusoids over a continuous range of frequencies. For periodic functions, the energy from –∞ to ∞ is infinite, so we can define their spectral properties in terms of distributions of power over a discrete set of frequencies. For nonperiodic functions, the energy from –∞ to ∞ is finite, so we can define their properties in terms of an energy distribution over a continuous range of frequencies.
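In symbols (with a generic normalization that need not match the previous chapter's), a periodic function $g_p$ with period $T$ and a square integrable nonperiodic function $g$ have the representations and Parseval relations

$$g_p(t) = \sum_{n=-\infty}^{\infty} G_n e^{i2\pi n t/T}, \qquad \frac{1}{T}\int_{0}^{T} |g_p(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |G_n|^2,$$

$$g(t) = \int_{-\infty}^{\infty} G(f)\, e^{i2\pi f t}\,df, \qquad \int_{-\infty}^{\infty} |g(t)|^2\,dt = \int_{-\infty}^{\infty} |G(f)|^2\,df,$$

so that power for $g_p$ is distributed over the discrete frequencies $n/T$, while energy for $g$ is distributed over a continuum of frequencies.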
We now want to find some way of representing a stationary process in terms of a ‘sum’ of sinusoids so that we can meaningfully define an appropriate spectrum for it; i.e., we want to be able to directly relate our representation for a stationary process to its spectrum in much the same way we did for deterministic functions. Now a stationary process has associated with it an ensemble of realizations that describe the possible outcomes of a random experiment.
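In one standard form (stated here only as a target, for a zero mean process with unit sampling interval), the representation we are after is

$$X_t = \int_{-1/2}^{1/2} e^{i2\pi f t}\, dZ(f), \qquad E\{|dZ(f)|^2\} = dS^{(I)}(f),$$

where $Z(\cdot)$ is a process with orthogonal increments and $S^{(I)}(\cdot)$ is the integrated spectrum, so that the ‘amplitudes’ attached to the sinusoids are themselves random variables.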
This chapter provides a quick introduction to the subject of spectral analysis. Except for some later references to the exercises of Section 1.6, this material is independent of the rest of the book and can be skipped without loss of continuity. Our intent is to use some simple examples to motivate the key ideas. Since our purpose is to view the forest before we get lost in the trees, the particular analysis techniques we use here have been chosen for their simplicity rather than their appropriateness.
Some Aspects of Time Series Analysis
Spectral analysis is part of time series analysis, so the natural place to start our discussion is with the notion of a time series. The quip (attributed to R. A. Fisher) that a time series is ‘one damned thing after another’ is not far from the truth: loosely speaking, a time series is a set of observations made sequentially in time. Examples abound in the real world, and Figures 2 and 3 show plots of small portions of four actual time series:
the speed of the wind in a certain direction at a certain location, measured every 0.025 second;
the monthly average measurements related to the flow of water in the Willamette River at Salem, Oregon;
the daily record of a quantity (to be precise, the change in average daily frequency) that tells how well an atomic clock keeps time on a day to day basis (a constant value of 0 would indicate that the clock agreed perfectly with a time scale maintained by the U. S. Naval Observatory); and
Spectral analysis almost invariably deals with a class of models called stationary stochastic processes. The material in this chapter is a brief review of the theory behind such processes. The reader is referred to Chapter 3 of Priestley (1981), Chapter 10 of Papoulis (1991) or Chapter 1 of Yaglom (1987) for complementary discussions.
Stochastic Processes
Consider the following experiment (see Figure 31): we hook up a resistor to an oscilloscope in such a way that we can examine the voltage variations across the resistor as a function of time. Every time we press a ‘reset’ button on the oscilloscope, it displays the voltage variations for the 1 second interval following the ‘reset.’ Since the voltage variations are presumably caused by such factors as small temperature variations in the resistor, each time we press the ‘reset’ button, we will observe a different display on the oscilloscope. Owing to the complexity of the factors that influence the display, there is no way that we can use the laws of physics to predict what will appear on the oscilloscope. However, if we repeat this experiment over and over, we soon see that, although we view a different display each time we press the ‘reset’ button, the displays resemble each other: there is a characteristic ‘bumpiness’ shared by all the displays.
We can model this experiment by considering a large bowl in which we have placed pictures of all the oscilloscope displays that we could possibly observe. Pushing the ‘reset’ button corresponds to reaching into the bowl and choosing ‘at random’ one of the pictures.
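In the usual formalization (recorded here only as a reminder), the bowl corresponds to a sample space Ω, and a stochastic process is a family of random variables

$$\{X_t(\omega) : t \in T\},$$

where fixing $t$ gives a random variable $\omega \mapsto X_t(\omega)$ (the voltage at time $t$ across repetitions of the experiment), while fixing an outcome $\omega$ (one picture drawn from the bowl) gives a single realization $t \mapsto X_t(\omega)$.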
Chapter 3 was devoted to the modeling of the dynamics of neurons. The standard model we arrived at contains the main features that have been revealed by neuroelectrophysiology: it treats neural nets as networks of probabilistic threshold binary automata. Real neural networks, however, are not mere automata networks. They display specific functions, and the problem is to decide whether the standard model is able to show the same capabilities.
Memory is considered one of the most prominent properties of real neural nets. Everyday experience shows that imprecise, truncated information is often sufficient to trigger the retrieval of full patterns: we correct misspelled names, we associate images or flavors with sounds, and so on. It turns out that the formal nets display these memory properties if the synaptic efficacies are determined by the laws of classical conditioning described in Section 2.4. The synthesis in a single framework of observations from neurophysiology and from experimental psychology, to account for an emergent property of neuronal systems, is an achievement of the theory of neural networks.
The central idea behind the notion of conditioning is that of associativity. It has given rise to many theoretical developments, in particular to the building of simple models of associative memory called Hebbian models. The analysis of Hebbian models has been pushed rather far, and a number of analytical results relating to Hebbian networks are gathered in this chapter. More refined models are treated in the following chapters.
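A minimal sketch of a Hebbian associative memory of this kind is given below; the network size N, the number of stored patterns P, the ±1 coding and the synchronous zero-noise updates are illustrative choices, not necessarily those analyzed in this chapter.

```python
# Minimal Hebbian (Hopfield-type) associative memory with +/-1 units.
# N, P, the corruption level and the synchronous updates are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 10                           # neurons, stored patterns
xi = rng.choice([-1, 1], size=(P, N))    # random patterns xi^mu

# Hebbian efficacies: J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with J_ii = 0
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

# start from a corrupted version of pattern 0 (10% of bits flipped) ...
state = xi[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
state[flip] *= -1

# ... and let the zero-noise dynamics run
for _ in range(20):
    state = np.sign(J @ state)
    state[state == 0] = 1

overlap = (state @ xi[0]) / N            # close to 1.0 means pattern 0 retrieved
print(f"overlap with the stored pattern: {overlap:.2f}")
```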
Something essential is missing from the description of memory we have introduced in previous chapters. A neural network, even an isolated one, is a continuously evolving system that never settles indefinitely into a steady state. We are able to retrieve not only single patterns but also ordered strings of patterns. For example, a few notes are enough for an entire song to be recalled, or, after training, one is able to go through the complete set of movements needed to serve in tennis. Several schemes have been proposed to account for the production of memorized strings of patterns. Simulations show that they perform well, but this says nothing about the biological relevance of the mechanisms they involve. In actual fact, no observation supporting one or other of the schemes has been reported so far.
Parallel dynamics
Up to now the dynamics has been built so as to make the memorized patterns its fixed points. Once the network settles in one pattern it stays there indefinitely, at least for low noise levels. We have seen that fixed points are the asymptotic behavior of rather special neural networks, namely those which are symmetrically connected. In asymmetrically connected neural networks whose dynamics is deterministic and parallel (the Little dynamics at zero noise level), the existence of limit cycles is the rule. It is then tempting to imagine that the retrieval of temporal sequences of patterns occurs through limit cycles.
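One much-studied example of such a scheme (given here only as an illustration, in generic notation) adds an asymmetric term to the Hebbian couplings,

$$J_{ij} = \frac{1}{N}\sum_{\mu=1}^{P} \xi_i^{\mu}\xi_j^{\mu} \;+\; \frac{\lambda}{N}\sum_{\mu=1}^{P} \xi_i^{\mu+1}\xi_j^{\mu}, \qquad \xi^{P+1} \equiv \xi^{1},$$

so that once the network sits in pattern $\xi^{\mu}$ the second term pushes it toward $\xi^{\mu+1}$; under parallel deterministic dynamics, and for suitable $\lambda$, the patterns are then visited cyclically and the stored string becomes a limit cycle of the dynamics.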
The architectures of the neural networks we considered in Chapter 7 are made up exclusively of visible units. During the learning stage, the states of all neurons are entirely determined by the set of patterns to be memorized; they are, so to speak, pinned, and the relaxation dynamics plays no role in the evolution of the synaptic efficacies. How to deal with more general systems is not a simple problem. Endowing a neural network with hidden units amounts to adding many degrees of freedom to the system, which leaves room for ‘internal representations’ of the outside world. The building of learning algorithms that enable general neural networks to set up efficient internal representations is a challenge that has not yet been met in a fully satisfactory way. Pragmatic approaches have nevertheless been pursued, mainly using the so-called back-propagation algorithm. We owe the current excitement about neural networks to the surprising successes that have been obtained so far by calling upon this technique: in some cases neural networks seem to extract the unexpressed rules hidden in sets of raw data. But for the moment we really understand neither the reasons for this success nor those for the (generally unpublished) failures.
The back-propagation algorithm
A direct derivation
To solve the credit assignment problem is to devise means of building relevant internal representations; that is to say, to decide which state I^{µ,hid} of the hidden units is to be associated with a given pattern I^{µ,vis} of the visible units.
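A compact sketch of the back-propagation recipe for a single hidden layer is given below; the task (XOR), the layer sizes, the squared-error cost and the learning rate are all illustrative choices, and the variable names do not follow the book's notation.

```python
# Sketch of back-propagation for a one-hidden-layer network trained by
# gradient descent on a squared-error cost.  Task, sizes and learning rate
# are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # visible inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

n_in, n_hid, n_out = 2, 4, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hid))   # input -> hidden weights
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden -> output weights
b2 = np.zeros(n_out)
eta = 0.5                                        # learning rate

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for epoch in range(5000):
    # forward pass: the hidden states h play the role of the internal representation
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # backward pass: propagate the output error back through the layers
    delta_out = (y - Y) * y * (1.0 - y)              # error at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)   # error assigned to hidden units

    # gradient-descent weight updates
    W2 -= eta * (h.T @ delta_out)
    b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * (X.T @ delta_hid)
    b1 -= eta * delta_hid.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # should approach [0, 1, 1, 0]
```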