Now that the basic notions of filtration, process, and stopping time are at our disposal, it is time to develop the stochastic integral ∫ X dZ, as per Itô's ideas explained on page 5. We shall call X the integrand and Z the integrator. Both are now processes.
For a guide, let us review the construction of the ordinary Lebesgue–Stieltjes integral ∫ x dz on the half-line; the stochastic integral ∫ X dZ that we are aiming for is but a straightforward generalization of it. The Lebesgue–Stieltjes integral is constructed in two steps. First, it is defined on step functions; for this elementary definition to be useful, restrictions must be placed on the integrator: z must be right-continuous and must have finite variation. This chapter discusses the stochastic analog of these restrictions, identifying the processes that have a chance of being useful stochastic integrators.
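To make the first step concrete, here is a minimal numerical sketch (the function and the example are illustrative, not taken from the text) of the elementary integral of a step function against a right-continuous integrator z: each interval of constancy contributes the value of the step function times the increment of z over that interval.

```python
# Minimal sketch of the elementary Lebesgue-Stieltjes integral of a step function.
# The step function takes the value values[i] on the interval (t_i, t_{i+1}],
# where breakpoints = [t_0, t_1, ..., t_n]; the integrator z is any
# right-continuous function of finite variation, passed here as a callable.

def elementary_integral(breakpoints, values, z):
    """Return sum_i values[i] * (z(t_{i+1}) - z(t_i))."""
    return sum(v * (z(breakpoints[i + 1]) - z(breakpoints[i]))
               for i, v in enumerate(values))

# Example: x = 2 on (0, 1], x = -1 on (1, 3], integrated against z(t) = t**2.
# The exact value is 2*(1 - 0) + (-1)*(9 - 1) = -6.
print(elementary_integral([0.0, 1.0, 3.0], [2.0, -1.0], lambda t: t * t))
```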
Given that a distribution function z on the line is right-continuous and has finite variation, the second step is one of a variety of procedures that extend the integral from step functions to a much larger class of integrands. The most efficient extension procedure is that of Daniell; it is also the only one that has a straightforward generalization to the stochastic case. This is discussed in chapter 3.
This book originated with several courses given at the University of Texas. The audience consisted of graduate students of mathematics, physics, electrical engineering, and finance. Most had met some stochastic analysis during work in their field; the course was meant to provide the mathematical underpinning. To satisfy the economists, driving processes other than the Wiener process had to be treated; to give the mathematicians a chance to connect with the literature and with discrete-time martingales, I chose to include driving terms with jumps. This, plus a predilection for generality for simplicity's sake, led directly to the most general stochastic Lebesgue–Stieltjes integral.
The spirit of the exposition is as follows: just as having finite variation and being right-continuous identifies the useful Lebesgue–Stieltjes distribution functions among all functions on the line, are there criteria for processes to be useful as “random distribution functions”? They turn out to be straightforward generalizations of those on the line. A process that meets these criteria is called an integrator, and its integration theory is just as easy as that of a deterministic distribution function on the line, provided Daniell's method is used. (This proviso has to do with the lack of convexity in some of the target spaces of the stochastic integral.)
For the purpose of error estimates in approximations, both to the stochastic integral and to solutions of stochastic differential equations, we define various numerical sizes of an integrator Z and analyze rather carefully how they propagate through the many operations done on and with Z, for instance, solving a stochastic differential equation driven by Z.
SECTION 1 offers some reasons why anyone who uses probability should know about the measure theoretic approach.
SECTION 2 describes some of the added complications, and some of the compensating benefits that come with the rigorous treatment of probabilities as measures.
SECTION 3 argues that there are advantages in approaching the study of probability theory via expectations, interpreted as linear functionals, as the basic concept.
SECTION 4 describes the de Finetti convention of identifying a set with its indicator function, and of using the same symbol for a probability measure and its corresponding expectation.
SECTION *5 presents a fair-price interpretation of probability, which emphasizes the linearity properties of expectations. The interpretation is sometimes a useful guide to intuition.
Why bother with measure theory?
Following the appearance of the little book by Kolmogorov (1933), which set forth a measure theoretic foundation for probability theory, it has been widely accepted that probabilities should be studied as special sorts of measures. (More or less true—see the Notes to the Chapter.) Anyone who wants to understand modern probability theory will have to learn something about measures and integrals, but it takes surprisingly little to get started.
For a rigorous treatment of probability, the measure theoretic approach is a vast improvement over the arguments usually presented in undergraduate courses. Let me remind you of some difficulties with the typical introduction to probability.
Independence
There are various elementary definitions of independence for random variables.
SECTION 3 defines the integral with respect to a measure as a linear functional on a cone of measurable functions. The definition sidesteps the details of the construction of integrals from measures.
SECTION *4 constructs integrals of nonnegative measurable functions with respect to a countably additive measure.
SECTION 5 establishes the Dominated Convergence theorem, the Swiss Army knife of measure theoretic probability.
SECTION 6 collects together a number of simple facts related to sets of measure zero.
SECTION *7 presents a few facts about spaces of functions with integrable pth powers, with emphasis on the case p=2, which defines a Hilbert space.
SECTION 8 defines uniform integrability, a condition slightly weaker than domination. Convergence in L¹ is characterized as convergence in probability plus uniform integrability.
SECTION 9 defines the image measure, which includes the concept of the distribution of a random variable as a special case.
SECTION 10 explains how generating class arguments, for classes of sets, make measure theory easy.
SECTION *11 extends generating class arguments to classes of functions.
Measures and sigma-fields
As promised in Chapter 1, we begin with measures as set functions, then work quickly towards the interpretation of integrals as linear functionals. Once we are past the purely set-theoretic preliminaries, I will start using the de Finetti notation (Section 1.4) in earnest, writing the same symbol for a set and its indicator function.
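As a toy illustration of this convention (my own example, not one from the text): identifying the event {X > 1} with its indicator function turns the probability P{X > 1} into the expectation of a 0/1-valued function, which is exactly what a Monte Carlo average estimates.

```python
# Toy illustration of the de Finetti convention: the event {X > 1} is identified
# with its indicator function, so P{X > 1} = E 1{X > 1}, and a Monte Carlo
# estimate is just the average of that indicator over simulated draws.
import random

def indicator_x_gt_1(x):
    # the set {x : x > 1}, viewed as a function taking values 0 and 1
    return 1.0 if x > 1.0 else 0.0

rng = random.Random(0)
n = 100_000
draws = [rng.gauss(0.0, 1.0) for _ in range(n)]            # X distributed N(0, 1)
estimate = sum(indicator_x_gt_1(x) for x in draws) / n     # expectation of the indicator
print(estimate)   # should be close to P{N(0,1) > 1}, about 0.1587
```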
SECTION 1 collects together some facts about stochastic processes and the normal distribution, for easier reference.
SECTION 2 defines Brownian motion as a Gaussian process indexed by a subinterval T of the real line. Existence of Brownian motions with and without continuous sample paths is discussed. Wiener measure is defined.
SECTION 3 constructs a Brownian motion with continuous sample paths, using an orthogonal series expansion of square integrable functions. (A numerical sketch of one such series construction appears after this list of section summaries.)
SECTION *4 describes some of the finer properties—lack of differentiability, and a modulus of continuity—for Brownian motion sample paths.
SECTION 5 establishes the strong Markov property for Brownian motion. Roughly speaking, the process starts afresh as a new Brownian motion after stopping times.
SECTION *6 describes a family of martingales that can be built from a Brownian motion, then establishes Lévy's martingale characterization of Brownian motion with continuous sample paths.
SECTION *7 shows how square integrable functions of the whole Brownian motion path can be represented as limits of weighted sums of increments. The result is a thinly disguised version of a remarkable property of the isometric stochastic integral, which is mentioned briefly.
SECTION *8 explains how the result from Section 7 is the key to the determination of option prices in a popular model for changes in stock prices.
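For readers who want to experiment, here is a short numerical sketch of an orthogonal series construction of Brownian motion on [0, 1], as mentioned in the summary of Section 3. It uses the Karhunen-Loève sine expansion, which need not be the expansion chosen in the text; the truncation level and the grid are arbitrary.

```python
# Sketch (not necessarily the book's construction): simulate an approximate
# Brownian motion on [0, 1] by truncating the Karhunen-Loeve sine series
#     B_t ≈ sqrt(2) * sum_k Z_k * sin((k - 1/2) * pi * t) / ((k - 1/2) * pi)
# with independent standard normal coefficients Z_k.
import math
import random

def brownian_path(num_terms=500, grid_size=200, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(num_terms)]     # series coefficients
    times = [i / grid_size for i in range(grid_size + 1)]
    path = []
    for t in times:
        s = 0.0
        for k in range(1, num_terms + 1):
            freq = (k - 0.5) * math.pi
            s += z[k - 1] * math.sin(freq * t) / freq
        path.append(math.sqrt(2.0) * s)
    return times, path

times, path = brownian_path()
print(path[0], path[-1])   # the path starts at 0; B_1 is approximately N(0, 1)
```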
Prerequisites
Broadly speaking, Brownian motion is to stochastic process theory as the normal distribution is to the theory for real random variables.
SECTION 1 presents a few of the basic properties of Fourier transforms that make them such a valuable tool of probability theory.
SECTION 2 exploits a mysterious coincidence, involving the Fourier transform and the density function of the normal distribution, to establish inversion formulas for recovering distributions from Fourier transforms. (A small numerical check of the inversion formula appears after this list of section summaries.)
SECTION *3 explains why the coincidence from Section 2 is not really so mysterious.
SECTION 4 shows that the inversion formula from Section 2 has a continuity property, which explains why pointwise convergence of Fourier transforms implies convergence in distribution.
SECTION *5 establishes a central limit theorem for triangular arrays of martingale differences.
SECTION 6 extends the theory to multivariate distributions, pointing out how the calculations reduce to one-dimensional analogs for linear combinations of coordinate variables—the Cramér and Wold device.
SECTION *7 provides a direct proof (no Fourier theory) of the fact that the family of (one-dimensional) distributions for all linear combinations of a random vector uniquely determines its multivariate distribution.
SECTION *8 illustrates the use of complex-variable methods to prove a remarkable property of the normal distribution—the Lévy-Cramér theorem.
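The following small numerical check (mine, not an argument from the chapter) illustrates the inversion formula mentioned for Section 2 in the simplest case: the characteristic function of the N(0, 1) distribution is exp(-t²/2), and integrating it back against cos(tx)/(2π) recovers the normal density.

```python
# Numerical check (not from the text) of Fourier inversion for the standard
# normal: the characteristic function is psi(t) = exp(-t**2 / 2), and
#     f(x) = (1 / (2*pi)) * integral of cos(t*x) * psi(t) dt
# should equal the N(0, 1) density (1 / sqrt(2*pi)) * exp(-x**2 / 2).
import math

def inverted_density(x, t_max=12.0, steps=4000):
    # trapezoidal rule on [-t_max, t_max]; psi decays so fast that the tails are negligible
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.cos(t * x) * math.exp(-t * t / 2.0)
    return total * h / (2.0 * math.pi)

for x in (0.0, 1.0, 2.0):
    exact = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    print(x, inverted_density(x), exact)   # the two computed columns should agree closely
```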
Definitions and basic properties
Some probabilistic calculations simplify when reexpressed in terms of suitable transformations, such as the probability generating function (especially for random variables taking only positive integer values), the Laplace transform (especially for random variables taking only nonnegative values), or the moment generating function (for random variables with rapidly decreasing tail probabilities).
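As a toy example of why such transforms help (my own illustration, not taken from the text): the probability generating function E s^X of a sum of independent variables is the product of the individual generating functions. For independent Poisson(2) and Poisson(3) variables the product is exp(5(s - 1)), the generating function of a Poisson(5), and the sketch below checks this by simulation.

```python
# Toy check that probability generating functions turn sums of independent
# variables into products: X ~ Poisson(2) and Y ~ Poisson(3) independent, so
# X + Y ~ Poisson(5) and E[s**(X + Y)] = exp(5 * (s - 1)).
import math
import random

def poisson(rate, rng):
    # count the arrivals of a rate-`rate` Poisson process before time 1
    count, total = 0, 0.0
    while True:
        total += rng.expovariate(rate)
        if total > 1.0:
            return count
        count += 1

rng = random.Random(1)
s, n = 0.7, 100_000
monte_carlo = sum(s ** (poisson(2.0, rng) + poisson(3.0, rng)) for _ in range(n)) / n
print(monte_carlo, math.exp(5.0 * (s - 1.0)))   # simulated vs. exact pgf of Poisson(5)
```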
This book began life as a set of handwritten notes, distributed to students in my one-semester graduate course on probability theory, a course that had humble aims: to help the students understand results such as the strong law of large numbers, the central limit theorem, conditioning, and some martingale theory. Along the way they could expect to learn a little measure theory and maybe even a smattering of functional analysis, but not as much as they would learn from a course on Measure Theory or Functional Analysis.
In recent years the audience has consisted mainly of graduate students in statistics and economics, most of whom have not studied measure theory. Most of them have no intention of studying measure theory systematically, or of becoming professional probabilists, but they do want to learn some rigorous probability theory—in one semester.
Faced with the reality of an audience that might have neither the time nor the inclination to devote itself completely to my favorite subject, I sought to compress the essentials into a course as self-contained as I could make it. I tried to pack into the first few weeks of the semester a crash course in measure theory, with supplementary exercises and a whirlwind exposition (Appendix A) for the enthusiasts. I tried to eliminate duplication of mathematical effort if it served no useful role. After many years of chopping and compressing, the material that I most wanted to cover all fit into a one-semester course, divided into 25 lectures, each lasting from 60 to 75 minutes.
SECTION 1 illustrates the usefulness of coupling, by means of three simple examples.
SECTION 2 describes how sequences of random elements of separable metric spaces that converge in distribution can be represented by sequences that converge almost surely.
SECTION *3 establishes Strassen's Theorem, which translates the Prohorov distance between two probability measures into a coupling.
SECTION *4 establishes Yurinskii's coupling for sums of independent random vectors to normally distributed random vectors.
SECTION 5 describes a deceptively simple example (Tusnády's Lemma) of a quantile coupling, between a symmetric Binomial distribution and its corresponding normal approximation. (A generic quantile-coupling sketch appears after this list of section summaries.)
SECTION 6 uses the Tusnády Lemma to couple the Haar coefficients for the expansions of an empirical process and a generalized Brownian Bridge.
SECTION 7 derives one of the most striking results of modern probability theory, the KMT coupling of the uniform empirical process with the Brownian Bridge process.
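To fix ideas before the detailed chapter, here is a generic quantile-coupling sketch (an illustration of the general device only; Tusnády's Lemma is a much sharper statement about how close the coupled variables are). A Bin(n, 1/2) variable and its normal approximation are driven by the same uniform through their quantile functions, so each has the correct marginal distribution while the pair stays close.

```python
# Generic quantile-coupling sketch (not Tusnady's Lemma itself): drive a
# Bin(n, 1/2) variable and its N(n/2, n/4) approximation by the same uniform U,
# via their quantile functions, so the two variables are large or small together.
import math
import random

def binomial_quantile(u, n):
    # smallest x with P{Bin(n, 1/2) <= x} >= u, by summing the probability mass function
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += math.comb(n, x) * 0.5 ** n
        if cumulative >= u:
            return x
    return n

def normal_quantile(u, mean, sd):
    # crude bisection inverse of the normal cdf; accurate enough for a sketch
    lo, hi = mean - 10 * sd, mean + 10 * sd
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf((mid - mean) / (sd * math.sqrt(2.0)))) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(2)
n = 100
u = rng.random()
x = binomial_quantile(u, n)                                # distributed Bin(100, 1/2)
y = normal_quantile(u, n / 2.0, math.sqrt(n) / 2.0)        # distributed N(50, 25)
print(u, x, y)
```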
What is coupling?
A coupling of two probability measures, P and Q, consists of a probability space (Ω, ℱ, ℙ) supporting two random elements X and Y, such that X has distribution P and Y has distribution Q. Sometimes interesting relationships between P and Q can be coded in some simple way into the joint distribution for X and Y. Three examples should make the concept clearer.
Example. Let Pα denote the Bin(n,α) distribution. As α gets larger, the distribution should “concentrate on bigger values.” More precisely, for each fixed x, the tail probability Pα[x, n] should be an increasing function of α. A coupling argument will give an easy proof.
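A minimal sketch of the kind of coupling that does the job (the standard monotone construction; the text's own argument may differ in detail): realize the Bin(n, α) counts for every α from the same uniforms, so that the count is pointwise nondecreasing in α and the monotonicity of the tail probabilities follows at once.

```python
# Monotone coupling of Bin(n, alpha) across alpha: with fixed uniforms
# U_1, ..., U_n, the count #{i : U_i <= alpha} has the Bin(n, alpha)
# distribution, and it can only increase as alpha increases, so
# P{Bin(n, alpha) >= x} is increasing in alpha for each fixed x.
import random

rng = random.Random(3)
n = 20
uniforms = [rng.random() for _ in range(n)]

def coupled_binomial(alpha):
    return sum(1 for u in uniforms if u <= alpha)

for alpha in (0.2, 0.4, 0.6, 0.8):
    print(alpha, coupled_binomial(alpha))   # counts are nondecreasing in alpha
```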