The seminal works are those of Gittins (1979, 1989), which presented the key insight and unlocked the multi-armed bandit problem, and Klimov (1974, 1978), which developed a frontal attack on the tax problem. Gittins had in fact perceived the essential step to the solution by 1968, and described it at the European Meeting of Statisticians in 1972; see Gittins and Jones (1974). By this time Olivier (1972) had independently hit upon much the same approach. A direct dynamic programming proof of the optimality of the Gittins index policy was given in Whittle (1980), and then generalised to the cases of an open process and of the tax problem in Whittle (1981b, 2005).
For present purposes, we shall not follow the historical sequence of steps that suggested the Gittins index, but shall simply give the dynamic programming arguments that confirm optimality of the index policy and evaluate its performance. We consider only the undiscounted case, which shows particular simplifications and covers the needs of the text.
As mentioned in the text, there are at least two levels of aspiration to optimality in the undiscounted case. One is simply to minimise the average cost, and the second is to minimise also the transient cost: the extra cost incurred in passage from a given initial state to the equilibrium regime. As it turns out, the index policy is optimal at both levels.
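To make the two levels concrete, a schematic decomposition may help; it is standard in average-cost dynamic programming and is an illustrative addition rather than a quotation from the text. For a stationary policy π under suitable ergodicity assumptions, the expected cost over a horizon of length T from initial state x behaves as

$$ V_\pi(x, T) = \gamma_\pi T + v_\pi(x) + o(1) \qquad (T \to \infty), $$

where γ_π is the average cost of the policy and the bias term v_π(x) captures the transient cost of passage from x to the equilibrium regime. Optimality at the first level means minimising γ_π; optimality at the second means that, among policies attaining the minimal average cost, the policy also minimises v_π(x).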
Bandit processes
Consider a Markov decision process with the same dynamics over states as that supposed for items over nodes in Chapter 14.
Not only in research, but also in the everyday world of politics and economics, we would all be better off if more people realized that simple nonlinear systems do not necessarily possess simple dynamical properties.
Robert M. May
There is nothing more to say – except why. But since why is difficult to handle, one must take refuge in how.
Toni Morrison
Introduction
There is a rich literature on discrete time models in many disciplines – including economics – in which dynamic processes are described formally by first-order difference equations (see (2.1)). Studies of dynamic properties of such equations usually involve an appropriate definition of a steady state (viewed as a dynamic equilibrium) and conditions that guarantee its existence and local or global stability. Also of importance, particularly in economics following the lead of Samuelson (1947), have been the problems of comparative statics and dynamics: a systematic analysis of how the steady states or trajectories respond to changes in some parameter that affects the law of motion. While the dynamic properties of linear systems (see (4.1)) have long been well understood, relatively recent studies have emphasized that “the very simplest” nonlinear difference equations can exhibit “a wide spectrum of qualitative behavior,” from stable steady states, “through cascades of stable cycles, to a regime in which the behavior (although fully deterministic) is in many respects chaotic or indistinguishable from the sample functions of a random process” (May 1976, p. 459).
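The canonical illustration of May's observation is the logistic map x_{t+1} = r x_t(1 − x_t). The sketch below (the map, the parameter values, and the code itself are illustrative additions, not drawn from the text) iterates this first-order difference equation past its transient and prints the long-run behaviour for several values of r, passing from a stable steady state through stable cycles to apparently chaotic motion.

```python
# Sketch: long-run behaviour of the logistic map x_{t+1} = r x_t (1 - x_t),
# a "very simple" nonlinear first-order difference equation.
# The parameter values below are illustrative choices.

def logistic_tail(r, x0=0.2, burn_in=1000, keep=8):
    """Iterate the map, discard a transient of length burn_in,
    and return the next `keep` values."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    tail = []
    for _ in range(keep):
        x = r * x * (1 - x)
        tail.append(round(x, 4))
    return tail

for r in (2.8, 3.2, 3.5, 3.9):
    print(f"r = {r}: {logistic_tail(r)}")

# r = 2.8: a stable steady state (every tail value equals 1 - 1/r)
# r = 3.2: a stable 2-cycle; r = 3.5: a stable 4-cycle
# r = 3.9: no visible periodicity -- behaviour indistinguishable,
#          in many respects, from a random sample path
```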
One way which I believe is particularly fruitful and promising is to study what would become of the solution of a deterministic dynamic system if it were exposed to a stream of erratic shocks that constantly upsets the evolution.
Ragnar Frisch
Introduction
A random dynamical system is described by a triplet (S, Γ, Q) where S is the state space, Γ an appropriate family of maps from S into itself (interpreted as the set of all admissible laws of motion), and Q is a probability distribution on (some sigma-field of) Γ. The evolution of the system is depicted informally as follows: initially, the system is in some state x in S; an element α₁ of Γ is chosen by Tyche according to the distribution Q, and the system moves to the state X₁ = α₁(x) in period 1. Again, independently of α₁, Tyche chooses α₂ from Γ according to the same Q, and the state of the system in period 2 is obtained as X₂ = α₂(X₁), and the story is repeated. The initial state x can also be a random variable X₀ chosen independently of the maps αₙ. The sequence (Xₙ) so generated is a Markov process. It is an interesting and significant mathematical result that every Markov process on a standard state space may be represented as a random dynamical system. Apart from this formal proposition (see Complements and Details, Proposition C1.1), many important Markov models in applications arise, and are effectively analyzed, as random dynamical systems.
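A short simulation may help fix ideas. In the sketch below, S is the real line, Γ consists of two affine contractions, and Q is the uniform distribution on Γ; these concrete choices, and the code itself, are illustrative assumptions rather than constructions from the text.

```python
import random

# Sketch of a random dynamical system (S, Gamma, Q):
# S = the real line, Gamma = two affine maps, Q = uniform on Gamma.
GAMMA = [
    lambda x: 0.5 * x,        # a contraction toward 0
    lambda x: 0.5 * x + 1.0,  # a contraction toward 2
]

rng = random.Random(0)

def simulate(x0, n_steps):
    """Draw alpha_1, alpha_2, ... i.i.d. from Q and set X_n = alpha_n(X_{n-1})."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        alpha = rng.choice(GAMMA)  # Tyche's choice, distributed as Q
        x = alpha(x)               # X_n = alpha_n(X_{n-1})
        path.append(x)
    return path

print([round(x, 3) for x in simulate(x0=5.0, n_steps=10)])
# The sequence (X_n) is a Markov process; for these two maps the
# long-run distribution is supported on the interval [0, 2].
```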
The scope of this book is limited to the study of discrete time dynamic processes evolving over an infinite horizon. Its primary focus is on models with a one-period lag: “tomorrow” is determined by “today” through an exogenously given rule that is itself stationary or time-independent. A finite lag of arbitrary length may sometimes be incorporated in this scheme. In the deterministic case, the models belong to the broad mathematical class, known as dynamical systems, discussed in Chapter 1, with particular emphasis on those arising in economics. In the presence of random perturbations, the processes are random dynamical systems whose long-term stability is our main quest. These occupy a central place in the theory of discrete time stochastic processes.
Aside from the appearance of many examples from economics, there is a significant distinction between the presentation in this book and that found in standard texts on Markov processes. Following the exposition in Chapter 2 of the basic theory of irreducible processes, especially Markov chains, much of Chapters 3–5 deals with the problem of stability of random dynamical systems which may not, in general, be irreducible. The latter models arise, for example, if the random perturbation is limited to a finite or countable number of choices. Quite a bit of this theory is of relatively recent origin and appears especially relevant to economics because of underlying structures of monotonicity or contraction. But it is useful in other contexts as well.
In view of our restriction to discrete time frameworks, we have not touched upon powerful techniques involving deterministic and stochastic differential equations or calculus of variations that have led to significant advances in many disciplines, including economics and finance.
The basic need for a special theory to explain behavior under conditions of uncertainty arises from two considerations: (1) subjective feelings of imperfect knowledge when certain types of choices, typically commitments over time, are made; (2) the existence of certain observed phenomena, of which insurance is the most conspicuous example, which cannot be explained on the assumption that individuals act with subjective certainty.
Kenneth J. Arrow
Introduction
In this chapter we briefly review some results on discounted dynamic programming under uncertainty, and indicate how Markov processes and random dynamical systems are generated by optimal policies. In Section 6.2, following a precise description of the dynamic programming framework, we turn to the special case where the set S of states is countable, and the set A of actions is finite. Here the link between optimal policies and the celebrated functional equation can be established with no measure theoretic complications. In Section 6.3 we study the maximum theorem, which is of independent interest in optimization theory and is a key to the understanding of the basic result in the next section. In Section 6.4 we explore the more general model where S is a Borel subset of a Polish space, and A is a compact action space, and spell out the conditions under which there is a stationary optimal policy.
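As a concrete sketch of the countable (here finite) state, finite action case of Section 6.2, the code below solves the functional equation V(s) = max_a [r(s, a) + δ Σ_{s′} q(s′ | s, a) V(s′)] by successive approximation and reads off a stationary optimal policy. The reward and transition arrays are randomly generated placeholders, assumed purely for illustration.

```python
import numpy as np

# Sketch: value iteration for a discounted dynamic programming problem
# with finitely many states and actions. r(s, a) and q(s' | s, a) are
# illustrative placeholders, not data from the text.
rng = np.random.default_rng(0)
n_states, n_actions, delta = 4, 2, 0.9
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(s, a)
q = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions q(.|s, a)

V = np.zeros(n_states)
for _ in range(1000):
    Q_sa = r + delta * (q @ V)    # r(s, a) + delta * E[V(next state)]
    V_new = Q_sa.max(axis=1)      # one application of the Bellman operator
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q_sa.argmax(axis=1)      # a stationary optimal policy
print("V* =", np.round(V, 4), " actions =", policy)
```

Because the operator on the right-hand side of the functional equation is a contraction of modulus δ on bounded functions, the iteration converges geometrically to the unique fixed point, and any selection of maximising actions yields a stationary optimal policy.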
The dynamic programming technique reviewed here has been particularly powerful in attacking a variety of problems in intertemporal economics. In Section 6.5 we discuss in detail the aggregative model of optimal economic growth under uncertainty.
‘I checked it very thoroughly,’ said the computer, ‘and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is.’
Douglas Adams, The Hitchhiker's Guide to the Galaxy
Entities should not be multiplied unnecessarily.
Occam's Razor
Introduction
Engineers continually face the necessity to analyse data and records from the past. These might consist of measurements of material strength, flood records, electronic signals, wave heights in a certain location, or numbers per unit area of icebergs. The intention is to use these records of the past to assess our uncertainty regarding future events. The overall aim is, as always, to make decisions for design at an acceptable level of risk. Apart from good analysis, data provide the most significant means of decreasing uncertainty and improving our probabilistic estimates. Data are obtained from experiments. These always represent an available strategy, although money and effort are required to carry them out. It is the author's experience that in cases where there is considerable uncertainty regarding a physical process, one real experiment is worth a thousand theories.
In the use of probabilistic models, two aspects present themselves: the choice of the model itself, and the estimation of model parameters. We showed in Chapter 8 how to use data to estimate parameters of known distributions. The framework described in Section 8.1.1 assumed knowledge of a distribution or process. In the present chapter, we describe the use of data to fit and compare distributions.
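As a minimal sketch of this programme (the synthetic sample, the two candidate families, and the use of scipy are illustrative assumptions, not the author's prescriptions), the code below fits two distributions to a data record by maximum likelihood and compares them by log-likelihood and a Kolmogorov–Smirnov statistic.

```python
import numpy as np
from scipy import stats

# Sketch: fit and compare candidate distributions for a data record.
# The synthetic "wave height" sample below stands in for real data.
rng = np.random.default_rng(1)
data = rng.gumbel(loc=4.0, scale=1.2, size=200)

candidates = {"normal": stats.norm, "gumbel": stats.gumbel_r}
for name, dist in candidates.items():
    params = dist.fit(data)                          # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(data, *params))      # larger is better
    ks = stats.kstest(data, dist.cdf, args=params)   # goodness-of-fit check
    print(f"{name:7s}  log-likelihood = {loglik:8.2f}   KS p-value = {ks.pvalue:.3f}")

# The Gumbel family should fit this sample better: a larger
# log-likelihood and a KS test that does not reject it.
```

Note that a Kolmogorov–Smirnov p-value computed with parameters estimated from the same data is optimistic; it is used here only as a rough comparative check.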