Continuous-time stochastic processes arise in many applications in economics, but perhaps nowhere do they play as large a role as in finance. Following the pathbreaking work of Merton (1969, 1973) and Black and Scholes (1973), the use of continuous-time stochastic processes has become a common feature of many applications, especially asset pricing models. Even a casual comparison of the textbooks of the seventies (e.g., Fama and Miller (1972), Fama (1976)) with the current crop (e.g., Ingersoll (1987), Duffie (1988)) serves to demonstrate the remarkable speed with which the tools of stochastic process theory have been assimilated into mainstream finance. This survey will look at the specification and estimation of continuous-time stochastic processes. Although much of the discussion is relevant for other applications, I have chosen to write it from the perspective of someone interested in evaluating the empirical content of current continuous-time asset pricing models and in contributing to their future development.
It is interesting to speculate on the reasons for the widespread adoption of continuous-time models in asset pricing. Although many come to mind, I would argue that they have been widely adopted not because of their empirical properties but in spite of them. The explosion and sophistication of theoretical research simply has not been matched by empirical work. Continuous-time asset pricing models typically involve restrictions linking the parameters of the price process to those of some underlying ‘forcing’ variables. In general equilibrium models, the forcing variables may be taste and technology. In option pricing models, they may be the term structure and/or the price of the underlying security. Tests of these models are invariably joint tests of ‘nuisance’ assumptions, including the specification of the forcing variable process.
This is the first in a two-part survey of recent developments in the rapidly growing literature on methods for solving and estimating dynamic structural models. Part I focusses on discrete decision processes (DDP), i.e., problems where the decision variable is restricted to a countable set. Most of this part deals with single-agent problems, though I do conclude by sketching a framework for inference of discrete dynamic games. Part II, by Ariel Pakes, considers extensions to mixed continuous-discrete decision processes (MCDP), i.e., processes where some of the components of the decision variable can take on a continuum of values. Part II provides more extensive coverage of recent developments in estimation of dynamic multiagent models. We have tried to make the parts relatively self-contained, so that readers can feel free to skip to the sections that interest them. Both parts of the survey are restricted to problems formulated in discrete time, and, as a result, the words 'discrete' and 'continuous' will refer to the control rather than the time variable.
I presume that readers are familiar with recent surveys of discrete decision processes by Eckstein and Wolpin (1989a) and Rust (1988a and 1994), and focus on issues that are not covered in these surveys. In keeping with the title, section 2 begins with some general comments on (my view of) the problems with the current literature on estimation of dynamic structural models. In section 3 I define the subclass of DDPs and in section 4 I discuss the identification problem. The identification problem can lead to potentially serious questions about the credibility of a structural approach to policy analysis.
This book covers topics in advanced econometrics that I have taught in graduate econometrics programs of the University of California at San Diego, Southern Methodist University, Dallas, the Netherlands Network of Quantitative Economics, Tinbergen Institute, and the Free University, Amsterdam. The selection of the topics is based on my personal interest in the subjects, as well as lack of availability of suitable textbooks in these areas.
Rather than providing an encyclopedic survey of the literature, I have chosen a presentation which fills the gap between intermediate statistics and econometrics (including linear time series analysis) and the level necessary to gain access to the recent econometric literature; in particular, the literature on nonlinear and nonparametric regression, and advanced time series analysis. The ultimate goal is to provide the student with tools for independent research in these areas. This book is particularly suitable for self-tuition, and may prove useful in a graduate course in mathematical statistics and advanced econometrics.
The first four chapters contain enough material to fill a half-semester graduate course in asymptotic theory and nonlinear inference if one skips some of the material involved, and a full semester course if not. In teaching such a half-semester course I usually skip the details of the proofs in chapter 2, and focus on the relations between the various modes of convergence only. Also, I usually skip the sections of chapter 2 and chapter 4 dealing with non-identically distributed samples, and only sketch the proof of the uniform law of large numbers for the i.i.d. case (theorem 2.5.7).
If a time series is modeled as an ARMA(p,q) process while the true data-generating process is an ARIMA(p−1,1,q) process, strange things may happen with the asymptotic distributions of parameter estimators. For example, if a time series process Yt is modeled as Yt = αYt−1 + Ut, with Ut Gaussian white noise and α assumed to be in the stable region (−1,1), while in reality the process is a random walk, i.e., ΔYt = Ut, then the OLS estimator αn of α (on the basis of a sample of size n) is n-consistent rather than √n-consistent, and the asymptotic distribution of n(αn − α) is non-normal. Therefore, in testing the hypothesis α = 1, standard asymptotic theory is no longer valid. See Fuller (1976), Dickey and Fuller (1979, 1981), Evans and Savin (1981, 1984), Said and Dickey (1984), Dickey, Hasza, and Fuller (1984), Phillips (1987), Phillips and Perron (1988), Hylleberg and Mizon (1989), and Haldrup and Hylleberg (1989), among others, for various unit root tests (all based on testing α = 1 in an AR model) and Schwert (1989) for a Monte Carlo analysis of the power of some of these tests. Moreover, see Diebold and Nerlove (1990) for a review of the unit root literature, and see Bierens (1993) and Bierens and Guo (1993) for alternative tests of the unit root hypothesis.
In this chapter we shall review and explain the most common unit root tests.
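The n-consistency of the OLS estimator under a unit root can be illustrated with a small Monte Carlo experiment. The sketch below (in Python; the sample sizes and number of replications are arbitrary choices, not taken from the text) checks that quadrupling n cuts the average estimation error |αn − 1| by a factor of roughly four rather than two:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_ar1(y):
    """OLS estimate of alpha in y_t = alpha * y_{t-1} + u_t (no intercept)."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def mc_alpha_error(n, reps=2000):
    """Average |alpha_hat - 1| over replications of a driftless random walk."""
    errs = []
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(n))  # random walk: Delta y_t = u_t
        errs.append(abs(ols_ar1(y) - 1.0))
    return np.mean(errs)

# Under the unit root, |alpha_hat - 1| shrinks at rate 1/n, not 1/sqrt(n):
# quadrupling n should divide the average error by about 4, not about 2.
e100, e400 = mc_alpha_error(100), mc_alpha_error(400)
print(e100, e400, e100 / e400)
```

Since the limiting distribution of n(αn − 1) is the (non-normal) Dickey–Fuller distribution, the ratio printed should be close to 4, confirming the faster-than-√n rate.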
The asymptotic theory of nonlinear regression models, in particular consistency results, heavily depends on uniform laws of large numbers. Understanding these laws requires knowledge of abstract probability theory. In this chapter we shall review the basic elements of this theory as needed in what follows, to make this book almost self-contained. For a more detailed treatment, see for example Billingsley (1979) and Parthasarathy (1977). However, we do assume the reader has a good knowledge of probability and statistics at an intermediate level, for example on the level of Hogg and Craig (1978). The material in this chapter is a revision and extension of section 2.1 in Bierens (1981).
Measure-theoretical foundation of probability theory
The basic concept of probability theory is the probability space. This is a triple {Ω,ℑ,P} consisting of:
— An abstract non-empty set Ω, called the sample space. We do not impose any conditions on this set.
— A non-empty collection ℑ of subsets of Ω, having the following two properties:
(a) if E ∈ ℑ then Ec ∈ ℑ, where Ec denotes the complement of the subset E with respect to Ω: Ec = Ω\E;
(b) if Ej ∈ ℑ for j = 1,2,3,..., then ∪j=1∞ Ej ∈ ℑ.
These two properties make ℑ, by definition, a Borel field of subsets of Ω. (Following Chung (1974), the term “Borel field” has the same meaning as the term “σ-algebra” used by other authors.)
— A probability measure P on {Ω,ℑ}. This is a real-valued set function on ℑ such that:
(a) P(E) ≥ 0 for all E ∈ ℑ;
(b) P(Ω) = 1;
(c) if E1, E2, E3, ... ∈ ℑ are disjoint, then P(∪j=1∞ Ej) = Σj=1∞ P(Ej).
Example: Toss a fair coin. Then Ω = {H,T}, ℑ = {∅, {H}, {T}, Ω}, and P(∅) = 0, P({H}) = P({T}) = ½, P(Ω) = 1.
In the literature on model specification testing two trends can be distinguished. One trend consists of tests using one or more well-specified non-nested alternative specifications. See Cox (1961, 1962), Atkinson (1969, 1970), Quandt (1974), Pereira (1977, 1978), Pesaran and Deaton (1978), Davidson and MacKinnon (1981), among others. The other trend consists of tests of the orthogonality condition, i.e. the condition that the conditional expectation of the error relative to the regressors equals zero a.s., without employing a well-specified alternative. Notable work on this problem has been done by Ramsey (1969, 1970), Hausman (1978), White (1981), Holly (1982), Bierens (1982, 1991a), Newey (1985), and Tauchen (1985), among others.
A pair of models is called non-nested if it is not possible to construct one model out of the other by fixing some parameters. The non-nested models considered in the literature usually have different vectors of regressors, for testing non-nested models with common regressors makes no sense. In the latter case one may simply choose the model with the minimum estimated error variance, and this choice will be consistent in the sense that the probability that we pick the wrong model converges to zero. A serious point overlooked by virtually all authors is that non-nested models with different sets of regressors may all be correct. This is obvious if the dependent variable and all the regressors involved are jointly normally distributed and the non-nested models are all linear, for conditional expectations on the basis of jointly normally distributed random variables are always linear functions of the conditioning variables.
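This point can be checked numerically: if (Y, X1, X2) is jointly normal, then E[Y|X1] and E[Y|X2] are both linear, so the two non-nested simple regressions are both correctly specified. A hedged sketch (the covariance matrix below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Jointly normal (Y, X1, X2) with an arbitrary positive definite covariance.
cov = np.array([[2.0, 0.8, 0.6],
                [0.8, 1.0, 0.3],
                [0.6, 0.3, 1.0]])
y, x1, x2 = rng.multivariate_normal(np.zeros(3), cov, size=200_000).T

def max_binned_resid_mean(y, x):
    """Fit y = a + b*x by OLS, then check that the residuals average to ~0
    within each decile bin of x, i.e. that E[y|x] is (approximately) linear."""
    b = np.cov(y, x)[0, 1] / np.var(x)
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    edges = np.quantile(x, np.linspace(0, 1, 11))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, 9)
    return max(abs(resid[idx == j].mean()) for j in range(10))

# Both non-nested models, Y on X1 and Y on X2, pass the linearity check:
print(max_binned_resid_mean(y, x1), max_binned_resid_mean(y, x2))
```

Both printed values are close to zero: each simple linear regression is a correct model for its own conditional expectation, even though neither nests the other.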
Time series models usually aim to represent, directly or indirectly, the conditional expectation of a time series variable relative to the entire past of the time series process involved. The reason is that this conditional expectation is the best forecasting scheme; best in the sense that it yields forecasts with minimal mean square forecast error. The concept of a conditional expectation relative to a one-sided infinite sequence of “past” variables cannot be made clear on the basis of the elementary notion of conditional expectation known from intermediate mathematical-statistical textbooks. Even the more general approach in chapter 3 is not suitable. What we need here is the concept of a conditional expectation relative to a Borel field. We shall discuss this concept and its consequences (in particular martingale theory) in section 6.1. In section 6.2 we consider various measures of dependence as some, though rather weak, conditions have to be imposed on the dependence of a time series process to prove weak (uniform) laws of large numbers. These weak laws are the topics of sections 6.3 and 6.4.
Throughout we assume that the reader is familiar with the basic elements of linear time series analysis, say on the level of Harvey's (1981) textbook.
Conditional expectations relative to a Borel field
Definition and basic properties
In section 3.1 we have defined the conditional expectation of a random variable Y relative to a random vector X ∈ R^k as a Borel measurable real function g on R^k such that
E{[Y − g(X)]ψ(X)} = 0
for all bounded Borel measurable real functions ψ on R^k.
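The defining orthogonality property can be verified by simulation for a case where the conditional expectation is known. In the sketch below (the data-generating process Y = X² + U and the test function ψ = sin are illustrative choices, not from the text), the true g(X) = X² satisfies the condition while a wrong candidate does not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.standard_normal(n)
y = x ** 2 + rng.standard_normal(n)   # so E[Y | X] = g(X) = X**2

def orthogonality(g, psi):
    """Sample analogue of E{[Y - g(X)] * psi(X)}."""
    return np.mean((y - g(x)) * psi(x))

true_g = orthogonality(lambda t: t ** 2, np.sin)   # ~ 0 for the true g
wrong_g = orthogonality(lambda t: t, np.sin)       # nonzero for a wrong g
print(true_g, wrong_g)
```

For the true g the sample moment is near zero because Y − g(X) = U is independent of X; for the wrong candidate it converges to E[(X² − X)sin X] = −e^(−1/2) ≈ −0.61, so the orthogonality condition detects the misspecification.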
In this chapter we consider various modes of convergence, i.e., weak and strong convergence of random variables, weak and strong laws of large numbers, convergence in distribution and central limit theorems, weak and strong uniform convergence of random functions, and uniform weak and strong laws. The material in this chapter is a revision and extension of sections 2.2–2.4 in Bierens (1981).
Weak and strong convergence of random variables
In this section we shall deal with the concepts of convergence in probability and almost sure convergence, and various laws of large numbers. Throughout we assume that the random variables involved are defined on a common probability space {Ω,ℑ,P}. The first concept is well known:
Definition 2.1.1 Let (Xn) be a sequence of r.v.'s. We say that Xn converges in probability to a r.v. X if for every ε > 0, limn→∞ P(|Xn − X| < ε) = 1, and we write: Xn → X in pr. or plimn→∞ Xn = X.
However, almost sure convergence is a much stronger convergence concept:
Definition 2.1.2 Let (Xn) be a sequence of r.v.'s. We say that Xn converges almost surely (a.s.) to a r.v. X if there is a null set N ∈ ℑ (that is, a set in ℑ satisfying P(N) = 0) such that for every ω ∈ Ω\N, limn→∞ Xn(ω) = X(ω), and we write: Xn → X a.s. or limn→∞ Xn = X a.s.
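Definition 2.1.1 can be illustrated by a weak law of large numbers at work: for i.i.d. draws, the sample mean converges in probability to the population mean, so the empirical frequency of |X̄n − μ| ≥ ε falls toward zero as n grows. A minimal sketch (the distribution, ε, and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def freq_outside(n, eps=0.1, reps=5000):
    """Empirical P(|mean of n Uniform(0,1) draws - 0.5| >= eps)."""
    means = rng.random((reps, n)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) >= eps)

# Convergence in probability: the exceedance frequency vanishes as n grows.
print([freq_outside(n) for n in (10, 100, 1000)])
```

The printed frequencies decrease toward zero, which is exactly the statement limn→∞ P(|X̄n − μ| < ε) = 1 of Definition 2.1.1.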
This chapter reviews the asymptotic properties of the Nadaraya-Watson type kernel estimator of an unknown (multivariate) regression function. Conditions are set forth for pointwise weak and strong consistency, asymptotic normality, and uniform consistency. These conditions cover the standard i.i.d. case with continuously distributed regressors, as well as the cases where the distribution of all, or some, regressors is discrete. Moreover, attention is paid to the problem of how the kernel and the window width should be specified. This chapter is a modified and extended version of Bierens (1987b). For further reading and references, see the monographs by Eubank (1988), Härdle (1990), and Rosenblatt (1991), and for an empirical application, see Bierens and Pott-Buter (1990).
Introduction
The usual practice in constructing regression models is to specify a parametric family for the response function. Obviously the most popular parametric family is the linear model. However, one could consider this as choosing a parametric functional form from a continuum of possible functional forms, analogously to sampling from a continuous distribution, for often the set of theoretically admissible functional forms is uncountably large. Therefore the probability that we pick the true functional form in this way is zero, or at least very close to zero.
The only way to avoid model misspecification is to specify no functional form at all. But then the problem arises how information about the functional form of the model can be derived from the data. A possible solution to this problem is to use so-called kernel estimators of regression functions.
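A minimal univariate Nadaraya-Watson estimator with a Gaussian kernel can be sketched as follows (the data-generating process and the crude rate-based bandwidth are ad hoc illustrative choices, not the kernel and window-width specifications discussed later in the chapter):

```python
import numpy as np

rng = np.random.default_rng(4)

def nadaraya_watson(x_grid, x, y, h):
    """Kernel regression estimate of E[Y | X = x0] at each point of x_grid:
        g_hat(x0) = sum_j K((x0 - x_j)/h) y_j / sum_j K((x0 - x_j)/h),
    with a Gaussian kernel K and bandwidth (window width) h."""
    u = (x_grid[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)        # Gaussian kernel (constants cancel out)
    return (w * y).sum(axis=1) / w.sum(axis=1)

# Simulated data from a nonlinear regression without a parametric model:
n = 2000
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(n)

h = n ** (-1 / 5)                    # ad hoc rate-based bandwidth choice
grid = np.linspace(-1.5, 1.5, 7)
print(np.max(np.abs(nadaraya_watson(grid, x, y, h) - np.sin(2 * grid))))
```

The estimator recovers the unknown response function sin(2x) on the interior of the support without any functional form being specified, which is the point of the kernel approach: the data, not the modeler, determine the shape of the regression function.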