The likelihood function plays a central role in both statistical theory and practice. Basic results about likelihood inference, which we call first order asymptotics, were developed in fundamental work by R. A. Fisher during the 1920s, and now form an essential and widely taught part of both elementary and advanced courses in statistics. It is less well known that Fisher later proposed a more refined approach, which has been developed over the past three decades into a theory of higher order asymptotics. While this theory leads to some extremely accurate methods for parametric inference, accounts of the theory can appear forbidding, and the results may be thought to have little importance for statistical practice.
The purpose of this book is to dispel this view, showing how higher order asymptotics may be applied in realistic examples with very little more effort than is needed for first order procedures, and to compare the resulting improved inferences with those from other approaches. To do this we have collected a range of examples and case studies, provided details on the implementation of higher order approximations, and compared the resulting inference to that based on other methods, usually first order likelihood theory, but where appropriate also methods based on simulation. Our examples are nearly all derived from regression models for discrete or continuous data, but range quite widely over the types of models and inference problems where likelihood methods are applied.
This book is about the statistical analysis of data, and in particular approximations based on the likelihood function. We emphasize procedures that have been developed using the theory of higher order asymptotic analysis and which provide more precise inferences than are provided by standard theory. Our goal is to illustrate their use in a range of applications that are close to many that arise in practice. We generally restrict attention to parametric models, although extensions of the key ideas to semi-parametric and non-parametric models exist in the literature and are briefly mentioned in contexts where they may be appropriate. Most of our examples consist of a set of independent observations, each of which consists of a univariate response and a number of explanatory variables.
Much application of likelihood inference relies on first order asymptotics, by which we mean the application of the central limit theorem to conclude that the statistics of interest are approximately normally distributed, with mean and variance consistently estimable from the data. There has, however, been great progress over the past twenty-five years or so in the theory of likelihood inference, and two main themes have emerged. The first is that very accurate approximations to the distributions of statistics such as the maximum likelihood estimator are relatively easily derived using techniques adapted from the theory of asymptotic expansions. The second is that even in situations where first order asymptotics is to be used, it is often helpful to use procedures suggested by these more accurate approximations, as they provide modifications to naive approaches that result in more precise inferences.
In the examples in later chapters we use parametric models almost exclusively. These models are used to incorporate a key element of statistical thinking: the explicit recognition of uncertainty. In frequentist settings imprecise knowledge about the value of a single parameter is typically expressed through a collection of confidence intervals, or equivalently by computation of the P-values associated with a set of hypotheses. If prior information is available then Bayes' theorem can be employed to perform posterior inference.
In almost every realistic setting, uncertainty is gauged using approximations, the most common of which rely on the application of the central limit theorem to quantities derived from the likelihood function. Not only does likelihood provide a powerful and very general framework for inference, but the resulting statements have many desirable properties.
In this chapter we provide a brief overview of the main approximations for likelihood inference. We present both first order and higher order approximations; first order approximations are derived from limiting distributions, and higher order approximations are derived from further analysis of the limiting process. A minimal amount of theory is given to structure the discussion of the examples in Chapters 3 to 7; more detailed discussion of asymptotic theory is given in Chapter 8.
Scalar parameter
In the simplest situation, observations y_1, …, y_n are treated as a realization of independent identically distributed random variables Y_1, …, Y_n whose probability density function f(y; θ) depends on an unknown scalar parameter θ.
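To fix ideas, and using standard notation rather than quoting the formulae developed later, the basic first order quantities in this setting are

\[
\ell(\theta)=\sum_{i=1}^{n}\log f(y_i;\theta), \qquad
\hat\theta=\arg\max_\theta \ell(\theta), \qquad
j(\theta)=-\frac{\partial^2\ell(\theta)}{\partial\theta^2},
\]

with the Wald pivot and the likelihood root

\[
t(\theta)=(\hat\theta-\theta)\,j(\hat\theta)^{1/2}, \qquad
r(\theta)=\operatorname{sign}(\hat\theta-\theta)\bigl[2\{\ell(\hat\theta)-\ell(\theta)\}\bigr]^{1/2},
\]

both of which are approximately standard normal to first order. Higher order inference for θ is based on a modified likelihood root of the general form

\[
r^*(\theta)=r(\theta)+\frac{1}{r(\theta)}\log\frac{q(\theta)}{r(\theta)},
\]

where the precise form of the correction term q(θ) depends on the class of model; r^*(θ) is standard normal to a higher order of accuracy.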
A superficially appealing way to implement higher order inference procedures would be to write general routines in a computer algebra system such as Maple or Mathematica, designed so that the user need provide the minimum input specific to his or her problem. One would use these routines to derive symbolic expressions for quantities such as r*, and then evaluate these expressions numerically, concealing the hideous details in the computer. Many models have special structure which this approach does not exploit, however, leading to burdensome derivations of intermediate quantities which then simplify enormously, and symbolic computation packages are generally ill-suited for numerical work on the scale needed for applied statistics. Thus although computer algebra systems can be powerful tools for research in higher order asymptotics, those currently available are unsuitable for passing on the fruits of that research to potential users. It is possible to interface separate packages for symbolic and numerical computation, but this halfway house is system-dependent and demands knowledge of advanced features of the packages.
A more practicable approach recognises that many classes of models can be treated in a modular way, so higher order quantities can be expressed using a few elementary building-blocks. In some cases these must be computed specifically for the problem at hand, but the rudimentary symbolic manipulation facilities of environments for numerical computing such as R can then be exploited. A technique that we call pivot profiling can then be used to obtain higher order quantities for the range of interest, by computing them over a fixed grid of values between which they are interpolated.
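As an illustration of these mechanics only, here is a minimal sketch in R for an exponential sample with rate θ; the data, the grid limits and the functions loglik, r, q and rstar are constructed purely for this toy example and are not taken from the hoa bundle. The modified likelihood root uses the form r + log(q/r)/r with q the Wald statistic, appropriate for a scalar exponential family whose canonical parameter is affine in θ.

## Minimal sketch of pivot profiling (illustrative only, not code from the hoa bundle):
## an exponential sample with rate theta, for which the modified likelihood root r*
## takes the standard scalar exponential family form r + log(q/r)/r.
set.seed(1)
y <- rexp(20, rate = 2)                  # illustrative data
n <- length(y); s <- sum(y)
theta.hat <- n / s                       # maximum likelihood estimate of the rate

loglik <- function(theta) n * log(theta) - theta * s
r <- function(theta)                     # likelihood root
  sign(theta.hat - theta) * sqrt(2 * (loglik(theta.hat) - loglik(theta)))
q <- function(theta)                     # Wald pivot; theta is affine in the canonical
  sqrt(n) * (theta.hat - theta) / theta.hat   # parameter, with sign matched to r
rstar <- function(theta) r(theta) + log(q(theta) / r(theta)) / r(theta)

## pivot profiling: evaluate r* over a fixed grid of theta values (avoiding the
## numerically unstable neighbourhood of theta.hat), interpolate with a spline,
## then invert at the normal quantiles to get an approximate 95% confidence interval
grid <- seq(0.3, 3, length.out = 50) * theta.hat
grid <- grid[abs(grid - theta.hat) > 0.05 * theta.hat]
pivot <- splinefun(grid, rstar(grid))
ci <- sort(c(uniroot(function(t) pivot(t) - qnorm(0.975), range(grid))$root,
             uniroot(function(t) pivot(t) - qnorm(0.025), range(grid))$root))
ci                                       # higher order 95% confidence interval for theta

The same steps, evaluation of the pivot over a fixed grid, spline interpolation, and inversion at the required quantiles, carry over unchanged when the model-specific ingredients of q are supplied by the routines discussed in later chapters.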
In this chapter we give a brief overview of the main theoretical results and approximations used in this book. These approximations are derived from the theory of higher order likelihood asymptotics. We present these fairly concisely, with few details on the derivations. There is a very large literature on theoretical aspects of higher order asymptotics, and the bibliographic notes give guidelines to the references we have found most helpful.
The building blocks for the likelihood approximations are some basic approximation techniques: Edgeworth and saddlepoint approximations to the density and distribution of the sample mean, Laplace approximation to integrals, and some approximations related to the chi-squared distribution. These techniques are summarized in Appendix A, and the reader wishing to have a feeling for the mathematics of the approximations in this chapter may find it helpful to read that first.
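To give a flavour of these building blocks, here are the simplest scalar versions, stated in standard form for orientation rather than quoted from Appendix A. The saddlepoint approximation to the density of the mean of n independent observations with cumulant generating function K is

\[
f_{\bar Y}(\bar y)\;\approx\;\left\{\frac{n}{2\pi K''(\hat u)}\right\}^{1/2}\exp\bigl[n\{K(\hat u)-\hat u\,\bar y\}\bigr],
\qquad K'(\hat u)=\bar y,
\]

and the Laplace approximation to an integral whose exponent has a well-separated minimum at \(\tilde\theta\) is

\[
\int e^{-n g(\theta)}\,d\theta\;\approx\;\left\{\frac{2\pi}{n\,g''(\tilde\theta)}\right\}^{1/2} e^{-n g(\tilde\theta)}.
\]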
We provide background and notation for likelihood, exponential family models and transformation models in Section 8.2 and describe the limiting distributions of the main likelihood statistics in Section 8.3. Approximations to densities, including the very important p* approximation, are described in Section 8.4. Tail area approximations for inference about a scalar parameter are developed in Sections 8.5 and 8.6. These tail area approximations are illustrated in most of the examples in the earlier chapters. Approximations for Bayesian posterior distribution and density functions are described in Section 8.7. Inference for vector parameters, using adjustments to the likelihood ratio statistic, is described in Section 8.8.
In Chapters 4 and 5, we presented a selection of case studies with the goals of emphasizing the application and of illustrating higher order approximations as an adjunct to inference. The selection of our case studies was informed by these twin goals – we used relatively small data sets, and our discussion was sometimes rather remote from the original application.
In this chapter we present more detailed analyses of data collected to address particular scientific problems, with emphasis on the modelling and the conclusions. While we use higher order methods as an adjunct to the analysis, the main focus is on the data analysis rather than the inference methods. These case studies are a subset of examples that have crossed our desks in collaborative work. In Chapter 7 we take the opposite approach, and illustrate a selection of inference problems that are amenable to higher order approximation.
Sections 6.2 and 6.3 present slightly non-standard analyses of binary data; in the first case a natural model leads to the complementary log–log link, and in the second we consider conditional assessment of the binary model, eliminating the parameters in the binary regression by conditioning on the sufficient statistic. In Section 6.4 we present detailed analysis of a published set of herbicide data, with particular emphasis on the nlreg package of the hoa bundle. This package provides an extensive set of diagnostics and plots that are a useful adjunct to first order analysis, as well as providing an implementation of higher order approximations.
In this chapter we illustrate the breadth of application of higher order asymptotics by presenting a variety of examples, most of which have appeared in the published literature. In contrast to the earlier chapters, the emphasis is on the methods for higher order approximation, with the data treated as mainly illustrative. Section 7.2 outlines a problem of calibration in normal linear regression, and the two succeeding sections outline higher order approximation for a variance components setting and for dependent data, respectively. Sections 7.5 and 7.6 concern a problem of gamma regression; we compare Bartlett correction to Skovgaard's multivariate adjustment to the likelihood ratio statistic, and indicate the use of Laplace approximation for Bayes inference. In Section 7.7 we consider whether it is worthwhile to apply higher order approximation to partial likelihood. The final section concerns use of a constructed exponential family to find the distribution of interest for a randomisation test.
Calibration
Table 7.1 shows measurements of the concentration of an enzyme in human blood plasma. The true concentration x is obtained using an accurate but expensive laboratory method, and the measured concentration y is obtained by a faster and less expensive automatic method. The goal is to use the observed data pairs to estimate values of the true concentration based on further measurements using the less expensive method. This is an example of a calibration problem: we have a model for E(y|x) that depends on some unknown parameters, and use a sample of pairs (x_1, y_1), …, (x_n, y_n) to estimate these parameters.
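In the simplest normal-linear version of the problem, sketched here generically rather than as the particular model fitted to the data of Table 7.1, the pairs are modelled as

\[
y_i=\beta_0+\beta_1 x_i+\sigma\varepsilon_i, \qquad \varepsilon_i\sim N(0,1),\quad i=1,\ldots,n,
\]

and a further measurement y_0 on an unknown true concentration x_0 is inverted through the fitted line to give the point estimate

\[
\hat x_0=\frac{y_0-\hat\beta_0}{\hat\beta_1}.
\]

Because this estimate is a ratio of estimated quantities, its distribution can be far from normal in small samples, which is one reason higher order methods are attractive in calibration problems.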
A telephone network consists of a network of exchanges (or routers, in more modern formulations). Many of these are themselves the centre of a local star network, in that they have direct connections to individual subscribers in the region. However, it is the exchanges that we shall regard as constituting the nodes of a network. The defining feature of a loss network is that a call is accepted only if a clear route to its destination is available, otherwise the call is lost.
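For orientation, recall the classical single-link special case (a standard result, not part of the discussion here): if a link of capacity C circuits is offered Poisson call traffic of intensity ν, each accepted call occupying one circuit for a holding time of unit mean, the stationary probability that an arriving call is lost is given by Erlang's formula

\[
E(\nu,C)=\frac{\nu^{C}/C!}{\sum_{k=0}^{C}\nu^{k}/k!}.
\]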
Random variation can enter the system in various ways. Even if all the parameters of operation and loading are constant and equilibrium has been reached, there will be statistical variation of the numbers and types of calls in progress. This is the source that receives most attention in at least that part of the literature favoured by mathematicians. One might consider it secondary in comparison with the more radical type of uncertainty which faces the system when there are massive variations in load or major internal failures. It is these that determine system structure, by setting a premium on versatility and robustness.
Nevertheless, there will always be times when a system is working close to capacity, and real-time decisions on acceptance and routing must be made to accommodate normal statistical variation. Such circumstances are associated with the decentralised realisation of policies, and will be considered in the next chapter.
We shall then begin by considering a system operating deterministically with a fixed demand pattern. The variables (e.g. numbers of calls in progress, capacity assigned to a given link) will be regarded as continuous in this treatment.
The size of this appendix may seem excessive, in view of the limited mention of random graphs in the main body of the text. However, given that a review as meticulous and wide-ranging as that by Albert and Barabási (2002) regards certain central questions as open for which a full analysis exists in the literature, it seems appropriate to give a brief and self-contained account of that analysis. It cannot be long concealed that much of the analysis referred to is that developed by the author over some 30 years. Detailed references are given in the literature survey of Section A3.8; the principal advances are treatment of the so-called first-shell model, leading to the integral representations (A3.13), (A3.31) and (A3.35) of various generating functions, in which a raft of conclusions is implicit.
The random graph model can be expressed either in terms of graphs or in terms of polymers if we identify nodes with atoms, arcs with bonds, components (connected subgraphs) with polymer molecules and node ‘colours’ with types of atom. We shall for preference use the term ‘unit’ rather than atom, since the unit does occasionally have a compound (but fixed) form. The situation in which a ‘giant’ component forms (i.e. in which all but an infinitesimal proportion of the nodes form a single component) is identified with the phenomenon of gelation in the polymer context.
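As a point of reference, in the classical Poisson random graph (not the first-shell model analysed in this appendix), where each pair of the n nodes is joined independently with probability c/n, a giant component appears once the mean degree c exceeds unity, and the proportion S of nodes it then contains solves

\[
S=1-e^{-cS}.
\]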
We come now to a most remarkable piece of work. This concerns the optimisation of load-bearing structures in engineering, such as frameworks consisting of freely jointed struts and ties – we give examples. The purpose of such structures is to communicate stress, a vector quantity, in an economic fashion from the points where external load is applied to the points where it can be off-loaded: the load-accepting foundation.
The problem of optimising such structures was considered in a paper, remarkable for its brevity and penetration as much as for its prescience, published as early as 1904 by the Australian engineer A. G. M. Michell. He derived the dual form of the problem and exhibited its essential role in determining the optimal design. This was at a time when there was no general understanding of the role or interpretation of the dual variable – Michell uses the concept of a ‘virtual displacement’, a term that we shall see as justified. He then went on to derive the continuous form of the dual, corresponding to the unrestricted optimisation of structures on the continuum. This opened the way to the study of structures made of material with directional properties (e.g. the use of resin-bonded fibre-glass matting in yacht hulls, the laying down of bone in such a way as to meet the particular pressures and tensions to which it is subjected). It also turned out to offer an understanding of materials that behave plastically rather than elastically under load – i.e. that yield catastrophically when load reaches a critical value.
The contents list gives a fair impression of the coverage attempted. Networks, both deterministic and stochastic, have emerged as objects of intense interest over recent decades. They occur in communication, traffic, computer, manufacturing and operational research contexts, and as models in almost any of the natural, economic and social sciences. Even engineering frame structures can be seen as networks that communicate stress from load to foundation.
We are concerned in this book with the characterisation of networks that are optimal for their purpose. This is a very natural ambition; so many structures in nature have been optimised by long adaptation, a suggestive mechanism in itself. It is an ambition with an inbuilt hurdle, however: one cannot consider optimisation of design without first considering optimisation of function, of the rules by which the network is to be operated. On the other hand, optimisation of function should find a more natural setting if it is coupled with the optimisation of design.
The mention of communication and computer networks raises examples of areas where theory barely keeps breathless pace with advancing technology. That is a degree of topicality we do not attempt, beyond setting up some basic links in the final chapters. It is well recognised that networks of commodity flow, electrical flow, traffic flow and even mechanical frame structures and biological bone structures have unifying features, and Part I is devoted to this important subclass of cases.
In Chapter 1 we consider a rather general model of commodity distribution for which flow is determined by an extremal principle.
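One common form of such an extremal principle, stated generically for illustration rather than as the particular model of Chapter 1, chooses non-negative arc flows f_{jk} to

\[
\text{minimise }\sum_{(j,k)}c_{jk}(f_{jk})
\quad\text{subject to}\quad
\sum_{k}f_{jk}-\sum_{k}f_{kj}=d_j \text{ for every node } j,
\]

where c_{jk} is the cost of carrying flow on the arc from j to k and d_j is the net amount of the commodity injected at node j; the design question then asks, in addition, which arcs should be provided and with what capacity.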
By ‘distributional networks’ we mean networks that carry the flow of some commodity or entity, using a routing rule that is intended to be effective and even optimal. The chapter titles give examples. Operational research abounds in such problems (see Ball et al., 1995a,b and Bertsekas, 1998), but the versions we consider are both easier and harder than these. In operational research problems one is usually optimising flow or operations upon a given network, whereas we aim also to optimise the network itself. Even the Bertsekas text, although entitled Network Optimization, is concerned with operation rather than design. The design problem is more difficult, if for no other reason than that it cannot be settled until the operational rules are clarified. On the other hand, there may be a simplification, in that the optimal network is not arbitrary, but has special properties in its class.
The ‘flow’ may not be a material one – see the Michell structures of Chapters 7–9, for which the entity that is communicated through the network is stress. The communication networks of Part IV are also distributional networks, but ones that have their own particular dynamic and stochastic structure.