Suppose one has a set of data that arises from a specific distribution with unknown parameter vector. A natural question to ask is the following: what value of this vector is most likely to have generated these data? The answer to this question is provided by the maximum-likelihood estimator (MLE). Likelihood and related functions are the subject of this chapter. It will turn out that we have already seen some examples of MLEs in the previous chapters. Here, we define likelihood, the score vector, the Hessian matrix, the information-matrix equivalence, parameter identification, the Cramér–Rao lower bound and its extensions, profile (concentrated) likelihood and its adjustments, as well as the properties of MLEs (including conditions for existence, consistency, and asymptotic normality) and the score (including martingale representation and local sufficiency). Applications are given, including some for the normal linear model.
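As a minimal sketch of the question posed above, the following Python snippet assumes an i.i.d. normal sample (an illustrative choice, not taken from the chapter) and uses the familiar closed-form MLEs for that case: the sample mean, and the variance with divisor n. The function names and the simulated data are hypothetical.

```python
import math
import random


def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum_sq / (2 * sigma2)


def normal_mle(data):
    """Closed-form MLEs for the normal model: mean, and variance with divisor n."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


# Simulated data for illustration only (assumed true values: mu = 2, sigma = 1.5).
random.seed(0)
sample = [random.gauss(2.0, 1.5) for _ in range(500)]
mu_hat, sigma2_hat = normal_mle(sample)
```

Evaluating `normal_log_likelihood` at `(mu_hat, sigma2_hat)` and at nearby parameter values shows that the log-likelihood does not increase when moving away from the MLE.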
Abadir and Magnus (2002, Econometric Theory) proposed a standard for notation in econometrics. The consistent use of the proposed notation in our volumes shows that it is in fact practical. The notational conventions described here mainly apply to the material covered in this volume. Further notation will be introduced, as needed, as the Series develops.
Methods of point estimation other than ML have proliferated, for three main reasons. First, MLEs may not have an explicit formula and may be computationally more demanding than alternatives. Second, MLEs typically require the specification of a distribution. Third, optimization of criteria other than the likelihood may have some justification. The first argument has become less relevant with the advent of fast computers, and the alternative estimators based on it usually entail a loss of optimality properties. The second can be countered to some extent with large-sample invariance arguments or with the nonparametric MLE and empirical likelihood seen earlier. However, the third reason can be more fundamental. This chapter presents a selection of four common methods of point estimation, addressing the reasons outlined earlier, to varying degrees: method of moments, least squares, nonparametric (density and regression), and Bayesian estimation methods. In addition to these reasons for alternative estimators, point estimation itself may not be the most informative way to summarize what the data indicate about the parameters. Therefore, the chapter also introduces interval estimation and its multivariate generalization, a topic that leads quite naturally to the subject matter of Chapter 14.
This chapter concerns the measurement of the dependence between variates, by exploiting the additional information contained in joint (rather than just marginal) distribution and density functions. For this multivariate context, we also generalize the third description of randomness seen earlier, i.e., moments and their generating functions. Joint moments and their generating functions are introduced, along with covariances, variance matrices, the Cauchy–Schwarz inequality, and joint c.f.s and their inversion into joint densities. We show how the law of iterated expectations makes use of conditioning when taking expectations with respect to more than one variate. We measure dependence via conditional densities, distributions, moments, and cumulants.
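The law of iterated expectations and the covariance can be checked by hand on a small discrete example. The sketch below uses a hypothetical joint p.m.f. for two variates (the numbers are made up for illustration) and exact rational arithmetic, so that E[Y] and E[E[Y | X]] can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint p.m.f. of (X, Y); the four probabilities sum to 1.
joint = {
    (0, 1): F(1, 8), (0, 2): F(3, 8),
    (1, 1): F(1, 4), (1, 2): F(1, 4),
}


def marginal_x(joint):
    """Marginal p.m.f. of X, obtained by summing the joint p.m.f. over y."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px


def conditional_mean_y(joint, x):
    """E[Y | X = x], computed from the conditional p.m.f. of Y given X = x."""
    px = marginal_x(joint)[x]
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px


# E[Y] computed directly from the joint p.m.f.
mean_y = sum(y * p for (_, y), p in joint.items())

# E[E[Y | X]]: average the conditional means over the marginal of X.
iterated = sum(conditional_mean_y(joint, x) * p
               for x, p in marginal_x(joint).items())

# Covariance via cov(X, Y) = E[XY] - E[X] E[Y].
mean_x = sum(x * p for (x, _), p in joint.items())
mean_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = mean_xy - mean_x * mean_y
```

Here both routes give E[Y] = 13/8, and the covariance is -1/16, so the two variates in this toy example are (mildly) negatively dependent.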
We introduce elementary concepts of sets, probability, and events. We then study and illustrate the basic properties of probability. We use probability to characterize independent events and mutually exclusive events. We study conditioning and Bayes' law. We also introduce essential functions required to calculate probabilities, including the factorial, gamma, and beta functions. We then apply them to calculating combinations and permutations.
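For a concrete feel of how these functions fit together, here is a small Python sketch using only the standard `math` module. The helper names are hypothetical; the identities used are the standard ones: Γ(n + 1) = n! for nonnegative integers n, and B(a, b) = Γ(a)Γ(b)/Γ(a + b).

```python
import math


def permutations(n, k):
    """Number of ordered selections of k items from n: n! / (n - k)!."""
    return math.factorial(n) // math.factorial(n - k)


def combinations(n, k):
    """Number of unordered selections of k items from n: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


def beta(a, b):
    """Beta function expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


# The gamma function extends the factorial: gamma(5) equals 4! = 24.
gamma_five = math.gamma(5)

# Choosing 2 items out of 5: 20 ordered selections, 10 unordered ones.
ordered = permutations(5, 2)
unordered = combinations(5, 2)
```

Each unordered selection of k items corresponds to k! ordered ones, which is why `permutations(n, k)` equals `combinations(n, k) * math.factorial(k)`.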
This chapter is devoted to the multivariate normal and functions of it. We start by showing how linearity is essential to its definition, then we derive the main properties. These include characteristic and density functions, conditionals, and some of the normal distribution's exceptional properties: the equivalence of no-correlation and independence within the class of elliptical distributions, Cramér's deconvolution theorem, the equivalence of a random sample's normality with the independence of the sample's normal mean and chi-square variance. We also explore other properties such as fourth-order moments in multivariate normal (and elliptical) distributions, the convexity of the m.g.f., joint distributions of linear and quadratic forms and conditions for their independence, the same also for pairs of quadratic forms and their covariance, as well as decompositions of quadratic forms.
The need for this chapter arises once we start considering the realistic case of more than one variate at a time, the multivariate case. We have already started dealing with this topic (in disguise) in the introductions to conditioning and mixing in Chapters 1 and 2, and in some of the exercises using these ideas in Chapter 4. Joint distributions are defined, and we explain their relation to the univariate distributions seen earlier and more generally to the distribution of subsets (marginal distributions). Joint densities are also defined. The independence of variates is defined in terms of their joint distribution. We also introduce the concept of copulas, linking the joint distribution to its marginals.
We concluded the previous chapter by introducing two methods of inference concerning the parameter vector. Since the Bayesian approach was one of them, we focus here on the competing frequentist or classical approach in its attempt to draw conclusions about the value of this vector. We introduce hypothesis testing, test statistics and their critical regions, size, and power. We then introduce desirable properties (lack of bias, uniformly most powerful test, consistency, invariance with respect to some class of transformations, similarity, admissibility) that help us find optimal tests. The Neyman–Pearson lemma and extensions are introduced. Likelihood ratio (LR), Wald (W), score and Lagrange multiplier (LM) tests are introduced for general hypotheses, including inequality hypotheses for the parameter vector. Monotone LR and the Karlin–Rubin theorem are studied, as is Neyman's structure and its role in finding optimal tests. The exponential family features prominently in the applications. Finally, distribution-free (nonparametric) tests are studied and linked to results in earlier chapters.
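Size and power can be illustrated with a deliberately simple case that is assumed here for illustration and is not taken from the chapter: a one-sided test of a zero mean against a positive mean, for an i.i.d. normal sample with known standard deviation. The test rejects when the standardized sample mean exceeds the upper-α quantile of the standard normal. The function name and default values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu, n, sigma=1.0, alpha=0.05):
    """Rejection probability of the one-sided z-test when the true mean is mu.

    The test rejects H0: mean = 0 in favour of H1: mean > 0 when
    sqrt(n) * sample_mean / sigma exceeds the (1 - alpha) standard normal quantile.
    """
    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(1 - alpha)
    # Under a true mean mu, the statistic is N(sqrt(n) * mu / sigma, 1).
    return 1 - std_normal.cdf(critical_value - sqrt(n) * mu / sigma)


size = z_test_power(0.0, n=25)           # rejection probability under H0
power_near = z_test_power(0.2, n=25)     # power at a nearby alternative
power_far = z_test_power(0.5, n=25)      # power at a more distant alternative
power_large_n = z_test_power(0.2, n=400) # same alternative, larger sample
```

In this sketch the rejection probability under the null equals α, power rises as the alternative moves away from the null, and it also rises with the sample size for a fixed alternative, which is the consistency property mentioned above.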
We introduce variates (random variables), defining their cumulative distribution function (c.d.f.) and probability density function (p.d.f.). We give the two main decomposition theorems of c.d.f.s, Jordan and Lebesgue, and the unifying Stieltjes integral approach for continuous and discrete variates. We study properties and descriptions such as symmetry, median, quantiles, and the quantile function, and modes of a distribution. The mixing of variates is also studied, including the famous example of Student's t, which can be represented as a mixed-normal variate.
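The mixed-normal representation of Student's t can be sketched by simulation. The snippet below relies on the standard result that, if the variance V is drawn as ν divided by a chi-squared variate with ν degrees of freedom and then X is drawn as N(0, V), the resulting X has a Student's t distribution with ν degrees of freedom. The function name, the seed, and the choice ν = 10 are illustrative assumptions.

```python
import random
from math import sqrt


def student_t_draw(nu, rng):
    """One draw from Student's t(nu), generated as a variance mixture of normals."""
    # A chi-squared(nu) variate is a gamma variate with shape nu/2 and scale 2.
    chi2 = rng.gammavariate(nu / 2, 2)
    mixing_variance = nu / chi2
    # Conditional on the mixing variance, the draw is normal with mean zero.
    return rng.gauss(0.0, sqrt(mixing_variance))


rng = random.Random(42)
nu = 10
draws = [student_t_draw(nu, rng) for _ in range(50000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
theoretical_var = nu / (nu - 2)  # variance of t(nu), valid for nu > 2
```

With ν = 10 the theoretical variance is 1.25, larger than the unit variance of the standard normal; the simulated variance should land close to it, reflecting the heavier tails produced by the mixing.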
We introduce principles of point estimation, that is, the estimation of a value for the vector of unknown parameters of the density of a variate. The chapter starts by considering some desirable properties of point estimators, a sort of “the good, the bad, and the ugly” classification! The topics covered include bias, efficiency, mean-squared error (MSE), consistency, robustness, invariance, and admissibility. We then introduce methods of summarizing the data via statistics that retain the relevant sample information about the parameter vector, and we see how they achieve the desirable properties of estimators. We discuss sufficiency, Neyman's factorization, ancillarity, Rao-Blackwellization, completeness, the Lehmann–Scheffé theorem and the minimum-variance unbiasedness of an estimator, and Basu's theorem. We consider the exponential family and special cases and conclude by introducing the most common model in statistics, the linear model, which is used for illustrations in this chapter and is covered more extensively in the following chapters.
Up to now, we have dealt with various foundational aspects of random variables and their distributions. We have occasionally touched on how these variates can arise in practice. In the second part of this book, we start analyzing in more detail how the variates are connected with sampling situations, how we can estimate the parameters of their distributions (which are typically unknown in practice), and how to conduct inference regarding these estimates and their magnitudes. This chapter starts with the first of these three aims. We study the sample mean and variance and their sampling properties, but also the sample's order statistics and extremes. The empirical distribution function (EDF) is defined and analyzed. In the case of multivariate normality, the Wishart distribution arises as a generalization of the chi-squared. We study the properties of matrix Wishart variates. We also show how Hotelling's T² arises as the counterpart of Student's t in the case of multivariate samples. The density of the correlation coefficient is also derived. We introduce rank and sign correlations, known as Spearman's rho and Kendall's tau, respectively.
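As a small illustration of some of these sample quantities, the sketch below computes the sample mean, the sample variance, the order statistics, and the empirical distribution function for a tiny made-up data set. It uses the usual definition of the EDF as the proportion of observations less than or equal to x; the helper name `make_edf` is hypothetical.

```python
from bisect import bisect_right
from statistics import mean, variance


def make_edf(sample):
    """Return the EDF of the sample: x -> proportion of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def edf(x):
        # bisect_right counts how many ordered observations are <= x.
        return bisect_right(ordered, x) / n

    return edf


# Made-up observations, for illustration only.
sample = [2.1, 0.4, 3.3, 1.7, 2.1]

order_statistics = sorted(sample)        # from sample minimum to sample maximum
sample_minimum = order_statistics[0]
sample_maximum = order_statistics[-1]
sample_mean = mean(sample)
sample_variance = variance(sample)       # divisor n - 1

edf = make_edf(sample)
```

The EDF is a step function that jumps at each observed value; here `edf(2.1)` is 0.8 because four of the five observations are at most 2.1, with the tie at 2.1 producing a double-sized jump.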