Microeconometric research is usually performed on data collected by surveying a sample of the population of interest. The simplest statistical assumption for survey data is simple random sampling (SRS), under which each member of the population has an equal probability of being included in the sample. Then it is reasonable to base statistical inference on the assumption that the data (yi, xi) are independent over i and identically distributed. This assumption underlies the small-sample and asymptotic properties of estimators presented in this book, with the notable exception of sample selection models in Chapter 16.
In practice, however, SRS is almost never the right assumption for survey data. Alternative sampling schemes are instead used to reduce survey costs and to increase precision of estimation for subgroups of the population that are of particular interest.
For example, a household survey may first partition the population geographically into subgroups, such as villages or suburbs, with differing sampling rates for different subgroups. Interviews may be conducted on households that are clustered in small geographic areas, such as city blocks. The data (yi, xi) are clearly no longer iid. First, the distribution of (yi, xi) may vary across subgroups, so the identical distribution assumption may be inappropriate. Second, since data may be correlated for households in the same cluster, the assumption that (yi, xi) are independent within the cluster breaks down.
The previous chapter focused on m-estimation, including ML and NLS estimation. Now we consider a much broader class of extremum estimators, those based on method of moments (MM) and generalized method of moments (GMM).
The basis of MM and GMM is specification of a set of population moment conditions involving data and unknown parameters. The MM estimator solves the sample moment conditions that correspond to the population moment conditions. For example, the sample mean is the MM estimator of the population mean. In some cases there may be no explicit analytical solution for the MM estimator, but numerical solution may still be possible. Then the estimator is an example of the estimating equations estimator introduced briefly in Section 5.4.
In some situations, however, MM estimation may be infeasible because there are more moment conditions and hence equations to solve than there are parameters. A leading example is IV estimation in an overidentified model. The GMM estimator, due to Hansen (1982), extends the MM approach to accommodate this case.
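As a minimal numerical sketch of these ideas, consider the linear model y = xβ + u with one endogenous regressor and two instruments z, so that the model is overidentified. The simulated data-generating process, variable names, and weighting choices below are illustrative assumptions rather than the book's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 2))                          # two instruments
u = rng.normal(size=n)                               # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)  # endogenous regressor
y = 2.0 * x + u                                      # true beta = 2
X = x[:, None]

# Population moment condition: E[z_i (y_i - x_i * beta)] = 0. With two
# instruments and one parameter there are more equations than unknowns,
# so GMM minimizes a quadratic form in the sample moments.
def gmm_beta(W):
    # For moments linear in beta the minimizer has a closed form:
    # beta = (A' W A)^{-1} A' W c, with A = Z'X/n and c = Z'y/n.
    A = z.T @ X / n
    c = z.T @ y / n
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ c)

b1 = gmm_beta(np.eye(2))                  # step 1: identity weighting matrix
g = z * (y - X @ b1)[:, None]             # moment contributions at b1
W2 = np.linalg.inv(g.T @ g / n)           # inverse of estimated moment variance
b2 = gmm_beta(W2)                         # step 2: two-step GMM
print(b1, b2)                             # both should be close to 2
```

The second step reweights the sample moments by the inverse of their estimated variance matrix, the choice that yields the efficient two-step GMM estimator in the overidentified case.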
The GMM estimator defines a class of estimators, with different GMM estimators obtained by using different population moment conditions, just as different specified densities lead to different ML estimators. We emphasize this moment-based approach to estimation, even in cases where alternative presentations are possible, as it provides a unified approach to estimation and can provide an obvious way to extend methods from linear to nonlinear models.
Part 2 presents the core estimation methods – least squares, maximum likelihood and method of moments – and associated methods of inference for nonlinear regression models that are central in microeconometrics. The material also includes modern topics such as quantile regression, sequential estimation, empirical likelihood, semiparametric and nonparametric regression, and statistical inference based on the bootstrap. In general the discussion is at a level intended to provide enough background and detail to enable the practitioner to read and comprehend articles in the leading econometrics journals and, where needed, subsequent chapters of this book. We presume prior familiarity with linear regression analysis.
The essential estimation theory is presented in three chapters. Chapter 4 begins with the linear regression model. It then covers at an introductory level quantile regression, which models distributional features other than the conditional mean. It provides a lengthy expository treatment of instrumental variables estimation, a major method of causal inference. Chapter 5 presents the most commonly used estimation methods for nonlinear models, beginning with the topic of m-estimation, before specialization to maximum likelihood and nonlinear least squares regression. Chapter 6 provides a comprehensive treatment of generalized method of moments, a quite general estimation framework that is applicable to linear and nonlinear models in single-equation and multi-equation settings. The chapter emphasizes the special case of instrumental variables estimation.
The problem of missing data in survey data is one of long standing, arising from nonresponse or partial response to survey questions. Reasons for nonresponse include unwillingness to provide the information asked for, difficulty of recall of events that occurred in the past, and not knowing the correct response. Imputation is the process of estimating or predicting the missing observations.
In this chapter we deal with the regression setup with data vector (yi, xi), i = 1, …, N. For some of the observations some elements of xi or of both (yi, xi) are missing. A number of questions are considered. When can we proceed with an analysis of only the complete observations, and when should we attempt to fill the gaps left by the missing observations? What methods of imputation are available? When imputed values for missing observations are obtained, how should estimation and inference then proceed?
If a data set has missing observations, and if these gaps can be filled by a statistically sound procedure, then benefit comes from a larger and possibly more representative sample and, under ideal circumstances, more precise inference. The cost of estimating missing data comes from having to make (possibly wrong) assumptions to support a procedure for generating proxies for the missing observations, and from the approximation error inherent in any such procedure. Further, statistical inference that follows data augmentation after imputed values replace missing data is more complicated because such inference must take into account the approximation errors introduced by imputation.
In empirical work data frequently present not one but multiple complications that need to be dealt with simultaneously. Examples of such complications include departures from simple random sampling, clustering of observations, measurement errors, and missing data. When they occur, individually or jointly, and in the context of any of the models developed in Parts 4 and 5, identification of parameters of interest will be compromised. Three chapters in Part 6 – Chapters 24, 26, and 27 – analyze the consequences of such complications and then present methods that control for these complications. The methods are illustrated using examples taken from the earlier parts of the book. This feature gives points of connection between Part 6 and the rest of the book.
Chapter 24, which deals with several features of data from complex surveys, notably stratified sampling and clustering, complements various topics covered in Chapters 3, 5, and 16. Chapter 26 deals with measurement errors in models studied in Chapters 4, 14, and 20. Chapter 27 is a stand-alone chapter on missing data and multiple imputation, but its use of the EM algorithm and Gibbs sampler also gives it points of contact with Chapters 10 and 13, respectively.
Chapter 25 presents treatment evaluation. Treatment is a broad term that refers to the impact of one variable, e.g. schooling, on some outcome variable, e.g. earnings. Treatment variables may be exogenously assigned, or may be endogenously chosen.
In this chapter we consider tests of hypotheses, possibly nonlinear in the parameters, using estimators appropriate for nonlinear models.
The distribution of test statistics can be obtained using the same statistical theory as that used for estimators, since test statistics, like estimators, are functions of the sample. Given appropriate linearization of estimators and hypotheses, the results closely resemble those for testing linear restrictions in the linear regression model. The results rely on asymptotic theory, however, and the exact t- and F-distributed test statistics for the linear model under normality are replaced by test statistics that are asymptotically standard normal distributed (z-tests) or chi-square distributed.
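As an illustrative sketch, a Wald test of a possibly nonlinear restriction h(θ) = 0 can be computed from the estimate, its estimated asymptotic variance, and the Jacobian of the restriction; the helper below is hypothetical, not the book's code.

```python
import numpy as np
from scipy import stats

def wald_test(theta_hat, V_hat, h, R):
    # h(theta_hat): r-vector of restrictions; R: r x k Jacobian of h at theta_hat.
    # Under the null, W is asymptotically chi-square with r degrees of freedom.
    hv = h(theta_hat)
    W = hv @ np.linalg.inv(R @ V_hat @ R.T) @ hv
    return W, stats.chi2.sf(W, df=len(hv))
```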
There are two main practical concerns in hypothesis testing. First, tests may have the wrong size, so that in testing at a nominal significance level of, say, 5%, the actual probability of rejection of the null hypothesis may be much more or less than 5%. Such a wrong size is almost certain to arise in moderate-size samples, as the underlying asymptotic distribution theory is only an approximation. One remedy is the bootstrap method, introduced in this chapter but sufficiently important and broad to be treated separately in Chapter 11. Second, tests may have low power, so that there is a low probability of rejecting the null hypothesis when it should be rejected. This potential weakness of tests is often neglected. Size and power are given more prominence here than in most textbook treatments of testing.
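A small simulation, under purely illustrative assumptions (an asymptotic z-test of the mean applied to skewed exponential data with n = 20), shows how actual size can differ from the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rejections = 20, 20000, 0
for _ in range(reps):
    y = rng.exponential(scale=1.0, size=n)           # true mean is 1, so H0 holds
    t = (y.mean() - 1.0) / (y.std(ddof=1) / np.sqrt(n))
    rejections += abs(t) > 1.96                      # asymptotic 5% critical value
print(rejections / reps)   # rejection rate typically deviates from the nominal 0.05
```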
This book provides a detailed treatment of microeconometric analysis, the analysis of individual-level data on the economic behavior of individuals or firms. This type of analysis usually entails applying regression methods to cross-section and panel data.
The book aims at providing the practitioner with a comprehensive coverage of statistical methods and their application in modern applied microeconometrics research. These methods include nonlinear modeling, inference under minimal distributional assumptions, identifying and measuring causation rather than mere association, and correcting departures from simple random sampling. Many of these features are of relevance to individual-level data analysis throughout the social sciences.
The ambitious agenda has determined the characteristics of this book. First, although oriented to the practitioner, the book is relatively advanced in places. A cookbook approach is inadequate because when two or more complications occur simultaneously – a common situation – the practitioner must know enough to be able to adapt available methods. Second, the book provides considerable coverage of practical data problems (see especially the last three chapters). Third, the book includes substantial empirical examples in many chapters to illustrate some of the methods covered. Finally, the book is unusually long. Despite this length we have been space-constrained. We had intended to include even more empirical examples, and abbreviated presentations will at times fail to recognize the accomplishments of researchers who have made substantive contributions.
Two important practical aspects of microeconometric modeling are determining whether a model is correctly specified and selecting from alternative models. For these purposes it is often possible to use the hypothesis testing methods presented in the previous chapter, especially when models are nested. In this chapter we present several other methods.
First, m-tests such as conditional moment tests are tests of whether moment conditions imposed by a model are satisfied. The approach is similar in spirit to GMM, except that the moment conditions are not imposed in estimation and are instead used for testing. Such tests are conceptually very different from the hypothesis tests of Chapter 7, as there is no explicit statement of an alternative hypothesis model.
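As a minimal sketch of such a test, suppose M collects the sample values of a moment that the fitted model implies has zero mean; the helper below is hypothetical and, for simplicity, ignores the estimation error in the fitted parameters, which a full conditional moment test must correct for.

```python
import numpy as np
from scipy import stats

def m_test(M):
    # M: n x r array whose ith row is the moment m(y_i, x_i, theta_hat)
    # that the fitted model implies should have mean zero.
    n, r = M.shape
    mbar = M.mean(axis=0)
    V = M.T @ M / n                           # simple estimate of the moment variance
    stat = n * mbar @ np.linalg.inv(V) @ mbar
    return stat, stats.chi2.sf(stat, df=r)    # asymptotically chi-square(r) under H0
```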
Second, Hausman tests are tests of the difference between two estimators that are both consistent if the model is correctly specified but diverge if the model is incorrectly specified.
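A minimal sketch of a Hausman-type statistic, with hypothetical argument names: b_eff is efficient under the null hypothesis, b_cons is consistent under both hypotheses, and the simple variance difference below is valid only when the efficient estimator is fully efficient under the null.

```python
import numpy as np
from scipy import stats

def hausman(b_cons, b_eff, V_cons, V_eff):
    d = b_cons - b_eff
    Vd = V_cons - V_eff    # valid when b_eff is fully efficient under the null;
                           # in practice a generalized inverse may be needed
    H = d @ np.linalg.inv(Vd) @ d
    return H, stats.chi2.sf(H, df=len(d))   # asymptotically chi-square under H0
```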
Third, tests of nonnested models require special methods because the usual hypothesis testing approach can only be applied when one model is nested within another.
Finally, it can be useful to compute and report statistics of model adequacy that are not test statistics. For example, an analogue of R2 may be used to measure the goodness of fit of a nonlinear model.
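For example, one widely used analogue for likelihood-based models is McFadden's pseudo-R2, 1 − ln L(fitted)/ln L(intercept only); the sketch below computes it for a binary outcome with hypothetical inputs y and p_hat.

```python
import numpy as np

def mcfadden_r2(y, p_hat):
    # y: 0/1 outcomes; p_hat: fitted probabilities from the candidate model.
    ll_fit = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    p0 = y.mean()                         # intercept-only fitted probability
    ll_0 = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    return 1 - ll_fit / ll_0
```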
Ideally, these methods are used in a cycle of model specification, estimation, testing, and evaluation.
This chapter surveys issues concerning the potential usefulness and limitations of different types of microeconomic data. By far the most common data structure used in microeconometrics is survey or census data. These data are usually called observational data to distinguish them from experimental data.
This chapter discusses the potential limitations of the aforementioned data structures. The inherent limitations of observational data may be further compounded by the manner in which the data are collected, that is, by the sample frame (the way the sample is generated), sample design (simple random sample versus stratified random sample), and sample scope (cross-section versus longitudinal data). Hence we also discuss sampling issues in connection with the use of observational data. Some of this terminology is new at this stage but will be explained later in this chapter.
Microeconometrics goes beyond the analysis of survey data under the assumptions of simple random sampling. This chapter considers extensions. Section 3.2 outlines the structure of multistage sample surveys and some common forms of departure from random sampling; a more detailed analysis of their statistical implications is provided in later chapters. It also considers some commonly occurring complications that result in the data not necessarily being representative of the population. Given the deficiencies of observational data in estimating causal parameters, there have been increasing efforts to exploit experimental and quasi-experimental data and frameworks. Section 3.3 examines the potential of data from social experiments.
This chapter extends the linear model panel data methods of Chapters 21 and 22 to the nonlinear regression models presented in Chapters 14–20. We focus on short panels and models with a time-invariant individual-specific effect that may be fixed or may be random. Both static and dynamic models are considered.
There is no one-size-fits-all prescription for nonlinear models with individual-specific effects. If individual-specific effects are fixed and the panel is short, then consistent estimation of the slope parameters is possible for only a subset of nonlinear models. If individual-specific effects are instead purely random, then consistent estimation is possible for a wide range of models.
Section 23.2 presents general approaches that may or may not be implementable for particular models. Section 23.3 provides an application to a nonlinear model with multiplicative individual-specific effects. Specializations to the leading classes of nonlinear models – discrete data, selection models, transition data, and count data – are presented in Sections 23.4–23.7. Semiparametric estimation is surveyed in Section 23.8.
General Results
General approaches to extending the methods for linear models are presented in this section. We first present the various models – fixed effects, random effects, and pooled – distinguishing parametric models from conditional mean models. Methods to estimate these models and obtain panel-robust standard errors are then presented. Further details for specific nonlinear panel models are provided in subsequent sections.
The topic of treatment evaluation concerns measuring the impact of interventions on outcomes of interest, with the type of intervention and outcome being defined broadly so as to apply to many different contexts. The treatment evaluation approach and some of its terminology come from the medical sciences, where intervention frequently means adopting a treatment regime. Subsequently, one may be interested in measuring the response to the treatment relative to some benchmark, such as no treatment or a different treatment. In economic applications treatment and intervention usually mean the same thing.
Examples of treatments in the economic context are enrollment into a labor training program, being a member of a trade union, receipt of a transfer payment from a social program, changes in regulations for receiving a transfer from a social program, changes in rules and regulations pertaining to financial transactions, changes in economic incentives, and so forth; see Moffitt (1992), Friedlander, Greenberg, and Robbins (1997), and Heckman, Lalonde, and Smith (1999). If the treatment that is applied can vary in intensity or type, we use the term multiple treatments when referring to them collectively. Relative to a single type of treatment this does not create complications, but now the choice of a benchmark for comparisons is more flexible.
There is a large statistical and econometric literature concerning the topic of unobserved heterogeneity. Observed heterogeneity refers to interindividual differences that are measured by regressors, and unobserved heterogeneity refers to all other differences. Both factors affect survival times. In the presence of unobserved heterogeneity even individuals with the same values of all covariates may have different hazards out of a given state. When unobserved heterogeneity is ignored, its impact is confounded with that of the baseline hazard.
To motivate further study consider a well-known empirical example. The aggregate hazard rate out of unemployment is known to be a declining function of the length of the unemployment spell. If all individuals were identical then this would imply negative duration dependence, that is, a falling probability of escaping unemployment the longer an individual has remained unemployed. However, suppose that there are two types of individuals in the unemployed population, type F (fast), who have a constant hazard rate of 0.4, and type S (slow), whose constant hazard rate is 0.1. The population is a 50/50 mixture of the two types. Then for 100 type F people we observe 40 transitions in the first period, 24 transitions in the second period, and 14.4 in the third. For 100 type S people, we observe 10, 9, and 8.1 transitions in the first, second, and third periods, respectively. Hence the aggregate proportion of transitions will be (40 + 10)/200 = 0.25, (24 + 9)/150 = 0.22, and (14.4 + 8.1)/117 = 0.192: the aggregate hazard declines even though each type's hazard is constant.
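A few lines of code reproduce this arithmetic and make the point directly: a 50/50 mixture of constant-hazard types generates a declining aggregate hazard even though neither type exhibits duration dependence.

```python
# Survivors of each type and their constant per-period hazard rates.
survivors = {"F": 100.0, "S": 100.0}
hazard = {"F": 0.4, "S": 0.1}
for period in (1, 2, 3):
    exits = {k: survivors[k] * hazard[k] for k in survivors}
    at_risk = sum(survivors.values())
    print(period, sum(exits.values()) / at_risk)   # 0.25, 0.22, 0.1923...
    for k in survivors:
        survivors[k] -= exits[k]
```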
In many economic contexts the dependent or response variable of interest is a nonnegative integer or count that we wish to explain or analyze in terms of a set of regressors. Unlike the classical regression model, the response variable is discrete, with a distribution that places probability mass at nonnegative integer values only. Several models discussed earlier in the book, such as the binary outcome model and the duration model, can be shown to be closely related to the count data regression model. Regression models for counts, like other limited or discrete dependent variable models such as the logit and probit, are nonlinear with many properties and special features intimately connected to discreteness and nonlinearity.
Let us consider some examples from microeconometrics, beginning with sample data that are independent cross-section observations. Fertility studies often model the number of live births over a specified age interval of the mother, with interest in analyzing its variation in terms of, say, mother's schooling, age, and household income (Winkelmann, 1995). In some models of family decisions the number of children may appear as an explanatory variable with the acknowledgment that the variable is endogenous. Accident analysis studies model airline safety as measured by the number of accidents experienced by an airline over some period and seek to determine its relationship to airline profitability and other measures of the financial health of the airline (Rose, 1990).
Microeconometrics deals with the theory and applications of methods of data analysis developed for microdata pertaining to individuals, households, and firms. A broader definition might also include regional- and state-level data. Microdata are usually either cross sectional, in which case they refer to conditions at the same point in time, or longitudinal (panel) in which case they refer to the same observational units over several periods. Such observations are generated from both nonexperimental setups, such as censuses and surveys, and quasi-experimental or experimental setups, such as social experiments implemented by governments with the participation of volunteers.
A microeconometric model may be a full specification of the probability distribution of a set of microeconomic observations; it may also be a partial specification of some distributional properties, such as moments, of a subset of variables. The mean of a single dependent variable conditional on regressors is of particular interest.
There are several objectives of microeconometrics. They include both data description and causal inference. The first can be defined broadly to include moment properties of response variables, or regression equations that highlight associations rather than causal relations. The second aims at measuring causal relationships and at empirically confirming or refuting conjectures and propositions regarding microeconomic behavior. The type and style of empirical investigations therefore span a wide spectrum.
This chapter deals with several different duration models that can be interpreted broadly as multivariate models, a category that covers both parallel and repeated transitions. Any transition model that involves more than one destination state can be regarded as a multivariate model because the analysis will involve the joint distribution of two or more durations. The models we consider arise in a variety of ways and apply to several different types of data. Despite their differences, they are grouped in this chapter for reasons of organizational convenience.
To be concrete, consider some examples. A familiar model from labor economics involves a transition from unemployment to employment or out of the labor force. The first transition can be further broken down into a return to the old job or a move to a new job. These destinations are mutually exclusive. An unemployment spell may end by a transition to any one of the destinations. A variant of this example considers an unemployed individual who could find either a new full-time or part-time job or remain unemployed. Thus there are three possible states (destinations). The models of Chapters 17 and 18 dealt with transitions between two states. One can still use the two-state methods to handle such data. For example, state 1 could be that of full-time employment and state 0 could be any other state. This would, as before, involve modeling one hazard rate.
Part 4, consisting of Chapters 14 to 20, covers the core nonlinear limited dependent variable models for cross-section data, defined by the range of values taken by the dependent variable. Topics covered include models for binary, multinomial, duration, and count data. The complications of censoring, truncation, and sample selection are also studied. The essential base for Part 4 is least squares and maximum likelihood estimation.
Chapters 14–15 cover models for binary and multinomial data that are standard in the analysis of discrete outcomes and discrete choice. Maximum likelihood methods are dominant. Different parameterizations for the conditional probabilities lead to different models, notably the well-established logit and probit models. Recent literature has focused on less restrictive modeling with more flexible functional forms for conditional probabilities and on accommodating individual unobserved heterogeneity. These objectives motivate the use of semiparametric methods and simulation-based estimation methods.
Censoring, truncation, and sample selection generate several important classes of models that are analyzed in Chapter 16. The long-established Tobit model is central to this literature, but consistent estimation and valid inference for it rely on strong distributional assumptions. We also examine newer semiparametric methods that rely on weaker assumptions.
Chapters 17–19 consider duration models in which the focus is on either the determinants of spell lengths, such as length of an unemployment spell, or on modeling the hazard rate of transitions from one initial state to another.
The preceding chapter considered models for discrete outcome variables that can take one of two possible values. Here we consider several possible outcomes, usually mutually exclusive. Examples include different ways to commute to work (by bus, car, or walking), various types of health insurance (fee-for-service, managed care, or none), different employment status (full-time, part-time, or none), choice of recreational site, occupational choice, and product choice.
Statistical inference is relatively straightforward in principle because the data are necessarily multinomial distributed, just as binary data are necessarily Bernoulli or binomial distributed. Estimation is therefore most often by maximum likelihood. For some complications, however, moment-based estimation is used instead.
Different multinomial models arise owing to different functional forms for the probabilities of the multinomial distribution, similar to the differences between probit and logit in the binary case. A distinction is also made between models where regressors vary across alternatives for a given individual and models where regressors are constant across alternatives. For example, in transportation mode choice some regressors, such as travel time or cost, will vary with choices whereas others, such as age, are choice invariant.
The simplest multinomial model, the conditional or multinomial logit model, is quite straightforward to use but is viewed as too restrictive in practice, especially if the multinomial outcome data arise from individual choice. For unordered outcomes less restrictive models can be obtained using the random utility model.
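To make the functional form concrete, the following sketch computes conditional logit choice probabilities from alternative-varying regressors; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def conditional_logit_probs(X, beta):
    # X: (n, J, K) array of alternative-varying regressors; beta: (K,) coefficients.
    # Pr(choice = j) = exp(x_ij' beta) / sum_k exp(x_ik' beta).
    V = X @ beta                           # (n, J) systematic utilities
    V = V - V.max(axis=1, keepdims=True)   # stabilize the exponentials
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)
```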
Discrete outcome or qualitative response models are models for a dependent variable that indicates in which one of m mutually exclusive categories the outcome of interest falls. Often there is no natural ordering of the categories. For example, categorization may be by the occupation of a worker.
This chapter considers the simplest case of binary outcomes, where there are two possible outcomes. Examples include whether or not an individual is employed and whether or not a consumer makes a purchase. Binary outcomes are simple to model and estimation is usually by maximum likelihood because the distribution of the data is necessarily defined by the Bernoulli model. If the probability of one outcome equals p, then the probability of the other outcome must be (1 − p). For regression applications the probability p will vary across individuals as a function of regressors. The two standard binary outcome models, the logit and the probit models, specify different functional forms for this probability as a function of regressors. The difference between these estimators is qualitatively similar to the use of different functional forms for the conditional mean in least-squares regression.
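The two functional forms, and the Bernoulli log-likelihood that maximum likelihood estimation maximizes, can be written compactly; the sketch below uses hypothetical names and assumes a regressor matrix x and coefficient vector beta.

```python
import numpy as np
from scipy.stats import norm

def logit_prob(x, beta):
    return 1.0 / (1.0 + np.exp(-(x @ beta)))   # logistic cdf evaluated at x'beta

def probit_prob(x, beta):
    return norm.cdf(x @ beta)                  # standard normal cdf at x'beta

def bernoulli_loglik(y, p):
    # The log-likelihood that ML maximizes, with p from either model.
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```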
Section 14.2 provides a data example. Section 14.3 presents a summary of statistical results for standard models including logit and probit models. In Section 14.4 binary outcome models are presented as arising from an underlying latent variable. This formulation is useful as it extends readily to multinomial models (see Chapter 15) and models for censored and selected samples (see Chapter 16).