Linear regression is a method that summarizes how the average values of a numerical outcome variable vary over subpopulations defined by linear functions of predictors. Introductory statistics and regression texts often focus on how regression can be used to represent relationships between variables, rather than as a comparison of average outcomes. By focusing on regression as a comparison of averages, we are being explicit about its limitations for defining these relationships causally, an issue to which we return in Chapter 9. Regression can be used to predict an outcome given a linear function of these predictors, and regression coefficients can be thought of as comparisons across predicted values or as comparisons among averages in the data.
One predictor
We begin by understanding the coefficients without worrying about issues of estimation and uncertainty. We shall fit a series of regressions predicting cognitive test scores of three- and four-year-old children given characteristics of their mothers, using data from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth).
For a binary predictor, the regression coefficient is the difference between the averages of the two groups
We start by modeling the children's test scores given an indicator for whether the mother graduated from high school (coded as 1) or not (coded as 0).
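The fact that the regression coefficient on a binary predictor equals the difference between the two group averages can be verified numerically. The book's analyses are done in R; the following is a Python sketch with made-up data standing in for the test-score example (the variable names and numbers are illustrative, not the survey's):

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: mom_hs is 1 if the mother finished high school, 0 if not
mom_hs = [random.choice([0, 1]) for _ in range(500)]
score = [78 + 12 * hs + random.gauss(0, 20) for hs in mom_hs]

# Least-squares slope for a single predictor: b = cov(x, y) / var(x)
xbar, ybar = mean(mom_hs), mean(score)
b = sum((x - xbar) * (y - ybar) for x, y in zip(mom_hs, score)) / \
    sum((x - xbar) ** 2 for x in mom_hs)

# Difference between the averages of the two groups
diff = mean(y for x, y in zip(mom_hs, score) if x == 1) - \
       mean(y for x, y in zip(mom_hs, score) if x == 0)

print(abs(b - diff) < 1e-9)  # the two quantities agree
```

For a binary predictor the algebra makes these exactly equal: the fitted line passes through the two group means, so the slope is their difference.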
Follow the instructions at www.stat.columbia.edu/∼gelman/arm/software/ to download, install, and set up R and Bugs on your Windows computer. The webpage is occasionally updated as the software improves, so we recommend checking back occasionally. R, OpenBugs, and WinBugs have online help with more information available at www.r-project.org, www.math.helsinki.fi/openbugs/, and www.mrc-bsu.cam.ac.uk/bugs/.
Set up a working directory on your computer for your R work. Every time you enter R, your working directory will automatically be set, and the necessary functions will be loaded in.
Configuring your computer display for efficient data analysis
We recommend working with three nonoverlapping open windows, as pictured in Figure C.1: an R console, the R graphics window, and a text editor (ideally a program such as Emacs or WinEdt that allows split windows, or the script window in the Windows version of R). When programming in Bugs, the text editor will have two windows open: a file (for example, project.R) with R commands, and a file (for example, project.bug) with the Bugs model. It is simplest to type commands into the text file with R commands and then cut and paste them into the R console. This is preferable to typing in the R console directly because copying and altering the commands is easier in the text editor. To run Bugs, there is no need to open a Bugs window; R will do this automatically when the function bugs() is called (assuming you have set up your computer as just described, which includes loading the R2WinBUGS package in R).
Multilevel modeling can be thought of in two equivalent ways:
We can think of a generalization of linear regression, where intercepts, and possibly slopes, are allowed to vary by group. For example, starting with a regression model with one predictor, yi = α + βxi + εi, we can generalize to the varying-intercept model, yi = αj[i] + βxi + εi, and the varying-intercept, varying-slope model, yi = αj[i] + βj[i]xi + εi (see Figure 11.1 on page 238).
Equivalently, we can think of multilevel modeling as a regression that includes a categorical input variable representing group membership. From this perspective, the group index is a factor with J levels, corresponding to J predictors in the regression model (or 2J if they are interacted with a predictor x in a varying-intercept, varying-slope model; or 3J if they are interacted with two predictors X(1), X(2); and so forth).
In either case, J−1 linear predictors are added to the model (or, to put it another way, the constant term in the regression is replaced by J separate intercept terms). The crucial multilevel modeling step is that these J coefficients are then themselves given a model (most simply, a common distribution for the J parameters αj or, more generally, a regression model for the αj's given group-level predictors). The group-level model is estimated simultaneously with the data-level regression of y.
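The structure just described can be simulated directly. The book's own code is in R and Bugs; this is a Python sketch of the varying-intercept model with made-up parameter values, showing the two levels (a group-level model for the J intercepts, then the data-level regression):

```python
import random

random.seed(1)

# Hypothetical sizes and parameter values, for illustration only
J, n = 8, 200                       # J groups, n observations
mu_alpha, sigma_alpha = 1.0, 0.5    # group-level model for the intercepts
beta, sigma_y = 2.0, 1.0            # common slope and data-level error sd

# The crucial multilevel step: the J intercepts are themselves given a
# model -- here, most simply, a common normal distribution
alpha = [random.gauss(mu_alpha, sigma_alpha) for _ in range(J)]

group = [random.randrange(J) for _ in range(n)]   # j[i] in the book's notation
x = [random.uniform(0, 1) for _ in range(n)]

# Data-level regression: y_i = alpha_{j[i]} + beta * x_i + epsilon_i
y = [alpha[group[i]] + beta * x[i] + random.gauss(0, sigma_y)
     for i in range(n)]
```

In an actual multilevel fit, the group-level parameters (mu_alpha, sigma_alpha) and the J intercepts would be estimated simultaneously with the data-level regression, rather than fixed as here.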
We next explain how to fit multilevel models in Bugs, as called from R. We illustrate with several examples and discuss some general issues in model fitting and tricks that can help us estimate multilevel models using less computer time. We also present the basics of Bayesian inference (as a generalization of the least squares and maximum likelihood methods used for classical regression), which is the approach used in problems such as multilevel models with potentially large numbers of parameters.
Appendix C discusses some software that is available to quickly and approximately fit multilevel models. We recommend using Bugs for its flexibility in modeling; however, these simpler approaches can be useful to get started, explore models quickly, and check results.
Generalized linear modeling is a framework for statistical analysis that includes linear and logistic regression as special cases. Linear regression directly predicts continuous data y from a linear predictor Xβ = β0 + X1β1 + ⋯ + Xkβk. Logistic regression predicts Pr(y = 1) for binary data from a linear predictor with an inverse-logit transformation. A generalized linear model involves:
A data vector y = (y1, …, yn)
Predictors X and coefficients β, forming a linear predictor Xβ
A link function g, yielding a vector of transformed data ŷ = g−1(Xβ) that are used to model the data
A data distribution, p(y|ŷ)
Possibly other parameters, such as variances, overdispersions, and cutpoints, involved in the predictors, link function, and data distribution.
The options in a generalized linear model are the transformation g and the data distribution p.
In linear regression, the transformation is the identity (that is, g(u) ≡ u) and the data distribution is normal, with standard deviation σ estimated from data.
In logistic regression, the transformation is the inverse-logit, g−1(u) = logit−1(u) (see Figure 5.2a on page 80) and the data distribution is defined by the probability for binary data: Pr(y = 1) = ŷ.
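The two link functions just described can be compared on a single observation. This Python sketch (the values of β and x are made up for illustration) computes the linear predictor Xβ and then applies the identity link and the inverse-logit link:

```python
import math

def inv_logit(u):
    # logit^{-1}(u) = exp(u) / (1 + exp(u)), written to avoid overflow
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

# One observation with two predictors plus a constant term (made-up values)
beta = [-1.5, 0.8, 2.0]     # beta_0, beta_1, beta_2
x = [1.0, 0.5, 1.2]         # first entry is the constant

xb = sum(b_k * x_k for b_k, x_k in zip(beta, x))   # linear predictor X*beta

y_hat_linear = xb            # identity link: linear regression
y_hat_logit = inv_logit(xb)  # inverse-logit link: Pr(y = 1) in logistic regression

print(y_hat_linear, y_hat_logit)
```

The inverse-logit maps the unbounded linear predictor onto (0, 1), which is why it is paired with the binary data distribution Pr(y = 1) = ŷ.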
We now go through the steps of understanding and working with multilevel regressions, including designing studies, summarizing inferences, checking the fit of models to data, and imputing missing data.
Now that we can fit multilevel models, we should consider how to understand and summarize the parameters (and important transformations of these parameters) thus estimated.
Inferences from classical regression are typically summarized by a table of coefficient estimates and standard errors, sometimes with additional information on residuals and statistical significance (see, for example, the R output on page 39). With multilevel models, however, the sheer number of parameters adds a challenge to interpretation. The coefficient list in a multilevel model can be arbitrarily long (for example, the radon analysis has 85 county-level coefficients for the varying-intercept model, or 170 coefficients if the slope is allowed to vary also), and it is unrealistic to expect even the person who fit the model to be able to interpret each number separately. We prefer graphical displays such as the generic plot of a Bugs object or plots of fitted multilevel models such as displayed in the examples in Part 2A of this book.
Our general plan is to follow the same structures when plotting as when modeling. Thus, we plot data with data-level regressions (as in Figure 12.5 on page 266), and estimated group coefficients with group-level regressions (as in Figure 12.6). More complicated plots can be appropriate for non-nested models (for example, Figure 13.10 on page 291 and Figure 13.12 on page 293). More conventional plots of parameter estimates and standard errors (such as Figure 14.1 on page 306) can be helpful in multilevel models too.
Once data and a model have been set up, we face the challenge of debugging or, more generally, building confidence in the model and estimation. The steps of Bugs and R as we have described them are straightforward, but cumulatively they require a bit of effort, both in setting up the model and checking it—adding many lines of code produces many opportunities for typos and confusion. In Section 19.1 we discuss some specific issues in Bugs and general strategies for debugging and confidence building. Another problem that often arises is computational speed, and in Sections 19.2–19.5 we discuss several specific methods to get reliable inferences faster when fitting multilevel models. The chapter concludes with Section 19.6, which is not about computation at all, but rather is a discussion of prior distributions for variance parameters. The section is included here because it discusses models that were inspired by the computational idea described in Section 19.5. It thus illustrates the interplay between computation and modeling which has often been so helpful in multilevel data analysis.
Debugging and confidence building
Our general approach to finding problems in statistical modeling software is to get various crude models (for example, complete pooling and no pooling, or models with no predictors) to work and then to gradually build up to the model we want to fit.
Causal inference using regression has an inherent multilevel structure—the data give comparisons between units, but the desired causal inferences are within units. Experimental designs such as pairing and blocking assign different treatments to different units within a group. Observational analyses, such as paired comparisons or panel studies, attempt to capture groups of similar observations with variation in treatment assignment within each group.
Multilevel aspects of data collection
Hierarchical analysis of a paired design
Section 9.3 describes an experiment applied to school classrooms with a paired design: within each grade, two classes were chosen within each of several schools, and each pair was randomized, with the treatment assigned to one class and the control assigned to the other. The appropriate analysis then controls for grade and pair.
Including pair indicators in the Electric Company experiment. As in Section 9.3, we perform a separate analysis for each grade, which could be thought of as a model including interactions of treatment with grade indicators. Within any grade, let n be the number of classes (recall that the treatment and measurements are at the classroom, not the student, level) and J be the number of pairs, which is n/2 in this case. (We use the general notation n, J rather than simply “hard-coding” J = n/2 so that our analysis can also be used for more general randomized block designs with arbitrary numbers of units within each block.)
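The design just described can be written down concretely. This Python sketch (sizes and names are illustrative, not the Electric Company data) builds, for one grade, the pair index j[i], a within-pair randomized treatment indicator, and a design matrix with the treatment column plus J pair indicators:

```python
import random

random.seed(2)

# Hypothetical sizes for one grade: n classes grouped into J = n/2 pairs
n = 8
J = n // 2

pair = [i // 2 for i in range(n)]     # pair index j[i]: 0, 0, 1, 1, ...

# Within each pair, randomize which class gets the treatment
treatment = []
for j in range(J):
    treatment.extend(random.choice([(1, 0), (0, 1)]))

# Design matrix: one treatment column, then J pair-indicator columns
X = [[treatment[i]] + [1 if pair[i] == j else 0 for j in range(J)]
     for i in range(n)]

for row in X:
    print(row)
```

Because the pair indicators replace the constant term, exactly one pair indicator is 1 in each row, and each pair contributes exactly one treated class, as the randomization requires.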
This book originated as lecture notes for a course in regression and multilevel modeling, offered by the statistics department at Columbia University and attended by graduate students and postdoctoral researchers in social sciences (political science, economics, psychology, education, business, social work, and public health) and statistics. The prerequisite is statistics up to and including an introduction to multiple regression.
Advanced mathematics is not assumed—it is important to understand the linear model in regression, but it is not necessary to follow the matrix algebra in the derivation of least squares computations. It is useful to be familiar with exponents and logarithms, especially when working with generalized linear models.
After completing Part 1 of this book, you should be able to fit classical linear and generalized linear regression models—and do more with these models than simply look at their coefficients and their statistical significance. Applied goals include causal inference, prediction, comparison, and data description. After completing Part 2, you should be able to fit regression models for multilevel data. Part 3 takes you from data collection, through model understanding (looking at a table of estimated coefficients is usually not enough), to model checking and missing data. The appendixes include some reference materials on key tips, statistical graphics, and software for model fitting.
We now introduce multilevel linear and generalized linear models, including issues such as varying intercepts and slopes and non-nested models. We view multilevel models either as regressions with potentially large numbers of coefficients that are themselves modeled, or as regressions with coefficients that can vary by group.
This chapter describes a variety of ways in which probabilistic simulation can be used to better understand statistical procedures in general, and the fit of models to data in particular. In Sections 8.1–8.2, we discuss fake-data simulation, that is, controlled experiments in which the parameters of a statistical model are set to fixed “true” values, and then simulations are used to study the properties of statistical methods. Sections 8.3–8.4 consider the related but different method of predictive simulation, where a model is fit to data, then replicated datasets are simulated from this estimated model, and then the replicated data are compared to the actual data.
The difference between these two general approaches is that, in fake-data simulation, estimated parameters are compared to true parameters, to check that a statistical method performs as advertised. In predictive simulation, replicated datasets are compared to an actual dataset, to check the fit of a particular model.
Fake-data simulation
Simulation of fake data can be used to validate statistical algorithms and to check the properties of estimation procedures. We illustrate with a simple regression model, where we simulate fake data from the model, y = α + βx + ε, refit the model to the simulated data, and check the coverage of the 68% and 95% intervals for the coefficient β.
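The coverage check described above can be sketched as follows. The book carries this out in R; this Python version, with made-up "true" parameter values, repeatedly simulates fake data from y = α + βx + ε, refits by least squares, and counts how often the ±1 and ±2 standard-error intervals contain the true β:

```python
import math
import random
from statistics import mean

random.seed(3)
alpha, beta, sigma = 1.4, 2.3, 0.9   # "true" parameter values (made up)
n, reps = 100, 1000

covered68 = covered95 = 0
for _ in range(reps):
    # Simulate fake data from the assumed model
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]

    # Refit by least squares
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    se_b = s / math.sqrt(sxx)

    # Check whether the intervals contain the true beta
    covered68 += abs(b - beta) <= se_b          # ~68% interval: +/- 1 se
    covered95 += abs(b - beta) <= 1.96 * se_b   # ~95% interval: +/- 2 se

print(covered68 / reps, covered95 / reps)  # should be near 0.68 and 0.95
```

If the coverage rates were far from their nominal values, that would indicate a bug in the fitting routine or a mismatch between the assumed and simulated models—which is exactly the point of fake-data simulation.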
There are generally many options available when modeling a data structure, and once we have successfully fit a model, it is important to check its fit to data. It is also often necessary to compare the fits of different models.
Our basic approach for checking model fit is—as we have described in Sections 8.3–8.4 for simple regression models—to simulate replicated datasets from the fitted model and compare these to the observed data. We discuss the general approach in Section 24.1 and illustrate in Section 24.2 with an extended example of a set of models fit to an experiment in animal learning. The methods we demonstrate are not specific to multilevel models but become particularly important as models become more complicated.
Although the methods described here are quite simple, we believe that they are not used as often as they could be, possibly because standard statistical techniques were developed before the use of computer simulation. In addition, fitting multilevel models is a challenge, and users are often so relieved to have successfully fit a model with convergence that there is a temptation to stop and rest rather than check the model fit. Section 24.3 discusses some tools for comparing different models fit to the same data.
Posterior predictive checking is a useful direct way of assessing the fit of the model to various aspects of the data. Our goal here is not to compare or choose among models but rather to explore the ways in which any of the models being considered might be lacking.
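A minimal version of such a check can be sketched in a few lines. This Python example (the data and test statistic are invented for illustration; the book's own checks are done in R/Bugs) fits a normal model by point estimates, simulates replicated datasets, and compares a test statistic—here, the minimum—between the replications and the observed data:

```python
import random
from statistics import mean, stdev

random.seed(4)

# Made-up observed data: skewed, so a normal model should fit poorly
# in the lower tail
y = [random.expovariate(1.0) for _ in range(100)]

mu, s = mean(y), stdev(y)   # "fitted" normal model (point estimates only)

# Simulate replicated datasets from the fitted model and record the
# test statistic T = min(y) for each replication
T_obs = min(y)
T_rep = [min(random.gauss(mu, s) for _ in range(len(y)))
         for _ in range(1000)]

# Tail probability: how often the replicated minimum is at least as
# small as the observed minimum
p = mean(t <= T_obs for t in T_rep)
print(p)
```

A tail probability near 0 or 1 signals that the model does not reproduce this aspect of the data: here the exponential data are bounded below by zero, while the fitted normal model routinely generates negative minima, so p is close to 1. A fuller check would simulate parameter values from their posterior distribution rather than plugging in point estimates.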