A Bootstrap Method for Conducting Statistical Inference with Clustered Data

Published online by Cambridge University Press: 25 January 2021

Jeffrey J. Harden*
Affiliation:
University of North Carolina at Chapel Hill, USA
*Jeffrey J. Harden, University of North Carolina at Chapel Hill, Department of Political Science, 312 Hamilton Hall, CB #3265, Chapel Hill, NC 27599. Email: jjharden@unc.edu

Abstract

U.S. state politics researchers often analyze data with observations grouped into clusters. This structure commonly produces unmodeled correlation within clusters, leading to downward bias in the standard errors of regression coefficients. Estimating robust cluster standard errors (RCSE) is a common approach to correcting this bias. However, despite their frequent use, recent work indicates that RCSE can also be biased downward. Here the author provides evidence of that bias and offers a potential solution. Through Monte Carlo simulation of an ordinary least squares (OLS) regression model, the author compares conventional standard error (OLS-SE) and RCSE performance to that of a bootstrap method that resamples clusters of observations (BCSE). The author shows that both OLS-SE and RCSE are biased downward, with OLS-SE being the most biased. In contrast, BCSE are not biased and consistently outperform the other two methods. The author concludes with three replications from recent work and offers recommendations to researchers.
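
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's code (the reference list suggests the original analysis used R and Stata), and every name in it (`fit_ols`, `cluster_bootstrap_se`, the simulated data and its parameters) is an assumption made for illustration. It fits a one-regressor OLS model to simulated clustered data, computes the conventional standard error, and then estimates a bootstrap cluster standard error by resampling whole clusters with replacement and taking the standard deviation of the re-estimated slopes.

```python
import math
import random
import statistics


def fit_ols(x, y):
    """Return (slope, conventional SE) for y = a + b*x fitted by OLS."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def cluster_bootstrap_se(x, y, cluster_ids, n_reps=500, seed=1):
    """Bootstrap SE of the slope, resampling entire clusters (hypothetical helper)."""
    rng = random.Random(seed)
    # Map each cluster label to the indices of its observations.
    members = {}
    for i, g in enumerate(cluster_ids):
        members.setdefault(g, []).append(i)
    labels = list(members)
    slopes = []
    for _ in range(n_reps):
        # Draw as many clusters as in the data, with replacement.
        drawn = rng.choices(labels, k=len(labels))
        idx = [i for g in drawn for i in members[g]]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if max(bx) == min(bx):
            continue  # skip a degenerate resample with no variation in x
        slopes.append(fit_ols(bx, by)[0])
    return statistics.stdev(slopes)


# Simulated data (an assumption, not the article's design): 40 clusters of 25
# observations, a cluster-level regressor, and a shared cluster error term
# that induces within-cluster correlation.
rng = random.Random(42)
x, y, cluster_ids = [], [], []
for g in range(40):
    x_g = rng.gauss(0, 1)  # regressor varies only across clusters
    u_g = rng.gauss(0, 1)  # unmodeled cluster-level error
    for _ in range(25):
        x.append(x_g)
        y.append(1.0 + 0.5 * x_g + u_g + rng.gauss(0, 1))
        cluster_ids.append(g)

slope, ols_se = fit_ols(x, y)
bcse = cluster_bootstrap_se(x, y, cluster_ids)
print(f"slope={slope:.3f}  OLS-SE={ols_se:.3f}  BCSE={bcse:.3f}")
```

In a setup like this, the conventional standard error treats all 1,000 observations as independent, so it should come out noticeably smaller than the cluster bootstrap estimate. The sketch does not implement RCSE, and a single simulated data set does not reproduce the Monte Carlo evidence the abstract reports.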

Type
Research Article
Copyright
Copyright © The Author(s) 2011

