Two Wrongs Make a Right: Addressing Underreporting in Binary Data from Multiple Sources

Published online by Cambridge University Press: 11 April 2017

Scott J. Cook*
Affiliation:
Department of Political Science, Texas A&M University, College Station, TX 77843, USA. Email: sjcook@tamu.edu
Betsabe Blas
Affiliation:
Department of Statistics, Federal University of Pernambuco, University City, Brazil 50670-901
Raymond J. Carroll
Affiliation:
Department of Statistics, Texas A&M University, College Station, TX 77843, USA; School of Mathematical Sciences, University of Technology Sydney, Sydney 2007, Australia
Samiran Sinha
Affiliation:
Department of Statistics, Texas A&M University, College Station, TX 77843, USA

Abstract

Media-based event data (i.e., data compiled from reporting by media outlets) are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risk inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and for each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories about both the causes of an event of interest and the reporting on it. Simulations show that our technique regularly outperforms current strategies that neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
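
To make the structure described in the abstract concrete, the following is a minimal sketch of a two-source underreporting likelihood. It is not taken from the article or its replication materials. It assumes, for illustration only, that (a) sources never report events that did not occur, (b) the two sources report independently conditional on a true event, and (c) both the true-event model and the reporting models are logistic. All names (`cell_probabilities`, `log_likelihood`, `beta`, `gamma1`, `gamma2`, `X`, `W`) are hypothetical.

```python
import numpy as np

def sigmoid(t):
    """Logistic link (an assumption of this sketch)."""
    return 1.0 / (1.0 + np.exp(-t))

def cell_probabilities(p_event, q1, q2):
    """Probability of each report pattern (y1, y2) for two sources.

    p_event: probability that a true event occurred
    q1, q2:  probability that source 1 / source 2 reports a true event
    Assumes no false positives and conditionally independent reporting.
    """
    return {
        (1, 1): p_event * q1 * q2,
        (1, 0): p_event * q1 * (1 - q2),
        (0, 1): p_event * (1 - q1) * q2,
        # No report: either nothing happened, or both sources missed it.
        (0, 0): (1 - p_event) + p_event * (1 - q1) * (1 - q2),
    }

def log_likelihood(beta, gamma1, gamma2, X, W, y1, y2):
    """Log-likelihood with separate predictors for events (X) and reporting (W)."""
    p = sigmoid(X @ beta)      # true-event model
    q1 = sigmoid(W @ gamma1)   # reporting model, source 1
    q2 = sigmoid(W @ gamma2)   # reporting model, source 2
    ll = np.zeros(len(y1))
    for (a, b), prob in cell_probabilities(p, q1, q2).items():
        mask = (y1 == a) & (y2 == b)
        ll[mask] = np.log(prob[mask])
    return ll.sum()
```

Under these assumptions, the log-likelihood could be passed (negated) to a general-purpose numerical optimizer to estimate the event and reporting coefficients jointly. The estimator proposed in the article may differ in its link functions, dependence structure, and number of sources.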

Type
Articles
Copyright
Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology. 

Footnotes

Authors’ note: For their helpful comments and suggestions, thanks to Kenneth Benoit, Graeme Blair, Chad Hazlett, Florian Hollenbach, Idean Salehyan, Nils Weidmann, the reviewers, and the editor(s). Replication materials are available online at Cook et al. (2016). All inquiries should be sent to the corresponding author at sjcook@tamu.edu.

Contributing Editor: R. Michael Alvarez

References

Abrevaya, Jason, and Hausman, Jerry A. 1999. Semiparametric estimation with mismeasured dependent variables: An application to duration models for unemployment spells. Annales d’Economie et de Statistique 55/56:243–275.
Achen, Christopher H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political Science 5(1):423–450.
Carroll, Raymond J., Ruppert, David, Stefanski, Leonard A., and Crainiceanu, Ciprian M. 2006. Measurement error in nonlinear models: A modern perspective. Boca Raton, FL: CRC Press.
Carroll, Raymond J., and Pederson, Shane. 1993. On robustness in the logistic regression model. Journal of the Royal Statistical Society, Series B 55:693–706.
Cook, Scott, Blas, Betsabe, Carroll, Raymond, and Sinha, Samiran. 2016. Replication data for: Two wrongs make a right. doi:10.7910/DVN/92GMLB, Harvard Dataverse.
Copas, J. B. 1988. Binary regression models for contaminated data. Journal of the Royal Statistical Society, Series B 50:225–265.
Davenport, Christian. 2007. State repression and political order. Annual Review of Political Science 10:1–23.
Davenport, Christian, and Ball, Patrick. 2002. Views to a kill: Exploring the implications of source selection in the case of Guatemalan state terror, 1977–1995. Journal of Conflict Resolution 46(3):427–450.
Earl, Jennifer, Martin, Andrew, McCarthy, John D., and Soule, Sarah A. 2004. The use of newspaper data in the study of collective action. Annual Review of Sociology 30:65–80.
Hausman, Jerry A., Abrevaya, Jason, and Scott-Morton, Fiona M. 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87(2):239–269.
Heckman, James J. 1977. Sample selection bias as a specification error (with an application to the estimation of labor supply functions).
Hendrix, Cullen S., and Salehyan, Idean. 2015. No news is good news: Mark and recapture for event data when reporting probabilities are less than one. International Interactions 41(2):392–406.
Hendrix, Cullen S., and Salehyan, Idean. 2016. A house divided: Threat perception, military factionalism, and repression in Africa. Journal of Conflict Resolution (forthcoming).
Hug, Simon. 2003. Selection bias in comparative research: The case of incomplete data sets. Political Analysis 11(3):255–274.
Hug, Simon. 2009. The effect of misclassifications in probit models: Monte Carlo simulations and applications. Political Analysis 18(1):78–102.
Hug, Simon, and Wisler, Dominique. 1998. Correcting for selection bias in social movement research. Mobilization: An International Quarterly 3(2):141–161.
Imai, Kosuke, and Yamamoto, Teppei. 2010. Causal inference with differential measurement error: Nonparametric identification and sensitivity analysis. American Journal of Political Science 54(2):543–560.
Mackenzie, Darryl I., Nichols, James D., Royle, J. Andrew, Pollock, Kenneth H., Bailey, Larissa L., and Hines, James E. 2006. Occupancy estimation and modeling: Inferring patterns and dynamics of species occurrence. San Diego, CA: Elsevier Academic Press.
Maddala, Gangadharrao S. 1983. Limited-dependent and qualitative variables in econometrics. Number 1. New York: Cambridge University Press.
Poe, Steven C., and Tate, C. Neal. 1994. Repression of human rights to personal integrity in the 1980s: A global analysis. American Political Science Review 88(4):853–872.
Poe, Steven C., Tate, C. Neal, and Keith, Linda Camp. 1999. Repression of the human right to personal integrity revisited: A global cross-national study covering the years 1976–1993. International Studies Quarterly 43(2):291–313.
Poe, Steven C., Rost, Nicolas, and Carey, Sabine C. 2006. Assessing risk and opportunity in conflict studies: A human rights analysis. Journal of Conflict Resolution 50(4):484–507.
Salehyan, Idean, Hendrix, Cullen S., Hamner, Jesse, Case, Christina, Linebarger, Christopher, Stull, Emily, and Williams, Jennifer. 2012. Social conflict in Africa: A new database. International Interactions 38(4):503–511.
Schrodt, Philip A. 2012. Precedents, progress, and prospects in political event data. International Interactions 38(4):546–569.
Schrodt, Philip A., and Gerner, Deborah J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–92. American Journal of Political Science 38(3):825–854.
Strange, Austin M., Park, Bradley, Tierney, Michael J., Fuchs, Andreas, Dreher, Axel, and Ramachandran, Vijaya. 2013. China’s development finance to Africa: A media-based approach to data collection. Center for Global Development Working Paper No. 323.
Trumbore, Peter F., and Woo, Byungwon. 2014. Smugglers blues: Examining why countries become narcotics transit states using the new international narcotics production and transit (INAPT) data set. International Interactions 40(5):763–787.
Weidmann, Nils B. 2014. On the accuracy of media-based conflict event data. Journal of Conflict Resolution 59(6):1129–1149.
Weidmann, Nils B. 2016. A closer look at reporting bias in conflict event data. American Journal of Political Science 60(1):206–218.
Woolley, John T. 2000. Using media-based data in studies of politics. American Journal of Political Science 44(1):156–173.
Supplementary material

Cook supplementary material 1 (File, 217 KB)