On the Quantification of Model Uncertainty: A Bayesian Perspective

Published online by Cambridge University Press:  01 January 2025

David Kaplan*
Affiliation:
University of Wisconsin–Madison
*Correspondence should be made to David Kaplan, University of Wisconsin–Madison, Madison, USA. Email: dkaplan@education.wisc.edu

Abstract

Issues of model selection have dominated the theoretical and applied statistical literature for decades. Model selection methods such as ridge regression, the lasso, and the elastic net have replaced ad hoc methods such as stepwise regression as a means of model selection. In the end, however, these methods lead to a single final model that is often taken to be the model considered ahead of time, thus ignoring the uncertainty inherent in the search for a final model. One method that has enjoyed a long history of theoretical developments and substantive applications, and that accounts directly for uncertainty in model selection, is Bayesian model averaging (BMA). BMA addresses the problem of model selection by not selecting a final model, but rather by averaging over a space of possible models that could have generated the data. The purpose of this paper is to provide a detailed and up-to-date review of BMA with a focus on its foundations in Bayesian decision theory and Bayesian predictive modeling. We consider the selection of parameter and model priors as well as methods for evaluating predictions based on BMA. We also consider important assumptions regarding BMA and extensions of model averaging methods to address these assumptions, particularly the method of Bayesian stacking. Simple empirical examples are provided and directions for future research relevant to psychometrics are discussed.
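
As a brief illustration of what "averaging over a space of possible models" means, the standard BMA predictive distribution can be sketched as follows. The notation is an assumption introduced here and does not come from the abstract: $M_1, \dots, M_K$ denote the candidate models, $y$ the observed data, $\tilde{y}$ a future observation, and $\theta_k$ the parameters of model $M_k$.

```latex
% BMA predictive distribution: a mixture of the model-specific
% predictive distributions, weighted by posterior model probabilities.
p(\tilde{y} \mid y) = \sum_{k=1}^{K} p(\tilde{y} \mid M_k, y)\, p(M_k \mid y)

% Posterior model probability, from Bayes' theorem over the model space.
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(y \mid M_l)\, p(M_l)}

% Marginal likelihood of model M_k, integrating over its parameter prior.
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
```

Read this way, the parameter priors $p(\theta_k \mid M_k)$ and the model priors $p(M_k)$ are the two prior choices the abstract refers to, and selecting a single final model corresponds to placing all of the weight on one $M_k$.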

Type
Application Reviews and Case Studies
Copyright
Copyright © 2021 The Psychometric Society
