
Optimizing Large-Scale Educational Assessment with a “Divide-and-Conquer” Strategy: Fast and Efficient Distributed Bayesian Inference in IRT Models

Published online by Cambridge University Press: 01 January 2025

Sainan Xu, Northeast Normal University
Jing Lu*, Northeast Normal University
Jiwei Zhang*, Northeast Normal University
Chun Wang, University of Washington
Gongjun Xu, University of Michigan
* Correspondence should be made to Jing Lu, Key Laboratory of Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China. Email: luj282@nenu.edu.cn
* Correspondence should be made to Jiwei Zhang, Faculty of Education, Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun, Jilin, China. Email: zhangjw713@nenu.edu.cn

Abstract

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel “divide-and-conquer” parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
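The divide, sample, and combine pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' IRT implementation: it assumes a toy model (a normal mean with known variance and a conjugate normal prior) so that each subset posterior can be sampled exactly, and it assumes the one-dimensional case, where the 2-Wasserstein barycenter of the subset posteriors can be approximated by averaging their quantiles. All function names, the prior variance, and the subset count are hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def subset_posterior_draws(y_subset, n_total, seed, sigma=1.0,
                           prior_var=100.0, n_draws=2000):
    """Sample the subset posterior of a normal mean (toy model, assumed).

    The subset likelihood is raised to the power n_total / m so that the
    subset posterior has roughly the spread of the full-data posterior.
    """
    rng = np.random.default_rng(seed)
    m = len(y_subset)
    power = n_total / m
    precision = 1.0 / prior_var + power * m / sigma**2
    mean = (power * y_subset.sum() / sigma**2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)


def barycenter_quantiles(draws_by_subset, n_grid=999):
    """Average subset quantiles: the 1-D 2-Wasserstein barycenter (approx.)."""
    probs = np.linspace(0.001, 0.999, n_grid)
    quantiles = np.stack([np.quantile(d, probs) for d in draws_by_subset])
    return probs, quantiles.mean(axis=0)


# Simulated data standing in for a large data set (illustrative only).
data_rng = np.random.default_rng(0)
y = data_rng.normal(0.5, 1.0, size=10_000)

# Divide: split into K equal subsets, with an independent seed per worker.
K = 10
subsets = np.array_split(y, K)
seeds = np.random.SeedSequence(1).spawn(K)

# Conquer: sample every subset posterior in parallel.
with ThreadPoolExecutor(max_workers=K) as pool:
    draws = list(pool.map(
        lambda args: subset_posterior_draws(args[0], len(y), args[1]),
        zip(subsets, seeds),
    ))

# Combine: quantile averaging gives the approximate barycenter.
probs, bary_q = barycenter_quantiles(draws)
combined_median = bary_q[len(bary_q) // 2]

# Exact full-data posterior mean, available here only because the toy
# model is conjugate; used as a reference point.
full_precision = 1.0 / 100.0 + len(y) / 1.0**2
full_mean = (y.sum() / 1.0**2) / full_precision
```

In this toy setting the combined median should land close to the exact full-data posterior mean. In an IRT model the subset posteriors would instead be sampled by MCMC, and multivariate parameters would call for a general barycenter computation rather than quantile averaging.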

Type
Theory and Methods
Copyright
© 2024 The Author(s), under exclusive licence to The Psychometric Society


Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09978-1.

Supplementary material: File

Xu et al. supplementary materials