Optimizing Large-Scale Educational Assessment with a “Divide-and-Conquer” Strategy: Fast and Efficient Distributed Bayesian Inference in IRT Models

Sainan Xu; Jing Lu; Jiwei Zhang; Chun Wang; Gongjun Xu

doi:10.1007/s11336-024-09978-1

Published online by Cambridge University Press: 01 January 2025

Sainan Xu: Northeast Normal University
Jing Lu*: Northeast Normal University
Jiwei Zhang*: Northeast Normal University
Chun Wang: University of Washington
Gongjun Xu: University of Michigan
*: Correspondence should be made to Jing Lu, Key Laboratory of Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China. Email: luj282@nenu.edu.cn
Correspondence should be made to Jiwei Zhang, Faculty of Education, Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun, Jilin, China. Email: zhangjw713@nenu.edu.cn

Abstract

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel “divide-and-conquer” parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
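
The divide-and-combine idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' IRT implementation: it assumes a much simpler model (a normal mean with known variance), and the function names, prior settings, and data below are hypothetical choices made for illustration. Each subset posterior is sampled independently, with the subset likelihood raised to a power so that its spread is comparable to that of the full-data posterior, a common device in divide-and-conquer Bayesian methods. The draws are then combined using the fact that, for a one-dimensional parameter, the Wasserstein-2 barycenter of several distributions has a quantile function equal to the average of their quantile functions.

```python
import random
import statistics


def subset_posterior_draws(y_subset, n_total, n_draws, rng,
                           sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Sample a subset posterior for a normal mean with known sigma.

    Assumption for illustration: the subset likelihood is raised to the
    power n_total / len(y_subset), so the subset posterior has roughly
    the same spread as the full-data posterior.
    """
    m = len(y_subset)
    power = n_total / m
    prior_prec = 1.0 / prior_sd ** 2
    like_prec = power * m / sigma ** 2  # precision of the powered likelihood
    post_prec = prior_prec + like_prec
    post_mean = (prior_prec * prior_mean
                 + like_prec * statistics.fmean(y_subset)) / post_prec
    post_sd = post_prec ** -0.5
    return [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]


def wasserstein_barycenter_1d(draw_sets):
    """Combine equal-sized draw sets by averaging their sorted values.

    Sorted draws act as empirical quantiles, and in one dimension the
    W2 barycenter's quantile function is the average of the inputs'.
    """
    sorted_sets = [sorted(draws) for draws in draw_sets]
    return [statistics.fmean(q) for q in zip(*sorted_sets)]


rng = random.Random(0)
n_total, k = 10_000, 10

# Hypothetical data: 10,000 observations with true mean 1.5.
y = [rng.gauss(1.5, 1.0) for _ in range(n_total)]

# "Divide": split into k equal subsets (each could run on its own worker).
subsets = [y[i::k] for i in range(k)]
draw_sets = [subset_posterior_draws(s, n_total, 2000, rng) for s in subsets]

# "Conquer": merge the subset posteriors into one approximate posterior.
combined = wasserstein_barycenter_1d(draw_sets)
combined_mean = statistics.fmean(combined)
combined_sd = statistics.stdev(combined)
```

In this toy setting the combined draws should center near the full-data sample mean, with a standard deviation near the full-data posterior value of about 0.01. For the multidimensional IRT parameters treated in the paper, the barycenter does not reduce to simple quantile averaging, so the combination step above should be read only as a one-dimensional illustration.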

Keywords

large-scale testing; item response theory; divide-and-conquer strategy; distributed Bayesian inference; Wasserstein posterior

Information

Type: Theory and Methods
Information: Psychometrika, Volume 89, Issue 4, December 2024, pp. 1119–1147

DOI: https://doi.org/10.1007/s11336-024-09978-1
Copyright: © 2024 The Author(s), under exclusive licence to The Psychometric Society

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09978-1.
