
Using EM Algorithm for Finite Mixtures and Reformed Supplemented EM for MIRT Calibration

Published online by Cambridge University Press: 01 January 2025

Ping Chen* (Beijing Normal University)
Chun Wang (University of Washington)

*Correspondence should be made to Ping Chen, Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, No. 19, Xin Jie Kou Wai Street, Hai Dian District, Beijing 100875, China. Email: pchen@bnu.edu.cn

Abstract

This study revisits parameter estimation in multidimensional item response theory (MIRT) and investigates computational details that have seldom been addressed when implementing the expectation-maximization (EM) algorithm for finite mixtures (EM–FM). It poses two research questions: Should rescaling be performed after each EM cycle or only after the final EM cycle? And how can the supplemented EM algorithm be adapted to the EM–FM framework to estimate the standard errors (SEs) of all unknown parameters? Analytic details of the methods are provided, and a comprehensive simulation study supplies supporting evidence. Results reveal that rescaling after each EM cycle accelerates convergence without affecting calibration accuracy. Moreover, the SEs of all model parameters, including item parameters and population mixing proportions, are recovered well when the sample size is relatively large (e.g., 2000).
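To make the rescaling question concrete, the following is a minimal sketch of what a rescaling step could look like when the latent distribution is represented by discrete support points with mixing proportions. It is not taken from the paper: the function name `rescale_after_cycle`, the per-dimension standardization, and the slope-intercept parameterization (logit = a·θ + d) are all assumptions made for illustration. The sketch standardizes each latent dimension to mean 0 and variance 1 under the current mixing proportions and adjusts the item parameters so that every item's logit at every support point is unchanged.

```python
import numpy as np

def rescale_after_cycle(points, weights, a, d):
    """Hypothetical rescaling step (illustrative, not the authors' code).

    points  : (K, D) support points of the discrete latent distribution
    weights : (K,)   mixing proportions, assumed to sum to 1
    a       : (J, D) item slopes
    d       : (J,)   item intercepts, with logit = a @ theta + d (assumed form)
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)

    # Weighted mean and standard deviation of each latent dimension.
    mu = weights @ points
    sigma = np.sqrt(weights @ (points - mu) ** 2)

    # Standardize the support points dimension by dimension.
    new_points = (points - mu) / sigma

    # Compensate in the item parameters so the logits stay the same:
    # a @ theta + d == (a * sigma) @ theta_std + (d + a @ mu).
    new_a = a * sigma
    new_d = d + a @ mu
    return new_points, new_a, new_d
```

Under these assumptions, calling the function once after the final EM cycle or once after every cycle leaves the model-implied response probabilities unchanged. The abstract's question concerns how that choice affects convergence speed and calibration accuracy.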

Type: Theory and Methods
Copyright: © 2021 The Psychometric Society

