How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis

Alberto Maydeu-Olivares; Rosa Montaño

doi:10.1007/s11336-012-9293-1

Published online by Cambridge University Press: 01 January 2025

Alberto Maydeu-Olivares*: Faculty of Psychology, University of Barcelona
Rosa Montaño: Universidad de Santiago de Chile
* Requests for reprints should be sent to Alberto Maydeu-Olivares, Faculty of Psychology, University of Barcelona, P. Valle de Hebrón, 171, 08035 Barcelona, Spain. E-mail: amaydeu@ub.edu

Abstract

We investigate the performance of three statistics, R₁, R₂ (Glas in Psychometrika 53:525–546, 1988), and M₂ (Maydeu-Olivares & Joe in J. Am. Stat. Assoc. 100:1009–1020, 2005, Psychometrika 71:713–732, 2006) to assess the overall fit of a one-parameter logistic model (1PL) estimated by (marginal) maximum likelihood (ML). R₁ and R₂ were specifically designed to target specific assumptions of Rasch models, whereas M₂ is a general purpose test statistic. We report asymptotic power rates under some interesting violations of model assumptions (different item discrimination, presence of guessing, and multidimensionality) as well as empirical rejection rates for correctly specified models and some misspecified models. All three statistics were found to be more powerful than Pearson’s X² against two- and three-parameter logistic alternatives (2PL and 3PL), and against multidimensional 1PL models. The results suggest that there is no clear advantage in using goodness-of-fit statistics specifically designed for Rasch-type models to test these models when marginal ML estimation is used.
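As a rough illustration of the baseline that the abstract compares against, the sketch below computes Pearson's X² for a 1PL model over all 2^n response patterns. It is not the authors' code and does not implement R₁, R₂, or M₂. The item intercepts, the standard normal latent trait, the rectangular quadrature grid, and every function name here are assumptions made for the example, and the parameters are treated as known rather than estimated by marginal ML.

```python
import itertools
import math


def irf_1pl(theta, intercept, slope=1.0):
    """1PL item response function: P(y = 1 | theta) with a common slope."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * theta)))


def pattern_probabilities(intercepts, slope=1.0, n_nodes=61, bound=6.0):
    """Marginal probability of each of the 2^n response patterns.

    Assumption: the latent trait is standard normal and is integrated out
    on an equally spaced grid (a crude rectangular rule, chosen for brevity).
    """
    step = 2.0 * bound / (n_nodes - 1)
    nodes = [-bound + k * step for k in range(n_nodes)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so the weights sum to 1

    probs = {}
    for pattern in itertools.product((0, 1), repeat=len(intercepts)):
        p = 0.0
        for t, w in zip(nodes, weights):
            cond = 1.0  # conditional probability of the pattern given theta
            for y, a in zip(pattern, intercepts):
                pi = irf_1pl(t, a, slope)
                cond *= pi if y == 1 else 1.0 - pi
            p += w * cond
        probs[pattern] = p
    return probs


def pearson_x2(observed_counts, probs):
    """Pearson's X^2 = N * sum over patterns of (p_obs - pi)^2 / pi."""
    n = sum(observed_counts.values())
    x2 = 0.0
    for pattern, pi in probs.items():
        p_obs = observed_counts.get(pattern, 0) / n
        x2 += n * (p_obs - pi) ** 2 / pi
    return x2


if __name__ == "__main__":
    # Hypothetical intercepts for three items; counts are made up.
    probs = pattern_probabilities([-1.0, 0.0, 1.0])
    counts = {pattern: 125 for pattern in probs}
    print(pearson_x2(counts, probs))
```

Because the statistic sums over every cell of the 2^n table, many expected counts become very small as the number of items grows. That sparseness is the usual motivation for limited-information statistics, which work from low-order margins instead of the full table.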

Keywords

discrete data; power; IRT; maximum likelihood

Type: Original Paper
Information: Psychometrika, Volume 78, Issue 1, January 2013, pp. 116–133

DOI: https://doi.org/10.1007/s11336-012-9293-1
Copyright: Copyright © The Psychometric Society

Footnotes

This research was supported by an ICREA-Academia Award and Grant SGR 2009 74 from the Catalan Government, and by Grants PSI2009-07726 and PR2010-0252 from the Spanish Ministry of Education awarded to the first author, and by a Dissertation Research Award of the Society of Multivariate Experimental Psychology awarded to the second author. The authors are indebted to the reviewers and to David Thissen for comments that improved the manuscript.

References

Agresti, A., & Yang, M. (1987). An empirical investigation of some effects of sparseness in contingency tables. Computational Statistics & Data Analysis, 5, 9–21.

Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

Bartholomew, D.J., & Leung, S.O. (2002). A goodness of fit test for sparse 2^p contingency tables. British Journal of Mathematical & Statistical Psychology, 55, 1–15.

Bartholomew, D., & Tzamourani, P. (1999). The goodness of fit of latent trait models in attitude measurement. Sociological Methods & Research, 27, 525–546.

Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.

Bock, R.D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197.

Cai, L., Maydeu-Olivares, A., Coffman, D.L., & Thissen, D. (2006). Limited information goodness of fit testing of item response theory models for sparse 2^p tables. British Journal of Mathematical & Statistical Psychology, 59, 173–194.

Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32.

De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational and Behavioral Statistics, 11, 183–196.

Fischer, G.H., & Molenaar, I.W. (1995). Rasch models: foundations, recent developments and applications. New York: Springer.

Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525–546.

Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635–659.

Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In Fischer, G.H., & Molenaar, I.W. (Eds.), Rasch models: foundations, recent developments and applications (pp. 69–96). New York: Springer.

Glas, C.A.W. (2009). Personal communication.

Irtel, H. (1995). An extension of the concept of specific objectivity. Psychometrika, 60, 115–118.

Joe, H., & Maydeu-Olivares, A. (2010). A general family of limited information goodness-of-fit statistics for multinomial data. Psychometrika, 75, 393–419.

Jöreskog, K.G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381–389.

Jöreskog, K.G., & Moustaki, I. (2001). Factor analysis of ordinal variables: a comparison of three approaches. Multivariate Behavioral Research, 36, 347–387.

Koehler, K., & Larntz, K. (1980). An empirical investigation of goodness of fit statistics for sparse multidimensional tables. Journal of the American Statistical Association, 75, 336–344.

Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.

Mathai, A.M., & Provost, S.B. (1992). Quadratic forms in random variables: theory and applications. New York: Marcel Dekker.

Maydeu-Olivares, A., & Joe, H. (2005). Limited and full information estimation and goodness-of-fit testing in 2ⁿ tables: a unified approach. Journal of the American Statistical Association, 100, 1009–1020.

Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit in multidimensional contingency tables. Psychometrika, 71, 713–732.

Maydeu-Olivares, A., & Joe, H. (2008). An overview of limited information goodness-of-fit testing in multidimensional contingency tables. In Shigemasu, K., Okada, A., Imaizumi, T., & Hoshino, T. (Eds.), New trends in psychometrics (pp. 253–262). Tokyo: Universal Academy Press.

Maydeu-Olivares, A., & Liu, Y. (2012). Item diagnostics in multivariate discrete data. Manuscript under review.

Mavridis, D., Moustaki, I., & Knott, M. (2007). Goodness-of-fit measures for latent variable models for binary data. In Lee, S.-Y. (Ed.), Handbook of latent variables and related models (pp. 135–162). Amsterdam: Elsevier.

McDonald, R.P. (1999). Test theory: a unified treatment. Mahwah: Lawrence Erlbaum.

Montaño, R. (2009). Una comparación de las estadísticas de bondad de ajuste R₁ y M₂ para modelos de la Teoría de Respuesta al Ítem [Comparing the R₁ and M₂ statistics for goodness of fit assessment in IRT models]. Unpublished Ph.D. dissertation, University of Barcelona.

Pfanzagl, J. (1993). A case of asymptotic equivalence between conditional and marginal maximum likelihood estimators. Journal of Statistical Planning and Inference, 35, 301–307.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Paedagogiske Institut.

Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.

Reiser, M. (2008). Goodness-of-fit testing using components based on marginal frequencies of multinomial data. British Journal of Mathematical & Statistical Psychology, 61, 331–360.

Satorra, A., & Saris, W.E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.

Suárez-Falcon, J.C., & Glas, C.A.W. (2003). Evaluation of global testing procedure for item fit to the Rasch model. British Journal of Mathematical & Statistical Psychology, 56, 127–143.

Swaminathan, H., Hambleton, R.K., & Rogers, H.J. (2007). Assessing the fit of item response models. In Rao, C.R., & Sinharay, S. (Eds.), Psychometrics (pp. 683–718). Amsterdam: Elsevier.

Teugels, J.L. (1990). Some representations of the multivariate Bernoulli and binomial distributions. Journal of Multivariate Analysis, 32, 256–268.

Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic models. Psychometrika, 47, 175–186.

van den Wollenberg, A.L. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–139.
