Paradoxical Results in Multidimensional Item Response Theory

Published online by Cambridge University Press:  01 January 2025

Giles Hooker*, Cornell University
Matthew Finkelman, Tufts University School of Dental Medicine
Armin Schwartzman, Harvard School of Public Health

* Requests for reprints should be sent to Giles Hooker, Cornell University, Ithaca, NY 14853, USA. E-mail: giles.hooker@cornell.edu

Abstract

In multidimensional item response theory (MIRT), it is possible for the estimate of a subject’s ability in some dimension to decrease after they have answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.
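
As a rough illustration of the paradox described in the abstract, the sketch below assumes a two-dimensional linearly compensatory logistic model in which the probability of a correct response is sigmoid(a1*theta1 + a2*theta2 - b). The item loadings, difficulties, and response patterns are hypothetical and are not taken from the paper; the Newton-Raphson routine is simply one convenient way to compute the maximum likelihood estimate. It compares the estimate when the last item is answered incorrectly with the estimate when it is answered correctly.

```python
import math

# Hypothetical 2-D test (not from the paper): each item is ((a1, a2), b),
# i.e. loadings on the two ability dimensions and a difficulty.
ITEMS = [
    ((1.0, 0.0), 0.0),
    ((1.0, 0.0), 0.0),
    ((1.0, 1.0), 0.0),
    ((1.0, 1.0), 0.0),
    ((0.0, 1.0), 0.0),
    ((0.0, 1.0), 0.0),
    ((0.0, 1.0), 0.0),  # the item whose response is flipped below
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mle(responses, iters=100, tol=1e-12):
    """Newton-Raphson MLE of (theta1, theta2) for the assumed model."""
    t1, t2 = 0.0, 0.0
    for _ in range(iters):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for ((a1, a2), b), y in zip(ITEMS, responses):
            p = sigmoid(a1 * t1 + a2 * t2 - b)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g1 += a1 * (y - p)
            g2 += a2 * (y - p)
            # Information matrix (negative Hessian).
            h11 += a1 * a1 * w
            h12 += a1 * a2 * w
            h22 += a2 * a2 * w
        det = h11 * h22 - h12 * h12
        # Newton step: solve the 2x2 system H d = g by hand.
        d1 = (h22 * g1 - h12 * g2) / det
        d2 = (h11 * g2 - h12 * g1) / det
        t1 += d1
        t2 += d2
        if abs(d1) + abs(d2) < tol:
            break
    return t1, t2


# Same answers on items 1-6; only the last response differs.
wrong = mle([1, 0, 1, 0, 1, 0, 0])
right = mle([1, 0, 1, 0, 1, 0, 1])

print("last item incorrect:", wrong)
print("last item correct:  ", right)
print("theta1 drops after a correct answer:", right[0] < wrong[0])
```

In this toy setup the flipped item loads only on the second dimension, while two other items load on both. A correct answer raises the estimate of theta2, and the fit to the shared items is then restored by lowering theta1, which is the kind of compensation the abstract refers to.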

Type: Theory and Methods
Copyright © 2009 The Psychometric Society
