
Multiple-Choice Tests: Polytomous IRT Models Misestimate Item Information

Published online by Cambridge University Press: 18 December 2014

Miguel A. García-Pérez*
Affiliation:
Universidad Complutense (Spain)
*Correspondence concerning this article should be sent to Miguel A. García-Pérez. Departamento de Metodología. Facultad de Psicología. Universidad Complutense. Campus de Somosaguas. 28223. Madrid (Spain). Phone: +34–913943061. Fax: +34–913943189. E-mail: miguel@psi.ucm.es

Abstract

Likert-type items and polytomous models are preferred over yes–no items and dichotomous models for the measurement of attitudes, because a broader range of response categories provides superior item and test information functions. Yet, for ability assessment with multiple-choice tests, the dichotomous three-parameter logistic model (3PLM) is often chosen. Because multiple-choice responses are polytomous before they are categorized as correct or incorrect, a polytomous characterization might render more efficient tests. Early studies suggested that the nominal response model (NRM) is advantageous in this respect. We investigate the reasons for those results and the outcomes of a polytomous characterization based on the multiple-choice model (MCM). An empirical data set is used to compare polytomous (NRM and MCM) and dichotomous (3PLM) characterizations of a test. The results revealed superior item and test information functions from polytomous models. Yet, close inspection suggests that these outcomes are artifactual and two simulation studies confirmed this point. These studies revealed a structural inadequacy of the NRM for multiple-choice items and that the MCM characterization outperforms the 3PLM characterization only when distractor endorsement frequencies vary non-monotonically with ability, although this feature is rarely observed in empirical data sets.
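As a rough illustration of the quantities compared in the abstract, the following sketch computes item information under the 3PLM and under the NRM using the standard textbook formulas (logistic metric, no scaling constant). It is not the author's code, and all parameter values are hypothetical. The last function shows how much information remains when NRM responses are collapsed to correct/incorrect, which is the sense in which a polytomous characterization can appear more informative on paper; the abstract's point is that such gains may be artifactual when the model does not fit multiple-choice data.

```python
import math


def p_3pl(theta, a, b, c):
    """3PLM probability of a correct response (logistic metric, D = 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_3pl(theta, a, b, c):
    """3PLM item information: a^2 * (Q / P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def nrm_probs(theta, slopes, intercepts):
    """NRM category probabilities: softmax of a_k * theta + c_k."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    z_max = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def info_nrm(theta, slopes, intercepts):
    """NRM item information: variance of the slopes under P_k(theta)."""
    p = nrm_probs(theta, slopes, intercepts)
    mean_a = sum(pk * a for pk, a in zip(p, slopes))
    return sum(pk * (a - mean_a) ** 2 for pk, a in zip(p, slopes))


def info_nrm_dichotomized(theta, slopes, intercepts, key=0):
    """Information left when NRM responses are scored correct/incorrect.

    `key` is the index of the correct option (an assumption of this sketch).
    """
    p = nrm_probs(theta, slopes, intercepts)
    mean_a = sum(pk * a for pk, a in zip(p, slopes))
    p_correct = p[key]
    slope_correct = p_correct * (slopes[key] - mean_a)  # dP_correct / dtheta
    return slope_correct ** 2 / (p_correct * (1.0 - p_correct))


if __name__ == "__main__":
    # Hypothetical four-option item; option 0 is the correct answer.
    slopes = [1.5, -0.2, -0.5, -0.8]
    intercepts = [0.0, 0.5, 0.0, -0.5]
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        poly = info_nrm(theta, slopes, intercepts)
        dich = info_nrm_dichotomized(theta, slopes, intercepts)
        three = info_3pl(theta, a=1.5, b=0.0, c=0.25)
        print(f"theta={theta:+.1f}  NRM={poly:.4f}  "
              f"NRM dichotomized={dich:.4f}  3PLM={three:.4f}")
```

Under a single NRM, the polytomous information is never smaller than the dichotomized information; the difference is largest at ability levels where the distractors differ most in how they relate to ability.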

Type
Research Article
Copyright
Copyright © Universidad Complutense de Madrid and Colegio Oficial de Psicólogos de Madrid 2014 

