Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model

Published online by Cambridge University Press: 01 January 2025

Carolin Strobl*, Universität Zürich
Julia Kopf, Ludwig-Maximilians-Universität München
Achim Zeileis, Universität Innsbruck

* Requests for reprints should be sent to Carolin Strobl, Department of Psychology, Universität Zürich, Binzmühlestr. 14, 8050 Zürich, Switzerland. E-mail: Carolin.Strobl@psychologie.uzh.ch

Abstract

A variety of statistical methods have been suggested for detecting differential item functioning (DIF) in the Rasch model. Most of these methods are designed for the comparison of pre-specified focal and reference groups, such as males and females. Latent class approaches, on the other hand, allow the detection of previously unknown groups exhibiting DIF. However, these approaches provide no straightforward interpretation of the groups with respect to person characteristics. Here, we propose a new method for DIF detection based on model-based recursive partitioning that can be considered a compromise between these two extremes. With this approach, it is possible to detect groups of subjects exhibiting DIF that are not pre-specified but result from combinations of observed covariates. These groups are directly interpretable and can thus help generate hypotheses about the psychological sources of DIF. The statistical background and construction of the new method are introduced by means of an instructive example, and extensive simulation studies are presented to support and illustrate the statistical properties of the method, which is then applied to empirical data from a general knowledge quiz. A software implementation of the method is freely available in the R system for statistical computing.

Type: Original Paper
Copyright © 2013 The Psychometric Society
