Item Screening in Graphical Loglinear Rasch Models

Published online by Cambridge University Press:  01 January 2025

Svend Kreiner*
Affiliation:
University of Copenhagen
Karl Bang Christensen
Affiliation:
University of Copenhagen
* Requests for reprints should be sent to Svend Kreiner, Department of Biostatistics, University of Copenhagen, Oster Farimagsgade 5, B, POB 2029, 1014 Copenhagen K, Denmark. E-mail: s.kreiner@biostat.ku.dk

Abstract

In behavioural sciences, local dependence and DIF are common, and purification procedures that eliminate items with these weaknesses often result in short scales with poor reliability. Graphical loglinear Rasch models (Kreiner & Christensen, in Statistical Methods for Quality of Life Studies, ed. by M. Mesbah, F.C. Cole & M.T. Lee, Kluwer Academic, pp. 187–203, 2002), in which uniform DIF and uniform local dependence are permitted, solve this dilemma by modelling the local dependence and DIF. Identifying loglinear Rasch models by a stepwise model search is often very time-consuming, since the initial item analysis may disclose a great deal of spurious and misleading evidence of DIF and local dependence that has to be disposed of during the modelling procedure.

Like graphical models, graphical loglinear Rasch models possess Markov properties that are useful during the statistical analysis if they are used methodically. This paper describes how. It contains a systematic study of the Markov properties and the way they can be used to distinguish spurious from genuine evidence of DIF and local dependence and proposes a strategy for initial item screening that will reduce the time needed to identify a graphical loglinear Rasch model that fits the item responses. The last part of the paper illustrates the item screening procedure on simulated data and on data on the PF subscale measuring physical functioning in the SF36 Health Survey inventory.

Type
Original Paper
Copyright
Copyright © 2011 The Psychometric Society

References

Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Agresti, A. (1984). Analysis of ordinal categorical data, New York: Wiley.
Andersen, E.B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69–81.
Anderson, C.J., Böckenholt, U. (2000). Graphical regression models for polytomous variables. Psychometrika, 65, 497–509.
Anderson, C.J., Yu, H.-T. (2007). Log-multiplicative association models as item response models. Psychometrika, 72, 5–23.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–158.
Bartolucci, F., Forcina, A. (2005). Likelihood inference on the underlying structure of IRT models. Psychometrika, 70, 31–44.
Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.
Besag, J., Clifford, P. (1991). Sequential Monte Carlo p-values. Biometrika, 78, 301–304.
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W. (1975). Discrete multivariate analysis: theory and practice, Cambridge: MIT Press.
Christensen, K.B., Kreiner, S. (2007). A Monte Carlo approach to unidimensionality testing in polytomous Rasch models. Journal of Applied Psychological Measurement, 31, 20–30.
Clauser, B., Mazor, K.M., Hambleton, R.K. (1994). The effect of score group width on the Mantel–Haenszel procedure. Journal of Educational Measurement, 31, 67–78.
Davis, J.A. (1967). A partial coefficient for Goodman and Kruskal’s Gamma. Journal of the American Statistical Association, 69, 174–180.
Dawid, A.P. (1979). Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society, Series A, 147, 278–292.
Fayers, P.M., Machin, D. (2007). Quality of life: the assessment, analysis, and interpretation of patient reported outcomes (2nd ed.). Chichester: Wiley.
Fidalgo, A.M., Mellenbergh, G.J., Muniz, J. (2000). Effects of DIF, test length, and purification type on robustness and power of Mantel–Haenszel procedures. Methods of Psychological Research Online, 5, 43–53.
Fischer, G.H. (1995). The derivation of polytomous Rasch models. In Fischer, G.H., Molenaar, I.W. (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 293–306). New York: Springer.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: comparison with Mantel–Haenszel, SIBTEST and the IRT Likelihood Ratio. Applied Psychological Measurement, 29, 278–295.
Frank, O., Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832–842.
French, B.F., Maller, S.J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373–393.
Hagenaars, J.A. (1998). Categorical causal modelling: latent class analysis and directed Log-linear models with latent variables. Sociological Methods and Research, 26, 436–486.
Hanson, B.A. (1998). Uniform DIF and DIF defined by differences in item response functions. Journal of Educational and Behavioral Statistics, 23, 244–253.
Holland, P.W. (1981). When are item response models consistent with observed data. Psychometrika, 46, 79–92.
Holland, P.W., Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possible nonparallel test. Psychometrika, 68, 123–150.
Holland, P.W., Rosenbaum, P.R. (1986). Conditional association and unidimensionality in monotone latent variable models. Annals of Statistics, 14, 1523–1543.
Holland, P.W., Thayer, D.T. (1988). Differential item performance and the Mantel–Haenszel procedure. In Wainer, H., Braun, H. (Eds.), Test validity (pp. 129–145). Hillsdale: Lawrence Erlbaum Associates.
Hoskens, M., De Boeck, P. (1997). A parametric model for local dependence among test items. Psychological Methods, 2, 261–277.
Humphreys, K., Titterington, D.M. (2003). Variational approximations for categorical causal modelling with latent variables. Psychometrika, 68, 391–412.
Ip, E.H. (2001). Testing for local dependence in dichotomous item response models. Psychometrika, 66, 109–132.
Ip, E.H. (2002). Locally dependent latent trait model and the Dutch Identity revisited. Psychometrika, 67, 367–386.
Junker, B.W. (1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359–1378.
Junker, B.W., Sijtsma, K. (2000). Latent and manifest monotonicity in item response models. Applied Psychological Measurement, 24, 65–81.
Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, 223–245.
Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681–697.
Kelderman, H. (1992). Computing maximum likelihood estimates of loglinear models from marginal sums with special attention to loglinear item response theory. Psychometrika, 57, 437–450.
Kelderman, H. (2005). Building IRT models from scratch: Graphical models, exchangeability, marginal freedom, scale type, and latent traits. In van der Ark, A., Croon, M.A., Sijtsma, K. (Eds.), New developments in categorical data analysis for the social and behavioural Sciences (pp. 167–187). Hillsdale: Lawrence Erlbaum.
Kreiner, S. (1986). Computerized exploratory screening of large-dimensional contingency tables. In De Antoni, F., Lauro, N., Rizzi, A. (Eds.), COMPSTAT 1986 (pp. 43–48). Heidelberg: Physica Verlag.
Kreiner, S. (1987). Analysis of multidimensional contingency tables by exact conditional tests: Techniques and strategies. Scandinavian Journal of Statistics, 14, 97–112.
Kreiner, S. (1993/2006). Validation of index scales for analysis of survey data. In Dean, K. (Ed.), Population health research (pp. 116–144). London: Sage Publications. Reprinted in D.J. Bartholomew (Ed.) (2006), Measurement, vol. III (pp. 297–328). London: Sage Publications.
Kreiner, S. (2003). Introduction to DIGRAM (Research report 03/10). Copenhagen: Dept. of Biostatistics, Univ. of Copenhagen.
Kreiner, S. (2007). Validity and objectivity: reflections on the role and nature of Rasch models. Nordic Psychology, 59, 268–298.
Kreiner, S., Christensen, K.B. (2002). Graphical Rasch models. In Mesbah, M., Cole, F.C., Lee, M.T. (Eds.), Statistical methods for quality of life studies (pp. 187–203). Dordrecht: Kluwer Academic.
Kreiner, S., Christensen, K.B. (2004). Analysis of local dependence and multidimensionality in graphical loglinear Rasch models. Communications in Statistics. Theory and Methods, 33, 1239–1276.
Kreiner, S., Christensen, K.B. (2006). Validity and objectivity in health related summated scales: Analysis by graphical loglinear Rasch models. In von Davier, M., Carstensen, C.H. (Eds.), Multivariate and mixture distribution Rasch models—extensions and applications (pp. 329–346). New York: Springer.
Kreiner, S., Pedersen, J.H., & Siersma, V. (2009). Derivation and testing hypotheses in chain graph models (Research report 09/9). Copenhagen: Dept. of Biostatistics, University of Copenhagen. Retrieved from http://biostat.ku.dk/reports/2009/Research_report_09-09.pdf
Lauritzen, S.L. (1996). Graphical models, Oxford: Clarendon Press.
Lord, F.M. (1980). Applications of item response theory to practical testing problems, Hillsdale: Lawrence Erlbaum.
Mazor, K.M., Clauser, B.E., Hambleton, R.K. (1992). The effect of sample size on the functioning of the Mantel–Haenszel statistic. Educational and Psychological Measurement, 52, 443–451.
Mellenbergh, G.J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–108.
Park, D.G., Lautenschlager, G.J. (1990). Improving IRT item bias with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 1163–1173.
Penfield, R.D. (2001). Assessing differential item functioning among multiple groups: A comparison of three Mantel–Haenszel procedures. Applied Measurement in Education, 14, 235–259.
Penfield, R.D., Camilli, G. (2007). Differential item functioning and item bias. In Rao, C.R., Sinharay, S. (Eds.), Handbook of statistics: psychometrics (pp. 125–168). Amsterdam: Elsevier.
Raju, N.S., Drasgow, F., Slinde, J.A. (1993). An empirical comparison of the area methods, Lord’s chi-square test, and the Mantel–Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53, 301–315.
Rasch, G. (1961/2006). On general laws and the meaning of measurement in psychology. In Neyman, J. (Ed.), Proceedings of the 4th Berkeley symposium on mathematical statistics and probability (pp. 321–333). Berkeley: University of California Press. Reprinted in D.J. Bartholomew (Ed.). Measurement, vol. I (pp. 319–334). London: Sage Publications.
Rijmen, F., Vansteelandt, K., De Boeck, P. (2008). Latent class models for diary method data: Parameter estimation by local computations. Psychometrika, 73, 167–182.
Rosenbaum, P.R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425–435.
Rosenbaum, P.R. (1988). Item Bundles. Psychometrika, 53, 349–359.
Rosenbaum, P.R. (1989). Criterion-related construct validity. Psychometrika, 54, 625–633.
Sue, Y.-H., Wang, W.-C. (2005). Efficiency of the Mantel, Generalized Mantel–Haenszel, and logistic discriminant function analysis methods in detecting differential item functioning for polytomous items. Applied Measurement in Education, 18, 313–350.
Swaminathan, H., Rogers, J.H. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Tjur, T. (1982). A connection between Rasch’s item analysis model and a multiplicative Poisson model. Scandinavian Journal of Statistics, 9, 23–30.
Van der Ark, L.A., Bergsma, W.P. (2010). A Note on stochastic ordering of the latent trait using the sum of polytomous item scores. Psychometrika, 75, 272–279.
Williams, N.J., Beretvas, S.N. (2006). DIF identification using HGLM for polytomous items. Applied Psychological Measurement, 30, 22–42.
Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF), Ottawa: Directorate of Human Resources Research and Evaluation, National Defence.