Item Response Theory Observed-Score Kernel Equating

Published online by Cambridge University Press:  01 January 2025

Björn Andersson*
Affiliation:
Beijing Normal University; Uppsala University
Marie Wiberg
Affiliation:
Umeå University
* Correspondence should be made to Björn Andersson, Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, No. 19 Xinjiekou Wai Street, Haidian District, 100875 Beijing, China. Email: bjoern.andersson@bnu.edu.cn

Abstract

Item response theory (IRT) observed-score kernel equating is introduced for the non-equivalent groups with anchor test equating design using either chain equating or post-stratification equating. The equating function is treated in a multivariate setting and the asymptotic covariance matrices of IRT observed-score kernel equating functions are derived. Equating is conducted using the two-parameter and three-parameter logistic models with simulated data and data from a standardized achievement test. The results show that IRT observed-score kernel equating offers small standard errors and low equating bias under most settings considered.

Type
Original Paper
Copyright
Copyright © 2016 The Psychometric Society

Footnotes

Electronic supplementary material: The online version of this article (doi:10.1007/s11336-016-9528-7) contains supplementary material, which is available to authorized users.

The first author acknowledges the financial support from the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University. The research in this article by the second author was funded by the Swedish Research Council Grant 2014-578.

Supplementary material: File

Andersson and Wiberg supplementary material