The Missing Data Assumptions of the NEAT Design and Their Implications for Test Equating

Published online by Cambridge University Press: 01 January 2025

Sandip Sinharay*
Affiliation: ETS, Princeton
Paul W. Holland
Affiliation: ETS, Princeton
* Requests for reprints should be sent to Sandip Sinharay, ETS, Princeton, NJ, USA. E-mail: ssinharay@ets.org

Abstract

The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed-score equating methods used with a NEAT design are frequency estimation equipercentile equating (FEEE), chain equipercentile equating (CEE), and item-response-theory observed-score equating (IRT OSE). Each of these three methods makes different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately and that the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. It then describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications for equating are also discussed.
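
The FEEE assumption stated above can be illustrated with a minimal sketch. It is not the authors' procedure, only one way to read the assumption: if the conditional distribution of the test score X given the anchor score A is the same in both groups, then the unobserved distribution of X in the group that did not take the test can be estimated by reweighting the observed conditional distributions with that group's anchor-score distribution. The function name `feee_fill_in`, the group labels P and Q, and the toy counts are hypothetical and chosen only for illustration.

```python
def feee_fill_in(joint_p, anchor_q):
    """Estimate the distribution of test score X in group Q under the
    FEEE assumption that P(X = x | A = a) is the same in groups P and Q.

    joint_p[x][a]: count of group-P examinees with test score x and anchor score a.
    anchor_q[a]:   count of group-Q examinees with anchor score a.
    Returns a list whose x-th entry estimates P(X = x) in group Q.
    """
    n_x = len(joint_p)
    n_a = len(anchor_q)
    # Number of group-P examinees at each anchor score.
    anchor_p = [sum(joint_p[x][a] for x in range(n_x)) for a in range(n_a)]
    total_q = sum(anchor_q)
    estimate = []
    for x in range(n_x):
        prob = 0.0
        for a in range(n_a):
            if anchor_p[a] == 0:
                # The conditional distribution is undefined here; this sketch
                # skips the cell, so the result may not sum to 1 in that case.
                continue
            cond = joint_p[x][a] / anchor_p[a]   # P(X = x | A = a), observed in P
            weight = anchor_q[a] / total_q       # P(A = a), observed in Q
            prob += cond * weight
        estimate.append(prob)
    return estimate


# Toy data (hypothetical): two test-score levels and two anchor-score levels.
joint_p = [[4, 1],
           [2, 3]]
anchor_q = [1, 3]
est = feee_fill_in(joint_p, anchor_q)
print(est)
```

With these toy counts, group Q has more examinees at the higher anchor score than group P, so the estimated distribution of X in Q shifts toward the higher test score. The CEE and IRT OSE assumptions lead to different fill-ins and are not covered by this sketch.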

Type: Original Paper
Copyright © 2010 The Psychometric Society
