On the Sampling Theory Foundations of Item Response Theory Models

Published online by Cambridge University Press:  01 January 2025

Paul W. Holland*
Affiliation: Educational Testing Service
* Requests for reprints should be sent to Paul W. Holland, Educational Testing Service, Rosedale Road 21-T, Princeton, NJ 08541.

Abstract

Item response theory (IRT) models are now in common use for the analysis of dichotomous item responses. This paper examines the sampling theory foundations for statistical inference in these models. The discussion includes: some history on the “stochastic subject” versus the random sampling interpretations of the probability in IRT models; the relationship between three versions of maximum likelihood estimation for IRT models; estimating θ versus estimating θ-predictors; IRT models and loglinear models; the identifiability of IRT models; and the role of robustness and Bayesian statistics from the sampling theory perspective.

Type
Original Paper
Copyright
Copyright © 1990 The Psychometric Society

Footnotes

A presidential address can serve many different functions. This one is a report of investigations I started at least ten years ago to understand what IRT was all about. It is a decidedly one-sided view, but I hope it stimulates controversy and further research. I have profited from discussions of this material with many people, including Brian Junker, Charles Lewis, Nicholas Longford, Robert Mislevy, Ivo Molenaar, Donald Rock, Donald Rubin, Lynne Steinberg, Martha Stocking, William Stout, Dorothy Thayer, David Thissen, Wim van der Linden, Howard Wainer, and Marilyn Wingersky. Of course, none of them is responsible for any errors or misstatements in this paper. The research was supported in part by the Cognitive Science Program, Office of Naval Research under Contract No. N00014-87-K-0730 and by the Program Statistics Research Project of Educational Testing Service.
