Evidence and Inference in Educational Assessment

Published online by Cambridge University Press:  01 January 2025

Robert J. Mislevy*
Affiliation:
Educational Testing Service
*Requests for reprints should be sent to Robert J. Mislevy, Educational Testing Service, Princeton, NJ 08541.

Abstract

Educational assessment concerns inference about students' knowledge, skills, and accomplishments. Because data are never so comprehensive and unequivocal as to ensure certitude, test theory evolved in part to address questions of weight, coverage, and import of data. The resulting concepts and techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective.

Type
Original Paper
Copyright
Copyright © 1994 The Psychometric Society

Footnotes

Presidential address to the Psychometric Society, presented June 25, 1994, in Champaign, Illinois.

Supported by (1) Contract No. N00014-91-J-4101, R&T 4421573-01, from the Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research; (2) the National Center for Research on Evaluation, Standards, and Student Testing (CRESST), Educational Research and Development Program, cooperative agreement number R117G10027 and CFDA catalog number 84.117G, as administered by the Office of Educational Research and Improvement, U.S. Department of Education; and (3) the Statistical and Psychometric Research Division of Educational Testing Service. I am grateful for comments and suggestions from Henry Braun, Drew Gitomer, Richard Patz, Jonathan Troper, and Howard Wainer.
