
Cluster Analysis for Cognitive Diagnosis: Theory and Applications

Published online by Cambridge University Press:  01 January 2025

Chia-Yi Chiu
Affiliation:
Rutgers, The State University of New Jersey
Jeffrey A. Douglas*
Affiliation:
University of Illinois
Xiaodong Li
Affiliation:
Merck & Company, Inc.
* Requests for reprints should be sent to Jeffrey A. Douglas, 101 Illini Hall, 725 S. Wright St., Champaign, IL 61820, USA. E-mail: jeffdoug@uiuc.edu

Abstract

Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
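The two-step procedure described in the abstract (summarize responses into sum-scores, then cluster) can be illustrated with a minimal sketch. The abstract does not define the "particular vector of sum-scores", so the sketch assumes one score per attribute, obtained by summing a subject's correct responses over the items that require that attribute. The toy item-by-attribute matrix, the noise-free responses, the function names, and the choice to start K-means from the distinct observed score vectors are all hypothetical choices made for illustration, not the authors' implementation.

```python
def attribute_sum_scores(responses, q_matrix):
    """For each subject, sum the correct responses over the items that
    require each attribute (assumed form of the sum-score vector)."""
    n_items = len(q_matrix)
    n_attrs = len(q_matrix[0])
    return [
        [sum(y[j] * q_matrix[j][k] for j in range(n_items)) for k in range(n_attrs)]
        for y in responses
    ]


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
    )


def kmeans(points, init_centers, max_iter=100):
    """Basic K-means: alternate assignment and mean-update steps until
    the assignments stop changing. Returns (labels, centers)."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical item-by-attribute matrix: 4 items, 2 attributes.
q_matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]

# Hypothetical noise-free binary responses for 6 subjects.
responses = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

scores = attribute_sum_scores(responses, q_matrix)

# Simplifying assumption: start from the distinct observed score vectors.
init_centers = []
for s in scores:
    if s not in init_centers:
        init_centers.append(s)

labels, centers = kmeans(scores, init_centers)
print(scores)
print(labels)
```

In this toy setting, subjects with identical sum-score vectors fall into the same cluster. With real, noisy responses the starting centers and the number of clusters would need more care, and hierarchical agglomerative clustering could be applied to the same `scores` instead.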

Type
Theory and Methods
Copyright
Copyright © 2009 The Psychometric Society


Footnotes

We would like to thank the English Language Institute at the University of Michigan for data and the National Science Foundation for funding (grant number 0648882).
