Exemplar-Based Clustering via Simulated Annealing

Published online by Cambridge University Press:  01 January 2025

Michael J. Brusco*, Florida State University
Hans-Friedrich Köhn, University of Missouri-Columbia

*Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: mbrusco@cob.fsu.edu

Abstract

Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of “exemplars” as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed a new simulated annealing heuristic for the p-median problem and completed a thorough investigation of its computational performance. The salient findings from our experiments are that our new method substantially outperforms a previous implementation of simulated annealing and is competitive with the most effective metaheuristics for the p-median problem.
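
To make the objective concrete: the p-median model selects p objects as exemplars and minimizes the sum, over all objects, of the dissimilarity to the nearest exemplar. The sketch below shows one generic way simulated annealing can be applied to that objective. It is not the heuristic developed in this article. The swap neighborhood, the geometric cooling schedule, the function names (`pmedian_cost`, `anneal_pmedian`), and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random


def pmedian_cost(dist, medians):
    """Sum over all objects of the dissimilarity to the nearest exemplar."""
    return sum(min(row[j] for j in medians) for row in dist)


def anneal_pmedian(dist, p, t0=1.0, cooling=0.9, sweeps=50, t_min=1e-3, seed=0):
    """Generic simulated annealing for the p-median problem (illustrative only).

    dist is an n x n dissimilarity matrix (list of lists); p is the number
    of exemplars. All tuning parameters are hypothetical defaults.
    """
    n = len(dist)
    if not 1 <= p < n:
        raise ValueError("p must satisfy 1 <= p < n")
    rng = random.Random(seed)
    current = rng.sample(range(n), p)
    current_cost = pmedian_cost(dist, current)
    best, best_cost = list(current), current_cost
    temp = t0
    while temp > t_min:
        for _ in range(sweeps):
            # Propose a swap: one exemplar leaves, one non-exemplar enters.
            out_pos = rng.randrange(p)
            incoming = rng.choice([i for i in range(n) if i not in current])
            trial = list(current)
            trial[out_pos] = incoming
            trial_cost = pmedian_cost(dist, trial)
            delta = trial_cost - current_cost
            # Metropolis rule: always accept improvements; accept a worse
            # solution with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = trial, trial_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling
    return sorted(best), best_cost


# Toy data: six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medians, cost = anneal_pmedian(dist, p=2)
```

On this toy data the exemplars are indices into `points`. Because the returned exemplars are actual objects from the data set, the function needs only a dissimilarity matrix, not coordinates, which is what lets the p-median model accommodate a wide range of similarity data. Recomputing the full cost for every trial swap keeps the sketch short but is too slow for large problems, where an incremental update of the cost would be needed.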

Type: Theory and Methods
Copyright © 2009 The Psychometric Society

Footnotes

An erratum to this article can be found at http://dx.doi.org/10.1007/s11336-009-9140-1
