Cluster Analysis: An Application of Lagrangian Relaxation

This paper presents and tests an effective optimization algorithm for clustering homogeneous data. The algorithm alternates between a subgradient method, which determines lower bounds, and a simple search procedure, which determines upper bounds. The overall objective is to assign n objects to m mutually exclusive “clusters” such that the sum of the distances from each object to its designated cluster median is minimized. The model is a special case of the uncapacitated facility location and m-median problems. The technique has proven efficient for examples with n ≤ 200 (i.e., up to 40,000 0-1 variables); computational experience with 10 real-world clustering applications is reported. A comparison with a hierarchical agglomerative heuristic, the minimum squared error method, is included. The results show that the optimization algorithm is an effective solution technique for the homogeneous clustering problem, and that it also provides tight lower bounds for evaluating the quality of solutions generated by other procedures.
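The abstract does not spell out the relaxation, so the following is only a minimal sketch of how such a bounding scheme might look, not the paper's implementation. It assumes the standard m-median formulation in which the constraints "each object is assigned to exactly one median" are relaxed with multipliers u. The function name `lagrangian_m_median`, the parameters `iterations` and `step_scale`, the zero starting multipliers, and the fixed step rule are all illustrative assumptions. The lower bound comes from the relaxed problem, and the upper bound comes from a simple search: assign every object to the nearest of the m medians chosen by the relaxed solution.

```python
def lagrangian_m_median(d, m, iterations=200, step_scale=1.0):
    """Bound the m-median clustering cost for a distance matrix d (list of lists).

    Hypothetical sketch: relaxes the assignment constraints and updates the
    multipliers u by subgradient steps. Returns (lower_bound, upper_bound, medians).
    """
    n = len(d)
    u = [0.0] * n  # one multiplier per object (assumed starting point)
    best_lb, best_ub, best_medians = float("-inf"), float("inf"), None

    for _ in range(iterations):
        # rho[j]: contribution to the relaxed objective if j is chosen as a median.
        rho = [sum(min(0.0, d[i][j] - u[i]) for i in range(n)) for j in range(n)]
        # The relaxed problem is solved by picking the m smallest rho values.
        medians = sorted(range(n), key=lambda j: rho[j])[:m]
        lb = sum(u) + sum(rho[j] for j in medians)
        best_lb = max(best_lb, lb)

        # Upper bound: a feasible clustering built from the chosen medians.
        ub = sum(min(d[i][j] for j in medians) for i in range(n))
        if ub < best_ub:
            best_ub, best_medians = ub, sorted(medians)

        # Subgradient: 1 minus the number of times object i is assigned
        # in the relaxed solution.
        g = [1 - sum(1 for j in medians if d[i][j] - u[i] < 0) for i in range(n)]
        norm = sum(x * x for x in g)
        if norm == 0 or best_ub - best_lb < 1e-9:
            break  # relaxed solution is feasible, or the bounds have met

        step = step_scale * (best_ub - lb) / norm
        u = [u[i] + step * g[i] for i in range(n)]

    return best_lb, best_ub, best_medians


if __name__ == "__main__":
    # Nine points on a line forming three visible groups (made-up data).
    points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
    dist = [[abs(a - b) for b in points] for a in points]
    lower, upper, chosen = lagrangian_m_median(dist, m=3)
    print("lower bound:", lower, "upper bound:", upper, "medians:", chosen)
```

Because the relaxed constraints are equalities, the multipliers are unrestricted in sign and every iteration yields a valid lower bound. The gap between the two returned bounds is what lets the lower bound serve as a quality check on clusterings produced by other procedures.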
