Optimizing cluster structures with inner product induced norm based dissimilarity measures: Theoretical development and convergence analysis

Dissimilarity measures play a key role in exploring the inherent cluster structure of the data for any partitional clustering algorithm. Commonly used dissimilarity functions for clustering purposes have so far been confined to the Euclidean, exponential, and Mahalanobis distances. In this article we develop generalized algorithms to solve partitional clustering problems formulated with a general class of Inner Product Induced Norm (IPIN) based dissimilarity measures. We provide an in-depth mathematical analysis of the underlying optimization framework and analytically address the existence and uniqueness of a solution. In the absence of a closed-form solution, we develop a fast stochastic gradient descent algorithm and the Minimization by Incremental Surrogate Optimization (MISO) algorithm (in the case of constrained optimization) with an exponential convergence rate to obtain the solution. We carry out a convergence analysis of the fuzzy and k-means clustering algorithms with the IPIN based dissimilarity measures, establish that these algorithms converge to a stationary point, and investigate the nature of that stationary point. The novelty of the paper lies in the introduction of a generalized class of divergence measures, the development of fuzzy and k-means clustering algorithms with this general class, and a thorough convergence analysis of the developed algorithms.
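The abstract does not spell out the IPIN dissimilarity in closed form. As a minimal sketch, assume the quadratic form d_A(x, v) = (x - v)^T A (x - v) with A symmetric positive definite, of which the Euclidean (A = I) and Mahalanobis (A = inverse covariance) distances are special cases. The Python names kmeans_ipin and ipin_dissimilarity below are illustrative, not from the paper; for this quadratic form the prototype update has a closed form (the cluster mean), whereas for a general IPIN measure one would replace that step with a stochastic gradient descent update on the prototypes, as the paper proposes.

    import numpy as np

    def ipin_dissimilarity(x, v, A):
        """Quadratic IPIN dissimilarity d_A(x, v) = (x - v)^T A (x - v),
        where A is a symmetric positive-definite matrix defining the inner product."""
        diff = x - v
        return diff @ A @ diff

    def kmeans_ipin(X, k, A, n_iter=100, seed=0):
        """k-means-style alternating optimization with a quadratic IPIN dissimilarity.

        Assignment step: each point goes to its nearest prototype under d_A.
        Update step: the arithmetic mean minimizes the quadratic IPIN objective,
        so the prototype update is closed form here.
        """
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        labels = np.zeros(len(X), dtype=int)
        for _ in range(n_iter):
            # Assignment step: nearest prototype under d_A.
            d = np.array([[ipin_dissimilarity(x, v, A) for v in centers] for x in X])
            labels = d.argmin(axis=1)
            # Update step: cluster mean (keep the old prototype if a cluster empties).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels

    # Example usage: A = inverse sample covariance gives a Mahalanobis-type IPIN measure.
    X = np.random.default_rng(1).normal(size=(200, 2))
    A = np.linalg.inv(np.cov(X, rowvar=False))
    centers, labels = kmeans_ipin(X, k=3, A=A)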
