Probabilistic Multilevel Clustering via Composite Transportation Distance

We propose a novel probabilistic approach to multilevel clustering based on the composite transportation distance, a variant of the transportation distance in which the ground cost is the Kullback-Leibler divergence. Our method solves a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection between our method and the problem of finding composite transportation barycenters, we develop efficient optimization algorithms that scale to potentially large multilevel datasets. Finally, we present experimental results on both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
