On semi-supervised active clustering of stable instances with oracles

Abstract. We consider the problem of semi-supervised active clustering under multiplicative perturbation stability with respect to the distance function. A stable instance has an optimal solution that does not change when the distances are perturbed multiplicatively. This captures the notion that the optimal solution tolerates measurement errors and uncertainty in the points. Semi-supervision gives us access to an oracle O that answers pairwise queries. We design efficient algorithms for semi-supervised clustering of multiplicatively perturbation-stable instances under both an ideal and a noisy oracle model. We present theoretical performance guarantees for the algorithms.
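The two oracle models named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the query type or the noise model, so everything here is an assumption made for illustration: queries are taken to be same-cluster queries on pairs of points, the noisy oracle flips each answer independently with probability `p < 1/2`, and the names `make_ideal_oracle` and `make_noisy_oracle` are hypothetical rather than taken from the paper.

```python
import random
from typing import Callable, Dict, Hashable

# An oracle maps a pair of points to True ("same cluster") or False.
Oracle = Callable[[Hashable, Hashable], bool]


def make_ideal_oracle(labels: Dict[Hashable, int]) -> Oracle:
    """Return an oracle that answers every pairwise query truthfully.

    `labels` maps each point to its cluster in the target clustering.
    """
    def oracle(x: Hashable, y: Hashable) -> bool:
        return labels[x] == labels[y]

    return oracle


def make_noisy_oracle(labels: Dict[Hashable, int], p: float, seed: int = 0) -> Oracle:
    """Return an oracle whose answers are wrong with probability p.

    Assumption (not from the paper): each call is flipped independently.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5)")
    rng = random.Random(seed)
    ideal = make_ideal_oracle(labels)

    def oracle(x: Hashable, y: Hashable) -> bool:
        truth = ideal(x, y)
        # Flip the truthful answer with probability p.
        return (not truth) if rng.random() < p else truth

    return oracle


if __name__ == "__main__":
    target = {"a": 0, "b": 0, "c": 1}
    ideal = make_ideal_oracle(target)
    noisy = make_noisy_oracle(target, p=0.2, seed=1)
    print(ideal("a", "b"), ideal("a", "c"))
    print(noisy("a", "b"), noisy("a", "c"))
```

An algorithm written against the `Oracle` type can be run unchanged with either model, which makes it easy to compare query counts and accuracy between the ideal and the noisy setting.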
