Low-dimensional Data Embedding via Robust Ranking

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space. We formulate the embedding problem as a joint ranking problem over a set of triplets, where each triplet captures the relative similarities between three objects in the set. By exploiting recent advances in robust ranking, t-ETE produces high-quality embeddings even in the presence of a significant amount of noise, and it preserves local scale better than known methods such as t-STE and t-SNE. In particular, our method produces significantly better results than t-SNE on signature datasets while also being faster to compute.
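
To make the triplet formulation concrete, the sketch below checks how many relative-similarity triplets a given embedding respects. It is not the t-ETE objective or optimizer; it only illustrates what a triplet constraint means. The convention that a triplet (i, j, k) reads "object i is more similar to j than to k", and the names `squared_distance` and `triplet_satisfaction`, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
# Assumed convention: (i, j, k) means "object i is more similar to j than to k".
Triplet = Tuple[int, int, int]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_satisfaction(embedding: List[Point], triplets: List[Triplet]) -> float:
    """Fraction of triplets (i, j, k) for which i lies closer to j than to k."""
    if not triplets:
        return 0.0
    satisfied = sum(
        1
        for i, j, k in triplets
        if squared_distance(embedding[i], embedding[j])
        < squared_distance(embedding[i], embedding[k])
    )
    return satisfied / len(triplets)


# Three points on a line; the third triplet is deliberately violated,
# standing in for a noisy similarity judgment.
embedding = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
triplets = [(0, 1, 2), (2, 1, 0), (1, 2, 0)]
score = triplet_satisfaction(embedding, triplets)
```

A score like this is one simple way to compare embeddings against the same triplet set. A robust method would be expected to tolerate some violated triplets rather than distort the whole map to satisfy them.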
