Paper Information - Heavy-Tailed Symmetric Stochastic Neighbor Embedding

Heavy-Tailed Symmetric Stochastic Neighbor Embedding

Stochastic Neighbor Embedding (SNE) has been shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size and momentum to find its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which generalizes t-SNE to accommodate various heavy-tailed embedding similarity functions. This generalization raises two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions, and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast as, and performs as well as, the best known t-SNE implementation, and the other for semi-supervised visualization showing quantitative superiority under the homogeneity measure as well as a qualitative advantage in cluster separation over t-SNE.
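The abstract does not state the parameterized family of similarity functions. As a minimal sketch, assume the power family H(τ) = (ατ + 1)^(-1/α), where τ is a squared embedding distance and α controls tail-heaviness; this form is an assumption here, not something given on this page. Under that assumption, α = 1 gives the Student t (Cauchy) kernel used by t-SNE, and α → 0 gives the Gaussian kernel of symmetric SNE. The function names `heavy_tailed_kernel` and `embedding_similarities` are hypothetical, and the fixed-point optimizer from the paper is not reproduced.

```python
import numpy as np


def heavy_tailed_kernel(tau, alpha):
    """Assumed power family H(tau) = (1 + alpha * tau) ** (-1 / alpha).

    alpha = 1 is the Student t (Cauchy) kernel; alpha = 0 is treated as
    the Gaussian limit exp(-tau).
    """
    tau = np.asarray(tau, dtype=float)
    if alpha == 0.0:
        return np.exp(-tau)
    return (1.0 + alpha * tau) ** (-1.0 / alpha)


def embedding_similarities(Y, alpha):
    """Symmetric q_ij proportional to H(||y_i - y_j||^2), normalized over all i != j."""
    # Pairwise squared Euclidean distances between embedded points.
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    H = heavy_tailed_kernel(sq_dists, alpha)
    np.fill_diagonal(H, 0.0)  # a point is not its own neighbor
    return H / H.sum()


# Toy embedding: 5 points in 2-D (illustrative data only).
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
Q_t = embedding_similarities(Y, alpha=1.0)      # t-SNE-style similarities
Q_gauss = embedding_similarities(Y, alpha=0.0)  # Gaussian (symmetric SNE) limit
```

In this sketch, a larger α leaves more similarity mass at large distances, which is the sense in which the tail becomes heavier.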
