A Large-Dimensional Analysis of Symmetric SNE
Stochastic Neighbour Embedding methods (SNE, t-SNE) aim at finding a faithful low-dimensional representation of a high-dimensional dataset. Despite their popularity, the behavior of these tools is not well understood, because their output is the solution to a non-convex optimization problem. This work provides first answers by leveraging a large-dimensional statistics approach, in which the number n and the dimension p of the data are of the same order of magnitude. We derive and study the canonical equation satisfied by the critical points of this non-convex optimization problem. The study notably reveals that, in a simple setup, the achievable SNE solutions correspond to a subset of those critical points. In particular, when the clusters composing the dataset are balanced in size, these solutions are symmetrical and assume closed-form expressions. As a major conclusion, the analysis rigorously proves a long-standing heuristic statement on the "proper normalization" of the symmetric SNE: out of two natural normalization choices, only the claimed proper one leads to non-trivial solutions.
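For readers unfamiliar with the objective, the following is a minimal sketch of the textbook symmetric SNE cost: the Kullback-Leibler divergence between Gaussian joint affinities computed in the data space and in the embedding space. It is not the analysis of this work and does not reproduce the two normalization choices discussed above. The function names and the bandwidth parameters `sigma2_x` and `sigma2_y` are assumptions introduced only for illustration.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def joint_affinities(Z, sigma2=1.0):
    """Gaussian joint affinities, normalized to sum to one over all pairs i != j.

    sigma2 is a hypothetical bandwidth parameter chosen for this sketch.
    """
    W = np.exp(-pairwise_sq_dists(Z) / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # a point is not its own neighbour
    return W / W.sum()


def symmetric_sne_cost(X, Y, sigma2_x=1.0, sigma2_y=1.0):
    """KL divergence between data affinities P and embedding affinities Q."""
    P = joint_affinities(X, sigma2_x)
    Q = joint_affinities(Y, sigma2_y)
    mask = ~np.eye(X.shape[0], dtype=bool)  # off-diagonal entries only
    eps = 1e-12  # guards against log(0)
    return float(np.sum(P[mask] * np.log((P[mask] + eps) / (Q[mask] + eps))))
```

Minimizing `symmetric_sne_cost` over the low-dimensional points `Y`, for instance by gradient descent, is the non-convex problem whose critical points are at stake here.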