A more globally accurate dimensionality reduction method using triplets

We first show that commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low-dimensional embedding. We demonstrate this via a number of tests that any practitioner can easily apply to the dataset at hand. Surprisingly, t-SNE performs best with respect to the commonly used measures that reward local-neighborhood accuracy, such as precision-recall, while having the worst performance in our tests for global structure. We then contrast the performance of these two DR methods against our new method, called TriMap. The main idea behind TriMap is to capture higher-order structure with triplet information (instead of the pairwise information used by t-SNE and LargeVis) and to minimize a robust loss function for satisfying the chosen triplets. We provide compelling experimental evidence on large natural datasets for the clear advantage of the TriMap DR results. Like LargeVis, TriMap scales linearly with the number of data points.
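As a rough illustration of the triplet idea (not the authors' implementation), the sketch below evaluates a bounded, robust loss over triplets (i, j, k), where point i should end up closer to point j than to point k in the low-dimensional embedding Y. The heavy-tailed kernel and the exact form of the per-triplet term are assumptions chosen for illustration; the actual TriMap loss and triplet-sampling scheme are specified in the paper.

```python
import numpy as np

def triplet_loss(Y, triplets):
    """Illustrative triplet-based loss (a sketch, not the TriMap objective).

    Y        : (n, 2) array, the low-dimensional embedding.
    triplets : (m, 3) integer array of (i, j, k) indices, meaning
               "i should be closer to j than to k" in the embedding.
    """
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # Heavy-tailed (Student-t style) similarities in the embedding space.
    s_ij = 1.0 / (1.0 + np.sum((Y[i] - Y[j]) ** 2, axis=1))
    s_ik = 1.0 / (1.0 + np.sum((Y[i] - Y[k]) ** 2, axis=1))
    # Each term lies in (0, 1): it is small when i is much closer to j than
    # to k, and saturates when the triplet is badly violated, so a single
    # bad triplet cannot dominate the objective (a "robust" loss).
    return np.mean(s_ik / (s_ij + s_ik))
```

In practice such a loss would be minimized with gradient descent over Y, with triplets sampled from nearest-neighbor and random pairs of the high-dimensional data; the linear scaling in the number of points comes from sampling only a constant number of triplets per point.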
