Paper information - Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning

Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a k-nearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model well justified only for kNN with k = 1. However, kNN classifiers with k > 1 are more robust and usually preferred in practice. Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k. The main technical contribution is showing how to efficiently compute and optimize the expected accuracy of a kNN classifier. We apply similar ideas in an unsupervised setting to yield kSNE and kt-SNE, generalizations of Stochastic Neighbor Embedding (SNE, t-SNE) that operate on neighborhoods of size k. These provide an axis of control over embeddings that allows for more homogeneous and interpretable regions. Empirically, we show that kNCA often improves classification accuracy over state-of-the-art methods, produces qualitative differences in the embeddings as k is varied, and is more robust with respect to label noise.
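To make the quantity described in the abstract concrete, the following is a minimal sketch of the expected leave-one-out accuracy of a stochastic kNN classifier. It rests on several assumptions that the abstract does not state: each point selects a k-subset of the other points with probability proportional to exp(-sum of squared distances to the subset), a prediction counts as correct only when the true label holds a strict plurality of the votes, and distances are taken in the input space (a learned metric would be applied to the points first). The function name `expected_knn_accuracy` is hypothetical, and the brute-force enumeration over all k-subsets is for illustration only; it is not the efficient computation the paper contributes.

```python
import math
from collections import Counter
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expected_knn_accuracy(points, labels, k):
    """Brute-force expected leave-one-out accuracy under stochastic
    k-subset selection (illustrative; exponential in k, tiny inputs only).

    Assumed model: P(subset S | point i) is proportional to
    exp(-sum_{j in S} ||x_i - x_j||^2), and a vote is correct only if
    the true label strictly outnumbers every other label in S.
    Requires 1 <= k <= len(points) - 1.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        normaliser = 0.0      # total weight of all k-subsets
        correct_weight = 0.0  # weight of subsets that vote for labels[i]
        for subset in combinations(others, k):
            weight = math.exp(-sum(sq_dist(points[i], points[j]) for j in subset))
            normaliser += weight
            votes = Counter(labels[j] for j in subset)
            own = votes[labels[i]]
            if own > 0 and all(own > c for lab, c in votes.items() if lab != labels[i]):
                correct_weight += weight
        total += correct_weight / normaliser
    return total / n


# Usage: two well-separated 1-D clusters, evaluated for k = 1 and k = 2.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = [0, 0, 0, 1, 1, 1]
for k in (1, 2):
    print(k, expected_knn_accuracy(points, labels, k))
```

With k = 1 the subsets are single neighbors, so the sketch reduces to the single-neighbor selection that the abstract attributes to NCA; larger k shows why the subset sum needs a more efficient treatment than enumeration.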
