Dimension Free Generalization Bounds for Non Linear Metric Learning

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural-network-type embedding of the data. Specifically, we provide uniform generalization bounds for two regimes: the sparse regime, and a non-sparse regime which we term bounded amplification. The sparse-regime bounds correspond to situations where the ℓ1-type norms of the parameters are small. As in classification, solutions satisfying such bounds can be obtained by an appropriate regularization of the problem. On the other hand, unregularized SGD optimization of a metric learning loss typically does not produce sparse solutions. We show that despite this lack of sparsity, it is still possible to provide dimension-free generalization guarantees by relying on a different, new property of the solutions. Consequently, these bounds can explain generalization in real, non-sparse experimental settings. We illustrate the studied phenomena on the MNIST and 20newsgroups datasets.
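To make the setting concrete, below is a minimal sketch of the kind of objective described above: a siamese-style neural embedding trained by SGD on a pairwise contrastive loss, with an optional ℓ1 penalty on the parameters corresponding to the sparse regime. The architecture, margin, loss form, and regularization coefficient are illustrative assumptions for this sketch (written in PyTorch), not the exact construction analyzed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Illustrative two-layer embedding network; depth and widths are placeholders."""
    def __init__(self, in_dim, hidden_dim=128, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same_class, margin=1.0):
    # Standard pairwise contrastive loss: pull same-class pairs together,
    # push different-class pairs apart up to a margin.
    d = F.pairwise_distance(z1, z2)
    pos = same_class * d.pow(2)
    neg = (1.0 - same_class) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

def l1_penalty(model):
    # ℓ1-type norm of all parameters; small values correspond to the sparse regime.
    return sum(p.abs().sum() for p in model.parameters())

# Toy training step on random data (shapes and hyperparameters are placeholders).
model = EmbeddingNet(in_dim=784)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x1, x2 = torch.randn(64, 784), torch.randn(64, 784)
same_class = torch.randint(0, 2, (64,)).float()

opt.zero_grad()
loss = contrastive_loss(model(x1), model(x2), same_class)
loss = loss + 1e-4 * l1_penalty(model)  # set the coefficient to 0 for the unregularized setting
loss.backward()
opt.step()
```

Setting the ℓ1 coefficient to zero recovers the unregularized SGD setting discussed in the abstract, where the solutions are typically not sparse and the guarantees rely on a different property of the learned embedding.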
