Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis

Distance metric learning (DML), which learns a distance metric from labeled "similar" and "dissimilar" data pairs, is widely used. Recently, several works have investigated orthogonality-promoting regularization (OPR), which encourages the projection vectors in DML to be close to orthogonal, to achieve three effects: (1) high balancedness: achieving comparable performance on both frequent and infrequent classes; (2) high compactness: using a small number of projection vectors to achieve a "good" metric; (3) good generalizability: alleviating overfitting to training data. While showing promising results, these approaches suffer from three problems. First, they involve solving non-convex optimization problems where achieving the global optimum is NP-hard. Second, there is no theoretical understanding of why OPR can lead to balancedness. Third, the current generalization error analysis of OPR is not performed directly on the regularizer. In this paper, we address these three issues by (1) seeking convex relaxations of the original non-convex problems so that the global optimum is guaranteed to be achievable; (2) providing a formal analysis of OPR's capability of promoting balancedness; (3) providing a theoretical analysis that directly reveals the relationship between OPR and generalization performance. Experiments on various datasets demonstrate that our convex methods are more effective in promoting balancedness, compactness, and generalization, and are computationally more efficient, compared with the non-convex methods.
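
To make the idea of orthogonality-promoting regularization concrete, the following is a minimal sketch and not the method of this paper. It assumes a projection-based metric, where the squared distance is ||A(x - y)||^2 for a matrix A whose rows are the projection vectors, and it assumes one common choice of penalty, the squared Frobenius distance between the Gram matrix A A^T and the identity. The function names `mahalanobis_sq` and `orthogonality_penalty` are hypothetical and introduced only for illustration.

```python
import numpy as np

def mahalanobis_sq(A, x, y):
    """Squared distance ||A (x - y)||^2 under projection matrix A (rows = projection vectors)."""
    d = A @ (np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(d @ d)

def orthogonality_penalty(A):
    """Assumed penalty: ||A A^T - I||_F^2, which is zero iff the rows of A are orthonormal."""
    gram = A @ A.T
    return float(np.sum((gram - np.eye(A.shape[0])) ** 2))

# Two orthonormal projection vectors: the penalty vanishes.
A_orth = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Two identical projection vectors: the off-diagonal Gram entries are penalized.
A_dup = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])

print(orthogonality_penalty(A_orth))
print(orthogonality_penalty(A_dup))
```

In a regularized DML objective, a penalty of this kind would be added to the pairwise loss with a trade-off weight. Because it is written in terms of A rather than the metric matrix A^T A, the resulting problem is non-convex in A, which is the difficulty the convex relaxations discussed above are meant to address.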
