Training Deep Neural Networks via Direct Loss Minimization

Supervised training of deep neural networks typically relies on minimizing cross-entropy. In many domains, however, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization approach to training deep neural networks, which provably minimizes the application-specific loss function. This is often non-trivial, since these loss functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. To this end, we develop a novel dynamic programming algorithm that efficiently computes the weight updates. Our approach outperforms a variety of baselines on action classification and object detection, especially in the presence of label noise.
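
For concreteness, the approach builds on the direct loss minimization theorem for structured prediction, and the following is a minimal sketch of the resulting gradient in notation introduced here (the scoring function F, weights w, ground truth y_gt, and perturbation epsilon are not taken from the abstract). Standard inference and loss-augmented inference are both solved, and the difference of their score gradients, scaled by 1/epsilon, approaches the gradient of the expected task loss:

\[
  y_w = \arg\max_{y} F(x, y; w), \qquad
  y_{\mathrm{direct}} = \arg\max_{y} \big[ F(x, y; w) + \epsilon\, L(y, y_{\mathrm{gt}}) \big],
\]
\[
  \nabla_w\, \mathbb{E}\big[ L(y_{\mathrm{gt}}, y_w) \big]
  \;=\; \lim_{\epsilon \to 0^{+}} \frac{1}{\epsilon}
  \Big( \nabla_w F(x, y_{\mathrm{direct}}; w) \;-\; \nabla_w F(x, y_w; w) \Big).
\]

For average precision, the loss-augmented inference step is the expensive part; the dynamic programming algorithm mentioned in the abstract is what makes computing y_direct, and hence the weight update, efficient.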
