Convex optimization with an interpolation-based projection and its application to deep learning

Convex optimizers have found many applications as differentiable layers within deep neural architectures. One such application is projecting points onto a convex set. However, both the forward and backward passes of these convex layers are significantly more expensive to compute than those of a typical neural network layer. In this paper, we investigate whether an inexact but cheaper projection can drive a descent algorithm to an optimum. Specifically, we propose an interpolation-based projection that is computationally cheap and easy to compute given a convex, domain-defining function. We then propose an optimization algorithm that follows the gradient of the composition of the objective and the projection, and we prove its convergence for linear objectives and arbitrary convex, Lipschitz domain-defining inequality constraints. In addition to these theoretical contributions, we demonstrate empirically the practical interest of the interpolation projection when used in conjunction with neural networks, in both a reinforcement learning and a supervised learning setting.
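The abstract does not spell out the projection formula, but one plausible reading of an "interpolation-based projection" is the following: given a convex domain-defining function g (feasible set {y : g(y) <= 0}) and a strictly feasible anchor point, an infeasible query point is pulled back onto the boundary along the segment joining it to the anchor, with the interpolation coefficient found by bisection on g restricted to that segment. The sketch below is only an illustration under these assumptions; the function and argument names (`interpolation_projection`, `g`, `x_interior`) are hypothetical and not taken from the paper.

```python
import numpy as np

def interpolation_projection(g, x, x_interior, tol=1e-8, max_iter=100):
    """Approximately project x onto {y : g(y) <= 0} by interpolating
    towards a strictly feasible anchor x_interior (g(x_interior) < 0).

    Since g is convex, g((1 - t) * x + t * x_interior) is convex in t,
    so the feasible set in t is an interval containing t = 1 and the
    boundary crossing t* in [0, 1] can be bracketed by bisection.
    """
    if g(x) <= 0.0:              # already feasible: return the point unchanged
        return x
    lo, hi = 0.0, 1.0            # t = 0 -> x (infeasible), t = 1 -> anchor (feasible)
    for _ in range(max_iter):
        t = 0.5 * (lo + hi)
        y = (1.0 - t) * x + t * x_interior
        if g(y) > 0.0:
            lo = t               # still infeasible: move towards the anchor
        else:
            hi = t               # feasible: tighten towards the boundary
        if hi - lo < tol:
            break
    t = hi                       # keep the feasible end of the bracket
    return (1.0 - t) * x + t * x_interior

# Example (hypothetical): projection onto the unit Euclidean ball,
# using the origin as the strictly feasible anchor.
g = lambda y: float(np.dot(y, y) - 1.0)
x = np.array([2.0, 0.0])
print(interpolation_projection(g, x, np.zeros(2)))   # approximately [1., 0.]
```

Each projection only requires evaluations of g along a line segment, which is why such a map is cheap compared with solving a full convex program in the layer; in the paper's setting, the descent algorithm would then follow the gradient of the objective composed with a map of this kind.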
