An interior-point stochastic approximation method and an L1-regularized delta rule

The stochastic approximation method is behind the solution to many important, actively studied problems in machine learning. Despite its far-reaching application, there is almost no work on applying stochastic approximation to learning problems with general constraints. The reason, we hypothesize, is that no robust, widely applicable stochastic approximation method exists for handling such problems. We propose that interior-point methods are a natural solution. We establish the stability of a stochastic interior-point approximation method both analytically and empirically, and demonstrate its utility by deriving an on-line learning algorithm that also performs feature selection via L1 regularization.
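
For orientation, one standard way to pose L1-regularized learning as a smooth problem with simple inequality constraints, the kind of problem an interior-point method is designed for, is sketched below. This is a textbook reformulation and is not necessarily the exact formulation derived in the paper; the symbols $f$ (a smooth expected loss), $\lambda$ (regularization strength), $\mu$ (barrier parameter), and the split weights $w^{+}$, $w^{-}$ are notation assumed here for illustration.

```latex
\begin{align*}
% Original non-smooth problem
\min_{w} \;\; & f(w) + \lambda \|w\|_1 \\
% Equivalent smooth problem with bound constraints, using w = w^{+} - w^{-}
\min_{w^{+},\, w^{-}} \;\; & f(w^{+} - w^{-}) + \lambda \sum_{i} \bigl( w^{+}_i + w^{-}_i \bigr)
  \quad \text{subject to } w^{+} \ge 0,\; w^{-} \ge 0 \\
% Log-barrier subproblem for a fixed barrier parameter \mu > 0
\min_{w^{+} > 0,\, w^{-} > 0} \;\; & f(w^{+} - w^{-}) + \lambda \sum_{i} \bigl( w^{+}_i + w^{-}_i \bigr)
  - \mu \sum_{i} \bigl( \log w^{+}_i + \log w^{-}_i \bigr)
\end{align*}
```

In a stochastic approximation setting, the gradient of $f$ in the barrier subproblem would be replaced by a noisy estimate computed from one training example at a time, while the barrier terms keep the iterates strictly inside the feasible region as $\mu$ is driven toward zero.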
