SPSA Method Using Diagonalized Hessian Estimate

Simultaneous perturbation stochastic approximation (SPSA) and its adaptive version (ASPSA) are two commonly used methods for stochastic optimization, analogous to gradient descent and Newton-Raphson in deterministic optimization. Both methods have potential shortcomings, however. SPSA, a first-order method, typically improves rapidly in the early stages of the search but converges slowly in the later stages. ASPSA, a second-order method, typically converges faster in the later stages but is more numerically challenging to implement. We propose a method (diagSPSA, or diagSG) that uses only the diagonal elements of Hessian estimates to rescale the gradient when updating the parameters at each iteration. This method exploits part of the information in the Hessian matrix at low computational cost. We prove convergence and characterize the asymptotic behavior of diagSPSA. In addition, this paper presents a theoretical efficiency analysis comparing the new method diagSG with the stochastic gradient (SG) method. We also report numerical experiments on the efficiency of both diagSPSA and diagSG.
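To make the idea concrete, the following Python sketch shows one possible diagonally rescaled SPSA iteration. It is illustrative only: the function name diag_spsa, the gain-sequence defaults, and the particular per-iteration diagonal Hessian estimator (a finite difference of two SPSA gradient estimates along a second Rademacher direction) are assumptions of this sketch, not the authors' exact algorithm.

```python
import numpy as np

def diag_spsa(y, theta0, n_iter=1000, a=0.1, A=100, alpha=0.602,
              c=0.1, gamma=0.101, eps=1e-8, seed=None):
    """Minimal sketch of SPSA with diagonal-Hessian rescaling.

    y      : noisy loss measurement, y(theta) -> float
    theta0 : initial parameter vector
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = theta.size
    h_bar = np.ones(p)  # running estimate of the Hessian diagonal

    def spsa_grad(th, ck):
        # Standard simultaneous-perturbation gradient estimate:
        # two measurements regardless of dimension p.
        delta = rng.choice([-1.0, 1.0], size=p)
        return (y(th + ck * delta) - y(th - ck * delta)) / (2.0 * ck) / delta

    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha   # standard SPSA gain sequences
        ck = c / (k + 1) ** gamma

        g = spsa_grad(theta, ck)

        # Per-iteration estimate of the Hessian diagonal: difference of
        # gradient estimates along a second perturbation direction.
        # (For brevity this uses 4 extra measurements; measurement-efficient
        # variants reuse evaluations.)
        delta_t = rng.choice([-1.0, 1.0], size=p)
        g_plus = spsa_grad(theta + ck * delta_t, ck)
        g_minus = spsa_grad(theta - ck * delta_t, ck)
        h_k = (g_plus - g_minus) / (2.0 * ck) / delta_t

        # Average over iterations and clamp to keep the scaling positive.
        h_bar = (k / (k + 1.0)) * h_bar + (1.0 / (k + 1.0)) * h_k
        h_pos = np.maximum(np.abs(h_bar), eps)

        theta = theta - ak * g / h_pos  # diagonally preconditioned step

    return theta

if __name__ == "__main__":
    # Illustrative test: a noisy, ill-conditioned quadratic.
    noise = np.random.default_rng(0)
    scales = np.array([1.0, 25.0])
    y = lambda th: float(np.sum(scales * np.asarray(th) ** 2)
                         + 0.01 * noise.standard_normal())
    print(diag_spsa(y, theta0=[2.0, -2.0], n_iter=2000, seed=1))
```

Rescaling each coordinate by a positive, smoothed diagonal entry keeps the per-iteration cost linear in the dimension, in contrast to the matrix estimation, symmetrization, and projection steps that a full second-order ASPSA update requires.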
