A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

Stochastic mirror descent (SMD) is a relatively recent family of algorithms that has found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of classical stochastic gradient descent (SGD): instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a "mirror domain" defined by the gradient of a strictly convex potential function. This potential function, and the mirror domain it yields, give the algorithm considerably more flexibility than SGD. While many properties of SMD have already been established in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way.
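
To make the "mirror domain" update concrete, the following is a minimal sketch of a single SMD step, not the paper's own implementation. It assumes the standard update form, in which the gradient step is applied to the mirrored weights and then mapped back, and it assumes a q-norm potential psi(w) = sum(|w_k|^q)/q and a squared-error loss on a linear model. The potential, the loss, and all function names are illustrative choices that do not come from the text above.

```python
import math

def mirror_map(w, q):
    """Gradient of the assumed potential psi(w) = sum(|w_k|**q) / q (q > 1)."""
    return [math.copysign(abs(v) ** (q - 1), v) for v in w]

def inverse_mirror_map(z, q):
    """Inverse of mirror_map: maps mirror-domain coordinates back to weights."""
    return [math.copysign(abs(v) ** (1.0 / (q - 1)), v) for v in z]

def loss_gradient(w, x, y):
    """Gradient of the assumed loss 0.5 * (y - x.w)**2 with respect to w."""
    residual = y - sum(xk * wk for xk, wk in zip(x, w))
    return [-residual * xk for xk in x]

def sgd_step(w, x, y, eta):
    """Plain SGD: move the weights directly along the negative gradient."""
    grad = loss_gradient(w, x, y)
    return [wk - eta * gk for wk, gk in zip(w, grad)]

def smd_step(w, x, y, eta, q):
    """SMD: take the gradient step on the mirrored weights, then map back."""
    grad = loss_gradient(w, x, y)
    z = [zk - eta * gk for zk, gk in zip(mirror_map(w, q), grad)]
    return inverse_mirror_map(z, q)

# One sample (x, y) and a starting weight vector, chosen arbitrarily.
w0, x, y, eta = [0.5, -1.0], [1.0, 2.0], 3.0, 0.1
w_sgd = sgd_step(w0, x, y, eta)
w_smd_q2 = smd_step(w0, x, y, eta, q=2)  # quadratic potential: same as SGD
w_smd_q3 = smd_step(w0, x, y, eta, q=3)  # a different geometry, different step
```

With q = 2 the mirror map is the identity, so the step coincides with SGD; other values of q change the geometry in which the step is taken, which is the flexibility the abstract refers to.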
