Online learning and optimization of Markov jump linear models

The problem of online learning and optimization of unknown Markov jump linear models is considered. A new online learning algorithm, referred to as Markovian simultaneous perturbations stochastic approximation (MSPSA), is proposed. It is shown that MSPSA achieves the minimax regret order of Θ(√T). Using the van Trees inequality (a stochastic Cramér-Rao bound), it is shown that Θ(√T) is the lowest regret order achievable. Simulation results show scenarios in which MSPSA offers significant gains over greedy certainty-equivalent approaches.
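
The abstract does not spell out the MSPSA update itself. For orientation only, the following is a minimal sketch of the classical simultaneous perturbation stochastic approximation (SPSA) iteration that the algorithm's name refers to, not of MSPSA; the Markov-jump handling is not described here and is not reproduced. The function names `spsa_minimize` and `make_noisy_quadratic`, the gain constants, and the toy quadratic objective are all assumptions chosen for illustration.

```python
import random


def spsa_minimize(loss, theta0, iterations=2000, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Classical SPSA: estimate the gradient from two noisy loss evaluations.

    All gain constants are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb every coordinate at once with independent +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # One scalar difference quotient serves all coordinates.
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        # Coordinate i of the gradient estimate is diff / delta[i].
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


def make_noisy_quadratic(target, noise_std=0.01, seed=1):
    """Hypothetical test objective: squared distance to `target` plus Gaussian noise."""
    rng = random.Random(seed)

    def loss(x):
        exact = sum((xi - ti) ** 2 for xi, ti in zip(x, target))
        return exact + rng.gauss(0.0, noise_std)

    return loss


if __name__ == "__main__":
    estimate = spsa_minimize(make_noisy_quadratic([1.0, -2.0]), [0.0, 0.0])
    print(estimate)
```

Each iteration uses only two noisy evaluations of the objective regardless of the parameter dimension, which is the feature that makes simultaneous perturbation attractive when the model is unknown and only noisy outcomes are observed.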
