Paper information - Online learning and optimization of Markov jump linear models

Online learning and optimization of Markov jump linear models

The problem of online learning and optimization of unknown Markov jump linear models is considered. A new online learning algorithm, referred to as Markovian simultaneous perturbations stochastic approximation (MSPSA), is proposed. It is shown that MSPSA achieves the minimax regret order of Θ(√T). Using the Van Trees inequality (a stochastic Cramér-Rao bound), it is shown that Θ(√T) is the lowest regret order achievable. Simulation results show scenarios in which MSPSA offers significant gains over greedy certainty-equivalent approaches.
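The abstract names simultaneous perturbation stochastic approximation as the basis of MSPSA but does not state the update rule. For orientation only, here is a minimal sketch of the standard SPSA iteration in Spall's formulation. It is not the paper's MSPSA: it has no Markov-state handling and no regret guarantee. The function name `spsa_minimize`, the gain constants, and the quadratic test loss are illustrative assumptions.

```python
import numpy as np


def spsa_minimize(loss, theta0, n_iter=2000, a=0.1, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Generic SPSA sketch (assumption: NOT the paper's MSPSA).

    loss   : callable returning a (possibly noisy) scalar loss at theta
    theta0 : initial parameter vector
    a, c, A, alpha, gamma : gain-sequence constants (illustrative defaults)
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb all coordinates at once with independent +/-1 signs.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        y_plus = loss(theta + c_k * delta)
        y_minus = loss(theta - c_k * delta)
        # Two loss evaluations give a gradient estimate for every coordinate.
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)
        theta = theta - a_k * g_hat
    return theta


if __name__ == "__main__":
    # Hypothetical noisy quadratic with minimizer at (1, -2).
    noise_rng = np.random.default_rng(1)
    target = np.array([1.0, -2.0])

    def noisy_quadratic(th):
        return float(np.sum((th - target) ** 2)
                     + 0.01 * noise_rng.standard_normal())

    print(spsa_minimize(noisy_quadratic, np.zeros(2)))
```

The point of the simultaneous perturbation is that each iteration needs only two loss evaluations regardless of the parameter dimension, which is why it suits online settings where every evaluation is a real action with a cost.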
