Simulation-based optimization of Markov reward processes
[1] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] Peter W. Glynn, et al. Stochastic approximation for Monte Carlo optimization, 1986, WSC '86.
[3] M. Kurano. Learning Algorithms for Markov Decision Processes, 1987.
[4] Peter W. Glynn, et al. Likelihood ratio gradient estimation: an overview, 1987, WSC '87.
[5] Donald L. Iglehart, et al. Importance sampling for stochastic simulations, 1989.
[6] P. L'Ecuyer, et al. A Unified View of the IPA, SF, and LR Gradient Estimation Techniques, 1990.
[7] Paul Glasserman, et al. Gradient Estimation Via Perturbation Analysis, 1990.
[8] Peter W. Glynn, et al. Gradient estimation for ratios, 1991, Winter Simulation Conference Proceedings.
[9] Paul Glasserman, et al. Gradient estimation for regenerative processes, 1992, WSC '92.
[10] Michael C. Fu, et al. Smoothed perturbation analysis derivative estimation for Markov chains, 1994, Oper. Res. Lett.
[11] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[12] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[13] E. Chong, et al. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, 1994, IEEE Trans. Autom. Control.
[14] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[15] V. Tresp, et al. Missing and noisy data in nonlinear time-series prediction, 1995, Proceedings of 1995 IEEE Workshop on Neural Networks for Signal Processing.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[18] B. Delyon. General results on the convergence of stochastic algorithms, 1996, IEEE Trans. Autom. Control.
[19] V. Borkar. Stochastic approximation with two time scales, 1997.
[20] D. Bertsekas. Gradient convergence in gradient methods, 1997.
[21] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[22] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.
[23] Peter Marbach, et al. Simulation-based optimization of Markov decision processes, 1998.
[24] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[25] J. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes: implementation issues, 1999, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No.99CH36304).
[26] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[27] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[28] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[29] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[30] Tamer Basar, et al. Analysis of Recursive Stochastic Algorithms, 2001.
[31] K. Schittkowski, et al. Nonlinear Programming, 2022.