Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice