[1] S. Haykin. Kalman Filtering and Neural Networks, 2001.
[2] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[3] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[4] Bin Wang, et al. A Kalman filter-based actor-critic learning approach, 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[7] Liang Lin, et al. Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches, 2018, ArXiv.
[8] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[9] Leif H. Finkel, et al. A Neural Implementation of the Kalman Filter, 2009, NIPS.
[10] Petros G. Voulgaris, et al. On optimal ℓ∞ to ℓ∞ filtering, 1995, Autom.
[11] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[12] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[13] Lawrence Carin, et al. Learning Structural Weight Uncertainty for Sequential Decision-Making, 2017, AISTATS.
[14] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[15] James Vuckovic, et al. Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization, 2018, ArXiv.
[16] Robert Fitch, et al. Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, 2007, ICML '07.
[17] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[18] Matthieu Geist, et al. Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, 2011, IJCAI.
[19] Jeffrey K. Uhlmann, et al. New extension of the Kalman filter to nonlinear systems, 1997, Defense, Security, and Sensing.
[20] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[21] Didrik Nielsen, et al. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam, 2018, ICML.
[22] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[23] Sebastian Tschiatschek, et al. Successor Uncertainties: exploration and uncertainty in temporal difference learning, 2018, NeurIPS.
[24] Shie Mannor, et al. Shallow Updates for Deep Reinforcement Learning, 2017, NIPS.
[25] Martha White, et al. Accelerated Gradient Temporal Difference Learning, 2016, AAAI.
[26] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[27] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[28] Finale Doshi-Velez, et al. Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks, 2016, ICLR.
[29] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.
[30] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[31] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[32] Finale Doshi-Velez, et al. Decomposition of Uncertainty for Active Learning and Reliable Reinforcement Learning in Stochastic Systems, 2017, ArXiv.
[33] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[34] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[35] A. H. Haddad, et al. Applied optimal estimation, 1976, Proceedings of the IEEE.
[36] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[37] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[38] Brian Karrer, et al. The decoupled extended Kalman filter for dynamic exponential-family factorization models, 2018, J. Mach. Learn. Res.
[39] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[40] Finale Doshi-Velez, et al. Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning, 2017, ICML.
[41] Michal Valko, et al. Bayesian Policy Gradient and Actor-Critic Algorithms, 2016, J. Mach. Learn. Res.
[42] Sebastian Ruder, et al. An overview of gradient descent optimization algorithms, 2016, ArXiv.
[43] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[44] James Martens, et al. New perspectives on the natural gradient method, 2014, ArXiv.
[45] Takao Miura, et al. Model Selection Based on Kalman Temporal Differences Learning, 2017, 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC).
[46] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[47] Rudolph van der Merwe, et al. The unscented Kalman filter for nonlinear estimation, 2000, Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373).
[48] Rudolph van der Merwe, et al. Sigma-point Kalman filters for probabilistic inference in dynamic state-space models, 2004.
[49] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[50] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[51] Matthieu Geist, et al. Kalman Temporal Differences, 2010, J. Artif. Intell. Res.
[52] Qiang Liu, et al. A Kernel Loss for Solving the Bellman Equation, 2019, NeurIPS.
[53] Richard S. Sutton, et al. Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods, 2018.
[54] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.
[55] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[56] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML '07.
[57] Yann Ollivier, et al. Online natural gradient as a Kalman filter, 2017, ArXiv:1703.00209.
[58] Arash Givchi, et al. Quasi Newton Temporal Difference Learning, 2014, ACML.
[59] Dimitri P. Bertsekas, et al. Incremental Least Squares Methods and the Extended Kalman Filter, 1996, SIAM J. Optim.
[60] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[61] Simo Särkkä, et al. Bayesian Filtering and Smoothing, 2013, Institute of Mathematical Statistics Textbooks.
[62] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[63] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[64] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[65] Lee A. Feldkamp, et al. Decoupled extended Kalman filter training of feedforward layered networks, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[66] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[67] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.