[1] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[2] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[3] Marco Wiering, et al. Ensemble Algorithms in Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[4] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[5] Alejandro Agostini, et al. Reinforcement Learning with a Gaussian mixture model, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).
[6] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[7] Darwin G. Caldwell, et al. Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning, 2013, Robotics Auton. Syst.
[8] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[9] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[10] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[11] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[12] Michael L. Littman, et al. An Ensemble of Linearly Combined Reinforcement-Learning Agents, 2013, AAAI.
[13] Andrei V. Kelarev, et al. Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks, 2009, Australasian Conference on Artificial Intelligence.
[14] Tobias J. Oechtering, et al. On the Entropy Computation of Large Complex Gaussian Mixture Distributions, 2015, IEEE Transactions on Signal Processing.
[15] P. Hall, et al. On the estimation of entropy, 1993.
[16] Miguel Á. Carreira-Perpiñán, et al. Mode-Finding for Mixtures of Gaussian Distributions, 2000, IEEE Trans. Pattern Anal. Mach. Intell.
[17] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[18] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[19] Hugh F. Durrant-Whyte, et al. On entropy approximation for Gaussian mixture random vectors, 2008, 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.
[20] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[21] Artemy Kolchinsky, et al. Estimating Mixture Entropy with Pairwise Distances, 2017, Entropy.
[22] Shie Mannor, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[23] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[24] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[25] Andrew G. Barto, et al. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning, 2002, ICML.
[26] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[27] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[30] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[31] Joelle Pineau, et al. OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning, 2017, AAAI.
[32] Pieter Abbeel, et al. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning, 2021, ICML.
[33] H. Joe. Estimation of entropy and other functionals of a multivariate density, 1989.
[34] Peng-Yeng Yin, et al. Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization, 2002, Signal Process.
[35] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[36] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[37] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[38] Yee Whye Teh, et al. Mix&Match - Agent Curricula for Reinforcement Learning, 2018, ICML.
[39] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.
[40] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[41] Naomi S. Altman, et al. Quantile regression, 2019, Nature Methods.
[42] Friedhelm Schwenker, et al. Ensemble Methods for Reinforcement Learning with Function Approximation, 2011, MCS.
[43] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2018, ICLR.
[44] Jun Song, et al. Optimistic Distributionally Robust Policy Optimization, 2020, ArXiv.