Entropic Regularization of Markov Decision Processes
[1] Daniel Polani, et al. Information Theory of Decisions and Actions, 2011.
[2] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[3] Sebastian Nowozin, et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016, NIPS.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[6] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[7] Zhihua Zhang, et al. A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning, 2019, NeurIPS.
[8] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[9] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.
[10] Jordi Grau-Moya, et al. Bounded Rationality, Abstraction, and Hierarchical Decision-Making: An Information-Theoretic Optimality Principle, 2015, Front. Robot. AI.
[11] David Lopez-Paz, et al. Geometrical Insights for Implicit Generative Modeling, 2017, Braverman Readings in Machine Learning.
[12] Turgut Var, et al. A dynamic programming-integer programming algorithm for allocating touristic investments, 1972.
[13] Ofir Nachum, et al. Path Consistency Learning in Tsallis Entropy Regularized MDPs, 2018, ICML.
[14] Kyungjae Lee, et al. Maximum Causal Tsallis Entropy Imitation Learning, 2018, NeurIPS.
[15] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[16] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[17] Marc Teboulle, et al. Entropic Proximal Mappings with Applications to Nonlinear Programming, 1992, Math. Oper. Res.
[18] R. Bellman. Dynamic Programming, 1957, Science.
[19] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[20] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[21] T. Morimoto. Markov Processes and the H-Theorem, 1963.
[22] Andrzej Cichocki, et al. Families of Alpha-, Beta-, and Gamma-Divergences: Flexible and Robust Measures of Similarities, 2010, Entropy.
[23] Huaiyu Zhu, et al. Information geometric measurements of generalisation, 1995.
[24] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[25] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[26] Kyungjae Lee, et al. Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning, 2018, IEEE Robotics and Automation Letters.
[27] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[28] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[29] Kyungjae Lee, et al. Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning, 2019, ArXiv.
[30] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[31] Eckehard Olbrich, et al. Autonomy: An information theoretic perspective, 2008, Biosyst.
[32] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[33] Frank Nielsen, et al. An Elementary Introduction to Information Geometry, 2018, Entropy.
[34] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[35] Zhihua Zhang, et al. A Unified Framework for Regularized Reinforcement Learning, 2019, ArXiv.
[36] Sergio Verdú, et al. f-Divergence Inequalities, 2015, IEEE Transactions on Information Theory.
[37] Shun-ichi Amari, et al. Differential-geometrical methods in statistics, 1985.
[38] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[39] S. M. Ali, et al. A General Class of Coefficients of Divergence of One Distribution from Another, 1966.
[40] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[41] Philip S. Thomas, et al. A Notation for Markov Decision Processes, 2015, ArXiv.
[42] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[43] David H. Wolpert, et al. Information Theory - The Bridge Connecting Bounded Rational Game Theory and Statistical Physics, 2004, ArXiv.
[44] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[45] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.
[46] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[47] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[48] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.
[49] Doina Precup, et al. An information-theoretic approach to curiosity-driven reinforcement learning, 2012, Theory in Biosciences.
[50] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[51] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[52] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.