[1] S. M. Ali,et al. A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .
[2] T. Morimoto. Markov Processes and the H-Theorem , 1963 .
[3] Marc Teboulle,et al. Entropic Proximal Mappings with Applications to Nonlinear Programming , 1992, Math. Oper. Res..
[4] H. Kappen. Path integrals and symmetry breaking for optimal control theory , 2005, physics/0505066.
[5] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[6] Thomas P. Minka,et al. Divergence measures and message passing , 2005 .
[7] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[9] Alex M. Andrew,et al. Reinforcement Learning: An Introduction , 1998 .
[10] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[11] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[12] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[13] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[14] Huaiyu Zhu,et al. Information geometric measurements of generalisation , 1995 .
[15] Andrzej Cichocki,et al. Families of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities , 2010, Entropy.
[16] Alexander J. Smola,et al. Unifying Divergence Minimization and Statistical Inference Via Convex Duality , 2006, COLT.
[17] John S. Bridle,et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.
[18] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[19] Y. Freund,et al. The non-stochastic multi-armed bandit problem , 2001 .
[20] Sebastian Nowozin,et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.
[21] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[22] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[23] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations , 1952 .
[24] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[25] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..
[26] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[27] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[28] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[29] Sergio Verdú,et al. $f$-Divergence Inequalities , 2015, IEEE Transactions on Information Theory.
[30] Shun-ichi Amari,et al. Differential-geometrical methods in statistics , 1985 .
[31] Doina Precup,et al. An information-theoretic approach to curiosity-driven reinforcement learning , 2012, Theory in Biosciences.
[32] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[33] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..