Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Yao Liu | Pierre-Luc Bacon | Emma Brunskill
[1] A. Dubi, et al. The Interpretation of Conditional Monte Carlo as a Form of Importance Sampling, 1979.
[2] W. Newey, et al. Double Machine Learning for Treatment and Causal Parameters, 2016.
[3] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[4] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[5] P. Glynn, et al. Likelihood Ratio Gradient Estimation for Steady-State Parameters, 2017, Stochastic Systems.
[6] Pierre L'Ecuyer, et al. Importance Sampling in Rare Event Simulation, 2009, Rare Event Simulation Using Monte Carlo Methods.
[7] Gerardo Rubino, et al. Introduction to Rare Event Simulation, 2009, Rare Event Simulation Using Monte Carlo Methods.
[8] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[9] Pierre L'Ecuyer, et al. Efficiency Improvement and Variance Reduction, 1994, Proceedings of the Winter Simulation Conference.
[10] Donald L. Iglehart, et al. Simulation Methods for Queues: An Overview, 1988, Queueing Syst. Theory Appl.
[11] Shie Mannor, et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[12] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo Method, 1981, Wiley Series in Probability and Mathematical Statistics.
[13] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[14] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[15] Bruno Tuffin, et al. Approximate Zero-Variance Simulation, 2008, Winter Simulation Conference.
[16] Galin L. Jones. On the Markov Chain Central Limit Theorem, 2004, math/0409112.
[17] Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables, 1967.
[18] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[19] Sean P. Meyn, et al. A Liapounov Bound for Solutions of the Poisson Equation, 1996.
[20] B. L. Granovsky. Optimal Formulae of the Conditional Monte Carlo, 1981.
[21] Paul Bratley, et al. A Guide to Simulation (2nd ed.), 1986.
[22] S. Ross. Simulating Average Delay: Variance Reduction by Conditioning, 1988, Probability in the Engineering and Informational Sciences.
[23] P. Glynn. Importance Sampling for Markov Chains: Asymptotics for the Variance, 1994.
[24] Paul Glasserman. Filtered Monte Carlo, 1993, Math. Oper. Res.
[25] Nan Jiang, et al. Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[26] James A. Bucklew. Conditional Importance Sampling Estimators, 2005, IEEE Transactions on Information Theory.
[27] L. Breiman. The Strong Law of Large Numbers for a Class of Markov Chains, 1960.
[28] J. M. Hammersley, et al. Conditional Monte Carlo, 1956, JACM.
[29] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, J. Mach. Learn. Res.
[30] Paul Bratley, et al. A Guide to Simulation, 1983.
[31] Hoang Minh Le, et al. Empirical Analysis of Off-Policy Policy Evaluation for Reinforcement Learning, 2019.
[32] Philip S. Thomas, et al. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation, 2017, NIPS.
[33] Dennis D. Boos, et al. A Converse to Scheffé's Theorem, 1985.
[34] T. Schaul, et al. Conditional Importance Sampling for Off-Policy Learning, 2019, AISTATS.
[35] Marcello Restelli, et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[36] Luca Martino, et al. Advances in Importance Sampling, 2021, Wiley StatsRef: Statistics Reference Online.
[37] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[38] Rajan Srinivasan. Some Results in Importance Sampling and an Application to Detection, 1998, Signal Process.