Minimax Weight and Q-Function Learning for Off-Policy Evaluation
暂无分享,去创建一个
[1] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Bernhard Schölkopf,et al. A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..
[4] J. Hahn. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects , 1998 .
[5] Peter Stone,et al. Importance Sampling Policy Evaluation with an Estimated Behavior Policy , 2018, ICML.
[6] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[7] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[8] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[9] Bernhard Schölkopf,et al. Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..
[10] Stefan Wager,et al. Augmented minimax linear estimation , 2017, The Annals of Statistics.
[11] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[12] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[13] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[14] P. Bickel. Efficient and Adaptive Estimation for Semiparametric Models , 1993 .
[15] D. Bertsekas,et al. Journal of Computational and Applied Mathematics Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022 .
[16] Nathan Kallus,et al. Generalized Optimal Matching Methods for Causal Inference , 2016, J. Mach. Learn. Res..
[17] Qiang Liu,et al. A Kernel Loss for Solving the Bellman Equation , 2019, NeurIPS.
[18] Yifei Ma,et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling , 2019, NeurIPS.
[19] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[20] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[21] Nan Jiang,et al. On Oracle-Efficient PAC RL with Rich Observations , 2018, NeurIPS.
[22] Yisong Yue,et al. Batch Policy Learning under Constraints , 2019, ICML.
[23] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[24] Mehryar Mohri,et al. Rademacher Complexity Bounds for Non-I.I.D. Processes , 2008, NIPS.
[25] Ziyang Tang. Harnessing Infinite-Horizon Off-Policy Evaluation: Double Robustness via Duality , 2019 .
[26] Masatoshi Uehara,et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes , 2019, J. Mach. Learn. Res..
[27] Lihong Li,et al. Toward Minimax Off-policy Value Estimation , 2015, AISTATS.
[28] G. Imbens,et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2002 .
[29] Masatoshi Uehara,et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes , 2019, ArXiv.
[30] David A. Hirshberg,et al. Balancing Out Regression Error: Efficient Treatment Effect Estimation without Smooth Propensities , 2017 .
[31] Masatoshi Uehara,et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning , 2019 .
[32] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[33] Nan Jiang,et al. On Value Functions and the Agent-Environment Boundary , 2019, ArXiv.
[34] Bo Dai,et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections , 2019, NeurIPS.
[35] S. Eguchi,et al. A paradox concerning nuisance parameters and projected estimating functions , 2004 .
[36] Ilya Kostrikov,et al. AlgaeDICE: Policy Gradient from Arbitrary Experience , 2019, ArXiv.
[37] W. Newey,et al. Large sample estimation and hypothesis testing , 1986 .
[38] G. Imbens,et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2000 .
[39] Jiawei Huang,et al. Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization , 2020, ArXiv.
[40] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[41] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[42] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[43] K. Do,et al. Efficient and Adaptive Estimation for Semiparametric Models. , 1994 .
[44] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[45] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[46] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.