Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
[1] Chris A. J. Klaassen, et al. Consistent Estimation of the Influence Function of Locally Asymptotically Linear Estimators, 1987.
[2] J. Pearl, et al. Causal Inference, 2011.
[3] B. Chakraborty, et al. Statistical methods for dynamic treatment regimes, 2013.
[4] M. Kosorok. Introduction to Empirical Processes and Semiparametric Inference, 2008.
[5] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[6] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.
[7] A. W. van der Vaart. On Differentiable Functionals, 1988.
[8] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[9] Qi Li, et al. Nonparametric Econometrics: Theory and Practice, 2006.
[10] Michael R. Kosorok, et al. Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning, 2016, Journal of the American Statistical Association.
[11] Xiaotong Shen, et al. On methods of sieves and penalization, 1997.
[12] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[13] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[14] Judea Pearl, et al. Causal Inference, 2010.
[15] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[16] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[17] Stefan Wager, et al. Adaptive Concentration of Regression Trees, with Application to Random Forests, 2015, arXiv:1503.06388.
[18] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation, 2020, ICML.
[19] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators, 1982.
[20] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[21] Thorsten Joachims, et al. The Self-Normalized Estimator for Counterfactual Learning, 2015, NIPS.
[22] P. Bickel. Efficient and Adaptive Estimation for Semiparametric Models, 1993.
[23] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[24] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[25] J. Robins, et al. Double/Debiased Machine Learning for Treatment and Structural Parameters, 2017.
[26] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[27] Khashayar Khosravi, et al. Non-Parametric Inference Adaptive to Intrinsic Dimension, 2019, CLeaR.
[28] K. Do, et al. Efficient and Adaptive Estimation for Semiparametric Models, 1994.
[29] Gary Chamberlain, et al. Comment: Sequential Moment Restrictions in Panel Data, 1992.
[30] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[31] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2002.
[32] James M. Robins, et al. Characterization of parameters with a mixed bias property, 2019, Biometrika.
[33] Marie Davidian, et al. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions, 2013, Biometrika.
[34] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[35] L. Hansen, et al. Finite Sample Properties of Some Alternative GMM Estimators, 2015.
[36] Xiaohong Chen. Chapter 76: Large Sample Sieve Estimation of Semi-Nonparametric Models, 2007.
[37] J. Hahn. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects, 1998.
[38] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[39] Aurélien F. Bibaut, et al. Fast rates for empirical risk minimization with càdlàg losses with bounded sectional variation norm, 2019, arXiv:1907.09244.
[40] Stefan Wager, et al. Uniform Convergence of Random Forests via Adaptive Concentration, 2015.
[41] Gautam Tripathi, et al. A matrix extension of the Cauchy-Schwarz inequality, 1999.
[42] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[43] P. Bartlett, et al. Local Rademacher complexities, 2005, arXiv:math/0508275.
[44] Iván Díaz, et al. Machine learning in the estimation of causal effects: targeted minimum loss-based estimation and double/debiased machine learning, 2019, Biostatistics.
[45] Eric J. Tchetgen Tchetgen, et al. Identification and Doubly Robust Estimation of Data Missing Not at Random with an Ancillary Variable, 2015.
[46] Xiaohong Chen, et al. Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions, 2003.
[47] James M. Robins, et al. Unified Methods for Censored Longitudinal Data and Causality, 2003.
[48] Aurélien F. Bibaut, et al. Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm, 2019.
[49] Kenji Fukumizu, et al. Deep Neural Networks Learn Non-Smooth Functions Effectively, 2018, AISTATS.
[50] Chunrong Ai, et al. Semiparametric Efficiency Bound for Models of Sequential Moment Restrictions Containing Unknown Functions, 2009.
[51] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[52] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[53] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[54] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[55] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[56] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, arXiv.
[57] J. Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect, 1986.
[58] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[59] Masatoshi Uehara, et al. Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, 2019, NeurIPS.
[60] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[61] Clive G. Bowsher, et al. Identifying sources of variation and the flow of information in biochemical networks, 2012, Proceedings of the National Academy of Sciences.
[62] S. Murphy, et al. Optimal dynamic treatment regimes, 2003.
[63] C. J. Stone, et al. The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation, 1994.
[64] J. Robins, et al. Marginal Structural Models and Causal Inference in Epidemiology, 2000, Epidemiology.
[65] W. Newey, et al. Large sample estimation and hypothesis testing, 1986.
[66] James M. Robins, et al. Marginal Structural Models versus Structural Nested Models as Tools for Causal Inference, 2000.
[67] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[68] Nikos Vlassis, et al. More Efficient Off-Policy Evaluation through Regularized Targeted Learning, 2019, ICML.
[69] Mark J. van der Laan, et al. Cross-Validated Targeted Minimum-Loss-Based Estimation, 2011.
[70] Mark J. van der Laan, et al. The Highly Adaptive Lasso Estimator, 2016, IEEE International Conference on Data Science and Advanced Analytics (DSAA).
[71] R. Strawderman, et al. Constructing dynamic treatment regimes over indefinite time horizons, 2018, Biometrika.
[72] Jinyong Hahn, et al. Efficient estimation of panel data models with sequential moment restrictions, 1997.