Off-Policy Evaluation in Partially Observable Environments
[1] Matthijs T. J. Spaan, et al. Partially Observable Markov Decision Processes, 2010, Encyclopedia of Machine Learning.
[2] Judea Pearl, et al. Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution, 2018, WSDM.
[3] David Sontag, et al. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models, 2019, ICML.
[4] Bernhard Schölkopf, et al. Elements of Causal Inference: Foundations and Learning Algorithms, 2017.
[5] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[6] Elias Bareinboim, et al. Counterfactual Data-Fusion for Online Reinforcement Learners, 2017, ICML.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[9] Judea Pearl, et al. Causal Inference, 2010.
[10] Nan Jiang, et al. Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[11] J. Pearl, et al. Measurement Bias and Effect Restoration in Causal Inference, 2014.
[12] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[13] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[14] E. Bareinboim, et al. Markov Decision Processes with Unobserved Confounders: A Causal Approach, 2016.
[15] Yishay Mansour, et al. Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.
[16] Elias Bareinboim, et al. Bandits with Unobserved Confounders: A Causal Approach, 2015, NIPS.
[17] P. Spirtes, et al. Causation, Prediction, and Search, 1993.
[18] J. Robins, et al. Comparison of Dynamic Treatment Regimes via Inverse Probability Weighting, 2006, Basic & Clinical Pharmacology & Toxicology.
[19] Srivatsan Srinivasan, et al. Evaluating Reinforcement Learning Algorithms in Observational Health Settings, 2018, arXiv.
[20] Alexandros G. Dimakis, et al. Contextual Bandits with Latent Confounders: An NMF Approach, 2016, AISTATS.
[21] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[22] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[23] Jan Peters, et al. Policy Evaluation with Temporal Differences: A Survey and Comparison, 2015, Journal of Machine Learning Research.
[24] J. Pearl. Causality: Models, Reasoning and Inference, 2000.
[25] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[26] I. V. Romanovskii. Existence of an Optimal Stationary Policy in a Markov Decision Process, 1965.
[27] Steve J. Young, et al. Partially Observable Markov Decision Processes for Spoken Dialog Systems, 2007, Computer Speech & Language.
[28] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[29] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[30] Béatrice Finance, et al. A Causal Multi-armed Bandit Approach for Domestic Robots' Failure Avoidance, 2017, ICONIP.
[31] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, Journal of Machine Learning Research.
[32] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Mathematics of Operations Research.
[33] Xiaoyan Zhu, et al. Linguistically Regularized LSTMs for Sentiment Classification, 2016, arXiv.
[34] J. Pearl, et al. Causal Inference, 2011, Twenty-one Mental Models That Can Change Policing.
[35] Charles Bordenave, et al. Circular Law Theorem for Random Markov Matrices, 2008, arXiv:0808.1502.
[36] Max Welling, et al. Causal Effect Inference with Deep Latent-Variable Models, 2017, NIPS.
[37] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[38] Bernhard Schölkopf, et al. Deconfounding Reinforcement Learning in Observational Settings, 2018, arXiv.
[39] Z. Geng, et al. Identifying Causal Effects With Proxy Variables of an Unmeasured Confounder, 2016, Biometrika.