Nando de Freitas | Arnaud Doucet | Arthur Gretton | Liyuan Xu | Yutian Chen | Tom Le Paine | Caglar Gulcehre
[1] Luofeng Liao,et al. Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach , 2020, NeurIPS.
[2] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[3] Luofeng Liao,et al. Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning , 2021, ArXiv.
[4] Vasilis Syrgkanis,et al. Adversarial Generalized Method of Moments , 2018, ArXiv.
[5] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res.
[6] Arthur Gretton,et al. Kernel Instrumental Variable Regression , 2019, NeurIPS.
[7] Ehsan Saleh. Deterministic Bellman Residual Minimization , 2019 .
[8] Ye Luo,et al. Causal Reinforcement Learning: An Instrumental Variable Approach , 2021, SSRN Electronic Journal.
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[11] Nishanth Dikkala,et al. Minimax Estimation of Conditional Moment Models , 2020, NeurIPS.
[12] Andrew Bennett,et al. Deep Generalized Method of Moments for Instrumental Variable Analysis , 2019, NeurIPS.
[13] Sergey Levine,et al. Benchmarks for Deep Off-Policy Evaluation , 2021, ICLR.
[14] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[15] Mohammad Norouzi,et al. Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization , 2021, ICLR.
[16] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[17] J. Stock,et al. Retrospectives: Who Invented Instrumental Variable Regression? , 2003 .
[18] Bo Dai,et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections , 2019, NeurIPS.
[19] Bo Dai,et al. Off-Policy Evaluation via the Regularized Lagrangian , 2020, NeurIPS.
[20] Masatoshi Uehara,et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[21] Nando de Freitas,et al. Hyperparameter Selection for Offline Reinforcement Learning , 2020, ArXiv.
[22] Sameera S. Ponda,et al. Autonomous navigation of stratospheric balloons using reinforcement learning , 2020, Nature.
[23] Nando de Freitas,et al. Learning Deep Features in Instrumental Variable Regression , 2020, ICLR.
[24] J. Florens,et al. Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization , 2003 .
[25] Yisong Yue,et al. Batch Policy Learning under Constraints , 2019, ICML.
[26] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res.
[27] Pawel Wawrzynski,et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.
[28] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[29] Joshua D. Angrist,et al. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records , 1990 .
[30] Kevin Leyton-Brown,et al. Deep IV: A Flexible Approach for Counterfactual Prediction , 2017, ICML.
[31] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[32] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[33] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[34] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[35] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[36] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[37] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[38] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res.
[39] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[40] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators , 1982 .
[41] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[42] Timothy M. Christensen,et al. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression , 2018 .
[43] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[44] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[45] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[46] J. Horowitz,et al. Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation , 2011 .
[47] Philip G. Wright,et al. The tariff on animal and vegetable oils , 1928 .
[48] Sergio Gomez Colmenarejo,et al. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning , 2020 .
[49] Qiang Liu,et al. Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning , 2020, ICLR.
[50] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[51] S. Levine,et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems , 2020, ArXiv.
[52] Joshua D. Angrist,et al. Identification of Causal Effects Using Instrumental Variables , 1993 .
[53] W. Newey,et al. Instrumental variable estimation of nonparametric models , 2003 .
[54] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[55] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[56] Ali Mousavi,et al. Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders , 2020, AISTATS.
[57] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[58] Hoang Minh Le,et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning , 2019, NeurIPS Datasets and Benchmarks.
[59] Krikamol Muandet,et al. Dual IV: A Single Stage Instrumental Variable Regression , 2019, ArXiv.