暂无分享,去创建一个
[1] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[2] Andrew Bennett,et al. Deep Generalized Method of Moments for Instrumental Variable Analysis , 2019, NeurIPS.
[3] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.
[4] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[5] Kevin Leyton-Brown,et al. Deep IV: A Flexible Approach for Counterfactual Prediction , 2017, ICML.
[6] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[7] Krikamol Muandet,et al. Dual IV: A Single Stage Instrumental Variable Regression , 2019, ArXiv.
[8] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[9] Nando de Freitas,et al. Hyperparameter Selection for Offline Reinforcement Learning , 2020, ArXiv.
[10] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[11] Jieping Ye,et al. Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation , 2019, KDD.
[12] Arthur Gretton,et al. Kernel Instrumental Variable Regression , 2019, NeurIPS.
[13] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[14] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[15] Joshua D. Angrist,et al. Identification of Causal Effects Using Instrumental Variables , 1993 .
[16] Elias Bareinboim,et al. Causal Inference by Surrogate Experiments: z-Identifiability , 2012, UAI.
[17] Yisong Yue,et al. Batch Policy Learning under Constraints , 2019, ICML.
[18] Yuichi Yoshida,et al. Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.
[19] Yoav Freund,et al. Foundations of Machine Learning , 2012 .
[20] Emma Brunskill,et al. Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding , 2020, NeurIPS.
[21] Anton Nakov. Jackknife instrumental variables estimation: replication and extension of angrist, imbens and krueger (1999) , 2010 .
[22] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.