Bandits with partially observable confounded data
Shie Mannor | Uri Shalit | Guy Tennenholtz | Yonathan Efroni
[1] J. Pearl. Causal inference in statistics: An overview, 2009.
[2] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[3] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res..
[4] Nikhil R. Devanur, et al. Bandits with Global Convex Constraints and Objective, 2019, Oper. Res..
[5] Thorsten Joachims, et al. Multi-armed Bandit Problems with History, 2012, AISTATS.
[6] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[7] Elias Bareinboim, et al. Bandits with Unobserved Confounders: A Causal Approach, 2015, NIPS.
[8] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[9] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[10] Louis Wehenkel, et al. Batch mode reinforcement learning based on the synthesis of artificial trajectories, 2013, Ann. Oper. Res..
[11] Michael R. Lyu, et al. CBRAP: Contextual Bandits with RAndom Projection, 2017, AAAI.
[12] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[13] John C. S. Lui, et al. Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decisions, 2020, arXiv.
[14] Leland Gerson Neuberg, et al. Causality: Models, Reasoning, and Inference, by Judea Pearl, Cambridge University Press, 2000, 2003, Econometric Theory.
[15] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[16] J. C. A. Barata, et al. The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory, 2011, arXiv:1110.6882.
[17] Moritz Werling, et al. Reinforcement Learning for Autonomous Maneuvering in Highway Scenarios, 2017.
[18] Nikhil R. Devanur, et al. Linear Contextual Bandits with Knapsacks, 2015, NIPS.
[19] Peter Szolovits, et al. MIMIC-III, a freely accessible critical care database, 2016, Scientific Data.
[20] Andreas Krause, et al. High-Dimensional Gaussian Process Bandits, 2013, NIPS.
[21] Donald Gillies, et al. Causality: Models, Reasoning, and Inference, Judea Pearl, 2001.
[22] G. Stewart. On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems, 1977.
[23] P. Wedin. Perturbation theory for pseudo-inverses, 1973.
[24] Elias Bareinboim, et al. Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes, 2019, NeurIPS.
[25] Christos Thrampoulidis, et al. Linear Stochastic Bandits Under Safety Constraints, 2019, NeurIPS.
[26] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[27] Wen Li, et al. On perturbation bounds for orthogonal projections, 2016, Numerical Algorithms.
[28] Tor Lattimore, et al. Causal Bandits: Learning Good Interventions via Causal Inference, 2016, NIPS.
[29] Amir Leshem, et al. Finite Sample Performance of Linear Least Squares Estimators Under Sub-Gaussian Martingale Difference Noise, 2018, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Susan A. Murphy, et al. Linear fitted-Q iteration with multiple reward functions, 2013, J. Mach. Learn. Res..
[31] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res..
[32] Judea Pearl, et al. What Counterfactuals Can Be Tested, 2007, UAI.
[33] Mélanie Frappier, et al. The Book of Why: The New Science of Cause and Effect, 2018, Science.
[34] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[35] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res..
[36] G. Stewart. On the Continuity of the Generalized Inverse, 1969.
[37] Shie Mannor, et al. Off-Policy Evaluation in Partially Observable Environments, 2020, AAAI.
[38] David Sontag, et al. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models, 2019, ICML.
[39] Joel A. Tropp, et al. An Introduction to Matrix Concentration Inequalities, 2015, Found. Trends Mach. Learn..
[40] Yao Liu, et al. Combining Parametric and Nonparametric Models for Off-Policy Evaluation, 2019, ICML.
[41] Babak Hassibi, et al. Stochastic Linear Bandits with Hidden Low Rank Structure, 2019, arXiv.
[42] Elias Bareinboim, et al. Sensitivity Analysis of Linear Structural Causal Models, 2019, ICML.
[43] Benjamin Van Roy, et al. Conservative Contextual Linear Bandits, 2016, NIPS.
[44] Rafic Younes, et al. Review of Optimization Methods for Cancer Chemotherapy Treatment Planning, 2015.
[45] Renyuan Xu, et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays, 2019, NeurIPS.
[46] M. de Rijke, et al. Deep Learning with Logged Bandit Feedback, 2018, ICLR.
[47] Huazheng Wang, et al. Learning Hidden Features for Contextual Bandits, 2016, CIKM.
[48] Jason Weston, et al. Learning through Dialogue Interactions by Asking Questions, 2016, ICLR.
[49] John Schulman, et al. Concrete Problems in AI Safety, 2016, arXiv.
[50] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[51] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[52] Zvi Griliches. Specification Bias in Estimates of Production Functions, 1957.
[53] Uri Shalit, et al. Removing Hidden Confounding by Experimental Grounding, 2018, NeurIPS.