Toward Minimax Off-policy Value Estimation
[1] E. L. Lehmann et al. Theory of point estimation, 1950.
[2] R. Z. Khasʹminskiĭ et al. Statistical estimation: asymptotic theory, 1981.
[3] D. Rubin et al. The central role of the propensity score in observational studies for causal effects, 1983.
[4] D. Rubin et al. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score, 1984.
[5] P. Holland. Statistics and Causal Inference, 1985.
[6] G. Imbens et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[7] Doina Precup et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[8] Michael I. Jordan et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[10] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[11] G. Imbens et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2002.
[12] Tim Hesterberg et al. Monte Carlo Strategies in Scientific Computing, 2002, Technometrics.
[13] Eli Upfal et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
[14] J. Langford et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[15] Richard S. Sutton et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[16] John Langford et al. Exploration scavenging, 2008, ICML '08.
[17] Neil D. Lawrence et al. Dataset Shift in Machine Learning, 2009.
[18] Christian Igel et al. Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, 2009, ICML '09.
[19] Lihong Li et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[20] Wei Chu et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[21] Sanjoy Dasgupta et al. Two faces of active learning, 2009, Theor. Comput. Sci.
[22] Yaoliang Yu et al. Analysis of Kernel Mean Matching under Covariate Shift, 2012, ICML.
[23] Joaquin Quiñonero Candela et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[24] Olivier Nicol et al. Data-driven evaluation of Contextual Bandit algorithms and applications to Dynamic Recommendation, 2014.