
Learning Deep Features in Instrumental Variable Regression

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
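The classical two-stage procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustration of linear two-stage least squares on synthetic data, not the DFIV method itself. The function names (`two_stage_least_squares`, `ols_slope`, `simulate`), the data-generating process, and all of its coefficients are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """Linear 2SLS with intercepts for 1-D instrument z, treatment x, outcome y."""
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: linear regression from the instrument to the treatment.
    stage1_coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ stage1_coef
    # Stage 2: linear regression from the predicted treatment to the outcome.
    X_hat = np.column_stack([np.ones(n), x_hat])
    stage2_coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return stage2_coef[1]  # slope = estimated causal effect


def ols_slope(x, y):
    """Naive regression of y on x, which ignores the confounder."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical confounded data: u affects both x and y, z affects y only via x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)   # instrument
    u = rng.normal(size=n)   # unobserved confounder
    x = z + u + 0.1 * rng.normal(size=n)
    y = true_effect * x + 3.0 * u + 0.1 * rng.normal(size=n)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS slope:", ols_slope(x, y))
    print("2SLS slope:     ", two_stage_least_squares(z, x, y))
```

In this toy setup the naive regression is biased upward by the confounder, while the two-stage estimate should land close to the assumed true effect of 2.0. As the abstract describes it, DFIV replaces the fixed linear maps in both stages with learned neural-network features.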

Nando de Freitas | Arthur Gretton | Liyuan Xu | Yutian Chen | Siddarth Srinivasan | Arnaud Doucet

[1] Philip G. Wright, et al. The tariff on animal and vegetable oils, 1928.

[2] G. Wahba, et al. Generalized Inverses in Reproducing Kernel Spaces: An Approach to Regularization of Linear Operator Equations, 1974.

[3] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators, 1982.

[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[5] Andrew W. Moore, et al. Efficient memory-based learning for robot control, 1990.

[6] Joshua D. Angrist, et al. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records, 1990.

[7] Joshua D. Angrist, et al. Identification of Causal Effects Using Instrumental Variables, 1993.

[8] J. Angrist, et al. Jackknife Instrumental Variables Estimation, 1995.

[9] Joshua D. Angrist, et al. Split-Sample Instrumental Variables Estimates of the Return to Schooling, 1995.

[10] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[11] J. Florens, et al. Nonparametric Instrumental Regression, 2010.

[12] J. Stock, et al. Retrospectives: Who Invented Instrumental Variable Regression?, 2003.

[13] J. Florens, et al. Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization, 2003.

[14] Xiaohong Chen, et al. Semi-Nonparametric IV Estimation of Shape-Invariant Engel Curves, 2003.

[15] W. Newey, et al. Instrumental variable estimation of nonparametric models, 2003.

[16] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.

[17] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.

[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[19] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[20] Joshua D. Angrist, et al. Mostly Harmless Econometrics: An Empiricist's Companion, 2008.

[21] J. Horowitz, et al. Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation, 2011.

[22] Xiaohong Chen, et al. Estimation of Nonparametric Conditional Moment Models with Possibly Nonsmooth Generalized Residuals, 2009.

[23] Elias Bareinboim, et al. Causal Inference by Surrogate Experiments: z-Identifiability, 2012, UAI.

[24] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive computation and machine learning.

[25] Christian Hansen, et al. Instrumental variables estimation with many weak instruments using regularized JIVE, 2014.

[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[27] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.

[28] Kevin Leyton-Brown, et al. Deep IV: A Flexible Approach for Counterfactual Prediction, 2017, ICML.

[29] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[30] Timothy M. Christensen, et al. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression, 2018.

[31] Stefano V. Albrecht, et al. Stabilizing Generative Adversarial Networks: A Survey, 2019, arXiv:1910.00927.

[32] Jieping Ye, et al. Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation, 2019, KDD.

[33] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.

[34] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[35] Andrew Bennett, et al. Deep Generalized Method of Moments for Instrumental Variable Analysis, 2019, NeurIPS.

[36] Krikamol Muandet, et al. Dual IV: A Single Stage Instrumental Variable Regression, 2019, ArXiv.

[37] Arthur Gretton, et al. Kernel Instrumental Variable Regression, 2019, NeurIPS.

[38] Tor Lattimore, et al. Behaviour Suite for Reinforcement Learning, 2019, ICLR.

[39] Krikamol Muandet, et al. Dual Instrumental Variable Regression, 2019, NeurIPS.

[40] Emma Brunskill, et al. Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding, 2020, NeurIPS.

[41] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, ArXiv.

[42] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, ArXiv.

[43] Hoang Minh Le, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, NeurIPS Datasets and Benchmarks.