Causal Effect Inference with Deep Latent-Variable Models

Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of confounders: factors that affect both an intervention and its outcome. A carefully designed observational study attempts to measure all important confounders. However, even when one does not have direct access to all confounders, noisy and uncertain measurements of proxies for the confounders may still be available. We build on recent advances in latent variable modeling to simultaneously estimate the unknown latent space summarizing the confounders and the causal effect. Our method is based on Variational Autoencoders (VAEs), which follow the causal structure of inference with proxies. We show that our method is significantly more robust than existing methods and matches the state of the art on previous benchmarks focused on individual treatment effects.
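
To make the role of confounders and proxies concrete, the following is a minimal simulation sketch. It is not the VAE-based method described above: it uses plain linear regression adjustment on synthetic data, and every name and parameter in it (`simulate`, `naive_effect`, `adjusted_effect`, the treatment probabilities, the true effect of 1.0) is an assumption chosen for illustration. It compares three estimates of a treatment effect: ignoring the hidden confounder, adjusting for the confounder itself (usually impossible in practice), and adjusting for a noisy proxy of it.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Draw synthetic data with a hidden binary confounder z (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)                    # hidden confounder
    x = z + rng.normal(0.0, 1.0, size=n)                # noisy proxy of z
    t = rng.binomial(1, np.where(z == 1, 0.75, 0.25))   # treatment depends on z
    y = 1.0 * t + 2.0 * z + rng.normal(0.0, 1.0, size=n)  # true effect of t is 1.0
    return z, x, t, y


def naive_effect(t, y):
    """Difference in mean outcome between treated and untreated units."""
    return y[t == 1].mean() - y[t == 0].mean()


def adjusted_effect(c, t, y):
    """Coefficient on t from least squares of y on [1, t, c]."""
    design = np.column_stack([np.ones_like(y), t, c])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    z, x, t, y = simulate()
    print("naive (no adjustment):   ", naive_effect(t, y))
    print("adjusting for proxy x:   ", adjusted_effect(x, t, y))
    print("adjusting for true z:    ", adjusted_effect(z, t, y))
```

Under these assumed parameters the naive estimate concentrates near 2.0, twice the true effect, while adjusting for the true confounder recovers roughly 1.0. Treating the noisy proxy as if it were the confounder removes only part of the bias (about 1.84 in the large-sample limit), which is the gap that motivates modeling the confounder as a latent variable inferred from its proxies.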
