Entropic Causal Inference

We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low $H_0$ entropy (cardinality) in the true direction, it must have high $H_0$ entropy in the wrong direction. We establish several algorithmic hardness results for estimating the minimum-entropy exogenous variable. We show that finding the exogenous variable with minimum entropy is equivalent to finding the minimum joint entropy given $n$ marginal distributions, also known as the minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem that, for $n=2$, provably finds a local optimum. This yields a greedy algorithm for finding the exogenous variable with minimum $H_1$ (Shannon) entropy. Our greedy entropy-based causal inference algorithm performs comparably to state-of-the-art additive noise models on real datasets. One advantage of our approach is that it uses only the distributions of the random variables, never their values; unlike additive noise models, it can therefore be applied to both ordinal and categorical data.
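As a concrete illustration of the coupling step for $n=2$, the following is a minimal Python sketch (names and structure are ours, not taken from the paper) of the greedy idea: repeatedly pair the largest remaining probability mass of one marginal with the largest remaining mass of the other, and assign their minimum to a joint cell. The Shannon entropy of the resulting joint distribution gives an achievable value for the coupling entropy.

```python
import heapq
from math import log2

def greedy_coupling(p, q, tol=1e-12):
    """Greedy coupling of two marginal distributions p and q (illustrative sketch).

    At each step, the largest remaining mass of p is paired with the largest
    remaining mass of q, and their minimum is assigned to the corresponding
    joint cell. Returns {(i, j): joint probability}.
    """
    # Max-heaps via negated probabilities, tracking original indices.
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > tol]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > tol]
    heapq.heapify(hp)
    heapq.heapify(hq)

    joint = {}
    while hp and hq:
        neg_pi, i = heapq.heappop(hp)
        neg_qj, j = heapq.heappop(hq)
        pi, qj = -neg_pi, -neg_qj
        m = min(pi, qj)                      # mass placed on cell (i, j)
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        if pi - m > tol:                     # push back any leftover mass
            heapq.heappush(hp, (-(pi - m), i))
        if qj - m > tol:
            heapq.heappush(hq, (-(qj - m), j))
    return joint

def shannon_entropy(joint):
    """Shannon entropy (in bits) of a coupling returned by greedy_coupling."""
    return -sum(m * log2(m) for m in joint.values() if m > 0)

# Example: couple two marginals and report the achieved joint entropy.
p = [0.5, 0.3, 0.2]
q = [0.6, 0.4]
coupling = greedy_coupling(p, q)
print(coupling, shannon_entropy(coupling))
```

The entropy achieved by such a greedy coupling is only an upper bound on the minimum entropy coupling; the guarantee stated above for $n=2$ is a local optimum, not a global one.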
