Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models

Models that estimate main effects of individual variables alongside interaction effects face an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model's predictions. This is a critical problem for interpretability because it permits "contradictory" models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome that cannot be represented by any smaller subset of features. This definition is equivalent to the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparities, including contradictions, between the effects before and after purification. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.
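The identifiability problem and its resolution can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal illustration for one pairwise interaction over two binned features, assuming a strictly positive joint weight table `weights` (a stand-in for the data distribution). The function name `purify`, its arguments, and the example tables are all hypothetical. The sketch repeatedly moves the weighted row and column means of the interaction table into the main effects until the interaction has zero conditional mean along each axis.

```python
import numpy as np

def purify(main1, main2, inter, weights, tol=1e-12, max_iter=1000):
    """Illustrative purification of one pairwise interaction table.

    Assumes `weights` is strictly positive (hypothetical joint bin density).
    Returns (intercept, main1, main2, inter) representing the same function.
    """
    main1 = np.asarray(main1, dtype=float).copy()
    main2 = np.asarray(main2, dtype=float).copy()
    inter = np.asarray(inter, dtype=float).copy()
    w = np.asarray(weights, dtype=float)

    row_w = w / w.sum(axis=1, keepdims=True)  # density of x2 bins within each x1 bin
    col_w = w / w.sum(axis=0, keepdims=True)  # density of x1 bins within each x2 bin

    for _ in range(max_iter):
        # Move the weighted mean of every row into the main effect of feature 1.
        row_means = (inter * row_w).sum(axis=1)
        inter -= row_means[:, None]
        main1 += row_means
        # Move the weighted mean of every column into the main effect of feature 2.
        col_means = (inter * col_w).sum(axis=0)
        inter -= col_means[None, :]
        main2 += col_means
        if max(np.abs(row_means).max(), np.abs(col_means).max()) < tol:
            break

    # Center the main effects and collect their weighted means in an intercept.
    p1 = w.sum(axis=1) / w.sum()
    p2 = w.sum(axis=0) / w.sum()
    intercept = main1 @ p1 + main2 @ p2
    main1 -= main1 @ p1
    main2 -= main2 @ p2
    return intercept, main1, main2, inter

def predict(intercept, main1, main2, inter):
    """Evaluate the additive model on every (x1 bin, x2 bin) cell."""
    return intercept + main1[:, None] + main2[None, :] + inter

# Hypothetical joint distribution over two binary features.
weights = np.array([[0.4, 0.1],
                    [0.2, 0.3]])

# Two different parameterizations of the same function, AND(x1, x2).
pure_a = purify([0, 0], [0, 0], [[0, 0], [0, 1]], weights)
pure_b = purify([0, 1], [0, 0], [[0, 0], [-1, 0]], weights)
```

Before purification, the second parameterization assigns feature 1 a main effect of 1 and a negative interaction term, while the first assigns no main effects at all. After purification under the same `weights`, both reduce to the same intercept, main effects, and interaction table, and `predict` returns the original function in each case. Changing `weights` changes the purified effects, which is why the data distribution has to be specified before the parameters are interpreted.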

Rich Caruana | Chun-Hao Chang | Sarah Tan | Giles Hooker | Benjamin Lengerich
