Evaluation of Local Model-Agnostic Explanations Using Ground Truth

Explanation techniques are commonly evaluated using human-grounded methods, limiting the possibilities for large-scale evaluations and rapid progress in the development of new techniques. We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques. In our approach, we generate ground-truth explanations directly from the black-box model when it is either Logistic Regression or Gaussian Naive Bayes, and measure how closely each produced explanation matches this ground truth. In our empirical study, explanations produced by Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Local Permutation Importance (LPI) are compared to the extracted ground truth. For Logistic Regression, we find that the performance of the explanation techniques is highly dependent on the normalization of the data. In contrast, Local Permutation Importance outperforms the other techniques on Naive Bayes, irrespective of normalization. We hope that this work lays the foundation for further research into functionally-grounded evaluation methods for explanation techniques.
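
To make the idea concrete, the following is a minimal sketch, not the paper's exact procedure: for a Logistic Regression black box, a per-instance ground-truth attribution is assumed to be coefficient times feature value (the feature's additive contribution to the log-odds), and it is compared against a model-agnostic explanation (here, KernelSHAP) using Spearman rank correlation. The choice of dataset, the ground-truth definition, and the similarity metric are illustrative assumptions rather than the authors' exact setup.

```python
# Sketch: compare a model-agnostic explanation to an assumed ground truth
# extracted from a Logistic Regression model.
import shap
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # normalization, which the abstract notes matters for LR

model = LogisticRegression(max_iter=1000).fit(X, y)

# Assumed ground truth: each feature's contribution to the log-odds of an instance.
ground_truth = model.coef_[0] * X  # shape: (n_samples, n_features)

# Model-agnostic explanation of the log-odds via KernelSHAP with a small background set.
explainer = shap.KernelExplainer(model.decision_function, X[:100])
instance = X[:1]
shap_values = explainer.shap_values(instance)  # shape: (1, n_features)

# Similarity between the explanation and the ground truth for one instance.
rho, _ = spearmanr(ground_truth[0], shap_values[0])
print(f"Spearman rank correlation with ground truth: {rho:.3f}")
```

In the same spirit, LIME or LPI attributions for the instance could replace the SHAP values above, and averaging the similarity over many instances yields a functionally-grounded score for each technique.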