ML-LOO: Detecting Adversarial Examples with Feature Attribution
Michael I. Jordan | Cho-Jui Hsieh | Jianbo Chen | Jane-Ling Wang | Puyudi Yang
[1] Motoaki Kawanabe, et al. How to Explain Individual Classification Decisions, 2009, J. Mach. Learn. Res.
[2] Zhitao Gong, et al. Adversarial and Clean Data Are Not Twins, 2017, aiDM@SIGMOD.
[3] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Ananthram Swami, et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).
[5] Cho-Jui Hsieh, et al. Towards Robust Neural Networks via Random Self-ensemble, 2017, ECCV.
[6] Cho-Jui Hsieh, et al. Rob-GAN: Generator, Discriminator, and Adversarial Attacker, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Alexander Binder, et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.
[8] Yan Wang, et al. Detecting Adversarial Perturbations with Saliency, 2018.
[9] Chia-Mu Yu, et al. On the Limitation of Local Intrinsic Dimensionality for Characterizing the Subspaces of Adversarial Examples, 2018, ICLR.
[10] Yanjun Qi, et al. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, 2017, NDSS.
[11] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.
[12] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[13] Ananthram Swami, et al. Practical Black-Box Attacks against Machine Learning, 2016, AsiaCCS.
[14] Avanti Shrikumar, et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.
[15] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[16] Jan Hendrik Metzen, et al. On Detecting Adversarial Perturbations, 2017, ICLR.
[17] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[18] Seyed-Mohsen Moosavi-Dezfooli, et al. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Scott Lundberg, et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.
[20] Lewis D. Griffin, et al. A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples, 2016, ArXiv.
[21] Aleksander Madry, et al. There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits), 2018, ArXiv.
[22] Huichen Li. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models, 2017.
[23] Dawn Xiaodong Song, et al. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016, ICLR.
[24] Aleksander Madry, et al. Adversarially Robust Generalization Requires More Data, 2018, NeurIPS.
[25] Seyed-Mohsen Moosavi-Dezfooli, et al. Robustness of classifiers: from adversarial to random noise, 2016, NIPS.
[26] Shin Ishii, et al. Distributional Smoothing with Virtual Adversarial Training, 2015, ICLR 2016.
[27] Le Song, et al. L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data, 2018, ICLR.
[28] Zachary Chase Lipton. The Mythos of Model Interpretability, 2016, ACM Queue.
[29] Alois Knoll, et al. Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[31] J. Zico Kolter, et al. Provable defenses against adversarial examples via the convex outer adversarial polytope, 2017, ICML.
[32] Aleksander Madry, et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.
[33] Abubakar Abid, et al. Interpretation of Neural Networks is Fragile, 2017, AAAI.
[34] Cho-Jui Hsieh, et al. Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network, 2018, ICLR.
[35] Patrick D. McDaniel, et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples, 2016, ArXiv.
[36] Xinlei Chen, et al. Visualizing and Understanding Neural Models in NLP, 2015, NAACL.
[37] Xin Li, et al. Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.
[39] Chih-Kuan Yeh, et al. On the (In)fidelity and Sensitivity for Explanations, 2019, arXiv:1901.09392.
[40] Suman Jana, et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2018, 2019 IEEE Symposium on Security and Privacy (SP).
[41] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, ArXiv.
[42] Pascal Frossard, et al. Analysis of classifiers’ robustness to adversarial perturbations, 2015, Machine Learning.
[43] Yang Song, et al. PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples, 2017, ICLR.
[44] Erik Strumbelj, et al. An Efficient Explanation of Individual Classifications using Game Theory, 2010, J. Mach. Learn. Res.
[45] Aleksander Madry, et al. Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors, 2018, ICLR.
[46] Logan Engstrom, et al. Black-box Adversarial Attacks with Limited Queries and Information, 2018, ICML.
[47] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).
[48] Kevin Gimpel, et al. Early Methods for Detecting Adversarial Images, 2016, ICLR.
[49] Ryan R. Curtin, et al. Detecting Adversarial Samples from Artifacts, 2017, ArXiv.
[50] P. Chalasani, et al. Adversarial Learning and Explainability in Structured Datasets, 2018, arXiv:1810.06583.
[51] Pushmeet Kohli, et al. Training verified learners with learned verifiers, 2018, ArXiv.
[52] Patrick D. McDaniel, et al. On the (Statistical) Detection of Adversarial Examples, 2017, ArXiv.
[53] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.
[54] Ananthram Swami, et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2015, 2016 IEEE Symposium on Security and Privacy (SP).
[55] Jinfeng Yi, et al. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, 2017, AISec@CCS.
[56] Yair Zick, et al. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems, 2016, 2016 IEEE Symposium on Security and Privacy (SP).
[57] Daniel Cullina, et al. Enhancing robustness of machine learning systems via data transformations, 2017, 2018 52nd Annual Conference on Information Sciences and Systems (CISS).
[58] James Bailey, et al. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality, 2018, ICLR.
[59] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.
[60] Samy Bengio, et al. Adversarial Machine Learning at Scale, 2016, ICLR.
[61] Kibok Lee, et al. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks, 2018, NeurIPS.
[62] Xiangyu Zhang, et al. Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples, 2018, NeurIPS.
[63] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.
[64] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.