Analysis of Dominant Classes in Universal Adversarial Perturbations

The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open question. Indeed, many different strategies can be employed to generate adversarial attacks efficiently, some of them grounded in different theoretical justifications. Among these strategies, universal (input-agnostic) perturbations are of particular interest, due to their capability to fool a network independently of the input to which the perturbation is applied. In this work, we investigate an intriguing phenomenon of universal perturbations, which has been reported previously in the literature yet without an established explanation: universal perturbations change the predicted classes for most inputs into one particular (dominant) class, even if this behavior is not specified during the creation of the perturbation. To explain the cause of this phenomenon, we propose a number of hypotheses and experimentally test them, using a speech command classification problem in the audio domain as a testbed. Our analyses reveal interesting properties of universal perturbations, suggest new methods to generate such attacks, and provide an explanation of dominant classes from both a geometric and a data-feature perspective.
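As an illustration of the phenomenon under study, the sketch below shows how the dominant class of a universal perturbation could be measured: the same perturbation is added to every test input, and the most frequently predicted class is reported along with the fraction of inputs it attracts. All names here (model, x_test, v) are hypothetical placeholders rather than the paper's actual code; model stands for any trained classifier exposing a Keras-style predict, x_test for a batch of test inputs, and v for a precomputed input-agnostic perturbation with the same shape as a single input.

import numpy as np

def dominant_class(model, x_test, v):
    """Apply the same perturbation v to every input and return the class
    most frequently predicted for the perturbed inputs, together with the
    fraction of inputs mapped to it."""
    preds = np.argmax(model.predict(x_test + v), axis=1)  # labels after perturbation
    classes, counts = np.unique(preds, return_counts=True)
    top = np.argmax(counts)
    return classes[top], counts[top] / len(preds)

Under this measurement, a perturbation exhibits a dominant class when the returned fraction is large for inputs spanning many original classes, even though no target class was imposed when the perturbation was crafted.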
