Excessive Invariance Causes Adversarial Vulnerability

Despite their impressive performance, deep neural networks exhibit striking failures on out-of-distribution inputs. A core aim of adversarial example research is to expose neural network errors under such distribution shifts. We decompose these errors into two complementary sources: sensitivity and invariance. We show that deep networks are not only too sensitive to task-irrelevant changes to their input, as is well known from epsilon-adversarial examples, but are also too invariant to a wide range of task-relevant changes, leaving vast regions of input space vulnerable to adversarial attacks. Such excessive invariance occurs across tasks and architecture types: on MNIST and ImageNet, one can manipulate the class-specific content of almost any image without changing the hidden activations. We identify an insufficiency of the standard cross-entropy loss as a cause of these failures. Guided by an information-theoretic analysis, we extend this objective so that it encourages the model to consider all task-dependent features in its decision. This yields the first approach tailored explicitly to overcome excessive invariance and the vulnerabilities it creates.
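
The MNIST/ImageNet claim above is easiest to see with networks that are bijective up to their final read-out, so that hidden activations can be inverted exactly back to input space (in the spirit of i-RevNet- or Glow-style models). Below is a minimal toy sketch of such an invariance-based attack, assuming a small additive-coupling flow whose first few latent dimensions serve as the logits. All names here (BijectiveNet, AdditiveCoupling, DIM, SEMANTIC_DIMS) are hypothetical choices for the example, not the authors' code.

```python
# Toy sketch of an invariance-based attack on a bijective classifier:
# keep the logit-carrying latents of one input, swap in everything else
# from another input, and invert back to input space.
import torch
import torch.nn as nn

DIM = 8            # toy input dimensionality (assumption)
SEMANTIC_DIMS = 2  # latent dims read out as logits; the rest are nuisance


class AdditiveCoupling(nn.Module):
    """Additive coupling layer (NICE/RealNVP-style), invertible by construction."""

    def __init__(self, dim: int, swap: bool):
        super().__init__()
        self.swap = swap
        half = dim // 2
        self.net = nn.Sequential(nn.Linear(half, 32), nn.ReLU(), nn.Linear(32, half))

    def forward(self, x):
        a, b = x.chunk(2, dim=-1)
        if self.swap:
            a, b = b, a
        b = b + self.net(a)   # additive update keeps the map bijective
        if self.swap:
            a, b = b, a
        return torch.cat([a, b], dim=-1)

    def inverse(self, z):
        a, b = z.chunk(2, dim=-1)
        if self.swap:
            a, b = b, a
        b = b - self.net(a)   # exact inverse of the additive update
        if self.swap:
            a, b = b, a
        return torch.cat([a, b], dim=-1)


class BijectiveNet(nn.Module):
    """Toy invertible network; the first SEMANTIC_DIMS latents act as logits."""

    def __init__(self, dim: int = DIM, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AdditiveCoupling(dim, swap=bool(i % 2)) for i in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z


net = BijectiveNet()
x_source = torch.randn(1, DIM)  # input whose logits we keep
x_target = torch.randn(1, DIM)  # input whose remaining content we inject

z_source, z_target = net(x_source), net(x_target)

# Keep the semantic (logit) part of the source, replace the nuisance part
# with the target's, then invert back to input space.
z_adv = torch.cat([z_source[:, :SEMANTIC_DIMS], z_target[:, SEMANTIC_DIMS:]], dim=-1)
x_adv = net.inverse(z_adv)

# x_adv yields (numerically) identical logits to x_source, even though most
# of its latent description now comes from x_target.
assert torch.allclose(
    net(x_adv)[:, :SEMANTIC_DIMS], z_source[:, :SEMANTIC_DIMS], atol=1e-4
)
```

Because the network is invertible, x_adv is a genuine input with the same read-out as x_source while nearly all of its latent content comes from x_target; this is the sense in which class-specific content can change without changing the hidden activations the classifier actually uses.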
