Excessive Invariance Causes Adversarial Vulnerability

Despite their impressive performance, deep neural networks exhibit striking failures on out-of-distribution inputs. A core aim of adversarial example research is to expose neural network errors under such distribution shifts. We decompose these errors into two complementary sources: sensitivity and invariance. We show that deep networks are not only too sensitive to task-irrelevant changes to their input, as is well known from epsilon-adversarial examples, but are also too invariant to a wide range of task-relevant changes, leaving vast regions of input space vulnerable to adversarial attacks. Such excessive invariance occurs across tasks and architecture types: on MNIST and ImageNet, one can manipulate the class-specific content of almost any image without changing the hidden activations. We identify an insufficiency of the standard cross-entropy loss as a cause of these failures. Guided by an information-theoretic analysis, we extend this objective so that it encourages the model to consider all task-dependent features in its decision. This yields the first approach tailored explicitly to overcome excessive invariance and the vulnerabilities it creates.
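
The MNIST/ImageNet claim above is easiest to see with networks that are bijective up to their final read-out, so that hidden activations can be inverted exactly back to input space (in the spirit of i-RevNet- or Glow-style models). Below is a minimal toy sketch of such an invariance-based attack, assuming a small additive-coupling flow whose first few latent dimensions serve as the logits. All names here (BijectiveNet, AdditiveCoupling, DIM, SEMANTIC_DIMS) are hypothetical choices for the example, not the authors' code.

```python
# Toy sketch of an invariance-based attack on a bijective classifier:
# keep the logit-carrying latents of one input, swap in everything else
# from another input, and invert back to input space.
import torch
import torch.nn as nn

DIM = 8            # toy input dimensionality (assumption)
SEMANTIC_DIMS = 2  # latent dims read out as logits; the rest are nuisance


class AdditiveCoupling(nn.Module):
    """Additive coupling layer (NICE/RealNVP-style), invertible by construction."""

    def __init__(self, dim: int, swap: bool):
        super().__init__()
        self.swap = swap
        half = dim // 2
        self.net = nn.Sequential(nn.Linear(half, 32), nn.ReLU(), nn.Linear(32, half))

    def forward(self, x):
        a, b = x.chunk(2, dim=-1)
        if self.swap:
            a, b = b, a
        b = b + self.net(a)   # additive update keeps the map bijective
        if self.swap:
            a, b = b, a
        return torch.cat([a, b], dim=-1)

    def inverse(self, z):
        a, b = z.chunk(2, dim=-1)
        if self.swap:
            a, b = b, a
        b = b - self.net(a)   # exact inverse of the additive update
        if self.swap:
            a, b = b, a
        return torch.cat([a, b], dim=-1)


class BijectiveNet(nn.Module):
    """Toy invertible network; the first SEMANTIC_DIMS latents act as logits."""

    def __init__(self, dim: int = DIM, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AdditiveCoupling(dim, swap=bool(i % 2)) for i in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z


net = BijectiveNet()
x_source = torch.randn(1, DIM)  # input whose logits we keep
x_target = torch.randn(1, DIM)  # input whose remaining content we inject

z_source, z_target = net(x_source), net(x_target)

# Keep the semantic (logit) part of the source, replace the nuisance part
# with the target's, then invert back to input space.
z_adv = torch.cat([z_source[:, :SEMANTIC_DIMS], z_target[:, SEMANTIC_DIMS:]], dim=-1)
x_adv = net.inverse(z_adv)

# x_adv yields (numerically) identical logits to x_source, even though most
# of its latent description now comes from x_target.
assert torch.allclose(
    net(x_adv)[:, :SEMANTIC_DIMS], z_source[:, :SEMANTIC_DIMS], atol=1e-4
)
```

Because the network is invertible, x_adv is a genuine input with the same read-out as x_source while nearly all of its latent content comes from x_target; this is the sense in which class-specific content can change without changing the hidden activations the classifier actually uses.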
