Theoretically Principled Trade-off between Robustness and Accuracy

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won first place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
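
The decomposition into natural error plus boundary error suggests a surrogate training objective with two terms: a standard classification loss on natural examples, which controls natural error, plus a regularizer that pushes the decision boundary away from the data, which controls boundary error. The snippet below is a minimal, illustrative PyTorch sketch of such a TRADES-style objective, not the authors' reference implementation; the function name `trades_loss`, the hyperparameter defaults (`eps`, `step_size`, `num_steps`, `beta`), and the assumption that inputs are scaled to [0, 1] are placeholders chosen for illustration.

```python
# Minimal TRADES-style loss sketch (illustrative, not the reference implementation).
import torch
import torch.nn.functional as F


def trades_loss(model, x, y, eps=8 / 255, step_size=2 / 255, num_steps=10, beta=6.0):
    """Cross-entropy on natural inputs plus a beta-weighted boundary (KL) term."""
    x = x.detach()
    model.eval()
    p_natural = F.softmax(model(x), dim=1).detach()

    # Approximate the inner maximization: ascend the KL divergence between the
    # predictions at the perturbed and natural inputs, projecting each step
    # back onto the l_inf ball of radius eps around x.
    x_adv = x + 0.001 * torch.randn_like(x)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_natural,
                      reduction="batchmean")
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)  # assumes inputs scaled to [0, 1]

    model.train()
    logits = model(x)
    natural_loss = F.cross_entropy(logits, y)  # accuracy term
    boundary_loss = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                             F.softmax(logits, dim=1), reduction="batchmean")
    return natural_loss + beta * boundary_loss  # beta trades robustness vs. accuracy
```

In a training loop, this loss would replace plain cross-entropy; the weight `beta` then controls the trade-off, with small values favoring natural accuracy and large values favoring adversarial robustness.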
