Patternless Adversarial Attacks on Video Recognition Networks

Deep neural networks for classification of videos, just like image classification networks, may be subjected to adversarial manipulation. The main difference between image classifiers and video classifiers is that the latter usually use temporal information contained within the video in the form of optical flow or implicitly by various differences between adjacent frames. In this work we present a manipulation scheme for fooling video classifiers by introducing a spatial patternless temporal perturbation that is practically unnoticed by human observers and undetectable by leading image adversarial pattern detection algorithms. After demonstrating the manipulation of action classification of single videos, we generalize the procedure to make adversarial patterns with temporal invariance that generalizes across different classes for both targeted and untargeted attacks.

[1]  Yi Zhu,et al.  Hidden Two-Stream Convolutional Networks for Action Recognition , 2017, ACCV.

[2]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Jitendra Malik,et al.  SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Mubarak Shah,et al.  Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[6]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[8]  Samy Bengio,et al.  Adversarial Machine Learning at Scale , 2016, ICLR.

[9]  Amit K. Roy-Chowdhury,et al.  Adversarial Perturbations Against Real-Time Video Classification Systems , 2018, NDSS.

[10]  Lei Wang,et al.  Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks , 2018, IEEE Access.

[11]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[12]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[13]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[14]  HeKaiming,et al.  Faster R-CNN , 2017 .

[15]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[16]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Tat-Seng Chua,et al.  Heuristic Black-Box Adversarial Attacks on Video Recognition Models , 2019, AAAI 2020.

[19]  Lorenzo Torresani,et al.  C3D: Generic Features for Video Analysis , 2014, ArXiv.

[20]  Herschel Rabitz,et al.  Targeted Nonlinear Adversarial Perturbations in Images and Videos , 2018, ArXiv.

[21]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[22]  Cordelia Schmid,et al.  Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Yiran Chen,et al.  Adversarial Attacks for Optical Flow-Based Action Recognition Classifiers , 2018, ArXiv.

[26]  Bram van Ginneken,et al.  A survey on deep learning in medical image analysis , 2017, Medical Image Anal..

[27]  James Bailey,et al.  Black-box Adversarial Attacks on Video Recognition Models , 2019, ACM Multimedia.

[28]  Hang Su,et al.  Sparse Adversarial Perturbations for Videos , 2018, AAAI.

[29]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[30]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[32]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).