Paper information - Elephant in the room

Elephant in the room

We showcase a family of common failures of state-of-the-art object detectors. These are obtained by replacing image sub-regions with another sub-image that contains a trained object. We call this "object transplanting". Modifying an image in this manner is shown to have a non-local impact on object detection. Slight changes in an object's position can affect its identity according to an object detector, as well as that of other objects in the image. We provide some analysis and suggest possible reasons for the reported phenomena.

John K. Tsotsos | Amir Rosenfeld | Richard S. Zemel
