Shortcut Learning in Deep Neural Networks

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning's problems can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.
