Categorical Perception: A Groundwork for Deep Learning

Abstract

Classification is one of the major tasks that deep learning is successfully tackling. Categorization is also a fundamental cognitive ability. A well-known perceptual consequence of categorization in humans and other animals, categorical perception, is characterized by within-category compression and between-category separation: two items that are close in input space are perceived as closer if they belong to the same category than if they belong to different categories. Building on experimental and theoretical results in cognitive science, we study categorical effects in artificial neural networks. We combine a theoretical analysis based on mutual information and Fisher information with a series of numerical simulations on networks of increasing complexity. These formal and numerical analyses provide insight into the geometry of the neural representation in deep layers: space is expanded near category boundaries and contracted far from them. We investigate categorical representation with two complementary approaches. One mimics experiments in psychophysics and cognitive neuroscience by means of morphed continua between stimuli of different categories; the other introduces a categoricality index that, for each layer in the network, quantifies the separability of the categories at the neural population level. We show, on both shallow and deep neural networks, that category learning automatically induces categorical perception. We further show that the deeper a layer, the stronger the categorical effects. As an outcome of our study, we propose a coherent view of the efficacy of various heuristic practices used with the dropout regularization technique.
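The geometric claim above (expansion of space near a category boundary, contraction far from it) can be illustrated with a minimal NumPy sketch. It assumes a one-dimensional morphed continuum with the category boundary at x = 0, a hand-built population of 200 sigmoid units whose thresholds cluster near the boundary (a stand-in for a layer after category learning, not a trained network), and independent Gaussian noise of standard deviation sigma on each unit. Under that noise model the Fisher information is F(x) = sum_i f_i'(x)^2 / sigma^2, and neighbouring morphs are discriminable, to first order, with d' ~ sqrt(F(x)) * dx. The `separability` function is a hypothetical stand-in for a categoricality index (mean between-category distance over mean within-category distance); the index defined in the paper may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def population_response(x, thresholds, gain):
    """Mean responses f_i(x) of N sigmoid units for inputs x of shape (M,);
    returns an array of shape (M, N)."""
    return sigmoid(gain * (x[:, None] - thresholds[None, :]))


def fisher_information(x, thresholds, gain, sigma):
    """F(x) = sum_i f_i'(x)**2 / sigma**2, valid for independent Gaussian
    noise of standard deviation sigma on each unit."""
    f = population_response(x, thresholds, gain)
    df = gain * f * (1.0 - f)  # derivative of each sigmoid with respect to x
    return (df ** 2).sum(axis=1) / sigma ** 2


def separability(reps, labels):
    """Hypothetical categoricality stand-in: mean between-category distance
    divided by mean within-category distance (larger = categories further apart)."""
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return dists[~same].mean() / dists[same & off_diag].mean()


# Assumed "post-learning" population: thresholds clustered around the
# category boundary at x = 0 (hand-built for illustration).
rng = np.random.default_rng(0)
thresholds = rng.normal(loc=0.0, scale=0.05, size=200)
gain, sigma = 20.0, 1.0

# Morphed continuum from a category-A prototype (x = -1) to a
# category-B prototype (x = +1).
continuum = np.linspace(-1.0, 1.0, 21)
F = fisher_information(continuum, thresholds, gain, sigma)
step = continuum[1] - continuum[0]
d_prime = np.sqrt(F) * step  # first-order discriminability of neighbouring morphs
for x, d in zip(continuum, d_prime):
    print(f"x = {x:+.1f}   d' ~ {d:.3g}")

# Separability of the two categories in input space vs. in the population.
x_a = rng.uniform(-1.0, 0.0, size=50)
x_b = rng.uniform(0.0, 1.0, size=50)
inputs = np.concatenate([x_a, x_b])
labels = np.array([0] * 50 + [1] * 50)
sep_input = separability(inputs[:, None], labels)
sep_layer = separability(population_response(inputs, thresholds, gain), labels)
print(f"input space: {sep_input:.2f}   population: {sep_layer:.2f}")
```

The d' profile peaks at the boundary and nearly vanishes near the prototypes, and the separability score is larger in the population than in input space: within-category compression and between-category separation in miniature.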
More generally, our view, which echoes findings in the neuroscience literature, emphasizes that the impact of noise in a given layer depends on the geometry of the neural representation being learned, that is, on how this geometry reflects the structure of the categories.
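The dependence of noise on geometry can be made concrete with a first-order sketch under illustrative assumptions: a hypothetical one-unit layer h(x) = tanh(g x) with the category boundary at x = 0, where a small gain g stands in for a near-linear (early) layer and a large gain for a steep (deeper, more categorical) layer. Gaussian noise of standard deviation sigma added to h(x) is then equivalent, to first order, to input noise of standard deviation sigma / |h'(x)|. This linearization is only an illustration of the geometric argument; it is not taken from the paper's analysis of dropout.

```python
import numpy as np


def layer_slope(x, gain):
    """Local stretch factor |h'(x)| of a hypothetical layer h(x) = tanh(gain * x)."""
    return gain * (1.0 - np.tanh(gain * x) ** 2)


def equivalent_input_noise(x, gain, sigma_layer):
    """First-order input-space standard deviation equivalent to Gaussian noise
    of standard deviation sigma_layer added to h(x): sigma_layer / |h'(x)|."""
    return sigma_layer / layer_slope(x, gain)


# Distances from the category boundary, assumed to lie at x = 0.
points = np.array([0.0, 0.25, 0.5, 1.0])
sigma_layer = 0.05  # same noise level injected in both layers

# Small gain: near-linear layer (stand-in for an early layer).
# Large gain: steep layer (stand-in for a deeper, more categorical layer).
for name, gain in [("near-linear layer", 0.5), ("steep layer", 5.0)]:
    sx = equivalent_input_noise(points, gain, sigma_layer)
    print(f"{name:>18}: " + "  ".join(f"{s:.4g}" for s in sx))
```

With the steep layer, the equivalent input noise grows by several orders of magnitude between the boundary and x = 1, whereas the near-linear layer treats all positions almost alike. The linear approximation is only reliable while sigma is small compared with the scale over which h'(x) changes, a condition that fails first far from the boundary of a steep layer.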