Attentional Bias in Human Category Learning: The Case of Deep Learning

Category learning performance is influenced both by the structure of a category and by how its features are processed during learning. Shepard (1964, 1987) showed that stimuli can have feature structures that are statistically uncorrelated (separable) or statistically correlated (integral) within categories. Humans find it much easier to learn categories with separable features, especially when only a subset of relevant features must be attended to, and harder to learn categories with integral features, which require attending to and integrating all of the relevant features that satisfy the category rule (Garner, 1974). In contrast to humans, a single-hidden-layer backpropagation (BP) neural network has been shown to learn separable and integral categories equally easily, independent of the category rule (Kruschke, 1993). This “failure” to replicate human category performance appeared to be strong evidence that connectionist networks were incapable of modeling human attentional bias. We tested this presumed limitation in two ways: (1) by having networks learn categories whose exemplars have high feature complexity, in contrast to the low-dimensional stimuli used previously, and (2) by investigating whether a Deep Learning (DL) network, which has demonstrated human-like performance on many kinds of tasks (language translation, autonomous driving, etc.), would display human-like attentional bias during category learning. We report three main results. First, we replicate the failure of BP to differentially process integral and separable category structures when low-dimensional stimuli are used (Garner, 1974; Kruschke, 1993). Second, we show that with the same low-dimensional stimuli, DL, unlike BP but similar to humans, learns separable category structures more quickly than integral ones. Third, we show that even BP can exhibit human-like learning differences between integral and separable category structures when high-dimensional stimuli (face exemplars) are used. After visualizing the hidden-unit representations, we conclude that DL appears to extend initial learning through feature development: it incrementally refines feature detectors across later layers, thereby reducing destructive feature competition, until a tipping point in error is reached, after which learning rapidly approaches asymptote.
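
To make the separable/integral manipulation and the shallow/deep contrast concrete, here is a minimal sketch; it is not the simulations reported in the paper. It generates two-category stimuli whose features are either uncorrelated (separable, with a single diagnostic feature) or correlated within categories (integral, with all features jointly diagnostic), then trains a single-hidden-layer BP network and a deeper network on each. The helper names (`make_stimuli`, `train`), the stimulus dimensionality, layer sizes, learning rate, and the correlation scheme are all illustrative assumptions, not values from the paper.

```python
# A minimal sketch (assumed setup, not the authors' code): compare a
# single-hidden-layer BP network with a deeper network on separable
# vs. integral two-category structures.
import numpy as np
import torch
import torch.nn as nn

def make_stimuli(n=200, d=4, integral=False, seed=0):
    """Two-category stimuli over d features (illustrative scheme).

    Separable: features are uncorrelated; only feature 0 predicts the label.
    Integral: features share a common source (so they are correlated), and
    the category shifts all features together; no single feature suffices.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, n)
    if integral:
        shared = rng.normal(0.0, 1.0, n)  # common source -> correlated features
        X = np.stack([shared + rng.normal(0.0, 0.3, n) for _ in range(d)], axis=1)
        X += np.where(y[:, None] == 1, 0.75, -0.75)  # category shifts every feature
    else:
        X = rng.normal(0.0, 1.0, (n, d))          # independent features
        X[:, 0] += np.where(y == 1, 1.5, -1.5)     # only feature 0 is diagnostic
    return torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

def train(model, X, y, epochs=300, lr=0.1):
    """Full-batch gradient descent; returns the loss trajectory."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    losses = []
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

D = 4

def shallow():  # single-hidden-layer BP network (sizes assumed)
    return nn.Sequential(nn.Linear(D, 8), nn.Sigmoid(), nn.Linear(8, 1))

def deep():     # deeper network standing in for "DL" (depth/widths assumed)
    return nn.Sequential(nn.Linear(D, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU(),
                         nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

for integral in (False, True):
    X, y = make_stimuli(d=D, integral=integral)
    for name, factory in [("shallow BP", shallow), ("deeper net", deep)]:
        torch.manual_seed(0)  # comparable initialization across runs
        losses = train(factory(), X, y)
        label = "integral" if integral else "separable"
        print(f"{label} | {name}: loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Whether the loss trajectories reproduce the ordering described above depends on these arbitrary choices; the sketch is only meant to make the experimental design concrete, with learning speed on each structure read off the loss trajectories.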

[1] W. R. Garner, et al. Filtering and condensation tasks with integral and separable dimensions, 1975.

[2] Kenneth J. Kurtz. The divergent autoencoder (DIVA) model of category learning, 2007, Psychonomic Bulletin & Review.

[3] Pablo V. A. Barros, et al. What Can Computational Models Learn From Human Selective Attention? A Review From an Audiovisual Unimodal and Crossmodal Perspective, 2020, Frontiers in Integrative Neuroscience.

[4] J. E. Mazur, et al. Learning as accumulation: a reexamination of the learning curve, 1978, Psychological Bulletin.

[5] Kenneth J. Kurtz, et al. Human Category Learning: Toward a Broader Explanatory Account, 2015.

[6] A. Treisman, et al. Feature binding, attention and object perception, 1998, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences.

[7] S. Hanson, et al. Spherical Units as Dynamic Consequential Regions: Implications for Attention, Competition and Categorization, 1990, NIPS.

[8] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[9] W. R. Garner. The Processing of Information and Structure, 1974.

[10] Harold Gulliksen, et al. A Rational Equation of the Learning Curve Based on Thorndike's Law of Effect, 1934.

[11] R. Shepard, et al. Toward a universal law of generalization for psychological science, 1987, Science.

[12] R. Shepard. Attention and the metric structure of the stimulus space, 1964.

[13] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[14] Linda B. Smith, et al. Is there a developmental trend from integrality to separability in perception?, 1978.

[15] E. Thorndike. The Fundamentals of Learning, 1972.

[16] R. Shepard, et al. Learning and memorization of classifications, 1961.

[17] D. Medin, et al. SUSTAIN: a network model of category learning, 2004, Psychological Review.

[18] John K. Kruschke, et al. Human Category Learning: Implications for Backpropagation Models, 1993.

[19] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.

[20] M. Posner. Information reduction in the analysis of sequential tasks, 1964, Psychological Review.

[21] G. Bower, et al. Evaluating an adaptive network model of human learning, 1988.

[22] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[23] James L. McClelland, et al. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, 1986.