Emergence of foveal image sampling from learning to attend in visual scenes

We describe a neural attention model with a learnable retinal sampling lattice. The model is trained on a visual search task that requires classifying an object embedded in a visual scene amidst background distractors, using the smallest number of fixations. We explore the tiling properties that emerge in the model's retinal sampling lattice after training. Specifically, we show that this lattice resembles the eccentricity-dependent sampling lattice of the primate retina, with a high-resolution region in the fovea surrounded by a low-resolution periphery. Furthermore, we find conditions under which these emergent properties are amplified or eliminated, providing clues to their function.
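As a rough illustration of what a retinal sampling lattice could look like in code, the sketch below samples an image with a small set of Gaussian kernels placed relative to a fixation point. The Gaussian kernel form, the function names (`gaussian_weights`, `glimpse`), and the hand-picked kernel layout are all assumptions made for illustration; the abstract does not specify them, and in a trained model the kernel positions and widths would be learned rather than fixed.

```python
import math

def gaussian_weights(center, sigma, size):
    """Normalized 1-D Gaussian weights over pixel indices 0..size-1."""
    w = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(size)]
    total = sum(w)
    return [v / total for v in w]

def glimpse(image, kernels, fixation):
    """Sample `image` (a list of rows) with a lattice of Gaussian kernels.

    kernels: list of (dx, dy, sigma) offsets relative to the fixation point.
    These stand in for the learnable lattice parameters (an assumption).
    """
    height, width = len(image), len(image[0])
    fx, fy = fixation
    outputs = []
    for dx, dy, sigma in kernels:
        wx = gaussian_weights(fx + dx, sigma, width)
        wy = gaussian_weights(fy + dy, sigma, height)
        # Separable 2-D Gaussian: weighted sum over all pixels.
        outputs.append(sum(wy[r] * wx[c] * image[r][c]
                           for r in range(height) for c in range(width)))
    return outputs

# Hypothetical foveated layout: a narrow kernel at the center of gaze,
# progressively wider kernels at larger eccentricities.
kernels = [(0.0, 0.0, 0.5), (-2.0, 0.0, 1.0), (2.0, 0.0, 1.0),
           (0.0, -4.0, 2.0), (0.0, 4.0, 2.0)]
image = [[1.0] * 9 for _ in range(9)]
samples = glimpse(image, kernels, fixation=(4.0, 4.0))
```

Each kernel yields one sample per glimpse, so the lattice size sets the input bandwidth available at each fixation. Because the weights are normalized, a uniform image produces the same sample value from every kernel regardless of its width.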
