Neural representation of action sequences: how far can a simple snippet-matching model take us?

The macaque superior temporal sulcus (STS) receives and integrates inputs from both the ventral and dorsal visual processing streams, which are thought to specialize in form and motion processing, respectively. For articulated actions, prior work has shown that even a small population of STS neurons contains sufficient information to decode the actor invariant to action, the action invariant to actor, and the specific conjunction of actor and action. This paper addresses two questions. First, what are the invariance properties of individual neural representations (rather than the population representation) in STS? Second, what neural encoding mechanisms can produce such individual neural representations from streams of pixel images? We find that a simple model, one that computes a linear weighted sum of ventral and dorsal responses to short action "snippets", produces surprisingly good fits to the neural data. Interestingly, even with inputs from a single stream, both actor-invariance and action-invariance can be accounted for by using different linear weights.
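The core computation described above, a linear weighted sum of ventral and dorsal responses to each snippet, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the four-dimensional toy feature vector, and the weight values are all hypothetical, chosen only to show how two different weight vectors over the same single-stream input could yield actor-invariant or action-invariant responses.

```python
from typing import List, Sequence


def snippet_response(weights: Sequence[float],
                     ventral: Sequence[float],
                     dorsal: Sequence[float]) -> float:
    """Linear weighted sum over the concatenated ventral and dorsal features."""
    features = list(ventral) + list(dorsal)
    if len(weights) != len(features):
        raise ValueError("weights must match the number of input features")
    return sum(w * f for w, f in zip(weights, features))


def sequence_response(weights: Sequence[float],
                      ventral_seq: Sequence[Sequence[float]],
                      dorsal_seq: Sequence[Sequence[float]]) -> List[float]:
    """Apply the same linear readout to every snippet in an action sequence."""
    return [snippet_response(weights, v, d)
            for v, d in zip(ventral_seq, dorsal_seq)]


def toy_features(actor: str, action: str) -> List[int]:
    """Hypothetical single-stream features: [actor A, actor B, walk, jump].

    Real stream responses would be computed from pixel images; these
    indicator values are an assumption made purely for illustration.
    """
    return [int(actor == "A"), int(actor == "B"),
            int(action == "walk"), int(action == "jump")]


# Hypothetical weight vectors over the same four toy features.
ACTION_SELECTIVE = [0, 0, 1, -1]   # ignores actor: actor-invariant response
ACTOR_SELECTIVE = [1, -1, 0, 0]    # ignores action: action-invariant response

if __name__ == "__main__":
    for actor in ("A", "B"):
        for action in ("walk", "jump"):
            x = toy_features(actor, action)
            # Empty dorsal input models the single-stream case.
            print(actor, action,
                  snippet_response(ACTION_SELECTIVE, x, []),
                  snippet_response(ACTOR_SELECTIVE, x, []))
```

In this toy setting, the two model neurons read the same input and differ only in their weights, which is the sense in which a single stream could support both kinds of invariance.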
