A Biologically Inspired System for Action Recognition

We present a biologically motivated system for recognizing actions in video sequences. The approach builds on recent work on object recognition with hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. The system consists of a hierarchy of spatio-temporal feature detectors of increasing complexity: an input sequence is first analyzed by an array of motion-direction-sensitive units, whose outputs pass through a hierarchy of processing stages that yield position-invariant spatio-temporal feature detectors. We experiment with different types of motion-direction-sensitive units as well as different system architectures. As in [16], we find that sparse features in intermediate stages outperform dense ones, and that a simple feature selection approach leads to an efficient system that performs better with far fewer features. We test the approach on several publicly available action datasets, in all cases achieving the best results reported to date.
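The two ideas at the core of the hierarchy, direction-sensitive units followed by pooling for position invariance, can be illustrated with a toy sketch. This is not the system described above: the correlation-type detector, the four one-pixel directions, the pooling size, and all function names are assumptions chosen only to keep the example small.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]

# Hypothetical preferred directions, given as (dy, dx) pixel shifts per frame.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "right": (0, 1), "left": (0, -1), "down": (1, 0), "up": (-1, 0),
}

def direction_maps(prev: Frame, curr: Frame) -> Dict[str, Frame]:
    """Toy direction-sensitive units: correlate the current frame with the
    previous frame displaced along each preferred direction."""
    h, w = len(curr), len(curr[0])
    maps: Dict[str, Frame] = {}
    for name, (dy, dx) in DIRECTIONS.items():
        resp = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                py, px = y - dy, x - dx
                if 0 <= py < h and 0 <= px < w:
                    resp[y][x] = prev[py][px] * curr[y][x]
        maps[name] = resp
    return maps

def max_pool(resp: Frame, size: int) -> Frame:
    """Local max pooling over non-overlapping size x size neighborhoods."""
    h, w = len(resp), len(resp[0])
    pooled: Frame = []
    for y in range(0, h, size):
        row = []
        for x in range(0, w, size):
            window = [resp[yy][xx]
                      for yy in range(y, min(y + size, h))
                      for xx in range(x, min(x + size, w))]
            row.append(max(window))
        pooled.append(row)
    return pooled

def motion_features(prev: Frame, curr: Frame, size: int = 2) -> Dict[str, float]:
    """Local pooling followed by a global max gives one position-invariant
    value per preferred direction."""
    feats: Dict[str, float] = {}
    for name, resp in direction_maps(prev, curr).items():
        pooled = max_pool(resp, size)
        feats[name] = max(max(row) for row in pooled)
    return feats

def dot_frame(h: int, w: int, y: int, x: int) -> Frame:
    """Blank frame with a single bright pixel at (y, x)."""
    frame = [[0.0] * w for _ in range(h)]
    frame[y][x] = 1.0
    return frame

# A dot moving one pixel to the right, at two different image positions.
features_a = motion_features(dot_frame(6, 6, 2, 1), dot_frame(6, 6, 2, 2))
features_b = motion_features(dot_frame(6, 6, 4, 3), dot_frame(6, 6, 4, 4))
```

In this sketch `features_a` and `features_b` are identical, with only the "right" unit active, because the max operation discards where in the image the motion occurred.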

[1]  D. Whitteridge,et al.  The representation of the visual field on the cerebral cortex in monkeys , 1961, The Journal of physiology.

[2]  J. Robson Spatial and Temporal Contrast-Sensitivity Functions of the Visual System , 1966 .

[3]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[4]  Miss A.O. Penney (b) , 1974, The New Yale Book of Quotations.

[5]  J. Movshon,et al.  Spatial summation in the receptive fields of simple cells in the cat's striate cortex. , 1978, The Journal of physiology.

[6]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[7]  C. Gross,et al.  Visual topography of striate projection zone (MT) in posterior superior temporal sulcus of the macaque. , 1981, Journal of neurophysiology.

[8]  D C Van Essen,et al.  Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. , 1983, Journal of neurophysiology.

[9]  P. Grobstein Analysis of Visual Behavior, David J. Ingle, Melvyn A. Goodale, Richard J.W. Mansfield (Eds.). MIT Press, Cambridge, MA and London (1982), 834 , 1983 .

[10]  Andrew B. Watson,et al.  A look at motion in the frequency domain , 1983 .

[11]  Leslie G. Ungerleider,et al.  Object vision and spatial vision: two cortical pathways , 1983, Trends in Neurosciences.

[12]  T. Albright Direction and orientation selectivity of neurons in visual area MT of the macaque. , 1984, Journal of neurophysiology.

[13]  E. Adelson,et al.  The analysis of moving visual patterns , 1985 .

[14]  E H Adelson,et al.  Spatiotemporal energy models for the perception of motion. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[15]  Keiji Tanaka,et al.  Integration of direction signals of image motion in the superior temporal sulcus of the macaque monkey , 1986, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[16]  Leslie G. Ungerleider,et al.  Cortical connections of visual area MT in the macaque , 1986, The Journal of comparative neurology.

[17]  W. Newsome,et al.  Motion selectivity in macaque visual cortex. II. Spatiotemporal range of directional interactions in MT and V1. , 1986, Journal of neurophysiology.

[18]  W. Newsome,et al.  Motion selectivity in macaque visual cortex. I. Mechanisms of direction and speed selectivity in extrastriate area MT. , 1986, Journal of neurophysiology.

[19]  J. P. Jones,et al.  An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[20]  D J Heeger,et al.  Model for the extraction of image flow. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[21]  B. Kȩdzia [Contrast sensitivity function of the visual system]. , 1988, Klinika oczna.

[22]  J. Freyd,et al.  Apparent Motion of the Human Body , 1990 .

[23]  J. Leo van Hemmen,et al.  Temporal association , 1991 .

[24]  M. Stryker Temporal associations , 1991, Nature.

[25]  E. Adelson,et al.  Directionally selective complex cells and the computation of motion energy in cat visual cortex , 1992, Vision Research.

[26]  I. Ohzawa,et al.  Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. II. Linearity of temporal and spatial summation. , 1993, Journal of neurophysiology.

[27]  Leslie G. Ungerleider,et al.  Cortical connections of inferior temporal area TEO in macaque monkeys , 1993, The Journal of comparative neurology.

[28]  G. Orban,et al.  Speed and direction selectivity of macaque middle temporal neurons. , 1993, Journal of neurophysiology.

[29]  Andrew T. Smith,et al.  Visual detection of motion , 1994 .

[30]  G. Orban,et al.  Responses of macaque STS neurons to optic flow components: a comparison of areas MT and MST. , 1994, Journal of neurophysiology.

[31]  M. Graziano,et al.  Tuning of MST neurons to spiral motions , 1994, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[32]  L. Palmer,et al.  Contribution of linear mechanisms to the specification of local motion by simple cells in areas 17 and 18 of the cat , 1994, Visual Neuroscience.

[33]  I. Ohzawa,et al.  Receptive-field dynamics in the central visual pathways , 1995, Trends in Neurosciences.

[34]  C. Gross Brain Mechanisms of Perception and Memory: From Neuron to Behavior. Taketoshi Ono, Larry R. Squire, Marcus E. Raichle, David I. Perrett, Masaji Fukuda , 1995 .

[35]  R A Andersen,et al.  The Analysis of Complex Motion Patterns by Form/Cue Invariant MSTd Neurons , 1996, The Journal of Neuroscience.

[36]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[37]  Eero P. Simoncelli,et al.  A model of neuronal responses in visual area MT , 1998, Vision Research.

[38]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[39]  J. C. Anderson,et al.  The Connection from Cortical Area V1 to V5: A Light and Electron Microscopic Study , 1998, The Journal of Neuroscience.

[40]  J. Decety,et al.  Neural mechanisms subserving the perception of human actions , 1999, Trends in Cognitive Sciences.

[41]  S. Grossberg,et al.  A neural model of motion processing and visual navigation by cortical area MST. , 1999, Cerebral cortex.

[42]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[43]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst.

[44]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[45]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst.

[46]  M. Irani,et al.  Event-Based Video Analysis , 2001 .

[47]  G. Rizzolatti,et al.  Neurophysiological mechanisms underlying the understanding and imitation of action , 2001, Nature Reviews Neuroscience.

[48]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[49]  Simon Haykin,et al.  Gradient-Based Learning Applied to Document Recognition , 2001 .

[50]  R. Blake,et al.  Brain Areas Active during Visual Perception of Biological Motion , 2002, Neuron.

[51]  Bernhard Schölkopf,et al.  Use of the Zero-Norm with Linear Models and Kernel Methods , 2003, J. Mach. Learn. Res.

[52]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[53]  David A. Forsyth,et al.  Automatic Annotation of Everyday Movements , 2003, NIPS.

[54]  Nicholas J. Priebe,et al.  The Neural Representation of Speed in Macaque Area MT/V5 , 2003, The Journal of Neuroscience.

[55]  T. Poggio,et al.  Cognitive neuroscience: Neural mechanisms for the recognition of biological movements , 2003, Nature Reviews Neuroscience.

[56]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[57]  J. Koenderink,et al.  Representation of local geometry in the visual system , 1987, Biological Cybernetics.

[58]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[59]  E. L. Schwartz,et al.  Afferent geometry in the primate visual cortex and the generation of neuronal trigger features , 1977, Biological Cybernetics.

[60]  Jason Lee,et al.  A stochastic model for the detection of coherent motion , 2004, Biological Cybernetics.

[61]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[62]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[63]  J. Tanji,et al.  Integration of temporal order and object information in the monkey lateral prefrontal cortex. , 2004, Journal of neurophysiology.

[64]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[65]  Martin A. Giese,et al.  Learning Features of Intermediate Complexity for the Recognition of Biological Motion , 2005, ICANN.

[66]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[67]  Pietro Perona,et al.  Hybrid models for human motion recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[68]  Antonino Casile,et al.  Critical features for the recognition of biological motion. , 2005, Journal of vision.

[69]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[70]  John K. Tsotsos,et al.  Attending to visual motion , 2005, Comput. Vis. Image Underst.

[71]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[72]  J. Lange,et al.  A Model of Biological Motion Perception from Configural Form Cues , 2006, The Journal of Neuroscience.

[73]  Eero P. Simoncelli,et al.  How MT cells analyze the motion of visual patterns , 2006, Nature Neuroscience.

[74]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[75]  Nicholas J. Priebe,et al.  Tuning for Spatiotemporal Frequency and Speed in Directionally Selective Neurons of Macaque Striate Cortex , 2006, The Journal of Neuroscience.

[76]  J. Perrone A Single Mechanism Can Explain the Speed Tuning Properties of MT and V1 Complex Neurons , 2006, The Journal of Neuroscience.

[77]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[78]  J. Movshon,et al.  Motion Integration by Neurons in Macaque MT Is Local, Not Global , 2007, The Journal of Neuroscience.

[79]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[80]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  R. Blake,et al.  Perception of human motion. , 2007, Annual review of psychology.