Automatic Detection and Tracking of Human Motion with a View-Based Representation

This paper proposes a solution for the automatic detection and tracking of human motion in image sequences. Due to the complexity of the human body and its motion, automatic detection of 3D human motion remains an open and important problem. Existing approaches for automatic detection and tracking focus on 2D cues and typically exploit object appearance (color distribution, shape) or knowledge of a static background. In contrast, we exploit 2D optical flow information, which provides rich descriptive cues while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, we develop a novel representation of human motion using low-dimensional spatio-temporal models that are learned from motion capture data of human subjects. In addition to human motion (the foreground), we probabilistically model the motion of generic scenes (the background); these statistical models are defined as Gibbsian fields specified from the first-order derivatives of motion observations. Detection and tracking are posed in a principled Bayesian framework that involves computing a posterior probability distribution over the model parameters (i.e., the location and the type of the human motion) given a sequence of optical flow observations. Particle filtering is used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution are related to the pose parameters of a 3D articulated model (e.g., the approximate joint angles and movement direction). Thus, the approach is suitable for initializing more complex probabilistic models of human motion. As shown by experiments on real image sequences, our method is able to detect and track people from different viewpoints against complex backgrounds.
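To make the particle filtering step concrete, the following is a minimal sketch of one predict / weight / resample cycle. It is not the authors' implementation. It assumes a scalar state (a hypothetical horizontal location), random-walk dynamics, and a Gaussian observation likelihood standing in for the optical-flow-based foreground and background models described above. All function names and parameter values are illustrative.

```python
import math
import random


def gaussian_likelihood(observation, state, sigma):
    """Hypothetical stand-in for the flow-based likelihood p(observation | state)."""
    return math.exp(-0.5 * ((observation - state) / sigma) ** 2)


def particle_filter_step(particles, observation, rng, motion_sigma=1.0, obs_sigma=2.0):
    """Run one predict / weight / resample cycle over a list of scalar states."""
    # Predict: diffuse each particle with an assumed random-walk dynamics model.
    predicted = [p + rng.gauss(0.0, motion_sigma) for p in particles]
    # Weight: score each predicted particle against the new observation.
    weights = [gaussian_likelihood(observation, p, obs_sigma) for p in predicted]
    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw an equally weighted set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def posterior_mean(particles):
    """Point estimate of the state from an equally weighted particle set."""
    return sum(particles) / len(particles)


def track(observations, n_particles=500, seed=0):
    """Filter a sequence of scalar observations and return the mean estimates."""
    rng = random.Random(seed)
    # Broad uniform prior: the target may start anywhere in [0, 100).
    particles = [rng.uniform(0.0, 100.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(posterior_mean(particles))
    return estimates


if __name__ == "__main__":
    # Synthetic target drifting slowly to the right.
    observations = [40.0 + 0.5 * t for t in range(20)]
    for t, estimate in enumerate(track(observations)):
        print(t, round(estimate, 2))
```

Because the particle set is a sample-based representation, it can carry a multi-modal (non-Gaussian) posterior over several candidate locations at once, which is why a broad uniform prior is used here instead of a single initial guess. In a fuller version, the state would also include the motion type and viewpoint parameters, and the likelihood would compare foreground and background motion models.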
