HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion

While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. Current algorithms make many different choices about how to model the human body, how to exploit image evidence, and how to approach the inference problem. We argue that there is a need for common datasets that allow fair comparison between different methods and their design choices. Until recently, gathering ground-truth data for evaluation of results (especially in 3D) was challenging. In this report we present a novel dataset obtained using a unique setup for capturing synchronized video and ground-truth 3D motion. Data was captured simultaneously using a calibrated marker-based motion capture system and multiple high-speed video capture systems. The video and motion capture streams were synchronized in software using a direct optimization method. The resulting HumanEva-I dataset contains multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 50,000 frames of synchronized motion capture and video were collected at 60 Hz, with an additional 37,000 frames of pure motion capture data. The data is partitioned into training, validation, and testing subsets. A standard set of error metrics is defined that can be used for evaluation of both 2D and 3D pose estimation and tracking algorithms. Support software and an on-line evaluation system for quantifying results using the test data are being made available to the community. This report gives an overview of the dataset and evaluation metrics and provides pointers into the dataset for additional details. It is our hope that HumanEva-I will become a standard dataset for the evaluation of articulated human motion and pose estimation.
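
The abstract does not spell out the error metrics, so the following is only an illustrative sketch of one plausible form: the average Euclidean distance between corresponding marker (or joint) positions in an estimated pose and the ground-truth pose, averaged again over the frames of a sequence. The function names, the input layout (lists of coordinate tuples), and the choice of this particular metric are assumptions made for illustration, not the dataset's support software. The same code applies to 2D or 3D points.

```python
import math


def mean_marker_error(predicted, ground_truth):
    """Average Euclidean distance between corresponding markers of one pose.

    Both arguments are equal-length sequences of coordinate tuples
    (2D or 3D). This metric is an assumed, illustrative choice.
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("poses must be non-empty and have the same marker count")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def mean_sequence_error(predicted_frames, ground_truth_frames):
    """Average the per-pose error over all frames of a sequence."""
    if len(predicted_frames) != len(ground_truth_frames) or len(predicted_frames) == 0:
        raise ValueError("sequences must be non-empty and have the same length")
    per_frame = [
        mean_marker_error(p, g)
        for p, g in zip(predicted_frames, ground_truth_frames)
    ]
    return sum(per_frame) / len(per_frame)
```

For example, a two-marker pose whose markers are off by 3 and 4 units respectively has a per-pose error of 3.5 in whatever units the marker coordinates use.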
