论文信息 - MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion (This dataset can be downloaded from http://cs.nyu.edu/~ajain/accv2014/.), that extends the FLIC dataset [1] with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.

[1] G. Johansson. Visual perception of biological motion and a model for its analysis , 1973 .

[2] Martin A. Fischler,et al. The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[3] David C. Hogg. Model-based vision: a program to see a walking person , 1983, Image Vis. Comput..

[4] William T. Freeman,et al. Orientation Histograms for Hand Gesture Recognition , 1995 .

[5] Takeo Kanade,et al. Model-based tracking of self-occluding articulated objects , 1995, Proceedings of IEEE International Conference on Computer Vision.

[6] Ioannis A. Kakadiaris,et al. Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7] Alex Pentland,et al. Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[8] Jitendra Malik,et al. Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[9] David J. Fleet,et al. Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[10] Andrew Blake,et al. Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[11] Cristian Sminchisescu,et al. Monocular tracking of the human arm in 3D , 1995, Proceedings of IEEE International Conference on Computer Vision.

[12] Jitendra Malik,et al. Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[13] Trevor Darrell,et al. Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14] Trevor Darrell,et al. Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[16] Michael Isard,et al. Tracking loose-limbed people , 2004, CVPR 2004.

[17] David A. Forsyth,et al. Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18] Sebastian Thrun,et al. SCAPE: shape completion and animation of people , 2005, SIGGRAPH 2005.

[19] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[21] Cordelia Schmid,et al. Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[22] Ankur Agarwal,et al. Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Ronald Poppe,et al. Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[24] Hans-Peter Seidel,et al. Performance capture from sparse multi-view video , 2008, ACM Trans. Graph..

[25] David A. McAllester,et al. A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Andrew Zisserman,et al. Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27] Vittorio Ferrari,et al. Better Appearance Models for Pictorial Structures , 2009, BMVC.

[28] Jitendra Malik,et al. Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29] Michael J. Black,et al. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[30] Andrew Zisserman,et al. Learning sign language by watching TV (using weakly aligned subtitles) , 2009, CVPR.

[31] Bernt Schiele,et al. Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[32] Ben Taskar,et al. Sidestepping Intractable Inference with Structured Ensemble Cascades , 2010, NIPS.

[33] Ben Taskar,et al. Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[34] Mark Everingham,et al. Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[35] Hans-Peter Seidel,et al. MovieReshape: tracking and reshaping of humans in videos , 2010, ACM Trans. Graph..

[36] Hans-Peter Seidel,et al. Performance Capture from Multi-View Video , 2010, Image and Geometry Processing for 3-D Cinematography.

[37] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[38] Mark Everingham,et al. Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[39] Hans-Peter Seidel,et al. Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[40] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[41] Yi Yang,et al. Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[42] Ben Taskar,et al. MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43] Jitendra Malik,et al. Articulated Pose Estimation Using Discriminative Armlet Classifiers , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Li Deng,et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[45] Cordelia Schmid,et al. DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[46] Luca Maria Gambardella,et al. Fast image scanning with deep max-pooling convolutional neural networks , 2013, 2013 IEEE International Conference on Image Processing.

[47] Luc Van Gool,et al. Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[48] Yann LeCun,et al. Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[49] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.

[50] Peter V. Gehler,et al. Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[51] Andrew W. Fitzgibbon,et al. Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[52] Rob Fergus,et al. Visualizing and Understanding Convolutional Neural Networks , 2013 .

[53] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[54] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[55] Jonathan Tompson,et al. Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[56] R. Fergus,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[57] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[58] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59] Ken Perlin,et al. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..