Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks

We present a novel method for real-time continuous pose recovery of markerless complex articulable objects from a single depth image. Our method consists of the following stages: a randomized decision forest classifier for image segmentation, a robust method for labeled dataset generation, a convolutional network for dense feature extraction, and finally an inverse kinematics stage for stable real-time pose recovery. As one possible application of this pipeline, we show state-of-the-art results for real-time puppeteering of a skinned hand model.
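The abstract does not say how the inverse kinematics stage is formulated. The sketch below shows the general idea only. It assumes a hypothetical planar two-link "finger" and uses plain gradient descent, which is not necessarily the optimizer the authors use. The goal is to choose joint angles so that the model's feature points land on the feature positions detected upstream, for example by the convolutional network. The names `LINK_LENGTHS`, `forward_kinematics`, `feature_error` and `solve_ik` are illustrative assumptions, not part of the paper.

```python
import math

# Hypothetical toy model: a planar two-link chain hinged at the origin
# (an illustrative assumption, not the paper's skinned hand model).
LINK_LENGTHS = (1.0, 0.8)


def forward_kinematics(theta1, theta2, lengths=LINK_LENGTHS):
    """Return 2D positions of the middle joint and the fingertip."""
    l1, l2 = lengths
    joint = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    tip = (joint[0] + l2 * math.cos(theta1 + theta2),
           joint[1] + l2 * math.sin(theta1 + theta2))
    return [joint, tip]


def feature_error(angles, targets):
    """Sum of squared distances between model features and detected features."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(forward_kinematics(*angles), targets))


def solve_ik(targets, init=(0.1, 0.1), lr=0.05, iters=2000, eps=1e-6):
    """Gradient descent on joint angles using central-difference gradients.

    Gradient descent is a swapped-in technique chosen for brevity.
    """
    angles = list(init)
    for _ in range(iters):
        grad = []
        for i in range(len(angles)):
            up = angles.copy()
            up[i] += eps
            down = angles.copy()
            down[i] -= eps
            grad.append((feature_error(up, targets)
                         - feature_error(down, targets)) / (2 * eps))
        # Step every joint angle against its error gradient.
        angles = [a - lr * g for a, g in zip(angles, grad)]
    return angles
```

For example, `solve_ik(forward_kinematics(0.6, 0.9))` should recover angles close to `(0.6, 0.9)`. In a real hand tracker, the targets would be noisy feature detections for many joints, and the model would have far more degrees of freedom than this two-angle toy.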
