Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks

We present a novel method for real-time continuous pose recovery of markerless complex articulable objects from a single depth image. Our method consists of four stages: a randomized decision forest classifier for image segmentation, a robust method for labeled dataset generation, a convolutional network for dense feature extraction, and an inverse kinematics stage for stable real-time pose recovery. As one possible application of this pipeline, we show state-of-the-art results for real-time puppeteering of a skinned hand model.
