HBE: Hand Branch Ensemble Network for Real-Time 3D Hand Pose Estimation

The goal of this paper is to estimate the 3D coordinates of the hand joints from a single depth image. To give consideration to both the accuracy and the real time performance, we design a novel three-branch Convolutional Neural Networks named Hand Branch Ensemble network (HBE), where the three branches correspond to the three parts of a hand: the thumb, the index finger and the other fingers. The structural design inspiration of the HBE network comes from the understanding of the differences in the functional importance of different fingers. In addition, a feature ensemble layer along with a low-dimensional embedding layer ensures the overall hand shape constraints. The experimental results on three public datasets demonstrate that our approach achieves comparable or better performance to state-of-the-art methods with less training data, shorter training time and faster frame rate.

[1]  Qi Ye,et al.  BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[3]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Guijin Wang,et al.  Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation , 2017, Neurocomputing.

[5]  Daniel Thalmann,et al.  3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrea Tagliasacchi,et al.  Sphere-meshes for real-time hand modeling and tracking , 2016, ACM Trans. Graph..

[7]  Sergio Escalera,et al.  End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data , 2017, ArXiv.

[8]  Kaspar Althoefer,et al.  The Role of the Thumb: Study of Finger Motion in Grasping and Reachability Space in Human and Robotic Hands , 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[9]  David Kim,et al.  Articulated distance fields for ultra-fast tracking of hands interacting , 2017, ACM Trans. Graph..

[10]  Antti Oulasvirta,et al.  Fast and robust hand tracking using detection-guided optimization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Shanxin Yuan,et al.  First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Yi Yang,et al.  Depth-Based Hand Pose Estimation: Data, Methods, and Challenges , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Yu Sun,et al.  Robot grasp planning based on demonstrated grasp strategies , 2015, Int. J. Robotics Res..

[14]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Luc Van Gool,et al.  Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  R. L. Susman Hand function and tool behavior in early hominids. , 1998, Journal of human evolution.

[17]  Yichen Wei,et al.  Model-Based Deep Hand Pose Estimation , 2016, IJCAI.

[18]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[19]  Li Cheng,et al.  Efficient Hand Pose Estimation from a Single Depth Image , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Daniele Panozzo,et al.  Position-based tensegrity design , 2017, ACM Trans. Graph..

[21]  Shanxin Yuan,et al.  The 2017 Hands in the Million Challenge on 3D Hand Pose Estimation , 2017, ArXiv.

[22]  Xudong Zhang,et al.  Quantitative analysis of finger motion coordination in hand manipulative and gestic acts. , 2004, Human movement science.

[23]  Andrew W. Fitzgibbon,et al.  Learning an efficient model of hand shape variation from depth images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Christian Theobalt,et al.  Real-Time Hand Tracking Under Occlusion from an Egocentric RGB-D Sensor , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[26]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[27]  Fei Qiao,et al.  Region ensemble network: Improving convolutional network for hand pose estimation , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[28]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[29]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Sergio Escalera,et al.  Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Guijin Wang,et al.  Towards Good Practices for Deep 3D Hand Pose Estimation , 2017, ArXiv.

[33]  Kyoung Mu Lee,et al.  V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[35]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[36]  Vincent Lepetit,et al.  DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[37]  Vincent Lepetit,et al.  Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[39]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[40]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.