3D hand tracking by rapid stochastic gradient descent using a skinning model

The main challenge in tracking articulated structures such as hands is their large number of degrees of freedom (DOFs). A realistic 3D model of the human hand has at least 26 DOFs. Few tracking approaches can follow such structures both quickly and reliably. This paper proposes a tracker based on ‘Stochastic Meta-Descent’ (SMD) for optimization in such high-dimensional state spaces. This new algorithm is based on a gradient descent approach with adaptive, parameter-specific step sizes. The SMD tracker facilitates the integration of constraints and, combined with a stochastic sampling technique, can escape spurious local minima. Furthermore, the integration of a deformable hand model based on linear blend skinning and anthropometric measurements reinforces the robustness of our tracker. Experiments show the efficiency of the SMD algorithm in comparison with common optimization methods.
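
The abstract names linear blend skinning without stating its formula. As a rough illustration, the standard formulation moves each surface vertex to a weighted average of the positions it would take under each bone's rigid transform, v' = sum_i w_i (R_i v + t_i), with weights summing to one. The sketch below works in 2D for brevity; the function names, the two-bone setup, and the weights are assumptions made for this example and are not taken from the paper's hand model.

```python
import math

def rotation_about(pivot, angle):
    """Rigid 2D transform (R, t) rotating by `angle` radians about `pivot`."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    R = ((c, -s), (s, c))
    # Choose t = pivot - R * pivot so that the pivot itself does not move.
    t = (px - (c * px - s * py), py - (s * px + c * py))
    return R, t

def apply_transform(transform, v):
    """Apply a rigid transform (R, t) to the 2D point v."""
    ((a, b), (c, d)), (tx, ty) = transform
    return (a * v[0] + b * v[1] + tx, c * v[0] + d * v[1] + ty)

def blend_skin(vertex, transforms, weights):
    """Linear blend skinning: weighted average of per-bone transformed positions."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("skinning weights must sum to 1")
    x = y = 0.0
    for transform, w in zip(transforms, weights):
        px, py = apply_transform(transform, vertex)
        x += w * px
        y += w * py
    return (x, y)

# Hypothetical two-bone finger: bone 0 stays fixed, bone 1 bends 90 degrees
# about a joint located at (1, 0).
identity = rotation_about((0.0, 0.0), 0.0)
bend = rotation_about((1.0, 0.0), math.pi / 2)

# A vertex bound only to bone 1 follows the bend rigidly.
rigid = blend_skin((2.0, 0.0), [identity, bend], [0.0, 1.0])

# A vertex near the joint, influenced equally by both bones, lands halfway
# between its rest position and its rigidly rotated position.
blended = blend_skin((2.0, 0.0), [identity, bend], [0.5, 0.5])
```

Blending positions rather than rotations keeps the deformation linear in the vertex coordinates for a fixed pose, which is one reason the technique is cheap to evaluate.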
