Stochastic optimisation for high-dimensional tracking in dense range maps

The main challenge in tracking articulated structures such as hands is their many degrees of freedom (DOFs): a realistic 3-D model of the human hand has at least 26. Few tracking approaches can follow such structures both quickly and reliably. This paper proposes a tracker based on stochastic meta-descent (SMD) for optimisation in such high-dimensional state spaces. This new algorithm is a gradient descent method with adaptive, parameter-specific step sizes. The SMD tracker makes it easy to integrate constraints and, combined with a stochastic sampling technique, can escape spurious local minima. Furthermore, a deformable hand model based on linear blend skinning and anthropometric measurements makes the tracker more robust. Experiments comparing the SMD algorithm with common optimisation methods show its efficiency.
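
To illustrate what gradient descent with adaptive, parameter-specific step sizes can look like, the following minimal sketch applies one commonly published form of the SMD update to a toy quadratic objective. It is not the tracker described here: the objective, the function name `smd_minimise`, and the values of `p0`, `mu` and `lam` are assumptions chosen for illustration, and the exact variant used in the paper may differ.

```python
import numpy as np

def smd_minimise(grad, hess_vec, x0, p0=0.01, mu=0.05, lam=0.9, steps=300):
    """Toy SMD loop (illustrative assumption, not the paper's tracker).

    grad(x)        -> gradient of the objective at x
    hess_vec(x, v) -> Hessian-vector product H(x) @ v
    p0, mu, lam    -> initial step size, meta step size, decay factor
    """
    x = np.asarray(x0, dtype=float).copy()
    p = np.full_like(x, p0)   # one step size per parameter
    v = np.zeros_like(x)      # running sensitivity of x to the step sizes
    for _ in range(steps):
        g = grad(x)
        # Grow a step size when the gradient agrees with the accumulated
        # update direction, shrink it otherwise (never by more than half).
        p = p * np.maximum(0.5, 1.0 - mu * g * v)
        x = x - p * g
        v = lam * v - p * (g + lam * hess_vec(x, v))
    return x, p

# Hypothetical 2-parameter quadratic: f(x) = 0.5 * x^T A x, minimum at 0.
A = np.diag([1.0, 10.0])

def loss(x):
    return 0.5 * x @ A @ x

x0 = np.array([1.0, 1.0])
x_final, p_final = smd_minimise(lambda x: A @ x, lambda x, v: A @ v, x0)
print("initial loss:", loss(x0), "final loss:", loss(x_final))
print("per-parameter step sizes:", p_final)
```

For a quadratic the Hessian-vector product is exact (`A @ v`); for a general cost function it would have to be computed or approximated by other means. The stochastic sampling step mentioned above is not part of this sketch.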
