Gaussian Process Latent Variable Models for Human Pose Estimation

We describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) [1] encapsulating both pose and silhouette features Our method is generative, this allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper the suggested model is easily extended to multiple observation spaces without constraints on type.

[1]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Fei Wang,et al.  Semi-definite Manifold Alignment , 2007, ECML.

[3]  Ankur Agarwal,et al.  Machine Learning for Image Based Motion Capture , 2006 .

[4]  Daniel D. Lee,et al.  Learning a manifold-constrained map between image sets: applications to matching and pose estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Jitendra Malik,et al.  Efficient shape matching using shape contexts , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[7]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data , 2003, NIPS.

[8]  Pushmeet Kohli,et al.  PoseCut: Simultaneous Segmentation and 3D Pose Estimation of Humans Using Dynamic Graph-Cuts , 2006, ECCV.

[9]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[10]  Neil D. Lawrence,et al.  Hierarchical Gaussian process latent variable models , 2007, ICML '07.

[11]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Christopher M. Bishop,et al.  Non-linear Bayesian Image Modelling , 2000, ECCV.

[14]  Thomas S. Huang,et al.  Discriminative estimation of 3D human pose using Gaussian processes , 2008, 2008 19th International Conference on Pattern Recognition.

[15]  M. Aizerman,et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .

[16]  S. Hyakin,et al.  Neural Networks: A Comprehensive Foundation , 1994 .

[17]  Hee-Su Choi,et al.  Kernel Isomap , 2005 .

[18]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Cristian Sminchisescu,et al.  Human Pose Estimation from Silhouettes - A Consistent Approach Using Distance Level Sets , 2002, WSCG.

[20]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[21]  David J. Fleet,et al.  Gaussian Process Dynamical Models , 2005, NIPS.

[22]  Godfried T. Toussaint,et al.  Relative neighborhood graphs and their relatives , 1992, Proc. IEEE.

[23]  Daniel D. Lee,et al.  Semisupervised alignment of manifolds , 2005, AISTATS.

[24]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[25]  James M. Rehg,et al.  Singularity analysis for articulated object tracking , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[26]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[27]  Colin Fyfe,et al.  A Gaussian process latent variable model formulation of canonical correlation analysis , 2006, ESANN.

[28]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Michael I. Jordan,et al.  A Probabilistic Interpretation of Canonical Correlation Analysis , 2005 .

[30]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[31]  Björn Stenger,et al.  Pose estimation and tracking using multivariate regression , 2008, Pattern Recognit. Lett..

[32]  Kilian Q. Weinberger,et al.  Learning a kernel matrix for nonlinear dimensionality reduction , 2004, ICML.

[33]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Thomas V. Wiecki,et al.  The independent components of natural images are perceptually dependent , 2007, Electronic Imaging.

[35]  Godfried T. Toussaint,et al.  The relative neighbourhood graph of a finite planar set , 1980, Pattern Recognit..

[36]  Rómer Rosales,et al.  Learning Body Pose via Specialized Maps , 2001, NIPS.

[37]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[38]  Christopher M. Bishop,et al.  GTM: A Principled Alternative to the Self-Organizing Map , 1996, NIPS.

[39]  David J. Fleet,et al.  Shared Kernel Information Embedding for discriminative inference , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[41]  Malte Kuss,et al.  The Geometry Of Kernel Canonical Correlation Analysis , 2003 .

[42]  Joaquin Quiñonero Candela,et al.  Local distance preservation in the GP-LVM through back constraints , 2006, ICML.

[43]  David J. Fleet,et al.  Priors for people tracking from small training sets , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[44]  Snigdhansu Chatterjee,et al.  Procrustes Problems , 2005, Technometrics.

[45]  Luc Van Gool,et al.  Full body tracking from multiple views using stochastic sampling , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[47]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[48]  Trevor Darrell,et al.  Discriminative Gaussian process latent variable model for classification , 2007, ICML '07.

[49]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[50]  J. Mercer Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations , 1909 .

[51]  David W. Murray,et al.  Regression-based Hand Pose Estimation from Multiple Cameras , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[52]  James M. Rehg,et al.  A multiple hypothesis approach to figure tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[53]  Pascal Fua,et al.  Local deformation models for monocular 3D shape recovery , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[55]  N. L. Johnson,et al.  Multivariate Analysis , 1958, Nature.

[56]  Prosenjit Bose,et al.  Characterizing proximity trees , 1996, Algorithmica.

[57]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[58]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[59]  David J. Fleet,et al.  Topologically-constrained latent variable models , 2008, ICML '08.

[60]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[61]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[62]  Rui Li,et al.  Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[63]  Trevor Darrell,et al.  Rank priors for continuous non-linear dimensionality reduction , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Philip H. S. Torr,et al.  Randomized trees for human pose detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[66]  David J. Fleet,et al.  Multifactor Gaussian process models for style-content separation , 2007, ICML '07.

[67]  David J. Fleet,et al.  Temporal motion models for monocular and multiview 3D human body tracking , 2006, Comput. Vis. Image Underst..

[68]  Michael E. Tipping The Relevance Vector Machine , 1999, NIPS.

[69]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  Aaron Hertzmann,et al.  Style-based inverse kinematics , 2004, ACM Trans. Graph..

[71]  Herbert A. Simon,et al.  The Sciences of the Artificial - 3rd Edition , 1981 .

[72]  Neil A. Thacker,et al.  Real-time Body Tracking Using a Gaussian Process Latent Variable Model , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[73]  Christopher M. Bishop,et al.  GTM: The Generative Topographic Mapping , 1998, Neural Computation.

[74]  Hans-Hellmut Nagel,et al.  Tracking Persons in Monocular Image Sequences , 1999, Comput. Vis. Image Underst..

[75]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[76]  Rajesh P. N. Rao,et al.  Learning Shared Latent Structure for Image Synthesis and Robotic Imitation , 2005, NIPS.

[77]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[78]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[79]  MalikJitendra,et al.  Efficient Shape Matching Using Shape Contexts , 2005 .

[80]  Philip H. S. Torr,et al.  Regression-Based Human Motion Capture From Voxel Data , 2006, BMVC.

[81]  David J. Fleet,et al.  This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gaussian Process Dynamical Model , 2007 .

[82]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[83]  Cristian Sminchisescu,et al.  Estimating Articulated Human Motion with Covariance Scaled Sampling , 2003, Int. J. Robotics Res..