Learning Articulated Skeletons from Motion

Humans demonstrate a remarkable ability to parse complicated motion sequences into their constituent structures and motions. We investigate the corresponding computational problem: learning the structure of one or more articulated objects from a time series of feature positions. We model the observed sequence in terms of “stick figure” objects, under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed. We formulate the problem as a single probabilistic model with several sub-components: associating the features with particular sticks, determining the proper number of sticks, and finding which sticks are physically joined. We test the algorithm on challenging 2D and 3D datasets, including optical human motion capture and video of walking giraffes.
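The fixed-length assumption can be illustrated with a small sketch. This is not the probabilistic model described above; it is a simple heuristic, written under stated assumptions, that groups features whose pairwise distances stay nearly constant over time, since features on the same rigid stick should behave that way. The function names, the threshold `tol`, and the transitive merging of groups are all hypothetical choices made for illustration. A feature lying exactly on a joint would be rigid with respect to both adjoining sticks and would merge them under this heuristic.

```python
import math
from itertools import combinations


def distance_spread(track_a, track_b):
    """Standard deviation over time of the distance between two feature tracks."""
    dists = [math.dist(p, q) for p, q in zip(track_a, track_b)]
    mean = sum(dists) / len(dists)
    return math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))


def group_rigid_features(tracks, tol=1e-6):
    """Group features whose pairwise distances stay (nearly) constant.

    tracks: list of feature tracks; each track is a list of (x, y) or
            (x, y, z) positions, one per frame.
    tol:    hypothetical threshold on the distance spread (an assumption,
            not a value from the paper).
    Returns a sorted list of index lists, one per candidate stick.
    """
    parent = list(range(len(tracks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of features that moves rigidly together.
    for i, j in combinations(range(len(tracks)), 2):
        if distance_spread(tracks[i], tracks[j]) <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tracks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Synthetic 2D example: one stick lies along the x-axis, the other
    # rotates about the origin. Features 0-1 sit on the first stick,
    # features 2-3 on the second.
    angles = [0.3, 0.9, 1.5]
    tracks = [
        [(1.0, 0.0) for _ in angles],
        [(2.0, 0.0) for _ in angles],
        [(math.cos(a), math.sin(a)) for a in angles],
        [(2 * math.cos(a), 2 * math.sin(a)) for a in angles],
    ]
    print(group_rigid_features(tracks))
```

On noisy real data a hard threshold of this kind would be fragile, which is one motivation for treating feature-to-stick association, the number of sticks, and their connectivity jointly in a probabilistic model.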
