Unsupervised Learning of Skeletons from Motion

Humans demonstrate a remarkable ability to parse complicated motion sequences into their constituent structures and motions. We investigate this problem, attempting to learn the structure of one or more articulated objects, given a time-series of two-dimensional feature positions. We model the observed sequence in terms of "stick figure" objects, under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed. We formulate the problem in a single probabilistic model that includes multiple sub-components: associating the features with particular sticks, determining the proper number of sticks, and finding which sticks are physically joined. We test the algorithm on challenging datasets of 2D projections of optical human motion capture and feature trajectories from real videos.

[1]  David A. Forsyth,et al.  Skeletal parameter estimation from optical motion capture data , 2004, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  Aaron Hertzmann,et al.  Learning Non-Rigid 3D Shape from 2D Motion , 2003, NIPS.

[3]  Pushmeet Kohli,et al.  PoseCut: Simultaneous Segmentation and 3D Pose Estimation of Humans Using Dynamic Graph-Cuts , 2006, ECCV.

[4]  David A. Ross Learning Probabilistic Models for Visual Motion , 2008 .

[5]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[6]  Trevor Darrell,et al.  Recovering Articulated Model Topology from Observed Rigid Motion , 2002, NIPS.

[7]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[8]  Stefan Carlsson,et al.  Uncalibrated Motion Capture Exploiting Articulated Structure Constraints , 2004, International Journal of Computer Vision.

[9]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[10]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[11]  Frank Dellaert,et al.  EM, MCMC, and Chain Flipping for Structure from Motion with Unknown Correspondence , 2004, Machine Learning.

[12]  Y. Weiss,et al.  Multibody factorization with uncertainty and missing data using the EM algorithm , 2004, CVPR 2004.

[13]  D Thalmann,et al.  Using skeleton-based tracking to increase the reliability of optical motion capture. , 2001, Human movement science.

[14]  David A. Forsyth,et al.  Building models of animals from video , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Marc Pollefeys,et al.  Automatic Kinematic Chain Building from Feature Trajectories of Articulated Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Yang Song,et al.  Unsupervised Learning of Human Motion , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Marc Pollefeys,et al.  A General Framework for Motion Segmentation: Independent, Articulated, Rigid, Non-rigid, Degenerate and Non-degenerate , 2006, ECCV.

[19]  Gene H. Golub,et al.  Matrix computations , 1983 .

[20]  Takeo Kanade,et al.  A Multibody Factorization Method for Independently Moving Objects , 1998, International Journal of Computer Vision.

[21]  Thomas Viklands Algorithms for the Weighted Orthogonal Procrustes Problem and other Least Squares Problems , 2006 .