Perpetual Motion: Generating Unbounded Human Motion

The modeling of human motion using machine learning methods has been widely studied. In essence it is a time-series modeling problem involving predicting how a person will move in the future given how they moved in the past. Existing methods, however, typically have a short time horizon, predicting a only few frames to a few seconds of human motion. Here we focus on long-term prediction; that is, generating long sequences (potentially infinite) of human motion that is plausible. Furthermore, we do not rely on a long sequence of input motion for conditioning, but rather, can predict how someone will move from as little as a single pose. Such a model has many uses in graphics (video games and crowd animation) and vision (as a prior for human motion estimation or for dataset creation). To address this problem, we propose a model to generate non-deterministic, \textit{ever-changing}, perpetual human motion, in which the global trajectory and the body pose are cross-conditioned. We introduce a novel KL-divergence term with an implicit, unknown, prior. We train this using a heavy-tailed function of the KL divergence of a white-noise Gaussian process, allowing latent sequence temporal dependency. We perform systematic experiments to verify its effectiveness and find that it is superior to baseline methods.

[1]  Quoc V. Le,et al.  Swish: a Self-Gated Activation Function , 2017, 1710.05941.

[2]  Juan Carlos Niebles,et al.  Imitation Learning for Human Pose Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Nikolaus F. Troje,et al.  AMASS: Archive of Motion Capture As Surface Shapes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[6]  Dimitrios Tzionas,et al.  Resolving 3D Human Pose Ambiguities With 3D Scene Constraints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Otmar Hilliges,et al.  Learning Human Motion Models for Long-Term Predictions , 2017, 2017 International Conference on 3D Vision (3DV).

[8]  Otmar Hilliges,et al.  Structured Prediction Helps 3D Human Motion Modelling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[10]  Ole Winther,et al.  Ladder Variational Autoencoders , 2016, NIPS.

[11]  Michael J. Black,et al.  On Human Motion Prediction Using Recurrent Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yan Zhang,et al.  Generating 3D People in Scenes Without People , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  José M. F. Moura,et al.  Few-Shot Human Motion Prediction via Meta-learning , 2018, ECCV.

[14]  Christian Osendorfer,et al.  Learning Stochastic Recurrent Networks , 2014, NIPS 2014.

[15]  Quoc V. Le,et al.  Searching for Activation Functions , 2018, arXiv.

[16]  Francesc Moreno-Noguer,et al.  Human Motion Prediction via Spatio-Temporal Inpainting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Yi Zhou,et al.  On the Continuity of Rotation Representations in Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Brendan Tran Morris,et al.  Convolutional Neural Networkfor Trajectory Prediction , 2018, ECCV Workshops.

[19]  Dario Pavllo,et al.  QuaterNet: A Quaternion-based Recurrent Model for Human Motion , 2018, BMVC.

[20]  Glen Berseth,et al.  Terrain-adaptive locomotion skills using deep reinforcement learning , 2016, ACM Trans. Graph..

[21]  Dahua Lin,et al.  Convolutional Sequence Generation for Skeleton-Based Action Synthesis , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Otmar Hilliges,et al.  STCN: Stochastic Temporal Convolutional Networks , 2019, ICLR.

[23]  Yoshua Bengio,et al.  A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.

[24]  José M. F. Moura,et al.  Adversarial Geometry-Aware Human Motion Prediction , 2018, ECCV.

[25]  Dario Pavllo,et al.  Modeling Human Motion with Quaternion-Based Neural Networks , 2019, International Journal of Computer Vision.

[26]  Zhen Zhang,et al.  Convolutional Sequence to Sequence Model for Human Dynamics , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[28]  Sebastian Starke,et al.  Neural state machine for character-scene interactions , 2019, ACM Trans. Graph..

[29]  Taku Komura,et al.  Phase-functioned neural networks for character control , 2017, ACM Trans. Graph..

[30]  Ali Razavi,et al.  Preventing Posterior Collapse with delta-VAEs , 2019, ICLR.

[31]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2015, ACM Trans. Graph..

[32]  Michel Barlaud,et al.  Two deterministic half-quadratic regularization algorithms for computed imaging , 1994, Proceedings of 1st International Conference on Image Processing.

[33]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[34]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.