DeepSSM: Deep State-Space Model for 3D Human Motion Prediction

Predicting future human motion plays a significant role in human-machine interaction across a variety of real-world applications. In this paper, we build a deep state-space model, DeepSSM, to predict future human motion. Specifically, we formulate human motion as a dynamic system and describe it with state-space theory, which offers a unified formulation for diverse human motion systems. Moreover, we design a novel deep network to realize this system, combining the advantages of deep networks and state-space models. The network jointly models the state-to-state transition and the state-to-observation transition of the human motion system, and multiple future poses are generated recursively via the state-to-observation transition of the model. To improve the modeling capability of the system, we introduce a dedicated loss function, ATPL (Attention Temporal Prediction Loss), which encourages more accurate predictions by paying increasing attention to the early time steps. Experiments on two benchmark datasets (Human3.6M and 3DPW) confirm that our method achieves state-of-the-art performance. The code will be made available if the paper is accepted.
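The abstract describes the model only at a high level, so the following is a minimal Python sketch under stated assumptions, not the DeepSSM implementation. It assumes the generic state-space recursion x_{t+1} = f(x_t) (state-to-state transition) and y_{t+1} = g(x_{t+1}) (state-to-observation transition), with plain Python callables standing in for the deep networks. The exact form of ATPL is not given here, so the loss uses a hypothetical normalized, exponentially decaying weighting over prediction steps as a stand-in that emphasizes early time steps. The names `rollout`, `time_weights`, `weighted_temporal_loss`, and the `decay` parameter are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(state: Vector,
            transition: Callable[[Vector], Vector],
            emission: Callable[[Vector], Vector],
            horizon: int) -> List[Vector]:
    """Generate `horizon` future poses by recursing on a state-space model.

    Assumed generic form: x_{t+1} = f(x_t), y_{t+1} = g(x_{t+1}).
    """
    poses: List[Vector] = []
    for _ in range(horizon):
        state = transition(state)      # state-to-state transition
        poses.append(emission(state))  # state-to-observation transition
    return poses


def time_weights(horizon: int, decay: float = 0.5) -> List[float]:
    """Hypothetical weighting (not the paper's ATPL): normalized and
    exponentially decreasing in t, so earlier steps count more."""
    raw = [math.exp(-decay * t) for t in range(horizon)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_temporal_loss(pred: Sequence[Vector],
                           target: Sequence[Vector],
                           decay: float = 0.5) -> float:
    """Weighted sum of per-step Euclidean pose errors."""
    weights = time_weights(len(pred), decay)
    loss = 0.0
    for w, p, y in zip(weights, pred, target):
        step_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, y)))
        loss += w * step_err
    return loss


def constant_velocity(x: Vector) -> Vector:
    """Toy transition: state = [position, velocity]."""
    return [x[0] + x[1], x[1]]


def read_position(x: Vector) -> Vector:
    """Toy emission: the observed 'pose' is the position only."""
    return [x[0]]


if __name__ == "__main__":
    poses = rollout([0.0, 1.0], constant_velocity, read_position, horizon=3)
    print(poses)  # [[1.0], [2.0], [3.0]]
    print(weighted_temporal_loss(poses, [[1.0], [2.0], [4.0]]))
```

Because the weights decrease with the prediction step, an error of the same size costs more at the first predicted frame than at the last one, which mirrors the stated intent of paying more attention to early time steps.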
