Paper information - DeepSSM: Deep State-Space Model for 3D Human Motion Prediction.

Predicting future human motion plays a significant role in human-machine interaction across a variety of real-life applications. In this paper, we build a deep state-space model, DeepSSM, to predict future human motion. Specifically, we formulate the human motion system as the state-space model of a dynamic system and model it using state-space theory, which offers a unified formulation for diverse human motion systems. Moreover, we design a novel deep network to build this system, combining the advantages of deep networks and state-space models. The deep network jointly models the state-state transition and the state-observation transition of the human motion system, and multiple future poses are generated recursively via the state-observation transition. To improve the modeling ability of the system, we introduce a new loss function, ATPL (Attention Temporal Prediction Loss), to optimize the model; it encourages more accurate predictions by paying increasing attention to the early time-steps. Experiments on two benchmark datasets (Human3.6M and 3DPW) confirm that our method achieves state-of-the-art performance. The code will be available if the paper is accepted.
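To make the state-space formulation in the abstract concrete, here is a minimal sketch of a recursive rollout together with a time-weighted prediction loss. It is not the DeepSSM network or the exact ATPL definition, neither of which is given here. It assumes linear transitions (the hypothetical matrices `A` and `C` stand in for the learned state-state and state-observation transitions) and a hypothetical geometric weighting `decay ** t` that puts more weight on earlier time-steps.

```python
import numpy as np


def rollout(A, C, s0, steps):
    """Recursively generate `steps` future observations (poses).

    Assumed linear stand-ins for the learned transitions:
        s_{t+1} = A @ s_t        (state-state transition)
        y_{t+1} = C @ s_{t+1}    (state-observation transition)
    """
    s = s0
    poses = []
    for _ in range(steps):
        s = A @ s            # advance the hidden state
        poses.append(C @ s)  # emit the pose observed from that state
    return np.stack(poses)   # shape: (steps, pose_dim)


def temporal_weighted_loss(pred, target, decay=0.9):
    """Per-step mean squared error, weighted so earlier steps count more.

    The geometric weights are an illustrative assumption, not the
    weighting used by ATPL in the paper.
    """
    steps = pred.shape[0]
    weights = decay ** np.arange(steps)  # 1, decay, decay^2, ...
    weights = weights / weights.sum()    # normalise to sum to 1
    per_step = ((pred - target) ** 2).mean(axis=1)
    return float((weights * per_step).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.95 * np.eye(4)              # hypothetical state transition
    C = rng.standard_normal((6, 4))   # hypothetical observation map
    s0 = rng.standard_normal(4)       # initial state
    pred = rollout(A, C, s0, steps=10)
    target = pred + 0.1               # synthetic "ground truth"
    print(pred.shape, temporal_weighted_loss(pred, target))
```

In this sketch an error at the first predicted frame costs more than the same error at the last frame, which is one plausible reading of "attention to the early time-steps". Because each pose is produced from a recursively updated state, early errors also propagate to every later prediction.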
