AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation

In this letter, we present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments that uses a team of autonomous unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation. Existing methods are limited by calibrated cameras and offline processing. Thus, we present the first method (AirPose) to estimate human pose and shape using images captured by multiple extrinsically uncalibrated flying cameras. AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration. It uses distributed neural networks running on each UAV that communicate viewpoint-independent information with each other about the person (i.e., their 3D shape and articulated pose). The person’s shape and pose are parameterized using the SMPL-X body model, resulting in a compact representation, that minimizes communication between the UAVs. The network is trained using synthetic images of realistic virtual environments, and fine-tuned on a small set of real images. We also introduce an optimization-based post-processing method (AirPose+) for offline applications that require higher MoCap quality. We make our method’s code and data available for research at https://github.com/robot-perception-group/AirPose. A video describing the approach and results is available at https://youtu.be/xLYe1TNHsfs.

[1]  David L. Mills,et al.  Internet time synchronization: the network time protocol , 1991, IEEE Trans. Commun..

[2]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Wenjun Zeng,et al.  Cross View Fusion for 3D Human Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Xiaowei Zhou,et al.  Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Heinrich H. Bülthoff,et al.  Deep Neural Network-Based Cooperative Visual Tracking Through Multiple Micro Aerial Vehicles , 2018, IEEE Robotics and Automation Letters.

[6]  M. M. Reijne,et al.  Accuracy of human motion capture systems for sport applications; state-of-the-art review , 2018, European journal of sport science.

[7]  Heinrich H. Bülthoff,et al.  Active Perception Based Formation Control for Multiple Aerial Vehicles , 2019, IEEE Robotics and Automation Letters.

[8]  Qionghai Dai,et al.  3D Pose Detection of Closely Interactive Humans Using Multi-View Cameras , 2019, Sensors.

[9]  Steven Schwarcz,et al.  3D Human Pose Estimation from Deep Multi-View 2D Pose , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[10]  Javier Alonso-Mora,et al.  Flycon: real-time environment-independent multi-view human pose estimation with aerial vehicles , 2019, ACM Trans. Graph..

[11]  Nicolas Padoy,et al.  A generalizable approach for multi-view 3D human pose regression , 2018, Machine Vision and Applications.

[12]  Joachim Tesch,et al.  AGORA: Avatars in Geography Optimized for Regression Analysis , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Vijay Kumar,et al.  Human Motion Capture Using a Drone , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Chongyang Ma,et al.  Deep Volumetric Video From Very Sparse Multi-view Performance Capture , 2018, ECCV.

[15]  Victor Lempitsky,et al.  Learnable Triangulation of Human Pose , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Yinghao Huang,et al.  Towards Accurate Marker-Less Human Shape and Pose Estimation over Time , 2017, 2017 International Conference on 3D Vision (3DV).

[19]  Yizhou Wang,et al.  MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ashish Kapoor,et al.  AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles , 2017, FSR.

[21]  Qionghai Dai,et al.  FlyCap: Markerless Motion Capture Using Multiple Autonomous Flying Cameras , 2016, IEEE Transactions on Visualization and Computer Graphics.

[22]  Bernt Schiele,et al.  Multi-view Pictorial Structures for 3D Human Pose Estimation , 2013, BMVC.

[23]  C. Karen Liu,et al.  Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture , 2014, ACM Trans. Graph..

[24]  Morgan Quigley,et al.  ROS: an open-source Robot Operating System , 2009, ICRA 2009.

[25]  Michael J. Black,et al.  Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Ming C. Lin,et al.  Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .