Motion Capture from Inertial and Vision Sensors

Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei
2024-07-23
Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To enable accurate multi-modal human motion capture in daily life using a monocular camera combined with very few inertial measurement units (IMUs), this paper contributes MINIONS, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several notable properties: 1) a large scale of over five million frames and 400 minutes of duration; 2) multi-modal data consisting of IMU signals and RGB videos, labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single-person and interactive actions with textual descriptions. Using MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture with a monocular camera and very few IMUs. The experimental results highlight the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the high cost and complexity that keep current motion capture systems out of daily life. Specifically, it aims to achieve accurate multi-modal human motion capture with affordable, easy-to-use consumer devices, such as a monocular camera and a small number of inertial measurement units (IMUs). Existing industrial-grade motion capture systems are precise, but they are expensive and complex to set up, which makes them unsuitable for personal users. The paper also points out that existing datasets such as TotalCapture lack variety in scenes, subjects, and action types. They also require subjects to wear tight-fitting clothing during data collection. Because such clothing differs from everyday attire, these datasets are of limited use for everyday applications. To address these issues, the authors construct MINIONS, a large-scale multi-modal motion capture dataset that includes multiple sensor modalities (RGB videos and IMU signals) and provides detailed annotations of joint positions, joint rotations, SMPL parameters, and other information. Experiments on MINIONS show that stable motion capture is feasible with a monocular camera and a small number of IMUs, indicating the potential of this setup for practical applications.