HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

DOI: https://doi.org/10.48550/arXiv.2204.13686

2023-04-16

Abstract: 4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications. With the advances of new sensors and algorithms, there is an increasing demand for more versatile datasets. In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames. HuMMan has several appealing properties: 1) multi-modal data and annotations including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) popular mobile device is included in the sensor suite; 3) a set of 500 actions, designed to cover fundamental movements; 4) multiple tasks such as action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction are supported and evaluated. Extensive experiments on HuMMan voice the need for further study on challenges such as fine-grained action recognition, dynamic human mesh reconstruction, point cloud-based parametric human recovery, and cross-device domain gaps.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the shortage of versatile datasets for 4D (spatio-temporal) human sensing and modeling, a fundamental task in vision and graphics. As new sensors and algorithms emerge, demand for more general-purpose datasets keeps growing. The paper introduces HuMMan, a large-scale multi-modal 4D human dataset that supports multiple sensing and modeling tasks, including action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction. The HuMMan dataset has the following characteristics:

1. **Multi-modal data and annotations**: It includes color images, point clouds, keypoints, SMPL parameters, and textured meshes.
2. **Mobile devices**: The sensor suite includes popular mobile devices, such as iPhones with built-in LiDAR.
3. **Action set**: 500 actions are designed to cover fundamental human movements.
4. **Support for multiple tasks**: It supports action recognition, 2D and 3D pose estimation, 3D parametric human recovery, and textured mesh reconstruction.

Through these characteristics, HuMMan aims to promote more comprehensive research on human sensing and modeling, especially on fine-grained action recognition, dynamic human mesh reconstruction, point cloud-based parametric human recovery, and cross-device domain gaps.

MOtion Human Parsing - A New Benchmark for 3D Human Parsing.

HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media

Intestinal-prostatic impedance measurements in bulls

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors

MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Human-centric Scene Understanding for 3D Large-scale Scenarios

HSPACE: Synthetic Parametric Humans Animated in Complex Environments

Unsupervised Universal Hierarchical Multi-Person 3D Pose Estimation for Natural Scenes

HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

HUMBI: A Large Multiview Dataset of Human Body Expressions and Benchmark Challenge

DGU-HAO: A Dataset With Daily Life Objects for Comprehensive 3D Human Action Analysis

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling

Look Into Multi-Person: A New Benchmark For Pose Estimation And Human Parsing

HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR