HuMoMM: A Multi-Modal Dataset and Benchmark for Human Motion Analysis

Xiong Zhang, Minghui Wang, Ming Zeng, Wenxiong Kang, Feiqi Deng
DOI: https://doi.org/10.1007/978-3-031-46305-1_17
2023-01-01
Abstract: Human motion analysis is a fundamental task in computer vision, and the development of deep learning has increased the demand for versatile datasets. However, obtaining human motion annotations, such as 3D keypoints and SMPL parameters, still requires further research. In this work, we design a multi-view human motion capture system and develop a toolchain to generate multi-modal motion annotations. Additionally, we contribute HuMoMM, a large-scale multi-modal dataset with the following characteristics: 1) multiple modalities, including two data formats, i.e., RGB and depth images, and four annotation formats, i.e., action categories, 2D keypoints, 3D keypoints, and SMPL parameters; 2) large scale, with 18 subjects, 30 actions, 3.5k sequences, and 262k frames; 3) multi-task support for action recognition, 2D keypoint detection, 3D pose estimation, and human mesh recovery. Furthermore, we provide a benchmark on HuMoMM to test the performance of popular methods on several related tasks. The experimental results demonstrate that HuMoMM holds significant research value. We expect that HuMoMM will contribute to human motion-related research, and it is available at https://github.com/SCUT-BIP-Lab/HuMoMM.