Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang
DOI: https://doi.org/10.48550/arXiv.2312.16051
2023-12-26
Computer Vision and Pattern Recognition
Abstract: The analysis of ubiquitous human-human interactions is pivotal for understanding humans as social beings. Existing human-human interaction datasets typically suffer from inaccurate body motions and lack hand gestures and fine-grained textual descriptions. To better perceive and generate human-human interactions, we propose Inter-X, currently the largest human-human interaction dataset, with accurate body movements, diverse interaction patterns, and detailed hand gestures. The dataset includes ~11K interaction sequences and more than 8.1M frames. We also equip Inter-X with versatile annotations: more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects. Based on these annotations, we propose a unified benchmark composed of 4 categories of downstream tasks covering both perceptual and generative directions. Extensive experiments and comprehensive analysis show that Inter-X serves as a testbed for promoting the development of versatile human-human interaction analysis. Our dataset and benchmark will be publicly available for research purposes.