Learning Monocular Regression of 3D People in Crowds via Scene-aware Blending and De-occlusion
Yu Sun, Lubing Xu, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu
DOI: https://doi.org/10.1109/tmm.2023.3294820
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: In this study, we address the challenge of estimating 3D body pose, shape, and depth relationships from single RGB images of crowded scenes. The difficulty lies in the limited availability of in-the-wild training samples that feature densely populated scenes. To mitigate this issue, we introduce a synthesis-based approach that fuses multiple human samples into a single composite scene. Our scene-aware blending technique maintains human-scene consistency by placing individuals at plausible locations and adjusting their scales to conform to the 3D setting. Furthermore, our method enables flexible per-subject occlusion management during blending, which we use in a novel de-occlusion training scheme to make the learned 3D human body representations more robust. We present a one-stage model, CBD, that learns monocular regression of 3D people in crowds by leveraging these blending and de-occlusion techniques. Quantitative and qualitative evaluations on four benchmark datasets show that CBD surpasses existing state-of-the-art approaches in 3D human pose and mesh regression accuracy, making it a promising solution for monocular 3D human mesh recovery in densely populated scenes.
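The blending idea in the abstract (scale each person to fit the 3D setting, then control who occludes whom) can be illustrated with a minimal sketch. This is not the authors' CBD implementation: the pinhole scaling rule, the back-to-front paste order, and every name below (`apparent_height_px`, `composite_back_to_front`, the dictionary keys) are assumptions chosen for illustration. Images are plain nested lists, and each person crop is assumed to fit inside the canvas.

```python
def apparent_height_px(focal_px, height_m, depth_m):
    """Pinhole camera assumption: pixel height of a person of height_m at depth_m."""
    return focal_px * height_m / depth_m


def composite_back_to_front(background, people):
    """Paste person crops onto a copy of the background, farthest person first.

    Each entry in `people` is a hypothetical dict with:
      'pixels'   - 2D list of pixel values for the crop
      'mask'     - 2D list of booleans, True where the person is visible
      'top_left' - (row, col) of the crop on the canvas
      'depth_m'  - assumed distance from the camera
    Nearer people are pasted later, so they occlude farther ones.
    """
    canvas = [row[:] for row in background]  # leave the input untouched
    for person in sorted(people, key=lambda p: p["depth_m"], reverse=True):
        top, left = person["top_left"]
        for dr, mask_row in enumerate(person["mask"]):
            for dc, visible in enumerate(mask_row):
                if visible:
                    canvas[top + dr][left + dc] = person["pixels"][dr][dc]
    return canvas
```

Because occlusion here is decided only by the paste order, the same per-person masks also record which pixels of each subject end up hidden, which is the kind of per-subject occlusion information a de-occlusion training scheme could draw on.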
computer science, information systems, telecommunications, software engineering