DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation

Tao Wang, Lei Jin, Zhang Wang, Xiaojin Fan, Yu Cheng, Yinglei Teng, Junliang Xing, Jian Zhao
DOI: https://doi.org/10.1145/3581783.3611989
2023-01-01
Abstract: Multi-person pose estimation in crowded scenes remains a very challenging task. We find that most previous methods fail in crowded scenes because they cannot estimate or group visible keypoints, rather than because they cannot reason about invisible keypoints. We therefore categorize crowded scenes into entanglement and occlusion based on the visibility of human parts, and observe that entanglement is a significant problem in crowded scenes. Based on this observation, we propose DecenterNet, an end-to-end deep architecture for robust and efficient pose estimation in crowded scenes. Within DecenterNet, we introduce a decentralized pose representation that uses all visible keypoints as root points to represent human poses, which is more robust in entangled areas. We also propose a decoupled pose assessment mechanism, which introduces a location map to adaptively select optimal poses in the offset map. In addition, we have constructed a new dataset named SkatingPose, which contains more entangled scenes. The proposed DecenterNet surpasses the best existing method on SkatingPose by 1.8 AP. Furthermore, DecenterNet obtains 71.2 AP and 71.4 AP on the COCO and CrowdPose datasets, respectively, demonstrating the superiority of our method. We will release our source code, trained models, and dataset to facilitate further studies in this research direction. Our code and dataset are available at https://github.com/InvertedForest/DecenterNet.
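
The abstract describes the location map and offset map only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical layout in which `location_map` scores every pixel as a candidate root point and `offset_map` stores, at every pixel, a displacement to each of the K keypoints. The function name `decode_pose`, the array shapes, and the argmax selection rule are all assumptions made for illustration.

```python
import numpy as np


def decode_pose(location_map, offset_map):
    """Read one pose from the best-scoring root pixel.

    Assumed (hypothetical) layout, not taken from the paper:
      location_map: (H, W) score of each pixel as a root point.
      offset_map:   (K, 2, H, W) per-pixel (dx, dy) to each of K keypoints.
    Returns the root (x, y) and a (K, 2) array of keypoint (x, y) positions.
    """
    _, width = location_map.shape
    # Select the pixel with the highest location score as the root.
    flat_index = int(np.argmax(location_map))
    y, x = divmod(flat_index, width)
    # The pose is the root position plus the offsets stored at that pixel.
    offsets = offset_map[:, :, y, x]  # shape (K, 2)
    keypoints = np.array([x, y]) + offsets
    return (x, y), keypoints


# Toy example: a 4x5 map with two keypoints and one strong root at (x=3, y=2).
location_map = np.zeros((4, 5))
location_map[2, 3] = 0.9
offset_map = np.zeros((2, 2, 4, 5))
offset_map[1, :, 2, 3] = (-2.0, 1.0)  # second keypoint lies 2 left, 1 down

root, keypoints = decode_pose(location_map, offset_map)
```

In this toy case the root pixel coincides with the first keypoint (zero offset), which mirrors the idea of using a visible keypoint, rather than a single body center, as the root. A full decoder would presumably read poses from many root candidates and combine them; that step is not specified in the abstract and is omitted here.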