Implicit Decouple Network for Efficient Pose Estimation

Lei Zhao, Le Han, Min Yao, Nenggan Zheng
DOI: https://doi.org/10.1145/3581783.3611790
2023-01-01
Abstract: In pose estimation, keypoints can be represented as Gaussian heatmaps, classification vectors, or direct coordinates. However, current networks lack consistency with these representations: they accommodate them only in the final layer, which leads to suboptimal efficiency and demands a large number of parameters or substantial computational resources. In this paper, we propose a simple yet efficient plug-and-play module, the Implicit Decouple Module (IDM), which decouples features into two parts along the x and y axes and aggregates features in a direction-aware manner. This approach implicitly fuses direction-specific coordinate information, improving consistency with the keypoint representations, especially in vector form. Furthermore, we introduce a fully convolutional backbone network, the Implicit Decouple Network (IDN), which incorporates IDM and achieves high performance without maintaining high-resolution features, dense multi-level feature fusion, or many repeated stages. On the COCO dataset, our basic IDN without pre-training outperforms HRNet (28.5M parameters) by 2.4 AP with 18.2M parameters, and even surpasses some transformer-based methods. In the lightweight setting, our model outperforms Lite-HRNet by 3.9 AP with only 2.5M parameters. We also evaluate our model on the person instance segmentation task and on other datasets, demonstrating its generality and effectiveness. http(s)://znk.ink/su/mm23idn.
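The abstract does not describe how IDM works internally, so the following is not the authors' module. It is only a minimal sketch, in plain Python, of the three keypoint representations named above and of why aggregating a heatmap along each axis is consistent with the vector form: an isotropic 2-D Gaussian heatmap factors into one vector per axis. All function names, the grid size, and the sigma value are assumptions chosen for illustration.

```python
import math


def gaussian_vector(center, length, sigma=1.0):
    """1-D Gaussian 'classification vector' peaked at index `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(length)]


def gaussian_heatmap(x, y, width, height, sigma=1.0):
    """2-D Gaussian heatmap (list of rows), built as the outer product
    of a y-axis vector and an x-axis vector."""
    vx = gaussian_vector(x, width, sigma)
    vy = gaussian_vector(y, height, sigma)
    return [[vy[r] * vx[c] for c in range(width)] for r in range(height)]


def argmax(values):
    """Index of the largest element (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_vectors(vx, vy):
    """Vector form -> direct (x, y) coordinates."""
    return argmax(vx), argmax(vy)


def decode_heatmap(heatmap):
    """Heatmap form -> direct (x, y) coordinates via per-axis aggregation.

    Summing over rows leaves a profile along x; summing over columns
    leaves a profile along y. Each profile is decoded independently.
    """
    width = len(heatmap[0])
    x_profile = [sum(row[c] for row in heatmap) for c in range(width)]
    y_profile = [sum(row) for row in heatmap]
    return argmax(x_profile), argmax(y_profile)


# Hypothetical keypoint at x=5, y=2 on an 8x6 grid.
keypoint = (5, 2)
vx = gaussian_vector(keypoint[0], 8)
vy = gaussian_vector(keypoint[1], 6)
heatmap = gaussian_heatmap(keypoint[0], keypoint[1], width=8, height=6)

from_vectors = decode_vectors(vx, vy)
from_heatmap = decode_heatmap(heatmap)
```

In this toy setting both decoders recover the same coordinates, which is the sense in which direction-specific (per-axis) information is enough to locate a keypoint. A learned module would operate on feature maps rather than on ideal Gaussians, and the abstract leaves those details to the paper.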