SAMP: Adapting Segment Anything Model for Pose Estimation

Zhihang Zhu, Yunfeng Yan, Yi Chen, Haoyuan Jin, Xuesong Nie, Donglian Qi, Xi Chen
DOI: https://doi.org/10.1109/icme57554.2024.10688016
2024-01-01
Abstract: The Segment Anything Model (SAM) exhibits superior performance on segmentation. Many follow-up works explore adapting this powerful model to specific domains, but they mainly focus on different sub-tasks of segmentation, and the cross-task generalization ability of SAM remains unexplored. In this paper, we propose SAMP (SAM for Pose), the first attempt to adapt SAM for pose estimation. We observe that SAM can segment different human parts when given specific prompts, indicating that it contains the knowledge needed to understand human structure. Because localizing keypoints requires fine-grained perceptual capabilities, we design a Detail-aware Adapter (DA-Adapter), which complements the features of the SAM encoder with multi-scale feature fusion and multi-level supervision. Experimental results demonstrate that SAMP achieves new state-of-the-art results compared with previous methods designed specifically for pose estimation. Specifically, with a ViT-B backbone, SAMP achieves 78.1% AP on COCO val2017, 77.1% AP on COCO test-dev2017, and 70.5% AP on the CrowdPose dataset.