MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection

Yonghao Dang, Liyuan Liu, Hui Kang, Ping Ye, Jianqin Yin
2024-12-02
Abstract: Real-time 2D keypoint detection plays an essential role in computer vision. Although CNN-based and Transformer-based methods have achieved breakthrough progress, they often fail to deliver superior performance and real-time speed. This paper introduces MamKPD, the first efficient yet effective Mamba-based pose estimation framework for 2D keypoint detection. The conventional Mamba module exhibits limited information interaction between patches. To address this, we propose a lightweight contextual modeling module (CMM) that uses depth-wise convolutions to model inter-patch dependencies and linear layers to distill the pose cues within each patch. Subsequently, by combining Mamba for global modeling across all patches, MamKPD effectively extracts instances' pose information. We conduct extensive experiments on human and animal pose estimation datasets to validate the effectiveness of MamKPD. Our MamKPD-L achieves 77.3% AP on the COCO dataset with 1492 FPS on an NVIDIA GTX 4090 GPU. Moreover, MamKPD achieves state-of-the-art results on the MPII dataset and competitive results on the AP-10K dataset while saving 85% of the parameters compared to ViTPose. Our project page is available at https://mamkpd.github.io/.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve is the trade-off between efficiency and accuracy in real-time 2D keypoint detection. Although CNN-based and Transformer-based methods have made remarkable progress in 2D keypoint detection, they often require expensive computing resources and struggle to achieve high accuracy and real-time speed at the same time. Specifically:

1. **Limitations of existing methods**:
   - **Large network scale**: Many existing 2D keypoint detection methods rely on large neural networks (such as deep convolutional neural networks or Transformers), which leads to high computing costs and low inference speeds.
   - **Trade-off between accuracy and efficiency**: To improve efficiency, some lightweight architectures reduce the number of parameters, but usually at the cost of detection accuracy.
2. **Research objectives**:
   - **Improve model efficiency**: Design an efficient 2D keypoint detection framework that significantly improves inference speed while maintaining high accuracy.
   - **Explore the application of the Mamba module**: Apply the Mamba module to the 2D keypoint detection task for the first time, making use of its efficient state-space modeling ability.
3. **Proposed new method**:
   - **MamKPD framework**: Introduce a new 2D keypoint detection framework named MamKPD, which is based on the Mamba module and adds a lightweight contextual modeling module (CMM) to enhance information interaction between patches.
   - **CMM module**: Capture the dependencies between image patches with depth-wise convolutions and distill the pose cues within each patch with linear layers, thereby enhancing multi-scale feature extraction.
4. **Experimental verification**:
   - **Datasets**: Extensive experiments were carried out on COCO, MPII, and AP-10K to verify the effectiveness of MamKPD.
   - **Performance comparison**: MamKPD is fast at inference (for example, MamKPD-L reaches 1492 FPS on an NVIDIA GTX 4090 GPU while achieving 77.3% AP on COCO), achieves state-of-the-art results on MPII, and is competitive on AP-10K while using 85% fewer parameters than ViTPose.

In summary, this paper aims to resolve the difficulty existing 2D keypoint detection methods have in balancing efficiency and accuracy, especially in real-time application scenarios, by introducing the MamKPD framework.
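To make the CMM idea described above more concrete, the following is a minimal pure-Python sketch of its two ingredients: a depth-wise convolution that mixes information between neighbouring patches within each channel, followed by a linear layer applied to the channel vector of every patch. It is an illustration only, not the authors' implementation. The function names (`depthwise_conv3x3`, `pointwise_linear`, `cmm_sketch`), the 3x3 kernel size, the zero padding, and the absence of normalization, activation, and residual connections are all assumptions, and the Mamba stage that follows the CMM is not shown.

```python
def depthwise_conv3x3(x, kernels):
    """Depth-wise 3x3 convolution (cross-correlation) with zero padding.

    x:       feature map as nested lists, shape [C][H][W]
    kernels: one 3x3 filter per channel, shape [C][3][3]
    Each channel is filtered independently, so patches exchange
    information only with their spatial neighbours in the same channel.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        # Out-of-range neighbours count as zero (padding).
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernels[c][di + 1][dj + 1] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_linear(x, weight, bias):
    """Apply the same linear layer to the channel vector of every patch.

    weight: shape [C_out][C_in], bias: shape [C_out]
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [
                bias[o] + sum(w * x[c][i][j] for c, w in enumerate(weight[o]))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def cmm_sketch(x, kernels, weight, bias):
    """Hypothetical CMM-like block: inter-patch mixing, then per-patch linear."""
    return pointwise_linear(depthwise_conv3x3(x, kernels), weight, bias)


# Toy input: 2 channels on a 3x3 grid of patches.
x = [
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]],
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # keeps a patch as-is
box_sum = [[1.0] * 3 for _ in range(3)]                         # sums a 3x3 neighbourhood
out = cmm_sketch(x, [identity, box_sum],
                 weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(out[1][1][1])  # 45.0: the centre patch of channel 1 sees all nine patches
```

In the toy run, channel 0 passes through unchanged while channel 1 aggregates its neighbourhood, which shows why the depth-wise step is cheap: its parameter count grows with the number of channels rather than with the product of input and output channels. In a real model these loops would be replaced by a grouped convolution and a linear layer from a deep-learning framework.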