MRSAPose: Multi-level Routing Sparse Attention for Multi-Person Pose Estimation

Shang Wu, Bin Wang
DOI: https://doi.org/10.1016/j.eswa.2024.125100
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Multi-person Pose Estimation (MPPE) aims to reconstruct human poses by locating and connecting the keypoints of individuals in input images. The variability of human poses and the complexity of scenes make MPPE reliant on both local details and global structures, and the absence of either can lead to deformed poses. The emergence of the Transformer has significantly improved MPPE performance. However, because self-attention computes attention scores between every pair of positions, current Transformer-based MPPE methods incur complexity that is quadratic in the number of positions. To address these issues, this paper proposes a novel pose estimation model, MRSAPose. MRSAPose uses Multi-level Routing Sparse Attention (MRSA) to dynamically select relevant regions for attention, reducing computational complexity and mitigating the impact of irrelevant regions. Furthermore, MRSAPose constructs a Transformer-CNN Parallel Interaction Block (T-CP block) from MRSA and Recursive Residual Gated Convolution (Res-gnConv), facilitating parallel learning of global and local information. By relying on multi-level routing algorithms and on high-order spatial interactions obtained through recursive processing of adjacent features, the T-CP block helps MRSAPose effectively alleviate occlusion and misalignment in pose estimation. On multiple challenging keypoint datasets, MRSAPose outperforms current state-of-the-art algorithms, particularly excelling in crowded and occluded scenes.
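The idea of routing attention to a few relevant regions can be illustrated with a minimal NumPy sketch. This is not the authors' MRSA: it assumes a single routing level, mean-pooled region descriptors, and a fixed top-k selection, and the names `routed_sparse_attention`, `num_regions`, and `topk` are hypothetical. It only shows how restricting each query region to its top-k key regions avoids scoring every pair of positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def routed_sparse_attention(q, k, v, num_regions, topk):
    """Single-level region-routed attention (illustrative assumption only).

    q, k, v: arrays of shape (N, d); N must be divisible by num_regions.
    Returns the attended output (N, d) and the routing table (R, topk).
    """
    n, d = q.shape
    r = n // num_regions  # tokens per region
    qr = q.reshape(num_regions, r, d)
    kr = k.reshape(num_regions, r, d)
    vr = v.reshape(num_regions, r, d)

    # Coarse step: one descriptor per region (mean of its tokens),
    # then a region-to-region affinity matrix of shape (R, R).
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T

    # Routing: for each query region keep the topk most related key regions.
    routes = np.argsort(-affinity, axis=1)[:, :topk]  # (R, topk)

    # Fine step: gather only the routed keys/values, shape (R, topk * r, d).
    k_sel = kr[routes].reshape(num_regions, topk * r, d)
    v_sel = vr[routes].reshape(num_regions, topk * r, d)

    # Token-level attention restricted to the routed regions.
    scores = qr @ k_sel.transpose(0, 2, 1) / np.sqrt(d)  # (R, r, topk * r)
    out = softmax(scores, axis=-1) @ v_sel               # (R, r, d)
    return out.reshape(n, d), routes

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out, routes = routed_sparse_attention(q, k, v, num_regions=4, topk=2)
```

In this sketch each token attends to `topk * r` keys rather than all `N`, so the token-level score matrix shrinks from `N x N` to `N x (topk * r)`. Setting `topk` equal to `num_regions` recovers ordinary dense attention.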