Multi-person dance tiered posture recognition with cross progressive multi-resolution representation integration

Huizhu Kao
DOI: https://doi.org/10.1371/journal.pone.0300837
IF: 3.7
2024-06-13
PLoS ONE
Abstract:Recognizing postures in multi-person dance scenarios presents challenges due to mutual body part obstruction and varying distortions across different dance actions. These challenges include differences in proximity and size, demanding precision in capturing fine details to convey action expressiveness. Robustness in recognition becomes crucial in complex real-world environments. To tackle these issues, our study introduces a novel approach, i.e., Multi-Person Dance Tiered Posture Recognition with Cross Progressive Multi-Resolution Representation Integration (CPMRI) and Tiered Posture Recognition (TPR) modules. The CPMRI module seamlessly merges high-level features, rich in semantic information, with low-level features that provide precise spatial details. Leveraging a cross progressive approach, it retains semantic understanding while enhancing spatial precision, bolstering the network's feature representation capabilities. Through innovative feature concatenation techniques, it efficiently blends high-resolution and low-resolution features, forming a comprehensive multi-resolution representation. This approach significantly improves posture recognition robustness, especially in intricate dance postures involving scale variations. The TPR module classifies body key points into core torso joints and extremity joints based on distinct distortion characteristics. Employing a three-tier tiered network, it progressively refines posture recognition. By addressing the optimal matching problem between torso and extremity joints, the module ensures accurate connections, refining the precision of body key point locations. Experimental evaluations against state-of-the-art methods using MSCOCO2017 and a custom Chinese dance dataset validate our approach's effectiveness. Evaluation metrics including Object Keypoint Similarity (OKS)-based Average Precision (AP), mean Average Precision (mAP), and Average Recall (AR) underscore the efficacy of the proposed method.
What problem does this paper attempt to address?