Pose-Promote: Progressive Visual Perception for Activities of Daily Living

Qilang Ye, Zitong Yu
DOI: https://doi.org/10.1109/lsp.2024.3480046
2024-10-29
IEEE Signal Processing Letters
Abstract: Poses are effective in interpreting fine-grained human activities, especially when encountering complex visual information. Unimodal methods for action recognition perform unsatisfactorily on daily activities because they lack a more comprehensive perspective. Multimodal methods that combine pose and visual information are still not exhaustive enough in mining complementary information. Therefore, we propose a Pose-promote (Ppromo) framework that utilizes a priori knowledge of pose joints to perceive visual information progressively. We first introduce a temporal promote module to activate each video segment using temporally synchronized joint weights. Then a spatial promote module is proposed to capture the key regions in the visual input using the learned pose attention. To further refine the bimodal associations, a global inter-promote module is proposed to align global pose-visual semantics at the feature granularity. Finally, a learnable late fusion strategy between the visual and pose modalities is applied for accurate inference. Ppromo achieves state-of-the-art performance on three publicly available datasets.
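The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: weighting video segments by pose-derived temporal weights, and a learnable late fusion of visual and pose predictions. Everything here is an assumption made for illustration, not the authors' method: the function names `temporal_promote` and `late_fusion`, the use of a softmax over mean joint scores, and the single sigmoid-gated scalar `alpha` are all hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_promote(visual_segments, joint_scores):
    """Reweight each video segment by a pose-derived temporal weight.

    visual_segments: (T, D) array, one feature vector per segment.
    joint_scores:    (T, J) array, one score per joint per segment
                     (assumed to be temporally aligned with the segments).
    Returns the reweighted features (T, D) and the weights (T,).
    """
    # Assumption: average the joint scores per segment, then normalize over time.
    weights = softmax(joint_scores.mean(axis=1))
    return visual_segments * weights[:, None], weights


def late_fusion(visual_logits, pose_logits, alpha):
    """Blend two sets of class logits with a gate derived from a scalar alpha.

    In a trained model alpha would be a learned parameter; here it is passed in.
    """
    gate = 1.0 / (1.0 + np.exp(-alpha))  # sigmoid keeps the gate in (0, 1)
    return gate * visual_logits + (1.0 - gate) * pose_logits
```

With `alpha = 0` the gate is 0.5, so the fused output is the plain average of the two logit vectors; a positive `alpha` would lean toward the visual branch and a negative one toward the pose branch.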
engineering, electrical & electronic