Scene adaptive mechanism for action recognition

Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler
DOI: https://doi.org/10.1016/j.cviu.2023.103854
IF: 4.886
2023-10-06
Computer Vision and Image Understanding
Abstract: Scene knowledge plays an important role in visual analysis. In action recognition, human activities often occur in specific scenes. However, it should be emphasised that the association between actions and scenes is very complex. Simplistic attempts to improve action recognition by uniformly intensifying or suppressing scene knowledge are therefore unwise. In this article, we tackle this problem by proposing a new action recognition framework based on the Scene Adaptive Mechanism. Specifically, a Scene Knowledge Modulation module controls the feature extractors so that they either suppress or intensify scene knowledge. An Adaptive Fusion Layer then dynamically regulates the role of scene information in the different visual feature sequences and fuses them. The resulting model is abbreviated as SAM-Net. Our method serves as a pluggable module that can be integrated into other backbones to further enhance their performance. We perform extensive experiments on three large datasets: Something-Something V1, Something-Something V2, and Kinetics-400. The quantitative and qualitative results demonstrate the effectiveness of SAM-Net, which improves markedly on the baseline methods.
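The abstract does not give the internals of the Adaptive Fusion Layer, so the following is only a minimal illustrative sketch of the general idea of input-dependent fusion, not the authors' implementation. It assumes two per-frame feature vectors, one from a scene-suppressing branch and one from a scene-intensifying branch, and fuses them with softmax weights computed from the features themselves. The names `adaptive_fuse` and `gate_weights`, and the dot-product scoring, are hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(suppressed, intensified, gate_weights):
    """Fuse two feature vectors with weights that depend on the input.

    suppressed, intensified: equal-length feature vectors (assumed branch outputs).
    gate_weights: two weight vectors, one per branch (hypothetical learned
    parameters; in a real model these would be trained end to end).
    Returns the fused vector and the two fusion weights.
    """
    branches = [suppressed, intensified]
    # One scalar score per branch: dot product of the feature and its gate vector.
    scores = [
        sum(w * x for w, x in zip(gate, feat))
        for gate, feat in zip(gate_weights, branches)
    ]
    # Normalise the scores so the fusion weights are positive and sum to 1.
    alphas = softmax(scores)
    # Weighted sum of the two branches, element by element.
    fused = [
        sum(a * feat[i] for a, feat in zip(alphas, branches))
        for i in range(len(suppressed))
    ]
    return fused, alphas


if __name__ == "__main__":
    fused, alphas = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [[2.0, 0.0], [0.0, 0.5]])
    print(alphas, fused)
```

In this sketch, a frame whose scene-suppressed feature scores highly under its gate vector contributes more to the fused representation, which is one simple way the role of scene information could vary from sample to sample rather than being fixed in advance.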
computer science, artificial intelligence; engineering, electrical & electronic