Context-aware focal alignment network for micro-video multi-label classification

Bin Yuan, Weiheng Yao, Peiguang Jing, Jing Zhang, Kim Fung Tsang, Shuqiang Wang
DOI: https://doi.org/10.1007/s10044-024-01376-8
IF: 2.307
2024-11-21
Pattern Analysis and Applications
Abstract: Micro-videos have gained immense popularity in recent years due to their concise and interactive format, which aligns well with the fast-paced nature of modern digital consumption. However, this brevity often results in significant semantic shifts within a short timeframe, making it challenging to accurately uncover their context for more precise categorization. To address this issue, a context-aware focal alignment network (CAFANET) for micro-video multi-label classification is proposed. We first implement a temporal scaling feature extraction approach to obtain a hierarchical representation enriched with segment-based details. We then introduce a context-aware focal alignment attention (CAFAA) mechanism, which dynamically adjusts its focus based on the unique characteristics of each segment, effectively bridging the gap between local details and global contextual awareness. Finally, we fuse these aligned features with the global contexts to obtain the final feature representations, which describe the overall information for subsequent classification. Experimental results on a real-world micro-video multi-label dataset demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art approaches.
computer science, artificial intelligence