Time‐attentive fusion network: An efficient model for online detection of action start

Xuejiao Hu, Shijie Wang, Ming Li, Yang Li, Sidan Du
DOI: https://doi.org/10.1049/ipr2.13071
IF: 2.3
2024-03-15
IET Image Processing
Abstract: Online detection of action start is a significant and challenging task that requires prompt identification of action start positions and their corresponding categories within streaming videos. The task is difficult because of data imbalance, similarity in boundary content, and real‐time detection requirements. Here, a novel Time‐Attentive Fusion Network (TAF‐Net) is introduced to improve both action detection accuracy and operational efficiency. A time‐attentive fusion module is proposed, consisting of long‐term memory attention and a fusion feature learning mechanism, to improve spatial‐temporal feature learning. The temporal memory attention mechanism captures more effective temporal dependencies by employing weighted linear attention. The fusion feature learning mechanism combines action information from the current moment with historical data, thus enhancing the representation. The proposed method learns valuable sequence information for precise detection, and its linear computational complexity and parallelism enable fast training and inference. The method is evaluated on two challenging datasets: THUMOS'14 and ActivityNet v1.3. The experimental results demonstrate that the proposed method significantly outperforms existing state‐of‐the‐art methods in terms of both detection accuracy and inference speed.
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology
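The abstract attributes the model's linear complexity to weighted linear attention but does not give the formulation. As a rough illustration only, the sketch below shows generic causal linear attention with a running state, which processes each incoming frame in constant time. It is not the authors' implementation: the `elu(x) + 1` feature map, the function names, and the optional per-step `weights` argument are all assumptions made for this example.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention.
    # Assumed here; the abstract does not say which map TAF-Net uses.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def streaming_linear_attention(queries, keys, values, weights=None):
    """Causal linear attention over a stream of T steps.

    queries, keys: arrays of shape (T, d_k); values: shape (T, d_v).
    weights: hypothetical per-step scalars of shape (T,), default all ones.
    Each step costs O(d_k * d_v), so the total cost is linear in T.
    """
    T, d_k = keys.shape
    d_v = values.shape[1]
    if weights is None:
        weights = np.ones(T)

    state = np.zeros((d_k, d_v))  # running sum of w_t * phi(k_t) v_t^T
    norm = np.zeros(d_k)          # running sum of w_t * phi(k_t)
    out = np.zeros((T, d_v))

    for t in range(T):
        phi_k = feature_map(keys[t])
        state += weights[t] * np.outer(phi_k, values[t])
        norm += weights[t] * phi_k

        phi_q = feature_map(queries[t])
        # Normalised read-out using only steps 0..t (no future frames).
        out[t] = phi_q @ state / (phi_q @ norm + 1e-9)
    return out
```

Because the history is summarised in `state` and `norm`, the per-frame cost does not grow with the length of the video, which is the property that makes this family of attention attractive for online detection.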