Streamer Temporal Action Detection in Live Video by Co-Attention Boundary Matching

Li Chenhao, He Chen, Zhang Hui, Yao Jiacheng, Zhang Jing, Zhuo Li
DOI: https://doi.org/10.1007/s13042-022-01581-z
2022-01-01
International Journal of Machine Learning and Cybernetics
Abstract: With the advent of the we-media era, live video has attracted a growing number of web users. Effectively identifying and supervising streamer activities in live video is of great significance for promoting the high-quality development of the live video industry. A streamer activity can be characterized as a temporal composition of a series of actions. To improve the accuracy of streamer temporal action detection, a promising path is to combine temporal action localization with a co-attention mechanism to overcome the problem of blurred action boundaries. We therefore propose a streamer temporal action detection method for live video based on co-attention boundary matching. (1) Global spatiotemporal features and action template features are extracted from live video by a two-stream convolutional network and an action spatiotemporal attention network, respectively. (2) Probability sequences are generated from the global spatiotemporal features through temporal action evaluation, and boundary matching confidence maps are produced by evaluating the confidence of the global spatiotemporal features and the action template features under the co-attention mechanism. (3) Streamer temporal actions are detected from the action proposals generated from the probability sequences and the boundary matching confidence maps. We build a real-world streamer action dataset, BJUT-SAD, and conduct extensive experiments verifying that our method improves the accuracy of streamer temporal action detection in live video. In particular, our results on both temporal action proposal generation and streamer action detection are competitive with those of prior methods, demonstrating the effectiveness of our method.
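The abstract does not say exactly how the probability sequences and boundary matching confidence maps are fused in step (3). As a rough illustration only, the sketch below follows the general recipe of the Boundary-Matching Network (BMN). It picks candidate start and end snippets from start/end probability sequences and pairs them. It then scores each pair by multiplying its two boundary probabilities with the confidence-map entry for that (duration, start) cell, and finally suppresses overlapping proposals. Everything here is an assumption, not the authors' implementation:

- The function names and the `conf_map[d][s]` layout are hypothetical.
- A single confidence map stands in for the output of the paper's co-attention module.
- The candidate rule (local peak, or above half the maximum) is BMN-style.
- Greedy hard NMS replaces the Soft-NMS that BMN uses.

Feature extraction and co-attention are not modeled.

```python
from typing import List, Sequence, Tuple

# (start_snippet, end_snippet, score); hypothetical proposal representation
Proposal = Tuple[int, int, float]


def boundary_candidates(prob: Sequence[float], rel_thresh: float = 0.5) -> List[int]:
    """Snippets that exceed rel_thresh * max(prob) or are local peaks (BMN-style rule)."""
    if not prob:
        return []
    peak = max(prob)
    picked = []
    for t, p in enumerate(prob):
        left = prob[t - 1] if t > 0 else float("-inf")
        right = prob[t + 1] if t + 1 < len(prob) else float("-inf")
        if p > rel_thresh * peak or (p > left and p > right):
            picked.append(t)
    return picked


def generate_proposals(
    start_prob: Sequence[float],
    end_prob: Sequence[float],
    conf_map: Sequence[Sequence[float]],
) -> List[Proposal]:
    """Pair start/end candidates and score them with a boundary matching map.

    Assumption: conf_map[d][s] is the confidence of the proposal that starts
    at snippet s and ends at snippet s + d (a D x T duration-by-start map).
    """
    max_dur = len(conf_map)
    proposals = []
    for s in boundary_candidates(start_prob):
        for e in boundary_candidates(end_prob):
            d = e - s
            if 0 < d < max_dur:  # keep positive durations covered by the map
                score = start_prob[s] * end_prob[e] * conf_map[d][s]
                proposals.append((s, e, score))
    # Highest-scoring proposals first
    return sorted(proposals, key=lambda prop: prop[2], reverse=True)


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection over union of two [start, end] intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals: List[Proposal], iou_thresh: float = 0.5, top_k: int = 10) -> List[Proposal]:
    """Greedy hard NMS over proposals sorted by descending score."""
    kept: List[Proposal] = []
    for prop in proposals:
        if all(temporal_iou(prop, other) < iou_thresh for other in kept):
            kept.append(prop)
            if len(kept) == top_k:
                break
    return kept


if __name__ == "__main__":
    # Toy 8-snippet video; all numbers are made up for illustration
    start_prob = [0.1, 0.9, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1]
    end_prob = [0.1, 0.1, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1]
    conf_map = [[0.5] * 8 for _ in range(6)]  # D = 6 durations, T = 8 starts
    conf_map[4][1] = 0.9  # high confidence for the segment [1, 5]
    print(nms(generate_proposals(start_prob, end_prob, conf_map)))
```

In the toy demo, snippets 1 and 2 are start candidates and snippet 5 is the only end candidate. The proposal (1, 5) scores 0.9 × 0.8 × 0.9 ≈ 0.648. The proposal (2, 5) scores 0.6 × 0.8 × 0.5 = 0.24 and is suppressed, because its temporal IoU with (1, 5) is 0.75. Under this multiplicative scoring, a proposal ranks high only when both of its boundaries and its map confidence are high.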