DeMsASa: micro-video scene classification based on denoising multi-shots association self-attention

Rui Gong, Yu Zhang, Yanhui Zhang, Yue Liu, Jie Guo, Xiushan Nie
DOI: https://doi.org/10.1007/s10044-024-01378-6
IF: 2.307
2024-12-01
Pattern Analysis and Applications
Abstract: Micro-videos are often segmented and spliced when users upload them to a platform, so the content of different shots in the same scene is discontinuous and can differ greatly from shot to shot. In addition, low-resolution shooting equipment, jitter, and other factors introduce noise into the video. Under these conditions, conventional serialized scene feature learning for micro-videos cannot capture the content differences and correlations between shots, which weakens the semantic representation of scene features. This paper therefore proposes a micro-video scene classification method based on a De-noising Multi-shots Association Self-attention (DeMsASa) model. The method first segments the micro-video with a shot boundary detection algorithm, and then learns the semantic representation of the multi-shot video scene through de-noising, association modeling between video frames within the same shot, and association modeling between different shots. Experimental results show that the proposed method outperforms existing micro-video scene classification methods.
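The pipeline outlined in the abstract (segment into shots, associate frames within each shot, then associate the shots with each other) can be illustrated with a minimal NumPy sketch. This is not the authors' DeMsASa model. It assumes precomputed per-frame feature vectors, uses a simple cosine-distance threshold as a stand-in for the shot boundary detector, and uses parameter-free scaled dot-product self-attention with mean pooling in place of learned layers. The de-noising step is omitted because the abstract does not say how it is done. All function names and the `threshold` parameter are hypothetical.

```python
import numpy as np


def detect_shot_boundaries(frame_feats, threshold=0.5):
    """Return the start index of each shot (assumed stand-in detector).

    A new shot is assumed to begin wherever the cosine distance between
    consecutive frame features exceeds `threshold`.
    """
    norms = np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8
    normed = frame_feats / norms
    sim = np.sum(normed[:-1] * normed[1:], axis=1)  # consecutive-frame similarity
    cuts = np.where(1.0 - sim > threshold)[0] + 1
    return [0] + cuts.tolist()


def self_attention(x):
    """Parameter-free scaled dot-product self-attention over the rows of x."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


def scene_representation(frame_feats, threshold=0.5):
    """Pool frames into shot vectors, then pool shots into one scene vector."""
    starts = detect_shot_boundaries(frame_feats, threshold)
    ends = starts[1:] + [len(frame_feats)]
    shot_vecs = []
    for s, e in zip(starts, ends):
        attended = self_attention(frame_feats[s:e])  # intra-shot association
        shot_vecs.append(attended.mean(axis=0))
    shots = np.stack(shot_vecs)
    scene = self_attention(shots).mean(axis=0)  # inter-shot association
    return scene, starts


# Toy input: 4 frames of one shot followed by 3 frames of a very different shot.
feats = np.array([[1.0, 0, 0, 0]] * 4 + [[0, 1.0, 0, 0]] * 3)
scene_vec, shot_starts = scene_representation(feats)
print(shot_starts, scene_vec.shape)
```

In a real system the scene vector would feed a trained classifier, and the attention layers would carry learned query, key, and value projections; the sketch only shows how the two levels of association fit together.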
computer science, artificial intelligence