Video Summarization Generation Model Based on Transformer and Deep Reinforcement Learning

Guangli Wu, Leiting Li, Shanshan Song
DOI: https://doi.org/10.1109/ICCCS57501.2023.10150725
2023-04-21
Abstract: Video summarization extracts the key frames or clips of an original video to generate a summary. Most existing methods improve only on image features, ignoring the temporal order of the frames, and their models lack the ability to learn autonomously. We propose a video summarization network based on Transformer and deep reinforcement learning. The network takes the Transformer encoder-decoder as its main structure. The encoder consists of the Transformer's self-attention and feed-forward neural network modules, while BiLSTM and reinforcement learning replace the Transformer decoder. Experiments on two public standard video summarization datasets demonstrate the effectiveness of the proposed method.
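The self-attention module named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, uses the raw frame features directly as queries, keys, and values (a trained encoder would learn separate projections), and runs on a hypothetical toy input of four 3-dimensional frame features. The BiLSTM and reinforcement learning stages are not shown, since the abstract does not give their details.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention over frame features.

    Assumption: queries, keys, and values are the raw features themselves
    (identity projections), which keeps the sketch free of learned weights.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    weights = []
    for q in frames:
        # Similarity of this frame to every frame in the video.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in frames]
        weights.append(softmax(scores))
    # Each output is a weighted mix of all frame features.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, frames)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy input: 4 frames, 3-dimensional features each.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
context, attn = self_attention(frames)
```

Each row of `attn` sums to 1 and describes how strongly one frame attends to every other frame, which is how the encoder can relate frames across the whole sequence rather than treating each image in isolation.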
Computer Science