Describing video scenarios using deep learning techniques

Yin‐Fu Huang, Li‐Ping Shih, Chia‐Hsin Tsai, Guan‐Ting Shen
DOI: https://doi.org/10.1002/int.22387
IF: 8.993
2021-02-25
International Journal of Intelligent Systems
Abstract: The combination of computer vision and natural language processing remains a challenging problem. In contrast to previous models that generate only a single sentence for a video, we consider describing longer videos an important application. In this paper, we propose a video scenario description system that considers video genres to generate multiple sentences. First, the semantics and genres of videos are analyzed. Next, the video descriptions are analyzed. Then, relevant semantic features are selected and translated into the corresponding video descriptions through deep learning. In the experiments, we compare the generated video descriptions using four evaluation metrics. The results show that our method is comparable to state‐of‐the‐art methods.
computer science, artificial intelligence