MSEVA: A System for Multimodal Short Videos Emotion Visual Analysis

Qinglan Wei, Yaqi Zhou, Longhui Xiao, Yuan Zhang
2024-03-09
Abstract: YouTube Shorts, a new section launched by YouTube in 2021, is a direct competitor to short video platforms like TikTok. It reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different perspectives and emotions on hot events. These videos can go viral and have a significant impact on the public's mood and views. However, affective computing for short videos has been a neglected area of research. Monitoring the public's emotions through these videos manually requires a lot of time and effort, and may still be too slow to prevent undesirable outcomes. In this paper, we create the first multimodal dataset of short video news covering hot events. We also propose an automatic technique for audio segmenting and transcribing. In addition, we improve the accuracy of the multimodal affective computing model by about 4.17% by optimizing it. Moreover, a novel system, MSEVA, for emotion analysis of short videos is proposed. Achieving good results on the bili-news dataset, the MSEVA system applies the multimodal emotion analysis method in the real world. It is helpful for conducting timely public opinion guidance and stopping the spread of negative emotions. Data and code from our investigations can be accessed at: http://xxx.github.com.
Social and Information Networks
What problem does this paper attempt to address?
The paper mainly addresses sentiment analysis of short videos about popular events on short video platforms. Specifically, the research aims to solve the following key problems:

1. **Constructing a Multimodal Short Video Dataset**: The paper proposes a new multimodal short video dataset (named bili-news) in which each short video carries an overall sentiment annotation. To improve the efficiency of dataset construction, the authors propose an automatic speech segmentation and transcription method and perform sentiment annotation on the entire short video.
2. **Optimizing the Multimodal Sentiment Analysis Model**: Starting from the existing V2EM multimodal sentiment analysis model, the authors improve its accuracy by approximately 4.17% by optimizing the text modality. The experiment compares small and large language models in the text modality and finds that the small language model, after training, outperforms the large language model.
3. **Developing the MSEVA System**: The paper designs a new short video sentiment analysis system named MSEVA. This system performs sentiment analysis of short videos, helping to guide public opinion in real time and prevent the spread of negative emotions.

Through this work, the research team constructed a dataset, proposed a multimodal sentiment analysis method, and built a practical sentiment analysis system. These results are expected to be useful in practical applications.
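The summary does not say how the automatic speech segmentation works. As a purely illustrative sketch, the snippet below assumes a simple energy-threshold silence detector, which is a common baseline and is not confirmed to be the authors' method. The function name `segment_by_silence` and all of its parameters (`frame_ms`, `energy_threshold`, `min_silence_frames`) are hypothetical. It splits a mono audio signal into speech-like spans that could then be passed to a separate transcription step.

```python
from typing import List, Tuple


def segment_by_silence(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    min_silence_frames: int = 5,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frames exceed an energy threshold.

    Hypothetical helper: assumes mono samples scaled to [-1.0, 1.0].
    A segment is closed once `min_silence_frames` consecutive quiet frames occur.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start = None      # frame index where the current segment began
    silent_run = 0    # consecutive quiet frames seen inside the current segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy >= energy_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # first quiet frame is the exclusive end
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start = None
                silent_run = 0

    # Close a segment that is still open when the audio ends.
    if start is not None:
        end = n_frames - silent_run
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments


# Synthetic signal at 1 kHz: 0.2 s quiet, 0.4 s loud, 0.2 s quiet, 0.2 s loud.
demo_signal = [0.0] * 200 + [0.5] * 400 + [0.0] * 200 + [0.5] * 200
demo_segments = segment_by_silence(demo_signal, sample_rate=1000)
```

In practice the threshold and minimum silence length would need tuning per recording, and news audio with background music may call for a trained voice activity detector rather than a fixed energy cutoff.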