MSEVA : A System for Multimodal Short Videos Emotion Visual Analysis

Qinglan Wei, Yaqi Zhou, Longhui Xiao, Yuan Zhang

2024-03-09

Abstract: YouTube Shorts, a section launched by YouTube in 2021, competes directly with short video platforms such as TikTok and reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different perspectives and emotions on hot events. These videos can go viral and significantly affect the public's mood and views. However, affective computing for short videos has been a neglected area of research. Monitoring the public's emotions through these videos requires a lot of time and effort, which may still not be enough to prevent undesirable outcomes. In this paper, we create the first multimodal dataset of short video news covering hot events. We also propose an automatic technique for segmenting and transcribing audio. In addition, we optimize the multimodal affective computing model, improving its accuracy by about 4.17%. Moreover, we propose MSEVA, a novel system for emotion analysis of short videos. Achieving good results on the bili-news dataset, the MSEVA system applies multimodal emotion analysis in the real world. It helps to guide public opinion in a timely manner and to stop the spread of negative emotions. Data and code from our investigations can be accessed at: http://xxx.github.com.

Social and Information Networks

What problem does this paper attempt to address?

The paper addresses sentiment analysis of short videos about popular events on short video platforms. Specifically, the research aims to solve the following key problems:

1. **Constructing a Multimodal Short Video Dataset**: The paper proposes a new multimodal short video dataset (named bili-news) in which each short video is annotated with an overall sentiment. To make dataset construction more efficient, the authors propose an automatic speech segmentation and transcription method and annotate sentiment at the level of the entire short video.
2. **Optimizing the Multimodal Sentiment Analysis Model**: Starting from the existing V2EM multimodal sentiment analysis model, the authors improve accuracy by approximately 4.17% by optimizing the text modality. The experiments compare small and large language models in the text modality and find that the small language model, after training, outperforms the large language model.
3. **Developing the MSEVA System**: The paper designs a new short video sentiment analysis system named MSEVA. The system performs sentiment analysis of short videos, helping to guide public opinion in real time and prevent the spread of negative emotions.

Through this work, the research team constructed a dataset, proposed a multimodal sentiment analysis method, and built a practical sentiment analysis system. These results are expected to be useful in practical applications.
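The summary above does not say how the automatic speech segmentation works. As a rough illustration only, the sketch below shows one common way to split an audio track before transcription: energy-based silence detection. The function name `segment_by_silence`, its parameters, and the thresholds are all hypothetical and are not taken from the paper; the transcription step (passing each span to a speech recognizer) is left out.

```python
from typing import List, Sequence, Tuple


def segment_by_silence(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    min_silence_ms: int = 200,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame energy exceeds a threshold.

    Illustrative assumption: a span ends once at least `min_silence_ms` of
    consecutive low-energy frames has been seen. Not the paper's method.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_silence_frames = max(1, min_silence_ms // frame_ms)

    # Mark each frame as voiced if its mean-square energy reaches the threshold.
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        voiced.append(energy >= energy_threshold)

    segments: List[Tuple[float, float]] = []
    start = None       # index of the first frame of the open span, if any
    silence_run = 0    # consecutive silent frames seen inside the open span
    for idx, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = idx
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                end = idx - silence_run + 1  # first silent frame closes the span
                segments.append(
                    (start * frame_len / sample_rate, end * frame_len / sample_rate)
                )
                start = None
                silence_run = 0

    # Close a span that is still open when the audio ends.
    if start is not None:
        end_sample = min((len(voiced) - silence_run) * frame_len, len(samples))
        segments.append((start * frame_len / sample_rate, end_sample / sample_rate))
    return segments


if __name__ == "__main__":
    # Synthetic signal at 1 kHz: 0.2 s of sound, 0.4 s of silence, 0.2 s of sound.
    demo = [0.5] * 200 + [0.0] * 400 + [0.5] * 200
    print(segment_by_silence(demo, sample_rate=1000))
```

In a pipeline like the one described, each returned span would be cut from the audio track and sent to a transcription model; the threshold and minimum silence length would need tuning on real recordings.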

Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline

Exploiting EEG signals and audiovisual feature fusion for video emotion recognition

Multi-modal emotion analysis from facial expressions and electroencephalogram.

FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference

MVIndEmo: a dataset for micro video public-induced emotion prediction on social media

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos

Video emotion analysis enhanced by recognizing emotion in video comments

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model

EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild

Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention

Emotional Video Captioning With Vision-Based Emotion Interpretation Network

StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models

Affective Video Content Analysis: Decade Review and New Perspectives

Understanding public opinions on Chinese short video platform by multimodal sentiment analysis using deep learning-based techniques

MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation