Abstract: Previous Sentiment Analysis (SA) studies have demonstrated that exploring sentiment cues from multiple synchronized modalities can effectively improve SA results. Unfortunately, there is currently no publicly available dataset for multimodal SA of the stock market. Existing datasets for stock market SA provide only textual stock comments, which often contain words with ambiguous sentiment or even sarcastic words whose intended sentiment is the opposite of their literal meaning. To address this issue, we introduce FMSA-SC, a Fine-grained Multimodal Sentiment Analysis dataset built upon 1,247 Stock Comment videos. It provides both multimodal sentiment annotations for the videos and unimodal sentiment annotations for the textual, visual, and acoustic modalities of the videos. In addition, FMSA-SC provides fine-grained annotations that align text at the phrase level with the visual and acoustic modalities. Furthermore, we present a new fine-grained multimodal multi-task framework as the baseline for multimodal SA on FMSA-SC. Data and code are available at https://github.com/sunlitsong/FMSA-SC-dataset.git.

What problem does this paper attempt to address?

This paper addresses fine-grained multimodal sentiment analysis of stock comment videos. Most existing sentiment analysis methods for the stock market mine sentiment polarity from text comments alone and ignore sentiment cues in the visual and acoustic modalities, which can lead to incorrect results. In addition, no multimodal sentiment analysis dataset for the stock market is publicly available. To address both problems, the authors constructed a new dataset, FMSA-SC, built on 1,247 stock comment videos. It provides a multimodal sentiment annotation for each video, independent sentiment annotations for the text, visual, and acoustic modalities, and phrase-level alignment annotations. The authors also propose a new fine-grained multimodal multi-task framework (FGMSA) as a baseline model for sentiment analysis in the stock market.

### Main contributions of the paper:

1. **Constructed the FMSA-SC dataset**: This is the first multimodal sentiment analysis dataset for the stock market. It provides unified multimodal sentiment annotations, independent sentiment annotations for the text, visual, and acoustic modalities, and phrase-level alignment annotations.
2. **Proposed the FGMSA framework**: This is a fine-grained multimodal multi-task framework that combines unimodal and multimodal sentiment analysis tasks and improves accuracy by learning importance weights for the different modalities on each phrase.
3. **Verified the effectiveness of the method**: Extensive experiments and ablation studies demonstrate the performance of FGMSA on multimodal sentiment analysis of the stock market.

### Characteristics of the dataset:

- **Multimodal annotation**: Each video segment has a unified multimodal sentiment label, as well as independent sentiment labels for the text, visual, and acoustic modalities.
- **Fine-grained alignment**: Phrase-level alignment annotations support the study of sentiment cues across modalities.
- **High-quality data**: The data was annotated through a professional data annotation platform to ensure quality and consistency.

### Innovation points of the method:

- **Fine-grained multimodal fusion**: The FGMSA framework fuses sentiment cues from the different modalities at the phrase level and controls how much each modality influences the prediction by learning an importance weight for each modality.
- **Multi-task learning**: The framework combines unimodal and multimodal sentiment analysis tasks and learns richer feature representations through joint optimization.

### Experimental results:

- **Sentiment distribution**: The paper reports the sentiment distribution of each modality and finds that the text and acoustic modalities contain more valuable sentiment cues than the visual modality.
- **Sentiment confusion matrix**: The paper quantifies the differences between the sentiment labels of the different modalities, which supports the case for multimodal fusion.
- **Feature extraction**: Text features are extracted with a pre-trained BERT model, acoustic features with the Wav2vec 2.0 model, and visual features with the OpenFace toolkit.

In conclusion, this paper addresses the challenges of multimodal sentiment analysis of stock comment videos by constructing the FMSA-SC dataset and proposing the FGMSA framework, providing resources and methods for research in related fields.
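
The phrase-level fusion and multi-task training summarized above can be illustrated with a small sketch. This is not the authors' FGMSA implementation: the function names (`fuse_phrase`, `multitask_loss`), the softmax weighting, and the fixed scores are all assumptions chosen for illustration. In a trained model the per-modality scores would come from a learned network, and the feature vectors would come from extractors such as BERT, Wav2vec 2.0, and OpenFace.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_phrase(features, scores):
    """Fuse the modality vectors of ONE phrase into a single vector.

    features: dict mapping modality name -> feature vector (equal lengths)
    scores:   dict mapping modality name -> raw importance score
              (hypothetical; a real model would learn these per phrase)
    Returns the fused vector and the normalized weight of each modality.
    """
    modalities = sorted(features)
    weights = softmax([scores[m] for m in modalities])
    dim = len(features[modalities[0]])
    fused = [0.0] * dim
    for weight, modality in zip(weights, modalities):
        for i, value in enumerate(features[modality]):
            fused[i] += weight * value
    return fused, dict(zip(modalities, weights))


def multitask_loss(losses, task_weights):
    """Weighted sum of the multimodal loss and the unimodal losses."""
    return sum(task_weights[task] * loss for task, loss in losses.items())


if __name__ == "__main__":
    # Toy 2-dimensional features for a single phrase (made-up numbers).
    phrase = {
        "text": [1.0, 0.0],
        "acoustic": [0.0, 1.0],
        "visual": [0.0, 0.0],
    }
    # A higher score means the modality matters more for this phrase.
    raw_scores = {"text": 2.0, "acoustic": 1.0, "visual": 0.0}
    fused, weights = fuse_phrase(phrase, raw_scores)
    print("modality weights:", weights)
    print("fused phrase vector:", fused)

    # Joint objective over one multimodal and three unimodal tasks.
    losses = {"multimodal": 0.8, "text": 0.5, "acoustic": 0.6, "visual": 0.9}
    task_weights = {"multimodal": 1.0, "text": 0.5, "acoustic": 0.5, "visual": 0.5}
    print("joint loss:", multitask_loss(losses, task_weights))
```

Normalizing the scores with a softmax keeps every weight positive and makes the weights for a phrase sum to 1, so a phrase whose text is ambiguous or sarcastic can lean more heavily on the acoustic or visual cues. Other weighting schemes (for example, gating or attention) would fit the same description.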

FMSA-SC: A Fine-grained Multimodal Sentiment Analysis Dataset based on Stock Comment Videos

CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-Grained Annotations of Modality

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Towards Exploiting Sticker for Multimodal Sentiment Analysis in Social Media: A New Dataset and Baseline

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

A Fine-Grained Modal Label-Based Multi-Stage Network for Multimodal Sentiment Analysis

Sentiment Analysis: Comprehensive Reviews, Recent Advances, and Open Challenges

MMLSCU: A Dataset for Multi-modal Multi-domain Live Streaming Comment Understanding

M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

Multimodal Mutual Attention-Based Sentiment Analysis Framework Adapted to Complicated Contexts

M$^{3}$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning

Towards Robust Multimodal Sentiment Analysis with Incomplete Data

MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style

MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

Multimodal Sentiment Analysis with Preferential Fusion and Distance-aware Contrastive Learning

Multimodal Sentiment Analysis of Intangible Cultural Heritage Songs with Strengthened Audio Features-Guided Attention

Cooperative Sentiment Agents for Multimodal Sentiment Analysis

Sentiment Analysis of Social Media Comments Based on Multimodal Attention Fusion Network

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos