FMSA-SC: A Fine-grained Multimodal Sentiment Analysis Dataset based on Stock Comment Videos

Lingyun Song, Siyu Chen, Ziyang Meng, Mingxuan Sun, Xuequn Shang
DOI: https://doi.org/10.1109/tmm.2024.3363641
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Previous Sentiment Analysis (SA) studies have demonstrated that exploring sentiment cues from multiple synchronized modalities can effectively improve the SA results. Unfortunately, until now there is no publicly available dataset for multimodal SA of the stock market. Existing datasets for stock market SA only provide textual stock comments, which usually contain words with ambiguous sentiments or even sarcasm words expressing opposite sentiments of literal meaning. To address this issue, we introduce a Fine-grained Multimodal Sentiment Analysis dataset built upon 1,247 Stock Comment videos, called FMSA-SC. It provides both multimodal sentiment annotations for the videos and unimodal sentiment annotations for the textual, visual, and acoustic modalities of the videos. In addition, FMSA-SC also provides fine-grained annotations that align text at the phrase level with visual and acoustic modalities. Furthermore, we present a new fine-grained multimodal multi-task framework as the baseline for multimodal SA on the FMSA-SC. Data and codes are available at https://github.com/sunlitsong/FMSA-SC-dataset.git.
computer science, information systems, telecommunications, software engineering
What problem does this paper attempt to address?
This paper addresses fine-grained multimodal sentiment analysis of stock comment videos. Most existing sentiment analysis methods for the stock market mine sentiment polarity only from textual comments and ignore sentiment cues in the visual and acoustic modalities, which can lead to incorrect results. In addition, no multimodal sentiment analysis dataset for the stock market has been publicly available. To address these problems, the authors constructed a new dataset, FMSA-SC, built on 1,247 stock comment videos. It provides multimodal sentiment annotations for each video, independent sentiment annotations for the textual, visual, and acoustic modalities, and phrase-level alignment annotations. The authors also propose a fine-grained multimodal multi-task framework (FGMSA) as a baseline model for sentiment analysis in the stock market.

### Main contributions of the paper:

1. **Constructed the FMSA-SC dataset**: This is the first multimodal sentiment analysis dataset for the stock market. It provides unified multimodal sentiment annotations, independent sentiment annotations for the textual, visual, and acoustic modalities, and phrase-level alignment annotations.
2. **Proposed the FGMSA framework**: This fine-grained multimodal multi-task framework combines unimodal and multimodal sentiment analysis tasks and improves accuracy by learning importance weights for the different modalities on each phrase.
3. **Verified the effectiveness of the method**: Extensive experiments and ablation studies show that FGMSA performs better than the compared methods on multimodal sentiment analysis for the stock market.
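The phrase-level modality weighting described in contribution 2 can be illustrated with a small sketch. The summary does not specify how FGMSA computes its importance weights, so everything here is an assumption for illustration: the function names `softmax` and `fuse_phrase`, the use of one scalar relevance score per modality, and the softmax-weighted sum are hypothetical stand-ins, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_phrase(features, scores):
    """Fuse one phrase's modality features with softmax importance weights.

    features: dict mapping modality name -> feature vector (equal lengths).
    scores:   dict mapping modality name -> scalar relevance score
              (hypothetical; in a trained model these would be learned).
    Returns (fused_vector, weights_by_modality).
    """
    modalities = sorted(features)
    dim = len(features[modalities[0]])
    if any(len(features[m]) != dim for m in modalities):
        raise ValueError("all modality feature vectors must share one length")

    weights = softmax([scores[m] for m in modalities])
    fused = [0.0] * dim
    for weight, modality in zip(weights, modalities):
        for i, value in enumerate(features[modality]):
            fused[i] += weight * value
    return fused, dict(zip(modalities, weights))


# Toy example: one phrase, 2-dimensional features per modality.
phrase_features = {
    "text": [1.0, 0.0],
    "visual": [0.0, 1.0],
    "acoustic": [1.0, 1.0],
}
phrase_scores = {"text": 2.0, "visual": 0.0, "acoustic": 1.0}
fused_vector, modality_weights = fuse_phrase(phrase_features, phrase_scores)
```

Because the weights are computed per phrase, a modality that carries a strong cue for one phrase (for example, tone of voice on a sarcastic remark) can dominate there without dominating the whole video.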
### Characteristics of the dataset:

- **Multimodal annotation**: Each video segment has a unified multimodal sentiment label, as well as independent sentiment labels for the textual, visual, and acoustic modalities.
- **Fine-grained alignment**: Phrase-level alignment annotations link text to the visual and acoustic modalities, which helps in studying how sentiment cues relate across modalities.
- **High-quality data**: The data was annotated through a professional data annotation platform to ensure quality and consistency.

### Innovation points of the method:

- **Fine-grained multimodal fusion**: The FGMSA framework fuses sentiment cues from different modalities at the phrase level and controls each modality's influence on the prediction by learning its importance weight.
- **Multi-task learning**: The framework combines unimodal and multimodal sentiment analysis tasks and learns richer feature representations through joint optimization.

### Experimental results:

- **Sentiment distribution**: The sentiment distributions of the different modalities show that the textual and acoustic modalities contain more valuable sentiment cues than the visual modality.
- **Sentiment confusion matrix**: The differences between the sentiment labels of different modalities are measured, supporting the case for multimodal fusion.
- **Feature extraction**: Text features are extracted with a pre-trained BERT model, acoustic features with Wav2vec 2.0, and visual features with the OpenFace toolkit.

In conclusion, by constructing the FMSA-SC dataset and proposing the FGMSA framework, this paper addresses the challenges of multimodal sentiment analysis in stock comment videos and provides resources and methods for research in related fields.
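The sentiment confusion matrix mentioned under the experimental results compares the labels assigned to different modalities of the same clips. A minimal sketch of that kind of comparison follows. The three-class label set (`neg`, `neu`, `pos`), the toy label lists, and the helper names `label_confusion` and `disagreement_rate` are assumptions for illustration; the summary does not state the dataset's label scheme or how the paper computes its matrix.

```python
from collections import Counter


def label_confusion(labels_a, labels_b, classes):
    """Count co-occurrences of labels from two modalities.

    Row index = label from modality A, column index = label from modality B,
    both ordered as in `classes`.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same clips")
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(a, b)] for b in classes] for a in classes]


def disagreement_rate(matrix):
    """Fraction of clips whose two modality labels differ (off-diagonal mass)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - agreed) / total


# Hypothetical per-clip labels for two modalities of the same four clips.
classes = ["neg", "neu", "pos"]
text_labels = ["pos", "neg", "neu", "pos"]
acoustic_labels = ["pos", "pos", "neu", "neg"]

matrix = label_confusion(text_labels, acoustic_labels, classes)
rate = disagreement_rate(matrix)
```

A high off-diagonal mass between two modalities suggests that each carries cues the other misses, which is the usual motivation for fusing them rather than relying on text alone.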