M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Huisheng Mao, Ziqi Yuan, Hua Xu, Wenmeng Yu, Yihe Liu, Kai Gao
DOI: https://doi.org/10.48550/arXiv.2203.12441
2022-03-23
Abstract: M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate advanced research by providing flexible toolkits, reliable benchmarks, and intuitive demonstrations. The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules. In this paper, we first illustrate the overall architecture of the M-SENA platform and then introduce features of the core modules. Reliable baseline results of different modality features and MSA benchmarks are also reported. Moreover, we use model evaluation and analysis tools provided by M-SENA to present intermediate representation visualization, on-the-fly instance test, and generalization ability test results. The source code of the platform is publicly available at https://github.com/thuiar/M-SENA.
Artificial Intelligence, Multimedia
What problem does this paper attempt to address?
The paper addresses three key challenges in Multimodal Sentiment Analysis (MSA):

1. **Effective acoustic and visual feature extraction**: Most previous studies rely on the modality sequences provided by CMU-MultimodalSDK. However, because the feature selection and backbone network choices are only vaguely described, the same acoustic and visual feature extraction process is difficult to replicate exactly. In addition, recent studies have found that the text modality dominates sentiment classification, while the acoustic and visual modalities contribute less. This further highlights the importance of extracting acoustic and visual features effectively.
2. **Reliable comparison of different modality features and fusion methods**: As researchers begin to develop models on custom-made modality sequences, performance comparisons between different modality features become unfair. A reliable way to compare modality features and fusion methods is therefore increasingly needed.
3. **Lack of comprehensive model evaluation and analysis methods**: Existing MSA models perform well on a given test set but may degrade in real-world scenarios because of distribution differences or random modality perturbations. Effective model analysis is also crucial for researchers to explain improvements and optimize models.

To address these challenges, the paper proposes the M-SENA platform, which provides a highly customizable feature extraction toolkit, a unified MSA pipeline that ensures fair comparison between different features and fusion models, and comprehensive model evaluation and analysis tools, including intermediate result visualization, real-time instance testing, and generalization ability testing. These functions help researchers better understand and optimize MSA models, thereby promoting further development in this field.
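The concern about random modality perturbations can be made concrete with a small robustness check. The following is a minimal sketch using NumPy, not M-SENA's own API: the helper names `perturb_modality`, `toy_predict`, and `mean_absolute_error`, the `(batch, seq_len, dim)` feature layout, and the choice of Gaussian noise plus frame dropping are all assumptions made for illustration.

```python
import numpy as np


def perturb_modality(features, drop_prob=0.0, noise_std=0.0, rng=None):
    """Return a perturbed copy of a (batch, seq_len, dim) feature array.

    Hypothetical helper: adds Gaussian noise, then zeroes out whole time
    steps with probability `drop_prob` to mimic missing frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = features.astype(np.float64, copy=True)
    if noise_std > 0:
        out += rng.normal(0.0, noise_std, size=out.shape)
    if drop_prob > 0:
        keep = rng.random(out.shape[:2]) >= drop_prob  # one flag per time step
        out *= keep[..., None]
    return out


def toy_predict(text, audio, vision):
    """Stand-in for a trained MSA model: sums the mean of each modality."""
    return (text.mean(axis=(1, 2))
            + audio.mean(axis=(1, 2))
            + vision.mean(axis=(1, 2)))


def mean_absolute_error(predict, text, audio, vision, labels):
    """Mean absolute error of `predict` on one batch of modality features."""
    preds = predict(text, audio, vision)
    return float(np.mean(np.abs(preds - labels)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(16, 20, 8))
    audio = rng.normal(size=(16, 20, 4))
    vision = rng.normal(size=(16, 20, 6))
    labels = toy_predict(text, audio, vision)  # synthetic labels, demo only

    clean = mean_absolute_error(toy_predict, text, audio, vision, labels)
    noisy_audio = perturb_modality(audio, drop_prob=0.3, noise_std=0.5, rng=rng)
    perturbed = mean_absolute_error(toy_predict, text, noisy_audio, vision, labels)
    print(f"clean MAE: {clean:.4f}, perturbed-audio MAE: {perturbed:.4f}")
```

Comparing the error on clean inputs with the error after perturbing one modality at a time gives a rough picture of how much a model depends on that modality and how it might behave when real-world inputs are noisy or incomplete.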