A Optimized BERT for Multimodal Sentiment Analysis

Jun Wu, Tianliang Zhu, Jiahui Zhu, Tianyi Li, Chunzhi Wang
DOI: https://doi.org/10.1145/3566126
2023-02-17
Abstract: Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more and more broad, it is necessary to optimize BERT internal structure. This article proposes a hierarchical multi-head self-attention and gate channel BERT, which is an optimized BERT model. The model is composed of three modules: the hierarchical multi-head self-attention module realizes the hierarchical extraction process of features; the gate channel module replaces BERT’s original Feed Forward layer to realize information filtering; and the tensor fusion model based on a self-attention mechanism is utilized to implement the fusion process of different modal features. Experiments show that our method achieves promising results and improves accuracy by 5–6% when compared with traditional models on the CMU-MOSI dataset.
computer science, information systems, theory & methods, software engineering
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address issues in multimodal sentiment analysis. Specifically, while sentiment analysis of unimodal data (such as text or images) has been extensively studied, sentiment analysis of multimodal data has received less attention. As research and application of multimodal data analysis become increasingly widespread, optimizing the BERT model to handle multimodal data becomes particularly important.

### Main Contributions

1. **Hierarchical Multi-Head Self-Attention Mechanism**: Achieves hierarchical extraction of data features by using a hierarchical multi-head self-attention mechanism.
2. **Gated Channels**: Replaces the original feedforward layer in the BERT model with gated channels to achieve information filtering.
3. **HG-BERT Model**: Proposes an optimized BERT model that combines hierarchical multi-head self-attention mechanisms and gated channels.
4. **Self-Attention-Based Feature Fusion**: Achieves information interaction between different modal features through a tensor fusion model based on the self-attention mechanism.

### Experimental Results

The experimental results show that the HG-BERT model outperforms traditional models on the CMU-MOSI dataset, with an accuracy improvement of 5–6%. This indicates that the optimized BERT model has better performance in multimodal sentiment analysis tasks.

### Conclusion

This paper proposes an optimized BERT-based model, HG-BERT, which significantly enhances the performance of multimodal sentiment analysis through hierarchical multi-head self-attention mechanisms, gated channels, and self-attention-based feature fusion methods. Future research can further optimize the design of head distribution and gating mechanisms.
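
To make the gated-channel idea more concrete, the following is a minimal sketch of how a gate can filter information in place of a feed-forward layer. The summary does not give the gate's exact form, so everything here is an assumption made for illustration: the sigmoid-gate-times-tanh-value formulation, the function names (`gated_channel`, `linear`, `random_matrix`), the hidden size, and the random weights. It should not be read as the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, weights, bias):
    """Compute x @ weights + bias; weights[i][j] maps input i to output j."""
    return [
        sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def gated_channel(x, w_gate, b_gate, w_value, b_value):
    """Hypothetical gate (an assumption, not the paper's formula).

    A sigmoid gate in (0, 1) scales a tanh candidate value elementwise,
    so a gate near 0 suppresses a feature and a gate near 1 passes it.
    """
    gate = [sigmoid(g) for g in linear(x, w_gate, b_gate)]
    value = [math.tanh(v) for v in linear(x, w_value, b_value)]
    return [g * v for g, v in zip(gate, value)]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden = 4  # illustrative size only; BERT-base uses 768
x = [0.2, -1.0, 0.5, 0.3]  # one token's hidden vector (made-up values)
w_gate = random_matrix(hidden, hidden, rng)
w_value = random_matrix(hidden, hidden, rng)
zeros = [0.0] * hidden

out = gated_channel(x, w_gate, zeros, w_value, zeros)

# A strongly negative gate bias closes the gate and filters everything out.
closed = gated_channel(x, w_gate, [-50.0] * hidden, w_value, zeros)
```

In this sketch the output keeps the input's dimensionality, so the unit could occupy the position of the feed-forward sublayer in a Transformer block; whether HG-BERT uses this particular gate is not stated in the summary above.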