PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis

Efthymios Georgiou, Yannis Avrithis, Alexandros Potamianos
2023-12-20
Abstract: Multimodal sentiment analysis (MSA) leverages heterogeneous data sources to interpret the complex nature of human sentiments. Despite significant progress in multimodal architecture design, the field lacks comprehensive regularization methods. This paper introduces PowMix, a versatile embedding space regularizer that builds upon the strengths of unimodal mixing-based regularization approaches and introduces novel algorithmic components that are specifically tailored to multimodal tasks. PowMix is integrated before the fusion stage of multimodal architectures and facilitates intra-modal mixing, such as mixing text with text, to act as a regularizer. PowMix consists of five components: 1) a varying number of generated mixed examples, 2) mixing factor reweighting, 3) anisotropic mixing, 4) dynamic mixing, and 5) cross-modal label mixing. Extensive experimentation across benchmark MSA datasets and a broad spectrum of diverse architectural designs demonstrates the efficacy of PowMix, as evidenced by consistent performance improvements over baselines and existing mixing methods. An in-depth ablation study highlights the critical contribution of each PowMix component and how they synergistically enhance performance. Furthermore, algorithmic analysis demonstrates how PowMix behaves in different scenarios, particularly comparing early versus late fusion architectures. Notably, PowMix enhances overall performance without sacrificing model robustness or magnifying text dominance. It also retains its strong performance in situations of limited data. Our findings position PowMix as a promising versatile regularization strategy for MSA. Code will be made available.
Computation and Language
What problem does this paper attempt to address?
The problem this paper attempts to solve is the lack of comprehensive regularization methods in Multimodal Sentiment Analysis (MSA). Despite significant progress in multimodal architecture design, existing regularization techniques are often limited to specific tasks or data types and cannot be widely applied across multimodal scenarios. The paper therefore proposes a new regularization method, PowMix, which aims to improve regularization in multimodal settings through five key components:

1. **Varying number of generated mixed samples**: PowMix can generate more mixed samples than the mini-batch size. These samples lie within the convex hull of the mini-batch in representation space. This increases the number of loss terms per sample during training and helps to better approximate the expected risk integral.
2. **Mixing factor reweighting**: By normalizing the mixing factors of each mini-batch sample across modalities, PowMix reduces the influence of uninformative (close to zero) unimodal instances in the representation space. This step is tailored to multimodal tasks and is robust to the architecture and pooling mechanism of the input modality encoders.
3. **Anisotropic mixing**: PowMix samples an independent mixing matrix for each modality. This modality-specific strategy lets the mixing differ from one modality to another, which is crucial for the good performance of PowMix.
4. **Dynamic mixing**: By randomly masking the mixing matrix so that only a small number of elements in each row remain non-zero, PowMix limits how many original samples are interpolated per generated sample. Each generated sample therefore interpolates only a few original samples, which enhances the model's generalization ability.
5. **Cross-modal label mixing**: PowMix generates the final multimodal label by averaging the mixed labels of each modality. This step is only meaningful when mixing factor reweighting and anisotropic mixing are used, because only then do the mixed labels differ across modalities.

Through these innovations, PowMix provides a widely applicable regularization framework that achieves consistent performance improvements across a variety of multimodal sentiment analysis datasets and model architectures. Experimental results show that PowMix improves overall model performance while maintaining model robustness, does not magnify text dominance, and retains good performance when data is limited.
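To make the five components concrete, here is a minimal pure-Python sketch of how they could fit together on pooled per-modality embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names `sample_mixing_matrix` and `powmix_sketch`, the uniform sampling of mixing weights, and the norm-based form of the reweighting are all hypothetical choices made here, since the summary does not specify the sampling distribution or the exact normalization.

```python
import math
import random


def sample_mixing_matrix(n_mixed, batch_size, k, rng):
    """Return an n_mixed x batch_size matrix whose rows each have k non-zero weights."""
    matrix = []
    for _ in range(n_mixed):
        # Dynamic mixing: a random mask keeps only k entries per row.
        keep = set(rng.sample(range(batch_size), k))
        # Assumption: weights are drawn uniformly; the paper may use another law.
        matrix.append([rng.uniform(0.01, 1.0) if j in keep else 0.0
                       for j in range(batch_size)])
    return matrix


def powmix_sketch(embeddings, labels, n_mixed, k, seed=0, eps=1e-8):
    """embeddings: {modality: list of B vectors}; labels: list of B floats."""
    rng = random.Random(seed)
    modalities = list(embeddings)
    batch_size = len(labels)

    # Per-modality L2 norm of every sample (used by the assumed reweighting).
    norms = {m: [math.sqrt(sum(x * x for x in vec)) for vec in embeddings[m]]
             for m in modalities}
    norm_totals = [sum(norms[m][j] for m in modalities) + eps * len(modalities)
                   for j in range(batch_size)]

    mixed_embeddings = {}
    label_sums = [0.0] * n_mixed  # n_mixed may exceed the batch size
    for m in modalities:
        # Anisotropic mixing: an independent mixing matrix for each modality.
        weights = sample_mixing_matrix(n_mixed, batch_size, k, rng)
        for row in weights:
            # Reweighting (assumed form): scale each weight by the sample's
            # share of embedding norm across modalities, so near-zero
            # embeddings contribute less.
            for j in range(batch_size):
                row[j] *= (norms[m][j] + eps) / norm_totals[j]
            total = sum(row)
            row[:] = [w / total for w in row]  # keep each row a convex combination

        dim = len(embeddings[m][0])
        mixed_embeddings[m] = [
            [sum(row[j] * embeddings[m][j][t] for j in range(batch_size))
             for t in range(dim)]
            for row in weights
        ]
        # Per-modality mixed labels, accumulated for the cross-modal average.
        for i, row in enumerate(weights):
            label_sums[i] += sum(row[j] * labels[j] for j in range(batch_size))

    # Cross-modal label mixing: average the per-modality mixed labels.
    mixed_labels = [s / len(modalities) for s in label_sums]
    return mixed_embeddings, mixed_labels
```

Calling `powmix_sketch(emb, labels, n_mixed=6, k=2)` on a batch of four samples returns six mixed embeddings per modality and six mixed labels. Because every row of every mixing matrix is a convex combination, each mixed label stays within the range of the original labels. In a real model the mixed embeddings would be passed to the fusion stage in place of (or alongside) the original ones.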