Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism

Yu-Ting Lan, Wei Liu, Bao-Liang Lu
DOI: https://doi.org/10.1109/ijcnn48605.2020.9207625
2020-01-01
Abstract: Because multimodal learning can exploit the complementarity of multimodal signals, multimodal emotion recognition usually outperforms recognition based on a single modality. In this paper, we introduce deep generalized canonical correlation analysis with an attention mechanism (DGCCA-AM) to multimodal emotion recognition. This model extends conventional canonical correlation analysis (CCA) from two modalities to an arbitrary number of modalities and implements adaptive multimodal fusion with an attention mechanism. By adjusting the weight matrices to maximize the generalized correlation of different modalities, DGCCA-AM extracts emotion-related information from multiple modalities and discards noise. The attention mechanism allows a neural network to learn adaptive fusion weights for different modalities, producing more effective multimodal fusion and superior emotion recognition performance. We evaluate DGCCA-AM on a public multimodal dataset, SEED-V. Our experimental results demonstrate that DGCCA-AM achieves a state-of-the-art mean accuracy of 82.11% with a standard deviation of 2.76% on five-class emotion classification using three modalities.
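The two ideas named in the abstract, a shared representation across more than two modalities and a weighted fusion of the per-modality projections, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard linear generalized CCA formulation (a shared representation G taken from the top eigenvectors of the summed per-view projection matrices) in place of deep networks, and it takes the attention scores as fixed inputs rather than learning them. The function names `linear_gcca` and `attention_fuse`, the ridge term `eps`, and the softmax form of the fusion weights are all assumptions made for illustration.

```python
import numpy as np

def linear_gcca(views, r, eps=1e-3):
    """Linear GCCA sketch (assumed formulation, not the paper's deep variant).

    views: list of arrays, each of shape (n_samples, d_j).
    Returns the shared representation G (n_samples, r), the per-view
    projection matrices U_j (d_j, r), and the mean-centered views.
    """
    centered = [X - X.mean(axis=0) for X in views]
    n = centered[0].shape[0]
    M = np.zeros((n, n))
    solved = []
    for X in centered:
        # Regularized covariance of this view; eps is a hypothetical ridge term.
        C = X.T @ X + eps * np.eye(X.shape[1])
        P = np.linalg.solve(C, X.T)   # C^{-1} X^T, shape (d_j, n)
        solved.append(P)
        M += X @ P                    # projection onto this view's column space
    M = (M + M.T) / 2                 # remove numerical asymmetry before eigh
    _, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    G = eigvecs[:, -r:][:, ::-1]      # top-r eigenvectors, orthonormal columns
    U = [P @ G for P in solved]       # least-squares map from each view to G
    return G, U, centered

def attention_fuse(projected, scores):
    """Fuse per-modality projections with softmax weights.

    scores: one scalar per modality. In a trained model these would be
    learned; here they are supplied by the caller (an assumption).
    """
    w = np.exp(scores - np.max(scores))
    w = w / w.sum()
    fused = sum(wi * Z for wi, Z in zip(w, projected))
    return fused, w
```

As a usage example, three random views with 5, 4, and 3 features can be passed to `linear_gcca(views, r=2)`, each centered view multiplied by its `U_j`, and the resulting projections passed to `attention_fuse` with one score per modality. Equal scores give equal fusion weights, and a larger score shifts weight toward that modality.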