Adaptive information fusion network for multi‐modal personality recognition

Yongtang Bao, Xiang Liu, Yue Qi, Ruijun Liu, Haojie Li
DOI: https://doi.org/10.1002/cav.2268
IF: 1.01
2024-06-11
Computer Animation and Virtual Worlds
Abstract: This paper proposes an adaptive multimodal information fusion network for personality recognition. The design features of each encoder are optimized and combined for downstream tasks. We greatly enhance the Transformer component by integrating adaptive attention and automatic learning of cross‐modal associations. This not only solves the problems of outliers and vanishing gradients during model training but is also useful in practical applications. Personality recognition is important for deepening the understanding of social relations. Although personality recognition methods have advanced significantly in recent years, heterogeneity between modalities during feature fusion remains an unsolved challenge. This paper introduces an adaptive multi‐modal information fusion network (AMIF‐Net) that processes video, audio, and text data concurrently. First, the AMIF‐Net encoder processes the extracted audio and video features separately, capturing long‐term dependencies in the data. Next, adaptive elements added to the fusion network alleviate the heterogeneity between modalities. Finally, the audio‐video and text features are concatenated and fed into a regression network to obtain Big Five personality trait scores. In addition, we introduce a novel loss function to address training inaccuracies; it exploits its characteristic peak at the critical mean. Experiments on the ChaLearn First Impressions V2 multi‐modal dataset show that the network surpasses state‐of‐the‐art networks on some metrics.
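The pipeline outlined in the abstract (separately encoded audio and video streams, an adaptive cross-modal fusion step, concatenation with text features, and a regression head that outputs Big Five scores) can be illustrated with a small pure-Python sketch. It is not the authors' AMIF-Net: the abstract gives no layer definitions, so the scaled dot-product cross-attention, the per-dimension sigmoid gate standing in for the "adaptive elements", the mean pooling, the feature sizes, the random weights, and the helper names (`cross_attention`, `gated_fusion`, `regress_big_five`) are all hypothetical choices made for illustration. The paper's novel loss function is not reproduced because its form is not given here.

```python
import math
import random

# Hypothetical sketch, NOT the published AMIF-Net: shapes, weights and the
# gating rule below are illustrative assumptions only.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_seq, key_seq):
    """Scaled dot-product attention: each query frame attends over key frames.
    Assumes both modalities were already projected to the same width d."""
    d = len(query_seq[0])
    scores = matmul(query_seq, transpose(key_seq))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, key_seq)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_pool(seq):
    """Average a (T x d) sequence over time into a length-d vector."""
    return [sum(col) / len(seq) for col in zip(*seq)]

def gated_fusion(audio, video, gate_w):
    """Blend attended audio with pooled video through an assumed learned gate."""
    pooled_a = mean_pool(cross_attention(audio, video))  # audio attends to video
    pooled_v = mean_pool(video)
    fused = []
    for g_raw, pa, pv in zip(gate_w, pooled_a, pooled_v):
        g = sigmoid(g_raw)  # adaptive weight in (0, 1) for this dimension
        fused.append(g * pa + (1.0 - g) * pv)
    return fused

def regress_big_five(features, head_w, head_b):
    """Linear head followed by a sigmoid: five trait scores in (0, 1)."""
    return [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(head_w, head_b)]

# Toy usage with random (hypothetical) features and weights.
random.seed(0)
D, T_AUDIO, T_VIDEO, D_TEXT = 4, 3, 5, 6
audio = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T_AUDIO)]
video = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T_VIDEO)]
text = [random.gauss(0, 1) for _ in range(D_TEXT)]
gate_w = [0.0] * D  # sigmoid(0) = 0.5: equal blend before any training
features = gated_fusion(audio, video, gate_w) + text  # audio-video ++ text
head_w = [[random.gauss(0, 0.1) for _ in range(len(features))] for _ in range(5)]
head_b = [0.0] * 5
scores = regress_big_five(features, head_w, head_b)
```

With the gate weights at zero, each gate equals 0.5 and the fused vector is an even blend of the attended audio and the pooled video. Training would push each gate toward whichever stream is more informative for that feature dimension, which is one simple way a fusion layer can adapt to heterogeneous modalities.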
computer science, software engineering