CANAMRF: An Attention-Based Model for Multimodal Depression Detection

Yuntao Wei, Yuzhe Zhang, Shuyang Zhang, Hong Zhang
DOI: https://doi.org/10.1007/978-981-99-7022-3_10
2024-01-04
Abstract: Multimodal depression detection is an important research topic that aims to predict human mental states using multimodal data. Previous methods treat different modalities equally and fuse each modality by naïve mathematical operations without measuring the relative importance between them, which cannot obtain well-performed multimodal representations for downstream depression tasks. In order to tackle the aforementioned concern, we present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection. CANAMRF is constructed by a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module. Through experimentation on two benchmark datasets, CANAMRF demonstrates state-of-the-art performance, underscoring the effectiveness of our proposed approach.
Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper targets a weakness in existing multimodal depression detection methods: they treat all modalities as equally important and fuse them through simple mathematical operations, without measuring the relative importance of each modality. As a result, the fused multimodal representations are of lower quality, which degrades depression detection performance. To address this, the authors propose an attention-based model, the Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF). Its main contributions are:
1. Introducing the emotional structure modality as a supplementary modality to improve multimodal depression detection.
2. Proposing a modality fusion method, Adaptive Multimodal Recurrent Fusion (AMRF), which dynamically adjusts the fusion weights of the different modalities to balance their contributions.
3. Constructing a hybrid attention module that combines cross-modal attention and self-attention to generate representative multimodal features.
According to the reported experiments on two benchmark datasets, CANAMRF outperforms existing state-of-the-art methods on both unimodal and multimodal depression detection tasks.
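To make the two ideas in the summary concrete (attention from one modality over another, and fusion with weights that depend on modality importance), here is a minimal sketch in plain Python. It is not the authors' AMRF or Hybrid Attention Module: it uses generic scaled dot-product attention and a softmax-weighted sum as stand-ins. The function names, the toy feature vectors, and the importance scores are all hypothetical; in a trained model the scores would presumably be learned rather than supplied by hand.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_modal_attention(queries, keys, values):
    """Generic scaled dot-product attention.

    Queries come from one modality; keys and values come from another.
    Passing the same sequence for all three arguments gives self-attention.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return attended

def weighted_fusion(features, scores):
    """Fuse one vector per modality using softmax-normalised importance scores.

    Equal scores reduce to a plain average, i.e. treating modalities equally.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return fused, weights

# Toy inputs (hypothetical): two text steps attend over three audio steps.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
text_given_audio = cross_modal_attention(text, audio, audio)

# Fuse one pooled vector per modality; the scores here are made up.
fused, weights = weighted_fusion([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

With equal scores the fusion step collapses to simple averaging, which is the "treat all modalities equally" baseline the summary criticises; unequal scores shift the fused vector toward the modality judged more important.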