MDDR: Multi-modal Dual-Attention Aggregation for Depression Recognition

Wei Zhang, En Zhu, Juan Chen, YunPeng Li
DOI: https://doi.org/10.1145/3664647.3681491
2024-01-01
Abstract: Automated diagnosis of depression is crucial for early detection and timely intervention. Previous research has largely concentrated on visual information, often neglecting the value of leveraging a variety of data types. Although some studies have attempted to employ multiple modalities, they typically fall short in investigating the complex dynamics between features from various modalities over time. To address this challenge, we present an innovative Multi-modal Dual-Attention aggregation architecture for Depression Recognition (MDDR). This framework leverages multi-modal pre-trained features and introduces two attention aggregation mechanisms: the Feature Alignment and Aggregation (FAA) module and the Sequence Encoding and Aggregation (SEA) module. The FAA module is designed to dynamically evaluate the relevance of multi-modal features for each instance, facilitating a dynamic integration of these features over time. Following this, the SEA module determines the importance of the amalgamated features for each frame, ensuring that aggregation is conducted based on their significance, to extract the most relevant features for accurately diagnosing depression. Moreover, we propose a unique loss calculation method specifically designed for depression assessment, named DR Loss. Our approach, evaluated on the AVEC2013 and AVEC2014 depression audiovisual datasets, achieves unparalleled performance.
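The two-stage aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how relevance is scored, so the dot product against a query vector, the assumption that all modalities are already projected to a common dimension, and every function name below are hypothetical. The first stage weights the modalities within each frame (in the spirit of FAA), and the second weights the fused frames over time (in the spirit of SEA). DR Loss is not sketched because the abstract gives no definition of it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors, weights):
    """Sum of equal-length vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def aggregate_modalities(frame_feats, modality_query):
    """FAA-like step (assumed form): attention over the modalities of one frame.

    frame_feats: one feature vector per modality, all of the same length.
    modality_query: hypothetical stand-in for learned scoring parameters.
    """
    weights = softmax([dot(f, modality_query) for f in frame_feats])
    return weighted_sum(frame_feats, weights), weights


def aggregate_frames(fused_seq, frame_query):
    """SEA-like step (assumed form): attention over the fused frames."""
    weights = softmax([dot(f, frame_query) for f in fused_seq])
    return weighted_sum(fused_seq, weights), weights


def dual_attention(sequence, modality_query, frame_query):
    """Fuse modalities per frame, then pool the frames into one vector."""
    fused = [aggregate_modalities(frame, modality_query)[0] for frame in sequence]
    return aggregate_frames(fused, frame_query)


# Toy input: 3 frames, each with 2 modalities (e.g. audio, visual) of dimension 2.
sequence = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[0.5, 0.5], [1.0, 1.0]],
]
pooled, frame_weights = dual_attention(sequence, [0.3, -0.1], [0.2, 0.4])
print(pooled, frame_weights)
```

In a trained model the two query vectors would be replaced by learned parameters, and the pooled vector would feed a regression head that predicts the depression score.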