LSCAformer: Long and short-term cross-attention-aware transformer for depression recognition from video sequences

Lang He, Zheng Li, Prayag Tiwari, Feng Zhu, Di Wu
DOI: https://doi.org/10.1016/j.bspc.2024.106767
IF: 5.1
2024-08-22
Biomedical Signal Processing and Control
Abstract: Depression is projected to become the most prevalent mental disorder by 2030, with a negative impact on individuals and society worldwide. Artificial intelligence (AI) algorithms have the potential to significantly advance depression treatment. Existing deep learning-based architectures for automatically diagnosing a patient's depression severity face two primary challenges: (1) How can both long-term and short-term patterns of depression be learned effectively? (2) How can long-term and short-term depressive features be merged efficiently to achieve extended predictions from facial videos? To mitigate these challenges, a novel long short-term cross-attention-aware Transformer (LSCAformer) is proposed for video-based depression recognition. LSCAformer introduces two components: a long short-term feature extraction (LSTFE) module and a cross-attention-aware Transformer. First, LSTFE employs two separate branches to capture depression behaviors over long-term and short-term intervals. Next, the cross-attention-aware Transformer identifies complementary patterns within the long-term and short-term features, employing temporal-directed attention (TDA) to discern complementary temporal patterns across the long-duration and short-duration branches. On the AVEC2013 and AVEC2014 datasets, LSCAformer demonstrated superior performance, with a root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC) of 7.69/5.89/0.868 and 7.55/5.91/0.845, respectively. Additionally, cross-dataset experiments were performed to validate the generalization of LSCAformer, yielding an RMSE of 7.21, an MAE of 5.63, and a CCC of 0.874 (AVEC2013 for training and the Northwind task of AVEC2014 for testing). Moreover, the proposed method can model the complementary behavioral patterns between long-term and short-term sequences for depression recognition. Code will be available at: https://github.com/helang818/LSCAformer/ .
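For readers unfamiliar with the three metrics reported in the abstract, the sketch below shows how RMSE, MAE, and CCC are commonly computed from ground-truth and predicted depression scores. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the example scores are hypothetical, and CCC is computed with the population (1/n) variance and covariance, which is the usual convention but is not confirmed by the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length score lists."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length score lists."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin's CCC).

    Uses population (1/n) variance and covariance. Raises
    ZeroDivisionError if both inputs are constant and equal.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not data from the paper.
    truth = [0, 10, 20, 30]
    preds = [2, 8, 22, 28]
    print(f"RMSE={rmse(truth, preds):.3f}  "
          f"MAE={mae(truth, preds):.3f}  "
          f"CCC={ccc(truth, preds):.3f}")
```

Lower RMSE and MAE indicate smaller prediction errors, while CCC ranges from -1 to 1 and rewards predictions that agree with the ground truth in both correlation and scale, which is why the abstract reports all three together.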
engineering, biomedical
What problem does this paper attempt to address?