A Deep Learning Method on Audio and Text Sequences for Automatic Depression Detection

Jing Xiao, Yongming Huang, Guobao Zhang, Wei Liu
DOI: https://doi.org/10.1109/icaml54311.2021.00088
2021-01-01
Abstract: In this study, we focus on a deep learning method for automatic depression detection from audio and text sequences. Sequence modelling in depression detection is often based on RNNs or CNNs, but the inner interactions within sequence signals also deserve attention. Therefore, for audio sequence modelling, we propose a new model, Attention-C-CNN, in which an attention mechanism is combined with a causal CNN. For text sequence modelling, a BERT model is fine-tuned to extract textual features, following a transfer learning strategy. Moreover, to obtain better joint audio-textual features than the simple fusion methods used in previous work, a new co-attention encoder is applied. Experiments on the DAIC-WOZ dataset show that the proposed approach performs competitively compared with other traditional data-driven machine learning methods.
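To illustrate the two ingredients named for the audio branch, the following is a minimal pure-Python sketch of a causal 1-D convolution followed by softmax attention pooling over time. It is not the authors' Attention-C-CNN: the function names `causal_conv1d` and `attention_pool`, the kernel values, the toy input, and the choice to score each time step by its own activation are all assumptions made for illustration only.

```python
import math


def causal_conv1d(x, kernel, dilation=1):
    """Causal 1-D convolution (illustrative, hypothetical helper).

    kernel[0] weights the current sample and kernel[j] weights the sample
    j * dilation steps in the past, so y[t] never depends on x[t+1:].
    Positions before the start of the sequence are treated as zeros.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        y.append(acc)
    return y


def attention_pool(h):
    """Softmax attention pooling over time (assumed scoring: the activation itself)."""
    m = max(h)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in h]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = sum(w * v for w, v in zip(weights, h))
    return pooled, weights


# Toy "audio feature" sequence; the values are made up.
frames = [1.0, 2.0, 3.0, 4.0]
hidden = causal_conv1d(frames, kernel=[0.5, 0.5])
pooled, weights = attention_pool(hidden)
```

The causal property can be checked by changing only the last frame and confirming that earlier outputs are unaffected. In a real model the attention scores would come from learned parameters, not from the activations directly.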