Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features

Md Nasir, Arindam Jati, Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis Georgiou
DOI: https://doi.org/10.1145/2988257.2988261
2016-10-16
Abstract: Automatic classification of depression using audiovisual cues can aid its objective diagnosis. In this paper, we present a multimodal depression classification system developed for the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We investigate a number of audio and video features for classification, combined with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform the standard baseline features, while i-vector modelling based on MFCC features gives the best audio accuracy. In the video modality, polynomial parameterization of facial landmark features achieves the best performance among all systems and also outperforms the best baseline system.
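TECC features are built on the discrete Teager energy operator, Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1). The sketch below shows only that operator. It omits the subband filterbank, frame averaging, and cepstral (log + DCT) steps that turn it into TECC, and it is not the authors' implementation. The function name `teager_energy` is hypothetical. For a pure tone A*cos(Omega*n + phi) the operator returns the constant A^2 * sin^2(Omega), so it tracks both amplitude and frequency.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration. The first and last samples have no
    two-sided neighbours, so the output has len(x) - 2 values.
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Usage: a pure sinusoid with amplitude A and normalized frequency omega.
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n + 0.1) for n in range(50)]
energies = teager_energy(signal)
# Every value equals A**2 * math.sin(omega)**2 up to floating-point error.
```

Because the operator needs only three neighbouring samples, it responds almost instantaneously to changes in a signal's energy. This locality is one reason it is used as a building block for speech features.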