Audio Visual Multimodal Classification of Bipolar Disorder Episodes

Yan Li, Le Yang, Haifeng Chen, Dongmei Jiang, Hichem Sahli
DOI: https://doi.org/10.1109/aciiw.2019.8925023
2019-01-01
Abstract: Bipolar disorder is a highly prevalent and complex medical syndrome of multifactorial origin. In this paper, we propose an audio-visual multimodal framework for classifying the episodes (Remission, Hypomania, or Mania) of bipolar disorder. To represent the temporal dynamics of face and body poses, we propose computing the Motion History Histogram (MHH) of facial landmarks and the Histogram of Displacement Range (HDR) of body keypoints as the visual features. For the audio features, functionals of the low-level descriptors (LLDs) of speech are computed as global features. Each feature stream is fed into a Convolutional Neural Network (CNN) to obtain an initial classification result for the patient's episode; these per-stream results are then concatenated into a vector and fed into a random forest for the final classification. Experimental results on the development set of the Audio Visual Emotion Challenge (AVEC2018) Bipolar Disorder Sub-Challenge demonstrate that the proposed visual features and classification framework achieve promising results, with the unweighted average recall (UAR) reaching 0.749, which is better than or comparable to state-of-the-art results.
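The abstract reports its result as unweighted average recall (UAR), the mean of the per-class recalls, so each episode class counts equally regardless of how many recordings it contains. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `unweighted_average_recall` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class first, then averaged without class weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["remission"] * 6 + ["hypomania"] * 2 + ["mania"] * 2
y_pred = ["remission"] * 6 + ["hypomania", "mania"] + ["mania", "remission"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the majority class is predicted perfectly while each minority class is only half right, so plain accuracy is 0.8 but UAR is about 0.667. That gap is why UAR is a natural choice when class sizes differ.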