Abstract: Depression seriously affects physical and mental health. It is associated with changes in mood, loss of interest, and stress, and it can lead to self-harm and suicide, so analyzing depression is important for reducing suicidal acts. In recent years, automatic depression evaluation has been developed using computer vision technology. Several models have been investigated for depression analysis, but they are limited to video and audio data. This paper proposes a hybrid Artificial Intelligence (AI) based multi-modal depression analysis approach that estimates the severity of depression from multi-modal data, namely video, audio, and text descriptors. First, the approach estimates the Patient Health Questionnaire (PHQ) depression scale with a hybrid framework, the Residual Network based Deep Neural Network (D-ResNet), which computes the PHQ-8 score from video and audio features. Then, a Paragraph Vector based Kernel Extreme Learning Machine (PV-KELM) infers the mental and physical states of individuals related to the psychoanalytic features of depression, recognizing the presence or absence of the measured psychoanalytic symptoms. Finally, the PHQ-8 score estimated by D-ResNet and the psychoanalytic symptoms recognized by PV-KELM are fed together into an ensemble classifier. The ensemble uses three classifiers, Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT), to classify whether the individual is depressed. The approach is implemented in Python, and the experiments are carried out on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) interview depression dataset. The proposed approach obtains an accuracy of 0.89, precision of 0.86, recall of 0.86, F-measure of 0.86, RMSE of 0.373, MAE of 0.35, JSD of 0.355, and contextual similarity of 0.689. A comparison with state-of-the-art approaches shows the efficiency of the proposed approach.
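The final stage of the pipeline and several of the reported metrics can be illustrated with a minimal Python sketch. The abstract does not state how the SVM, NB, and DT outputs are combined, so hard majority voting is an assumption here. All function names and all data values are hypothetical and chosen only for illustration; they are not taken from DAIC-WOZ or from the paper's implementation.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Hard voting: return the label chosen by most classifiers.

    Assumption: the paper's combination rule is not given in the abstract.
    With three voters and two labels a tie cannot occur.
    """
    return Counter(votes).most_common(1)[0][0]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def regression_errors(y_true, y_pred):
    """RMSE and MAE between true and estimated scores (e.g. PHQ-8)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Hypothetical per-subject predictions (1 = depressed, 0 = not depressed).
svm_pred = [1, 0, 1, 1, 0, 0]
nb_pred = [1, 0, 0, 1, 0, 1]
dt_pred = [0, 0, 1, 1, 1, 0]
labels = [1, 0, 1, 0, 0, 0]

# Combine the three classifiers subject by subject.
ensemble_pred = [majority_vote(v) for v in zip(svm_pred, nb_pred, dt_pred)]
metrics = classification_metrics(labels, ensemble_pred)

# Hypothetical true and estimated PHQ-8 scores for three subjects.
rmse, mae = regression_errors([4, 10, 16], [6, 9, 13])

print(ensemble_pred, metrics, rmse, mae)
```

Hard voting is only one possible combination rule; weighted or probability-based (soft) voting would change `majority_vote` but leave the metric functions unchanged.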

A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video

Multi-Modal and Multi-Task Depression Detection with Sentiment Assistance

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Automatic Depression Prediction Via Cross-Modal Attention-Based Multi-Modal Fusion in Social Networks

Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data

Textual-dominated Multimodal Depression Detection

End-to-end multimodal system for depression detection from online recordings

Fusing Multi-Level Features from Audio and Contextual Sentence Embedding from Text for Interview-Based Depression Detection

Multimodal Measurement of Depression Using Deep Learning Models

Unaligned Multimodal Sequences for Depression Assessment From Speech

Enhancing depression detection: A multimodal approach with text extension and content fusion

Depression Scale Recognition from Audio, Visual and Text Analysis

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Additive Cross-Modal Attention Network (ACMA) for Depression Detection Based on Audio and Textual Features

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

D-ResNet-PVKELM: deep neural network and paragraph vector based kernel extreme machine learning model for multimodal depression analysis

Multimodal Depression Detection based on Factorized Representation

Automatic recognition of depression based on audio and video: A review

Multimodal Spatiotemporal Representation for Automatic Depression Level Detection