Abstract: Background: The rise in depression, anxiety, and suicide rates has led to increased demand for telemedicine-based mental health screening and remote patient monitoring (RPM) solutions to alleviate the burden on, and enhance the efficiency of, mental health practitioners. Multimodal dialog systems (MDS) that conduct on-demand, structured interviews offer a scalable and cost-effective way to address this need. Objective: This study evaluates the feasibility of a cloud-based MDS agent, Tina, for mental state characterization in participants with depression, anxiety, and suicide risk. Method: Sixty-eight participants were recruited through an online health registry and completed 73 sessions, with 15 (20.6%), 21 (28.8%), and 26 (35.6%) sessions screening positive for depression, anxiety, and suicide risk, respectively, using conventional screening instruments. Participants then interacted with Tina as they completed a structured interview designed to elicit calibrated, open-ended responses about their feelings and emotional state. Simultaneously, the platform streamed their speech and video recordings in real time to a HIPAA-compliant cloud server to compute speech, language, and facial movement-based biomarkers. After their sessions, participants completed user experience surveys. Machine learning models were developed using the extracted features and evaluated with the area under the receiver operating characteristic curve (AUC). Results: For both depression and suicide risk, affected individuals tended to have a higher percent pause time, while those positive for anxiety showed reduced lip movement relative to healthy controls. Among single-modality classification models, speech features performed best for depression (AUC = 0.64; 95% CI = 0.51-0.78), facial features for anxiety (AUC = 0.57; 95% CI = 0.43-0.71), and text features for suicide risk (AUC = 0.65; 95% CI = 0.52-0.78). The best overall performance was achieved by decision fusion of all models in identifying suicide risk (AUC = 0.76; 95% CI = 0.65-0.87). Participants reported that they found the experience comfortable and that they shared their feelings. Conclusion: MDS is a feasible, useful, effective, and interpretable solution for RPM in real-world clinically depressed, anxious, and suicidal populations. Facial information is more informative for anxiety classification, while speech and language are more discriminative of depression and suicidality markers. In general, combining speech, language, and facial information improved model performance on all classification tasks.
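
The abstract reports AUC for single-modality models and for a decision fusion of all models, but it does not say how the fusion was performed. The sketch below is one common way to do it, not the study's method: it assumes fusion means averaging each session's per-model scores, and it computes AUC as the probability that a randomly chosen positive session outranks a randomly chosen negative one. The function names `auc` and `fuse` and all scores are hypothetical, and the toy data is made up purely for illustration; it does not reflect the study's results.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fuse(*per_model_scores):
    """Decision (late) fusion, assumed here to be a plain average
    of each session's scores across the single-modality models."""
    return [mean(session) for session in zip(*per_model_scores)]


# Made-up toy data: 1 = screened positive, 0 = screened negative.
labels = [1, 1, 1, 0, 0, 0]
speech = [0.9, 0.4, 0.6, 0.5, 0.2, 0.3]  # hypothetical speech-model scores
text = [0.3, 0.8, 0.7, 0.4, 0.6, 0.1]    # hypothetical text-model scores
face = [0.6, 0.5, 0.2, 0.3, 0.4, 0.7]    # hypothetical facial-model scores

fused = fuse(speech, text, face)

for name, scores in [("speech", speech), ("text", text),
                     ("face", face), ("fused", fused)]:
    print(f"{name}: AUC = {auc(labels, scores):.2f}")
```

Averaging is only one fusion rule; weighted averages or a second-stage classifier over the per-model scores are alternatives. Confidence intervals such as those reported in the abstract would need an additional step (for example, bootstrapping over sessions), which this sketch omits.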

Seeing and hearing what has not been said; A multimodal client behavior classifier in Motivational Interviewing with interpretable fusion

Multimodal Automatic Coding of Client Behavior in Motivational Interviewing

M3TCM: Multi-modal Multi-task Context Model for Utterance Classification in Motivational Interviews

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

EMMI -- Empathic Multimodal Motivational Interviews Dataset: Analyses and Annotations

Eliciting Motivational Interviewing Skill Codes in Psychotherapy with LLMs: A Bilingual Dataset and Analytical Study

A Comparison of Natural Language Processing Methods for Automated Coding of Motivational Interviewing

Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health

A multimodal approach for modeling engagement in conversation

Multimodal Emotional Classification Based on Meaningful Learning

TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

Automated Behavioral Coding to Enhance the Effectiveness of Motivational Interviewing in a Chat-Based Suicide Prevention Helpline: Secondary Analysis of a Clinical Trial

Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes

Modeling Motivational Interviewing Strategies on an Online Peer-to-Peer Counseling Platform

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

MOCA: A Motivational Online Conversational Agent for Improving Student Engagement in Collaborative Learning

Rethinking the Alignment of Psychotherapy Dialogue Generation with Motivational Interviewing Strategies

A multimodal dialog approach to mental state characterization in clinically depressed, anxious, and suicidal populations

Towards Multimodal Emotional Support Conversation Systems

A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories