Abstract: This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets were considered, each differing in format and the method used to define the target pathology class. We tested AutoML models based on linguistic features, several variations of encoder-based Transformers such as BERT, and state-of-the-art LLMs as pathology classification models. The results demonstrated that LLMs outperform traditional methods, particularly on noisy and small datasets where training examples vary significantly in text length and genre. However, psycholinguistic features and encoder-based models can achieve performance comparable to language models when trained on texts from individuals with clinically confirmed depression, highlighting their potential effectiveness in targeted clinical applications.
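The abstract mentions AutoML models built on linguistic features. To give a rough idea of what such features look like, the sketch below computes four simple lexical measures from raw text: token count, mean word length, type-token ratio, and the share of Russian first-person singular pronouns. This is an assumption-laden illustration, not the paper's feature set (which comprises 113 morphological, syntactic, lexical, and psycholinguistic features); the names `extract_features`, `TextFeatures`, and `FIRST_PERSON_SINGULAR` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: Russian first-person singular pronoun forms.
# This is NOT the paper's feature set; it only illustrates the idea.
FIRST_PERSON_SINGULAR = {"я", "меня", "мне", "мной", "мною"}

# Runs of Unicode letters (no digits or underscores), so Cyrillic is matched.
WORD_RE = re.compile(r"[^\W\d_]+")


@dataclass
class TextFeatures:
    n_tokens: int
    mean_word_length: float
    type_token_ratio: float
    first_person_ratio: float


def extract_features(text: str) -> TextFeatures:
    """Compute a few simple lexical features from raw text."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    n = len(tokens)
    if n == 0:
        # No words at all: return zeros instead of dividing by zero.
        return TextFeatures(0, 0.0, 0.0, 0.0)
    return TextFeatures(
        n_tokens=n,
        mean_word_length=sum(len(t) for t in tokens) / n,
        type_token_ratio=len(set(tokens)) / n,
        first_person_ratio=sum(t in FIRST_PERSON_SINGULAR for t in tokens) / n,
    )
```

For instance, `extract_features("Я устал. Мне грустно, и я не сплю.")` yields 8 tokens, a mean word length of 3.0, a type-token ratio of 0.875, and a first-person ratio of 0.375. In a full pipeline, vectors like these would be passed to a tabular classifier such as an AutoML system.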

What problem does this paper attempt to address?

This paper addresses how effectively depression and anxiety can be detected through text analysis in the era of large language models. Specifically, it compares the performance of traditional machine-learning methods, encoder-based models (such as BERT), and large language models (LLMs) at identifying depression and anxiety. The study uses five datasets that differ in format and in how the target pathology class is defined.

### Research Background

In recent years, detecting mental disorders and patients' emotional states through text analysis and machine learning has attracted growing attention. Advances in natural language processing have opened new opportunities for screening, monitoring, early detection, and prevention of the negative consequences of mental disorders. Although some studies use interviews and offline text data, most of the material comes from social media. These studies focus mainly on conditions such as depression, anxiety, stress, suicidal tendencies, post-traumatic stress disorder, and anorexia.

Methods for predicting mental states from text usually fall into two groups: traditional machine-learning methods that rely on hand-crafted linguistic features, and various deep-learning methods. Deep-learning methods tend to be more accurate when enough training data is available, while traditional machine learning produces more interpretable results.

### Research Objectives

The paper compares the performance of linguistic features, encoder-based models, and LLMs at identifying mental disorders. It considers two mental states, depression and anxiety, and uses several Russian-language text datasets that differ in text format and in how the pathology was determined.

### Main Contributions

1. **Performance improvement**: On one dataset, the authors surpass existing state-of-the-art depression detection methods, and they establish classification baselines on three previously unstudied anxiety datasets.
2. **Comprehensive comparison**: The paper provides a thorough comparison of different model groups for detecting depression and anxiety in Russian-language texts, which practitioners can use as a reference for future experiments.
3. **Model transferability**: The authors study how models trained on labels derived from clinical diagnoses transfer to labels derived from questionnaire results, as a way to mitigate the scarcity of clinically validated data in mental disorder detection.

### Research Methods

1. **Linguistic features**: 113 linguistic features, including morphological, syntactic, and lexical parameters as well as various psycholinguistic coefficients.
2. **Encoder models**: Pre-trained multilingual BERT, RuBERT, RuBioRoBERTa, and RuRoberta-large.
3. **Large language models**: Experiments in several settings, including 0-shot and 5-shot prompting, and small self-hosted open-source models fine-tuned with LoRA.

### Experimental Results

- **DE dataset**: On the paper-writing dataset (DE), the fine-tuned LLM achieved the best F1-macro score (88.4%), with an F1 score of 81.1% for the pathology class. The model based on linguistic features also performed well, with an F1-macro score of 85.8% and a pathology-class F1 score of 77.0%.
- **DSM dataset**: The Vikhr 7B IT 0.4 model achieved an F1-macro score of 66.1% in the 5-shot setting, while traditional machine-learning methods and encoder models performed poorly, with F1-macro scores of about 53%.

### Conclusions

The study shows that LLMs perform well at detecting depression and anxiety, especially on noisy, small datasets. However, psycholinguistic features and encoder-based models can achieve performance comparable to LLMs when trained on texts from people with clinically diagnosed depression, which points to their potential in targeted clinical applications.
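The 0-shot and 5-shot LLM settings differ only in whether labeled demonstrations are placed in the prompt before the text to classify. A minimal sketch of that idea follows. The instruction wording, the "Text/Answer" layout, the yes/no label format, and the helpers `build_prompt` and `parse_answer` are all assumptions for illustration, not the prompts used in the paper; the call to an actual model is omitted.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompts are not reproduced here.
INSTRUCTION = (
    "Decide whether the author of the following text shows signs of depression. "
    "Answer with 'yes' or 'no'."
)


def build_prompt(query: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a 0-shot prompt when `examples` is empty, or a k-shot prompt
    with one Text/Answer demonstration per (text, label) pair."""
    parts = [INSTRUCTION]
    for text, label in examples:
        parts.append(f"Text: {text}\nAnswer: {label}")
    # The query goes last, with the answer left for the model to complete.
    parts.append(f"Text: {query}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(generation: str) -> int:
    """Map the model's free-text reply to a binary label (1 = pathology)."""
    return int(generation.strip().lower().startswith("yes"))
```

Under these assumptions, `build_prompt(text)` gives the 0-shot variant and `build_prompt(text, demos)` with five labeled `(text, label)` pairs gives the 5-shot variant; keeping the demonstration format identical to the query format lets the model imitate the expected answer style.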
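LoRA, used to fine-tune the small open-source models, freezes a pretrained weight matrix and learns only a low-rank update. In the standard LoRA formulation (the rank and scaling used in this study are not stated here), a layer with frozen weight $W_0 \in \mathbb{R}^{d \times k}$ computes:

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $A$ and $B$ are trained, so each adapted matrix adds $r(d + k)$ trainable parameters instead of $dk$. For an illustrative $4096 \times 4096$ projection with $r = 8$, that is $8 \cdot (4096 + 4096) = 65{,}536$ parameters versus $16{,}777{,}216$, about 0.39% of the full matrix, which is what makes fine-tuning feasible on self-hosted hardware.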
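The results report both F1-macro and the F1 score of the pathology class. Macro-F1 is the unweighted mean of the per-class F1 scores, so for binary detection it averages the pathology-class F1 with the control-class F1; this is why two different numbers are given for the same model. A minimal pure-Python sketch, assuming binary integer labels with 1 = pathology (the function names are illustrative); on such inputs it should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)
```

For `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0, 0, 0]`, the pathology-class F1 is about 0.667, the control-class F1 is 0.8, and the macro-F1 is about 0.733.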

Mental Disorders Detection in the Era of Large Language Models
