Mental Disorders Detection in the Era of Large Language Models

Gleb Kuzmin, Petr Strepetov, Maksim Stankevich, Artem Shelmanov, Ivan Smirnov
2024-10-16
Abstract: This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets were considered, each differing in format and the method used to define the target pathology class. We tested AutoML models based on linguistic features, several variations of encoder-based Transformers such as BERT, and state-of-the-art LLMs as pathology classification models. The results demonstrated that LLMs outperform traditional methods, particularly on noisy and small datasets where training examples vary significantly in text length and genre. However, psycholinguistic features and encoder-based models can achieve performance comparable to language models when trained on texts from individuals with clinically confirmed depression, highlighting their potential effectiveness in targeted clinical applications.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how effectively depression and anxiety can be detected through text analysis in the era of large language models. Specifically, it compares the performance of traditional machine-learning methods, encoder-based models (such as BERT), and large language models (LLMs) on the task of identifying depression and anxiety. The study uses five datasets that differ in format and in how the target pathology class is defined.

### Research Background

In recent years, research on detecting mental disorders and patients' emotions through text analysis and machine learning has attracted increasing attention. Advances in natural language processing have opened new opportunities for screening, monitoring, early detection, and prevention of the negative consequences of mental disorders. Although some studies use interviews and offline text data, most of the material comes from social media. These studies mainly focus on conditions such as depression, anxiety, stress, suicidal tendencies, post-traumatic stress disorder, and anorexia.

Methods for predicting mental states from text are usually divided into traditional machine-learning methods (using hand-crafted linguistic features) and various forms of deep learning. Deep-learning methods tend to be more accurate when enough data samples are available, while traditional machine learning produces more interpretable results.

### Research Objectives

The paper aims to compare the performance of linguistic features, encoder-based models, and large language models (LLMs) on the task of identifying mental disorders. The study considers two mental conditions, depression and anxiety, and uses several Russian-language text datasets that vary in text format and in how the pathology is identified.

### Main Contributions

1. **Performance Improvement**: On one dataset, the authors surpass the existing state-of-the-art depression detection methods, and they establish classification baselines on three previously unstudied anxiety datasets.
2. **Comprehensive Comparison**: The authors conduct a thorough comparison of different model groups on depression and anxiety detection in Russian-language text, which can serve as a reference for practitioners planning future experiments in this field.
3. **Model Transferability**: The authors study how well models transfer from tasks whose labels come from clinical diagnosis to tasks whose labels come from questionnaire results, with the goal of alleviating the lack of clinically validated data in mental disorder detection.

### Research Methods

1. **Language Features**: 113 linguistic features were used, including morphological, syntactic, and lexical parameters, as well as various psycholinguistic coefficients.
2. **Encoder Models**: Pre-trained multilingual BERT, RuBERT, RuBioRoBERTa, and RuRoberta-large models were used.
3. **Large Language Models**: Experiments covered several settings: 0-shot and 5-shot prompting, and small self-hosted open-source models fine-tuned with LoRA.

### Experimental Results

- **DE Dataset**: On the paper-writing dataset (DE), the fine-tuned LLM achieved the best F1-macro score (88.4%), with an F1 score of 81.1% for the pathology class. The model based on linguistic features also performed well, with an F1-macro score of 85.8% and an F1 score of 77.0% for the pathology class.
- **DSM Dataset**: The Vikhr 7B IT 0.4 model achieved an F1-macro score of 66.1% in the 5-shot setting, while traditional machine-learning methods and encoder models performed poorly, with F1-macro scores of about 53%.

### Conclusions

The study shows that large language models perform well in detecting depression and anxiety, especially on noisy datasets with small sample sizes. However, psycholinguistic features and encoder-based models can achieve performance comparable to LLMs when trained on texts from individuals with clinically diagnosed depression, which shows their potential in specific clinical applications.
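
The results above are reported as two numbers per model: an F1-macro score and an F1 score for the pathology class. As a minimal sketch of how these two metrics relate, the following Python computes per-class F1 and their unweighted mean. The function names, the label encoding (1 = pathology, 0 = control), and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation code may differ.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the F1 score of every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[Hashable, float] = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, so a rare pathology class counts as much as the control class."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels (hypothetical): 1 = pathology class, 0 = control class.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

class_scores = per_class_f1(toy_true, toy_pred)
print(f"pathology-class F1: {class_scores[1]:.3f}")
print(f"F1-macro:           {macro_f1(toy_true, toy_pred):.3f}")
```

Because the macro average weights both classes equally, a model can reach a reasonable F1-macro while still scoring noticeably lower on the minority pathology class, which is presumably why the summary reports both numbers.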