Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis

Wei Zhai, Hongzhi Qi, Qing Zhao, Jianqiang Li, Ziqi Wang, Han Wang, Bing Xiang Yang, Guanghui Fu
2024-06-13
Abstract: In the current environment, psychological issues are prevalent and widespread, with social media serving as a key outlet for individuals to share their feelings. This results in the generation of vast quantities of data daily, where negative emotions have the potential to precipitate crisis situations. There is a recognized need for models capable of efficient analysis. While pre-trained language models have demonstrated their effectiveness broadly, there is a noticeable gap in pre-trained models tailored for specialized domains like psychology. To address this, we have collected a large dataset from Chinese social media platforms and enriched it with publicly available datasets to create a comprehensive database encompassing 3.36 million text entries. To enhance the model's applicability to psychological text analysis, we integrated psychological lexicons into the pre-training masking mechanism. Building on an existing Chinese language model, we performed adaptive training to develop a model specialized for the psychological domain. We evaluated our model's performance across six public datasets, where it demonstrated improvements compared to eight other models. Additionally, in the qualitative comparison experiment, our model provided psychologically relevant predictions given the masked sentences. Due to concerns regarding data privacy, the dataset will not be made publicly available. However, we have made the pre-trained models and code publicly accessible to the community at https://github.com/zwzzzQAQ/Chinese-MentalBERT.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses mental health text analysis, focusing on posts from Chinese social media platforms. Its main objectives are:

1. **Filling the Gap in the Field**: Large-scale pre-trained language models tailored to Chinese mental health text are lacking. The research team therefore developed Chinese MentalBERT, which they describe as the first pre-trained language model designed specifically for mental health analysis in Chinese communities.
2. **Improving Model Performance**: The model combines Domain-Adaptive Pre-training with a Lexicon-Guided Masking Mechanism based on a depression lexicon to improve performance on mental health-related tasks. Across six public datasets, Chinese MentalBERT shows improvements over eight other models.
3. **Promoting Early Intervention**: By leveraging data extracted from social media, the researchers hope the model can help identify individuals in need of psychological support, enabling early detection and timely intervention.

In summary, the paper develops and optimizes a pre-trained language model for mental health analysis in order to improve the understanding and identification of mental health states on Chinese social media.
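The lexicon-guided masking idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `lexicon_guided_mask`, the masking ratio, the word-level tokens, and the policy of filling any leftover budget with random positions are all assumptions made for illustration. The example uses English stand-in words, whereas the paper works on Chinese text with a depression lexicon.

```python
import random

MASK = "[MASK]"


def lexicon_guided_mask(tokens, lexicon, mask_ratio=0.15, rng=None):
    """Mask tokens for masked-language-model training, lexicon words first.

    Hypothetical helper: positions whose token appears in `lexicon` are
    selected before any other position. If they do not use up the masking
    budget, the remainder is drawn at random from the other positions.
    Returns (masked_tokens, sorted_masked_positions).
    """
    if rng is None:
        rng = random.Random()
    # Assumed budget rule: a fixed fraction of the sequence, at least one token.
    budget = max(1, round(len(tokens) * mask_ratio))
    lexicon_pos = [i for i, tok in enumerate(tokens) if tok in lexicon]
    other_pos = [i for i, tok in enumerate(tokens) if tok not in lexicon]
    rng.shuffle(lexicon_pos)
    rng.shuffle(other_pos)
    # Lexicon positions come first, so they are preferred under the budget.
    chosen = sorted((lexicon_pos + other_pos)[:budget])
    chosen_set = set(chosen)
    masked = [MASK if i in chosen_set else tok for i, tok in enumerate(tokens)]
    return masked, chosen


# Toy usage with an assumed two-word lexicon.
tokens = ["I", "feel", "hopeless", "and", "tired", "every", "day"]
lexicon = {"hopeless", "tired"}
masked, positions = lexicon_guided_mask(tokens, lexicon, mask_ratio=0.3)
print(masked)     # ['I', 'feel', '[MASK]', 'and', '[MASK]', 'every', 'day']
print(positions)  # [2, 4]
```

With a budget of two tokens and exactly two lexicon words in the sentence, both lexicon words are masked regardless of the random seed. Compared with uniform random masking, this biases the pre-training objective toward predicting domain-relevant vocabulary, which is the intuition behind guiding the mask with a lexicon.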