DepreSym: A Depression Symptom Annotated Corpus and the Role of LLMs as Assessors of Psychological Markers

Anxo Pérez, Marcos Fernández-Pichel, Javier Parapar, David E. Losada
2023-08-21
Abstract: Computational methods for depression detection aim to mine traces of depression from online publications posted by Internet users. However, solutions trained on existing collections exhibit limited generalisation and interpretability. To tackle these issues, recent studies have shown that identifying depressive symptoms can lead to more robust models. The eRisk initiative fosters research on this area and has recently proposed a new ranking task focused on developing search methods to find sentences related to depressive symptoms. This search challenge relies on the symptoms specified by the Beck Depression Inventory-II (BDI-II), a questionnaire widely used in clinical practice. Based on the participant systems' results, we present the DepreSym dataset, consisting of 21580 sentences annotated according to their relevance to the 21 BDI-II symptoms. The labelled sentences come from a pool of diverse ranking methods, and the final dataset serves as a valuable resource for advancing the development of models that incorporate depressive markers such as clinical symptoms. Due to the complex nature of this relevance annotation, we designed a robust assessment methodology carried out by three expert assessors (including an expert psychologist). Additionally, we explore here the feasibility of employing recent Large Language Models (ChatGPT and GPT4) as potential assessors in this complex task. We undertake a comprehensive examination of their performance, determine their main limitations and analyze their role as a complement or replacement for human annotators.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper addresses the limited generalisation and interpretability of existing depression detection methods. Solutions trained on existing collections of online content fall short on both counts, and recent studies show that identifying depressive symptoms can lead to more robust models. To support this direction, the paper introduces DepreSym, a dataset of 21,580 sentences annotated according to their relevance to the 21 symptoms of the Beck Depression Inventory-II (BDI-II). The labelled sentences come from a pool of diverse ranking methods, built from the results of systems that participated in the eRisk sentence-ranking task, and the resulting dataset is a resource for developing depression detection models that incorporate clinical symptoms. The paper also explores whether recent large language models (ChatGPT and GPT-4) can act as assessors in this annotation task: it examines their performance, identifies their main limitations, and analyzes whether they can complement or replace human annotators.