Utilizing Neural Networks and Linguistic Metadata for Early Detection of Depression Indications in Text Sequences

Marcel Trotzek, Sven Koitka, Christoph M. Friedrich
DOI: https://doi.org/10.1109/TKDE.2018.2885515
2018-12-21
Abstract: Depression is ranked as the largest contributor to global disability and is also a major reason for suicide. Still, many individuals suffering from forms of depression are not treated for various reasons. Previous studies have shown that depression also has an effect on language usage and that many depressed individuals use social media platforms or the internet in general to get information or discuss their problems. This paper addresses the early detection of depression using machine learning models based on messages on a social platform. In particular, a convolutional neural network based on different word embeddings is evaluated and compared to a classification based on user-level linguistic metadata. An ensemble of both approaches is shown to achieve state-of-the-art results in a current early detection task. Furthermore, the currently popular ERDE score as a metric for early detection systems is examined in detail and its drawbacks in the context of shared tasks are illustrated. A slightly modified metric is proposed and compared to the original score. Finally, a new word embedding was trained on a large corpus from the same domain as the described task and is evaluated as well.
Computation and Language, Information Retrieval
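User-level linguistic metadata, as contrasted with the CNN in the abstract, means features computed over all of a user's messages rather than over individual word sequences. As an illustration only, the Python sketch below aggregates three simple statistics per user. The feature choice (post count, mean post length in tokens, and the ratio of first-person singular pronouns, a cue often discussed in research on depression and language) and the names `user_features`, `FIRST_PERSON_SINGULAR` and `TOKEN_RE` are hypothetical; they are not the feature set used in the paper, and the sketch does not model how the paper builds its ensemble.

```python
import re
from statistics import mean
from typing import Dict, List

# Hypothetical, illustrative feature set; NOT the paper's actual metadata features.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
TOKEN_RE = re.compile(r"[a-z']+")  # crude tokenizer applied to lowercased text


def user_features(posts: List[str]) -> Dict[str, float]:
    """Aggregate simple linguistic statistics over all posts of one user."""
    tokens_per_post = [TOKEN_RE.findall(post.lower()) for post in posts]
    all_tokens = [tok for toks in tokens_per_post for tok in toks]
    n_tokens = len(all_tokens)
    # Count first-person singular pronouns across the user's whole history.
    n_fps = sum(1 for tok in all_tokens if tok in FIRST_PERSON_SINGULAR)
    return {
        "n_posts": float(len(posts)),
        "mean_tokens_per_post": (
            float(mean(len(toks) for toks in tokens_per_post)) if posts else 0.0
        ),
        "first_person_singular_ratio": n_fps / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    print(user_features(["I feel tired and I can't sleep", "My cat is asleep"]))
```

A fixed-length vector like this can be fed to any standard classifier, which is what makes a user-level approach easy to combine with a text model: both end up producing one score per user.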
What problem does this paper attempt to address?
The paper addresses the problem of detecting signs of depression in text sequences as early as possible using machine learning, specifically a convolutional neural network (CNN) built on word embeddings and a classifier based on user-level linguistic metadata. The goal is to identify individuals who may be suffering from depression by analyzing the messages they post on a social media platform. Beyond comparing several word embeddings as input to the CNN, the authors combine the CNN with the metadata-based classifier, and this ensemble achieves state-of-the-art results on a current early detection task. The paper also examines in detail the ERDE (Early Risk Detection Error) score, currently a popular evaluation metric for early detection systems, and illustrates its drawbacks in the context of shared tasks; the authors propose a slightly modified metric and compare it with the original ERDE score. Finally, they train a new word embedding on a large corpus from the same domain as the task and evaluate it as well. Taken together, these contributions aim at a more effective and accurate early detection approach, which could support the timely identification of, and intervention for, people with depression and thereby help reduce the burden that depression places on society.
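For reference, ERDE as originally defined for early risk detection assigns each user a cost that depends on the system's decision and on the number k of writings read before that decision: c_fp for a false positive, c_fn for a false negative, lc_o(k) · c_tp for a true positive, and 0 for a true negative, where lc_o(k) = 1 − 1/(1 + e^(k − o)) rises from about 0 to about 1 as k passes the deadline parameter o (5 and 50 are the values used in eRisk). The Python sketch below computes the mean ERDE over a set of users. The input format (a pair of "writings read when flagged, or None if never flagged" and a ground-truth label), the function names, and the default costs (c_fp equal to the share of depressed users, c_fn = c_tp = 1, which is our understanding of the eRisk setup) are assumptions of this sketch. It implements only the original score, not the modified metric proposed in the paper.

```python
import math
from typing import Optional, Sequence, Tuple


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), i.e. a logistic sigmoid of (k - o).

    Evaluated in a numerically stable way so that very late decisions
    (large k - o) do not overflow math.exp.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def erde(users: Sequence[Tuple[Optional[int], bool]], o: int) -> float:
    """Mean ERDE_o over users (sketch under assumed costs).

    Each user is (flagged_at, is_depressed): flagged_at is the number of
    writings read when the system decided "depressed", or None if it never did.
    Assumed costs: c_fp = share of depressed users, c_fn = c_tp = 1.
    """
    if not users:
        raise ValueError("need at least one user")
    c_fp = sum(1 for _, truth in users if truth) / len(users)
    c_fn = c_tp = 1.0
    total = 0.0
    for flagged_at, truth in users:
        if flagged_at is not None and not truth:  # false positive
            total += c_fp
        elif flagged_at is None and truth:  # false negative
            total += c_fn
        elif flagged_at is not None and truth:  # true positive, penalized by delay
            total += latency_cost(flagged_at, o) * c_tp
        # true negative (never flagged, not depressed) costs 0
    return total / len(users)


if __name__ == "__main__":
    users = [(3, True), (None, True), (10, False), (None, False)]
    print(f"ERDE_5  = {erde(users, o=5):.4f}")
    print(f"ERDE_50 = {erde(users, o=50):.4f}")
```

Because c_tp = c_fn = 1 under these defaults, a correct positive decision made long after the deadline o costs nearly as much as never flagging the user at all, which is worth keeping in mind when comparing systems by this score.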