A textual-based featuring approach for depression detection using machine learning classifiers and social media texts

Raymond Chiong, Gregorius Satia Budhi, Sandeep Dhakal, Fabian Chiong
DOI: https://doi.org/10.1016/j.compbiomed.2021.104499
IF: 7.7
2021-08-01
Computers in Biology and Medicine
Abstract: Depression is one of the leading causes of suicide worldwide. However, a large percentage of cases of depression go undiagnosed and, thus, untreated. Previous studies have found that messages posted by individuals with major depressive disorder on social media platforms can be analysed to predict if they are suffering, or likely to suffer, from depression. This study aims to determine whether machine learning could be effectively used to detect signs of depression in social media users by analysing their social media posts, especially when those messages do not explicitly contain specific keywords such as 'depression' or 'diagnosis'. To this end, we investigate several text preprocessing and textual-based featuring methods along with machine learning classifiers, including single and ensemble models, to propose a generalised approach for depression detection using social media texts. We first use two public, labelled Twitter datasets to train and test the machine learning models, and then another three non-Twitter depression-class-only datasets (sourced from Facebook, Reddit, and an electronic diary) to test the performance of our trained models against other social media sources. Experimental results indicate that the proposed approach is able to effectively detect depression via social media texts even when the training datasets do not contain specific keywords (such as 'depression' and 'diagnose'), as well as when unrelated datasets are used for testing.
Categories: engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology