Abstract: Due to the massive adoption of social media, detecting users' depression through social media analytics bears significant importance, particularly for underrepresented languages such as Bangla. This study introduces a well-grounded approach to identifying depressive social media posts in Bangla by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive content. Comparative analysis with the existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.
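The random oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_oversample`, the `seed` parameter and the toy data are assumptions made for illustration. The idea is to duplicate randomly chosen minority-class samples until every class has as many samples as the largest one.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes match the size of the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not labels:
        return list(texts), list(labels)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # All original samples belonging to this class
        pool = [t for t, y in zip(texts, labels) if y == label]
        # Draw with replacement to fill the gap to the majority size
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Toy data (placeholders, not from the paper's dataset): 1 = depressive, 0 = non-depressive
texts = ["post a", "post b", "post c", "post d", "post e"]
labels = [0, 0, 0, 0, 1]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated posts do not leak into the evaluation data; whether the paper does this is not stated in the abstract.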

What problem does this paper attempt to address?

This paper addresses the challenge of detecting depression-related posts in Bengali. Specifically, the authors focus on identifying users' depressive emotions through social media analysis, especially for under-represented languages like Bengali. The research aims to improve the recognition of depressive Bengali posts by applying advanced natural language processing techniques.

The paper uses a dataset annotated by domain experts, containing both depressive and non-depressive posts, to ensure high-quality data for model training and evaluation. To address class imbalance, the authors apply random oversampling to increase the sample size of the minority class, thereby improving the model's ability to accurately detect depressive posts. The research explores multiple numerical representation techniques, including term frequency-inverse document frequency (TF-IDF), BERT embeddings and FastText embeddings, and combines them with a deep-learning-based convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) model. The experimental results show that the BERT approach outperforms the other methods, achieving an F1 score of 84%, which indicates that BERT combined with the CNN-BiLSTM architecture can effectively identify depression-related content in Bengali texts.

### Main contributions of the paper:

1. **Effectively handling class imbalance**: Random oversampling addresses the shortage of minority-class samples in the dataset, so that the model performs better when predicting the minority class.
2. **Improving the performance of text representation techniques**: The research shows that although TF-IDF can effectively capture key features, BERT embeddings provide a more comprehensive understanding of the text, especially in capturing the subtle semantics of depressive Bengali posts.
3. **Proposing a novel custom-made CNN-BiLSTM model**: This model combines a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and is able to capture both local patterns and long-term dependencies, thereby achieving high-precision prediction of depressive posts.

### Formula explanations:

- **TF-IDF formula**:
  - \( \text{TF}(t, d)=\frac{\text{Number of times } t \text{ appears in document } d}{\text{Total number of terms in document } d} \)
  - \( \text{IDF}(t)=\log \left( \frac{\text{Total number of documents}}{\text{Number of documents containing term } t} \right) \)
  - \( \text{TF-IDF}(t, d)=\text{TF}(t, d)\times \text{IDF}(t) \)

Through these methods and techniques, the paper makes an important contribution to the development of reliable tools for detecting depressive posts on Bengali social media, which helps to improve mental health monitoring.
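The TF-IDF definitions above can be illustrated with a short, dependency-free sketch. The function names `tf`, `idf` and `tf_idf`, the natural logarithm, the zero weight for unseen terms and the toy corpus are all assumptions made for illustration; the paper's actual implementation (for example, any smoothing or normalisation) is not described here.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of `term` divided by document length."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency: log(N / documents containing `term`)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        # Assumption: a term absent from the corpus gets weight 0
        # rather than causing a division by zero.
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF score: product of the two quantities above."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus of pre-tokenised posts (English placeholders, not the paper's data)
corpus = [
    ["i", "feel", "sad", "today"],
    ["i", "feel", "happy", "today"],
    ["sad", "sad", "night"],
]

for term in ("sad", "night", "feel"):
    print(term, round(tf_idf(term, corpus[2], corpus), 4))
```

A term that is frequent in one post but rare across the corpus (here `night` in the third post) receives a higher weight than one spread across many posts, which is the property that makes TF-IDF useful as a baseline representation.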

Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings
