Abstract: This work examines the efficiency of knowledge distillation in producing a lightweight yet powerful BERT-based model for natural language processing applications. We then apply the resulting model, LastBERT, to a real-world task: classifying the severity of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns in social media text. LastBERT, a customized student BERT model, reduces the parameter count from the 110 million of BERT base to 29 million, making it approximately 73.64% smaller. On the GLUE benchmark, which covers paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, it achieved an accuracy and F1 score of 85%. Compared with DistilBERT (66M) and ClinicalBERT (110M), LastBERT performed comparably: DistilBERT scored slightly higher at 87%, and ClinicalBERT achieved 86% on the same metrics. These findings highlight LastBERT's capacity to classify degrees of ADHD severity, making it a useful tool for mental health professionals who assess user-generated content on social networking platforms. The study emphasizes the potential of knowledge distillation to produce effective models suited to resource-limited conditions, advancing both NLP and mental health diagnosis. The considerable decrease in model size without appreciable performance loss also lowers the computational resources needed for training and deployment, which broadens applicability, especially with readily available computational tools such as Google Colab. This study shows the accessibility and usefulness of advanced NLP methods in practical, real-world applications.

What problem does this paper attempt to address?

This paper addresses two main problems:

1. **Model Compression and Efficiency Improvement**:
   - Using knowledge distillation, the paper transfers knowledge from a large pre-trained model (BERT Large) into a lightweight but powerful student model (LastBERT). The aim is to reduce the number of model parameters, and with it the demand for computing resources, so that the model can run efficiently even in resource-constrained environments.
   - Specifically, the paper reduces the parameter count from 110 million (BERT Base) to 29 million, a reduction in model size of approximately 73.64%, while maintaining strong performance on multiple NLP tasks.
2. **Performance Verification in Practical Applications**:
   - The researchers apply the student model to a real-world task: classifying the severity of attention-deficit/hyperactivity disorder (ADHD)-related concerns in social media text. This task verifies the model's effectiveness in practice and shows its potential value for mental health diagnosis.
   - Through evaluation on the GLUE benchmark and application to the ADHD dataset, the paper shows that LastBERT is competitive on multiple NLP tasks, performing especially well in sentiment analysis and text classification.

### Main Contributions:

1. **Developed LastBERT**: A lightweight BERT student model produced through knowledge distillation, with performance comparable to large models and significantly fewer parameters.
2. **Comprehensively Evaluated LastBERT**: Conducted rigorous tests on six GLUE benchmark datasets, verifying its adaptability and robustness across multiple NLP tasks.
3. **Created and Applied an ADHD-related Dataset**: Constructed a specialized ADHD-related dataset from the Reddit Mental Health dataset and used LastBERT for severity classification, demonstrating its potential in practical mental health diagnosis.
4. **Detailed Comparison with Existing Models**: Conducted a detailed performance comparison with DistilBERT and ClinicalBERT on the ADHD dataset, verifying the effectiveness and applicability of the LastBERT design.

### Conclusion:

The paper advances current NLP-based mental health diagnosis research and provides a foundation for future work on automated, scalable detection and classification methods for various mental health disorders. By reducing model size without appreciable performance loss, LastBERT becomes more practical to deploy in resource-constrained environments.
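
The summary above does not state the exact training objective, so the following is only a minimal sketch of a commonly used knowledge distillation loss (softened teacher targets combined with hard-label cross-entropy), not the paper's confirmed implementation. The function names, the temperature of 2.0, and the weighting `alpha` of 0.5 are illustrative assumptions. The last helper simply reproduces the reported size reduction from 110 million to 29 million parameters.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are assumed values for illustration only.
    """
    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy against the true class at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def size_reduction_percent(reference_params, student_params):
    """Percentage by which the student is smaller than the reference model."""
    return 100.0 * (reference_params - student_params) / reference_params


# Parameter counts taken from the abstract: 110M (BERT base) and 29M (LastBERT).
print(round(size_reduction_percent(110_000_000, 29_000_000), 2))  # 73.64
```

When the student's logits match the teacher's, the soft-target term is zero and only the hard-label cross-entropy remains, which is why `alpha` controls how strongly the student is pulled toward the teacher rather than toward the ground-truth labels.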

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains.

Patient Knowledge Distillation for BERT Model Compression

Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models

[Knowledge about genotype-phenotype of the diseases should be coming into pediatrician's horizon].

Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning

Lightweight transformers for clinical natural language processing

Comparative analysis of strategies of knowledge distillation on BERT for text matching

A Novel Text Mining Approach for Mental Health Prediction Using Bi-LSTM and BERT Model

MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

Knowledge distillation and data augmentation for NLP light pre-trained models

Extremely Small BERT Models from Mixed-Vocabulary Training

Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments

On the effectiveness of compact biomedical transformers

Carnitine and carnitine esters in acute renal failure.

MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers

Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation

Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study

Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights

NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application