Abstract: This work examines how effectively knowledge distillation can produce a lightweight yet capable BERT-based model for natural language processing applications. We then apply the resulting model, LastBERT, to a real-world task: classifying the severity of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns in social media text. LastBERT, a customized student BERT model, reduces the parameter count from the 110 million of BERT base to 29 million, making it approximately 73.64% smaller. On the GLUE benchmark, comprising paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, LastBERT achieved an accuracy and F1 score of 85%. Compared with DistilBERT (66M parameters) and ClinicalBERT (110M parameters), LastBERT performed comparably: DistilBERT slightly outperformed it at 87%, and ClinicalBERT reached 86% on the same metrics. These findings highlight LastBERT's capacity to classify ADHD severity levels accurately, offering mental health professionals a useful tool for assessing and understanding user-generated content on social networking platforms. The study emphasizes the potential of knowledge distillation to produce effective models suited to resource-limited settings, thereby advancing NLP and mental health diagnosis. The considerable reduction in model size without appreciable performance loss also lowers the computational resources needed for training and deployment, which broadens applicability, especially with readily available computational tools such as Google Colab. This study demonstrates the accessibility and usefulness of advanced NLP methods in practical, real-world applications.
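
The size reduction quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the rounded parameter counts given in the abstract (110 million for BERT base, 29 million for LastBERT); the function name `size_reduction_percent` is a hypothetical helper introduced only for illustration and is not taken from the paper.

```python
def size_reduction_percent(teacher_params: int, student_params: int) -> float:
    """Return how much smaller the student is than the teacher, in percent."""
    return 100.0 * (teacher_params - student_params) / teacher_params


# Rounded parameter counts as quoted in the abstract (assumed, not exact).
BERT_BASE_PARAMS = 110_000_000
LASTBERT_PARAMS = 29_000_000

reduction = size_reduction_percent(BERT_BASE_PARAMS, LASTBERT_PARAMS)
print(f"{reduction:.2f}%")  # prints "73.64%"
```

With these rounded counts, the result matches the 73.64% figure stated in the abstract.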

Distillation for Text Classification Task Based on BERT

Chinese Text Classification Using BERT and Flat-Lattice Transformer.

Comparative analysis of strategies of knowledge distillation on BERT for text matching

Mandarin Text-to-Speech Front-End with Lightweight Distilled Convolution Network

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

Research of Weibo Text Classification Based on Knowledge Distillation and Joint Model

Long Text Classification Based on BERT

NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application

How to Fine-Tune BERT for Text Classification?

Global Semantic Information Extraction Model for Chinese long text classification based on fine-tune BERT

[Knowledge about genotype-phenotype of the diseases should be coming into pediatrician's horizon].

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation

Single task fine-tune BERT for text classification

The Automatic Text Classification Method Based on BERT and Feature Union

Research on Text Classification Based on BERT-BiGRU Model

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

Can depth-adaptive BERT perform better on binary classification tasks

Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches