Abstract: With breakthroughs in pretrained language models, a large number of finetuned models specialized in distinct domains have surfaced online. Yet when faced with a fresh dataset covering multiple (sub)domains, their performance might degrade. Reusing these available finetuned models to train a new model is more feasible than finetuning, which demands extensive manual labeling. Knowledge Amalgamation (KA) is one such model-reuse technique: it derives a new model (termed the student model) by amalgamating trained models (termed teacher models) tailored for distinct domains, bypassing the need for manual labeling. However, when the domains of the text samples are unknown, selecting a set of appropriate teacher models (simply called a combination) for reuse becomes complicated. To learn an accurate student model, the classical KA method resorts to manual selection, a process both tedious and inefficient. Our study pioneers the automation of this combination selection process for KA in the fundamental text classification task, an area previously unexplored. In this paper, we introduce BoKA: an automatic knowledge amalgamation framework for identifying a combination that can learn a superior student model without human labor. Through the lens of Bayesian optimization, BoKA iteratively samples a subset of possible combinations for amalgamation instead of relying on manual selection. Furthermore, we introduce a novel KA method tailored for text classification, which guides the student model using both soft and pseudo-hard labels from the teacher models when their predictions are closely aligned; in cases of significant disagreement, it uses randomly generated labels. Experiments on two public multi-domain datasets show that BoKA achieves remarkable efficiency by sampling only up to 5.5% of all potential combinations. Moreover, BoKA is capable of matching or even surpassing leading zero-shot large language models, despite having dozens of times fewer parameters.
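
The label rule mentioned in the abstract (soft and pseudo-hard labels when teachers agree, random labels when they disagree) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `amalgamate_labels`, the agreement test (all teachers share the same top-1 class), the averaging of teacher distributions, and the one-hot encoding of the random label are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def amalgamate_labels(teacher_probs, rng=None):
    """Build a training target for one text sample from several teachers.

    teacher_probs: list of per-teacher probability lists over the same classes.
    Returns (soft_label, hard_label, agreed).
    """
    rng = rng or random.Random()
    num_classes = len(teacher_probs[0])

    # Top-1 class predicted by each teacher (ties go to the lowest index).
    votes = [max(range(num_classes), key=p.__getitem__) for p in teacher_probs]

    # ASSUMPTION: "closely aligned" is modeled as identical top-1 predictions.
    agreed = len(set(votes)) == 1

    if agreed:
        # ASSUMPTION: the soft label is the mean of the teacher distributions.
        soft = [
            sum(p[c] for p in teacher_probs) / len(teacher_probs)
            for c in range(num_classes)
        ]
        # Pseudo-hard label: the class all teachers voted for.
        hard = votes[0]
    else:
        # Significant disagreement: fall back to a randomly generated label.
        hard = rng.randrange(num_classes)
        # ASSUMPTION: the random label is encoded one-hot as the soft target.
        soft = [1.0 if c == hard else 0.0 for c in range(num_classes)]

    return soft, hard, agreed


# Two hypothetical teachers over three classes.
aligned = amalgamate_labels([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
conflicting = amalgamate_labels(
    [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]], rng=random.Random(0)
)
```

Passing a seeded `random.Random` keeps the fallback label reproducible, which is convenient when comparing student models trained from different teacher combinations.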

Research of Weibo Text Classification Based on Knowledge Distillation and Joint Model

Cross-domain knowledge distillation for text classification

BoKA: Bayesian Optimization Based Knowledge Amalgamation for Multi-unknown-domain Text Classification

RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification

Chinese text classification by combining Chinese-BERTology-wwm and GCN

Research on Text Classification Based on BERT-BiGRU Model

KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation

Comparative analysis of strategies of knowledge distillation on BERT for text matching

Mandarin Text-to-Speech Front-End with Lightweight Distilled Convolution Network

A Chinese Text Classification Method Based on BERT and Convolutional Neural Network

Weibo Text Sentiment Analysis Based on BERT and Deep Learning

Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge

News text classification based on hybrid model of Bidirectional Encoder Representation from Transformers and Convolutional Neural Network

BERTtoCNN: Similarity-preserving enhanced knowledge distillation for stance detection

The Automatic Text Classification Method Based on BERT and Feature Union

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Feature-Enhanced Nonequilibrium Bidirectional Long Short-Term Memory Model for Chinese Text Classification

Deep learning-based text knowledge classification for whole-process engineering consulting standards

NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application