BoKA: Bayesian Optimization Based Knowledge Amalgamation for Multi-unknown-domain Text Classification
Linzhu Yu,Huan Li,Ke Chen,Lidan Shou
DOI: https://doi.org/10.1145/3637528.3671963
2024-01-01
Abstract:With breakthroughs in pretrained language models, a large number of finetuned models specialized in distinct domains have surfaced online. Yet, when faced with a fresh dataset covering multiple (sub)domains, their performance might degrade. Reusing these available finetuned models to train a new model is a more feasible solution than the finetuning method that demands extensive manual labeling. Knowledge Amalgamation (KA) is such a model reusing technique, which derives a new model (termed student model) by amalgamating those trained models (termed teacher models) tailored for distinct domains, bypassing the need for manual labeling. However, when the domains of text samples are unknown, selecting a number of appropriate teacher models (simply called a combination) for reuse becomes complicated. To learn an accurate student model, the classical KA method resorts to manual selections, a process both tedious and inefficient. Our study pioneers the automation of this combination selection process for KA in the fundamental text classification task, an area previously unexplored. In this paper, we introduce BoKA : an automatic knowledge amalgamation framework for identifying a combination that can learn a superior student model without human labor. Through the lens of Bayesian optimization, BoKA iteratively samples a subset of possible combinations for amalgamation instead of manual selections. Furthermore, we introduce a novel KA method tailored for text classification, which guides the student model using both soft and pseudo-hard labels from the teacher models when their predictions are closely aligned; in cases of significant disagreement, it uses randomly generated labels. Experiments on two public multi-domain datasets show that BoKA achieves remarkable efficiency by sampling only up to 5.5% of all potential combinations. Moreover, BoKA is capable of matching or even surpassing leading zero-shot large language models, despite having dozens of times fewer parameters.