Advancing Domain Adaptation of BERT by Learning Domain Term Semantics.

Jian Yang, Xinyu Hu, Weichun Huang, Hao Yuan, Yulong Shen, Gang Xiao
DOI: https://doi.org/10.1007/978-3-031-40292-0_2
2023-01-01
Abstract: Pre-trained Language Models, such as BERT, have recently advanced significantly, improving state-of-the-art performance across various Natural Language Processing (NLP) tasks. However, these models yield unsatisfactory results in domain scenarios, particularly in specialized fields such as biomedicine, where they fail to capture sufficient semantics of domain terms. To tackle this problem, we present a semantic learning method for BERT, focusing on the biomedical domain, to acquire and inject biomedical term semantics. Specifically, we first use BERT to encode the definitions of biomedical terms, acquiring their semantics and storing them as embeddings. Next, we design a contrastive learning task based on these embeddings to inject semantics, facilitating the transfer of domain term semantics from term embeddings to BERT’s vocabulary. This process narrows the semantic gap between the original vocabulary and domain terms in the embedding space. We evaluate our method on both general and biomedical NLP tasks, and experimental results demonstrate a significant improvement in BERT’s performance across all biomedical NLP tasks without affecting its performance on general tasks.
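The contrastive injection step described in the abstract can be illustrated with a small sketch. The abstract does not state the exact loss, so this example assumes an InfoNCE-style objective with cosine similarity and a temperature of 0.1; the function names and the toy three-dimensional vectors are hypothetical stand-ins for BERT definition embeddings and vocabulary-side representations, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, targets, temperature=0.1):
    """Mean InfoNCE loss (assumed objective, not confirmed by the abstract).

    anchors[i] is pulled toward targets[i]; every other target in the
    batch acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / temperature for t in targets]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy stand-ins for stored definition embeddings of two domain terms.
definition_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Vocabulary-side representations that sit close to the matching definition.
aligned_vocab = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
# The same representations paired with the wrong definitions.
misaligned_vocab = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

aligned_loss = info_nce_loss(aligned_vocab, definition_embeddings)
misaligned_loss = info_nce_loss(misaligned_vocab, definition_embeddings)
print(f"aligned: {aligned_loss:.6f}  misaligned: {misaligned_loss:.6f}")
```

Under these assumptions, the loss is small when each vocabulary-side vector lies near its own term's definition embedding and large when it lies near another term's, which is the sense in which minimizing it would narrow the semantic gap in the embedding space.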