Continual Post-Training of Language Models

Zixuan Ke, Haowei Lin, Yijia Shao, Tatsuya Konishi, Gyuhak Kim, Bing Liu
2023-01-01
Abstract: Language models (LMs) have been instrumental in the recent rapid advance of natural language processing. Existing research has shown that post-training an LM, i.e., adapting it using an unlabeled topical or domain corpus, can improve end-task performance in that domain. This paper proposes a novel method to continually post-train an LM with a sequence of unlabeled domain corpora, adapting the LM to each of these domains to improve their end-task performance. The key novelty of the method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer, improving end-task performance compared to post-training each domain separately. Empirical evaluation demonstrates the effectiveness of the proposed method.
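The abstract does not spell out how the soft mask controls the update. As a minimal sketch, assuming each parameter already has an importance score in [0, 1] for previously learned knowledge (how those scores are computed, e.g. by the proposed proxy, is not modeled here), a soft mask can scale each gradient entry by (1 - importance) before the optimizer step. Highly important parameters then change little, while unimportant ones are updated freely. The function names, the plain-SGD step, and the element-wise max rule for accumulating importance across domains are illustrative assumptions, not the paper's stated implementation.

```python
from typing import List


def soft_mask_gradient(grad: List[float], importance: List[float]) -> List[float]:
    """Scale each gradient entry by (1 - importance).

    Importance is clamped to [0, 1]: 1 fully blocks the update for that
    parameter, 0 leaves the gradient unchanged, values in between shrink it.
    """
    if len(grad) != len(importance):
        raise ValueError("grad and importance must have the same length")
    masked = []
    for g, s in zip(grad, importance):
        s = min(max(s, 0.0), 1.0)  # keep the mask value in [0, 1]
        masked.append(g * (1.0 - s))
    return masked


def accumulate_importance(prev: List[float], new: List[float]) -> List[float]:
    """Hypothetical accumulation rule: element-wise max over domains seen so far,
    so a parameter stays protected if any earlier domain found it important."""
    return [max(p, n) for p, n in zip(prev, new)]


def sgd_step(params: List[float], grad: List[float],
             importance: List[float], lr: float = 0.1) -> List[float]:
    """One plain gradient-descent step using the soft-masked gradient."""
    masked = soft_mask_gradient(grad, importance)
    return [p - lr * m for p, m in zip(params, masked)]


if __name__ == "__main__":
    # Three parameters with the same raw gradient but different importance:
    # unprotected, half-protected, fully protected.
    params = [1.0, 1.0, 1.0]
    grad = [1.0, 1.0, 1.0]
    importance = [0.0, 0.5, 1.0]
    print(sgd_step(params, grad, importance, lr=0.1))
```

With equal raw gradients, the unprotected parameter moves by the full step (1.0 to 0.9), the half-important one moves half as far (to 0.95), and the fully important one stays at 1.0. The design choice a soft mask makes, compared with a hard 0/1 mask, is that partially important parameters can still be adjusted, which leaves room for knowledge transfer across domains.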