Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs

Qiao Tan, Jingjing Zhang
DOI: https://doi.org/10.3390/electronics13081550
IF: 2.9
2024-04-20
Electronics
Abstract: In the realm of pretrained language models (PLMs), the exponential increase in the computational resources and training time required as model sizes expand presents a significant challenge. This paper proposes an approach named neighbor attention initialization (NAI) that expedites the training of larger PLMs by initializing their parameters from smaller PLMs. Our methodology hinges on the hypothesis that smaller PLMs, having already learned fundamental language structures and patterns, can provide a robust knowledge base for larger models; initializing the larger model so that it retains the smaller model's function is called function preserving. Specifically, we present a comprehensive framework for transferring learned features between transformer-based language models, mainly using the neighbor attention head and neighbor layer. We conducted experiments on GPT-2 and demonstrated that our method yields considerable savings in training costs compared to standard approaches, including learning from scratch and bert2BERT, indicating a notable improvement in training efficiency for large PLMs.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied