Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu
2024-06-28
Abstract: This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally compressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived from the original without the need for retraining from scratch. We detail the implementation of PLRD, which strategically decreases the tensor ranks, thus optimizing the trade-off between model performance and resource usage. The efficacy of PLRD is demonstrated through extensive experiments showing that models trained with the PLRD method on only 1B tokens maintain performance comparable to traditionally trained models while using 0.1% of the tokens. The versatility of PLRD is highlighted by its ability to generate multiple model sizes from a single foundational model, adapting fluidly to varying computational and memory budgets. Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs, making advanced AI more feasible on diverse platforms.
Computation and Language, Artificial Intelligence, Machine Learning
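The idea of compressing a weight matrix with progressively lower ranks can be illustrated with a truncated singular value decomposition (SVD), which splits a matrix W into two thin factors A and B. The sketch below assumes SVD as the decomposition and uses hypothetical helper names (`low_rank_factors`, `progressive_decomposition`). It also omits any training the paper performs between rank reductions, so it shows the general technique rather than the authors' exact procedure.

```python
import numpy as np


def low_rank_factors(weight: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor `weight` (d_out x d_in) into A (d_out x rank) and B (rank x d_in).

    A @ B is the best rank-`rank` approximation of `weight` in the
    Frobenius norm (Eckart-Young theorem).
    """
    if not 1 <= rank <= min(weight.shape):
        raise ValueError(f"rank must be in [1, {min(weight.shape)}], got {rank}")
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    a = u[:, :rank] * root_s          # scale each kept column of U
    b = root_s[:, None] * vt[:rank]   # scale each kept row of V^T
    return a, b


def progressive_decomposition(weight: np.ndarray, ranks: list[int]):
    """Hypothetical helper: one (rank, A, B) triple per rank, largest rank first.

    Each smaller member is decomposed from the previous member's
    reconstruction rather than from the original weight. Any fine-tuning
    between steps is deliberately omitted in this sketch.
    """
    family = []
    current = weight
    for rank in sorted(ranks, reverse=True):
        a, b = low_rank_factors(current, rank)
        family.append((rank, a, b))
        current = a @ b  # the next, smaller member starts from this one
    return family


rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))
for rank, a, b in progressive_decomposition(w, [32, 16, 8]):
    rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
    print(rank, a.shape, b.shape, a.size + b.size, f"{rel_err:.3f}")
```

Because the best rank-r approximation of a rank-r' reconstruction (r < r') equals the rank-r truncation of the original W, this training-free chain yields the same product AB as truncating W directly. In a progressive scheme, any advantage would therefore have to come from the training carried out between rank reductions.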
What problem does this paper attempt to address?
The paper addresses the high computational and memory cost that large language models (LLMs) incur during training and deployment. Specifically:

1. **High computational resource and memory consumption**: Because of their enormous parameter counts, existing LLMs require substantial compute and memory. For example, GPT-3 has 175 billion parameters and requires 320 GB of storage, which puts it out of reach of most consumer-grade devices.
2. **Limited model variants**: To serve different compute budgets and application scenarios, LLMs are typically released in several sizes (for example, Llama2 in 7-billion-, 13-billion-, and 70-billion-parameter versions). Each variant is trained from scratch, so only a few variants exist and each one is very expensive to train.
3. **Lack of intermediate-sized models**: A user whose compute budget falls between two variants must settle for the smaller one, which may not be the best fit.

To address these issues, the paper proposes Progressive Low Rank Decomposition (PLRD), which generates multiple models of different sizes from a single pre-trained base model without retraining from scratch. PLRD compresses the model by progressively reducing the rank of its tensors, substantially cutting computational overhead and energy consumption while preserving model performance. Experiments show that models trained with PLRD on only 1 billion tokens perform comparably to traditionally trained models while using 0.1% of the training tokens.
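How low-rank factorization fills the gaps between released model sizes can be made concrete with parameter counting. Assuming each dense d_out × d_in weight matrix is replaced by two factors of shape d_out × r and r × d_in, the layer's parameter count is r(d_out + d_in), which grows linearly with the rank r; any r below the break-even point therefore gives a smaller layer, in fine-grained steps. The helper names and the 4096 × 4096 shape below are illustrative assumptions, not values from the paper, and whole-model counts would also depend on layers left undecomposed (such as embeddings), which this sketch ignores.

```python
def dense_params(d_out: int, d_in: int) -> int:
    """Parameters in an unfactored d_out x d_in weight matrix."""
    return d_out * d_in


def factored_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters after replacing the matrix with A (d_out x rank) and B (rank x d_in)."""
    return rank * (d_out + d_in)


def break_even_rank(d_out: int, d_in: int) -> int:
    """Largest rank whose factorization is strictly smaller than the dense matrix."""
    return (d_out * d_in - 1) // (d_out + d_in)


# Illustrative 4096 x 4096 projection (assumed shape, not taken from the paper).
d = 4096
print("break-even rank:", break_even_rank(d, d))  # prints "break-even rank: 2047"
for rank in (1536, 1024, 512):
    n = factored_params(d, d, rank)
    print(rank, n, n / dense_params(d, d))
# prints:
# 1536 12582912 0.75
# 1024 8388608 0.5
# 512 4194304 0.25
```

Ranks of 2048 or more would not shrink this particular layer at all, which is why a rank schedule for such a layer would need to start below the break-even point.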