LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
2023-02-28
Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is how to build a set of efficient, open foundation language models that achieve the best possible performance at a range of inference budgets. Specifically, the goals of the paper include:

1. **Model Performance Optimization**: By training on trillions of tokens, the models aim to match or exceed existing top models on a range of benchmarks. For example, LLaMA-13B outperforms GPT-3 (175B parameters) on most benchmarks, while LLaMA-65B is competitive with top models such as Chinchilla-70B and PaLM-540B.
2. **Dataset Openness**: Unlike many models that rely on proprietary datasets, LLaMA is trained only on publicly available data, and the models are released to the research community.
3. **Inference Efficiency**: For a given performance target, preferring the model that is fastest at inference rather than the one that is fastest to train. Although it may be cheaper to train a larger model to reach a given level of performance, a smaller model trained for longer is more economical at inference. For example, while Hoffmann et al. recommend training a 10B-parameter model on 200B tokens, the authors find that the performance of a 7B-parameter model continues to improve even after 1T tokens.
4. **Multi-task Understanding and Generation Capability**: Evaluating the models on a variety of tasks, including common sense reasoning, closed-book question answering, reading comprehension, mathematical reasoning, and code generation. The results show that LLaMA performs strongly across these tasks in both zero-shot and few-shot settings.
5. **Instruction Fine-tuning**: Further improving performance on benchmarks such as Massive Multitask Language Understanding (MMLU) through brief instruction fine-tuning. Even without fine-tuning, LLaMA-65B can follow basic instructions, and its performance improves further after fine-tuning.
6. **Bias, Toxicity, and Misinformation Assessment**: Evaluating whether the content generated by the models contains bias, toxicity, or misinformation, in order to gauge their safety in practical applications. Across several standard benchmarks, the authors find that LLaMA performs better than other models in some respects but still needs improvement in others.

In summary, this paper aims to provide efficient, open, and high-performing foundation language models through the LLaMA model series, in order to advance research and applications in natural language processing.
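
The inference-efficiency trade-off described above (Hoffmann et al.'s 10B-parameter, 200B-token recommendation versus a 7B-parameter model trained on 1T tokens) can be made concrete with a rough calculation. The sketch below assumes the commonly used rules of thumb of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per generated token. These approximations, and every function and variable name here, are illustrative assumptions rather than figures or code from the paper.

```python
# Rough compute estimates (rules of thumb, NOT numbers from the paper):
#   training FLOPs            ~ 6 * N * D
#   inference FLOPs per token ~ 2 * N
# where N is the parameter count and D is the number of training tokens.


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute."""
    return 6.0 * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute for one generated token."""
    return 2.0 * params


def summarize(params: float, tokens: float) -> dict:
    """Collect the three quantities compared in the text."""
    return {
        "tokens_per_param": tokens / params,
        "training_flops": training_flops(params, tokens),
        "inference_flops_per_token": inference_flops_per_token(params),
    }


# The two configurations mentioned in the summary.
hoffmann_10b = summarize(params=10e9, tokens=200e9)
llama_7b = summarize(params=7e9, tokens=1e12)

if __name__ == "__main__":
    for name, stats in [("10B / 200B tokens", hoffmann_10b),
                        ("7B / 1T tokens", llama_7b)]:
        print(
            f"{name}: "
            f"{stats['tokens_per_param']:.0f} tokens/param, "
            f"{stats['training_flops']:.2e} training FLOPs, "
            f"{stats['inference_flops_per_token']:.2e} FLOPs per generated token"
        )
```

Under these assumptions the 7B configuration needs about 3.5 times the training compute of the 10B one, but each generated token costs about 0.7 times as much. That is the sense in which a smaller model trained for longer can be more economical once inference volume is large.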