Large Language Models: A Survey
Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
2024-02-09
Abstract:Large Language Models (LLMs) have drawn a lot of attention due to their
strong performance on a wide range of natural language tasks, since the release
of ChatGPT in November 2022. LLMs' ability of general-purpose language
understanding and generation is acquired by training billions of model's
parameters on massive amounts of text data, as predicted by scaling laws
\cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while
very recent, is evolving rapidly in many different ways. In this paper, we
review some of the most prominent LLMs, including three popular LLM families
(GPT, LLaMA, PaLM), and discuss their characteristics, contributions and
limitations. We also give an overview of techniques developed to build, and
augment LLMs. We then survey popular datasets prepared for LLM training,
fine-tuning, and evaluation, review widely used LLM evaluation metrics, and
compare the performance of several popular LLMs on a set of representative
benchmarks. Finally, we conclude the paper by discussing open challenges and
future research directions.
Artificial Intelligence,Computation and Language