Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique
DOI: https://doi.org/10.1109/ACCESS.2024.3482107
2024-12-04
Abstract: Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters. They are typically trained on vast datasets, utilizing architectures based on transformer blocks. Present-day LLMs are multi-functional, capable of performing a range of tasks from text generation and language translation to question answering, as well as code generation and analysis. An advanced subset of these models, known as Multimodal Large Language Models (MLLMs), extends LLM capabilities to process and interpret multiple data modalities, including images, audio, and video. This enhancement empowers MLLMs with capabilities like video editing, image comprehension, and captioning for visual content. This survey provides a comprehensive overview of the recent advancements in LLMs. We begin by tracing the evolution of LLMs and subsequently delve into the advent and nuances of MLLMs. We analyze emerging state-of-the-art MLLMs, exploring their technical features, strengths, and limitations. Additionally, we present a comparative analysis of these models and discuss their challenges, potential limitations, and prospects for future development.
Machine Learning
What problem does this paper attempt to address?
This paper aims to provide a comprehensive review of the latest progress, technical features, advantages, and limitations of large language models (LLMs) and their multimodal extensions (MLLMs). Specifically, it sets out to:

1. **Construct a detailed timeline**: Trace the development of LLMs from the emergence of GPT and BERT in 2018 to the latest models at the time of the paper's publication.
2. **Extract key technologies and strategies**: Summarize the technologies and strategies that play a key role in the development of LLMs.
3. **Conduct architecture comparison**: Compare different types of LLMs (such as auto-encoding models, autoregressive models, and encoder-decoder models) and evaluate their performance metrics.
4. **Evaluate impacts and challenges**: Analyze the current impact of LLMs, and explore the challenges they face and their prospects for future development.

### Overview of the Main Content of the Paper

#### 1. Introduction

The paper first introduces the development background of LLMs, emphasizing the introduction of the Transformer architecture in 2017 as a turning point for natural language processing (NLP). It then outlines the main functions of LLMs, including text generation, logical reasoning, machine translation, summarization, and multimodal support.

#### 2. Background

This part explains in detail the architectures of modern LLMs, especially the Transformer architecture. The Transformer achieves parallel processing through the multi-head self-attention mechanism, overcoming the sequential processing limitations of traditional RNNs and LSTMs. Positional encoding is also introduced; it enables the model to use word-order information in the sequence.

#### 3. Different Types of LLMs

- **Auto-encoding models**: Models such as BERT are suitable for learning from context, but not for sequence generation.
- **Autoregressive models**: Models such as GPT are suitable for generation tasks, but lack future context information during generation.
- **Encoder-decoder models**: Models such as the Pangu series combine the advantages of the previous two types and are suitable for conditional generation tasks.

#### 4. Variational Auto-Encoder (VAE)

The variational auto-encoder creates a dynamic and highly adaptable latent space through probabilistic encoding. It can both reconstruct data and generate new data by sampling from that space, which makes VAEs well suited to data generation and augmentation tasks.

#### 5. Generative Adversarial Network (GAN)

The generative adversarial network consists of a generator and a discriminator, and generates realistic data through adversarial training. GANs are widely used in image generation, data augmentation, and anomaly detection.

#### 6. Previous Domain-Specific LLM Reviews

This part analyzes existing LLM reviews in chronological order, covering model architectures, datasets, pre-training, fine-tuning, benchmarking, challenges, multilingual models, and applications.

#### 7. Comparative Analysis of LLMs

This part compares different LLMs across various benchmarks to evaluate their language understanding and cognitive abilities. The MMLU (Massive Multitask Language Understanding) benchmark receives particular attention; it contains 57 tasks covering a wide range of topics from the humanities to STEM.

### Conclusion

Through a detailed timeline, technical comparison, and performance evaluation, the paper gives researchers and practitioners a comprehensive perspective on the latest progress and future directions of LLMs. This helps readers understand the development history of LLMs and provides a useful reference for future research and applications.
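
### Illustrative Sketch: Positional Encoding and Self-Attention

The Background section describes self-attention and positional encoding only in words, so a small example may help make the two ideas concrete. The sketch below is not taken from the paper. It assumes the sinusoidal encoding of the original Transformer, a single attention head, and identity projections (queries, keys, and values are all the input itself), whereas a real model learns separate projection matrices and runs several heads in parallel. The function names and the toy embeddings are hypothetical.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding (assumed variant): sin on even dims, cos on odd dims."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x (a simplification)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Every position scores every other position; no sequential dependency between rows.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, x)) for c in range(d)])
    return outputs, weights

# Toy input: three token embeddings of width 4 (hypothetical values).
embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
pe = positional_encoding(len(embeddings), 4)
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]
out, weights = self_attention(x)
```

Each row of `weights` is independent of the others, which is why the computation parallelizes across positions, in contrast to the step-by-step recurrence of RNNs and LSTMs. Without the added `pe` term, reordering the input tokens would simply reorder the outputs, so the model would have no access to word order.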