Abstract: Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general.

What problem does this paper attempt to address?

This paper addresses the significant challenges that large language models (LLMs) face during training, deployment, and access. These challenges are mainly resource-intensive requirements, long training cycles, and scalability issues. Specifically:

1. **Resource-intensive requirements**: Large language models need substantial computational resources for training and inference, which local devices often cannot provide, especially devices with limited computing power (such as embedded systems).
2. **Long training cycles**: Because these models have a large number of parameters, training them usually takes a long time, which is an obstacle when models must be updated frequently or adapted quickly to new tasks.
3. **Scalability issues**: As application requirements grow, it is challenging to extend a model to cover a larger knowledge base while keeping it efficient and adaptable.

To address these problems, the paper proposes a hierarchical, distributed LLM architecture that aims to enhance the accessibility and deployability of LLMs in the following ways:

- **Hierarchical organization of knowledge**: The models are layered by language, application domain, and sub-domain. Each node represents a language model; upper-level nodes are larger, general models, and lower-level nodes are small, domain-specific models. This organization reduces redundancy and makes each application-specific language model more manageable.
- **Enhanced customization**: Users can select the LLM that fits their specific requirements and configure it further for their application, instead of using a single model with a large parameter count.
- **Efficient resource management**: Letting users select a language model that matches their hardware capabilities optimizes the use of computing power, memory, and battery capacity, prevents excessive resource consumption, and allows LLMs to operate effectively on a variety of devices.
- **Scalability**: As application requirements grow, users can upgrade their application-specific language model by selecting a higher-level model to handle more complex tasks, without switching to a completely new model architecture.

The paper also presents a case study in the healthcare field. It demonstrates how the architecture can be applied in a resource-constrained environment and illustrates how the hierarchical LLM architecture allows advanced medical research to be carried out even with limited computational resources.
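The hierarchy and the hardware-matched selection described above can be sketched as a small tree of model nodes. This is a minimal illustration under stated assumptions, not the paper's implementation: the `ModelNode` class, the `select_model` function, the model names, and the memory figures are all hypothetical. The sketch assumes that each node carries a single memory footprint and that a user wants the most general model on a given language/domain/sub-domain path that still fits the device.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelNode:
    """One node in the hierarchy; each node stands for one language model."""
    name: str
    memory_gb: float  # hypothetical memory footprint of the model
    children: Dict[str, "ModelNode"] = field(default_factory=dict)

    def add(self, key: str, node: "ModelNode") -> "ModelNode":
        """Attach a more specialized (smaller) model under this node."""
        self.children[key] = node
        return node


def select_model(root: ModelNode, path: List[str],
                 memory_budget_gb: float) -> Optional[ModelNode]:
    """Return the most general model along `path` that fits the budget.

    Walks from the root (general, large) toward the leaf (specific, small)
    and returns the first node whose footprint fits, or None if none does.
    """
    candidates = [root]
    node = root
    for key in path:
        node = node.children.get(key)
        if node is None:  # path does not exist in the hierarchy; stop here
            break
        candidates.append(node)
    for candidate in candidates:
        if candidate.memory_gb <= memory_budget_gb:
            return candidate
    return None


# Hypothetical hierarchy: language -> application domain -> sub-domain.
root = ModelNode("english-general", memory_gb=40.0)
healthcare = root.add("healthcare", ModelNode("english-healthcare", 8.0))
healthcare.add("cardiology", ModelNode("english-healthcare-cardiology", 2.0))

path = ["healthcare", "cardiology"]
embedded = select_model(root, path, memory_budget_gb=4.0)
laptop = select_model(root, path, memory_budget_gb=16.0)
print(embedded.name)  # english-healthcare-cardiology
print(laptop.name)    # english-healthcare
```

Under these assumptions, upgrading as requirements grow amounts to calling `select_model` again with a larger budget: the same path then resolves to a higher-level node, without changing the surrounding application code.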

LLMs as On-demand Customizable Service

Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

New Solutions on LLM Acceleration, Optimization, and Application

Institutional Platform for Secure Self-Service Large Language Model Exploration

Distributed Training of Large Language Models

LLeMpower: Understanding Disparities in the Control and Access of Large Language Models

ELMS: Elasticized Large Language Models On Mobile Devices

DisLLM: Distributed LLMs for Privacy Assurance in Resource-Constrained Environments

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models Humanize Technology

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

eFedLLM: Efficient LLM Inference Based on Federated Learning

BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Towards a Middleware for Large Language Models

DOLLmC: DevOps for Large Language model Customization

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach