LLMs as On-demand Customizable Service

Souvika Sarkar, Mohammad Fakhruddin Babar, Monowar Hasan, Shubhra Kanti Karmaker
2024-01-30
Abstract: Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the major challenges that large language models (LLMs) face in training, deployment, and access: resource-intensive requirements, long training cycles, and limited scalability. Specifically:

1. **Resource-intensive requirements**: LLMs need substantial computational resources for training and inference, which are often unavailable on local devices, especially devices with limited computing power such as embedded systems.
2. **Long training cycles**: Because these models have very large numbers of parameters, training them usually takes a long time, which hinders frequent model updates and rapid adaptation to new tasks.
3. **Scalability issues**: As application requirements grow, it is difficult to expand a model's capabilities to cover a larger knowledge base while keeping it efficient and adaptable.

To address these problems, the paper proposes a hierarchical, distributed LLM architecture that improves the accessibility and deployability of LLMs in the following ways:

- **Hierarchical organization of knowledge**: Models are organized in layers by language, application domain, and sub-domain. Each node is a language model: upper-level nodes are larger, general-purpose models, and lower-level nodes are smaller, domain-specific models. This organization reduces redundancy and makes each application-specific language model more manageable.
- **Enhanced customization**: Instead of relying on a single model with a very large number of parameters, users select the LLM that fits their specific requirements and can further configure it for their application.
- **Efficient resource management**: Users choose a language model that matches their hardware capabilities. This optimizes the use of computing power, memory, and battery capacity, avoids over-consuming resources, and lets LLMs run effectively on a wide range of devices.
- **Scalability**: As application requirements grow, users can upgrade their application-specific language model by selecting a higher-level model to handle more complex tasks, without switching to an entirely new model architecture.

The paper also presents a case study in healthcare that demonstrates the architecture in a resource-constrained environment and illustrates how the hierarchical LLM architecture makes advanced medical research possible even with limited computational resources.
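To make the selection and upgrade ideas concrete, here is a minimal Python sketch. It assumes the hierarchy can be modeled as a tree in which each node records a single memory footprint and models get smaller toward the leaves. The class `LMNode`, the functions `path_to`, `select_model`, and `upgrade`, the node names, and the memory figures are all hypothetical illustrations, not the paper's implementation or data. Selection walks from the root toward the user's domain and returns the most general model that fits the device's memory budget; upgrading climbs to a larger ancestor when more resources become available.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LMNode:
    """One language model in a hypothetical model hierarchy."""
    name: str
    memory_gb: float  # hypothetical memory needed to run this model
    # Links are excluded from repr/eq to avoid walking the whole tree.
    parent: Optional[LMNode] = field(default=None, repr=False, compare=False)
    children: dict = field(default_factory=dict, repr=False, compare=False)

    def add_child(self, key: str, name: str, memory_gb: float) -> LMNode:
        """Attach a more specific (sub-)domain model under this node."""
        child = LMNode(name, memory_gb, parent=self)
        self.children[key] = child
        return child


def path_to(root: LMNode, domain_path: list) -> list:
    """Nodes from the root down to the deepest node matching domain_path."""
    nodes = [root]
    node = root
    for key in domain_path:
        if key not in node.children:
            break  # no more specific model exists; stop at the deepest match
        node = node.children[key]
        nodes.append(node)
    return nodes


def select_model(root: LMNode, domain_path: list,
                 memory_budget_gb: float) -> Optional[LMNode]:
    """Most general model on the domain path that fits the memory budget."""
    for node in path_to(root, domain_path):  # ordered general -> specific
        if node.memory_gb <= memory_budget_gb:
            return node
    return None  # even the most specific model is too large for this device


def upgrade(node: LMNode, memory_budget_gb: float) -> LMNode:
    """Climb to the most general ancestor that fits a (larger) budget.

    Assumes sizes grow toward the root, so the climb stops at the first
    ancestor that does not fit. Stays on the current model otherwise.
    """
    best = node
    parent = node.parent
    while parent is not None and parent.memory_gb <= memory_budget_gb:
        best = parent
        parent = parent.parent
    return best


# Illustrative tree; the sizes are made-up placeholders.
root = LMNode("general-english", memory_gb=40.0)
healthcare = root.add_child("healthcare", "english-healthcare", memory_gb=8.0)
radiology = healthcare.add_child("radiology", "english-healthcare-radiology",
                                 memory_gb=2.0)

chosen = select_model(root, ["healthcare", "radiology"], memory_budget_gb=4.0)
print(chosen.name)  # -> english-healthcare-radiology
print(upgrade(chosen, memory_budget_gb=16.0).name)  # -> english-healthcare
```

With a 4 GB budget the sketch settles on the small radiology model; raising the budget to 16 GB moves it up to the broader healthcare model, mirroring the scalability point above. Preferring the most general model that fits is only one possible policy; a real deployment might also weigh accuracy, latency, or battery use.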