Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in generating code from natural-language instructions written by humans. We observed that, in edge-computing microservice models, the deployment latency optimization problem can be formulated as an NP-hard mathematical optimization problem. In practice, however, edge deployment strategies often require immediate updates, whereas human-engineered code tends to lag behind. To bridge this gap, we integrated LLMs into the decision-making process for microservice deployment. First, we constructed a private Retrieval Augmented Generation (RAG) database containing prior knowledge. We then employed carefully designed, step-by-step inductive instructions together with the chain-of-thought (CoT) technique to enable the LLM to learn, reason, reflect, and regenerate. We decomposed the microservice deployment latency optimization problem into a collection of granular sub-problems, each described in natural language, and progressively instructed the fine-tuned LLM to generate the corresponding code blocks. The generated code blocks then underwent integration and consistency assessment. For comparison, we also prompted the LLM to generate code without the RAG database. We executed the resulting code and the comparison algorithms under identical operating environments and simulation parameters and analyzed the results rigorously. Compared with traditional algorithms, our fine-tuned model reduced latency by 22.8% when handling surges in request flows, by 37.8% when managing complex microservice types, and by 39.5% when processing increased numbers of network nodes. Moreover, our approach achieved marked latency improvements over LLMs that do not use RAG and over reinforcement learning algorithms reported in other literature.
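The abstract states that deployment latency optimization becomes an NP-hard problem but does not give the formulation. As a purely illustrative sketch, assume a simplified model of our own (not the authors'): each microservice i, with request rate r_i and per-request work d_i, is placed on exactly one edge node j with processing speed s_j, user-to-node delay l_j and capacity c_j, and the objective is the rate-weighted latency sum of r_i (l_j + d_i / s_j) subject to each node's capacity. This is a capacity-constrained assignment problem in the generalized-assignment family, which is NP-hard in general. The hand-written baseline below (not output of the authors' LLM pipeline) contrasts exact exhaustive search with a simple greedy heuristic; all function names, parameters and numbers are hypothetical.

```python
from itertools import product

# Hypothetical toy model (an assumption, not the paper's formulation):
# service i has request rate rate[i] and per-request work demand[i];
# node j has speed speed[j], user-to-node delay delay[j], capacity capacity[j].


def total_latency(assign, rate, demand, speed, delay):
    """Rate-weighted latency: sum of rate[i] * (delay[j] + demand[i] / speed[j])."""
    return sum(rate[i] * (delay[j] + demand[i] / speed[j]) for i, j in enumerate(assign))


def feasible(assign, demand, capacity):
    """True if no node hosts more total demand than its capacity."""
    load = [0.0] * len(capacity)
    for i, j in enumerate(assign):
        load[j] += demand[i]
    return all(used <= cap for used, cap in zip(load, capacity))


def exhaustive(rate, demand, speed, delay, capacity):
    """Exact search over all len(speed) ** len(rate) placements (exponential)."""
    best, best_cost = None, float("inf")
    for assign in product(range(len(speed)), repeat=len(rate)):
        if feasible(assign, demand, capacity):
            cost = total_latency(assign, rate, demand, speed, delay)
            if cost < best_cost:
                best, best_cost = assign, cost
    return best, best_cost


def greedy(rate, demand, speed, delay, capacity):
    """Heuristic: busiest services first, each on the feasible node with the lowest per-request latency."""
    load = [0.0] * len(speed)
    assign = [None] * len(rate)
    for i in sorted(range(len(rate)), key=lambda k: -rate[k]):
        options = [j for j in range(len(speed)) if load[j] + demand[i] <= capacity[j]]
        if not options:
            return None, float("inf")  # greedy got stuck: no node has room left
        j = min(options, key=lambda n: delay[n] + demand[i] / speed[n])
        assign[i] = j
        load[j] += demand[i]
    return tuple(assign), total_latency(assign, rate, demand, speed, delay)


# Tiny instance: node 1 sits next to the users but is slow; node 0 is remote but fast.
rate = [10.0, 5.0, 2.0]
demand = [2.0, 1.0, 1.0]
speed = [2.0, 1.0]
delay = [1.5, 0.0]
capacity = [2.0, 2.0]

print(exhaustive(rate, demand, speed, delay, capacity))  # ((0, 1, 1), 32.0)
print(greedy(rate, demand, speed, delay, capacity))      # ((1, 0, 0), 34.0)
```

On this three-service instance the greedy rule is 6.25% worse than the exact optimum (34 vs. 32), while exhaustive search already enumerates 2^3 = 8 placements and would need 10^20 for 20 services on 10 nodes. That tension between exactness and scalability is what motivates heuristic, learned, or generated solvers for this class of problem.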
The use of LLMs also highlights the concept of symmetry, as the symmetrical structure of input-output relationships in microservice deployment models aligns with the LLM's inherent ability to process and generate balanced and optimized code. Symmetry in this context allows for more efficient resource allocation and reduces redundant operations, further enhancing the model's effectiveness. We believe that LLMs hold substantial potential in optimizing microservice deployment models.

An Empirical Analysis and Resource Footprint Study of Deploying Large Language Models on Edge Devices

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

A Review on Edge Large Language Models: Design, Execution, and Applications

On-Device Language Models: A Comprehensive Review

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Efficient Deployment of Large Language Model Across Cloud-Device Systems

EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models

On-Device LLMs for SMEs: Challenges and Opportunities

Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation

Resource Allocation for Stable LLM Training in Mobile Edge Computing

Generative AI on the Edge: Architecture and Performance Evaluation

Toward Democratized Generative AI in Next-Generation Mobile Edge Networks

Optimizing Microservice Deployment in Edge Computing with Large Language Models: Integrating Retrieval Augmented Generation and Chain of Thought Techniques

Deployment of Large Language Models to Control Mobile Robots at the Edge

Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

MELTing point: Mobile Evaluation of Language Transformers