Abstract: Advanced Driver Assistance Systems (ADAS) enhance driving safety and convenience by providing auxiliary functions. However, traditional rule-based or learning-based ADAS lack commonsense-based environmental understanding and multi-sensor data fusion, which limits their performance in complex, dynamic environments. Multimodal large language models (MLLMs) can effectively integrate data from different modalities and have strong environmental perception and commonsense reasoning abilities, so they can offer more intelligent driver assistance services within Internet of Things (IoT) networks. In this paper, we propose a cloud-edge collaborative ADAS based on MLLMs, which deploys a smaller model, CogVLM2, at the edge and a larger model, ChatGPT-4o, in the cloud to provide collaborative driver assistance services over IoT networks. Specifically, we first re-annotate the BDD-X dataset and use it to fine-tune CogVLM2 with LoRA, while applying few-shot learning to ChatGPT-4o, to enhance both models' understanding and decision-making capabilities in traffic scenarios. We then formulate service latency, energy consumption, and quality of service (QoS) models for the cloud-edge collaborative ADAS in IoT networks and optimize a combined objective over these three metrics. Finally, we design an improved DDPG-based task offloading algorithm that introduces a multi-step reward mechanism and uses a diffusion model to generate exploration noise, in order to determine the optimal execution location (i.e., cloud, edge, or local) for each task. Experimental results show that both CogVLM2 and ChatGPT-4o can achieve basic ADAS functionality, and that fine-tuning and few-shot learning significantly improve their task success rates. Moreover, compared with other mainstream DRL-based task offloading algorithms, the improved DDPG task offloading algorithm achieves better latency, energy consumption, and QoS within IoT networks.
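The offloading decision described above trades off latency, energy, and QoS across three execution locations (local, edge, cloud). The following is a minimal, hypothetical sketch of that trade-off, not the paper's formulation. It assumes a transmission-plus-computation latency model, energy counted on the vehicle side only, and a linear weighted cost with placeholder weights (`w_t`, `w_e`, `w_q`) and made-up device parameters. `best_location` is a greedy one-shot baseline, whereas the paper learns this decision with an improved DDPG agent; the diffusion-generated exploration noise is not sketched. `n_step_target` shows one common reading of a "multi-step reward", an n-step TD target, and that interpretation is an assumption.

```python
import math
from dataclasses import dataclass

# All classes, parameters, and weights here are hypothetical placeholders,
# not values or definitions taken from the paper.
@dataclass
class Location:
    name: str
    compute_hz: float       # processing speed available to the task (cycles/s)
    uplink_bps: float       # vehicle-to-location rate; math.inf means no upload (local)
    tx_power_w: float       # vehicle radio power while uploading
    compute_power_w: float  # vehicle power while computing (0 when the task runs remotely)
    qos: float              # assumed answer-quality score in [0, 1] at this location

@dataclass
class Task:
    input_bits: float  # size of the sensor/camera payload to upload
    cycles: float      # computation required for inference

def tx_time(task: Task, loc: Location) -> float:
    # No transmission delay when the task runs locally.
    return 0.0 if math.isinf(loc.uplink_bps) else task.input_bits / loc.uplink_bps

def latency(task: Task, loc: Location) -> float:
    # Transmission delay plus computation delay.
    return tx_time(task, loc) + task.cycles / loc.compute_hz

def energy(task: Task, loc: Location) -> float:
    # Vehicle-side energy: radio energy for uploading plus local compute energy.
    return (loc.tx_power_w * tx_time(task, loc)
            + loc.compute_power_w * task.cycles / loc.compute_hz)

def cost(task: Task, loc: Location, w_t=1.0, w_e=0.1, w_q=1.0) -> float:
    # Lower is better: penalize latency and energy, reward QoS (assumed linear form).
    return w_t * latency(task, loc) + w_e * energy(task, loc) - w_q * loc.qos

def best_location(task: Task, locations, **weights) -> Location:
    # Greedy baseline: pick the location with the smallest weighted cost.
    return min(locations, key=lambda loc: cost(task, loc, **weights))

def n_step_target(rewards, gamma: float, bootstrap_value: float) -> float:
    # n-step return: sum_k gamma^k * r_{t+k} + gamma^n * Q(s_{t+n}, a_{t+n}).
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return g + (gamma ** len(rewards)) * bootstrap_value

# Usage with placeholder numbers.
locations = [
    Location("local", 1e9, math.inf, 0.0, 10.0, 0.5),
    Location("edge", 1e10, 8e7, 1.0, 0.0, 0.7),
    Location("cloud", 1e11, 1e7, 1.0, 0.0, 0.9),
]
task = Task(input_bits=8e6, cycles=2e9)
print(best_location(task, locations).name)        # edge
print(n_step_target([1.0, 0.5, 0.25], 0.9, 2.0))  # ≈ 3.1105
```

With these placeholder numbers the edge wins: local execution is slow and energy-hungry (cost 3.5), while the cloud's higher QoS is offset by its 0.8 s upload over a slower link (cost about 0.0, versus about -0.39 for the edge). Changing the weights or link rates shifts the decision, which is the kind of state-dependent trade-off a learned DRL policy is meant to capture.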

Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Networks: An Active Inference Approach

Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Efficient and Economic Large Language Model Inference with Attention Offloading

CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference

Efficient Deployment of Large Language Model Across Cloud-Device Systems

UELLM: A Unified and Efficient Approach for LLM Inference Serving

Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance

Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks

AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Collaborative Inference for Large Models with Task Offloading and Early Exiting

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

A Cloud-Edge Collaborative Architecture for Multimodal LLMs-Based Advanced Driver Assistance Systems in IoT Networks

Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

User Association and Resource Allocation in Large Language Model Based Mobile Edge Computing System over 6G Wireless Communications

PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services

A Survey on Efficient Inference for Large Language Models

CoLLM: A Collaborative LLM Inference Framework for Resource-Constrained Devices