Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang
2024-04-27
Abstract: Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices because of the inherent conflict between their limited computing resources and the intensive workload of training. Given the constraints of on-device training, traditional approaches usually resort to aggregating training data and sending it to a remote cloud for centralized training. Nevertheless, this approach is neither sustainable, since it strains long-range backhaul transmission and energy-consuming datacenters, nor private, since it shares users' raw data with remote infrastructures. To address these challenges, we observe that prevalent edge environments usually contain a diverse collection of trusted edge devices with untapped idle resources, which can be leveraged to accelerate edge training. Motivated by this, in this article we propose collaborative edge training, a novel training mechanism that orchestrates a group of trusted edge devices as a resource pool for expedited, sustainable big AI model training at the edge. As an initial step, we present a comprehensive framework for building collaborative edge training systems and analyze in depth its merits and sustainable scheduling choices following its workflow. To further investigate the impact of its parallelism design, we present an empirical case study of four typical parallelisms from the perspective of energy demand on realistic testbeds. Finally, we discuss open challenges for sustainable collaborative edge training to point to future directions of edge-centric big AI model training.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture
What problem does this paper attempt to address?
The paper addresses the challenges of training large-scale artificial intelligence (Big AI) models on edge devices in wireless networks. Specifically:

1. **Problems with traditional methods**: Centralized cloud training can accelerate training with remote computing resources, but its reliance on remote datacenters strains long-range backhaul links and energy-hungry infrastructure, increasing carbon emissions, and it raises privacy concerns because users' raw data must leave their devices. On-device local training protects privacy, but the limited computing resources of a single edge device make big model training difficult to support. Federated learning (both centralized and decentralized) distributes training across multiple edge devices but still suffers from insufficient per-device resources.
2. **Proposed new mechanism**: The paper proposes "collaborative edge training," which pools the underutilized resources of idle, trusted devices in a wireless network environment into a shared resource pool, enabling fast, sustainable, and privacy-preserving big AI model training. The mechanism is designed to overcome the limitations of existing solutions in performance, sustainability, and privacy protection.
3. **Research contributions**:
   - A comparative analysis of the advantages and disadvantages of collaborative edge training versus other common edge model training approaches;
   - A comprehensive framework covering the entire lifecycle, from participant selection to model training;
   - An experimental study of how different forms of parallelism affect energy consumption during training;
   - A discussion of the key challenges that must be addressed for the future development of collaborative edge training.

In summary, the paper explores how collaboration among trusted edge devices can make big AI model training at the edge faster, more sustainable, and more privacy-preserving.
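To make the resource-pool idea more concrete, the following is a minimal Python sketch, assuming synchronous data parallelism (only one possible parallelism; this summary does not enumerate the paper's four). It splits a global mini-batch across trusted devices in proportion to their training throughput, then estimates per-iteration wall time and energy. The device names, throughput and power figures, and the helpers `split_batch` and `iteration_cost` are hypothetical. The energy model (active power while computing, idle power while waiting, synchronization counted as idle time) is a deliberate simplification, not the paper's measurement methodology.

```python
from dataclasses import dataclass


@dataclass
class EdgeDevice:
    name: str
    samples_per_sec: float  # hypothetical measured training throughput
    active_power_w: float   # hypothetical power draw while computing (watts)
    idle_power_w: float     # hypothetical power draw while waiting (watts)


def split_batch(devices, global_batch):
    """Split a global mini-batch in proportion to each device's throughput.

    Uses largest-remainder rounding so the shares always sum to global_batch.
    """
    total = sum(d.samples_per_sec for d in devices)
    raw = [global_batch * d.samples_per_sec / total for d in devices]
    shares = [int(r) for r in raw]
    # Hand the leftover samples to devices with the largest fractional parts.
    leftover = global_batch - sum(shares)
    order = sorted(range(len(devices)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def iteration_cost(devices, shares, sync_sec):
    """Estimate wall time and energy for one synchronous data-parallel step.

    Simplifying assumption: every device waits for the slowest one, then all
    devices spend sync_sec on gradient synchronization at idle power.
    """
    compute = [s / d.samples_per_sec for d, s in zip(devices, shares)]
    step = max(compute) + sync_sec
    energy = 0.0
    for d, t in zip(devices, compute):
        energy += d.active_power_w * t + d.idle_power_w * (step - t)
    return step, energy


if __name__ == "__main__":
    pool = [
        EdgeDevice("phone", 40, 6.0, 1.0),
        EdgeDevice("laptop", 120, 30.0, 5.0),
        EdgeDevice("tablet", 40, 8.0, 1.5),
    ]
    shares = split_batch(pool, 64)
    step, energy = iteration_cost(pool, shares, sync_sec=0.05)
    print(shares, round(step, 3), round(energy, 3))
```

Proportional splitting keeps faster devices from idling while slower ones finish, which is the basic intuition behind treating heterogeneous devices as one pool. A real scheduler would also have to account for wireless bandwidth, memory limits, and the communication cost of the chosen parallelism.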