Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G

Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang
2024-05-06
Abstract: In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, costly communication overheads, severe computing resource consumption, and data heterogeneity across network nodes. These obstacles hinder the applications of ubiquitous computing capabilities of 6G networks, especially in light of the trend of escalating model parameters and training data volumes. To address these challenges effectively, this paper introduces "Snake Learning", a cost-effective distributed learning framework. Specifically, Snake Learning respects the heterogeneity of inter-node computing capability and local data distribution in 6G networks, and sequentially trains the designated part of model layers on individual nodes. This layer-by-layer serpentine update mechanism contributes to significantly reducing the requirements for storage, memory and communication during the model training phase, and demonstrates superior adaptability and efficiency for both Computer Vision (CV) training and Large Language Model (LLM) fine-tuning tasks across homogeneous and heterogeneous data distributions.
Networking and Internet Architecture, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the challenges of integrating artificial intelligence (AI) into 6G networks. In dynamic network environments, existing distributed learning frameworks such as federated learning and split learning face high synchronization requirements, large communication overhead, heavy computational resource consumption, and data heterogeneity across nodes. The paper proposes a new distributed learning framework called "Snake Learning," which respects the heterogeneity of computing capability and local data distribution among nodes in 6G networks. It trains the model layer by layer, with each node updating only a designated part of the layers, which reduces storage, memory, and communication demands and improves adaptability and efficiency for computer vision (CV) training and large language model (LLM) fine-tuning under both homogeneous and heterogeneous data distributions. Like the classic game "Snake," the framework updates the model progressively as it passes across nodes; each node can run multiple independent iterations according to its own computing capability, without real-time synchronization. Compared with federated learning and split learning, Snake Learning is better suited to resource availability that fluctuates with traffic demand, and it supports both client-server (CS) and peer-to-peer (P2P) modes for distributing AI tasks among multiple nodes. The key challenges it targets are dependence on communication synchronization; heterogeneous, dynamic, and limited resources; and heterogeneous data distributions. Through partial model updates and quantization strategies, Snake Learning reduces resource consumption and adapts to resource-constrained nodes, making it especially suitable for 6G network environments.
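To make the layer-by-layer, node-by-node idea concrete, here is a minimal toy sketch. It is not the paper's algorithm or code: the "model" is a chain of scalar multipliers standing in for layers, the round-robin assignment of one layer per node is an assumption, and every name (`train_layer`, `snake_round`, `local_steps`, the `nodes` list) is hypothetical. It only illustrates two properties described above: a node updates one designated layer while the rest stay frozen, and each node runs as many local iterations as its own compute budget allows before handing the model on.

```python
def predict(weights, x):
    """Forward pass of a toy 'deep' model: each layer is one scalar multiplier."""
    out = x
    for w in weights:
        out *= w
    return out


def mse(weights, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def train_layer(weights, layer, data, local_steps, lr=0.01):
    """Gradient descent on weights[layer] only, using one node's local data.

    All other layers are frozen. Returns a new list; the input is not modified.
    """
    weights = list(weights)
    for _ in range(local_steps):
        # Product of the frozen layers (everything except the designated one).
        others = 1.0
        for i, w in enumerate(weights):
            if i != layer:
                others *= w
        # d(MSE)/d(weights[layer]) for pred = weights[layer] * others * x
        grad = sum(
            2 * (predict(weights, x) - y) * x * others for x, y in data
        ) / len(data)
        weights[layer] -= lr * grad
    return weights


def snake_round(weights, nodes, start_layer=0):
    """Pass the model from node to node; each node trains the next layer in turn.

    Nodes run one after another, so no cross-node synchronization is needed.
    Returns the updated weights and the layer index the next round starts from.
    """
    layer = start_layer
    for node in nodes:
        weights = train_layer(weights, layer, node["data"], node["local_steps"])
        layer = (layer + 1) % len(weights)
    return weights, layer


# Hypothetical setup: three nodes with different compute budgets and
# non-overlapping input ranges (heterogeneous local data), all drawn from y = 2x.
nodes = [
    {"local_steps": 5, "data": [(i / 10, 2 * i / 10) for i in range(1, 11)]},
    {"local_steps": 20, "data": [(1 + i / 10, 2 * (1 + i / 10)) for i in range(11)]},
    {"local_steps": 10, "data": [(-i / 10, -2 * i / 10) for i in range(1, 11)]},
]
all_data = [pair for node in nodes for pair in node["data"]]

initial = [1.0, 1.0, 1.0]
trained, next_layer = snake_round(initial, nodes)
print("loss before:", mse(initial, all_data))
print("loss after one round:", mse(trained, all_data))
```

In this sketch only a single layer's parameters change at each node, which is the property the summary credits with lowering per-node memory and communication cost. A real system would also have to decide which layers go to which node and how the model is transferred or quantized, none of which is modeled here.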