Towards Efficient Edge Learning for Large Models in Heterogeneous Resource-limited Environments.

Defang Liu, Zhibo Wang, Xiaoyi Pang, Yunan Sun, Jiahui Hu, Peng Sun, Yuke Hu
DOI: https://doi.org/10.1109/BIGCOM61073.2023.00038
2023-01-01
Abstract: Large models such as transformer models cannot be directly applied and trained at the network edge, since edge devices are heterogeneous and often equipped with limited resources. To collaboratively train a large model at the network edge, existing works assign sub-models of appropriate sizes to each edge device and adopt techniques such as knowledge distillation to aggregate their local models and update the large model. However, these methods assign sub-models in a coarse-grained manner and require a certain number of edge devices to train the large model locally, and thus cannot achieve efficient edge learning for large models in practice. In this paper, to make the most of edge devices so that they collaboratively train a large model faster and better, we propose a novel synchronous edge learning framework that achieves efficient large model training across heterogeneous, resource-limited edge devices. Specifically, to reduce waiting time during training while ensuring that every edge device can afford local training, we design a capability-aware local model customization mechanism that tailors fine-grained, personalized model structures to each edge device based on its memory and computing capabilities, so that all devices have similar local training times. Then, to efficiently integrate the local training results of heterogeneous edge devices, we propose a layer augmentation-based heterogeneous model aggregation mechanism that aligns local models and quickly and effectively obtains the global update for the global large model. In addition, we propose a momentum distillation-based model deployment mechanism that deploys the updated global large model to edge devices without losing learned knowledge. In this way, each edge device can fully contribute to training the global large model, thus achieving efficient edge learning for the large model.
Experimental results demonstrate that our framework outperforms state-of-the-art edge learning frameworks in both accuracy and efficiency in heterogeneous environments.
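The abstract does not say how the capability-aware customization is computed. As a hypothetical illustration of the general idea (per-device sub-models sized to memory and compute so that local training times line up), the sketch below greedily picks, for each device, the largest depth/width configuration that fits its memory budget and finishes one local round within a shared time target. The linear cost model, every constant, and the names `Device`, `SubModel`, `estimated_seconds`, and `customize` are assumptions made for this example, not the paper's mechanism.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cost model of the full global model (all numbers are assumptions).
MAX_DEPTH = 12                   # transformer layers in the global large model
WIDTHS = (0.25, 0.5, 0.75, 1.0)  # allowed fractions of the hidden width
LAYER_MEMORY_MB = 120.0          # training memory of one full-width layer
LAYER_GFLOP_PER_ROUND = 900.0    # forward + backward work of one full-width layer per round


@dataclass(frozen=True)
class Device:
    name: str
    memory_mb: float  # memory available for local training
    gflops: float     # sustained compute throughput in GFLOP/s


@dataclass(frozen=True)
class SubModel:
    depth: int    # number of layers kept from the global model
    width: float  # fraction of the hidden width kept in every layer

    def cost_units(self) -> float:
        # Assume per-layer parameters and FLOPs both scale with width squared.
        return self.depth * self.width ** 2


def estimated_seconds(model: SubModel, device: Device) -> float:
    # Assumed linear model: round time = total work / device throughput.
    return model.cost_units() * LAYER_GFLOP_PER_ROUND / device.gflops


def customize(device: Device, target_seconds: float) -> Optional[SubModel]:
    """Return the largest sub-model that fits the device's memory and finishes
    one local round within target_seconds, or None if nothing fits."""
    best: Optional[SubModel] = None
    for depth in range(1, MAX_DEPTH + 1):
        for width in WIDTHS:
            candidate = SubModel(depth, width)
            fits_memory = candidate.cost_units() * LAYER_MEMORY_MB <= device.memory_mb
            fits_time = estimated_seconds(candidate, device) <= target_seconds
            if fits_memory and fits_time:
                # Keep the candidate with the most work that still meets both limits.
                if best is None or candidate.cost_units() > best.cost_units():
                    best = candidate
    return best


if __name__ == "__main__":
    devices = [
        Device("phone", memory_mb=600, gflops=50),
        Device("jetson", memory_mb=2000, gflops=150),
        Device("laptop", memory_mb=4000, gflops=400),
    ]
    for dev in devices:
        plan = customize(dev, target_seconds=40.0)
        print(dev.name, plan, round(estimated_seconds(plan, dev), 2))
```

With a 40 s target, the phone gets 2 full-width layers, the Jetson-class device 11 layers at 3/4 width, and the laptop the whole 12-layer model (capped by the size of the global model), with estimated round times of 36, about 37.1, and 27 seconds. Searching over width as well as depth is what lets the Jetson-class device land closer to the target than a depth-only choice (6 full-width layers, 36 s) would; a device too weak for even the smallest configuration gets `None`.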