CrowdLearning: A Decentralized Distributed Training Framework Based on Collectives of Trusted AIoT Devices

Ziqi Wang, Sicong Liu, Bin Guo, Zhiwen Yu, Daqing Zhang
DOI: https://doi.org/10.1109/tmc.2024.3427636
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: With the rise of the artificial intelligence of things (AIoT), integrating deep neural networks (DNNs) with mobile and embedded devices has become a growing trend. This integration enhances the ability of mobile IoT devices to collect and analyze data. The previous integration paradigm relied mainly on cloud-based training followed by deployment on terminal devices. However, the real world is dynamic, and this paradigm can lead to untimely model updates, decreased accuracy, and significant communication overhead. To maintain DNN accuracy, on-device training has therefore become a research hotspot. The bottleneck that restricts the efficiency of on-device training is usually the limited local sensing data and computing resources. To address this bottleneck, researchers have proposed federated learning. However, to protect data privacy, federated learning cannot share data or model details, which introduces new problems such as slow model convergence and reduced accuracy at convergence. By contrast, we consider that the real-world environment in which mobile devices operate is trustworthy (for example, personal devices in smart spaces or trusted devices from the same organization or company). This trustworthiness allows mobile devices to share data and model training tasks, improving training efficiency and model accuracy. Therefore, in this article, we propose CrowdLearning, a decentralized distributed training framework based on collectives of trusted AIoT devices. It consists of two collaborative modules: a heterogeneous-resource-aware task offloading module that addresses the training latency bottleneck, and a communication-efficient data reallocation module that determines when, how, and to whom data is transmitted, improving the efficiency and effectiveness of DNN training.
The experimental results show that CrowdLearning outperforms existing federated learning and distributed training baselines on devices in different scenarios. It achieves a 55.8% reduction in training latency and a 67.1% reduction in communication costs.