Model Sparsification for Communication-Efficient Multi-Party Learning Via Contrastive Distillation in Image Classification

Kai-Yuan Feng, Maoguo Gong, Ke Pan, Hongyu Zhao, Yue Wu, Kai Sheng
DOI: https://doi.org/10.1109/tetci.2023.3268713
2024-01-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: Multi-party learning allows all parties to train a joint model under legal and practical constraints without transmitting private data. Existing research can perform multi-party learning tasks on homogeneous data with deep networks. However, because data are heterogeneous across parties and computational resources and budgets are limited, traditional approaches may lose effectiveness and cannot provide a personalized network for each party. In addition, building an adaptive model from the private data of different parties while reducing the computational cost and communication bandwidth of local models remains challenging. To address these challenges, we apply a model sparsification strategy to multi-party learning. Model sparsification not only reduces the computational overhead on local edge devices and the cost of communication between multi-party models, but also yields private, personalized networks that reflect the heterogeneity of local data. During training, we use contrastive distillation to reduce the distance between the local and global models and to maintain the performance of the aggregated model on heterogeneous data. In brief, we develop an adaptive multi-party learning framework based on contrastive distillation that significantly reduces the communication cost of learning, improves the effectiveness of the aggregated model on heterogeneous and unbalanced local data, and is easy to deploy on resource-limited edge devices. Finally, we verify the effectiveness of the framework through experiments on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets in different scenarios.