YOGA: Adaptive Layer-Wise Model Aggregation for Decentralized Federated Learning
Jun Liu, Jianchun Liu, Hongli Xu, Yunming Liao, Zhiyuan Wang, Qianpiao Ma
DOI: https://doi.org/10.1109/tnet.2023.3329005
2024-01-01
Abstract: Traditional Federated Learning (FL) is a promising paradigm that enables massive numbers of edge clients to collaboratively train deep neural network (DNN) models without exposing raw data to the parameter server (PS). To avoid the bottleneck at the PS, Decentralized Federated Learning (DFL), which relies on peer-to-peer (P2P) communication and does not maintain a global model, has been proposed. Nevertheless, DFL still faces two critical challenges that hinder efficient model training: limited communication bandwidth and non-independent and identically distributed (non-IID) local data. Existing works commonly assume full model aggregation at periodic intervals, i.e., clients periodically collect entire models from peers. To reduce the communication cost, these methods allow clients to collect models from only selected peers, but this often causes a significant degradation of model accuracy on non-IID data. Alternatively, layer-wise aggregation has been proposed to reduce communication overhead under the PS architecture, but its potential in DFL remains largely unexplored. To this end, we propose YOGA, an efficient DFL framework that adaptively performs layer-wise model aggregation and training. Specifically, YOGA first ranks the layers of the model according to their learning speed and layer-wise divergence. Combining the layer ranking with peers’ status information (i.e., data distribution and communication capability), we propose the max-match (MM) algorithm to generate a proper layer-wise model aggregation policy for the clients. Extensive experiments on DNN models and datasets show that, compared with the baselines, YOGA reduces communication cost by about 45% without sacrificing model performance, and provides a 1.53-3.5$\times$ speedup on the physical platform.
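To make the idea of layer-wise aggregation concrete, the following is a minimal sketch of the general mechanism only, not YOGA's actual procedure. It assumes a model is a mapping from layer names to flat weight lists, and that a hypothetical `plan` already states which peers each layer is pulled from. How that plan would be produced (layer ranking and the max-match algorithm) is not specified in the abstract and is not modeled here. The function name `aggregate_layers`, the plain unweighted average, and the toy values are all illustrative assumptions.

```python
from typing import Dict, List

# Assumption: a model is a dict mapping layer name -> flat list of weights.
Model = Dict[str, List[float]]


def aggregate_layers(
    local: Model,
    peers: Dict[str, Model],
    plan: Dict[str, List[str]],
) -> Model:
    """Average each layer with the same layer from the peers listed in `plan`.

    Layers that do not appear in `plan` (or have an empty peer list) are
    kept as-is, so nothing has to be transferred for them.
    """
    merged: Model = {}
    for layer, weights in local.items():
        sources = [peers[peer_id][layer] for peer_id in plan.get(layer, [])]
        if not sources:
            merged[layer] = list(weights)  # no peers selected: keep local copy
            continue
        stacked = [weights] + sources
        # Element-wise unweighted mean over the local copy and the pulled copies.
        merged[layer] = [sum(column) / len(stacked) for column in zip(*stacked)]
    return merged


# Toy example: only the "fc" layer is exchanged; "conv1" stays local.
local = {"conv1": [1.0, 2.0], "fc": [0.0, 4.0]}
peers = {
    "a": {"conv1": [3.0, 4.0], "fc": [2.0, 0.0]},
    "b": {"conv1": [5.0, 6.0], "fc": [4.0, 8.0]},
}
plan = {"fc": ["a", "b"]}  # hypothetical per-layer aggregation policy
merged = aggregate_layers(local, peers, plan)
```

In this toy run only the `fc` weights would cross the network, which is the source of the communication savings that layer-wise schemes aim for. A real policy would choose the layers and peers adaptively rather than from a fixed dictionary.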