SHARE: Shaping Data Distribution at Edge for Communication-Efficient Hierarchical Federated Learning

Yongheng Deng, Feng Lyu, Ju Ren, Yongmin Zhang, Yuezhi Zhou, Yaoxue Zhang, Yuanyuan Yang
DOI: https://doi.org/10.1109/icdcs51616.2021.00012
2021-01-01
Abstract: Federated learning (FL) enables distributed model training over mobile nodes without sharing privacy-sensitive raw data. However, one significant obstacle to efficient FL is the prohibitive communication overhead of committing model updates, since frequent cloud model aggregations are usually required to reach a target accuracy, especially when the data distributions at mobile nodes are imbalanced. Pilot experiments verify that frequent cloud model aggregations can be avoided without performance degradation if model aggregations are conducted at the edge. To this end, we shed light on the hierarchical federated learning (HFL) framework, where a subset of distributed nodes is selected as edge aggregators to conduct edge aggregations. In particular, under the HFL framework, we formulate a communication cost minimization (CCM) problem that minimizes the communication cost incurred by edge/cloud aggregations by deciding on edge aggregator selection and distributed node association. Inspired by the insight that the potential of HFL lies in the data distribution at edge aggregators, we propose SHARE, i.e., SHaping dAta distRibution at Edge, to transform and solve the CCM problem. In SHARE, we divide the original problem into two sub-problems, which minimize the per-round communication cost and the mean Kullback-Leibler divergence of edge aggregator data, and devise two light-weight algorithms to solve them, respectively. Extensive experiments under various settings corroborate the efficacy of SHARE.
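The second sub-problem scores a node association by the mean Kullback-Leibler divergence of the data pooled at each edge aggregator. The abstract does not state the reference distribution or the exact formulation, so the following is only a minimal sketch under stated assumptions: each node's data is summarized as per-class sample counts, the reference is the uniform distribution over classes, and the names `kl_divergence`, `mean_edge_kl`, `node_counts`, and `assignment` are hypothetical, not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_edge_kl(node_counts, assignment, num_edges):
    """Mean KL divergence between each edge aggregator's pooled label
    distribution and the uniform distribution.

    Assumptions (not from the paper): node_counts[i] holds per-class sample
    counts of node i, assignment[i] is the index of the edge aggregator that
    node i is associated with, and uniform is the reference distribution.
    """
    num_classes = len(node_counts[0])
    uniform = [1.0 / num_classes] * num_classes

    # Pool the per-class counts of all nodes associated with each edge.
    pooled = [[0] * num_classes for _ in range(num_edges)]
    for counts, edge in zip(node_counts, assignment):
        for cls, n in enumerate(counts):
            pooled[edge][cls] += n

    total_kl = 0.0
    for counts in pooled:
        total = sum(counts)
        if total == 0:
            raise ValueError("every edge aggregator needs at least one sample")
        distribution = [n / total for n in counts]
        total_kl += kl_divergence(distribution, uniform)
    return total_kl / num_edges


# Four nodes, two classes, each node holding a single class.
node_counts = [[10, 0], [0, 10], [10, 0], [0, 10]]
skewed = mean_edge_kl(node_counts, [0, 1, 0, 1], num_edges=2)
balanced = mean_edge_kl(node_counts, [0, 0, 1, 1], num_edges=2)
print(skewed, balanced)
```

In this toy case, grouping nodes of the same class under one edge aggregator yields a mean divergence of ln 2, while pairing complementary nodes yields zero, which illustrates why node association can shape the data distribution at the edge.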