CHEESE: Distributed Clustering-Based Hybrid Federated Split Learning over Edge Networks

Zhipeng Cheng, Xiaoyu Xia, Minghui Liwang, Xuwei Fan, Yanglong Sun, Xianbin Wang, Lianfen Huang
DOI: https://doi.org/10.1109/tpds.2023.3322755
IF: 5.3
2023-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Implementing either federated learning (FL) or split learning (SL) over clients with limited computation/communication resources faces challenges in achieving delay-efficient model training. To overcome such challenges, we investigate a novel distributed Clustering-based Hybrid fEdErated Split lEarning (CHEESE) framework, consolidating distributed resources among clients by device-to-device (D2D) communications, working in an intra-serial inter-parallel manner. In CHEESE, each learning client can form a cluster with its neighboring helping clients via D2D communications to train an FL model collaboratively. Inside each cluster, the model is split into multiple segments via a model splitting and allocation (MSA) strategy, while each cluster member trains one segment. After completing intra-cluster training, a transmission client (TC) is determined from each cluster to upload a complete model to the base station for global model aggregation under allocated bandwidth. Accordingly, an overall training delay cost minimization problem is formulated, involving the following subproblems: client clustering, MSA, TC selection, and bandwidth allocation. Due to its NP-hardness, the problem is decoupled and solved iteratively. The client clustering problem is first transformed into a distributed clustering game based on potential game theory, where each cluster further investigates the remaining three subproblems to evaluate the utility of each clustering strategy. Specifically, a heuristic algorithm is proposed to solve the MSA problem under a given clustering strategy, while a greedy-based convex optimization approach is introduced to solve the joint TC selection and bandwidth allocation problem. Extensive experiments on practical models and datasets demonstrate that CHEESE can significantly reduce training delay costs.
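To make the "intra-serial inter-parallel" idea concrete, the following is a minimal toy sketch, not the paper's delay formulation. It assumes that members of a cluster train their segments one after another, so their delays add up, and that clusters run side by side, so a round is bounded by the slowest cluster. The names `Member`, `Cluster`, `cluster_delay`, and `round_delay`, and all timing fields, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    compute_s: float  # assumed time for this member to train its model segment
    d2d_s: float      # assumed D2D time to hand results to the next member


@dataclass
class Cluster:
    members: List[Member]
    upload_s: float   # assumed time for the transmission client (TC) to upload


def cluster_delay(cluster: Cluster) -> float:
    """Intra-serial: segment training and D2D hand-offs happen in sequence."""
    serial = sum(m.compute_s + m.d2d_s for m in cluster.members)
    return serial + cluster.upload_s


def round_delay(clusters: List[Cluster]) -> float:
    """Inter-parallel: clusters run concurrently, so the slowest one dominates."""
    return max(cluster_delay(c) for c in clusters)
```

Under these assumptions, moving a helping client into a slow cluster can only pay off if the compute time it removes from that cluster's serial chain exceeds the D2D hand-off time it adds, which is the kind of trade-off the clustering and MSA subproblems have to weigh.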