Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network

Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng
2024-04-15
Abstract: Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. Most existing works have focused on horizontal or vertical data distributions, where each client possesses different samples with shared features, or each client fully shares only sample indices, respectively. However, the hybrid scheme is much less studied, even though it is much more common in the real world. Therefore, in this paper, we propose a generalized algorithm, FedGraph, that introduces a graph convolutional neural network to capture feature-sharing information while learning features from a subset of clients. We also develop a simple but effective clustering algorithm that aggregates features produced by the deep neural networks of each client while preserving data privacy.
Machine Learning; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses how to aggregate feature representations from multiple clients in a Hybrid Federated Learning (HBFL) environment so that predictive performance improves while data privacy is preserved. Specifically, it uses a Graph Convolutional Network (GCN) to capture feature-sharing information when clients differ in both their sample spaces and their feature spaces, and on that basis builds a server model that collaboratively aggregates the clients' feature representations. The approach aims to overcome the limitation of existing methods that simply aggregate the parameters of client-trained models, achieving deeper feature fusion and stronger overall predictive capability.

The main contributions of the paper are:

1. Proposing the first method focused on learning to aggregate multiple client feature representations in the HBFL scenario while ensuring data privacy.
2. Introducing a GCN to aggregate client feature representations, maintaining strong predictive performance even when data is sparse.
3. Introducing the new concept of a "privacy score" to evaluate how the number of hidden layers affects privacy protection and to find the optimal number of hidden layers.
4. Proposing the Class-conditioned Random Clustering (CRC) algorithm to further enhance collaborative predictive performance while maintaining data privacy.

Through these contributions, the paper aims to address the shortcomings of existing federated learning methods in handling sample and feature space heterogeneity, particularly in applications such as medical diagnosis, recommendation systems, and finance, which often face uneven data distribution, high communication costs, and strict data privacy requirements.
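To make the idea of aggregating client feature representations with a GCN more concrete, the following is a minimal sketch of one standard graph-convolution step over a client graph. It is not the paper's implementation. The graph construction (an edge between two clients whenever their feature sets overlap), the helper names `feature_sharing_adjacency`, `normalize_adjacency`, and `gcn_layer`, the example feature names, and the random weights are all assumptions made for illustration. The propagation rule is the common symmetric-normalized form, ReLU(D^{-1/2}(A + I)D^{-1/2} H W).

```python
import numpy as np


def feature_sharing_adjacency(client_features):
    """Build an adjacency matrix over clients (assumed construction).

    Two clients are connected when their feature sets overlap.
    """
    n = len(client_features)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if client_features[i] & client_features[j]:
                adj[i, j] = adj[j, i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep each client's own signal
    deg = a_hat.sum(axis=1)             # always >= 1 because of the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph-convolution step: mix neighbor representations, project, ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Hypothetical clients: the first two share "bmi", the third shares nothing.
client_features = [{"age", "bmi"}, {"bmi", "glucose"}, {"income"}]
adj = feature_sharing_adjacency(client_features)
a_norm = normalize_adjacency(adj)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))  # stand-in for per-client feature representations
w = rng.normal(size=(4, 2))  # stand-in for learned layer weights
out = gcn_layer(a_norm, h, w)
```

In this sketch, the third client has no neighbors, so its output depends only on its own representation, while the first two clients each receive an equal-weight mix of their own and each other's representations. In a trained model `w` would be learned on the server rather than sampled at random.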