FedGraph: Federated Graph Learning With Intelligent Sampling

Fahao Chen, Peng Li, Toshiaki Miyazaki, Celimuge Wu
DOI: https://doi.org/10.1109/tpds.2021.3125565
IF: 5.3
2022-08-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Federated learning has attracted much research attention due to its privacy protection in distributed machine learning. However, existing work of federated learning mainly focuses on Convolutional Neural Network (CNN), which cannot efficiently handle graph data that are popular in many applications. Graph Convolutional Network (GCN) has been proposed as one of the most promising techniques for graph learning, but its federated setting has been seldom explored. In this article, we propose FedGraph for federated graph learning among multiple computing clients, each of which holds a subgraph. FedGraph provides strong graph learning capability across clients by addressing two unique challenges. First, traditional GCN training needs feature data sharing among clients, leading to risk of privacy leakage. FedGraph solves this issue using a novel cross-client convolution operation. The second challenge is high GCN training overhead incurred by large graph size. We propose an intelligent graph sampling algorithm based on deep reinforcement learning, which can automatically converge to the optimal sampling policies that balance training speed and accuracy. We implement FedGraph based on PyTorch and deploy it on a testbed for performance evaluation. The experimental results of four popular datasets demonstrate that FedGraph significantly outperforms existing work by enabling faster convergence to higher accuracy.
computer science, theory & methods; engineering, electrical & electronic
What problem does this paper attempt to address?
The main problem this paper addresses is how to process graph data effectively within a federated learning framework. The article points out that existing federated learning work focuses mainly on Convolutional Neural Networks (CNNs), which cannot handle graph data efficiently. Graph Convolutional Networks (GCNs) are a promising technique for graph learning, but their use in federated settings has not been fully explored. The paper therefore proposes FedGraph, which targets two challenges:

1. **The conflict between privacy protection and feature sharing**: Traditional GCN training requires clients to share node feature data, which risks privacy leakage. In a medical record scenario, for example, each graph node represents a record whose features include personal information (such as age, gender, and occupation) and health conditions (such as diseases). These features are highly sensitive and cannot be exposed.
2. **High training cost caused by large-scale graph data**: Large-scale graphs (such as Facebook's social network, which contains more than 3 billion users) incur extremely high computational costs. Because a GCN stacks multiple layers of the same structure, the model size becomes very large and may even exceed the physical memory limit.

To solve these problems, FedGraph proposes a cross-client graph convolution operation. Instead of sharing node features directly, clients share them only after embedding them into low-dimensional representations, which prevents the original features from being recovered. In addition, to reduce the GCN training cost, FedGraph designs an intelligent sampling algorithm based on Deep Reinforcement Learning (DRL), which can automatically converge to the optimal sampling strategy and balance training speed and accuracy.
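The idea of exchanging embeddings rather than raw features can be illustrated with a minimal sketch. This is not the paper's actual operator or its PyTorch implementation: the `Client` class, the `cross_client_convolution` function, the shared projection matrix, the mean aggregation, and all dimensions are assumptions chosen for illustration only. The sketch uses only the Python standard library, and it does not by itself demonstrate any privacy guarantee.

```python
import random


def matvec(weight, vec):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class Client:
    """Hypothetical client holding the raw node features of one subgraph."""

    def __init__(self, features, embed_weight):
        self._features = features          # node id -> raw features (kept local)
        self._embed_weight = embed_weight  # projects raw features to a lower dimension

    def embed(self, node_id):
        """Return only a low-dimensional embedding, never the raw features."""
        return relu(matvec(self._embed_weight, self._features[node_id]))


def cross_client_convolution(node_id, owner, neighbors, layer_weight):
    """Mean-aggregate the embeddings of a node and its neighbors, then transform.

    `neighbors` is a list of (client, node_id) pairs. Remote neighbors
    contribute only through `client.embed`, so raw features are not exchanged.
    """
    messages = [owner.embed(node_id)] + [c.embed(n) for c, n in neighbors]
    dim = len(messages[0])
    mean = [sum(m[i] for m in messages) / len(messages) for i in range(dim)]
    return relu(matvec(layer_weight, mean))


# Assumed sizes: 8 raw features per node, 3-dimensional shared embeddings.
RAW_DIM, EMBED_DIM, OUT_DIM = 8, 3, 2
rng = random.Random(0)
embed_weight = random_matrix(EMBED_DIM, RAW_DIM, rng)   # assumed identical on all clients
layer_weight = random_matrix(OUT_DIM, EMBED_DIM, rng)


def random_features():
    return [rng.random() for _ in range(RAW_DIM)]


client_a = Client({"a1": random_features(), "a2": random_features()}, embed_weight)
client_b = Client({"b1": random_features()}, embed_weight)

# Node a1 has one local neighbor (a2) and one neighbor on another client (b1).
hidden = cross_client_convolution(
    "a1", client_a, [(client_a, "a2"), (client_b, "b1")], layer_weight
)
print(len(client_b.embed("b1")), len(hidden))
```

In this sketch the only values that leave `client_b` are the three numbers returned by `embed`, while the eight raw features stay in the client's private dictionary. A real system would additionally need trained weights, a communication layer, and an analysis of how much the embeddings reveal.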
In summary, the paper aims to achieve efficient, privacy-preserving distributed learning on graph data through the FedGraph system, and to significantly improve training speed and accuracy.