Federated Graph Learning with Structure Proxy Alignment

Xingbo Fu, Zihan Chen, Binchi Zhang, Chen Chen, Jundong Li
2024-08-18
Abstract: Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed in multiple data owners, which has been applied in various applications such as social recommendation and financial fraud detection. Inherited from generic Federated Learning (FL), FGL similarly has the data heterogeneity issue where the label distribution may vary significantly for distributed graph data across clients. For instance, a client can have the majority of nodes from a class, while another client may have only a few nodes from the same class. This issue results in divergent local objectives and impairs FGL convergence for node-level tasks, especially for node classification. Moreover, FGL also encounters a unique challenge for the node classification task: the nodes from a minority class in a client are more likely to have biased neighboring information, which prevents FGL from learning expressive node embeddings with Graph Neural Networks (GNNs). To grapple with the challenge, we propose FedSpray, a novel FGL framework that learns local class-wise structure proxies in the latent space and aligns them to obtain global structure proxies in the server. Our goal is to obtain the aligned structure proxies that can serve as reliable, unbiased neighboring information for node classification. To achieve this, FedSpray trains a global feature-structure encoder and generates unbiased soft targets with structure proxies to regularize local training of GNN models in a personalized way. We conduct extensive experiments over four datasets, and experiment results validate the superiority of FedSpray compared with other baselines. Our code is available at https://github.com/xbfu/FedSpray.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper aims to address two main issues in Federated Graph Learning (FGL):

1. **Data Heterogeneity Issue**: The label distribution of graph data can vary significantly across clients. For example, in the context of financial fraud detection, the distribution of customer occupation labels may differ greatly among banks: some banks may serve mostly doctors, while others serve mostly teachers. Because each client then optimizes a different local objective, the federated model struggles to converge on node-level tasks such as node classification.
2. **Bias in Neighbor Information for Minority Class Nodes**: Within a client, nodes from a minority class are more likely to have neighbors from other classes, which makes it hard for Graph Neural Networks (GNNs) to learn expressive node embeddings. For instance, if teachers are a minority class in a particular bank, most neighbors of a teacher node may be doctors. The embeddings of teacher nodes are then dominated by information from doctor nodes, which hurts classification accuracy.

To address these issues, the authors propose a new framework called FedSpray. FedSpray learns local class-wise structure proxies in the latent space on each client and aligns them on the central server to obtain global structure proxies. The aligned proxies are intended to serve as reliable, unbiased neighbor information for node classification. Specifically, FedSpray trains a global feature-structure encoder and uses the structure proxies to generate unbiased soft targets, which regularize the local training of each client's GNN model in a personalized way. Experiments on four datasets show that FedSpray outperforms the baseline methods.
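
The two issues can be made concrete with a small, self-contained Python sketch. It is not taken from the FedSpray code: the function names `label_distribution` and `same_class_neighbor_ratio`, the toy graph, and the "doctor"/"teacher" labels are all assumptions chosen for illustration. The first function measures how skewed a client's labels are, and the second measures, per class, the average fraction of a node's neighbors that share its label.

```python
from collections import Counter, defaultdict


def label_distribution(labels):
    """Return the fraction of nodes belonging to each class on one client."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def same_class_neighbor_ratio(edges, labels):
    """Per class, average the fraction of each node's neighbors sharing its label.

    Nodes without any edge are skipped, since the fraction is undefined for them.
    """
    neighbors = defaultdict(set)
    for u, v in edges:  # treat the graph as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)

    per_class = defaultdict(list)
    for node, nbrs in neighbors.items():
        same = sum(labels[n] == labels[node] for n in nbrs)
        per_class[labels[node]].append(same / len(nbrs))
    return {cls: sum(r) / len(r) for cls, r in per_class.items()}


# Hypothetical client: four doctor nodes and a single teacher node.
labels = {0: "doctor", 1: "doctor", 2: "doctor", 3: "doctor", 4: "teacher"}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 0), (4, 1)]

dist = label_distribution(labels)
ratio = same_class_neighbor_ratio(edges, labels)
print(dist)   # skewed label distribution on this client
print(ratio)  # the minority class sees far fewer same-class neighbors
```

On this toy client the teacher node has no same-class neighbors at all, so any neighborhood aggregation over it mixes in only doctor features. This is the kind of biased neighboring information that the structure proxies are meant to counteract.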