Graph Federated Learning with Hidden Representation Sharing

Shuang Wu, Mingxuan Zhang, Yuantong Li, Carl Yang, Pan Li
DOI: https://doi.org/10.48550/arXiv.2212.12158
2022-12-23
Abstract: Learning on Graphs (LoG) is widely used in multi-client systems when each client has insufficient local data, and multiple clients have to share their raw data to learn a model of good quality. One scenario is to recommend items to clients with limited historical data and sharing similar preferences with other clients in a social network. On the other hand, due to the increasing demands for the protection of clients' data privacy, Federated Learning (FL) has been widely adopted: FL requires models to be trained in a multi-client system and restricts sharing of raw data among clients. The underlying potential data-sharing conflict between LoG and FL is under-explored and how to benefit from both sides is a promising problem. In this work, we first formulate the Graph Federated Learning (GFL) problem that unifies LoG and FL in multi-client systems and then propose sharing hidden representation instead of the raw data of neighbors to protect data privacy as a solution. To overcome the biased gradient problem in GFL, we provide a gradient estimation method and its convergence analysis under the non-convex objective. In experiments, we evaluate our method in classification tasks on graphs. Our experiment shows a good match between our theory and the practice.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the data-sharing conflict between Learning on Graphs (LoG) and Federated Learning (FL) in multi-client systems. Specifically:

1. **Learning on Graphs (LoG)**: In a multi-client system, the amount of data for each client may be insufficient, so it is necessary to share the original data with other clients to train high-quality models. For example, when recommending items in a social network, a user's historical data is limited, but the performance of the recommendation system can be improved by sharing data from friends with similar preferences.
2. **Federated Learning (FL)**: To protect the data privacy of clients, FL requires training models without sharing the original data. This is especially important in fields such as medicine, mobile devices, and the Internet of Things.

However, there is a fundamental conflict between LoG and FL: LoG requires sharing the original data, while FL prohibits such sharing. How to find a balance between the two and make full use of their advantages is a challenging problem.

### Main contributions of the paper

1. **Proposing the Graph Federated Learning (GFL) framework**: Modeling FL clients as nodes in a graph, thereby unifying LoG and FL. This framework aims to solve the data-sharing conflict problem in multi-client systems.
2. **Introducing the hidden representation-sharing technique**: To protect data privacy, the paper proposes to share only hidden representations instead of the original data of neighbors. This can achieve effective model training while protecting privacy.
3. **Providing theoretical analysis**: For non-convex objective functions, the paper provides a gradient estimation method and its convergence analysis. This is the first theoretical analysis of graph-based federated learning.
4. **Proposing the GFL-APPNP algorithm**: This algorithm has been empirically evaluated on multiple classification tasks, including deterministic node classification, random node classification, and supervised classification. The experimental results show that this method not only converges well but also has excellent performance, verifying the consistency between theory and practice.

### Summary

By constructing the GFL framework, the paper combines the advantages of LoG and FL to solve the data-sharing conflict problem in multi-client systems. At the same time, by introducing the hidden representation-sharing technique and theoretical analysis, the effectiveness of the model and privacy protection are ensured. The experimental results further verify the superior performance of this method on various tasks.
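
### Illustrative sketch of hidden representation sharing

To make the hidden representation-sharing idea more concrete, the following is a minimal NumPy sketch, not the paper's GFL-APPNP algorithm. It assumes the standard APPNP propagation rule, Z^(k+1) = (1 - alpha) * A_hat * Z^(k) + alpha * H, applied to representations that each client computes locally. The encoder `local_hidden`, the toy three-client graph, the weight matrix `w`, and the values of `alpha` and `steps` are all hypothetical choices made for illustration. The propagation is simulated centrally in matrix form; in a real federated deployment each client would only receive the hidden vectors of its graph neighbors.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalize A + I (self-loops added), as in standard APPNP."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def local_hidden(x, w):
    """Hypothetical client-side encoder: raw features x stay on the client,
    and only the returned hidden vector is shared with neighbors."""
    return np.tanh(x @ w)


def propagate(hidden, adj_norm, alpha=0.1, steps=10):
    """APPNP-style propagation over the shared hidden representations."""
    z = hidden.copy()
    for _ in range(steps):
        # Mix neighbor information with a "teleport" back to the local hidden vector.
        z = (1 - alpha) * (adj_norm @ z) + alpha * hidden
    return z


# Toy setup (assumed for illustration): three clients on a path graph 0 - 1 - 2.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
raw_features = [rng.normal(size=4) for _ in range(3)]  # private, one vector per client
w = rng.normal(size=(4, 2))                            # shared model weights

# Each client encodes its own raw data; only these rows are exchanged.
hidden = np.stack([local_hidden(x, w) for x in raw_features])
adj_norm = normalized_adjacency(adj)
z = propagate(hidden, adj_norm)
```

In this sketch the raw feature vectors never appear in `propagate`; only the rows of `hidden` cross client boundaries, which is the privacy boundary the summary describes. The gradient estimation method and convergence analysis from the paper are not reproduced here.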