BFLN: A Blockchain-based Federated Learning Model for Non-IID Data

Yang Li, Chunhe Xia, Dongchi Huang, Xiaojian Li, Tianbo Wang
2024-07-10
Abstract: As the application of federated learning becomes increasingly widespread, the issue of imbalanced training data distribution has emerged as a significant challenge. Federated learning utilizes local data stored on different training clients for model training, rather than centralizing data on a server, thereby greatly enhancing the privacy and security of training data. However, the distribution of training data across different clients may be imbalanced, with different categories of data potentially residing on different clients. This presents a challenge to traditional federated learning, which assumes data distribution is independent and identically distributed (IID). This paper proposes a Blockchain-based Federated Learning Model for Non-IID Data (BFLN), which combines federated learning with blockchain technology. By introducing a new aggregation method and incentive algorithm, BFLN enhances the model performance of federated learning on non-IID data. Experiments on public datasets demonstrate that, compared to other state-of-the-art models, BFLN improves training accuracy and provides a sustainable incentive mechanism for personalized federated learning.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses two problems in federated learning (FL): the imbalanced, non-IID distribution of training data across clients, and the lack of effective incentive mechanisms in personalized federated learning. Specifically:

1. **Non-IID (non-independent and identically distributed) data**:
   - Traditional federated learning assumes that the data on all clients are independent and identically distributed (IID). In practice, the data distributions of different clients often differ, that is, they are non-IID. In that case, aggregating everything into a single global model may perform poorly because data characteristics vary greatly between clients.
   - For example, some classes may exist only on specific clients, while other clients hold no data of those classes. This makes it difficult for traditional federated learning methods to adapt to the specific needs of each client.
2. **Lack of effective incentive mechanisms**:
   - Personalized federated learning aims to provide personalized model updates for each client, adapted to the characteristics of its local data. At present, however, it lacks effective incentive mechanisms that encourage clients to participate actively in training and that ensure effective updates to the global model.
   - Without appropriate incentives, clients may lack the motivation to take part in model training. In particular, when computing and communication costs are not fully compensated, clients may participate passively or not at all, which hinders the continued development of federated learning.
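The class-skew example above (some classes living only on specific clients) can be made concrete with a small sketch. It is only an illustration of a label-skewed split, not the partitioning used in the paper; the function name `partition_by_label` and its parameters are hypothetical.

```python
from collections import Counter


def partition_by_label(labels, num_clients, classes_per_client):
    """Split sample indices across clients so each client holds only a few classes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    classes = sorted(set(labels))
    # Client c owns a wrapping window of `classes_per_client` consecutive classes.
    owned = [
        {classes[(c * classes_per_client + k) % len(classes)]
         for k in range(classes_per_client)}
        for c in range(num_clients)
    ]
    parts = {c: [] for c in range(num_clients)}
    seen = Counter()  # how many samples of each class have been assigned so far
    for idx, label in enumerate(labels):
        owners = [c for c in range(num_clients) if label in owned[c]]
        if not owners:
            continue  # classes owned by no client are dropped in this sketch
        # Rotate a class's samples over the clients that own it.
        parts[owners[seen[label] % len(owners)]].append(idx)
        seen[label] += 1
    return parts


labels = [i % 4 for i in range(40)]  # toy dataset: 4 classes, 10 samples each
parts = partition_by_label(labels, num_clients=2, classes_per_client=2)
for client, indices in parts.items():
    print(client, Counter(labels[i] for i in indices))
```

With two clients and two classes per client, each client ends up with a disjoint pair of classes, which is the kind of split under which a single global model struggles.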
To solve these problems, the paper proposes the Blockchain-based Federated Learning Model for Non-IID Data (BFLN), which contains two key components:

- **Prototype-based Aggregation Algorithm (PAA)**: Clusters the local models and performs personalized updates according to the clustering results, in order to address the non-IID problem.
- **Clustering Centroids-based Consensus Algorithm (CCCA)**: Introduces an incentive mechanism based on the number of cluster members, ensures the sustainability of the blockchain packing process, and provides a new incentive scheme for personalized federated learning.

According to the paper's experiments, BFLN performs well on non-IID data and provides an effective incentive mechanism for personalized federated learning.

### Formula Summary

- The **Pearson correlation coefficient** measures the linear similarity between two prototype vectors:
  \[ S_{V_e^\chi(x) | V_e^\delta(x)}=\frac{\text{cov}(V_e^\chi(x), V_e^\delta(x))}{\sigma_{V_e^\chi(x)} \sigma_{V_e^\delta(x)}}=\frac{E[(V_e^\chi(x)-\mu_{V_e^\chi(x)})(V_e^\delta(x)-\mu_{V_e^\delta(x)})]}{\sigma_{V_e^\chi(x)} \sigma_{V_e^\delta(x)}} \]
- The **allocation function** calculates rewards so that the per-capita reward increases with the number of members in a group:
  \[ \Gamma(n_i)=\kappa n_i^\rho, \quad \rho > 1 \]
  where
  \[ \kappa=\frac{R}{\sum_{i = 1}^j n_i^\rho} \]
  so that the allocations over the \(j\) groups sum to \(R\). The reward of a training client \(T_k\) in a group with \(n_i\) members is:
  \[ r_k=\frac{\Gamma(n_i)}{n_i} \]

Together, these formulas and methods make up the BFLN model.
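The two formulas can be checked numerically with a minimal sketch. It assumes that prototype vectors are plain lists of floats, that population (not sample) statistics are intended, and that \(R\) is the total reward shared among the groups; the function names `pearson_similarity` and `per_client_rewards` are hypothetical and do not come from the paper.

```python
import math


def pearson_similarity(u, v):
    """Pearson correlation cov(u, v) / (sigma_u * sigma_v) of two equal-length vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u) / n)
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v) / n)
    if std_u == 0 or std_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (std_u * std_v)


def per_client_rewards(group_sizes, total_reward, rho):
    """Return r = Gamma(n_i) / n_i for each group, with Gamma(n_i) = kappa * n_i ** rho."""
    if rho <= 1:
        raise ValueError("rho must be greater than 1")
    # kappa normalizes the group allocations so that they sum to total_reward.
    kappa = total_reward / sum(n ** rho for n in group_sizes)
    return [kappa * n ** rho / n for n in group_sizes]


# Assumed toy values: three groups of sizes 1, 2 and 3 sharing a reward of 14.
sizes = [1, 2, 3]
rewards = per_client_rewards(sizes, total_reward=14, rho=2)
print(rewards)  # per-client reward grows with group size
print(sum(r * n for r, n in zip(rewards, sizes)))  # total paid out equals R
print(pearson_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because \(\rho > 1\), the per-client reward \(\kappa n_i^{\rho-1}\) grows with group size, so clients in larger groups receive more in this scheme.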