BDVFL: Blockchain-based Decentralized Vertical Federated Learning

Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu
DOI: https://doi.org/10.1109/icdm58522.2023.00072
2023-01-01
Abstract: Vertical Federated Learning (VFL) addresses the problem of data isolation and thereby makes data mining secure. Most VFL implementations rely on a single server or third party for training, so training terminates if that server or third party fails. In addition, the accuracy of a model trained by VFL depends on the quality of each client’s local features, yet that quality is difficult to verify. The features owned by a client may be irrelevant to the model, or the intermediate results submitted by a client may be inaccurate, and in either case the model’s accuracy can be seriously affected. To solve the single point of failure and model accuracy issues in VFL, this paper first proposes a Blockchain-based Decentralized VFL (BDVFL) training model. By integrating blockchain with the VFL training process, the nodes within the blockchain are categorized into non-training and training nodes. Our method focuses on the scenario in which all training nodes possess labeled data and actively engage in the VFL training procedure. Specifically, first, each client uses its local features and initial model to carry out forward activation and generate intermediate results. Second, a training node is chosen at random and, together with the intermediate results from all clients, is used to formulate the loss function. Finally, each client updates its local model using the gradient. To protect the raw features, a blinding factor safeguards the intermediate results submitted by each client, so that the training nodes cannot infer the local features from the intermediate results. To mitigate the effect of irrelevant training outcomes from clients on the model’s accuracy, we propose a verifiable aggregation method that assesses the validity of the intermediate results submitted by the clients. We have conducted both theoretical and experimental analysis, and the results demonstrate the effectiveness of the proposed method.
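The three training steps in the abstract (forward activation, loss formulation from the combined intermediate results, local gradient update) can be illustrated with a minimal sketch. The abstract does not specify how the blinding factor is constructed, so the code below assumes zero-sum additive masks, a commonly used technique that is not necessarily the paper's scheme. It also assumes a logistic-regression model, and it omits both the random choice of training node and the verifiable aggregation step. All function names are hypothetical.

```python
import math
import random


def zero_sum_masks(n_clients, n_samples, rng):
    """ASSUMED blinding: per-sample masks that cancel when summed over clients."""
    masks = [[rng.uniform(-1e3, 1e3) for _ in range(n_samples)]
             for _ in range(n_clients - 1)]
    last = [-sum(m[i] for m in masks) for i in range(n_samples)]
    return masks + [last]


def forward(weights, features):
    """Client-side forward activation: one partial logit per sample."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def train_round(client_feats, client_weights, labels, lr, rng):
    """One illustrative round; returns the loss seen by the training node."""
    n = len(labels)
    masks = zero_sum_masks(len(client_feats), n, rng)

    # Step 1: each client submits a blinded intermediate result.
    blinded = [[z + m for z, m in zip(forward(w, feats), mask)]
               for w, feats, mask in zip(client_weights, client_feats, masks)]

    # Step 2: the training node sums the submissions; the masks cancel,
    # so it sees only the aggregate logit, not any single client's output.
    logits = [sum(col) for col in zip(*blinded)]
    preds = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    eps = 1e-12  # guards against log(0)
    loss = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(labels, preds)) / n

    # Step 3: each client updates its local model from the shared residual.
    residual = [p - y for p, y in zip(preds, labels)]
    for w, feats in zip(client_weights, client_feats):
        for j in range(len(w)):
            grad = sum(r * row[j] for r, row in zip(residual, feats)) / n
            w[j] -= lr * grad
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    # Two clients hold different feature columns of the same four samples.
    feats = [[[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]],
             [[0.0], [0.1], [1.0], [0.9]]]
    weights = [[0.0, 0.0], [0.0]]
    labels = [0, 0, 1, 1]
    for step in range(5):
        print(step, train_round(feats, weights, labels, 0.5, rng))
```

In this sketch the masks hide each client's individual contribution from the aggregating node only if at least two clients participate and the masks are agreed without the node's involvement. How BDVFL distributes or derives its blinding factors is not stated in the abstract.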