VertiBayes: learning Bayesian network parameters from vertically partitioned data with missing values

Florian van Daalen, Lianne Ippel, Andre Dekker, Inigo Bermejo

DOI: https://doi.org/10.1007/s40747-024-01424-0

IF: 6.7

2024-04-25

Complex & Intelligent Systems

Abstract: Federated learning makes it possible to train a machine learning model on decentralized data. Bayesian networks are widely used probabilistic graphical models. While some research has been published on the federated learning of Bayesian networks, publications on Bayesian networks in a vertically partitioned data setting are limited, with important omissions, such as handling missing data. We propose a novel method called VertiBayes to train Bayesian networks (structure and parameters) on vertically partitioned data, which can handle missing values as well as an arbitrary number of parties. For structure learning we adapted the K2 algorithm with a privacy-preserving scalar product protocol. For parameter learning, we use a two-step approach: first, we learn an intermediate model using maximum likelihood, treating missing values as a special value, then we train a model on synthetic data generated by the intermediate model using the EM algorithm. The privacy guarantees of VertiBayes are equivalent to those provided by the privacy-preserving scalar product protocol used. We experimentally show VertiBayes produces models comparable to those learnt using traditional algorithms. Finally, we propose two alternative approaches to estimate the performance of the model using vertically partitioned data and we show in experiments that these give accurate estimates.
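To illustrate why a scalar product is enough to obtain the counts that structure and parameter learning rely on, here is a minimal sketch. It assumes each party encodes "my attribute has this value" as a 0/1 indicator vector over the shared records, so that the number of records matching all conditions is the sum of the element-wise product. The plain Python product below is a non-private stand-in for the privacy-preserving protocol, and the column names, values, and helper functions are hypothetical.

```python
from math import prod

def indicator(column, value):
    """0/1 vector marking the records where a locally held attribute equals `value`."""
    return [1 if v == value else 0 for v in column]

def joint_count(indicators):
    """Number of records for which every party's indicator is 1.

    For two parties this is an ordinary scalar product; for more parties it is
    the sum of the element-wise product. A real deployment would compute this
    with a privacy-preserving protocol instead of sharing the vectors.
    """
    return sum(prod(bits) for bits in zip(*indicators))

# Hypothetical data: the same five records, one attribute per party.
party_a_smoker = ["yes", "no", "yes", "yes", "no"]
party_b_cancer = ["yes", "no", "no", "yes", "no"]
party_c_age = ["old", "old", "young", "old", "young"]

n_smoker = joint_count([indicator(party_a_smoker, "yes")])
n_smoker_cancer = joint_count([
    indicator(party_a_smoker, "yes"),
    indicator(party_b_cancer, "yes"),
])
n_three_way = joint_count([
    indicator(party_a_smoker, "yes"),
    indicator(party_b_cancer, "yes"),
    indicator(party_c_age, "old"),
])

# Maximum likelihood estimate of P(cancer = yes | smoker = yes) from the counts.
p_cancer_given_smoker = n_smoker_cancer / n_smoker
```

Only the aggregated counts are needed to fill a conditional probability table, which is why a protocol that reveals the scalar product but not the vectors is sufficient in this sketch.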

computer science, artificial intelligence

What problem does this paper attempt to address?

This paper addresses the challenges of training Bayesian networks on vertically partitioned data, in particular when the data contain missing values and more than two parties are involved. Specifically:

1. **Handling missing values**: In practice, and especially in federated learning, parties may follow different data collection protocols and quality standards, which leads to missing values. Existing methods cannot effectively handle this situation. VertiBayes addresses it with a two-step method: first, an intermediate model is trained with maximum likelihood estimation, treating missing values as a special value; then, the EM algorithm is used to train the final model on synthetic data generated by the intermediate model.
2. **Supporting any number of parties**: Most existing federated learning methods handle only two parties, which limits the diversity of data sources. VertiBayes supports an arbitrary number of parties, making better use of decentralized data and improving the representativeness and accuracy of the model.
3. **Privacy protection**: Structure learning and parameter learning on vertically partitioned data must not expose the parties' raw data. VertiBayes uses a privacy-preserving scalar product protocol so that training completes without revealing the original data.

In summary, VertiBayes aims to handle missing values in vertically partitioned data, support multiple parties, and preserve privacy, while producing a Bayesian network comparable to one trained with traditional centralized methods.
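The two-step idea for missing values can be sketched on a toy example. The code below is a simplified, centralized illustration and not the authors' implementation: it assumes a hypothetical two-node network X -> Y in which only Y can be missing, computes the step 1 counts directly (in the vertically partitioned setting they would come from the scalar product protocol), and runs a basic EM loop for the single table P(Y | X). All names and data are made up.

```python
import random
from collections import Counter

MISSING = "?"  # special value that stands in for a missing entry

def fit_intermediate(rows):
    """Step 1: maximum likelihood for X -> Y, counting MISSING as a regular state of Y."""
    x_counts = Counter(x for x, _ in rows)
    xy_counts = Counter(rows)
    p_x = {x: c / len(rows) for x, c in x_counts.items()}
    p_y_given_x = {x: {} for x in x_counts}
    for (x, y), c in xy_counts.items():
        p_y_given_x[x][y] = c / x_counts[x]
    return p_x, p_y_given_x

def sample(p_x, p_y_given_x, n, rng):
    """Generate synthetic rows from the intermediate model (MISSING may be sampled for Y)."""
    rows = []
    for _ in range(n):
        x = rng.choices(list(p_x), weights=list(p_x.values()))[0]
        dist = p_y_given_x[x]
        y = rng.choices(list(dist), weights=list(dist.values()))[0]
        rows.append((x, y))
    return rows

def em(rows, y_states, iterations=50):
    """Step 2: EM estimate of P(Y | X) on rows where Y may be MISSING."""
    x_states = sorted({x for x, _ in rows})
    theta = {x: {y: 1 / len(y_states) for y in y_states} for x in x_states}
    for _ in range(iterations):
        # E-step: expected counts; a missing Y is spread over its states.
        expected = {x: {y: 0.0 for y in y_states} for x in x_states}
        for x, y in rows:
            if y == MISSING:
                for state in y_states:
                    expected[x][state] += theta[x][state]
            else:
                expected[x][y] += 1.0
        # M-step: renormalise the expected counts per parent state.
        theta = {
            x: {y: expected[x][y] / sum(expected[x].values()) for y in y_states}
            for x in x_states
        }
    return theta

# Hypothetical original data: (x, y) pairs, with two missing y values.
original = [
    (0, 0), (0, 0), (0, 1), (0, MISSING), (0, 0),
    (1, 1), (1, 1), (1, 0), (1, MISSING), (1, 1),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
p_x, p_y_given_x = fit_intermediate(original)
synthetic = sample(p_x, p_y_given_x, 2000, rng)
theta = em(synthetic, y_states=[0, 1])
```

In this toy network the missing values carry no extra information about Y, so the EM estimate of P(Y = 0 | X = 0) should land near 0.6 / 0.8 = 0.75, the observed-case proportion in the intermediate model. In a larger network, where a variable with missing values has children, the E-step would instead need inference over the full model.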

Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

Improving Federated Relational Data Modeling via Basis Alignment and Weight Penalty

Federated Bayesian Network Ensembles

FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation

De-VertiFL: A Solution for Decentralized Vertical Federated Learning

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

Multi-Tier Federated Learning for Vertically Partitioned Data

A Vertical Federated Learning Framework for Horizontally Partitioned Labels

Vertical Federated Learning with Missing Features During Training and Inference

Privacy-Preserving Vertical Federated KNN Feature Imputation Method

Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning

Federated Learning via Variational Bayesian Inference: Personalization, Sparsity and Clustering

A Bayesian Federated Learning Framework With Online Laplace Approximation

Decoupled Vertical Federated Learning for Practical Training on Vertically Partitioned Data

Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data

A Bayesian Framework for Clustered Federated Learning

Distributed GAN-Based Privacy-Preserving Publication of Vertically-Partitioned Data

Cascade Vertical Federated Learning