Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu
2023-10-30
Abstract: The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper addresses data privacy in financial anomaly detection. Specifically, it asks how to carry out privacy-preserving federated learning (FL) when multiple financial institutions, such as a payment network system (PNS) and its partner banks, hold data that is partitioned both vertically and horizontally. These institutions hold different kinds of data (transaction records and account information), but regulation and competition make it difficult to share such sensitive data directly. To detect suspicious transactions more effectively, however, they need to train a model jointly.

### Problems Addressed by the Paper:

1. **Vertically and Horizontally Partitioned Data**: Existing privacy-enhancing technologies usually assume that data is either horizontally partitioned (each participant holds different samples) or vertically partitioned (each participant holds different features of the same set of samples). In real-world financial anomaly detection, however, the data is often partitioned both ways. For example, the PNS holds transaction data, while partner banks hold account information. Because of this mixed partitioning, existing FL methods cannot be applied directly.
2. **Challenges in Training High-Accuracy Models**: Anomalous transactions are far rarer than normal ones, so the training data is severely class-imbalanced. In addition, using raw transaction attributes directly as features may not be enough to achieve good model performance. A privacy-preserving scheme therefore needs to support complex features (such as graph-structure features) to improve model accuracy.
3. **Privacy Protection During Inference**: Even if privacy is preserved during training, information can still leak during inference. For example, predicted labels can be used to infer information about the participants' data. Methods are therefore needed to prevent such attacks at inference time.

### Solution:

The paper proposes a solution named PV4FAD (Privacy-Preserving Vertical Ensemble for Financial Anomaly Detection), which combines Fully Homomorphic Encryption (HE), Secure Multi-Party Computation (SMPC), Differential Privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference-time threats. Specific measures include:

- **Input Privacy**: Protecting the privacy of input data through Fully Homomorphic Encryption and Secure Multi-Party Computation.
- **Output Privacy**: Protecting the privacy of inference outputs through Differential Privacy.
- **High-Accuracy Model**: Using a Random Forest as the model architecture and leveraging Random Decision Trees (RDT) to reduce variance and improve accuracy.
- **Complex Feature Handling**: Allowing the PNS and partner banks to engineer complex features locally, such as statistical features of transaction graphs, without exposing sensitive information.
- **Efficient Protocol**: Designing an efficient Private Intersection-Sum (PI-Sum) protocol to compute the labels of each leaf node while preserving privacy.

### Main Contributions:

- Proposing a comprehensive privacy-preserving federated learning solution for data that is both vertically and horizontally partitioned.
- Designing a Private Intersection-Sum (PI-Sum) protocol based on Fully Homomorphic Encryption for efficiently computing leaf node labels.
- Protecting output privacy during inference through Differential Privacy to prevent information leakage.
- Winning second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.

In summary, the paper tackles data privacy in financial anomaly detection by combining HE, SMPC, and DP, giving financial institutions a way to collaborate without sharing sensitive data directly.
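
To make the Private Intersection-Sum idea and the role of DP noise more concrete, here is a minimal plaintext sketch in Python. It is not the paper's protocol: it uses no encryption at all and only shows what an intersection-sum computes, followed by a standard Laplace mechanism applied to the result. The function names, the example transaction IDs, the assumption about which party holds which input, and the choice of the Laplace mechanism are all hypothetical and chosen for illustration.

```python
import random


def intersection_sum(ids_a, values_b):
    """Ideal (unencrypted) intersection-sum: add up the values in
    values_b whose keys also appear in ids_a. A real protocol would
    compute the same result without revealing either input."""
    shared = set(ids_a) & set(values_b)
    return sum(values_b[key] for key in shared)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_count(true_count, epsilon, rng, sensitivity=1.0):
    """Standard Laplace mechanism (an assumption here, not necessarily
    the mechanism analyzed in the paper)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical inputs: one party knows which transactions reach a leaf,
# the other holds a 0/1 anomaly flag per transaction.
leaf_ids = ["tx1", "tx2", "tx3", "tx4"]
anomaly_flags = {"tx2": 1, "tx3": 0, "tx4": 1, "tx9": 1}

exact = intersection_sum(leaf_ids, anomaly_flags)  # tx2 + tx3 + tx4
rng = random.Random(0)
noisy = noisy_leaf_count(exact, epsilon=1.0, rng=rng)
print(exact, round(noisy, 3))
```

In this sketch, a smaller `epsilon` gives a larger noise scale and therefore stronger output privacy at the cost of a less accurate leaf count, which is the privacy-accuracy balance the summary refers to.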