Federated learning model for credit card fraud detection with data balancing techniques

Mustafa Abdul Salam, Khaled M. Fouad, Doaa L. Elbably, Salah M. Elsayed
DOI: https://doi.org/10.1007/s00521-023-09410-2
2024-01-20
Neural Computing and Applications
Abstract: In recent years, credit card transaction fraud has caused massive losses for both consumers and banks. Consequently, both cardholders and banks need a strong fraud detection system to reduce those losses. Credit card fraud detection (CCFD) is an important method of fraud prevention. However, developing an ideal fraud detection system for banks poses several challenges. First, due to data security and privacy concerns, banks and other financial institutions are typically not permitted to exchange their transaction datasets, which makes it difficult for traditional systems to learn and detect fraud patterns. Therefore, this paper proposes federated learning for CCFD across different frameworks (TensorFlow Federated, PyTorch). Second, credit card transactions are heavily imbalanced across all banks: a small percentage of fraudulent transactions is far outnumbered by the majority of valid ones. The dataset must therefore be balanced, which calls for a comprehensive investigation of class imbalance management techniques in order to develop a powerful model for identifying fraudulent transactions. To address class imbalance, this study also presents a comparative analysis of several individual and hybrid resampling techniques. Several experimental studies compare the effectiveness of various resampling techniques in combination with classification approaches. The study finds that hybrid resampling methods perform better for machine learning classification models than for deep learning classification models. The experimental results show that the best accuracies for the Random Forest (RF), Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree (DT), and Gaussian Naive Bayes (NB) classifiers are 99.99%, 94.61%, 99.96%, 99.98%, and 91.47%, respectively.
The comparative results show that RF achieves higher performance (accuracy, recall, precision, and F-score) than NB, Logistic Regression, DT, and KNN. RF achieves the minimum loss values with all resampling techniques, and the proposed models achieved better outcomes on the resampled dataset than on the unbalanced dataset. Furthermore, the PyTorch framework achieves higher prediction accuracy for the federated learning model than the TensorFlow Federated framework, but at the cost of more computational time.
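As a rough illustration of the federated setup the abstract describes, the sketch below runs federated averaging (FedAvg) over three synthetic "banks" in plain NumPy. FedAvg, the logistic-regression client model, the synthetic data, and all function names (`local_update`, `federated_average`, `run_rounds`, `make_bank`) are assumptions made for illustration; the abstract does not state the aggregation rule or model architecture, and this is not the authors' TensorFlow Federated or PyTorch implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train logistic regression locally, starting from the global weights."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        grad = X.T @ (preds - y) / len(y)        # mean log-loss gradient
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def run_rounds(clients, n_features, rounds=20):
    """Each round: clients train on private data, the server averages weights."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in clients]
        sizes = [len(y) for _, y in clients]
        global_w = federated_average(updates, sizes)
    return global_w

# Hypothetical data: three banks with private, imbalanced transaction sets.
rng = np.random.default_rng(0)

def make_bank(n):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(float)  # rare positive class
    return X, y

clients = [make_bank(n) for n in (200, 500, 300)]
global_w = run_rounds(clients, n_features=3)
```

Only the weight vectors leave each client; the raw `(X, y)` pairs stay local, which is the privacy property the abstract relies on.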
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper "Federated Learning Model for Credit Card Fraud Detection with Data Balancing Techniques" addresses the significant challenge of credit card fraud detection (CCFD) in the context of modern electronic services and the rapid increase in credit card transactions. The key problems and contributions of the paper can be summarized as follows:

### Problems Addressed:

1. **Data Security and Privacy Concerns**: Banks and financial institutions are typically not allowed to share their transaction datasets due to data security and privacy concerns. This makes it difficult for traditional systems to learn and detect fraud.
2. **Class Imbalance**: There is a significant imbalance in credit card transactions across all banks, with a small percentage of fraudulent transactions far outnumbered by legitimate ones. This imbalance makes it challenging for predictive models to find patterns in the data from the minority (fraudulent) class.

### Contributions:

1. **Federated Learning Approach**: The paper proposes a federated learning approach to enable different banks to collaboratively train a fraud detection model without sharing raw data. This approach allows financial institutions to benefit from a shared global model that has seen more fraud than each bank alone, thereby improving fraud detection accuracy while maintaining data privacy.
2. **Resampling Techniques**: The paper investigates several individual and hybrid resampling techniques to address the class imbalance problem.
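
To make the class imbalance point concrete, here is a minimal sketch of random oversampling, one of the simplest individual resampling techniques. The summary does not list which techniques the paper evaluates, so the choice of random oversampling, the function name `random_oversample`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                      # size of the largest class
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw extra indices with replacement to reach the target size.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = rng.permutation(np.concatenate(idx_parts))
    return X[idx], y[idx]

# Hypothetical toy data: 1000 transactions, only 5 labeled as fraud.
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.zeros(1000, dtype=int)
y[:5] = 1

X_bal, y_bal = random_oversample(X, y)
```

In practice, resampling is usually applied to the training split only, so that evaluation still reflects the real class distribution.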