Comparative Evaluation of Anomaly Detection Methods for Fraud Detection in Online Credit Card Payments

Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan, Fabrice Daniel
2023-12-21
Abstract: This study explores the application of anomaly detection (AD) methods in imbalanced learning tasks, focusing on fraud detection using real online credit card payment data. We assess the performance of several recent AD methods and compare their effectiveness against standard supervised learning methods. Offering evidence of distribution shift within our dataset, we analyze its impact on the tested models' performances. Our findings reveal that LightGBM exhibits significantly superior performance across all evaluated metrics but suffers more from distribution shifts than AD methods. Furthermore, our investigation reveals that LightGBM also captures the majority of frauds detected by AD methods. This observation challenges the potential benefits of ensemble methods that combine supervised and AD approaches to enhance performance. In summary, this research provides practical insights into the utility of these techniques in real-world scenarios, showing LightGBM's superiority in fraud detection while highlighting challenges related to distribution shifts.
Machine Learning, Statistical Finance
What problem does this paper attempt to address?
This paper addresses fraud detection in online credit card payments, with a focus on two challenges: highly imbalanced datasets and distribution shift. Specifically:

1. **Highly imbalanced datasets**: In credit card payments, legitimate transactions far outnumber fraudulent ones. This extreme class imbalance makes it difficult for traditional classification algorithms to identify the minority class (i.e., fraudulent transactions) accurately, so methods that handle such imbalance effectively are needed.
2. **Distribution shift**: Fraudsters constantly change their behavior, so the distribution of the test data drifts away from that of the training data. Model performance then declines in practice because the models do not adapt well to new fraud tactics, which makes robustness in such a changing environment crucial.

To address these problems, the authors compared the performance of multiple Anomaly Detection (AD) methods with standard supervised learning methods (such as LightGBM) on real-world credit card payment data. By evaluating these methods under different conditions, the authors aim to provide practical insights and guidance on method selection for the financial industry, especially for settings with highly imbalanced data and distribution shift. The paper also explores whether combining anomaly detection methods with supervised learning methods can further improve fraud detection. However, the experimental results show that LightGBM outperforms all the tested AD methods across the evaluated metrics, although it is more sensitive to distribution shift.
This finding suggests that future research and practice should pay more attention to handling distribution shift, and should weigh whether combining AD methods with supervised learning methods is worthwhile. In summary, the paper evaluates the applicability and effectiveness of different machine learning methods for credit card fraud detection tasks that are highly imbalanced and subject to distribution shift.
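The question of whether combining AD and supervised methods is worthwhile comes down to how many frauds the AD method catches that the supervised model misses. The following is a minimal sketch of that overlap check, not the authors' evaluation protocol. The function names `top_k_flags` and `fraud_overlap`, the top-k alerting rule, and the toy labels and scores are all assumptions made for illustration; none of the numbers come from the paper.

```python
def top_k_flags(scores, k):
    """Return the indices of the k highest-scoring transactions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def fraud_overlap(labels, supervised_scores, ad_scores, k):
    """Compare the true frauds caught by two detectors under a top-k alert budget.

    labels: 1 for fraud, 0 for legitimate (hypothetical toy data below).
    """
    sup_hits = {i for i in top_k_flags(supervised_scores, k) if labels[i] == 1}
    ad_hits = {i for i in top_k_flags(ad_scores, k) if labels[i] == 1}
    shared = sup_hits & ad_hits
    return {
        "shared": shared,                    # frauds caught by both detectors
        "ad_only": ad_hits - sup_hits,       # frauds only the AD method caught
        "supervised_only": sup_hits - ad_hits,
        # share of AD-detected frauds that the supervised model also caught
        "ad_covered": len(shared) / len(ad_hits) if ad_hits else 0.0,
    }


# Invented toy data: 10 transactions, 3 of them fraudulent (indices 2, 4, 7).
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
supervised_scores = [0.05, 0.10, 0.90, 0.20, 0.80, 0.15, 0.02, 0.40, 0.70, 0.01]
ad_scores = [0.30, 0.10, 0.85, 0.20, 0.25, 0.90, 0.05, 0.15, 0.10, 0.60]

result = fraud_overlap(labels, supervised_scores, ad_scores, k=3)
print(result)
```

If `ad_only` is empty or very small, as in this toy case, an ensemble has little extra fraud to gain from the AD method, which is the situation the paper reports for LightGBM.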