Credit Card Fraud Detection via Intelligent Sampling and Self-supervised Learning

Chiao-Ting Chen, Chi Lee, Szu-Hao Huang, Wen-Chih Peng
DOI: https://doi.org/10.1145/3641283
IF: 5
2024-01-23
ACM Transactions on Intelligent Systems and Technology
Abstract: The significant increase in credit card transactions can be attributed to the rapid growth of online shopping and digital payments, particularly during the COVID-19 pandemic. To safeguard cardholders, e-commerce companies, and financial institutions, the implementation of an effective and real-time fraud detection method using modern artificial intelligence techniques is imperative. However, the development of machine-learning-based approaches for fraud detection faces challenges such as inadequate transaction representation, noise labels, and data imbalance. Additionally, practical considerations like dynamic thresholds, concept drift, and verification latency need to be appropriately addressed. In this study, we designed a fraud detection method that accurately extracts a series of spatial and temporal representative features to precisely describe credit card transactions. Furthermore, several auxiliary self-supervised objectives were developed to model cardholders’ behavior sequences. By employing intelligent sampling strategies, potential noise labels were eliminated, thereby reducing the level of data imbalance. The developed method encompasses various innovative functions that cater to practical usage requirements. We applied this method to two real-world datasets, and the results indicated a higher F1 score compared to the most commonly used online fraud detection methods.
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address several key issues in credit card fraud detection:

1. **Insufficient Transaction Representation**: Existing fraud detection methods lack sufficient information when describing user behavior sequences and store consumption records, leading to models failing to fully capture the complex behavior patterns of users.
2. **Noisy Labels**: In practical applications, fraudulent transactions are often reported after a period of time, and small transactions or regular consumption behaviors may hide fraudulent activities. These noisy labels can significantly reduce the performance of detection systems.
3. **Data Imbalance**: In the real world, the number of fraudulent transactions is much smaller than that of normal transactions. This imbalance in data distribution can cause models to be biased towards predicting normal transactions while ignoring fraudulent ones.
4. **Dynamic Thresholds and Concept Drift**: Fraud detection systems need to handle practical issues such as dynamic thresholds (i.e., the number of investigation calls that can be made within a limited time) and concept drift (i.e., the rapid changes in user consumption habits and fraudster strategies) to improve the accuracy and practicality of the system.

To address these issues, the authors propose a credit card fraud detection method based on intelligent sampling and self-supervised learning. This method improves existing techniques in the following ways:

- **Feature Extraction**: Developed a representative feature extraction method based on financial knowledge, which can more accurately describe various credit card fraud detection scenarios.
- **Intelligent Sampling**: Employed intelligent sampling strategies to mitigate the issues of noisy labels and data imbalance, enhancing the robustness and performance of the model.
- **Self-Supervised Learning**: Introduced multiple self-supervised learning tasks to extract distinctive sequential behavior representations from raw transaction data.
- **Practical Problem Solving**: Addressed practical issues such as dynamic thresholds and concept drift, making the method applicable to commercial banks' fraud detection systems.

Experimental results show that this method outperforms existing commonly used online fraud detection methods on two real-world datasets.
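
The dynamic-threshold constraint described above (only a limited number of investigation calls can be made in a given period) can be illustrated with a minimal sketch. This is not the authors' implementation: it simply assumes that some model has already produced a fraud score per transaction and flags the highest-scoring ones up to a fixed budget, so the effective cutoff moves with each batch. The function names `flag_within_budget` and `f1_score`, the toy scores, and the toy labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def flag_within_budget(scores: Sequence[float], budget: int) -> List[bool]:
    """Flag at most `budget` transactions, picking the highest fraud scores.

    Assumption: `budget` stands in for the number of investigation calls
    available in the current period. The score cutoff is implicit and
    changes with the score distribution of each batch.
    """
    if budget <= 0:
        return [False] * len(scores)
    # Indices ordered by descending score; ties keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:budget])
    return [i in chosen for i in range(len(scores))]


def f1_score(labels: Sequence[int], flags: Sequence[bool]) -> float:
    """Compute F1 for the fraud class (label 1) from binary flags."""
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flags) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flags) if y == 1 and not f)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy batch (hypothetical): model scores and ground-truth labels (1 = fraud).
scores = [0.05, 0.92, 0.10, 0.40, 0.88, 0.03, 0.35, 0.07]
labels = [0, 1, 0, 0, 0, 0, 1, 0]

for budget in (2, 3, 4):
    flags = flag_within_budget(scores, budget)
    print(budget, sum(flags), round(f1_score(labels, flags), 3))
```

Ranking by score rather than fixing a constant cutoff keeps the number of alerts within the investigation capacity whatever the score distribution looks like on a given day. The trade-off is that the F1 score then depends on the chosen budget as well as on the model.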