Explainable Fraud Detection with Deep Symbolic Classification

Samantha Visbeek, Erman Acar, Floris den Hengst
2023-12-01
Abstract: There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by fraud detection models need to be explainable in the event of a customer dispute. Additionally, the decision-making process in the model must be transparent to win the trust of regulators and business stakeholders. At the same time, fraud detection solutions can benefit from data due to the noisy, dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations, and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the functions are concise mathematical expressions in closed form, the model is inherently explainable both at the level of a single classification decision and at the level of the model's decision process. Furthermore, the class imbalance problem is addressed by optimizing for metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows an explicit trade-off between prediction accuracy and explainability. An evaluation on the PaySim data set demonstrates competitive predictive performance with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of transparency and interpretability in existing fraud-detection models, particularly when model decisions must be explained in a customer dispute. In addition, fraud-detection data are usually highly imbalanced (legitimate transactions far outnumber fraudulent ones), which traditional methods handle poorly. Specifically, the paper targets the following problems:

1. **Model Transparency and Interpretability**:
   - Existing deep-learning models are usually "black-box" models whose decision-making processes are difficult to explain.
   - In fraud detection, model decisions need to be explained at the legal and business levels in order to win the trust of regulatory agencies, analysts, and business stakeholders.
2. **Class Imbalance Problem**:
   - Fraud-detection data sets usually suffer from severe class imbalance: the number of legitimate transactions far exceeds the number of fraudulent ones.
   - Traditional oversampling or undersampling techniques may introduce biases and degrade model performance.
3. **Trade-off between Predictive Performance and Interpretability**:
   - In practice, financial institutions such as banks need a model that is interpretable enough to justify its decisions without giving up predictive performance.

To solve these problems, the paper proposes **Deep Symbolic Classification (DSC)**, an extension of the Deep Symbolic Regression framework designed for classification problems. The main features of DSC include:

- **Intrinsic Interpretability**: The generated mathematical expressions are in closed form and concise, which makes the model transparent and interpretable.
- **Handling Class Imbalance**: By directly optimizing evaluation metrics that are robust to class imbalance, such as the F1 score, it avoids the problems of traditional oversampling or undersampling.
- **Explicit Trade-off between Predictive Performance and Interpretability**: It allows users to adjust the balance between the predictive performance and the interpretability of the model according to their needs.

According to the paper, DSC achieves predictive performance competitive with state-of-the-art models on the PaySim data set while surpassing them in explainability, making it a promising model for fraud-detection systems.
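
The features listed above can be made concrete with a small sketch. It is not the authors' implementation: it only shows how a single closed-form expression can act as a classifier and be scored with F1 rather than accuracy. The toy transactions, the feature names `amount` and `balance`, the sigmoid-plus-threshold decision rule, the token counts, and the `penalty` weight are all assumptions made for illustration. In DSC the candidate expressions would be proposed by the neural network, whereas here they are written by hand.

```python
import math

def f1_score(y_true, y_pred):
    """F1 for the positive (fraud) class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(expr, rows, threshold=0.5):
    """Assumed decision rule: squash the expression value, then threshold it."""
    return [1 if sigmoid(expr(row)) > threshold else 0 for row in rows]

def reward(expr, n_tokens, rows, y_true, penalty=0.01):
    """Hypothetical reward: F1 minus a small cost per expression token."""
    return f1_score(y_true, classify(expr, rows)) - penalty * n_tokens

# Toy, hand-made transactions (NOT PaySim): 2 fraudulent out of 10.
rows = [
    {"amount": 100, "balance": 1000}, {"amount": 50, "balance": 500},
    {"amount": 200, "balance": 5000}, {"amount": 980, "balance": 1000},
    {"amount": 30, "balance": 300}, {"amount": 4990, "balance": 5000},
    {"amount": 700, "balance": 2000}, {"amount": 10, "balance": 100},
    {"amount": 450, "balance": 480}, {"amount": 60, "balance": 900},
]
labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]

# Candidate 1: flag transfers that nearly drain the account (5 tokens).
def rule(row):
    return row["amount"] - 0.9 * row["balance"]

# Candidate 2: a constant that always predicts "legitimate" (1 token).
def baseline(row):
    return -1.0

f1_rule = f1_score(labels, classify(rule, rows))
f1_baseline = f1_score(labels, classify(baseline, rows))
accuracy_baseline = sum(
    1 for t, p in zip(labels, classify(baseline, rows)) if t == p
) / len(labels)
reward_rule = reward(rule, 5, rows, labels)
reward_baseline = reward(baseline, 1, rows, labels)
```

On this toy data the always-legitimate baseline reaches 80% accuracy but an F1 of 0, while the hand-written rule reaches an F1 of 0.8. This illustrates why an imbalance-robust metric is a more useful search objective than accuracy. The per-token penalty stands in for the accuracy-versus-explainability trade-off: raising it favours shorter expressions.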