A Hierarchical Clustering Strategy of Processing Class Imbalance and Its Application in Fraud Detection.

Youjun Zhang, Guanjun Liu, Lutao Zheng, Chungang Yan
DOI: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00249
2019-01-01
Abstract: With the Internet and mobile communications becoming an indispensable part of people's daily lives, online transactions have become one of the most common payment methods. However, transaction fraud also occurs frequently and causes substantial economic losses, which makes transaction fraud detection important. Machine learning methods are often used to detect fraudulent transactions in large volumes of transaction data, but the class imbalance problem reduces their performance. Four main factors cause this problem: imbalanced class distribution, sample size, class separability, and within-class concept. Existing strategies for handling class imbalance mainly focus on the first factor and neglect the other three. This paper considers all four factors and proposes a comprehensive model called a clustering tree. Constructing a clustering tree involves two steps: 1) we first select a clustering algorithm, taking class separability into account; and then 2) we use this clustering algorithm to construct a tree that can be used to determine whether an incoming transaction is illegal. The root node of this tree contains all samples, and we consider both the imbalanced class distribution and the within-class concept when samples are hierarchically divided into sub-nodes during construction. We compare the proposed method with five state-of-the-art methods on two real transaction datasets, and the experimental results show that our method performs better.
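The abstract gives only the outline of the two-step construction, so the following is a minimal sketch of the general idea rather than the authors' algorithm. Several things here are assumptions made for illustration: plain 2-means stands in for whichever clustering algorithm step 1 would select, a purity threshold stands in for the paper's handling of class distribution and within-class concept, and all names (`two_means`, `build`, `predict`, `purity`, `min_size`) and the toy data are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def two_means(points: List[Point], rng: random.Random, iters: int = 20):
    """Plain 2-means; an assumed stand-in for the selected clustering algorithm."""
    centers = rng.sample(points, 2)
    groups: Tuple[List[int], List[int]] = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, p in enumerate(points):
            nearer_second = sq_dist(p, centers[0]) > sq_dist(p, centers[1])
            groups[int(nearer_second)].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller turns this node into a leaf
        centers = [mean([points[i] for i in g]) for g in groups]
    return groups

@dataclass
class Node:
    center: Point        # mean of the samples in this node
    fraud_ratio: float   # fraction of samples labelled 1 (fraud)
    children: List["Node"] = field(default_factory=list)

def build(points, labels, rng, min_size=4, purity=0.95, depth=0, max_depth=5):
    """Recursively split samples; the root holds all of them."""
    ratio = sum(labels) / len(labels)
    node = Node(mean(points), ratio)
    # Assumed stopping rule: small, nearly pure, or deep nodes become leaves.
    if len(points) < min_size or max(ratio, 1 - ratio) >= purity or depth >= max_depth:
        return node
    groups = two_means(points, rng)
    if not groups[0] or not groups[1]:
        return node
    for g in groups:
        node.children.append(build([points[i] for i in g], [labels[i] for i in g],
                                   rng, min_size, purity, depth + 1, max_depth))
    return node

def predict(node: Node, x: Point, threshold: float = 0.5) -> bool:
    """Follow the nearest child center down to a leaf, then threshold its fraud ratio."""
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(x, c.center))
    return node.fraud_ratio >= threshold

# Toy, made-up data: 36 legitimate points on a small grid, 4 fraud points far away.
legit = [(i * 0.1, j * 0.1) for i in range(6) for j in range(6)]
fraud = [(5.0 + 0.1 * k, 5.0) for k in range(4)]
points = legit + fraud
labels = [0] * len(legit) + [1] * len(fraud)

tree = build(points, labels, random.Random(0))
is_fraud_far = predict(tree, (5.1, 5.0))
is_fraud_near = predict(tree, (0.2, 0.3))
```

In this sketch a leaf votes by its fraud ratio instead of a global majority, so a small fraud cluster can still be flagged even though fraud is rare in the root node. How the paper actually scores leaves is not stated in the abstract.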