Research on Bootstrapping Algorithm for Health Insurance Data Fraud Detection Based on Decision Tree.

Wenyi Yang, Wenhui Hu, Yingjie Liu, Yu Huang, Xueyang Liu, Shikun Zhang
DOI: https://doi.org/10.1109/bigdatasecurityhpscids52275.2021.00021
2021-01-01
Abstract: After years of development, traditional decision tree algorithm implementations, represented by LightGBM, have become mature and are widely used in various classification problems. However, when applying LightGBM to fraud detection in health insurance data, we find that its performance is not ideal because of the large imbalance between the number of fraud examples and normal examples. To solve this problem, we propose a simple and effective LightGBM-based hard example mining algorithm (LHEM) for detecting health insurance fraud. Our motivation is to identify the large number of simple examples and the small number of hard examples in the dataset. Selecting the hard examples and discarding the simple ones balances the ratio of fraud to normal examples and thus improves the performance of the original model. We test the new method on health insurance data collected in Jinhua, Zhejiang, and show that it outperforms LightGBM.
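The abstract does not state the rule LHEM uses to separate simple from hard examples, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a "simple" example is a normal claim that the current model already scores confidently as normal; the function names, the `easy_threshold` parameter, and the toy scores are all hypothetical. In practice the scores could come from a trained LightGBM classifier's predicted fraud probabilities.

```python
from typing import List, Sequence


def select_hard_examples(
    labels: Sequence[int],
    fraud_scores: Sequence[float],
    easy_threshold: float = 0.1,
) -> List[int]:
    """Return the indices of examples to keep for retraining.

    Assumption (not from the paper): a normal example (label 0) whose
    predicted fraud score is below `easy_threshold` counts as "simple"
    and is discarded. All fraud examples (label 1) are kept.
    """
    if len(labels) != len(fraud_scores):
        raise ValueError("labels and fraud_scores must have the same length")
    kept = []
    for index, (label, score) in enumerate(zip(labels, fraud_scores)):
        if label == 1 or score >= easy_threshold:
            kept.append(index)
    return kept


def fraud_to_normal_ratio(labels: Sequence[int], indices: Sequence[int]) -> float:
    """Ratio of fraud to normal examples among the given indices."""
    fraud = sum(1 for i in indices if labels[i] == 1)
    normal = sum(1 for i in indices if labels[i] == 0)
    return fraud / normal if normal else float("inf")


# Toy data: seven normal claims and one fraud claim, with made-up scores.
labels = [0, 0, 0, 0, 0, 0, 0, 1]
fraud_scores = [0.01, 0.02, 0.03, 0.50, 0.05, 0.30, 0.04, 0.20]

kept = select_hard_examples(labels, fraud_scores, easy_threshold=0.1)
ratio_before = fraud_to_normal_ratio(labels, range(len(labels)))
ratio_after = fraud_to_normal_ratio(labels, kept)
```

On this toy data the five confidently normal claims are dropped, so the fraud-to-normal ratio among the kept examples is higher than in the full set. A model retrained on the kept subset would see a less imbalanced sample, which is the effect the abstract describes.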