A Tax Evasion Detection Method Based on Positive and Unlabeled Learning with Network Embedding Features

Lingyun Mi, Bo Dong, Bin Shi, Qinghua Zheng
DOI: https://doi.org/10.1007/978-3-030-63833-7_12
2020-01-01
Abstract: Tax evasion detection plays a crucial role in reducing tax revenue loss. In the real world, an available tax dataset contains only a small number of labeled taxpayers who evade tax (positive samples) and a large number of unlabeled taxpayers who may or may not evade tax. Standard supervised classification is hard to apply to such a nontraditional dataset because it contains no labeled negative samples. In addition, the basic features of taxpayers, which are designed according to tax experts' domain knowledge and experience, are of limited use in determining whether a taxpayer evades tax. These limitations motivate this work. In this paper, we argue that the real-world tax evasion detection task should be formalized as a positive-unlabeled (PU) learning problem. We propose a novel tax evasion detection method based on PU learning with Network Embedding features (PUNE). PUNE detects tax evasion using both basic features and transaction network features extracted by a network embedding algorithm. Moreover, PUNE works well even under label noise. To evaluate the effectiveness of PUNE, we conduct experiments on a real-world tax dataset. The results demonstrate that PUNE significantly improves the performance of tax evasion detection.
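To make the PU formulation concrete, the following is a minimal sketch of how such a dataset might be assembled. It is not the authors' implementation: the abstract does not name the embedding algorithm or the PU classifier, so the function name `build_pu_dataset`, the taxpayer IDs, and all feature values are hypothetical. The embedding vectors are assumed to have been computed beforehand from the transaction network.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_pu_dataset(
    basic: Dict[str, Sequence[float]],
    embedding: Dict[str, Sequence[float]],
    known_evaders: Set[str],
) -> Tuple[List[str], List[List[float]], List[int]]:
    """Return (ids, X, s) with one row per taxpayer (hypothetical helper).

    X[i] is the basic feature vector followed by the embedding vector.
    s[i] is 1 for a labeled evader and 0 for an unlabeled taxpayer.
    s[i] == 0 does NOT mean the taxpayer is known to be compliant.
    """
    ids = sorted(basic)
    X = [list(basic[t]) + list(embedding[t]) for t in ids]
    s = [1 if t in known_evaders else 0 for t in ids]
    return ids, X, s


# Toy, made-up values for illustration only.
basic_features = {
    "T1": [0.12, 3.0],
    "T2": [0.40, 1.0],
    "T3": [0.05, 7.0],
}
# Assumed to come from some network embedding of the transaction graph.
embedding_features = {
    "T1": [0.9, -0.1, 0.3],
    "T2": [0.2, 0.8, -0.5],
    "T3": [0.85, -0.2, 0.25],
}
known_evaders = {"T1"}

ids, X, s = build_pu_dataset(basic_features, embedding_features, known_evaders)
print(ids)
print(X[0])
print(s)
```

The point of the sketch is the label vector `s`: a PU learner treats the zeros as "unknown" rather than "negative", which is what distinguishes this setting from ordinary binary classification.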