Distributional constraint discovery for intelligent auditing
Wentao Hu, Dawei Jiang, Sai Wu, Ke Chen, Gang Chen
DOI: https://doi.org/10.1007/s10115-023-01929-z
IF: 2.7
2023-08-09
Knowledge and Information Systems
Abstract: Constraint discovery in relational databases aims to find constraints that express dependency relationships among a set of attributes. It has been applied successfully to data cleaning, data error detection, and police and security operations. In this paper, we propose a new type of constraint, called distributional constraints (DCs), which leverage the distribution of attribute values for intelligent auditing and security analysis. A DC specifies the range of attribute values that most data follow, which enables financial auditors, law enforcement, and security analysts to identify data with anomalous distributions and to explain the reasons for such anomalies. In police and security applications, distributional constraints can help detect potential criminal activities, fraud, and other security threats by identifying unusual patterns in data. To discover distributional constraints efficiently, we propose an inference system to find the minimum coverage of a set of DCs. We also propose BitVector indexing, an optimization technique that further speeds up distributional constraint discovery. We conduct experiments on 12 real datasets, including medical bills and credit card statements, to validate the efficiency and effectiveness of our solution. The results show the performance of DC discovery and the effectiveness of using DCs to detect abnormal data in different audit datasets.
computer science, information systems, artificial intelligence
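
The abstract describes a DC as a range of attribute values that most data follow, with records outside that range treated as anomalous. The Python sketch below illustrates that idea only. It is not the paper's formal definition or algorithm: the `RangeConstraint` class, the `fit_range` and `violations` functions, the `coverage` parameter, and the rule of picking the narrowest interval that holds a given fraction of the values are all assumptions made for illustration. The inference system and BitVector indexing mentioned in the abstract are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical stand-in for a distributional constraint (not the paper's definition)."""
    condition: Dict[str, object]  # attribute values that select a group of rows
    attribute: str                # numeric attribute whose range is constrained
    low: float
    high: float


def matches(row: Row, condition: Dict[str, object]) -> bool:
    """True if the row has every attribute value required by the condition."""
    return all(row.get(key) == value for key, value in condition.items())


def fit_range(rows: List[Row], condition: Dict[str, object],
              attribute: str, coverage: float = 0.9) -> RangeConstraint:
    """Find the narrowest interval holding at least `coverage` of the matching values.

    Assumption: "the range that most data follow" is read as the tightest
    window over the sorted values that contains the requested fraction.
    """
    values = sorted(float(r[attribute]) for r in rows if matches(r, condition))
    if not values:
        raise ValueError("no rows match the condition")
    n = len(values)
    # Number of values the interval must contain; the small epsilon guards
    # against floating-point round-up such as 0.9 * 10 -> 9.000000000000002.
    k = max(1, math.ceil(coverage * n - 1e-9))
    # Slide a window of k consecutive sorted values and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: values[i + k - 1] - values[i])
    return RangeConstraint(condition, attribute, values[start], values[start + k - 1])


def violations(rows: List[Row], constraint: RangeConstraint) -> List[Row]:
    """Rows in the constrained group whose value falls outside the range."""
    return [
        r for r in rows
        if matches(r, constraint.condition)
        and not (constraint.low <= float(r[constraint.attribute]) <= constraint.high)
    ]
```

In an auditing setting, the rows returned by `violations` would be the candidates for manual review, and the constraint itself (group condition plus range) serves as the explanation for why each row was flagged.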