IceBerg: Deep Generative Modeling for Constraint Discovery and Anomaly Detection.

Wentao Hu, Dawei Jiang, Sai Wu, Ke Chen, Gang Chen
DOI: https://doi.org/10.1109/ispa-bdcloud-socialcom-sustaincom57177.2022.00017
2022-01-01
Abstract: Automatic constraint discovery from a relational database is beneficial for domain experts in fraud detection and intelligent auditing. Its objective is to discover a set of inherent constraints underlying the database, such that tuples violating them are considered anomalous. In this paper, we propose IceBerg, the first system to simultaneously detect anomalous tuples and discover the associated human-readable constraints. The backbone of IceBerg is a novel generative network, KD-VAE, which integrates kernel density estimation with a variational autoencoder. KD-VAE is trained to learn the distribution of normal tuples. Anomaly detection is then performed by calculating the likelihood that a tuple fits the distribution of normal tuples, and anomalies are interpreted by comparing each detected anomalous tuple with its generated normal counterpart. We empirically compare the proposed method with several state-of-the-art outlier detection methods on 13 real-world datasets. The results show that IceBerg outperforms its competitors in most cases, especially on complex datasets with high-dimensional features.
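To make the detect-then-interpret idea concrete, here is a minimal sketch of likelihood-based scoring. It is not KD-VAE: it assumes a plain Gaussian kernel density estimate in place of the learned generative model, and it uses the nearest observed normal tuple as a stand-in for the generated normal counterpart. The function names, the bandwidth, the 1% threshold, and the (amount, quantity) toy table are all hypothetical.

```python
import numpy as np

def kde_log_likelihood(train, queries, bandwidth=1.0):
    """Log-density of each query row under a Gaussian KDE fitted on `train`."""
    train = np.asarray(train, dtype=float)
    queries = np.asarray(queries, dtype=float)
    n, d = train.shape
    # Squared distance between every query tuple and every training tuple.
    diff = queries[:, None, :] - train[None, :, :]
    log_kernel = -0.5 * np.sum(diff ** 2, axis=2) / bandwidth ** 2
    # Log-sum-exp over training tuples keeps far-away queries from underflowing.
    m = log_kernel.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1))
    log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * bandwidth ** 2)
    return log_sum - log_norm

def explain(tuple_, normal_tuples, columns):
    """Name the column that deviates most from the nearest normal tuple.

    Assumption: the nearest observed normal tuple replaces the counterpart
    that a generative model would produce.
    """
    tuple_ = np.asarray(tuple_, dtype=float)
    normal_tuples = np.asarray(normal_tuples, dtype=float)
    dists = np.sum((normal_tuples - tuple_) ** 2, axis=1)
    counterpart = normal_tuples[np.argmin(dists)]
    gap = np.abs(tuple_ - counterpart)
    return columns[int(np.argmax(gap))], counterpart

rng = np.random.default_rng(0)
# Hypothetical normal tuples obeying the implicit rule amount = 10 * quantity.
quantity = rng.uniform(1, 5, size=200)
amount = 10 * quantity + rng.normal(0, 0.5, size=200)
normal = np.column_stack([amount, quantity])

# One tuple that follows the rule and one that violates it.
candidates = np.array([[30.0, 3.0], [90.0, 3.0]])
scores = kde_log_likelihood(normal, candidates)

# Flag tuples whose likelihood falls below the lowest 1% of normal tuples.
threshold = np.quantile(kde_log_likelihood(normal, normal), 0.01)
flags = scores < threshold

culprit, counterpart = explain(candidates[1], normal, ["amount", "quantity"])
print(scores, flags, culprit, counterpart)
```

In this sketch the column with the largest gap to the counterpart serves as a crude hint at which constraint was violated; the paper's system derives human-readable constraints from its generative model instead.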