SUOD: A Scalable Unsupervised Outlier Detection Framework.
Yue Zhao, Xiyang Hu, Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu
2020-01-01
Abstract: Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection. Due to the lack of ground truth labels, practitioners often have to build a large number of unsupervised, heterogeneous models (i.e., different algorithms with varying hyperparameters) for further combination and analysis, rather than relying on a single model. How can we accelerate training, and the scoring of incoming samples by outlyingness (referred to as prediction throughout the paper), with a large number of unsupervised, heterogeneous OD models? In this study, we propose a modular acceleration system, called SUOD, to address this question. The proposed system focuses on three complementary acceleration aspects (data reduction for high-dimensional data, approximation for costly models, and taskload imbalance optimization for distributed environments), while maintaining performance accuracy. Extensive experiments on more than 20 benchmark datasets demonstrate SUOD's effectiveness in heterogeneous OD acceleration, along with a real-world deployment case on fraudulent claim analysis at IQVIA, a leading healthcare firm. We open-source SUOD for reproducibility and accessibility.
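
To make two of the three acceleration aspects concrete, the following is a minimal sketch and not SUOD's actual implementation. It assumes that data reduction can be illustrated with a Gaussian random projection and that taskload balancing can be illustrated with a greedy longest-processing-time assignment. The function names `random_projection` and `balance_tasks`, the model names, and the cost values are all hypothetical and chosen only for illustration.

```python
import heapq
import random


def random_projection(rows, target_dim, seed=0):
    """Project each row onto target_dim dimensions with a Gaussian random matrix.

    Illustrative stand-in for "data reduction for high-dimensional data".
    """
    rng = random.Random(seed)
    source_dim = len(rows[0])
    scale = 1.0 / target_dim ** 0.5
    # One random column per output dimension, scaled to roughly preserve norms.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(source_dim)]
    return [[sum(x * matrix[i][j] for i, x in enumerate(row))
             for j in range(target_dim)]
            for row in rows]


def balance_tasks(estimated_costs, n_workers):
    """Assign models to workers so that the estimated loads are roughly even.

    Greedy rule: take models from most to least costly and always give the
    next one to the currently least-loaded worker.
    """
    heap = [(0.0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    by_cost = sorted(estimated_costs.items(), key=lambda item: -item[1])
    for model, cost in by_cost:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(model)
        heapq.heappush(heap, (load + cost, worker))
    return assignment


# Hypothetical cost estimates (arbitrary units), not measurements.
costs = {"knn_k5": 8.0, "knn_k50": 9.0, "lof": 7.0,
         "iforest": 2.0, "hbos": 1.0, "pca": 1.0}
plan = balance_tasks(costs, n_workers=2)
loads = [sum(costs[m] for m in models) for models in plan]

# Reduce a small 6-dimensional toy dataset to 2 dimensions.
data = [[float(i + j) for j in range(6)] for i in range(4)]
reduced = random_projection(data, target_dim=2)
```

Splitting the six models into two groups of three in listing order would give estimated loads of 24 and 4 units, whereas the cost-aware assignment keeps the two workers close to each other. This is the kind of imbalance that arises when heterogeneous models with very different running times are scheduled without regard to cost.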