Implanting Domain Knowledge into Feature Selection for Effective Outlier Detection in Network Traffic Data

Zhongyang Wang, Yijie Wang, Yongjun Wang
DOI: https://doi.org/10.1109/swc50871.2021.00025
2021-01-01
Abstract: Feature selection is vital to outlier detection in network traffic data, which contains many noise features. Existing methods typically operate on purely categorical or purely numerical data, so they cannot be applied directly to network traffic data, which mixes both feature types. These methods also ignore domain knowledge distilled from the characteristics of known outliers, which leads to unsatisfactory performance. This paper proposes a novel feature-selection-based outlier detection method for network traffic data, termed ODNTD, which incorporates domain knowledge into feature selection. ODNTD follows a three-stage process of decomposition, aggregation, and re-decomposition. In decomposition, categorical and numerical data are scored separately for outlierness, using value frequency and a deep autoencoder, respectively. In aggregation, the two scores are merged by a dynamic aggregation strategy, and a set of outlier candidates is identified by a thresholding function. In re-decomposition, coarse-grained and fine-grained selection is performed in the respective feature spaces of the outlier candidates, using machine learning guided by domain knowledge, to obtain feature subsets. ODNTD repeats these stages until an empirical error no longer decreases. In summary, ODNTD iteratively exchanges information between categorical and numerical data and implants domain knowledge into feature selection. Experiments on five real network traffic datasets show that, on average, ODNTD improves AUC over nine state-of-the-art competitors by 46.18% and reduces the number of features by 52%.
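The decomposition and aggregation stages can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the details it would need to be one. Several parts are assumptions made only for illustration: the numerical scorer is a linear PCA reconstruction error standing in for the deep autoencoder, the aggregation uses a fixed weight rather than the paper's dynamic strategy, the threshold is mean plus `k_sigma` standard deviations, and all function and parameter names are hypothetical. The re-decomposition (feature selection) stage is omitted.

```python
import numpy as np

def value_frequency_scores(cat):
    """Score rows by how rare their categorical values are (rarer = higher)."""
    n, d = cat.shape
    scores = np.zeros(n)
    for j in range(d):
        _, inverse, counts = np.unique(
            cat[:, j], return_inverse=True, return_counts=True
        )
        scores += 1.0 - counts[inverse] / n  # low frequency -> high score
    return scores / d

def reconstruction_scores(num, k=1):
    """Assumption: linear PCA reconstruction error, a simple stand-in
    for the deep autoencoder mentioned in the abstract."""
    x = (num - num.mean(axis=0)) / (num.std(axis=0) + 1e-12)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    comps = vt[:k]                       # top-k principal directions
    recon = x @ comps.T @ comps          # project and reconstruct
    return np.sum((x - recon) ** 2, axis=1)

def min_max(s):
    """Rescale scores to [0, 1] so the two score types are comparable."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def outlier_candidates(cat, num, weight=0.5, k_sigma=2.0):
    """Assumption: fixed-weight aggregation and a mean + k*std threshold;
    the paper's dynamic strategy and thresholding function are not shown."""
    combined = (weight * min_max(value_frequency_scores(cat))
                + (1.0 - weight) * min_max(reconstruction_scores(num)))
    threshold = combined.mean() + k_sigma * combined.std()
    return np.flatnonzero(combined > threshold), combined

# Toy data: 50 ordinary flows plus one flow that is unusual in both views.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
num = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=50)])
num = np.vstack([num, [3.0, -6.0]])              # breaks the x/y correlation
cat = np.array([["tcp", "http"]] * 50 + [["icmp", "ftp"]])

candidates, scores = outlier_candidates(cat, num)
print(candidates)
```

In this toy setup the last row uses rare categorical values and also violates the correlation between the two numerical features, so both scorers rank it highest. A full pipeline would then select features within the candidate set and repeat the loop until the empirical error stops decreasing, as the abstract describes.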