Causal Genetic Network Anomaly Detection Method for Imbalanced Data and Information Redundancy
ZengRi Zeng,Xuhui Liu,Ming Dai,Jian Zheng,Xiaoheng Deng,Detian Zeng,Jie Chen
DOI: https://doi.org/10.1109/tnsm.2024.3455768
2024-01-01
IEEE Transactions on Network and Service Management
Abstract:The proliferation of internet-connected devices and the complexity of modern network environments have led to the collection of massive and high-dimensional datasets, resulting in substantial information redundancy and sample imbalance issues. These challenges not only hinder the computational efficiency and generalizability of anomaly detection systems but also compromise their ability to detect rare attack types, posing significant security threats. To address these pressing issues, we propose a novel causal genetic network-based anomaly detection method, the CNSGA, which integrates causal inference and the nondominated sorting genetic algorithm-III (NSGA-III). The CNSGA leverages causal reasoning to exclude irrelevant information, focusing solely on the features that are causally related to the outcome labels. Simultaneously, NSGA-III iteratively eliminates redundant information and prioritizes minority samples, thereby enhancing detection performance. To quantitatively assess the improvements achieved, we introduce two indices: a detection balance index and an optimal feature subset index. These indices, along with the causal effect weights, serve as fitness metrics for iterative optimization. The optimized individuals are then selected for subsequent population generation on the basis of nondominated reference point ordering. The experimental results obtained with four real-world network attack datasets demonstrate that the CNSGA significantly outperforms existing methods in terms of overall precision, the imbalance index, and the optimal feature subset index, with maximum increases exceeding 10%, 0.5, and 50%, respectively. Notably, for the CICDDoS2019 dataset, the CNSGA requires only 16-dimensional features to effectively detect more than 70% of all sample types, including 6 more network attack sample types than the other methods detect. The significance and impact of this work encompass the ability to eliminate redundant information, increase detection rates, balance attack detection systems, and ensure stability and generalizability. The proposed CNSGA framework represents a significant step forward in developing efficient and accurate anomaly detection systems capable of defending against a wide range of cyber threats in complex network environments.