Labeling malicious communication samples based on semi-supervised deep neural network

Guolin Shao, Xingshu Chen, Xuemei Zeng, Lina Wang
DOI: https://doi.org/10.23919/jcc.2019.11.015
2019-11-01
China Communications
Abstract: The scarcity of labeled sample data in advanced security threat detection seriously restricts research in the field. Learning sample labels from both labeled and unlabeled data has received considerable research attention, and various general-purpose labeling methods have been proposed. However, labeling malicious communication samples associated with advanced threats faces two practical challenges: the difficulty of extracting effective features in advance and the complexity of the sample types encountered in practice. To address these problems, we propose a sample labeling method for malicious communication based on a semi-supervised deep neural network. The method continuously learns and optimizes its feature representation while labeling samples, and it can handle uncertain samples that fall outside the sample types of interest. According to the experimental results, the proposed deep neural network automatically learns an effective feature representation, and the validity of the learned features is close to or even higher than that of features extracted using expert knowledge. Furthermore, the proposed method achieves a labeling accuracy of 97.64% to 98.50%, which is more accurate than the train-then-detect, kNN, and LPA methods for any proportion of labeled samples. Insufficient labeled samples are a problem in many network attack detection scenarios, and the proposed work can serve as a reference for sample labeling tasks in similar real-world scenarios.
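For readers unfamiliar with the kNN baseline named in the abstract, the following is a minimal sketch of neighbor-based sample labeling. It is not the authors' method or their experimental setup. The function name `knn_label`, the toy two-dimensional feature vectors, the "benign"/"malicious" label names, and the `max_dist` rejection threshold are all assumptions introduced for illustration; the threshold is only one simple way to mark samples that lie far from every known type as "uncertain".

```python
import math
from collections import Counter

def knn_label(labeled, unlabeled, k=3, max_dist=None):
    """Label each unlabeled vector by majority vote of its k nearest labeled
    neighbours. If max_dist is set (hypothetical reject rule) and even the
    nearest neighbour is farther than max_dist, return "uncertain" instead."""
    results = []
    for x in unlabeled:
        # Sort labeled samples by Euclidean distance to x and keep the k closest.
        neighbours = sorted((math.dist(x, v), y) for v, y in labeled)[:k]
        if max_dist is not None and neighbours[0][0] > max_dist:
            results.append("uncertain")
            continue
        votes = Counter(y for _, y in neighbours)
        results.append(votes.most_common(1)[0][0])
    return results

# Toy data (assumed): two well-separated clusters of labeled samples.
labeled = [((0.0, 0.0), "benign"), ((0.1, 0.2), "benign"), ((0.2, 0.1), "benign"),
           ((5.0, 5.0), "malicious"), ((5.1, 4.9), "malicious"), ((4.9, 5.2), "malicious")]
unlabeled = [(0.1, 0.1), (5.0, 5.1), (20.0, -20.0)]

labels = knn_label(labeled, unlabeled, k=3, max_dist=2.0)
forced = knn_label(labeled, [(20.0, -20.0)], k=3)  # no reject rule
```

With the reject rule, the third sample is marked "uncertain" because it lies far from both clusters; without it, plain kNN forces the sample into one of the known types, which illustrates the kind of limitation the abstract attributes to samples outside the types of interest.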
telecommunications