An unknown malware detection scheme based on the features of graph
Zongqu Zhao, Junfeng Wang, Chonggang Wang
DOI: https://doi.org/10.1002/sec.524
IF: 1.968
2012-02-28
Security and Communication Networks
Abstract: Traditional malware detection schemes based on specific signatures perform poorly on previously unknown malware, so more general features of binary files should be explored to address this problem. Recently, classification algorithms have been used successfully to select features of unknown malicious code, and most of this work represents executables as byte or operation code sequence n‐grams. However, these n‐gram representations depend heavily on the training data. In this paper, we present a graph‐based method to detect unknown malware. The method represents an executable by its function call graph, which consists of the functions and the call relations between them. The features are defined according to both the statistical information and the topology of the function call graph. They are extracted and processed through machine learning to classify unknown Portable Executable files. Because the number of features is fixed, the graph‐based method avoids the very large feature sets found in other methods. In our experiments, three types of malware datasets were tested, and an accuracy as high as 96.8% was achieved. Furthermore, the method achieves 92.1% accuracy when only 5% of the dataset is used as the training set. Copyright © 2012 John Wiley & Sons, Ltd. To detect unknown malware, the features of the graph‐based method are predefined from the function call graph of the software, and rules are then built from these features to classify an executable as benign or malicious. The number and content of these features are independent of the testing dataset, and this property allows our experimental results to closely reflect the situation of unknown malware detection. The design can provide high malware detection accuracy using a small training set.
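Illustrative sketch (not from the paper): the abstract does not list the actual features, so the six features below, the function name call_graph_features, and the dictionary-based graph format are all assumptions chosen only to show how a function call graph of any size can be reduced to a feature vector of fixed length. Extracting the call graph from a Portable Executable file is outside the scope of this sketch.

```python
def call_graph_features(call_graph):
    """Map a call graph {caller: [callees]} to a fixed-length feature vector.

    The six features are hypothetical examples of statistical and
    topological properties; they are not the feature set used in the paper.
    """
    # Collect every function that appears as a caller or as a callee.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)

    # Deduplicate call relations so repeated calls count as one edge.
    edges = set()
    for caller, callees in call_graph.items():
        for callee in callees:
            edges.add((caller, callee))

    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1

    return [
        len(nodes),                                   # number of functions
        len(edges),                                   # number of call relations
        max(out_deg.values(), default=0),             # largest fan-out
        max(in_deg.values(), default=0),              # largest fan-in
        sum(1 for d in out_deg.values() if d == 0),   # functions that call nothing
        sum(1 for d in in_deg.values() if d == 0),    # functions never called
    ]


# Hypothetical call graph with five functions and four call relations.
example = {
    "main": ["parse_args", "run"],
    "run": ["read_file", "write_file"],
    "read_file": [],
}
features = call_graph_features(example)
print(features)
```

Because the vector length does not depend on the size of the graph or on the dataset, vectors from different executables could be passed to any standard classifier without a separate feature-selection step.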
computer science, information systems, telecommunications