A FEATURE SELECTION AND MODELLING METHOD FOR MALICIOUS CODE

Meng Li, Xiaoqi Jia, Rui Wang, Dongdai Lin
DOI: https://doi.org/10.3969/j.issn.1000-386x.2015.08.063
2015-01-01
Abstract: In malicious code analysis and detection, static analysis techniques are not effective at detecting metamorphic or polymorphic malicious code. To address this problem, this paper proposes an approach for extracting dynamic features that capture the semantics of malicious code. The method extracts the dynamic features in a virtual environment in order to protect the physical machine. The raw features are then filtered and processed to obtain API call sequence information for each code sample. To make the features more effective, the traditional n-gram model is improved by adding n-gram frequency information and the dependencies between APIs, and the improved n-gram model is built. In the experimental analysis, four machine learning methods (decision trees, k-nearest neighbour, support vector machine, and Bayesian networks) are each used to perform 10-fold cross-validation on the selected sample features. The experimental results show that the selected features achieve the best detection performance with the J48 decision tree, and that the method can effectively detect malicious code that uses obfuscation and polymorphism techniques.
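The idea of attaching frequency information to n-grams over an API call sequence can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `api_ngrams` and `ngram_frequencies` and the sample trace are hypothetical, the relative-frequency weighting is only one plausible reading of "n-gram frequency information", and the API dependency component of the improved model is not covered because the abstract does not specify it.

```python
from collections import Counter


def api_ngrams(calls, n):
    """Return all contiguous n-grams (as tuples) from a list of API names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_frequencies(calls, n):
    """Map each distinct n-gram to its relative frequency in the trace.

    Assumption: frequency is the n-gram count divided by the total number
    of n-grams in the trace; the paper may define it differently.
    """
    grams = api_ngrams(calls, n)
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical API call trace, made up for illustration only.
trace = [
    "CreateFileW", "WriteFile", "CloseHandle",
    "CreateFileW", "WriteFile", "CloseHandle",
]

freqs = ngram_frequencies(trace, 2)
for gram, freq in sorted(freqs.items()):
    print(gram, round(freq, 2))
```

Each sample's frequency dictionary could then be turned into a fixed-length feature vector over a shared n-gram vocabulary before being passed to a classifier such as a decision tree.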