Performance analysis of machine learning and pattern recognition algorithms for Malware classification

Barath Narayanan Narayanan, Ouboti Djaneye-Boundjou, Temesguen M. Kebede
DOI: https://doi.org/10.1109/naecon.2016.7856826
2016-07-01
Abstract: The anti-malware industry faces the challenge of evaluating huge amounts of data for potentially malicious content. This is because hackers introduce polymorphism into existing malicious groups/classes. Effective feature extraction and classification of malware data are necessary to tackle such issues. In this paper, we visualize viruses as images, since images capture minor changes while retaining a global structure. We then apply Principal Component Analysis (PCA) for feature extraction. Based on the extracted PCA features, we study the performance of various Artificial Neural Network (ANN) algorithms, along with K-Nearest Neighbors (kNN) and Support Vector Machine (SVM) classification techniques, for assigning malware data to their respective classes. We use k-fold validation to gauge the effectiveness of our approach. The study makes use of the publicly available Kaggle dataset provided by Microsoft for the Microsoft Malware Classification Challenge (BIG 2015).
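The pipeline outlined in the abstract (bytes rendered as an image, PCA features, a classifier, k-fold validation) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 8x8 image size, the two PCA components, k = 3 neighbors, 5 folds, the power-iteration PCA, and the synthetic two-class byte data standing in for the Kaggle samples. Only kNN is shown; the ANN and SVM classifiers studied in the paper are omitted.

```python
import math
import random
from collections import Counter


def bytes_to_image_vector(data, width=8, height=8):
    """Map each byte to one grayscale pixel in [0, 1], row by row.

    The input is truncated or zero-padded to width * height pixels and
    returned as a flat list (the flattened image). Image size is an assumption.
    """
    n_pixels = width * height
    padded = data[:n_pixels] + bytes(max(0, n_pixels - len(data)))
    return [b / 255.0 for b in padded]


def pca_fit(X, n_components, iters=50):
    """Estimate the mean and leading principal directions by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Scatter-matrix times v, without forming the d x d matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
            w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
            # Deflation: project out directions that were already found.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                # No variance left: return the components found so far.
                return mean, components
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def pca_transform(X, mean, components):
    """Project mean-centered samples onto the principal directions."""
    return [[sum((x - m) * c for x, m, c in zip(row, mean, comp))
             for comp in components] for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(X, y, folds=5, n_components=2, k=3):
    """k-fold validation; PCA is fitted on the training folds only."""
    idx = list(range(len(X)))
    random.Random(0).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        mean, comps = pca_fit([X[i] for i in train_idx], n_components)
        train_Z = pca_transform([X[i] for i in train_idx], mean, comps)
        train_y = [y[i] for i in train_idx]
        test_Z = pca_transform([X[i] for i in test_idx], mean, comps)
        for z, i in zip(test_Z, test_idx):
            correct += knn_predict(train_Z, train_y, z, k) == y[i]
    return correct / len(X)


def make_synthetic_samples(n_per_class=20, n_bytes=64, seed=0):
    """Hypothetical stand-in data: two 'families' with different byte ranges."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (lo, hi) in enumerate([(0, 80), (170, 255)]):
        for _ in range(n_per_class):
            raw = bytes(rng.randint(lo, hi) for _ in range(n_bytes))
            X.append(bytes_to_image_vector(raw))
            y.append(label)
    return X, y


X, y = make_synthetic_samples()
print("k-fold accuracy:", k_fold_accuracy(X, y))
```

Fitting PCA inside each fold, rather than once on all samples, keeps the held-out data from influencing the features. In practice a numerical library would likely replace the hand-written PCA and kNN; the pure-Python versions are used here only to keep the sketch self-contained.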