Malware Classification and Analysis Using Convolutional and Recurrent Neural Network

Yassine Maleh
DOI: https://doi.org/10.4018/978-1-5225-7862-8.ch014
2019-01-01
Abstract: Over the past decade, the volume of malware has grown exponentially. Traditional signature-based detection has shown its limitations against new malware, and categorizing malware samples has become essential to understanding malware behavior. Recently, antivirus solutions have increasingly adopted machine learning approaches. Unfortunately, few open-source data sets are available to the academic community. One of the largest was published last year in a Kaggle competition, with data provided by Microsoft for the Big Data Innovators Gathering. This chapter explores the problem of malware classification. In particular, it proposes an innovative and scalable approach that uses convolutional neural networks (CNN) and long short-term memory (LSTM) to assign each malware sample to its corresponding family. The proposed method achieved a classification accuracy of 98.73% and an average log loss of 0.0698 on the validation data.
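The abstract reports two metrics, classification accuracy and average log loss. As a minimal sketch of how such metrics are typically computed for a multiclass classifier, the Python below evaluates both on a toy example. The function names, the clipping constant `eps`, and the three-sample data are illustrative assumptions; they are not taken from the chapter and do not reproduce its results.

```python
import math


def average_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean multiclass log loss: -(1/N) * sum_i log(p_i[y_i]).

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never
    occurs; the clipping value is an assumed, commonly used convention.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


def accuracy(true_labels, predicted_probs):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = 0
    for label, probs in zip(true_labels, predicted_probs):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        if predicted == label:
            correct += 1
    return correct / len(true_labels)


# Hypothetical toy data: 3 samples, 3 malware families (indices 0..2).
labels = [0, 2, 1]
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.20, 0.70, 0.10],
]

print("accuracy:", accuracy(labels, probs))
print("average log loss:", average_log_loss(labels, probs))
```

Log loss penalizes confident wrong predictions much more heavily than accuracy does, which is why the two figures are usually reported together.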