On the Robustness of Machine Learning Based Malware Detection Algorithms

Weiwei Hu,Ying Tan
DOI: https://doi.org/10.1109/ijcnn.2017.7966021
2017-01-01
Abstract:With the rapid popularity of the Internet, a large amount of new malware is produced every day, while the traditional signature based malware detection algorithm is unable to detect such unseen malware. In recent years, many machine learning based algorithms have been proposed to detect new malware, and several of these algorithms are able to achieve quite good detection performance when supplied with plenty of training data. However, most of these algorithms just focus on how to improve the classification performance, while the robustness is not taken into consideration. This paper performs a detailed analysis on the robustness of four well-known machine learning based malware detection approaches, i.e. the DLL and API feature, the string feature, PE-Miner and the byte level N-Gram feature. We proposed two pretense approaches under which malware is able to pretend to be benign and bypass the detection algorithms. Experimental results show that the performances of these detection algorithms decline greatly under the pretense approaches. The lack of robustness makes these algorithms unable to be used in real world applications. In future works of machine learning based malware detection, researchers have to take the problem of robustness seriously.
What problem does this paper attempt to address?