Towards Efficient Detection of Malicious VBA Macros with LSI

Mamoru Mimura, Taro Ohminami
DOI: https://doi.org/10.1007/978-3-030-26834-3_10
2019-01-01
Abstract: Targeted email attacks are one of the main threats to organizations of all sizes and in every field. In these attacks, malicious VBA (Visual Basic for Applications) macros are often embedded in attachments to compromise the target computers. These macros are obfuscated in several ways to evade detection, so pattern-based detection is limited in its ability to detect new malicious VBA macros. To detect them, several methods based on machine learning have been proposed. One such method extracts words from the source code and constructs a language model that represents VBA macros for machine learning. That method, however, builds the language model from all of the extracted words, so the model may contain words that are unnecessary for classification. To construct an efficient language model, we focus on LSI (Latent Semantic Indexing). LSI is one of the foundational techniques in topic modeling and calculates the similarity of documents. Our method uses LSI to construct an efficient language model, which improves both accuracy and efficiency. To the best of our knowledge, our method is the first to detect new malicious VBA macros with LSI. It extracts words from the source code and converts them into feature vectors using Natural Language Processing techniques. It then trains a classifier on benign and malicious VBA macros and detects new malicious VBA macros. Several thousand samples for evaluation were obtained from VirusTotal. The experimental results show that our method can detect new malicious VBA macros more accurately and efficiently. The best F-measure is 0.95.
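The pipeline described in the abstract (word extraction, LSI projection, classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tokenizer regex, the use of raw term counts, the power-iteration SVD, the nearest-centroid classifier, the toy samples, and all function names are assumptions chosen to keep the example free of third-party dependencies.

```python
import math
import re
from collections import Counter

def tokenize(source):
    """Split macro source into lowercase identifier-like words (assumed tokenizer)."""
    return [w.lower() for w in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def term_matrix(docs, vocab):
    """Build a documents-by-terms matrix of raw word counts."""
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(tokenize(doc)).items():
            if word in index:  # words unseen in training are ignored
                row[index[word]] = float(count)
        rows.append(row)
    return rows

def lsi_topics(X, k, iters=100):
    """Top-k right singular vectors of X via power iteration with deflation."""
    n_terms = len(X[0])
    topics = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(n_terms)]
        for _step in range(iters):
            Xv = [dot(row, v) for row in X]
            # w = X^T (X v)
            w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(n_terms)]
            for t in topics:  # remove components along topics already found
                proj = dot(w, t)
                w = [wj - proj * tj for wj, tj in zip(w, t)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # matrix rank is lower than k
                return topics
            v = [wj / norm for wj in w]
        topics.append(v)
    return topics

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def train(docs, labels, k=2):
    """Fit the LSI space, then store one centroid per class (stand-in classifier)."""
    vocab = sorted({w for doc in docs for w in tokenize(doc)})
    X = term_matrix(docs, vocab)
    topics = lsi_topics(X, k)
    projected = [[dot(row, t) for t in topics] for row in X]
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(projected, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return {"vocab": vocab, "topics": topics, "centroids": centroids}

def classify(model, source):
    """Project a new macro into the LSI space and pick the closest centroid."""
    row = term_matrix([source], model["vocab"])[0]
    point = [dot(row, t) for t in model["topics"]]
    return max(model["centroids"],
               key=lambda label: cosine(point, model["centroids"][label]))

# Toy word lists standing in for real macro source (hypothetical data).
SAMPLES = [
    "Sub FormatReport() Range Font Bold Cells Value End Sub",
    "Sub SumColumn() Range Cells Value Total Worksheet End Sub",
    "Sub AutoOpen() CreateObject Shell Run powershell Chr End Sub",
    "Sub Document_Open() Shell CreateObject Chr Chr download Run End Sub",
]
LABELS = ["benign", "benign", "malicious", "malicious"]

model = train(SAMPLES, LABELS, k=2)
print(classify(model, "Sub AutoOpen() Shell Chr CreateObject End Sub"))
```

The point of the LSI step is that each macro is represented by `k` topic coordinates instead of one dimension per extracted word, which is how the reduced model can drop words that do not help classification. In practice a library SVD and a stronger classifier would likely replace the hand-rolled pieces here.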