PROUD-MAL: static analysis-based progressive framework for deep unsupervised malware classification of windows portable executable

Syed Khurram Jah Rizvi,Warda Aslam,Muhammad Shahzad,Shahzad Saleem,Muhammad Moazam Fraz
DOI: https://doi.org/10.1007/s40747-021-00560-1
2021-10-12
Abstract:Abstract Enterprises are striving to remain protected against malware-based cyber-attacks on their infrastructure, facilities, networks and systems. Static analysis is an effective approach to detect the malware, i.e., malicious Portable Executable (PE). It performs an in-depth analysis of PE files without executing, which is highly useful to minimize the risk of malicious PE contaminating the system. Yet, instant detection using static analysis has become very difficult due to the exponential rise in volume and variety of malware. The compelling need of early stage detection of malware-based attacks significantly motivates research inclination towards automated malware detection. The recent machine learning aided malware detection approaches using static analysis are mostly supervised. Supervised malware detection using static analysis requires manual labelling and human feedback; therefore, it is less effective in rapidly evolutionary and dynamic threat space. To this end, we propose a progressive deep unsupervised framework with feature attention block for static analysis-based malware detection (PROUD-MAL). The framework is based on cascading blocks of unsupervised clustering and features attention-based deep neural network. The proposed deep neural network embedded with feature attention block is trained on the pseudo labels. To evaluate the proposed unsupervised framework, we collected a real-time malware dataset by deploying low and high interaction honeypots on an enterprise organizational network. Moreover, endpoint security solution is also deployed on an enterprise organizational network to collect malware samples. After post processing and cleaning, the novel dataset consists of 15,457 PE samples comprising 8775 malicious and 6681 benign ones. The proposed PROUD-MAL framework achieved an accuracy of more than 98.09% with better quantitative performance in standard evaluation parameters on collected dataset and outperformed other conventional machine learning algorithms. The implementation and dataset are available at https://bit.ly/35Sne3a .
computer science, artificial intelligence
What problem does this paper attempt to address?