From Malware Samples to Fractal Images: A New Paradigm for Classification. (Version 2.0, Previous version paper name: Have you ever seen malware?)

Ivan Zelinka, Miloslav Szczypka, Jan Plucar, Nikolay Kuznetsov
2023-06-02
Abstract: To date, a large number of research papers have been written on the classification of malware, its identification, classification into different families and the distinction between malware and goodware. These works have been based on captured malware samples and have attempted to analyse malware and goodware using various techniques, including techniques from the field of artificial intelligence. For example, neural networks have played a significant role in these classification methods. Some of this work also deals with analysing malware using its visualisation. These works usually convert malware samples capturing the structure of malware into image structures, which are then the object of image processing. In this paper, we propose a very unconventional and novel approach to malware visualisation based on dynamic behaviour analysis, with the idea that the images, which are visually very interesting, are then used to distinguish malware from goodware. Our approach opens an extensive topic for future discussion and provides many new directions for research in malware analysis and classification, as discussed in the conclusion. The results of the presented experiments are based on a database of 6 589 997 goodware, 827 853 potentially unwanted applications and 4 174 203 malware samples provided by ESET, and selected experimental data (images, generating polynomial formulas and software generating images) are available on GitHub for interested readers. Thus, this paper is not a comprehensive compact study that reports the results obtained from comparative experiments but rather attempts to show a new direction in the field of visualisation with possible applications in malware analysis.
Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper proposes a novel malware visualization method and uses it to differentiate malware from goodware. Specifically, the authors introduce an unconventional approach based on dynamic behavior analysis: the behavior of a sample is transformed into a visually intriguing fractal image, and these images are then used for classification.

### The main contributions of the paper include:

1. **Innovative Visualization Method**: Unlike earlier visualization work, which usually converts the structure of a malware sample into an image, this paper proposes a visualization that is based on fractal geometry and driven by dynamic behavior. The method generates visually appealing images and also captures the complexity of malware behavior.
2. **Generation of Fractal Images**: The paper details how to convert the API call sequences of malware into fractal images. The API call sequences are treated as vertex sequences in a graph, and fractal images are generated from them through specific algorithms.
3. **Deep Learning Classification**: The generated fractal images are fed to deep learning models that classify malware and goodware. The experimental results suggest that the method has high potential in terms of classification accuracy.
4. **Aesthetic Value**: Besides the technical contributions, the paper emphasizes the aesthetic value of the fractal images. They combine science and computer art, which makes the research attractive to a non-specialist audience.

### Method Overview:

1. **Data Collection**: The data used in the paper is provided by ESET and includes a large number of malware, potentially unwanted application, and goodware samples.
2. **Data Processing**: The data is preprocessed by removing redundancies and deleting overly short entries.
3. **Fractal Image Generation**: The API call sequences of malware are converted into vertex sequences in a graph, and Iterated Function Systems (IFS) or the Escape Time Algorithm (TEA) are then used to generate fractal images.
4. **Deep Learning Classification**: The generated fractal images are used to train deep learning models that classify malware and goodware.

### Experimental Results:

- The paper showcases the generated fractal images, which are visually appealing and also reflect the behavioral characteristics of malware.
- The deep learning model is reported to reach high accuracy in classification tasks, which supports the potential of the method. As the abstract notes, however, the paper is not a comprehensive comparative study.

### Conclusion:

The method proposed in the paper provides a new perspective on malware visualization and classification, with contributions on both the technical and the aesthetic side. Future research can explore this direction further and develop more malware analysis methods based on fractal geometry.
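The data processing and fractal image generation steps summarized above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a plain chaos-game IFS in which every distinct API name becomes one vertex of a regular polygon and each call pulls the current point part of the way toward its vertex. The function names (`preprocess`, `polygon_vertices`, `chaos_game_image`), the `min_length` threshold, the contraction `ratio`, and the sample traces are all hypothetical choices made for illustration.

```python
import math


def preprocess(sequences, min_length=4):
    """Drop exact duplicate sequences and sequences shorter than min_length.

    Both rules are assumptions standing in for "removing redundancies and
    deleting overly short entries"; the paper's exact criteria are not given here.
    """
    seen = set()
    kept = []
    for seq in sequences:
        key = tuple(seq)
        if len(key) < min_length or key in seen:
            continue
        seen.add(key)
        kept.append(list(key))
    return kept


def polygon_vertices(names):
    """Map each distinct API name to a vertex of a regular polygon on the unit circle."""
    unique = sorted(set(names))
    n = len(unique)
    return {
        name: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, name in enumerate(unique)
    }


def chaos_game_image(calls, vertices, size=64, ratio=0.5):
    """Render one call sequence as a size x size grid of hit counts.

    Each call applies the contraction p -> p + ratio * (v - p), where v is the
    vertex assigned to that call. The set of such contractions forms an IFS.
    """
    grid = [[0] * size for _ in range(size)]
    x, y = 0.0, 0.0  # start at the polygon centre
    for call in calls:
        vx, vy = vertices[call]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        # Map coordinates from [-1, 1] to pixel indices, clamped to the grid.
        col = max(0, min(size - 1, int((x + 1.0) / 2.0 * size)))
        row = max(0, min(size - 1, int((1.0 - y) / 2.0 * size)))
        grid[row][col] += 1
    return grid


# Hypothetical traces; the API names are only illustrative inputs.
traces = [
    ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle", "ReadFile"],
    ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle", "ReadFile"],  # duplicate
    ["ReadFile"],  # too short
]
cleaned = preprocess(traces)
vertices = polygon_vertices([call for trace in cleaned for call in trace])
image = chaos_game_image(cleaned[0], vertices)
print(sum(sum(row) for row in image), "points plotted")
```

Returning the image as a plain grid of counts keeps the sketch free of dependencies. A real pipeline would likely write such grids to image files before passing them to a deep learning model, and would need far longer traces than this toy input to produce a visibly fractal pattern.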