Abstract: Recent attacks have combined malicious software (malware) with advanced techniques such as machine learning (specifically deep learning), code transformation, and polymorphism. This makes it harder for security experts to detect malware using traditional analysis methods. In view of the low accuracy and high false positive rate of traditional malware detection methods, this research proposes a fine-tuned deep learning model trained on a novel dataset and compares it with artificial neural network (ANN), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN) classifiers. Converting malware code into an image can allow malware to be identified even if the creator modifies the original code, because the attributes of the image remain unchanged, which allows reliable identification. Researchers have therefore used deep learning to detect malware in much the same way as it is used to detect malaria in red blood cells. Work with deep learning models shows that a more detailed analysis of malware datasets, focusing on RGB and greyscale images, is needed. Existing datasets rely on publicly available data, and traditional models trained on them leave considerable room for improvement in accuracy and produce many false positives. The main goal is to create a new dataset and model based on malware images that identify and categorize malware using deep learning, without relying on existing image detection and transfer learning models. The researcher tuned several hyperparameters, including the number of neurons, filters, stride, hidden units, layers, learning rate, batch size, activation function, optimizer, and epochs. Identifying and correcting issues in models, improving their clarity, stability, and fairness, and debugging and monitoring them are challenging tasks.
To address this challenge, the model's behaviour and performance were assessed using methods such as logging, profiling, testing, visualization, and model clarity. To work around electricity outages, the researcher ran the experiment and trained the model on Google's cloud-based GPU using Python 3.7. Kali Linux, an operating system that can encrypt file systems, was used in a virtual sandbox to lower the chance of malware crashing the system. In addition to monitoring and evaluating the model's behaviour and performance, the researcher used early stopping and cross-validation to prevent overfitting and assess generalization, and applied L1 and L2 regularization, dropout, and batch normalization to improve performance and further reduce overfitting. The binary portable executable (PE) malware samples were collected from Kaggle, Malimg, VirusShare, Malvis, MS BIG 2015, and VX-underground and then converted to greyscale and RGB images, creating a novel dataset that addresses the shortage of malware image datasets. The raw dataset was rescaled to 128x128 greyscale and RGB (red, green, blue) images and flattened to 1024-byte vector images, which were input into a convolutional neural network (CNN) with interpolation to extract features for malware detection. The new dataset serves as input to the DL algorithms in order to build a custom model capable of accurately predicting malware. The customized CNN model with fine-tuned hyperparameters performed better on the three-channel RGB dataset than on the greyscale dataset, reaching an accuracy of 98.7% and an error rate of 1.3% on current malware, and outperformed the artificial neural network (ANN) and ML algorithms such as Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Random Forest (RF).
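The accuracy, error rate, and false positive rate discussed above are all derived from the same confusion-matrix counts. The following is a minimal sketch of that arithmetic; the function names and the counts are hypothetical, chosen only to illustrate how a 98.7% accuracy corresponds to a 1.3% error rate, and are not the confusion matrix of the reported experiment.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were misclassified (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign samples that were wrongly flagged as malware."""
    return fp / (fp + tn)


# Hypothetical counts for 1000 test samples (500 malware, 500 benign).
# These numbers are illustrative assumptions, not results from the paper.
tp, tn, fp, fn = 493, 494, 6, 7

print(f"accuracy            = {accuracy(tp, tn, fp, fn):.3f}")
print(f"error rate          = {error_rate(tp, tn, fp, fn):.3f}")
print(f"false positive rate = {false_positive_rate(fp, tn):.3f}")
```

Reporting the false positive rate alongside accuracy matters here because a detector can score high accuracy on an imbalanced test set while still flagging many benign files.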
The proposed approach is compatible with all major operating systems (Windows, Linux, and Mac) and can identify different types of malware, such as packed, polymorphic, obfuscated, and metamorphic samples, as well as variants of a malware family. Its limitations include the shortage of malware image datasets and the computational cost of fine-tuning hyperparameters across the whole model. Furthermore, the proposed approach detects malware and groups it into families without using common techniques such as disassembly, decompilation, de-obfuscation, or running malicious code in a virtual environment.
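Because the approach works on raw bytes rather than disassembled code, the byte-to-image step can be sketched in a few lines. This is a minimal sketch and not the authors' pipeline: the function names, the square row-major layout, the zero padding, the nearest-neighbour resampling, and the choice of three consecutive bytes per RGB pixel are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def bytes_to_greyscale(data: bytes, size: int = 128) -> List[List[int]]:
    """Map raw bytes to a size x size greyscale image (one byte per pixel)."""
    if not data:
        raise ValueError("input is empty")
    # Lay the bytes out row by row in a roughly square grid (assumed layout),
    # zero-padding the last row.
    width = math.ceil(math.sqrt(len(data)))
    height = math.ceil(len(data) / width)
    padded = data + bytes(width * height - len(data))
    # Nearest-neighbour resampling to the fixed input size.
    return [
        [padded[(r * height // size) * width + (c * width // size)]
         for c in range(size)]
        for r in range(size)
    ]


def bytes_to_rgb(data: bytes, size: int = 128) -> List[List[Tuple[int, ...]]]:
    """Map raw bytes to a size x size RGB image (three bytes per pixel)."""
    if not data:
        raise ValueError("input is empty")
    n_pixels = math.ceil(len(data) / 3)
    width = math.ceil(math.sqrt(n_pixels))
    height = math.ceil(n_pixels / width)
    # Zero-pad so that every grid cell has a full (R, G, B) triple.
    padded = data + bytes(width * height * 3 - len(data))

    def pixel(i: int) -> Tuple[int, ...]:
        return tuple(padded[3 * i:3 * i + 3])

    return [
        [pixel((r * height // size) * width + (c * width // size))
         for c in range(size)]
        for r in range(size)
    ]


# Usage with synthetic bytes standing in for a PE file's contents.
sample = bytes(range(256)) * 64          # 16384 bytes -> exactly 128 x 128
grey = bytes_to_greyscale(sample)
rgb = bytes_to_rgb(sample)
print(len(grey), len(grey[0]), len(rgb), len(rgb[0]))
```

In practice an image library would usually handle the resampling; the pure-Python version is used here only so that the mapping from byte offsets to pixels is explicit.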
