Abstract: As computing systems become more advanced and users engage more deeply with technology, security has never been a greater concern. In malware detection, static analysis, the method of analyzing potentially malicious files without executing them, has been the prominent approach. This approach falls short, however, as malicious programs become more advanced and gain the ability to obfuscate their binaries while still executing the same malicious functions, which makes static analysis extremely difficult for newer variants. The approach assessed in this paper is a novel dynamic malware analysis method, which may generalize better than static analysis to newer variants. Inspired by recent successes in Natural Language Processing (NLP), widely used document classification techniques are assessed for detecting malware from system calls, which carry useful information about the operation of a program because they are the requests that the program makes of the kernel. Features are extracted from system call traces of benign and malicious programs, and classifying these traces is treated as a binary document classification task. The system call traces are processed to remove the parameters, leaving only the system call function names. The features are grouped into various n-grams and weighted with Term Frequency-Inverse Document Frequency (TF-IDF). This paper shows that Linear Support Vector Machines (SVM) optimized by Stochastic Gradient Descent and by the traditional Coordinate Descent on the Wolfe Dual form of the SVM are effective in this approach, achieving up to 96% accuracy with a 95% recall score. Additional contributions include the identification of significant system call sequences that could be avenues for further research.
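The pipeline described in the abstract (parameter removal, n-gram grouping, TF-IDF weighting, and a linear SVM trained by stochastic gradient descent) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The trace line format (`Name(args)`), the four toy traces, the smoothed IDF formula, and the hyperparameters `epochs`, `lr`, and `reg` are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def strip_parameters(trace_lines):
    """Keep only the call name from lines assumed to look like 'NtOpenFile(0x1, ...)'."""
    names = []
    for line in trace_lines:
        match = re.match(r"\s*([A-Za-z_]\w*)", line)
        if match:
            names.append(match.group(1))
    return names


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents):
    """Weight each n-gram by raw count times a smoothed IDF (assumed formula)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    n_docs = len(documents)
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weighted


def score(weights, bias, x):
    """Linear decision function w.x + b over a sparse feature dict."""
    return sum(weights.get(term, 0.0) * value for term, value in x.items()) + bias


def train_linear_svm_sgd(features, labels, epochs=50, lr=0.1, reg=0.01):
    """Hinge-loss SGD with L2 shrinkage; labels are -1 (benign) or +1 (malicious)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * score(weights, bias, x)
            for term in weights:  # L2 regularization step
                weights[term] *= 1 - lr * reg
            if margin < 1:  # hinge loss is active, so take a gradient step
                for term, value in x.items():
                    weights[term] = weights.get(term, 0.0) + lr * y * value
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 for malicious, -1 for benign."""
    return 1 if score(weights, bias, x) >= 0 else -1


# Invented toy traces, two benign and two malicious, for illustration only.
traces = [
    ["NtOpenFile(0x1)", "NtReadFile(0x1, buf)", "NtClose(0x1)"],
    ["NtOpenFile(0x2)", "NtReadFile(0x2, buf)", "NtReadFile(0x2, buf)", "NtClose(0x2)"],
    ["NtOpenProcess(pid)", "NtWriteVirtualMemory(h, addr)", "NtCreateThreadEx(h)"],
    ["NtOpenProcess(pid)", "NtAllocateVirtualMemory(h)",
     "NtWriteVirtualMemory(h, addr)", "NtCreateThreadEx(h)"],
]
labels = [-1, -1, 1, 1]

documents = [ngrams(strip_parameters(trace), 2) for trace in traces]
features = tfidf(documents)
weights, bias = train_linear_svm_sgd(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

Because the sketch scores the same traces it was trained on, it demonstrates only the mechanics of the pipeline, not the generalization to newer variants that the abstract reports. The learned `weights` dictionary maps each n-gram to a signed coefficient, which is one plausible way to surface the significant system call sequences the abstract mentions.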

A FEATURE SELECTION AND MODELLING METHOD FOR MALICIOUS CODE

LSTM Android Malicious Behavior Analysis Based on Feature Weighting

Malicious Code Detection Method Based on Static Features and Ensemble Learning

A Dynamic and Static Combined Android Malicious Code Detection Model Based on SVM

K-Means Clustering Analysis Based On Adaptive Weights For Malicious Code Detection

Malicious Code Detection Using LLM

Learning-Based Detection for Malicious Android Application Using Code Vectorization

Malware Analysis Using Machine Learning and Deep Learning Techniques

A Malicious Behavior Analysis Method Based on Program Semantic

An ensemble framework for interpretable malicious code detection

A Malicious Program Behavior Detection Model Based on API Call Sequences

A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion

A Hybrid Deep Learning Model for Malicious Behavior Detection

Component Similarity Based Methods for Automatic Analysis of Malicious Executables

NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls

Dynamic Emulation Based Modeling and Detection of Polymorphic Shellcode at the Network Level

Survey on Malicious Code Intelligent Detection Techniques

An Efficient Malicious Code Detection System Based on Convolutional Neural Networks

AMCAS: an Automatic Malicious Code Analysis System

Feature fusion-based malicious code detection with dual attention mechanism and BiLSTM

Automatic Malicious Code Classification System through Static Analysis Using Machine Learning