Abstract:As computing systems become increasingly advanced and as users increasingly engage themselves in technology, security has never been a greater concern. In malware detection, static analysis, the method of analyzing potentially malicious files, has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and adopt the capabilities of obfuscating its binaries to execute the same malicious functions, making static analysis extremely difficult for newer variants. The approach assessed in this paper is a novel dynamic malware analysis method, which may generalize better than static analysis to newer variants. Inspired by recent successes in Natural Language Processing (NLP), widely used document classification techniques were assessed in detecting malware by doing such analysis on system calls, which contain useful information about the operation of a program as requests that the program makes of the kernel. Features considered are extracted from system call traces of benign and malicious programs, and the task to classify these traces is treated as a binary document classification task of system call traces. The system call traces were processed to remove the parameters to only leave the system call function names. The features were grouped into various n-grams and weighted with Term Frequency-Inverse Document Frequency. This paper shows that Linear Support Vector Machines (SVM) optimized by Stochastic Gradient Descent and the traditional Coordinate Descent on the Wolfe Dual form of the SVM are effective in this approach, achieving a highest of 96% accuracy with 95% recall score. Additional contributions include the identification of significant system call sequences that could be avenues for further research.

Malware Clustering Based on SNN Density Using System Calls.

NIDS: Neural Network Based Intrusion Detection System

Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API.

K-Means Clustering Analysis Based On Adaptive Weights For Malicious Code Detection

Malware Classification based on Call Graph Clustering

Malware Analysis Using Machine Learning and Deep Learning Techniques

NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls

A malware variant clustering method based on fuzzy hash

Research on Features Selection in Malware Clustering

Android Malware Clustering through Malicious Payload Mining

Machine Learning based Malware Detection in Cloud Environment using Clustering Approach

Malware Analysis and Detection Using Machine Learning Algorithms

Online Analytical Model of Massive Malware Based on Feature Clusting

SCMA: Scalable and Collab orative Malw are Analysis using System Call Sequences

Identifying Android Malware with System Call Co‐occurrence Matrices

Risk-Based System-Call Sequence Grouping Method for Malware Intrusion Detection

Android Malware Family Clustering Based on Multiple Features

Android Malware Detection of Calls Tracing with AndroidManifest and API

A New Study of the System Call Sequence Analysis Method

ANNs on Co-occurrence Matrices for Mobile Malware Detection.

Online Clustering of Known and Emerging Malware Families