Abstract: As businesses increasingly adopt cloud technologies, they must also contend with new security challenges, such as server-side script attacks, to ensure the integrity of their systems and data. Malicious scripts can steal data, compromise credentials, and disrupt operations. Unlike executables with standardized formats (e.g., ELF, PE), scripts are plaintext files with diverse syntax, which makes them harder to detect with traditional methods. Protecting cloud infrastructures from these evolving threats therefore requires more sophisticated approaches. In this paper, we propose novel feature extraction and deep learning (DL)-based approaches for static script malware detection, targeting server-side threats. We extract features from plain-text code using two techniques: syntactic code highlighting (SCH) and abstract syntax tree (AST) construction. SCH uses complex regular expressions to parse the syntactic elements of code, such as keywords and variable names. An AST provides a hierarchical representation of a program's syntactic structure. We then propose a sequential model and a graph-based model that exploit these feature representations to detect script malware. We evaluate our approach on more than 400K server-side scripts written in Bash, Python, and Perl. We use a balanced dataset of 90K scripts for training, validation, and testing, and reserve the remainder of the 400K scripts for further analysis. Experiments show that our method achieves a true positive rate (TPR) up to 81% higher than leading signature-based antivirus solutions, while maintaining a low false positive rate (FPR) of 0.17%. Moreover, our approach outperforms various neural network-based detectors, demonstrating its effectiveness in learning code maliciousness for accurate detection of script malware.
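The two feature representations named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token classes in `TOKEN_SPEC`, the helper names `highlight_tokens` and `ast_edges`, and the sample script are all hypothetical, and the regexes are far simpler than the "complex regexes" the abstract refers to. The first helper labels each lexical element with a syntactic category, in the spirit of SCH; the second uses Python's standard `ast` module to list parent-child node-type pairs, a form that a graph-based model could plausibly consume.

```python
import ast
import re

# Hypothetical token classes for illustration only; the paper's actual
# SCH rules are not given in the abstract.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("STRING", r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\""),
    ("NUMBER", r"\b\d+(?:\.\d+)?\b"),
    ("KEYWORD", r"\b(?:import|from|def|return|if|else|for|while|in)\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[^\w\s]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def highlight_tokens(source):
    """Return (category, text) pairs, skipping whitespace between tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def ast_edges(source):
    """Return (parent_type, child_type) pairs for every edge in the AST."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


# Hypothetical sample script, not taken from the paper's dataset.
sample = "import os\nos.system('id')  # run a command\n"

token_categories = [category for category, _ in highlight_tokens(sample)]
edges = ast_edges(sample)
```

Here `token_categories` is a flat sequence suited to a sequential model, while `edges` describes a graph over syntactic node types. A real system would need per-language rules or parsers for Bash and Perl, which this sketch does not attempt.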
