Abstract: As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service ChatGPT reveals the specialized strengths of our fine-tuned models.

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is: **How to use the Neural Machine Translation (NMT) model to automatically generate offensive PowerShell code from natural-language descriptions**. Specifically, the research aims to evaluate and improve the ability of the NMT model in generating offensive PowerShell code for security applications.

### Detailed problem decomposition:

1. **Generation ability evaluation**:
   - Researchers want to know whether existing NMT models can generate valid PowerShell code without domain-specific fine-tuning.
   - Can the ability of these models to generate syntactically correct and semantically relevant PowerShell code be significantly improved by fine-tuning them?
2. **Dataset construction**:
   - Due to the lack of a dataset suitable for generating offensive PowerShell code, researchers need to construct a new dataset to train and evaluate these models.
   - The dataset includes manually annotated PowerShell code samples and their corresponding natural-language descriptions, as well as an unannotated dataset containing general-purpose PowerShell code.
3. **Model selection and training strategy**:
   - Researchers selected several state-of-the-art NMT models (such as CodeT5+, CodeGPT, and CodeGen) and explored the impact of different pre-training and fine-tuning strategies on model performance.
   - In the pre-training phase, a large amount of general-purpose PowerShell code is used, and in the fine-tuning phase, security-related PowerShell code with natural-language descriptions is used.
4. **Code quality evaluation**:
   - To evaluate the quality of the generated PowerShell code, researchers adopted multiple static and dynamic analysis methods, including syntactic accuracy, execution behavior, etc.
   - Automated output similarity metrics (such as BLEU, edit distance, METEOR, and ROUGE-L) are also used to quantify the similarity between the generated code and the real code.
5. **Comparative analysis**:
   - Compare the privately fine-tuned model with widely used large language models (such as ChatGPT) to evaluate the advantages of models specifically for the PowerShell code-generation task.

### Summary:

The main objective of this paper is to explore and verify the potential of NMT models in automatically generating offensive PowerShell code from natural-language descriptions. By constructing appropriate datasets, selecting suitable models and training strategies, and conducting detailed evaluations, researchers hope to provide a solid foundation for further research in this area.
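To make the output similarity metrics from the code quality evaluation step more concrete, here is a minimal Python sketch of two of them: edit distance and ROUGE-L. It is an illustration under stated assumptions, not the paper's evaluation code. The function names are hypothetical, and the choices of character-level edit distance, whitespace tokenization, and a balanced F1 score for ROUGE-L are assumptions, since the summary does not say which variants the authors used. The sample strings are ordinary, harmless PowerShell commands picked only for demonstration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(generated: str, reference: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings.

    Normalizing by the longer string is an assumption of this sketch.
    """
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / longest


def lcs_length(xs: list, ys: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(generated: str, reference: str) -> float:
    """ROUGE-L as a balanced F1 over whitespace tokens (assumed variant)."""
    gen, ref = generated.split(), reference.split()
    if not gen or not ref:
        return 0.0
    lcs = lcs_length(gen, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(gen)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Get-Process | Sort-Object CPU -Descending"
    generated = "Get-Process | Sort-Object CPU"
    print(f"edit similarity: {edit_similarity(generated, reference):.3f}")
    print(f"ROUGE-L F1:      {rouge_l_f1(generated, reference):.3f}")
```

Both scores measure only surface overlap with the reference snippet. A generated command can score well and still fail to parse or run, which is presumably why the evaluation pairs these metrics with static and dynamic analysis.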

The Power of Words: Generating PowerShell Attacks from Natural Language

Can we generate shellcodes via natural language? An empirical study

Enhancing AI-based Generation of Software Exploits with Contextual Information

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

AST-Based Deep Learning for Detecting Malicious PowerShell

Detecting Malicious PowerShell Commands using Deep Neural Networks

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

How Robust Is a Large Pre-trained Language Model for Code Generation? A Case on Attacking GPT2

CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

RatGPT: Turning online LLMs into Proxies for Malware Attacks

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Turning Generative Models Degenerate: The Power of Data Poisoning Attacks

AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

Unleashing offensive artificial intelligence: Automated attack technique code generation

Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators

On the Security Vulnerabilities of Text-to-SQL Models

Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models