The Power of Words: Generating PowerShell Attacks from Natural Language

Pietro Liguori, Christian Marescalco, Roberto Natella, Vittorio Orbinato, Luciano Pianese
2024-04-19
Abstract: As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service, ChatGPT, reveals the specialized strengths of our fine-tuned models.
Cryptography and Security, Software Engineering
What problem does this paper attempt to address?
The core problem this paper addresses is **how to use Neural Machine Translation (NMT) models to automatically generate offensive PowerShell code from natural-language descriptions**. Specifically, the research aims to evaluate and improve the ability of NMT models to generate offensive PowerShell code for security applications.

### Detailed problem decomposition:

1. **Generation ability evaluation**:
   - The researchers want to know whether existing NMT models can generate valid PowerShell code without domain-specific fine-tuning.
   - They also ask whether fine-tuning can significantly improve these models' ability to generate syntactically correct and semantically relevant PowerShell code.
2. **Dataset construction**:
   - Because no existing dataset is suitable for generating offensive PowerShell code, the researchers construct new datasets to train and evaluate the models.
   - These consist of manually annotated PowerShell code samples paired with natural-language descriptions, plus an unannotated dataset of general-purpose PowerShell code.
3. **Model selection and training strategy**:
   - The researchers select several state-of-the-art NMT models (such as CodeT5+, CodeGPT, and CodeGen) and explore how different pre-training and fine-tuning strategies affect model performance.
   - The pre-training phase uses a large amount of general-purpose PowerShell code, while the fine-tuning phase uses security-related PowerShell code paired with natural-language descriptions.
4. **Code quality evaluation**:
   - To assess the quality of the generated PowerShell code, the researchers apply both static and dynamic analysis, covering aspects such as syntactic accuracy and execution behavior.
   - Automated output similarity metrics (such as BLEU, edit distance, METEOR, and ROUGE-L) quantify how closely the generated code matches the reference code.
5. **Comparative analysis**:
   - The fine-tuned models are compared with a widely used large language model (ChatGPT) to assess the advantages of models specialized for the PowerShell code-generation task.

### Summary:

The main objective of this paper is to explore and verify the potential of NMT models for automatically generating offensive PowerShell code from natural-language descriptions. By constructing suitable datasets, selecting appropriate models and training strategies, and evaluating the generated code in detail, the researchers aim to provide a solid foundation for further research in this area.
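To make the output similarity metrics mentioned under code quality evaluation more concrete, the sketch below computes two of them, a normalized edit-distance similarity and ROUGE-L, for a generated snippet against a reference snippet. It is a minimal illustration under stated assumptions, not the paper's evaluation pipeline: tokenization is plain whitespace splitting, edit distance is measured over tokens rather than characters, ROUGE-L is reported as the F1 of LCS-based precision and recall, and the helper names (`tokenize`, `edit_similarity`, `rouge_l_f1`, and so on) are hypothetical. Standard metric libraries may tokenize and weight differently, so their scores can differ from these. The example strings are a harmless process-listing command.

```python
from typing import List, Sequence


def tokenize(code: str) -> List[str]:
    """Whitespace tokenization (an assumption; the paper's tokenizer may differ)."""
    return code.split()


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


def edit_similarity(candidate: str, reference: str) -> float:
    """1 - (token edit distance / length of the longer sequence), in [0, 1]."""
    cand, ref = tokenize(candidate), tokenize(reference)
    longest = max(len(cand), len(ref))
    if longest == 0:
        return 1.0  # two empty snippets are treated as identical
    return 1.0 - levenshtein(cand, ref) / longest


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed beta = 1)."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Get-Process | Sort-Object CPU"
    candidate = "Get-Process | Sort-Object -Property CPU"
    print(f"edit similarity: {edit_similarity(candidate, reference):.3f}")
    print(f"ROUGE-L F1:      {rouge_l_f1(candidate, reference):.3f}")
```

Here the candidate adds one token (`-Property`), so the token edit distance is 1 over a longest length of 5, giving an edit similarity of 0.800. All 4 reference tokens form the LCS, giving precision 0.8, recall 1.0, and a ROUGE-L F1 of about 0.889. Both metrics reward surface overlap only, which is why such scores are typically paired with static and dynamic analysis of whether the code actually parses and behaves as intended.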