Automatic code generation based on Abstract Syntax-based encoding. Application on malware detection code generation based on MITRE ATT&CK techniques

Alexandru-Gabriel Sîrbu, Gabriela Czibula
DOI: https://doi.org/10.1016/j.eswa.2024.125821
IF: 8.5
2024-12-02
Expert Systems with Applications
Abstract: In the last decade, code generation from natural language has been one of the most studied machine learning topics. The paper addresses this problem by proposing a syntax-error-free generator model, which creates an Abstract Syntax-based model of the code structure that is later translated into code. Two approaches are comparatively investigated for generating the structure of the program. The first approach generates code templates in the form of an Abstract Syntax Tree, while the second generates the code in the form of an Abstract Syntax Graph, a newly introduced concept that reduces the initial redundancy of Abstract Syntax Trees and serves as a new way to generate code. The proposed methodology is tested on two data sets from the literature and on malware detection code generation based on a real data set containing MITRE ATT&CK techniques. The results outperform the state-of-the-art Abstract Syntax Tree approaches by 2.46% and the plain text-based approaches by more than 12.5%, highlighting that the proposed methodology learns the structural representation better than other approaches in the literature.
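The two ideas in the abstract, generating code through a syntax tree so the output cannot contain syntax errors, and collapsing repeated tree structure into a graph, can be illustrated with a minimal sketch. The abstract does not say how the Abstract Syntax Graph is built, so the merging rule used here (structurally identical subtrees share one node) is only an assumption, and the function names `count_tree_nodes`, `count_shared_nodes`, and `round_trip` are hypothetical. The sketch uses Python's standard `ast` module and is not the authors' implementation.

```python
import ast


def count_tree_nodes(tree: ast.AST) -> int:
    """Count every node occurrence in the syntax tree."""
    return sum(1 for _ in ast.walk(tree))


def count_shared_nodes(tree: ast.AST) -> int:
    """Count nodes after merging structurally identical subtrees.

    Assumption: two subtrees are 'the same' when their ast.dump() strings
    match (ast.dump ignores line/column attributes by default). This is an
    illustrative redundancy-reduction rule, not the paper's definition.
    """
    return len({ast.dump(node) for node in ast.walk(tree)})


def round_trip(tree: ast.AST) -> str:
    """Translate a syntax tree back into source code (Python 3.9+)."""
    return ast.unparse(tree)


# The expression `x + 1` occurs twice, so the tree stores it twice.
source = "a = x + 1\nb = x + 1\n"
tree = ast.parse(source)

total_nodes = count_tree_nodes(tree)
shared_nodes = count_shared_nodes(tree)
regenerated = round_trip(tree)

print(f"tree nodes: {total_nodes}, nodes after merging: {shared_nodes}")
print(regenerated)
```

Because the code is produced from a tree rather than from free text, the regenerated source always parses, and the merged count is smaller than the tree count whenever a subtree is repeated.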
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science