ExploitGen: Template-augmented Exploit Code Generation Based on CodeBERT

Guang Yang,Yu Zhou,Xiang Chen,Xiangyu Zhang,Tingting Han,Taolue Chen
DOI: https://doi.org/10.1016/j.jss.2022.111577
IF: 3.5
2023-01-01
Journal of Systems and Software
Abstract:Exploit code is widely used for detecting vulnerabilities and implementing defensive measures. However, automatic generation of exploit code for security assessment is a challenging task. In this paper, we propose a novel template-augmented exploit code generation approach ExploitGen based on CodeBERT. Specifically, we first propose a rule-based Template Parser to generate template-augmented natural language descriptions (NL). Both the raw and template-augmented NL sequences are encoded to context vectors by the respective encoders. For better learning semantic information, ExploitGen incorporates a semantic attention layer, which uses the attention mechanism to extract and calculate each layer’s representational information. In addition, ExploitGen computes the interaction information between the template information and the semantics of the raw NL and designs a residual connection to append the template information into the semantics of the raw NL. Comprehensive experiments on two datasets show the effectiveness of ExploitGen after comparison with six state-of-the-art baselines. Apart from the automatic evaluation, we conduct a human study to evaluate the quality of generated code in terms of syntactic and semantic correctness. The results also confirm the effectiveness of ExploitGen.
What problem does this paper attempt to address?