Optimizing software vulnerability detection using RoBERTa and machine learning

DOI: https://doi.org/10.1007/s10515-024-00440-1
IF: 1.677
2024-05-08
Automated Software Engineering
Abstract:Detecting vulnerabilities in source code written in C and C + + is currently essential as attack techniques against systems seek to find, exploit, and attack these vulnerabilities. In this article, to improve the effectiveness of the source code vulnerability detection process, we propose a new approach based on building and representing source code features using natural language processing (NLP) techniques. Our proposal in the article consists of two main stages: (i) building a feature profile of the source code using the RoBERTa model, and (ii) classifying source code based on the feature profile using a supervised machine learning algorithm. Specifically, with our proposal utilizing the pre-trained RoBERTa model, we have successfully built and represented important features of source code as complete vectors, thereby enhancing the effectiveness of prediction and vulnerability detection models. The experimental part of our article compared and evaluated our proposal with other approaches on the FFmpeg + Qume dataset. The experimental results in the article showed that the approach in this study was superior to other research directions on all measures. Therefore, the proposal to use NLP techniques based on the RoBERTa model not only has scientific significance as a new research direction that has not been proposed for application but also has practical significance when all experimental results are highly effective.
computer science, software engineering
What problem does this paper attempt to address?