Vulnerability detection based on federated learning

Chunyong Zhang, Tianxiang Yu, Bin Liu, Yang Xin
DOI: https://doi.org/10.1016/j.infsof.2023.107371
IF: 3.9
2024-03-01
Information and Software Technology
Abstract: Context: Detecting potential vulnerabilities is a key step in defending against network attacks. However, manual detection is time-consuming and requires expertise, so vulnerability detection requires automated techniques. Objective: Vulnerability detection methods based on deep learning rely on sufficient vulnerable code samples. However, the problem of code islands has not been extensively researched. For example, when vulnerability data is held by multiple parties, it is unclear how to securely combine that data to improve vulnerability detection performance. From the perspectives of data augmentation and data security, we propose a vulnerability detection framework based on federated learning (VDBFL). VDBFL is a new model for vulnerability code detection that combines multi-party data. Method: Firstly, VDBFL uses the code property graph as its code representation. The code property graph contains various semantic dependencies of the code. Secondly, VDBFL uses graph neural networks and convolutional neural networks as the code feature extractor, with a jump-structured graph attention network aggregating node information from important neighbors. Finally, VDBFL uses horizontal federated learning to train a local vulnerability detection model for the client. Result: In a real-world setting, VDBFL improves F1-Score by 37.4% compared to the vulnerability detection method Reveal. Among the 5401 vulnerability samples, VDBFL detected 11.8 times more vulnerabilities than Reveal. Conclusion: Across different datasets, VDBFL performs better than advanced vulnerability detection methods on multiple metrics. In addition, the federated learning stage of VDBFL can be added on top of the feature extraction stage of any vulnerability detection method.
computer science, information systems, software engineering
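
The abstract says VDBFL relies on horizontal federated learning to combine multi-party data, but it does not state which aggregation rule is used. As an illustration only, the sketch below shows one common choice, sample-size-weighted parameter averaging in the style of FedAvg. The function name federated_average, the parameter name "classifier.w", and the client sizes are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This is an assumption made for illustration, not the
# paper's actual model format.
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    Each entry of client_updates is (weights, num_local_samples). Only
    parameters are shared; the raw code samples never leave the clients.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = client_updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in client_updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Two hypothetical clients holding 100 and 300 local samples.
client_a = ({"classifier.w": [1.0, 2.0]}, 100)
client_b = ({"classifier.w": [3.0, 6.0]}, 300)
global_weights = federated_average([client_a, client_b])
print(global_weights)
```

In this sketch the client with more samples pulls the global parameters toward its own values, which is the usual motivation for weighting by data size when the parties hold the same features but different samples.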