Refcount Field Identification for Linux Kernel Based on Deep Learning

TAN Xin, YANG Xi-Yu, CAO Jia-Jun, ZHANG Yuan
DOI: https://doi.org/10.21655/ijsi.1673-7288.00288
2022-01-01
International Journal of Software and Informatics
Abstract: Reference counting (refcount) is a common memory management technique in modern software. Refcount errors often lead to severe memory errors such as memory leaks and use-after-free. Many efforts to harden refcount security take known refcount fields as their input. However, because of the complexity of software code, identifying refcount fields in source code is very challenging. Traditional methods of identifying refcount fields rely mainly on code pattern matching and have serious limitations: summarizing code patterns requires expert experience and is laborious, and the manually summarized patterns do not cover all cases, which results in low recall. To address these problems, this paper proposes to characterize a field by its name and the code behavior associated with it, and designs a multimodal deep learning-based approach. The paper implements a prototype of the approach for Linux kernel code. In the evaluation, the prototype achieves a precision of 96.98% and a recall of 93.54%. In contrast, the traditional identification method based on code pattern matching did not report any refcount fields on the testing set. In addition, we identify 61 refcount fields in the latest Linux kernel that are implemented with insecure data types. To date, we have reported 21 of them to the Linux community, of which six have been confirmed.
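For readers unfamiliar with the term, the following user-space C sketch illustrates what a refcount field is and why the code around it matters. It is not taken from the paper or from the Linux kernel: the `struct session` type, its `refs` field, and the `session_*` helpers are hypothetical names chosen for illustration. The field is a plain `int`, which stands in for the kind of unprotected data type the abstract calls insecure; the kernel's hardened `refcount_t` type exists to guard against the overflow and underflow that such a field permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object. `refs` is the refcount field: its name and the
 * increment/decrement-and-free behavior around it are the two kinds of
 * evidence the abstract says the approach uses to characterize a field. */
struct session {
    int refs;    /* plain int: no overflow or underflow protection */
    int payload;
};

/* Counts live objects so that leaks are observable in this sketch. */
static int live_sessions = 0;

/* Create an object holding one initial reference. */
static struct session *session_new(int payload)
{
    struct session *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->refs = 1;
    s->payload = payload;
    live_sessions++;
    return s;
}

/* Take an additional reference. */
static void session_get(struct session *s)
{
    s->refs++;
}

/* Drop one reference; free the object when the last one goes away.
 * Returns true if the object was freed. A missing put leaks the object;
 * an extra put frees it while others still hold it (use-after-free). */
static bool session_put(struct session *s)
{
    assert(s->refs > 0); /* catches an unbalanced put in debug builds */
    if (--s->refs == 0) {
        free(s);
        live_sessions--;
        return true;
    }
    return false;
}
```

Every `session_get` must be balanced by exactly one `session_put`. Hardening tools that check this balance need to know in advance that `refs` is a refcount field, which is the identification problem the abstract describes.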