A Just-in-time Software Defect Localization Method Based on Code Graph Representation

Huan Zhang, Weihuan Min, Zhao Wei, Li Kuang, Honghao Gao, Huaikou Miao
DOI: https://doi.org/10.1145/3643916.3644428
2024-01-01
Abstract: Traditional software defect localization aims to locate defective files, methods, or code lines based on symptoms such as defect reports. In contrast, Just-In-Time (JIT) software defect localization identifies defective code lines at the moment a defective code change is first submitted. It can therefore flag issues at the code-line level before the defect becomes apparent, preventing it from adversely affecting the software. Although researchers have proposed various methods for JIT defect localization, existing methods have two shortcomings: (1) Most methods rely heavily on the tokens of individual code lines to calculate naturalness for defect localization, which makes it difficult to distinguish code lines that have the same content but different labels (defective or non-defective), termed Duplicate Lines with Different Labels (DLDL). (2) Existing methods represent code as sequences, neglecting the structural information of the code. Therefore, we propose a JIT defect localization method based on code graph representation. First, we construct line-level code graphs for code changes to distinguish DLDL explicitly. Next, to extract sequential and structural information from the code, we propose a code graph representation model with contrastive learning that generates graph feature vectors and node scores with rich semantics. Finally, we calculate the naturalness of code lines from the graph feature vectors and node scores, and use this naturalness to identify defective code lines. Experimental results show that our JIT defect localization method outperforms state-of-the-art methods.
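The DLDL problem described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the helper names (`find_dldl`, `with_context`), the sample change, and the neighbouring-line window are all assumptions made for illustration, with the window standing in very loosely for the extra context that a line-level code graph would supply.

```python
from collections import defaultdict

# Hypothetical code change: (line content, label), where 1 = defective, 0 = clean.
sample_lines = [
    ("if (x != null) {", 0),
    ("return x.size();", 0),
    ("}", 0),
    ("if (y != null) {", 0),
    ("return x.size();", 1),  # same text as line 2, but defective here
    ("}", 0),
]

def find_dldl(lines):
    """Return line contents that occur with more than one label (DLDL)."""
    labels_by_content = defaultdict(set)
    for content, label in lines:
        # Collapse whitespace so formatting differences do not hide duplicates.
        labels_by_content[" ".join(content.split())].add(label)
    return sorted(c for c, labels in labels_by_content.items() if len(labels) > 1)

def with_context(lines, window=1):
    """Pair each line with its neighbours, a crude stand-in for graph context."""
    contents = [content for content, _ in lines]
    keys = []
    for i in range(len(contents)):
        start = max(0, i - window)
        end = min(len(contents), i + window + 1)
        keys.append(tuple(contents[start:end]))
    return keys

dldl = find_dldl(sample_lines)
context_keys = with_context(sample_lines)
```

Any scorer that sees only the tokens of a single line must assign the two `return x.size();` lines the same score, although their labels differ. Once the surrounding lines are part of the representation, the two occurrences are no longer identical, which is the intuition behind representing each line together with its context.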