Development of an algorithm for code clone detection in source code based on abstract syntax tree

Yevhenii Kubiuk, Gennadiy Kyselov
DOI: https://doi.org/10.15587/2706-5448.2023.286472
2023-08-29
Technology audit and production reserves
Abstract: The object of research is an algorithm for detecting duplicates in program code based on the Abstract Syntax Tree (AST). The main tasks addressed in this study are the detection of duplicate code and the search for vulnerabilities in program code. The results showed that the proposed algorithm is robust to type 1 and type 2 clones, meaning that it effectively detects similar code fragments with identical or variant text. For type 3 and type 4 clones, however, the algorithm may be less effective, because the AST structure changes for these types of clones. Experimental studies showed that the algorithm can report matches between unrelated files, because typical AST chains occur in many programs. This can lead to a certain level of false positives in duplicate detection. Testing the algorithm on the task of finding vulnerabilities showed the following. The best recognition is observed for the «SQL injection» vulnerability, but it also has the highest number of false positives. Memory leak and null pointer dereference vulnerabilities are detected with equal effectiveness and equal false-positive levels. «Buffer overflow» has the lowest recognition rate but fewer false positives than «SQL injection». The study showed that the use of AST allows for the effective detection of duplicate code and vulnerabilities in software code. The developed tool can help software developers reduce maintenance effort, improve code quality, and ensure software product security.
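To illustrate why an AST-based comparison tolerates type 1 and type 2 clones, the following is a minimal sketch, not the authors' implementation. It assumes that an "AST chain" can be approximated by a root-to-leaf path of node type names and that similarity can be measured with a Jaccard index over those paths; the helper names `ast_chains` and `similarity` are hypothetical, and Python's standard `ast` module stands in for whatever parser the paper uses.

```python
import ast


def ast_chains(source: str) -> set[tuple[str, ...]]:
    """Collect root-to-leaf chains of AST node type names.

    Identifier names and literal values are not part of a chain,
    so renaming variables or changing constants leaves it unchanged.
    (Assumption: this is only one possible reading of "AST chain".)
    """
    chains: set[tuple[str, ...]] = set()

    def walk(node: ast.AST, prefix: tuple[str, ...]) -> None:
        path = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            chains.add(path)  # leaf reached: record the full chain
        for child in children:
            walk(child, path)

    walk(ast.parse(source), ())
    return chains


def similarity(source_a: str, source_b: str) -> float:
    """Jaccard similarity of the two chain sets (assumed metric)."""
    a, b = ast_chains(source_a), ast_chains(source_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Type 2 clone: same structure, different identifiers and literal.
renamed = (
    "def acc(values):\n"
    "    r = 10\n"
    "    for v in values:\n"
    "        r += v\n"
    "    return r\n"
)
unrelated = "print('hi')\n"

print(similarity(original, renamed))    # 1.0
print(similarity(original, unrelated))
```

Because the chains contain only node types, short generic chains are shared by many unrelated programs, which is consistent with the false positives the abstract reports. A structural edit such as an inserted statement changes the chain set, which is one way to see why type 3 and type 4 clones are harder for this kind of comparison.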