An Empirical Understanding of Code Clone Detection by ChatGPT

PeiJie Wang, Lu Zhu, Qianlu Wang, Ousainou Jaiteh, Chenkai Guo
DOI: https://doi.org/10.1109/dsit60026.2023.00021
2023-01-01
Abstract: As one of the most popular recent NLP models, ChatGPT has been applied with remarkable success to a variety of NLP tasks. Code clone detection, a typical prediction task in software engineering, has been studied for years. However, ChatGPT has not yet been systematically evaluated on code clone detection. To fill this gap, we construct a dataset that covers multiple types of code data and conduct the first empirical study of ChatGPT on the clone detection task, for both source code and binary code. Our study finds that ChatGPT can successfully detect code clones and accurately explain code semantics in most simple cases. However, in complex binary code scenarios its performance is limited: our work shows that ChatGPT has difficulty identifying the semantics of long assembly code. These results and findings can help developers better apply large intelligent models to prediction tasks in the software engineering field.