Semantic-enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse
Qing Huang, Dianshu Liao, Zhenchang Xing, Zhengkang Zuo, Changjing Wang, Xin Xia
DOI: https://doi.org/10.1145/3597206
IF: 3.685
2023-05-22
ACM Transactions on Software Engineering and Methodology
Abstract: Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses function-clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and CFG annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io, and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30-380 seconds, vulnerability determination accuracy by 20%-33%, and vulnerability fixing accuracy by 24%-40% for novice developers who identified and fixed vulnerable smart contract functions.
computer science, software engineering