Application programming interface recommendation for smart contract using deep learning from augmented code representation

Jie Cai, Qian Cai, Bin Li, Jiale Zhang, Xiaobing Sun
DOI: https://doi.org/10.1002/smr.2658
2024-03-05
Journal of Software: Evolution and Process
Abstract: This paper proposes a learning‐based approach for API recommendation in smart contracts. We propose a code graph named pruned and augmented AST (pa‐AST) and use it together with the API sequence to capture the semantic features surrounding recommendation points. We then use a graph attention network (GAT)‐based model for code feature learning and API recommendation. Application programming interface (API) recommendation plays a crucial role in smart contract development by providing developers with a ranked list of candidate APIs for a specific recommendation point. Deep learning‐based approaches have shown promising results in this field. However, existing approaches mainly rely on token sequences or abstract syntax trees (ASTs) to learn features related to the recommendation point. These representations may overlook the essential knowledge implied in the relations between or within statements, and they may include task‐irrelevant components during feature learning. To address these limitations, we propose a novel code graph called pruned and augmented AST (pa‐AST). Our approach enhances the AST with additional knowledge derived from the control and data flow relations between and within statements in the smart contract code, so that the pa‐AST better represents the semantic features of the code. Furthermore, we prune the AST to eliminate task‐irrelevant components based on the identified flow relations, which mitigates the interference these components cause during feature learning. Additionally, we extract the API sequence surrounding the recommendation point to provide supplementary knowledge for model learning. Experimental results show that our approach achieves an average mean reciprocal rank (MRR) of 68.02%, outperforming the baselines. Ablation experiments further examine the effectiveness of the proposed code representation: combining pa‐AST with the API sequence yields better performance than using either alone, and both AST augmentation and pruning contribute significantly to the overall results.
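For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean reciprocal rank is conventionally computed for a ranked recommendation list. It uses the standard definition (the average of 1/rank of the correct item over all queries) and is not taken from the paper. The function names, the example API names, and the choice to score a missing ground-truth API as 0 are all assumptions made for illustration.

```python
from typing import Hashable, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], target: Hashable) -> float:
    """Return 1/rank of `target` in `ranked` (1-based), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / position
    # Assumption: a ground-truth API missing from the list contributes 0.
    return 0.0


def mean_reciprocal_rank(
    queries: Sequence[Tuple[Sequence[Hashable], Hashable]]
) -> float:
    """Average the reciprocal rank over all (ranked list, ground truth) pairs."""
    if not queries:
        return 0.0
    total = sum(reciprocal_rank(ranked, target) for ranked, target in queries)
    return total / len(queries)


# Hypothetical recommendation points: (ranked candidate APIs, ground-truth API).
example_queries = [
    (["transfer", "send", "call"], "transfer"),         # correct API at rank 1
    (["require", "assert", "revert"], "assert"),        # correct API at rank 2
    (["keccak256", "sha256", "ecrecover"], "balance"),  # correct API not listed
]

mrr = mean_reciprocal_rank(example_queries)
print(f"MRR = {mrr:.2%}")
```

Under this definition, an average MRR of 68.02% means the reciprocal of the correct API's rank averages about 0.68 across recommendation points, so the correct API typically appears at or very near the top of the list.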
computer science, software engineering