Automatic Recognizing Relevant Fragments of APIs Using API References
Di Wu,Yang Feng,Hongyu Zhang,Baowen Xu
DOI: https://doi.org/10.1007/s10515-023-00401-0
IF: 1.677
2024-01-01
Automated Software Engineering
Abstract:API tutorials are crucial resources as they often provide detailed explanations of how to utilize APIs. Typically, an API tutorial is segmented into a number of consecutive fragments.. If a fragment explains API usage, we regard it as a relevant fragment of the API. Recognizing relevant fragments can aid developers in comprehending, learning, and using APIs. Recently, some studies have presented relevant fragments recognition approaches, which mainly focused on using API tutorials or Stack Overflow to train the recognition model. API references are also important API learning resources as they contain abundant API knowledge. Considering the similarity between API tutorials and API references (both provide API knowledge), we believe that using API knowledge from API references could help recognize relevant tutorial fragments of APIs effectively. However, it is non-trivial to leverage API references to build a supervised learning-based recognition model. Two major problems are the lack of labeled API references and the unavailability of engineered features of API references. We propose a supervised learning based approach named RRTR (which stands for Recognize Relevant Tutorial fragments using API References) to address the above problems. For the problem of lacking labeled API references, RRTR designs heuristic rules to automatically collect relevant and irrelevant API references for APIs. Regarding the unavailable engineered features issue, we adopt the pre-trained SBERT model (SBERT stands for Sentence-BERT) to automatically learn semantic features for API references. More specifically, we first automatically generate labeled A P I , A R E pairs (ARE stands for an API reference) via our heuristic rules of API references. We then use SBERT to automatically learn semantic features for the collected pairs and train a supervised learning based recognition model. Finally, we can recognize the relevant tutorial fragments of APIs based on the trained model. To evaluate the effectiveness of RRTR, we collected Java and Android API reference datasets containing a total of 20,680 labeled A P I , A R E pairs. Experimental results demonstrate that RRTR outperforms state-of-the-art approaches in terms of F-Measure on two datasets. In addition, we conducted a user study to investigate the practicality of RRTR and the results further illustrate the effectiveness of RRTR in practice. The proposed RRTR approach can effectively recognize relevant fragments of APIs with API references by solving the problems of lacking labeled API references and engineered features of API references.