Generating API tags for tutorial fragments from Stack Overflow

Di Wu, Xiao-Yuan Jing, Hongyu Zhang, Bing Li, Yu Xie, Baowen Xu
DOI: https://doi.org/10.1007/s10664-021-09962-8
IF: 3.762
2021-05-08
Empirical Software Engineering
Abstract: API tutorials are important learning resources because they explain how to use certain APIs in a given programming context. An API tutorial can be split into a number of units. Consecutive units that describe the same topic are often called a tutorial fragment. We consider the API explained by a tutorial fragment to be an API tag. Generating API tags for a tutorial fragment can help developers understand, navigate, and retrieve the fragment. Existing approaches to API tag generation often suffer from high manual effort and low accuracy. Like API tutorials, Stack Overflow (SO) is an important learning resource that explains APIs, so SO posts also contain API tags. Moreover, API tags of SO posts are abundant and easy to extract. In this paper, we propose a novel approach, ATTACK (API Tag for Tutorial frAgments using Crowd Knowledge), which automatically generates API tags for tutorial fragments from SO posts. ATTACK first constructs ⟨Q&A pair, tag set⟩ pairs by extracting the API tags of SO posts. Then, it trains a deep neural network with an attention mechanism to learn the semantic relatedness between Q&A pairs and their associated API tags, taking into account both the textual descriptions and the code in a Q&A pair. Finally, the trained model is used to generate API tags for tutorial fragments. We evaluate ATTACK on public Java and Android datasets containing 43,132 ⟨Q&A pair, tag set⟩ pairs. Experimental results show that ATTACK is effective and outperforms state-of-the-art approaches in terms of F-Measure. Our user study further confirms the effectiveness of ATTACK in generating API tags for tutorial fragments. We also apply ATTACK to document linking, and the results confirm the usefulness of the API tags generated by ATTACK.
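The abstract reports results "in terms of F-Measure" without saying how the metric is computed over tag sets. The sketch below shows one common set-based definition, offered as an assumption rather than the paper's exact protocol: for each fragment, precision is the share of generated tags that are correct, recall is the share of ground-truth tags that were generated, and F-Measure is their harmonic mean, averaged over fragments (macro averaging, also an assumption). The function names `set_prf` and `mean_f_measure` and the example tags are hypothetical.

```python
from typing import Sequence, Set, Tuple


def set_prf(predicted: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-Measure (F1) for one fragment's tag set.

    Hypothetical helper: compares the generated tags with the ground-truth tags.
    """
    hits = len(predicted & actual)  # tags that are both generated and correct
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def mean_f_measure(pairs: Sequence[Tuple[Set[str], Set[str]]]) -> float:
    """Average per-fragment F1 over (predicted, actual) pairs.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    if not pairs:
        return 0.0
    return sum(set_prf(pred, act)[2] for pred, act in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical tags for one tutorial fragment.
    generated = {"HashMap", "Map", "String"}
    ground_truth = {"HashMap", "Map"}
    print(set_prf(generated, ground_truth))
```

In this example two of the three generated tags are correct and both ground-truth tags are recovered, so precision is 2/3, recall is 1, and F-Measure is 0.8; one spurious tag lowers precision without affecting recall.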
computer science, software engineering
What problem does this paper attempt to address?