Clean and Learn: Improving Robustness to Spurious Solutions in API Question Answering

Shuai Yuan, Haozhe Qin, Xiaodong Gu, Beijun Shen
DOI: https://doi.org/10.1142/s0218194022500449
IF: 1.007
2022-01-01
International Journal of Software Engineering and Knowledge Engineering
Abstract: The development of a question answering (QA) system for application programming interface (API) documentation can greatly facilitate developers in API-related tasks. However, when applying deep learning technology, API QA systems suffer from the spurious solution problem. That is, the answer text can literally appear at multiple positions (i.e. start-end indices) in the API documentation, though only one of them (called the golden solution) correctly answers the question given its context. The other, incorrect candidates (called spurious solutions) hinder the neural network model from learning reasonable solutions or correct answers. In this work, we propose Clean-and-Learn, an effective and robust method for API QA over documents. To reduce the spuriousness of the candidate solutions used for training, we design several scoring functions to rank the candidate occurrences (clean). Only high-quality (top-k) candidate solutions are involved in training. Then, we perform multi-task learning by weighting the losses computed from the top-k occurrences (learn). We evaluate our method on the constructed APIQASet dataset. The experimental results show that Clean-and-Learn achieves a ROUGE-L score of 75.8 and an accuracy of 70.5% in API QA, which significantly outperforms state-of-the-art approaches.
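The clean-then-learn idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the paper's scoring functions or its loss-weighting scheme, so everything here is an assumption made for illustration: `overlap_score` (question-word overlap in a context window) stands in for the scoring functions, and `weighted_loss` uses a softmax over candidate scores as one possible weighting. All function names are hypothetical.

```python
import math

def find_occurrences(doc_tokens, answer_tokens):
    """Return every (start, end) span where the answer appears verbatim."""
    n, m = len(doc_tokens), len(answer_tokens)
    return [(i, i + m - 1) for i in range(n - m + 1)
            if doc_tokens[i:i + m] == answer_tokens]

def overlap_score(doc_tokens, span, question_tokens, window=5):
    """Hypothetical scoring function (not from the paper): count distinct
    question words that occur within `window` tokens of the span."""
    start, end = span
    context = (doc_tokens[max(0, start - window):start]
               + doc_tokens[end + 1:end + 1 + window])
    return len(set(context) & set(question_tokens))

def clean(doc_tokens, answer_tokens, question_tokens, k=2):
    """'Clean' step: rank candidate occurrences and keep the top-k."""
    spans = find_occurrences(doc_tokens, answer_tokens)
    scored = [(overlap_score(doc_tokens, s, question_tokens), s) for s in spans]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def weighted_loss(candidate_losses, scores):
    """'Learn' step (assumed form): softmax-weighted sum of the losses
    computed for each retained candidate."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return sum(e / total * loss for e, loss in zip(exps, candidate_losses))

doc = ("the close method releases the stream . "
       "call close before exit to flush the buffer").split()
question = "which method should i call to flush the buffer".split()
top = clean(doc, ["close"], question, k=1)
```

In this toy document the answer "close" occurs twice; the second occurrence shares more context words with the question and is the one retained when k=1. In a real system the per-candidate losses passed to `weighted_loss` would come from a span-prediction model rather than being supplied by hand.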