Abstract: Recognizing Application Programming Interface (API) mentions in software-related texts is a prerequisite for extracting API-related knowledge. Previous studies have demonstrated the superiority of deep learning-based methods for this task. However, such techniques still hit bottlenecks because they cannot effectively handle three challenges: (1) differentiating APIs from common words; (2) identifying morphological variants of standard APIs; and (3) the lack of high-quality labeled training data. To overcome these challenges, this paper proposes a context-aware API recognition method named CAREER. The approach uses two key components, Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM), to extract context information at both the word level and the sequence level. This combination lets the method dynamically capture both syntactic and semantic information, addressing the first challenge. To tackle the second challenge, CAREER introduces a character-level BiLSTM component with an attention mechanism, which captures character-level global context and thereby improves recognition of the morphological features of API mentions. To address the third challenge, the paper introduces three data augmentation techniques for generating new samples, together with a novel sample selection algorithm that selects high-quality instances. Together, these reduce the need for manual data labeling. Experiments demonstrate that CAREER significantly improves the F1-score by 11.0% compared with state-of-the-art methods. We also construct specific datasets to assess CAREER's capacity to tackle the three challenges. Results confirm that (1) CAREER significantly outperforms baseline methods in addressing the first and second challenges, and (2) with the aid of the data augmentation techniques and the sample selection algorithm, high-quality samples can be generated that improve performance and alleviate the third challenge.
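
To make the idea of character-level attention more concrete, the following is a minimal sketch of attention-weighted pooling over per-character vectors. It is not CAREER's implementation: the BiLSTM is omitted, and the names `char_features`, `attention_pool`, and the hand-set `query` vector are hypothetical stand-ins for representations and parameters that a trained model would learn.

```python
import math


def char_features(c):
    # Hypothetical toy featurizer standing in for learned character states:
    # [is uppercase, is API-like punctuation, is lowercase].
    return [float(c.isupper()), float(c in "._()"), float(c.islower())]


def softmax(scores):
    # Numerically stable softmax over a non-empty list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(char_vectors, query):
    # Score each character vector by its dot product with the query,
    # normalize the scores into attention weights, and return the
    # weighted sum of the character vectors along with the weights.
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in char_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, char_vectors))
        for d in range(dim)
    ]
    return pooled, weights


# Assumed example token; the query is hand-set to favor punctuation and
# capitalization, which a real model would instead learn from data.
token = "read_csv()"
vectors = [char_features(c) for c in token]
pooled, weights = attention_pool(vectors, query=[1.0, 2.0, 0.0])
```

With this hand-set query, characters such as `_`, `(`, and `)` receive larger weights than plain lowercase letters, so the pooled vector emphasizes the morphological cues that tend to distinguish an API mention from a common word.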

Clean and Learn: Improving Robustness to Spurious Solutions in API Question Answering

APIReal: an API Recognition and Linking Approach for Online Developer Forums

Coarse-to-Careful: Seeking Semantic-related Knowledge for Open-domain Commonsense Question Answering

API-misuse detection driven by fine-grained API-constraint knowledge graph

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Learning to Answer Multilingual and Code-Mixed Questions

Answering medical questions in Chinese using automatically mined knowledge and deep neural networks: an end-to-end solution

Knowledge Distillation for Improved Accuracy in Spoken Question Answering

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

Improving the Consistency of Semantic Parsing in KBQA Through Knowledge Distillation

PEDANTS: Cheap but Effective and Interpretable Answer Equivalence

One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning

CAREER: Context-Aware API Recognition with Data Augmentation for API Knowledge Extraction

Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?

Making Neural QA as Simple as Possible but not Simpler

XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering

Denoise while Aggregating: Collaborative Learning in Open-Domain Question Answering

Bridging the Language Gap: Knowledge Injected Multilingual Question Answering

A Roadmap Towards Explainable Question Answering A Solution for Information Pollution

A Question Answering Based Pipeline for Comprehensive Chinese EHR Information Extraction