An Extended Knowledge Representation Learning Approach for Context-Based Traceability Link Recovery: Extended Abstract
Guoshuai Zhao, Tong Li, Zhen Yang
DOI: https://doi.org/10.1109/aire51212.2020.00010
2020-01-01
Abstract: Software requirements traceability links are widely recognized as an essential means of supporting effective system evolution when requirements change. However, creating and maintaining traceability links is not an easy task in practice, especially under time pressure. Therefore, an automatic and accurate traceability link recovery approach is needed. Existing methods usually recover traceability links through information retrieval models [1], which calculate text similarity among software artifacts. Although text similarity plays an essential role in correlating software artifacts, we argue that the context of software artifacts also provides important clues for establishing traceability links among them. For example, for each use case, the use cases related to it through includes/extends relationships can be seen as its context information, which helps to profile the use case comprehensively. A collection of software artifacts can be modeled as a graph through a variety of explicit relationships. Description-Embodied Knowledge Representation Learning (DKRL) [2] is a widely accepted method that can effectively capture both the structural information of explicit relationships and the description information of entities. By embedding such a graph effectively and precisely, the context information can be meaningfully represented, which contributes to the identification of requirements traceability links. In this paper, we propose Traceability Link Recovery-Knowledge Representation Learning (TLR-KRL), a DKRL-based approach for recovering requirements traceability links between use cases and code. This work has been accepted at the 32nd International Conference on Software Engineering & Knowledge Engineering. TLR-KRL comprehensively characterizes software artifacts by embedding both text information and structural relationships; an overview is shown in Fig. 1.
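As an illustration of the graph modeling described above, the following minimal Python sketch stores artifacts and their explicit relationships as (head, relation, tail) triples and scores a triple with a translation-style energy of the kind that structure-based embedding models such as DKRL build on. The artifact names, the `traces_to` relation, and the helper functions are hypothetical and are not taken from the TLR-KRL implementation.

```python
import math

# Hypothetical artifacts and explicit relations (illustrative names only).
triples = [
    ("UC_Login", "includes", "UC_ValidateUser"),
    ("UC_ResetPassword", "extends", "UC_Login"),
    ("UC_Login", "traces_to", "LoginController.java"),
]

def build_context(triples):
    """Map each artifact to its (relation, neighbour) pairs in both directions."""
    context = {}
    for head, rel, tail in triples:
        context.setdefault(head, []).append((rel, tail))
        # Record the inverse direction so the tail also sees the head as context.
        context.setdefault(tail, []).append((rel + "^-1", head))
    return context

def energy(h, r, t):
    """L2 translation energy ||h + r - t||; lower values mean a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
```

In this sketch the context of a use case is simply its list of neighbours; a learned model would instead encode that structure, together with the artifact text, into the embedding vectors passed to `energy`.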
Specifically, we follow a systematic process to extend the DKRL model, improving its negative sampling method and thus minimizing the false-negative samples generated by the original DKRL model. In this way, we obtain more precise embeddings of software artifacts. These embeddings are then used to train traceability link classifiers with supervised machine learning algorithms. All traceability link candidates produced by the classifiers are further screened using triple classification in order to retrieve more correct traceability links. To verify the effectiveness of our method, we carried out experiments on four datasets: eTour, EAnci, Clinic, and ITrust. The evaluation results on all four datasets show that our approach outperforms existing work.
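The abstract does not specify how the negative sampling is improved or how triple classification is configured. As a rough sketch of the general idea only, the Python code below shows one common way to avoid false negatives (rejecting corrupted triples that are already known to be true) and a threshold-based triple classifier. The function names, the `traces_to` relation, and the threshold value are assumptions made for illustration, not the authors' method.

```python
import random

def sample_negative(triple, entities, known, rng, max_tries=100):
    """Corrupt the tail of `triple`, rejecting candidates that are known positives.

    Assumption: filtering against `known` is one common way to avoid false
    negatives; the sampling strategy used by TLR-KRL may differ.
    """
    head, rel, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        corrupted = (head, rel, candidate)
        if candidate != tail and corrupted not in known:
            return corrupted
    return None  # no valid negative found within max_tries

def classify_triple(energy_value, threshold):
    """Accept a candidate link when its energy falls below the threshold."""
    return energy_value < threshold

# Usage with hypothetical data: C.java is the only tail that is not a known link.
known = {("uc1", "traces_to", "A.java"), ("uc1", "traces_to", "B.java")}
entities = ["A.java", "B.java", "C.java"]
negative = sample_negative(("uc1", "traces_to", "A.java"), entities, known, random.Random(0))
```

In practice the threshold would typically be tuned on held-out links, for example per relation type, rather than fixed by hand.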