Abstract: Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to their corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: (1) Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. (2) Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, which limits their usability. This limitation is significant for downstream tasks requiring massive numbers of test-code pairs, as not all projects can meet these requirements. To address these limitations, we propose a novel static method-level TCTL approach, named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. For the second challenge, TestLinker employs semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration.
Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which only leverages static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.
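
To make the first challenge concrete, the following is a minimal sketch of a naming-convention-based linker of the kind the abstract describes as a traditional static technique. It is not TestLinker's implementation; the function names `candidate_focal_name` and `link_by_naming_convention`, the `test` prefix rule, and the sample test and method names are all assumptions chosen for illustration.

```python
import re


def candidate_focal_name(test_name: str) -> str:
    """Guess a focal method name by stripping a leading 'test' prefix.

    Assumed convention (hypothetical): tests are named test<FocalMethod>
    or test_<focal_method>.
    """
    stripped = re.sub(r"^test_?", "", test_name, flags=re.IGNORECASE)
    # Lower-case the first character so 'ParseHeader' becomes 'parseHeader'.
    return stripped[:1].lower() + stripped[1:]


def link_by_naming_convention(test_names, production_methods):
    """Map each test to a production method with an exactly matching name.

    Tests whose derived name matches no production method map to None.
    """
    production = set(production_methods)
    links = {}
    for test in test_names:
        candidate = candidate_focal_name(test)
        links[test] = candidate if candidate in production else None
    return links


# Hypothetical sample data: the third test deviates from the convention.
tests = ["testParseHeader", "testAdd", "shouldRejectEmptyInput"]
methods = ["parseHeader", "add", "validate"]
links = link_by_naming_convention(tests, methods)
print(links)
```

In this sketch, `shouldRejectEmptyInput` receives no link even though it plausibly exercises `validate`, because the heuristic only looks at names. This is the kind of identification failure that motivates learning semantic correlations between tests and focal methods instead.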

Linking Source Code to Untangled Change Intents

Automatic Retrieval Method for Tracing Links Between Code and Chinese Documentation

Feature Location in Source Code by Trace-Based Impact Analysis and Information Retrieval

Automating Just-In-Time Comment Updating

Just-In-Time Defect Identification and Localization: A Two-Phase Framework

Who Should Review This Change?: Putting Text and File Location Analyses Together for More Accurate Recommendations

Commit-Level Software Change Intent Classification Using a Pre-Trained Transformer-Based Code Model

Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning

CoLink: an Unsupervised Framework for User Identity Linkage

Identifying change patterns in software history

Visual Exploration of Dependency Graph in Source Code Via Embedding-Based Similarity

Understanding Code Change with Micro-Changes

Untangling Composite Commits by Attributed Graph Clustering

Detect Hidden Dependency to Untangle Commits

Watch out for This Commit! A Study of Influential Software Changes

MTLink: Adaptive multi-task learning based pre-trained language model for traceability link recovery between issues and commits

Combining Code Context and Fine-grained Code Difference for Commit Message Generation

Enhancing Software Maintenance: A Learning to Rank Approach for Co-changed Method Identification

A Literature Review of Automatic Traceability Links Recovery for Software Change Impact Analysis

Towards Usable Neural Comment Generation Via Code-Comment Linkage Interpretation: Method and Empirical Study

Beyond Literal Meaning: Uncover and Explain Implicit Knowledge in Code Through Wikipedia-Based Concept Linking