Abstract: Software developers usually rely on in-house performance testing to detect performance regressions and locate their root causes. Such testing is typically resource- and time-consuming, which makes it impractical when software is delivered in fast-paced release cycles. On the other hand, the operational data generated in the field provides rich information about the performance of a software system and its runtime activities. This work therefore explores the idea of leveraging readily available field operational data to locate the root causes of performance regressions instead of running expensive performance tests. However, because of ever-changing end-user workloads and noise from the field, directly analyzing performance metrics such as the system's response time may not help locate the root causes of performance regressions. In this paper, we report our experience of designing and adopting an approach that automatically locates the root causes of performance regressions while software systems are deployed and running in the field. First, our approach uses black-box performance models to capture the relationship between the performance of a system and its runtime activities. Then, our approach analyzes the performance models and uses statistical techniques to suggest the problematic runtime activities (i.e., the root causes) that are related to a performance regression. Our evaluation considered three open-source projects and one industrial product. In the three open-source systems, we find that our approach successfully locates the root causes of all arbitrarily injected synthetic performance regressions. In the industrial system, our approach has successfully detected and located the root causes of three performance regressions; it has been adopted by our industrial partner and used in practice on a daily basis over a 12-month period.
In addition, we share the challenges that we encountered during the design and adoption of our approach, how we addressed those challenges, and the lessons that we learned during the process. We believe that our approach, together with our documented experience, can benefit practitioners and researchers who wish to leverage the field operational data of a software system to conduct performance assurance activities.
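
The two steps described in the abstract (a black-box model relating performance to runtime activities, followed by an analysis that points to suspicious activities) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear model fitted by ordinary least squares, compares per-activity coefficients between an old and a new version instead of applying the paper's statistical techniques, and runs on simulated data. The activity names, costs, and all function names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_model(counts, metric):
    """Fit metric ~ intercept + sum(coef_i * count_i) via the normal equations.

    counts: one row per time window, one column per runtime activity.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in row] for row in counts]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, metric)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def rank_suspects(names, old_counts, old_metric, new_counts, new_metric):
    """Rank activities by how much their modeled per-occurrence cost grew."""
    old = fit_linear_model(old_counts, old_metric)
    new = fit_linear_model(new_counts, new_metric)
    deltas = [(name, new[i + 1] - old[i + 1]) for i, name in enumerate(names)]
    return sorted(deltas, key=lambda d: d[1], reverse=True)


def simulate(cost_per_activity, windows, rng):
    """Generate hypothetical field data: random workload plus noisy metric."""
    counts, metric = [], []
    for _ in range(windows):
        row = [rng.randint(0, 50) for _ in cost_per_activity]
        load = sum(c * n for c, n in zip(cost_per_activity, row))
        counts.append(row)
        metric.append(5.0 + load + rng.gauss(0, 0.5))
    return counts, metric


rng = random.Random(42)
names = ["login", "search", "checkout"]  # hypothetical runtime activities
# The new version makes "checkout" three times as expensive (injected regression).
old_counts, old_metric = simulate([0.2, 0.5, 0.3], 200, rng)
new_counts, new_metric = simulate([0.2, 0.5, 0.9], 200, rng)

ranking = rank_suspects(names, old_counts, old_metric, new_counts, new_metric)
for name, delta in ranking:
    print(f"{name}: coefficient change {delta:+.3f}")
```

Because the model is fitted on activity counts and not on raw response time, the two versions can be compared even though their workloads differ, which is the motivation the abstract gives for using performance models on field data.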

Locating Performance Regression Root Causes in the Field Operations of Web-based Systems: An Experience Report
