AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering

Jie JW Wu

DOI: https://doi.org/10.1145/3663529.3663780

2024-08-09

Abstract: Software companies have widely used online A/B testing to evaluate the impact of a new technology by offering it to groups of users and comparing it against the unmodified product. However, running online A/B testing needs not only efforts in design, implementation, and stakeholders' approval to be served in production but also several weeks to collect the data in iterations. To address these issues, a recently emerging topic, called "Offline A/B Testing", is getting increasing attention, intending to conduct the offline evaluation of new technologies by estimating historical logged data. Although this approach is promising due to lower implementation effort, faster turnaround time, and no potential user harm, for it to be effectively prioritized as requirements in practice, several limitations need to be addressed, including its discrepancy with online A/B test results, and lack of systematic updates on varying data and parameters. In response, in this vision paper, I introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logging and update the offline evaluation results, which are used to make decisions on requirements more reliably and systematically.

Software Engineering

What problem does this paper attempt to address?

The paper asks how automated offline A/B testing can make data-driven requirements engineering more reliable and systematic. It starts from three drawbacks of traditional online A/B testing:

1. **High design and implementation costs**: The new technology must be designed and implemented in the codebase to production-level standards before it can be tested.
2. **Significant impact on users**: Online A/B testing exposes a portion of real users to the test version, so errors or security issues in that version can harm them.
3. **Long time cycles**: Collecting the data usually takes several weeks and may require multiple iterations.

Offline A/B testing has been proposed to overcome these limitations by estimating the effect of a new technique from historical log data. Although this method offers lower development cost and faster turnaround, its results can be unreliable because algorithms and parameter values are selected manually, and the results are not systematically updated as the data changes.

To address these issues, the author proposes AutoOffAB, an idea for automatically running offline A/B tests and periodically updating their results so that they are more reliable for requirements decision-making. Specifically, AutoOffAB aims to improve on existing practice in the following ways:

- Automatically generating and evaluating algorithm variants.
- Periodically updating test results using the latest log data.
- Reducing the discrepancy between offline test results and online test results.

Together, these steps are intended to make offline A/B test results dependable enough to drive data-driven requirements engineering.
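The loop described above (generate variants, evaluate each one on recent logs, refresh the results) can be illustrated with a minimal Python sketch. The summary does not say which estimator AutoOffAB would use, so this sketch swaps in clipped inverse propensity scoring (IPS), a common off-policy estimator. The log format, the synthetic data, the epsilon-greedy target policy, and every function name here are assumptions made for illustration, not part of the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

# One logged interaction (hypothetical format):
# (context, action shown, probability of that action under the logging policy, reward)
LogEntry = Tuple[int, int, float, float]


def make_logs(n: int, n_actions: int, seed: int = 0) -> List[LogEntry]:
    """Synthetic logs from a uniform-random logging policy (made-up data)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.randrange(n_actions)
        action = rng.randrange(n_actions)
        # Reward is 1 when the shown action matches the context, else 0.
        reward = 1.0 if action == context else 0.0
        logs.append((context, action, 1.0 / n_actions, reward))
    return logs


def ips_estimate(
    logs: List[LogEntry],
    target_prob: Callable[[int, int], float],
    clip: float,
) -> float:
    """Clipped IPS estimate of the target policy's average reward."""
    total = 0.0
    for context, action, propensity, reward in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to pick the logged action; cap the weight.
        weight = min(target_prob(context, action) / propensity, clip)
        total += weight * reward
    return total / len(logs)


def epsilon_greedy(eps: float, n_actions: int) -> Callable[[int, int], float]:
    """Target policy: picks action == context with prob 1 - eps, else uniform."""
    def prob(context: int, action: int) -> float:
        base = eps / n_actions
        return base + (1.0 - eps) if action == context else base
    return prob


def run_variants(
    logs: List[LogEntry],
    n_actions: int,
    eps_values: List[float],
    clip_values: List[float],
) -> Dict[Tuple[float, float], float]:
    """Evaluate every (eps, clip) variant on the same logs."""
    results = {}
    for eps in eps_values:
        for clip in clip_values:
            policy = epsilon_greedy(eps, n_actions)
            results[(eps, clip)] = ips_estimate(logs, policy, clip)
    return results


logs = make_logs(n=5000, n_actions=4, seed=0)
recent = logs[-2000:]  # stand-in for "the latest log data"
results = run_variants(recent, 4, [0.0, 0.5, 1.0], [2.0, 10.0])
for variant, value in sorted(results.items()):
    print(variant, round(value, 3))
```

Rerunning `run_variants` on a fresh `recent` window stands in for the periodic update. Comparing the estimates across clip values shows how sensitive the result is to a parameter that would otherwise be picked by hand, which is the kind of variation the summary says AutoOffAB is meant to explore automatically.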

Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization

An off-line programming system for robotic drilling in aerospace manufacturing

Can Offline Testing of Deep Neural Networks Replace Their Online Testing?: A Case Study of Automated Driving Systems

Can Offline Testing of Deep Neural Networks Replace Their Online Testing?

An architecture for enabling A/B experiments in automotive embedded software

An Online Sequential Test for Qualitative Treatment Effects

A/B testing: A systematic literature review

On the Advances and Challenges of Adaptive Online Testing

Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study

Offline recommender system evaluation: Challenges and new directions

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Offline Simulation Online Application: A New Framework of Simulation-Based Decision Making

Online Learning for Non-Stationary A/B Tests

A framework for Multi-A(rmed)/B(andit) testing with online FDR control

Automatic framework for requirement analysis phase

Validation of massively-parallel adaptive testing using dynamic control matching

ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics

Online Controlled Experiments for Personalised e-Commerce Strategies: Design, Challenges, and Pitfalls

Degradation-Resistant Offline Optimization Via Accumulative Risk Control

Sequential Optimum Test with Multi-armed Bandits for Online Experimentation