AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering

Jie JW Wu
DOI: https://doi.org/10.1145/3663529.3663780
2024-08-09
Abstract: Software companies have widely used online A/B testing to evaluate the impact of a new technology by offering it to groups of users and comparing it against the unmodified product. However, running online A/B testing needs not only efforts in design, implementation, and stakeholders' approval to be served in production but also several weeks to collect the data in iterations. To address these issues, a recently emerging topic, called "Offline A/B Testing", is getting increasing attention, intending to conduct the offline evaluation of new technologies by estimating historical logged data. Although this approach is promising due to lower implementation effort, faster turnaround time, and no potential user harm, for it to be effectively prioritized as requirements in practice, several limitations need to be addressed, including its discrepancy with online A/B test results, and lack of systematic updates on varying data and parameters. In response, in this vision paper, I introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logging and update the offline evaluation results, which are used to make decisions on requirements more reliably and systematically.
Software Engineering
What problem does this paper attempt to address?
The paper asks how automated offline A/B testing can make data-driven requirements engineering more reliable and systematic during software development. Traditional online A/B testing has the following problems:

1. **High design and implementation costs**: It requires substantial design and implementation work in the codebase, and that work must meet production-level standards.
2. **Significant impact on users**: Online A/B testing directly affects a portion of users; if the test version contains errors or security issues, those users may be harmed.
3. **Long time cycles**: It usually takes several weeks to collect data and may require multiple iterations.

To overcome these limitations, researchers have proposed offline A/B testing, which estimates the effect of a new technique from historical log data. Although this method offers low development cost and quick turnaround, it has limitations of its own: results can be unreliable because algorithms and parameter values are selected manually, and the results are not systematically updated.

To address these issues, the author proposes AutoOffAB, an idea for automatically running offline A/B tests and periodically updating their results. Specifically, AutoOffAB improves on existing methods by:

- Automatically generating and evaluating algorithm variants.
- Periodically updating test results using the latest log data.
- Reducing the discrepancy between offline test results and online test results.

In this way, AutoOffAB aims to make offline A/B test results reliable enough to support requirements decisions in data-driven requirements engineering.
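The loop described above (run several variants of an offline estimate against recent logs, then refresh as new logs arrive) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a clipped inverse propensity scoring (IPS) estimator, a common off-policy estimator that the text above does not name, and every identifier in it (`LogEntry`, `clipped_ips`, `run_offline_variants`, the clipping values, the 7-day window) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogEntry:
    """One logged interaction (hypothetical schema, not from the paper)."""
    day: int           # day index on which the interaction was logged
    context: str       # user/context feature
    action: str        # action served by the production (logging) policy
    propensity: float  # probability the logging policy gave that action
    reward: float      # observed outcome, e.g. 1.0 for a click


# A target policy returns the probability of choosing `action` in `context`.
Policy = Callable[[str, str], float]


def clipped_ips(logs: List[LogEntry], target: Policy, clip: float) -> float:
    """Clipped IPS estimate of the target policy's mean reward."""
    if not logs:
        raise ValueError("no log entries to evaluate")
    total = 0.0
    for entry in logs:
        # Importance weight, capped at `clip` to limit variance.
        weight = min(target(entry.context, entry.action) / entry.propensity, clip)
        total += weight * entry.reward
    return total / len(logs)


def run_offline_variants(
    logs: List[LogEntry],
    target: Policy,
    clips: List[float],
    today: int,
    window_days: int,
) -> Dict[float, float]:
    """Evaluate one estimator variant per clipping value on recent logs only."""
    recent = [e for e in logs if today - window_days < e.day <= today]
    return {clip: clipped_ips(recent, target, clip) for clip in clips}


def always_a(context: str, action: str) -> float:
    """Hypothetical new policy under evaluation: always serve action "A"."""
    return 1.0 if action == "A" else 0.0


logs = [
    LogEntry(day=1, context="mobile", action="A", propensity=0.5, reward=1.0),
    LogEntry(day=8, context="mobile", action="A", propensity=0.5, reward=1.0),
    LogEntry(day=9, context="desktop", action="B", propensity=0.5, reward=0.0),
    LogEntry(day=10, context="mobile", action="B", propensity=0.5, reward=0.0),
    LogEntry(day=10, context="desktop", action="A", propensity=0.5, reward=1.0),
]

# Re-running this call on a schedule with a later `today` refreshes the results.
results = run_offline_variants(logs, always_a, clips=[1.0, 10.0], today=10, window_days=7)
print(results)  # {1.0: 0.5, 10.0: 1.0}
```

The two variants disagree (0.5 versus 1.0) on the same four recent entries, which shows how much a manually chosen parameter can move an offline estimate and why evaluating variants systematically is useful.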