Abstract: Making ideal decisions as a product leader in a web-facing company is extremely difficult. In addition to navigating the ambiguity of customer satisfaction and achieving business goals, one must also pave a path forward for one's products and services to remain relevant, desirable, and profitable. Data and experimentation to test product hypotheses are key to informing product decisions. Online controlled experiments by A/B testing may provide the best data to support such decisions with high confidence, but can be time-consuming and expensive, especially when one wants to understand impact on key business metrics such as retention or long-term value. Offline experimentation allows one to rapidly iterate and test, but often cannot provide the same level of confidence, and cannot easily shine a light on impact on business metrics. We introduce a novel, lightweight, and flexible approach to investigating hypotheses, called scenario analysis, that aims to support product leaders' decisions using data about users and estimates of business metrics. Its strengths are that it can provide guidance on trade-offs that are incurred by growing or shifting consumption, estimate trends in long-term outcomes like retention and other important business metrics, and generate hypotheses about relationships between metrics at scale.

What problem does this paper attempt to address?

This paper introduces ForTune, a method and tool that addresses the difficulty of making product decisions within internet companies. Product leaders must balance uncertain customer satisfaction against business goals while keeping their products relevant and profitable. Online A/B testing can provide reliable data to support decision-making, but it is time-consuming and expensive, especially when evaluating the impact on key business metrics such as user retention. Offline experiments allow quick iteration and testing, but they often cannot provide the same level of confidence as online experiments and cannot easily reveal the impact on business metrics.

ForTune proposes a lightweight and flexible scenario analysis approach for investigating hypotheses and supporting product decisions. It uses data about users and estimates of business metrics to give guidance on the trade-offs incurred by growing or shifting consumption, to estimate trends in long-term outcomes such as user retention, and to generate hypotheses about the relationships between metrics at scale. Rather than developing and training prediction models, ForTune predicts the effect of a change by reweighting past data to match the expected changes.

The ForTune tool implements this approach and is evaluated on publicly available datasets and in Spotify's production environment, where it gives reasonable predictions of controlled experiment results given suitable features. The experimental results show that ForTune performs well in predicting the impact of changes in user behavior on key business metrics, especially long-term outcomes such as retention rate. The method also has limitations: the right features must be chosen, and predictions can become unstable when the weights are concentrated on a small number of observations. It is also better suited to predicting average values than absolute values.

Overall, ForTune provides product leaders with a tool for quickly assessing the impact of a strategy, helping to reduce uncertainty in the decision-making process.
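
The reweighting idea mentioned above can be illustrated with a small sketch. This is not the ForTune implementation, and the summary does not say how ForTune computes its weights. The sketch assumes one common choice, exponential tilting, in which each past observation gets a weight proportional to exp(lam * x) and lam is found by bisection so that the weighted mean of a feature matches a target scenario. The function names `tilt_weights` and `effective_sample_size`, the toy data, and the target value are all hypothetical. The effective sample size is included because it gives a simple signal of the instability noted above: it drops when the weights concentrate on few observations.

```python
import math


def tilt_weights(x, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Return normalized weights w_i proportional to exp(lam * x_i) such that
    the weighted mean of x equals target_mean (assumed scheme, not ForTune's)."""
    if not (min(x) < target_mean < max(x)):
        raise ValueError("target_mean must lie strictly inside the observed range")

    def weights(lam):
        # Subtract the largest exponent for numerical stability.
        m = max(lam * xi for xi in x)
        w = [math.exp(lam * xi - m) for xi in x]
        total = sum(w)
        return [wi / total for wi in w]

    def weighted_mean(lam):
        return sum(wi * xi for wi, xi in zip(weights(lam), x))

    # The weighted mean is non-decreasing in lam, so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2)


def effective_sample_size(w):
    """Kish effective sample size for normalized weights."""
    return 1.0 / sum(wi * wi for wi in w)


# Toy past data (hypothetical): a consumption feature and a retention flag.
consumption = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retained = [0, 0, 1, 0, 1, 1]

# Scenario: average consumption grows from 3.5 to 4.0.
w = tilt_weights(consumption, target_mean=4.0)
scenario_mean = sum(wi * xi for wi, xi in zip(w, consumption))
scenario_retention = sum(wi * ri for wi, ri in zip(w, retained))
ess = effective_sample_size(w)

print(f"weighted mean consumption: {scenario_mean:.4f}")
print(f"estimated retention under scenario: {scenario_retention:.4f}")
print(f"effective sample size: {ess:.2f} of {len(consumption)}")
```

The estimate is a reweighted average of past outcomes, so it describes what the average would look like if the population resembled the scenario. It is not a causal prediction, and it only makes sense for targets inside the range of the observed data.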

ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics
