Abstract: We consider a repeated newsvendor problem in which the inventory manager has no prior information about the demand and can observe only censored (sales) data. As in multi-armed bandit problems, the manager must simultaneously "explore" and "exploit" with her inventory decisions in order to minimize the cumulative cost. We make no probabilistic assumptions (in particular, neither independence nor time stationarity) about the mechanism that generates the demand sequence. Our goal is to shed light on the hardness of the problem and to develop policies that perform well with respect to the regret criterion, that is, the difference between the cumulative cost of a policy and that of the best fixed (static) inventory decision in hindsight, uniformly over all feasible demand sequences. We show that a simple randomized policy, termed the Exponentially Weighted Forecaster, combined with a carefully designed cost estimator, achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to all three key primitives: the number of time periods, the number of available inventory decisions, and the demand support. This result yields an important insight: the benefit from "information stalking" and the cost of censoring are both negligible in this dynamic learning problem, at least with respect to the regret criterion. Furthermore, we modify the proposed policy so that it performs well in terms of tracking regret, that is, using as benchmark the best sequence of inventory decisions that switches a limited number of times. Numerical experiments suggest that the proposed approach outperforms existing ones (which are tailored to, or rely on, time stationarity) on nonstationary demand models. Finally, we extend the proposed approach and its analysis to a "combinatorial" version of the repeated newsvendor problem.
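The abstract names the policy (an Exponentially Weighted Forecaster) but does not spell out its cost estimator. The Python code below is a minimal sketch under stated assumptions: a finite, sorted set of order levels, a linear holding cost `h` and shortage cost `b`, and a hypothetical importance-weighted estimator built from the identity cost(y, d) = h*y + b*d - (h + b)*min(y, d). Because the sales observed at the chosen order level reveal min(y', d) for every smaller level y', each action's loss can be estimated without bias even though the demand itself is censored. The function names (`newsvendor_cost`, `estimated_losses`, `ewf_newsvendor`) and their parameters are illustrative, and the estimator is not claimed to be the one the authors designed.

```python
import math
import random


def newsvendor_cost(y, d, h, b):
    """Per-period cost: h per unit left over, b per unit of unmet demand."""
    return h * max(y - d, 0) + b * max(d - y, 0)


def estimated_losses(actions, p, i, sales, h, b):
    """Importance-weighted loss estimates for every order level.

    Hypothetical estimator, assumed for illustration. It uses
    cost(y, d) = h*y + b*d - (h + b)*min(y, d). Dropping the action-independent
    b*d and adding the constant (h + b)*max(actions) gives the nonnegative loss
    h*y + (h + b)*(y_max - min(y, d)), which ranks actions exactly as the true
    cost does. Ordering actions[i] reveals min(y, d) for every y <= actions[i],
    so that term is divided by P(chosen level >= y) to keep the estimate
    unbiased. `actions` must be sorted ascending, and p[i] must be positive.
    """
    y_max = actions[-1]
    k = len(actions)
    tail = [0.0] * k  # tail[j] = P(chosen index >= j) = P(min(actions[j], d) observed)
    acc = 0.0
    for j in range(k - 1, -1, -1):
        acc += p[j]
        tail[j] = acc
    est = []
    for j, yj in enumerate(actions):
        loss = h * yj  # this part is known without any demand information
        if j <= i:  # yj <= ordered amount, so min(yj, d) == min(yj, sales)
            loss += (h + b) * (y_max - min(yj, sales)) / tail[j]
        est.append(loss)
    return est


def ewf_newsvendor(demands, actions, h, b, eta, seed=0):
    """Exponential weights over a finite set of order levels, observing only
    sales min(y, d) each period. Returns the realized total cost."""
    rng = random.Random(seed)
    actions = sorted(actions)
    k = len(actions)
    cum = [0.0] * k  # cumulative estimated losses per order level
    total = 0.0
    for d in demands:
        m = min(cum)  # shift before exponentiating, for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        p = [x / z for x in w]
        i = rng.choices(range(k), weights=p)[0]  # randomized inventory decision
        sales = min(actions[i], d)  # censored feedback: d itself stays hidden
        total += newsvendor_cost(actions[i], d, h, b)
        for j, loss in enumerate(estimated_losses(actions, p, i, sales, h, b)):
            cum[j] += loss
    return total
```

The learning rate `eta` is left to the caller; its tuning governs the regret guarantee and is not reproduced here. Subtracting the minimum cumulative loss before exponentiating leaves the sampling distribution unchanged but avoids overflow.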

Fixing Inventory Inaccuracies At Scale

Anomaly Detection for an E-commerce Pricing System

Sparse Anomaly Detection Across Referentials: A Rank-Based Higher Criticism Approach

Fight Inventory Shrinkage: Simultaneous Learning of Inventory Level and Shrinkage Rate

Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

Low-count Time Series Anomaly Detection

Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes

Low-Rank Matrix Approximation with Stability

Scalable changepoint and anomaly detection in cross-correlated data with an application to condition monitoring

Fixing shelf out-of-stock with signals in point-of-sale data

A Dynamic Bayesian Network Model for Inventory Level Estimation in Retail Marketing

Imputation and low-rank estimation with Missing Not At Random data

A Marketplace Price Anomaly Detection System at Scale

Exact Characterization of the Jointly Optimal Restocking and Auditing Policy in Inventory Systems with Record Inaccuracy

Anomaly Detection for Incident Response at Scale

Concept-based Anomaly Detection in Retail Stores for Automatic Correction using Mobile Robots

Low-rank on Graphs plus Temporally Smooth Sparse Decomposition for Anomaly Detection in Spatiotemporal Data

Algorithmic Recourse for Anomaly Detection in Multivariate Time Series

Designing an Efficient End-to-end Machine Learning Pipeline for Real-time Empty-shelf Detection

Exact variable-length anomaly detection algorithm for univariate and multivariate time series

On the Hardness of Inventory Management with Censored Demand Data