The Big Data Newsvendor Problem under Demand and Yield Uncertainties
Tiantian Cao, Yi Yang, Han Zhu, Mingyue Yu
DOI: https://doi.org/10.1016/j.ijpe.2024.109409
IF: 11.251
2024-01-01
International Journal of Production Economics
Abstract: We consider a variant of the classic newsvendor problem in which firms face both demand and yield randomness. Unlike the existing literature, we assume that decision-makers have no a priori knowledge of the distribution functions of demand and yield but have access to past observations of demand, yield, and related feature information. We integrate predictive machine learning algorithms to determine the optimal order quantity directly from historical data, using approaches based on the empirical risk minimization (ERM) principle, kernel regression, K-nearest neighbors (kNN), and classification and regression trees (CART). These data-driven approaches not only capture useful information from relevant features but also account for the structure of the optimization problem, which effectively avoids the inconsistent solutions that can arise in the traditional “predict-then-optimize” approach. Most importantly, we establish out-of-sample generalization error bounds under mild conditions using uniform stability-based and Rademacher complexity-based methods from computational learning theory, and we then show the asymptotic optimality of the data-driven approaches based on kernel regression and kNN. Our data-driven approaches can tractably handle both independent and interdependent demand and yield uncertainties. Finally, numerical experiments on both synthetic and real data compare the proposed methods with two traditional benchmark approaches: the Sample Average Approximation (SAA) approach and the traditional “predict-then-optimize” framework based on CART. Our data-driven approaches achieve significant performance improvements, and the one based on kernel regression tends to perform best on real data, with an average daily cost saving of up to 54.92%.
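As a rough illustration of how kNN and kernel regression can fold feature information into the order decision, the sketch below treats both as a feature-weighted version of SAA: historical (demand, yield) pairs are reweighted by how close their features are to the new feature vector, and the order quantity minimizes the weighted empirical cost. The abstract does not state the cost function, so this sketch assumes a multiplicative yield model (received quantity = yield rate × order quantity), a linear holding cost h, a linear shortage cost b, no purchase cost, and a Gaussian kernel. These modelling choices and all function names (`newsvendor_cost`, `weighted_saa_order`, `saa_order`, `knn_order`, `kernel_order`) are hypothetical and are not the authors' formulation.

```python
import numpy as np


def newsvendor_cost(q, demand, yield_rate, h, b):
    # Assumed multiplicative yield: only yield_rate * q units actually arrive.
    received = yield_rate * q
    # Assumed linear holding cost h on leftovers and shortage cost b on unmet demand.
    return h * np.maximum(received - demand, 0.0) + b * np.maximum(demand - received, 0.0)


def weighted_saa_order(demand, yield_rate, weights, h, b):
    demand = np.asarray(demand, dtype=float)
    yield_rate = np.asarray(yield_rate, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # The weighted empirical cost is convex and piecewise linear in q >= 0,
    # so a minimizer lies at q = 0 or at a breakpoint q = d_i / y_i.
    positive = yield_rate > 0
    candidates = np.concatenate(([0.0], demand[positive] / yield_rate[positive]))
    costs = [weights @ newsvendor_cost(q, demand, yield_rate, h, b) for q in candidates]
    return float(candidates[int(np.argmin(costs))])


def saa_order(demand, yield_rate, h, b):
    # SAA benchmark: every historical observation gets equal weight, features ignored.
    n = len(demand)
    return weighted_saa_order(demand, yield_rate, np.full(n, 1.0 / n), h, b)


def knn_order(x_new, X, demand, yield_rate, k, h, b):
    # kNN weights: 1/k on the k observations whose features are closest to x_new.
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    weights = np.zeros(len(X))
    weights[nearest] = 1.0 / k
    return weighted_saa_order(demand, yield_rate, weights, h, b)


def kernel_order(x_new, X, demand, yield_rate, bandwidth, h, b):
    # Nadaraya-Watson style weights with an assumed Gaussian kernel.
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X - np.asarray(x_new, dtype=float)) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return weighted_saa_order(demand, yield_rate, weights / weights.sum(), h, b)


# Toy data: one feature, two clusters of days with different demand levels.
X = [[0.0], [0.1], [5.0], [5.1]]
demand = [1, 1, 9, 9]
yield_rate = [1.0, 1.0, 1.0, 1.0]
print(saa_order(demand, yield_rate, h=1.0, b=1.0))
print(knn_order([0.05], X, demand, yield_rate, k=2, h=1.0, b=1.0))
print(kernel_order([0.05], X, demand, yield_rate, bandwidth=0.5, h=1.0, b=1.0))
```

In the toy data, SAA pools all four days and ignores the feature, while the kNN and kernel versions concentrate weight on the two days whose feature is near 0.05 and therefore order about 1 unit. With yield below 1, the breakpoints d_i / y_i push the order quantity above demand to hedge against partial delivery.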