Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang
DOI: https://doi.org/10.48550/arXiv.1903.07844
2019-05-03
Abstract: The recent rising popularity of ultra-fast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for the online retailers: the number of products (SKUs) they carry is no longer "the more, the better", yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically identifying a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers' ultra-fast delivery platforms. We distill the product selection problem into a semi-bandit model with linear generalization. There are in total $N$ different arms, each with a feature vector of dimension $d$. The player pulls $K$ arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where $K$ is much greater than the number of total time periods $T$ or the dimension of product features $d$. We first analyze a standard UCB algorithm and show its regret bound can be expressed as the sum of a $T$-independent part $\tilde O(K d^{3/2})$ and a $T$-dependent part $\tilde O(d\sqrt{KT})$, which we refer to as "fixed cost" and "variable cost" respectively. To reduce the fixed cost for large $K$ values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of $d$ to $\tilde O(K \sqrt{d})$. Moreover, we test the algorithms on an industrial dataset from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
Optimization and Control, Machine Learning
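The abstract splits the regret into a fixed cost and a variable cost. A quick calculation makes this split concrete.

This is a minimal sketch that evaluates only the leading-order terms. It drops the constants and logarithmic factors hidden by \(\tilde O\), so the numbers indicate orders of magnitude rather than actual regret. The function name `cost_terms` and the values \(K=1000\), \(d=16\), \(T=50\) are illustrative assumptions, not taken from the paper.

```python
import math


def cost_terms(K: int, d: int, T: int) -> dict:
    """Leading-order regret terms from the abstract (constants and logs dropped)."""
    return {
        "ucb_fixed": K * d ** 1.5,         # O~(K d^{3/2}): T-independent, standard UCB
        "cons_fixed": K * math.sqrt(d),    # O~(K sqrt(d)): T-independent, shrinking UCB
        "variable": d * math.sqrt(K * T),  # O~(d sqrt(KT)): T-dependent part
    }


# Illustrative (hypothetical) sizes with K much larger than T and d.
terms = cost_terms(K=1000, d=16, T=50)
for name, value in terms.items():
    print(f"{name:>10}: {value:,.0f}")
```

With these values, the standard-UCB fixed term (64,000) is roughly 18 times the variable term (about 3,578). The shrunk fixed term (4,000) is of the same order as the variable term. The ratio of the two fixed terms is always \(d\), which is why the reduction matters most when \(K\) is large relative to \(T\).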
What problem does this paper attempt to address?
The paper addresses the problem of dynamically selecting which products to stock in urban warehouses. As ultra-fast delivery services on retail platforms grow in popularity, urban warehouses have become key to fast delivery because they are close to consumers. However, their limited space prevents online retailers from offering as many products (SKUs) as a traditional e-commerce platform. The question is therefore how to identify, on the fly, a large number of products with the highest customer purchase probabilities from a much larger pool of candidates.

The paper distills this product selection problem into a semi-bandit model with linear generalization:

- The model has \(N\) arms (each product is an arm), and each arm has a \(d\)-dimensional feature vector.
- In each period the player selects \(K\) arms and observes the bandit feedback of every selected arm.
- The study focuses on the case where \(K\) is much larger than the total number of periods \(T\) or the feature dimension \(d\).

The paper first shows that the regret of a standard UCB algorithm splits into two parts:

- a \(T\)-independent "fixed cost" \(\tilde{O}(Kd^{3/2})\);
- a \(T\)-dependent "variable cost" \(\tilde{O}(d\sqrt{KT})\).

To reduce the fixed cost, it proposes a new online learning algorithm, ConsUCB (Conservative Upper Confidence Bound). ConsUCB iteratively shrinks the upper confidence bounds within each period and lowers the fixed cost by a factor of \(d\). In experiments on an industrial dataset from Alibaba Group, the new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.

### Specific Problem Description

1. **Background**: With the rise of ultra-fast delivery services, urban warehouses have become key to fast delivery because of their location close to customers. Their space is limited, so retailers cannot expand their product range without limit. They must select the products most likely to be purchased.
2. **Objective**: Dynamically select the best product set from a large pool of candidates. The goal is to maximize the probability of meeting customer demand and thereby improve sales.
3. **Challenges**:
   - The number of products is large: each category may contain hundreds to thousands of products.
   - Sales data are limited, especially for low-demand products.
   - The assortment must be adjusted several times within a limited horizon (such as a quarter) to keep up with market changes.
4. **Method**: The paper formulates a semi-bandit model with linear generalization and analyzes the regret bound of the standard UCB algorithm. Building on this analysis, it proposes ConsUCB, which reduces the fixed cost by progressively shrinking the upper confidence bounds within each period.
5. **Contributions**:
   - A new regret bound analysis of the standard UCB algorithm in this model setting.
   - A new algorithm, ConsUCB, that reduces the fixed cost by a factor of \(d\).
   - Empirical validation on Alibaba Group data.

### Mathematical Model

- **Model Setup**:
  - \(N\): the total number of candidate products.
  - \(T\): the total number of time periods.
  - \(K\): the number of products selected in each period.
  - \(d\): the dimension of each product's feature vector \(x_i\).
  - \(\theta^*\): the unknown parameter vector. It links a product's features to its purchase probability linearly, \(\mu(i) = x_i^\top \theta^*\).
- **Objective Function**: In each period, maximize the expected reward, i.e., the sum of the purchase probabilities \(\mu(i)\) of the selected products.
- **Regret Definition**:
  \[
  R(T)=\sum_{t=1}^{T}\sum_{i\in S^*}\mu(i)-\sum_{t=1}^{T}\sum_{i\in S_t}\mu(i)
  \]
  Here \(S^*\) is the optimal product set, i.e., the \(K\) products with the largest \(\mu(i)\). \(S_t\) is the product set selected in period \(t\).

### New Algorithm: ConsUCB

- **Core Idea**: Reduce the fixed cost by progressively shrinking the upper confidence bounds within each period.
- **Fixed Cost**: The fixed cost of the standard UCB algorithm is \(\tilde{O}(Kd^{3/2})\), while the fixed cost of the ConsUCB algorithm is reduced to \(\tilde{O}(K\sqrt{d})\).
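The gap between the two fixed costs comes from how confidence widths are handled within a period.

This is a minimal sketch, not the paper's ConsUCB. It assumes one plausible reading of "shrinking the upper confidence bounds within each period":

- Choose the \(K\) products one at a time.
- After each choice, update the confidence matrix with that product's features before any reward is observed.
- As a result, remaining products similar to those already chosen get a narrower exploration bonus.

Several details are hypothetical:

- The synthetic environment: uniform random features \(x_i\) and Bernoulli purchases with mean \(\mu(i)=x_i^\top\theta^*\).
- The exploration weight `beta` and the ridge parameter `lam`.
- All function names.

Regret is accumulated as in the definition of \(R(T)\).

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def width2(A_inv, x):
    """Squared confidence width x^T A^{-1} x, clamped at 0 against round-off."""
    return max(0.0, dot(x, matvec(A_inv, x)))


def sm_update(A_inv, x):
    """Sherman-Morrison: return (A + x x^T)^{-1} given a symmetric A^{-1}."""
    Ax = matvec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(len(x))]
            for i in range(len(x))]


def select(features, A_inv, b, K, beta, shrink):
    """Pick K distinct arms by UCB; optionally shrink widths after each pick."""
    theta_hat = matvec(A_inv, b)  # ridge estimate, fixed for the whole period
    means = [dot(x, theta_hat) for x in features]
    chosen, remaining, M = [], set(range(len(features))), A_inv
    for _ in range(K):
        best = max(remaining,
                   key=lambda i: means[i] + beta * math.sqrt(width2(M, features[i])))
        chosen.append(best)
        remaining.remove(best)
        if shrink:
            # Hypothetical shrinking step (an assumption, not the paper's exact
            # ConsUCB): treat the chosen arm as already observed so that similar
            # remaining arms get a smaller bonus. No reward is used here.
            M = sm_update(M, features[best])
    return chosen


def run(N=200, d=5, K=40, T=10, beta=1.0, lam=1.0, shrink=False, seed=0):
    """Simulate T periods on a synthetic instance and return R(T)."""
    rng = random.Random(seed)
    theta_star = [rng.uniform(0.0, 1.0) for _ in range(d)]
    features = [[rng.uniform(0.0, 1.0) / d for _ in range(d)] for _ in range(N)]
    mu = [min(1.0, max(0.0, dot(x, theta_star))) for x in features]  # in [0, 1]
    best_sum = sum(sorted(mu, reverse=True)[:K])  # sum of mu(i) over S*
    A_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    regret = 0.0
    for _ in range(T):
        S_t = select(features, A_inv, b, K, beta, shrink)
        regret += best_sum - sum(mu[i] for i in S_t)
        for i in S_t:  # semi-bandit feedback: one Bernoulli purchase per chosen arm
            r = 1.0 if rng.random() < mu[i] else 0.0
            A_inv = sm_update(A_inv, features[i])
            b = [bj + r * xj for bj, xj in zip(b, features[i])]
    return regret


print("standard UCB regret: ", round(run(shrink=False), 2))
print("shrinking UCB regret:", round(run(shrink=True), 2))
```

With `shrink=False`, picking arms one at a time with a fixed matrix is the same as taking the top \(K\) by UCB, i.e., a standard linear UCB for this semi-bandit. Both runs share a seed, so they face identical products. This toy instance is not calibrated to the paper's experiments, so its outcome should not be read as evidence about the reported 10% improvement.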