Abstract: The evolution of the retail business presents new challenges and raises pivotal questions on how to reinvent stores and supply chains to meet the growing demand of the online channel. One of the recent measures adopted by omnichannel retailers is to address the growth of online sales using in-store picking, which allows serving online orders using existing assets. However, it comes with the downside of harming the offline customer experience. To achieve picking policies adapted to the dynamic customer flows of a retail store, we formalize a new problem called the Dynamic In-store Picker Routing Problem (diPRP). In the diPRP, a picker tries to pick online orders while minimizing customer encounters. We model the problem as a Markov Decision Process (MDP) and solve it using a hybrid solution approach comprising mathematical programming and reinforcement learning components. Computational experiments on synthetic instances suggest that the algorithm converges to efficient policies. Furthermore, we apply our approach in the context of a large European retailer to assess the results of the proposed policies regarding the number of orders picked and customers encountered. Our work suggests that retailers should be able to scale the in-store picking of online orders without jeopardizing the experience of offline customers. The policies learned using the proposed solution approach reduced the number of customer encounters by more than 50% when compared to policies solely focused on picking orders. Thus, to pursue omnichannel strategies that adequately trade off operational efficiency and customer experience, retailers cannot rely on current simplistic picking strategies, such as choosing the shortest possible route.

What problem does this paper attempt to address?

The main problem this paper attempts to solve is how to keep online order picking in physical stores operationally efficient while protecting the experience of in-store customers. Specifically, the paper introduces a new problem, the Dynamic In-store Picker Routing Problem (diPRP), in which a picker picks online orders while minimizing the number of encounters with customers. The paper models the diPRP as a Markov Decision Process (MDP) and solves it with a hybrid method that combines mathematical programming and reinforcement learning components.

### Overview of the Main Problem

1. **Background**:
   - The development of the retail industry has brought new challenges, especially in meeting the growing demand of the online channel.
   - Omnichannel retailers adopt in-store picking to serve online orders with existing assets, but this strategy may harm the shopping experience of offline customers.
2. **Objectives**:
   - Optimize the picking policy so that it adapts to the dynamic customer flow in the retail store.
   - Maximize the number of online orders picked while reducing the number of encounters with customers.
3. **Methods**:
   - Formalize the problem as a Markov Decision Process (MDP).
   - Use a hybrid Q-learning algorithm, combining mathematical programming and reinforcement learning techniques, to learn picking policies.
   - Verify the algorithm through computational experiments on synthetic instances and evaluate the results in a practical application at a large European retailer.

### Specific Problem Description

- **Dynamic In-store Picker Routing Problem (diPRP)**:
  - Pickers need to pick online orders in the store while avoiding encounters with customers as much as possible.
  - The key to the problem is how to make decisions under uncertain customer flow to maximize picking efficiency and improve customer experience.
- **Model**:
  - **State**: Consists of the picker's current position, the state of each picking position (whether it has been visited), and the arrival time.
  - **Action**: Select the next position to visit.
  - **Reward**: Rewards are given according to the picker's actions, including a fixed negative reward per step, a negative reward for the number of customers at the current and adjacent positions, and a positive reward for successfully picking products.
  - **Objective Function**: Maximize the total expected reward, that is, pick as many orders as possible while reducing the number of customer encounters.

### Solution

- **Hybrid Q-learning Algorithm**:
  - Combine mathematical programming and reinforcement learning to solve the sequential decision-making problem by recursively updating the policy.
  - The algorithm can dynamically adjust the path during each picking process to respond to changes in the customer flow in the store.

### Experiment and Application

- **Synthetic Instance Experiment**:
  - Verify the convergence and effectiveness of the algorithm on synthetic data.
- **Practical Application**:
  - The proposed policies were evaluated in the context of a large European retailer. The results show that, compared with policies that focus solely on picking orders, the learned policies reduce the number of customer encounters by more than 50%.

In conclusion, by introducing the diPRP and proposing the hybrid Q-learning algorithm, this paper provides a method for omnichannel retailers to scale the picking of online orders without sacrificing customer experience.
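The state, action, and reward described above can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the paper's method: it uses plain tabular Q-learning and omits the mathematical programming component of the hybrid approach. The one-aisle layout, the penalty and reward values, the customer-flow sampler, and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy store: six positions along one aisle (not from the paper).
N_POSITIONS = 6
PICK_POSITIONS = (2, 5)    # positions holding items to pick (assumed)
STEP_PENALTY = -1.0        # fixed negative reward per step (assumed value)
CUSTOMER_PENALTY = -2.0    # per customer at current/adjacent positions (assumed)
PICK_REWARD = 10.0         # positive reward for picking a product (assumed)


def neighbours(pos):
    """Positions adjacent to `pos` on the aisle."""
    return [p for p in (pos - 1, pos + 1) if 0 <= p < N_POSITIONS]


def reward(pos, picked_now, customers):
    """Step penalty + penalty for nearby customers + bonus for a pick."""
    nearby = customers[pos] + sum(customers[p] for p in neighbours(pos))
    bonus = PICK_REWARD if picked_now else 0.0
    return STEP_PENALTY + CUSTOMER_PENALTY * nearby + bonus


def sample_customers(rng):
    """Assumed customer flow: position 3 is busier than the others."""
    return [rng.choice((0, 1, 2)) if p == 3 else rng.choice((0, 0, 1))
            for p in range(N_POSITIONS)]


def q_learning(episodes=500, horizon=12, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning over states (position, visited flags, time)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        pos, visited, t = 0, (False,) * len(PICK_POSITIONS), 0
        while t < horizon and not all(visited):
            state = (pos, visited, t)
            actions = neighbours(pos)
            # Epsilon-greedy choice of the next position to visit.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            customers = sample_customers(rng)
            picked_now = (action in PICK_POSITIONS
                          and not visited[PICK_POSITIONS.index(action)])
            r = reward(action, picked_now, customers)
            new_visited = tuple(v or PICK_POSITIONS[i] == action
                                for i, v in enumerate(visited))
            next_state = (action, new_visited, t + 1)
            done = t + 1 >= horizon or all(new_visited)
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in neighbours(action))
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, action)] += alpha * (r + gamma * best_next
                                           - q[(state, action)])
            pos, visited, t = action, new_visited, t + 1
    return q
```

Calling `q_learning()` returns a table of value estimates keyed by `(state, action)`. A greedy policy picks, in each state, the neighbouring position with the highest estimate. The time index is part of the state so that the learned values can depend on when the picker arrives at a position, mirroring the arrival-time component of the state described above.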

Playing hide and seek: tackling in-store picking operations while improving customer experience
