Abstract: We argue that inventory management presents unique opportunities for reliably applying and evaluating deep reinforcement learning (DRL). Toward reliable application, we emphasize and test two techniques. The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment, as is common with generic policy gradient methods. Our second technique involves aligning policy (neural) network architectures with the structure of the inventory network. Specifically, we focus on a network with a single warehouse that consolidates inventory from external suppliers, holds it, and then distributes it to many stores as needed. In this setting, we introduce the symmetry-aware policy network architecture. We motivate this architecture by establishing an asymptotic performance guarantee and empirically demonstrate its ability to reduce the amount of data needed to uncover strong policies. Both techniques exploit structures inherent in inventory management problems, moving beyond generic DRL algorithms. Toward rigorous evaluation, we create and share new benchmark problems, divided into two categories. One type focuses on problems with hidden structures that allow us to compute or bound the cost of the true optimal policy. Across four problems of this type, we find HDPO consistently attains near-optimal performance, handling up to 60-dimensional raw state vectors effectively. The other type of evaluation involves constructing a test problem using real time series data from a large retailer, where the optimum is poorly understood. Here, we find HDPO methods meaningfully outperform a variety of generalized newsvendor heuristics. Our code can be found at http://github.com/MatiasAlvo/Neural_inventory_control.

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

Can Deep Reinforcement Learning Improve Inventory Management? Performance on Dual Sourcing, Lost Sales and Multi-Echelon Problems

Solving Inventory Management Problems Through Deep Reinforcement Learning

Deep Reinforcement Learning for Large-Scale Inventory Management

Performance of deep reinforcement learning algorithms in two-echelon inventory control systems

Deep Controlled Learning for Inventory Control

Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand

Multi-echelon inventory optimization using deep reinforcement learning

Deep Inventory Management

Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization

A Simulation Environment and Reinforcement Learning Method for Waste Reduction

Deep Policy Iteration with Integer Programming for Inventory Management

Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning

Cooperative Multi-Agent Reinforcement Learning for Inventory Management

Optimizing Robotic Mobile Fulfillment Systems for Order Picking Based on Deep Reinforcement Learning

Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations

A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory Management

Adaptive Disassembly Sequence Planning for VR Maintenance Training Via Deep Reinforcement Learning

Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control

Deep reinforcement learning for demand fulfillment in online retail

An application of deep reinforcement learning and vendor-managed inventory in perishable supply chain management