MARLIM: Multi-Agent Reinforcement Learning for Inventory Management

Rémi Leluc, Elie Kadoche, Antoine Bertoncello, Sébastien Gourvénec
2023-08-03
Abstract: Maintaining a balance between the supply and demand of products by optimizing replenishment decisions is one of the most important challenges in the supply chain industry. This paper presents a novel reinforcement learning framework called MARLIM, to address the inventory management problem for a single-echelon multi-products supply chain with stochastic demands and lead-times. Within this context, controllers are developed through single or multiple agents in a cooperative setting. Numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
Machine Learning, Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
The paper addresses inventory control in supply chain management, specifically for a single-echelon, multi-product supply chain with stochastic demand and lead times. It proposes a framework called MARLIM (Multi-Agent Reinforcement Learning for Inventory Management), which uses reinforcement learning with one or several agents to optimize inventory levels, balance supply and demand, and minimize costs.

The main problems described in the paper are:

1. **Environmental Uncertainty**: Actual demand and lead times are stochastic, which makes inventory levels hard to predict accurately and replenishment decisions hard to plan.
2. **Limitations of Traditional Methods**: Traditional heuristic-based inventory management methods cannot effectively handle the randomness of demand and lead times. Dynamic programming is effective in theory, but it is difficult to apply to large-scale real-world problems because it requires precise knowledge of the system's mathematical model, which is rarely available in practice.

To address these issues, the authors make the following contributions:

- They develop a new reinforcement learning framework, MARLIM, designed for the single-echelon, multi-product inventory management problem with stochastic demand and lead times.
- They provide a strategy for training agents in different scenarios, including fixed or shared capacity constraints and specific handling of storage overflow.
- They validate the method against traditional baselines through numerical experiments on real datasets.

The MARLIM framework adopts a multi-agent reinforcement learning (MARL) approach in which each product can be treated as its own agent within a cooperative setting, which allows interdependencies between products to be modeled.
Additionally, the paper details the basic concepts of reinforcement learning, the construction of the inventory management model, and the inventory dynamics. In summary, the goal of the paper is to improve inventory management decisions under uncertainty through the MARLIM framework, with the aim of reducing operating costs and avoiding stockouts.
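
To make the setting concrete, the following is a minimal sketch of single-product inventory dynamics with stochastic demand, stochastic lead times, and a storage capacity, controlled by an order-up-to (base-stock) heuristic of the kind typically used as a traditional baseline. It is an illustration under stated assumptions, not the paper's model: the function name `simulate_base_stock`, the uniform demand and lead-time distributions, the lost-sales rule, and the "discard overflow" rule are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def simulate_base_stock(order_up_to, capacity, horizon=50, seed=0):
    """Simulate one product under an order-up-to (base-stock) heuristic.

    Assumptions (illustrative, not taken from the paper):
    daily demand ~ Uniform{0..10}, lead time ~ Uniform{1..4} days,
    unmet demand is lost, and deliveries exceeding capacity are discarded.
    """
    rng = random.Random(seed)
    on_hand = min(order_up_to, capacity)  # initial stock, never above capacity
    pipeline = defaultdict(int)           # arrival day -> units in transit
    stats = {"sold": 0, "lost_sales": 0, "overflow": 0, "ordered": 0}

    for day in range(horizon):
        # 1. Receive deliveries due today; anything above capacity overflows.
        arriving = pipeline.pop(day, 0)
        accepted = min(arriving, capacity - on_hand)
        stats["overflow"] += arriving - accepted
        on_hand += accepted

        # 2. Serve stochastic demand; unmet demand counts as a stockout.
        demand = rng.randint(0, 10)
        sold = min(demand, on_hand)
        on_hand -= sold
        stats["sold"] += sold
        stats["lost_sales"] += demand - sold

        # 3. Reorder up to the target, counting stock already in transit.
        in_transit = sum(pipeline.values())
        order = max(0, order_up_to - on_hand - in_transit)
        if order > 0:
            lead_time = rng.randint(1, 4)  # stochastic lead time in days
            pipeline[day + lead_time] += order
            stats["ordered"] += order

    stats["on_hand"] = on_hand
    stats["in_transit"] = sum(pipeline.values())
    return stats


if __name__ == "__main__":
    # Compare two hypothetical order-up-to levels for the same demand stream.
    for target in (10, 25):
        print(target, simulate_base_stock(order_up_to=target, capacity=30))
```

In a reinforcement learning formulation of this kind of environment, the fixed reorder rule in step 3 is the part a learned policy would replace, with lost sales and overflow feeding into the cost the agent tries to minimize.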