Battery Management for Warehouse Robots Via Average-Reward Reinforcement Learning

Yongjin Mu, Yanjie Li, Ke Lin, Ki Deng, Qi Liu
DOI: https://doi.org/10.1109/robio55434.2022.10011784
2022-01-01
Abstract: In automated warehouses, the battery management strategy of Automated Guided Vehicles (AGVs) affects the throughput and operational efficiency of the warehouse. In this paper, we first model the battery management problem as a Markov Decision Process (MDP) and adopt a deep reinforcement learning (DRL) algorithm as the battery management strategy. However, discounted-reward DRL algorithms neglect long-term benefits, which makes them unsuitable for this strategy: orders arriving at the warehouse at every moment are important and must be handled. To address this, we introduce an average-reward DRL algorithm that focuses more on long-term benefits. Existing average-reward DRL algorithms, however, suffer from low sample utilization and unstable training. We therefore present a practical algorithm called average reward TD3 (ARTD3) that learns faster and is more stable. Finally, we conduct extensive experiments to confirm that ARTD3 outperforms discounted-reward DRL algorithms and rule-based methods.
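To make the contrast between the two objectives concrete, the following is a minimal sketch of the textbook discounted TD target next to the generic differential (average-reward) TD target. It is not the ARTD3 update, which the abstract does not specify; the function names, the step size, and the running estimate `avg_reward` are all illustrative assumptions.

```python
# Illustrative sketch only: generic TD targets, NOT the paper's ARTD3 update.
# All names and default values here are hypothetical.

def discounted_td_target(reward, next_q, gamma=0.99):
    """Discounted target: future value is shrunk by gamma at every step."""
    return reward + gamma * next_q


def average_reward_td_target(reward, next_q, avg_reward):
    """Differential target: no discount; the reward is measured
    relative to a running estimate of the average reward per step."""
    return reward - avg_reward + next_q


def update_avg_reward(avg_reward, td_error, step_size=0.01):
    """One common way to track the average reward: nudge the
    estimate in the direction of the TD error."""
    return avg_reward + step_size * td_error


if __name__ == "__main__":
    q, next_q, reward, avg_reward = 9.0, 10.0, 1.0, 0.5
    target = average_reward_td_target(reward, next_q, avg_reward)
    td_error = target - q
    avg_reward = update_avg_reward(avg_reward, td_error, step_size=0.1)
    print(target, td_error, avg_reward)
```

In the differential form, a reward received far in the future carries the same weight as one received now, which is the property the abstract appeals to when it argues that every arriving order matters.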