Guided Reinforce Learning Through Spatial Residual Value for Online 3D Bin Packing

Zefei Wang, Yi Chen, Chenlu Liu, Weiyang Lin, Liu Yang
DOI: https://doi.org/10.1109/IECON51785.2023.10312036
2023-01-01
Abstract: We have implemented a practical, high-performance algorithm for online 3D box packing in which placed boxes can be neither removed nor adjusted. The problem it solves is a variant of the online 3D bin packing problem (3D-BPP). Unlike the traditional 3D box packing problem, only a limited number of the boxes to be loaded are known at any time, so from the algorithm's point of view the box sizes are random. The problem also requires that boxes cannot be placed in a buffer and that the state of already loaded boxes cannot be changed at any point in the process. Due to realistic factors, the packing strategy must also satisfy geometric, stability and orientation constraints. We propose a reward function based on spatial residual value that assists the best deep reinforcement learning algorithm currently known to us in solving this problem. The residual value of a space is the value of the space that can still be used in the future. The algorithm adjusts the network parameters in the Actor-Critic framework according to the impact of the agent's policy on the spatial residual value. Compared with recent online 3D box packing strategies, our algorithm outperforms the best algorithm we know of when that algorithm uses a standard reward function (and, naturally, all current heuristic methods), outperforms other learning-based methods by about 9% in space utilization, and needs fewer learning iterations to converge (about 1000 of the 8000 episodes in total).
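The abstract does not give the formula for the spatial residual value, so the following is only a minimal sketch of the general idea, under several assumptions. It assumes a discretized bin represented as a height map, uses the free volume above the height map as a stand-in for residual value, and penalizes volume that a placement seals off underneath the box. The functions `residual_value`, `place_box` and `shaped_reward` are hypothetical names, not taken from the paper, and the stability and orientation constraints are omitted.

```python
def residual_value(heightmap, bin_height):
    # Assumed proxy: free volume still reachable from the top of the bin.
    return sum(bin_height - h for row in heightmap for h in row)


def place_box(heightmap, bin_height, x, y, l, w, h):
    # Drop a box with footprint l x w and height h at grid cell (x, y).
    # Returns the new height map, or None if the placement is infeasible.
    rows, cols = len(heightmap), len(heightmap[0])
    if x < 0 or y < 0 or x + l > rows or y + w > cols:
        return None
    # The box rests on the highest point under its footprint.
    base = max(heightmap[i][j]
               for i in range(x, x + l) for j in range(y, y + w))
    if base + h > bin_height:
        return None
    new_map = [row[:] for row in heightmap]
    for i in range(x, x + l):
        for j in range(y, y + w):
            new_map[i][j] = base + h
    return new_map


def shaped_reward(heightmap, bin_height, x, y, l, w, h):
    # Hypothetical shaped reward: packed volume minus the volume sealed
    # off under the box, normalized by the bin volume.
    new_map = place_box(heightmap, bin_height, x, y, l, w, h)
    if new_map is None:
        return None, heightmap
    box_volume = l * w * h
    lost = (residual_value(heightmap, bin_height)
            - residual_value(new_map, bin_height))
    trapped = lost - box_volume  # gaps that can no longer be filled
    bin_volume = len(heightmap) * len(heightmap[0]) * bin_height
    return (box_volume - trapped) / bin_volume, new_map


if __name__ == "__main__":
    empty = [[0] * 4 for _ in range(4)]
    r1, state = shaped_reward(empty, 4, 0, 0, 2, 2, 1)
    r2, state = shaped_reward(state, 4, 1, 1, 2, 2, 1)
    print(r1, r2)
```

In this sketch a box placed flush on the floor earns its full normalized volume, while the second box, which overhangs the first and seals off three unit cells, earns less. A critic trained on such a signal would be steered toward placements that preserve usable space, which is the intuition the abstract describes.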