Abstract: Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, because DRL models are inherently uncertain, they must be rigorously validated before being applied to real-world tasks. Targeted tests may reveal performance deficiencies in pre-trained DRL models, but the "black-box" nature of DRL makes model behavior difficult to test. We propose a novel performance improvement framework based on probabilistic automata that proactively identifies and corrects critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal modifications to the model. First, a probabilistic automaton is constructed from the historical trajectories of the DRL system by abstracting states into probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and the final result of each trajectory. Then, for each key PDMU, we search for the new action that has the greatest impact on favorable outcomes. Finally, the key PDMU, the undesirable action, and the new action are encapsulated as a monitor that guides the DRL system toward more favorable outcomes through real-time monitoring and correction. Evaluations in two standard reinforcement learning environments and three real-world job scheduling scenarios confirm the effectiveness of the method and provide a degree of assurance for deploying DRL models in real-world applications.
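The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid-based abstraction, the adverse-rate score used to rank pairs, and all names (`abstract`, `build_automaton`, `key_pairs`, `best_action`, `make_monitor`, `BAD`, `GOOD`) are assumptions made for illustration. The abstract specifies only that the automaton is built from state-action sequences and final results, that a reverse BFS finds key PDMU-action pairs, and that a monitor replaces the undesirable action.

```python
from collections import defaultdict, deque

BAD, GOOD = "BAD", "GOOD"  # hypothetical terminal nodes for adverse / favorable results


def abstract(state, width=1.0):
    """Assumed abstraction: grid-discretize a numeric state into a PDMU id."""
    return tuple(int(x // width) for x in state)


def build_automaton(trajectories, width=1.0):
    """trajectories: list of (steps, success), with steps = [(state, action), ...]."""
    edges = defaultdict(lambda: defaultdict(int))  # (pdmu, action) -> next node -> count
    stats = defaultdict(lambda: [0, 0])            # (pdmu, action) -> [visits, adverse visits]
    for steps, success in trajectories:
        nodes = [abstract(s, width) for s, _ in steps]
        for i, (_, action) in enumerate(steps):
            last = i + 1 == len(steps)
            nxt = (GOOD if success else BAD) if last else nodes[i + 1]
            pair = (nodes[i], action)
            edges[pair][nxt] += 1
            stats[pair][0] += 1
            stats[pair][1] += 0 if success else 1
    return edges, stats


def key_pairs(edges, stats, top_k=1):
    """Reverse BFS from BAD, then rank reachable pairs by empirical adverse rate."""
    preds = defaultdict(set)  # node -> (pdmu, action) pairs that lead into it
    for pair, targets in edges.items():
        for nxt in targets:
            preds[nxt].add(pair)
    seen_pairs, seen_nodes, queue = set(), {BAD}, deque([BAD])
    while queue:
        node = queue.popleft()
        for pair in preds[node]:
            seen_pairs.add(pair)
            if pair[0] not in seen_nodes:
                seen_nodes.add(pair[0])
                queue.append(pair[0])
    # Assumed score: adverse rate first, visit count as tie-breaker.
    ranked = sorted(seen_pairs,
                    key=lambda p: (stats[p][1] / stats[p][0], stats[p][0]),
                    reverse=True)
    return ranked[:top_k]


def best_action(stats, pdmu):
    """Action observed under `pdmu` with the highest empirical favorable rate."""
    rates = {a: 1 - bad / n for (p, a), (n, bad) in stats.items() if p == pdmu}
    return max(rates, key=rates.get)


def make_monitor(policy, rules, width=1.0):
    """rules: {(pdmu, undesirable_action): new_action}; other decisions pass through."""
    def monitored(state):
        action = policy(state)
        return rules.get((abstract(state, width), action), action)
    return monitored
```

In this sketch the monitor leaves the underlying policy untouched and overrides its output only when the current abstract state and proposed action match a stored rule, which mirrors the abstract's goal of improving performance with minimal model modifications. Note that `best_action` can only choose among actions already observed under the given PDMU.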

Undiscounted Reinforcement Learning Algorithm Based on Performance Potentials

Study on an Average Reward Reinforcement Learning Algorithm

Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems

Two-Timescale Simulation-based Algorithm for Markov Decision Process Based on Performance Potentials

An immediate-return reinforcement learning for the atypical Markov decision processes

A Potential-Based Method for Finite-Stage Markov Decision Process

Multiple Suboptimal Policies Integrated Reinforcement Learning Algorithm for Path Planning

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential

On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning

Exploring Potential Energy Surfaces Using Reinforcement Machine Learning

A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Path Planning Method With Improved Artificial Potential Field—A Reinforcement Learning Perspective

Shaping Reward Learning Approach from Passive Samples

Potential Based Optimization Algorithm Of Constrained Markov Decision Processes

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Success-Rate Targeted Reinforcement Learning by Disorientation Penalty

Regularly Updated Deterministic Policy Gradient Algorithm

Quantile-Based Policy Optimization for Reinforcement Learning

Model-based Reinforcement Learning with Multi-step Plan Value Estimation