Development of Parametric Reinforcement Learning for different operation preferences

Ruiyu Qiu, Guanghui Yang, Zuhua Xu, Zhijiang Shao
DOI: https://doi.org/10.1109/CAC57257.2022.10055517
2022-01-01
Abstract: As industrial processes grow more complex, the range of working conditions and operating policy requirements grows with them. Different conditions therefore call for operators with different operation preferences, some conservative and some aggressive. Meanwhile, Deep Reinforcement Learning (DRL) has attracted attention in recent years for its strong performance in finding control policies across many kinds of problems. However, although DRL can solve the problem under a given condition, a single agent can struggle to adapt to all situations, and training a new agent each time a new operation requirement arises is time-consuming. This paper proposes a method called Parametric Reinforcement Learning, which uses a parameter to represent the policy characteristic of a single agent, so that a target operation can be fitted using a set of base agents trained ahead of time. The effectiveness of the method is demonstrated in simulation on the Shell benchmark.
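The abstract does not give the formulation, so the following is only a minimal sketch of one possible reading: a scalar preference parameter blends the actions of two pre-trained base agents, and the parameter is then fitted to a target operator's behaviour by least squares. Everything named here is an assumption made for illustration (the two proportional "policies" standing in for trained DRL agents, the parameter `lam`, and the functions `blended_action` and `fit_preference`); none of it is taken from the paper.

```python
import numpy as np


def conservative_policy(error):
    # Hypothetical base agent: reacts weakly to the tracking error.
    return 0.5 * error


def aggressive_policy(error):
    # Hypothetical base agent: reacts strongly to the tracking error.
    return 2.0 * error


def blended_action(error, lam):
    """Convex combination of the two base agents.

    lam = 0 reproduces the conservative agent, lam = 1 the aggressive one.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * conservative_policy(error) + lam * aggressive_policy(error)


def fit_preference(errors, target_actions):
    """Least-squares estimate of lam from observed (error, action) pairs.

    Solves target - conservative = lam * (aggressive - conservative)
    for a single scalar lam, then clips the result to [0, 1].
    """
    errors = np.asarray(errors, dtype=float)
    target = np.asarray(target_actions, dtype=float)
    base = conservative_policy(errors)
    diff = aggressive_policy(errors) - base
    lam = float(np.dot(diff, target - base) / np.dot(diff, diff))
    return min(1.0, max(0.0, lam))


if __name__ == "__main__":
    errs = [1.0, -2.0, 0.5]
    # Synthetic "target operator" generated with a known preference of 0.3.
    targets = [blended_action(e, 0.3) for e in errs]
    print(fit_preference(errs, targets))
```

In this toy setting no new agent is trained when a new preference appears; only the scalar is re-estimated, which is the time saving the abstract points to. Whether the paper blends actions, rewards, or network weights is not stated in the abstract.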