Deep Inverse Reinforcement Learning for Objective Function Identification in Bidding Models

Hongye Guo, Qixin Chen, Qing Xia, Chongqing Kang
DOI: https://doi.org/10.1109/tpwrs.2021.3076296
IF: 7.326
2021-01-01
IEEE Transactions on Power Systems
Abstract: Due to the deregulation of power systems worldwide, bidding behavior simulation research has gained prominence. One crucial element in these studies is accurately defining and modelling the individual reward function (or objective function). Given the ubiquitous information barriers between market participants and researchers, the common approach is to develop reward functions based on theoretical assumptions, which inevitably causes deviations from the real world. However, since market data have gradually become transparent in recent years, especially data regarding historical bidding behaviors, it is feasible to introduce data-driven methods to identify the individual reward functions that are hidden in raw bidding data. Thus, this paper proposes a data-driven bidding objective function identification framework with three procedures. First, the bidding decision processes of participants are formulated as a standard Markov decision process. Second, a deep inverse reinforcement learning method based on maximum entropy is introduced to identify individual reward functions, whose high-dimensional nonlinearity can be captured by multilayer perceptrons (MLPs). Third, a deep Q-network method is customized to simulate the individual bidding behaviors based on the obtained MLP-based objective functions. The effectiveness and feasibility of the proposed framework and methods are tested on real market data from the Australian electricity market.
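To make the second procedure more concrete, the following is a minimal sketch of maximum-entropy inverse reinforcement learning. It is not the authors' implementation. It assumes a hypothetical five-state chain MDP in place of a bidding model, and it swaps in a tabular reward (one value per state) for the MLP described in the abstract. All function names, the demonstrations, and the hyperparameters are illustrative assumptions. The update rule is the standard maximum-entropy gradient: the expert's state visitation counts minus the visitation counts expected under the current reward.

```python
import numpy as np

def chain_transitions(n_states):
    """Hypothetical toy MDP: action 0 moves left, action 1 moves right."""
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, n_states - 1)] = 1.0
    return P

def soft_policy(reward, P, horizon):
    """Finite-horizon soft value iteration; returns policy[t, s, a]."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states, n_actions))
    for t in reversed(range(horizon)):
        Q = reward[:, None] + P @ V                  # shape (n_states, n_actions)
        q_max = Q.max(axis=1, keepdims=True)
        # Numerically stable log-sum-exp over actions.
        V = (q_max + np.log(np.exp(Q - q_max).sum(axis=1, keepdims=True)))[:, 0]
        policy[t] = np.exp(Q - V[:, None])
    return policy

def expected_visitation(policy, P, p0):
    """Expected number of visits to each state over one trajectory."""
    d = p0.copy()
    total = d.copy()
    for t in range(policy.shape[0] - 1):
        d = np.einsum("s,sa,sax->x", d, policy[t], P)  # push distribution one step
        total += d
    return total

def maxent_irl(demos, P, n_iters=300, lr=0.1):
    """Gradient ascent on the MaxEnt log-likelihood with a tabular reward."""
    demos = np.asarray(demos)
    n_states = P.shape[0]
    horizon = demos.shape[1]
    expert = np.bincount(demos.ravel(), minlength=n_states) / len(demos)
    p0 = np.bincount(demos[:, 0], minlength=n_states) / len(demos)
    reward = np.zeros(n_states)
    for _ in range(n_iters):
        policy = soft_policy(reward, P, horizon)
        # Gradient: expert visitation minus visitation under the current reward.
        reward += lr * (expert - expected_visitation(policy, P, p0))
    return reward

# Made-up expert demonstrations: the expert always heads to the last state.
P = chain_transitions(5)
demos = [[0, 1, 2, 3, 4, 4, 4, 4], [1, 2, 3, 4, 4, 4, 4, 4]]
reward = maxent_irl(demos, P)
print(np.round(reward, 2))
```

In this sketch the recovered reward should be largest at the state the expert keeps returning to. A deep variant would replace the `reward` vector with a neural network and backpropagate the same visitation difference through its parameters.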