Two-Stage Reinforcement Learning Policy Search for Grid-Interactive Building Control

Xiangyu Zhang, Yue Chen, Andrey Bernstein, Rohit Chintala, Peter Graf, Xin Jin, David Biagioni
DOI: https://doi.org/10.1109/tsg.2022.3141625
IF: 10.275
2022-05-01
IEEE Transactions on Smart Grid
Abstract: This paper develops an intelligent grid-interactive building controller, which optimizes building operation during both normal hours and demand response (DR) events. To avoid costly on-demand computation and to adapt to non-linear building models, the controller utilizes reinforcement learning (RL) and makes real-time decisions based on a near-optimal control policy. Learning such a policy typically amounts to solving a hard non-convex optimization problem. We propose to address this problem with a novel global-local policy search method. In the first stage, an RL algorithm based on zero-order gradient estimation is used to search for the optimal policy globally, because of its scalability and its potential to escape some poorly performing local optima. The obtained policy is then fine-tuned locally to bring the first-stage solution closer to that of the original unsmoothed problem. Experiments on a simulated five-zone commercial building demonstrate the advantages of the proposed method over existing learning approaches. They also show that the learned control policy outperforms a pragmatic linear model predictive controller (MPC) and approaches the performance of an oracle MPC in testing scenarios. Using a state-of-the-art advanced computing system, we demonstrate that the controller can be learned and deployed within hours of training.
engineering, electrical & electronic