Abstract: Advanced building control methods such as model predictive control (MPC) offer significant potential benefits to both consumers and grid operators, but their high computational requirements have been a barrier to wider adoption. Computing the control locally requires installing expensive hardware, while cloud computing introduces data security and privacy concerns. In this paper, we drastically reduce the local computational requirements of advanced building control through a reinforcement learning (RL)-based approach called Behavioral Cloning, which represents the MPC policy as a neural network that can be implemented locally and evaluated quickly on a low-cost programmable logic controller. Previous RL and approximate MPC methods must be trained specifically for each building. Our key improvement is that our controller generalizes to many buildings, electricity rates, and thermostat setpoint schedules without additional effort-intensive retraining. To provide this versatility, we have adapted the traditional Behavioral Cloning approach through (1) a constraint-informed parameter grouping (CIPG) method that provides a more efficient representation of the training data; (2) an MPC-guided training data generation method using the DAgger algorithm that improves stability and constraint satisfaction; and (3) a new deep learning model structure called the reverse-time recurrent neural network (RT-RNN), which allows future information to flow backward in time so that the temporal information in disturbance predictions is interpreted more effectively. The result is an easy-to-deploy, generalized behavioral clone of MPC that can be implemented on a programmable logic controller and requires little building-specific controller tuning, reducing the effort and cost of implementing smart residential heat pump control.
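To illustrate the reverse-time idea, here is a minimal Python sketch, assuming a single-layer Elman-style cell with a tanh activation. The abstract does not specify the actual RT-RNN architecture, layer sizes, or trained weights, so the function name `rt_rnn_states`, the per-step feature layout (outdoor temperature and electricity price), and the weights in the example are all hypothetical. The recurrence runs from the last forecast step back to the first, so the hidden state at step t depends on the forecast from t to the end of the horizon rather than on earlier steps.

```python
import math
from typing import List, Sequence


def rt_rnn_states(
    forecast: Sequence[Sequence[float]],
    w_in: Sequence[Sequence[float]],
    w_rec: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[List[float]]:
    """Hypothetical reverse-time Elman recurrence (not the paper's exact model).

    forecast[t] is the disturbance feature vector predicted for future step t.
    The loop runs t = T-1, ..., 0 so that states[t] summarizes forecast[t:].
    """
    n_hidden = len(bias)
    horizon = len(forecast)
    states: List[List[float]] = [[] for _ in range(horizon)]
    h_next = [0.0] * n_hidden  # the state "after" the horizon starts at zero
    for t in reversed(range(horizon)):
        x = forecast[t]
        h = []
        for i in range(n_hidden):
            pre = bias[i]
            pre += sum(w_in[i][j] * x[j] for j in range(len(x)))  # input term
            pre += sum(w_rec[i][k] * h_next[k] for k in range(n_hidden))  # future state
            h.append(math.tanh(pre))
        states[t] = h
        h_next = h  # information flows backward toward earlier steps
    return states


# Hypothetical 3-step forecast: (outdoor temperature [deg C], price [$/kWh]).
forecast = [[-5.0, 0.10], [-3.0, 0.30], [0.0, 0.12]]
w_in = [[0.05, 1.0], [0.1, -0.5]]   # hypothetical, untrained weights
w_rec = [[0.5, 0.0], [0.0, 0.5]]
bias = [0.0, 0.0]
states = rt_rnn_states(forecast, w_in, w_rec, bias)
summary_now = states[0]  # has absorbed the whole forecast horizon
```

Because `states[0]` has absorbed the entire forecast, a policy head could read it when choosing the current action, whereas a standard forward-time RNN would give the first step a state that has seen only the first forecast value. The sketch corresponds to the backward half of a bidirectional RNN.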

Sample-Efficient Policy Learning Based on Completely Behavior Cloning

Learning visual servo policies via planner cloning

Model Predictive Control via On-Policy Imitation Learning

Generalized Reinforcement Learning for Building Control using Behavioral Cloning

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning

Planning for Sample Efficient Imitation Learning

Towards Improving Learning from Demonstration Algorithms via MCMC Methods

PI-ELM: Reinforcement learning-based adaptable policy improvement for dynamical system

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

PLATO: Policy Learning using Adaptive Trajectory Optimization

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

An Efficient Model-Based Approach on Learning Agile Motor Skills without Reinforcement

Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

Behavioral Cloning via Search in Embedded Demonstration Dataset

Improving Sample Efficiency of Multiagent Reinforcement Learning with Nonexpert Policy for Flocking Control

MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models

Efficient Multi-Policy Evaluation for Reinforcement Learning

Blending Imitation and Reinforcement Learning for Robust Policy Improvement

Zero-Shot Retargeting of Learned Quadruped Locomotion Policies Using Hybrid Kinodynamic Model Predictive Control

Constrained Behavior Cloning for Robotic Learning