Building HVAC Scheduling Using Reinforcement Learning via Neural Network Based Model Approximation

The buildings sector is one of the major consumers of energy in the United States. Building HVAC (heating, ventilation, and air conditioning) systems, which maintain thermal comfort and indoor air quality (IAQ), account for almost half of the energy consumed by buildings. Intelligent scheduling of the building HVAC system therefore has the potential for substantial energy and cost savings while ensuring that the control objectives (thermal comfort, air quality) are satisfied. Traditionally, rule-based and model-based approaches such as the linear-quadratic regulator (LQR) have been used for HVAC scheduling. However, the complexity of HVAC systems and the dynamics of the building environment limit the accuracy, efficiency, and robustness of such methods. Recently, several works have focused on model-free deep reinforcement learning techniques such as Deep Q-Network (DQN). Such methods require extensive interaction with the environment, and this low sample efficiency makes them impractical to deploy in real systems. Safety-aware exploration is another challenge in real systems, since certain actions in particular states may result in catastrophic outcomes. To address these challenges, we propose a model-based reinforcement learning approach that learns the system dynamics using a neural network. We then apply Model Predictive Control (MPC) on the learned dynamics, using a random-sampling shooting method to select actions. To ensure safe exploration, we restrict actions to a safe range and bound the maximum absolute change between consecutive actions according to prior knowledge. We evaluate our approach in simulation with the widely adopted EnergyPlus tool on a case study of a two-zone data center. Experiments show that the average deviation between trajectories sampled from the learned dynamics and the ground truth is below 20%. Compared with baseline approaches, we reduce total energy consumption by 17.1% to 21.8%. Compared with a model-free reinforcement learning approach, we reduce the number of training steps required to converge by 10x.
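As an illustration of the control step described above, the following is a minimal sketch of random-sampling shooting MPC with a bounded action range and a bounded per-step action change. It is not the authors' implementation: the function `plan_action`, its parameters, and the toy dynamics and cost are all hypothetical, and `toy_dynamics` is a hand-written linear stand-in for the neural network dynamics model.

```python
import numpy as np

def plan_action(dynamics, cost, state, prev_action, horizon=10, n_samples=500,
                a_low=0.0, a_high=1.0, max_delta=0.2, rng=None):
    """Random-sampling shooting: sample action sequences, roll them out
    through the model, and return the first action of the cheapest one.
    Assumes prev_action already lies inside [a_low, a_high]."""
    rng = np.random.default_rng() if rng is None else rng
    prev_action = np.asarray(prev_action, dtype=float)
    act_dim = prev_action.shape[0]
    # Sample bounded per-step changes so consecutive actions differ by at most max_delta.
    deltas = rng.uniform(-max_delta, max_delta, size=(n_samples, horizon, act_dim))
    states = np.repeat(np.asarray(state, dtype=float)[None, :], n_samples, axis=0)
    actions = np.repeat(prev_action[None, :], n_samples, axis=0)
    total_cost = np.zeros(n_samples)
    first_actions = None
    for t in range(horizon):
        # Keep every candidate action inside the safe range.
        actions = np.clip(actions + deltas[:, t], a_low, a_high)
        if t == 0:
            first_actions = actions.copy()
        states = dynamics(states, actions)   # batched one-step model prediction
        total_cost += cost(states, actions)  # accumulate cost per candidate
    return first_actions[np.argmin(total_cost)]

# Hypothetical stand-ins: one zone temperature drifting toward 30 degrees,
# with one cooling action in [0, 1] that lowers it.
def toy_dynamics(s, a):
    return 0.9 * s + 3.0 - 2.0 * a

def toy_cost(s, a):
    # Penalize deviation from a 24-degree setpoint plus a small energy term.
    return np.sum((s - 24.0) ** 2, axis=1) + 0.1 * np.sum(a ** 2, axis=1)

action = plan_action(toy_dynamics, toy_cost, state=[28.0], prev_action=[0.5],
                     rng=np.random.default_rng(0))
```

In a receding-horizon loop, only the returned first action would be applied before replanning from the next observed state. Sampling changes relative to the previous action, then clipping, is one simple way to satisfy both constraints at once; other sampling schemes are possible.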
