Investigating the Performance of Machine Learning Models Combined with Different Feature Selection Methods to Estimate the Energy Consumption of Buildings

Xue Liu, Hao Tang, Yong Ding, Da Yan
DOI: https://doi.org/10.1016/j.enbuild.2022.112408
IF: 7.201
2022-01-01
Energy and Buildings
Abstract: Machine learning is considered a promising method for developing building energy-benchmarking models. However, the high dimensionality of building energy datasets can reduce model accuracy and generalization but increase the computational cost. Meanwhile, the poor interpretability of machine learning models limits the understanding of insights and, in turn, hinders policymaking. Therefore, the first objective of this study was to investigate the benefits of feature selection on the performance of machine learning-based energy usage models. Three typical feature selection methods (filter, wrapper, and embedded) were selected, and the effect of each method was evaluated based on three tree-ensemble learning algorithms. Another objective was to analyze the interpretability of the machine learning model using the Shapley additive explanation method. The results were obtained using a city-scale energy consumption dataset consisting of 478 healthcare buildings in Chongqing, China. It was found that the wrapper method generally improved the accuracy of the machine learning models compared to that of the other two methods. In addition, the model developed using extreme gradient boosting combined with the wrapper method achieved the best accuracy. Moreover, the model interpretability analysis demonstrated important features and revealed how these features influence energy use for individual buildings. © 2022 Elsevier B.V. All rights reserved.
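The difference between the filter and wrapper approaches named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline. It assumes a synthetic dataset (hypothetical `make_data`), a correlation-based filter, and a greedy forward-selection wrapper scored by a plain k-nearest-neighbour regressor standing in for the tree-ensemble learners used in the paper. The embedded method and the Shapley analysis are not shown.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def filter_select(X, y, k):
    """Filter: rank features by |correlation| with the target, keep the top k.
    No predictive model is involved in the ranking."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_mse(X_tr, y_tr, X_va, y_va, feats, k=5):
    """Validation MSE of a k-NN regressor restricted to the features in `feats`.
    (Assumption: k-NN is a stand-in model chosen only to keep the sketch short.)"""
    total = 0.0
    for row, target in zip(X_va, y_va):
        dists = sorted(
            (sum((row[j] - tr[j]) ** 2 for j in feats), t)
            for tr, t in zip(X_tr, y_tr)
        )
        pred = sum(t for _, t in dists[:k]) / k
        total += (pred - target) ** 2
    return total / len(y_va)


def wrapper_select(X_tr, y_tr, X_va, y_va, k):
    """Wrapper: greedy forward selection; each candidate subset is scored
    by actually fitting the model and measuring its validation error."""
    selected = []
    remaining = list(range(len(X_tr[0])))
    while len(selected) < k:
        best = min(
            remaining,
            key=lambda j: knn_mse(X_tr, y_tr, X_va, y_va, selected + [j]),
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


def make_data(n=200, n_features=5, seed=0):
    """Synthetic data (hypothetical): only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [3 * r[0] + 2 * r[1] + rng.gauss(0, 0.1) for r in X]
    return X, y


X, y = make_data()
X_tr, y_tr, X_va, y_va = X[:140], y[:140], X[140:], y[140:]
print("filter :", filter_select(X_tr, y_tr, k=2))
print("wrapper:", wrapper_select(X_tr, y_tr, X_va, y_va, k=2))
```

The filter scores each feature once without fitting a model, while the wrapper refits the model for every candidate subset. That makes the wrapper more expensive, but it lets the selection reflect how the chosen learner actually uses the features, which is consistent with the accuracy advantage the abstract reports for the wrapper method.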