Similarity-based grouping method for evaluation and optimization of dataset structure in machine-learning based short-term building cooling load prediction without measurable occupancy information

Xu Zhang, Yongjun Sun, Dian-ce Gao, Wenke Zou, Jianping Fu, Xiaowen Ma
DOI: https://doi.org/10.1016/j.apenergy.2022.120144
IF: 11.2
2022-12-01
Applied Energy
Abstract: Short-term building cooling load prediction plays an important role in building energy management. The similar day approach is receiving special attention as an emerging instance selection method for improving the accuracy of machine learning algorithms in cooling load prediction. However, the performance of typical machine learning algorithms integrated with different similar day selection methods has not been comprehensively assessed, particularly when occupancy information is absent from the dataset, as is the case in most existing buildings. This study presents a similarity-based grouping method to evaluate and optimize the dataset structure for machine-learning based short-term building cooling load prediction without measurable occupancy information. Similar day methods using a time-similarity index and a weather-similarity index, respectively, are integrated with four typical machine learning methods. The dataset is re-organized and clustered into multiple groups with different degrees of similarity for model training. Case studies are conducted to assess the performance of the similarity-based grouping method with different similarity indexes, and the impact of the lack of measurable occupancy information on prediction accuracy. The test results show that applying the time-similarity index does not reliably yield higher prediction accuracy. The weather-similarity index is validated as a more reliable basis for similar day selection and for obtaining enhanced prediction accuracy. Moreover, a dataset without occupancy information inevitably results in significant prediction errors of machine learning algorithms, particularly when the days in the dataset have occupancy profiles considerably different from that of the target day.
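The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the weather-similarity index, so the Euclidean distance between daily weather feature vectors used here is an assumption, as are the function names, the feature choice (mean outdoor temperature in °C and relative humidity as a fraction), and the equal-sized split of the ranking. It is not the authors' implementation.

```python
import math

def weather_distance(day_a, day_b):
    """Euclidean distance between two daily weather vectors.
    Assumed stand-in for the weather-similarity index; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(day_a, day_b)))

def group_by_similarity(history, target, n_groups):
    """Rank historical days by distance to the target day, then split the
    ranking into consecutive groups (first group = most similar days)."""
    if not history:
        return []
    ranked = sorted(history, key=lambda day: weather_distance(history[day], target))
    size = math.ceil(len(ranked) / n_groups)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# Hypothetical data: day id -> (mean outdoor temperature, relative humidity)
history = {
    "d1": (30.0, 0.70),
    "d2": (25.0, 0.60),
    "d3": (31.0, 0.72),
    "d4": (20.0, 0.50),
}
target = (30.2, 0.70)

groups = group_by_similarity(history, target, n_groups=2)
print(groups)
```

Each resulting group could then serve as a separate training set, so that prediction accuracy can be compared across degrees of similarity. In practice the features would likely need scaling before the distance is computed, since temperature and humidity are on different scales.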
energy & fuels; engineering, chemical
What problem does this paper attempt to address?