Predicting daily diffuse horizontal solar radiation in various climatic regions of China using support vector machine and tree-based soft computing models with local and extrinsic climatic data

Junliang Fan, Xiukang Wang, Fucang Zhang, Xin Ma, Lifeng Wu
DOI: https://doi.org/10.1016/j.jclepro.2019.119264
IF: 11.1
2020-01-01
Journal of Cleaner Production
Abstract: Knowledge of diffuse solar radiation on horizontal surfaces (R-d) is a prerequisite for the design and optimization of active and passive solar energy systems, such as solar illumination systems within buildings. However, R-d measurements are unavailable at many locations worldwide, so R-d is commonly predicted from readily available climatic variables. Reliable prediction of R-d is difficult when complete or historical climatic data are lacking at the target station. This study evaluated the performance of support vector machine (SVM) and four tree-based soft computing models, i.e. M5 model tree (M5Tree), random forest (RF), extreme gradient boosting (XGBoost) and gradient boosting with categorical features support (CatBoost), for predicting daily horizontal R-d using limited local (Scenario 1) and extrinsic (Scenarios 2 and 3) climatic data. Six input combinations of daily global solar radiation (Rs), sunshine duration (n), maximum/minimum temperature (T-max/T-min) and relative humidity (RH) during 1996-2015 at 15 weather stations across various climatic regions of China were considered. The results demonstrated that, when Rs was unavailable, the average root mean square error (RMSE) increased considerably across China (42.4%) in Scenario 1, especially in the (sub)tropical monsoon region (68.3%). SVM offered the best combination of prediction accuracy and generalization capability in all scenarios, followed by CatBoost, which produced the daily R-d estimates closest to those of SVM along with satisfactory generalization capability. In Scenario 2, CatBoost and SVM models developed with climatic data from Beijing gave the overall best daily R-d estimates over the 15 stations, while models developed with data from 14 weather stations in Scenario 3 produced even better and steadier R-d estimates across China than those in Scenario 2.
The average computational time of SVM (6.6 s) for a single sample was approximately 1.9 times that of CatBoost (3.5 s) in Scenarios 1 and 2, while in Scenario 3 the SVM time (842.6 s) was approximately 33.9 times that of CatBoost (24.9 s). Considering prediction accuracy, generalization capability and computational efficiency together, CatBoost is highly recommended for developing general models for daily R-d prediction in various climatic regions of China, particularly when historical local climatic data are lacking. (c) 2019 Elsevier Ltd. All rights reserved.