Investigating boosting techniques' efficacy in feature selection: A comparative analysis
Ubaid Ahmed, Anzar Mahmood, Majid Ali Tunio, Ghulam Hafeez, Ahsan Raza Khan, Sohail Razzaq
DOI: https://doi.org/10.1016/j.egyr.2024.03.020
IF: 5.2
2024-03-20
Energy Reports
Abstract: Accurate Solar Irradiance (SI) forecasting is an important aspect of solar energy harvesting, and it depends on various meteorological features. Numerous feature selection algorithms have been implemented to select suitable meteorological parameters. However, boosting algorithms have not been widely explored for feature selection. This study therefore introduces a novel perspective by examining the efficacy of boosting algorithms in feature selection applications. We perform a comparative analysis of tree-based ensemble algorithms for feature selection: the boosting methods Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost) and Light Gradient Boosting Machine (LGBM), together with Random Forest (RF), a bagging ensemble. The novelty of this approach lies in using these ensemble techniques to select the most appropriate features, thereby improving the predictive performance of the model. SI data for three geographical locations (Islamabad, Pakistan; Basel, Switzerland; and Golden, Colorado, USA) were obtained from the National Solar Radiation Database (NSRDB) and used in this study. First, suitable features are selected by each of the four algorithms separately. The selected features are then fed to a Bidirectional Long Short-Term Memory (BiLSTM) network to forecast hour-ahead Global Horizontal Irradiance (GHI). The Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Scaled Error (MASE) and Normalized Root Mean Square Error (NRMSE) are used as performance indicators. The findings demonstrate that the BiLSTM network trained on the features selected by XGBoost produces better forecasting results. For the Islamabad dataset, the RMSE and MAE of the BiLSTM trained with the selected features improved by 29.92% and 14.03%, respectively, compared with the conventional model.
For the Basel dataset, the RMSE and MAE of the BiLSTM network improved by 14.43% and 28.72%, respectively. Moreover, for the Golden dataset, the RMSE and MAE of the proposed approach improved by 10.5% and 17.38%, respectively, compared with the conventional model.
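The performance indicators listed in the abstract have standard textbook forms, but the abstract does not say which variants were used. The following is a minimal Python sketch under stated assumptions, not the authors' code: NRMSE is normalized by the range of the observations (normalizing by the mean is also common), MASE is scaled by the in-sample MAE of a one-step (previous-hour) persistence forecast, and "improved by X%" is read as the relative error reduction against the conventional model. All function names and the toy numbers are hypothetical illustrations, not data from the study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    # zip() would silently truncate mismatched series, so fail loudly instead.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Square Error: average of squared residuals."""
    n = _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the MSE, in the units of GHI (W/m^2)."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of absolute residuals."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mase(y_true: Sequence[float], y_pred: Sequence[float],
         y_train: Sequence[float]) -> float:
    """MAE scaled by the in-sample MAE of a one-step naive (persistence) forecast.

    ASSUMPTION: lag 1 (previous hour); the study may use another scaling.
    """
    if len(y_train) < 2:
        raise ValueError("y_train needs at least two points")
    naive = sum(abs(y_train[i] - y_train[i - 1]) for i in range(1, len(y_train)))
    naive /= len(y_train) - 1
    return mae(y_true, y_pred) / naive


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the observed range (ASSUMPTION: range-based normalization)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))


def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative error reduction in percent (ASSUMPTION: meaning of 'improved by X%')."""
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy GHI values in W/m^2, not data from the study.
    y_true = [0.0, 100.0, 300.0, 500.0]
    y_pred = [10.0, 90.0, 320.0, 480.0]
    y_train = [0.0, 50.0, 100.0, 150.0]
    print(mse(y_true, y_pred))            # 250.0
    print(mae(y_true, y_pred))            # 15.0
    print(mase(y_true, y_pred, y_train))  # 0.3
    print(improvement_pct(200.0, 150.0))  # 25.0
```

Under the relative-reduction reading, an RMSE improvement of 29.92% means the RMSE of the model trained on the selected features is about 70.08% of the conventional model's RMSE.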
energy & fuels