Machine Learning Models for Regional Photovoltaic Power Generation Forecasting with Limited Plant-Specific Data

Mauro Tucci, Antonio Piazzi, Dimitri Thomopulos
DOI: https://doi.org/10.3390/en17102346
IF: 3.2
2024-05-14
Energies
Abstract: Predicting electricity production from renewable energy sources, such as solar photovoltaic installations, is crucial for effective grid management and energy planning in the transition towards a sustainable future. This study proposes machine learning approaches for predicting electricity production from solar photovoltaic installations at a regional level in Italy, without using data on individual installations. To address the mismatch in data availability between point-specific meteorological inputs and power data aggregated over entire regions, we propose leveraging meteorological data from the centroid of each Italian province within each region. Particular attention is given to the selection of the best input features, which leads to augmenting the input with 1-hour-lagged meteorological data and previous-hour power data. Several ML approaches were compared and examined, with hyperparameters optimized through five-fold cross-validation. The hourly predictions cover a time horizon ranging from 1 to 24 h. Among the tested methods, Kernel Ridge Regression and Random Forest Regression emerge as the most effective models for our specific application. We also performed experiments to assess how frequently the models should be retrained and how frequently the hyperparameters should be optimized in order to strike a compromise between accuracy and computational cost. Our results indicate that once trained, the model can provide accurate predictions for extended periods without frequent retraining, highlighting its long-term reliability.
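The input construction described in the abstract (meteorological data at each province centroid, augmented with 1-hour-lagged meteorological values and the previous hour's regional power) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the data layout, the function name `build_features`, the province codes, and all numbers are hypothetical, and only the one-step-ahead case is shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: met[province][t] holds the weather variables
# (e.g. irradiance, air temperature) at that province's centroid for hour t.
MetData = Dict[str, Sequence[Tuple[float, ...]]]


def build_features(met: MetData, power: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Build one (features, target) pair per hour t = 1 .. T-1.

    Each row concatenates, for every province in sorted order (so column
    positions stay fixed), the weather variables at hour t and at hour t-1,
    followed by the regional power at hour t-1. The target is regional power at t.
    """
    provinces = sorted(met)
    n_hours = len(power)
    for p in provinces:
        if len(met[p]) != n_hours:
            raise ValueError(f"province {p!r} has {len(met[p])} hours, expected {n_hours}")

    X: List[List[float]] = []
    y: List[float] = []
    for t in range(1, n_hours):  # hour 0 has no lagged values, so it is skipped
        row: List[float] = []
        for p in provinces:
            row.extend(met[p][t])      # current-hour weather at the centroid
            row.extend(met[p][t - 1])  # 1-hour-lagged weather at the centroid
        row.append(power[t - 1])       # previous-hour regional power
        X.append(row)
        y.append(power[t])
    return X, y


# Toy example: two hypothetical provinces, (irradiance W/m^2, temperature degC).
met = {
    "PR": [(0.0, 10.0), (120.0, 12.0), (300.0, 15.0)],
    "BO": [(0.0, 11.0), (100.0, 13.0), (280.0, 16.0)],
}
power = [0.0, 45.0, 110.0]  # regional PV power, arbitrary units
X, y = build_features(met, power)
print(X[0])  # [100.0, 13.0, 0.0, 11.0, 120.0, 12.0, 0.0, 10.0, 0.0]
print(y)     # [45.0, 110.0]
```

Sorting the province keys keeps each column tied to the same province and variable across all rows, which any regression model fitted on `X` relies on.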
energy & fuels
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the problem of forecasting regional photovoltaic (PV) power generation when plant-specific data are lacking. Specifically:

1. **Problem Background**:
   - As renewable energy, and solar PV in particular, becomes central to the sustainable energy transition, accurate forecasts of PV power output are crucial for grid management and energy planning.
   - However, the intermittency and variability of solar generation pose challenges for grid operators.
2. **Research Objectives**:
   - Propose machine learning methods that forecast PV power generation at the regional level in Italy without relying on information about the location or size of individual plants.
   - Use meteorological data and historical generation data to build models of regional power output, thereby supporting more efficient grid management and more accurate energy planning.
3. **Main Contributions**:
   - Use meteorological data from the centroid of each province to bridge the mismatch between point-specific meteorological inputs and power data aggregated over entire regions.
   - Select the best input features, augmenting the input with 1-hour-lagged meteorological data and previous-hour generation data to improve predictive accuracy.
   - Compare several machine learning methods, with hyperparameters optimized through 5-fold cross-validation; Kernel Ridge Regression and Random Forest Regression prove the most effective.
   - Analyze how often the models should be retrained and their hyperparameters re-optimized to balance accuracy against computational cost, finding that a trained model remains accurate for extended periods without frequent retraining.

Through these methods, the paper shows that accurate regional PV power forecasts can be achieved even without detailed plant-level data, which is of significant practical value in data-scarce settings.
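The model-selection step (Kernel Ridge Regression with hyperparameters chosen by 5-fold cross-validation) can be sketched as follows. This is a dependency-free illustration, not the paper's implementation: the Gaussian (RBF) kernel, the grid values, the use of contiguous unshuffled folds, and mean squared error as the selection criterion are all assumptions, and the helper names are hypothetical. In practice, scikit-learn's `KernelRidge` and `GridSearchCV` cover the same steps.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy of [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X: Matrix, y: Sequence[float], alpha: float, gamma: float) -> List[float]:
    """Dual coefficients c of kernel ridge regression: (K + alpha * I) c = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, list(y))


def predict_krr(X_train: Matrix, coef: List[float], X_new: Matrix, gamma: float) -> List[float]:
    """Prediction f(x) = sum_i c_i * k(x, x_i) over the training points."""
    return [sum(c * rbf(x, xt, gamma) for c, xt in zip(coef, X_train)) for x in X_new]


def cv_mse(X: Matrix, y: Sequence[float], alpha: float, gamma: float, k: int = 5) -> float:
    """Mean squared error over k contiguous, unshuffled validation folds."""
    n = len(X)
    bounds = [round(i * n / k) for i in range(k + 1)]
    sq_err, count = 0.0, 0
    for lo, hi in zip(bounds, bounds[1:]):
        train = [i for i in range(n) if not lo <= i < hi]
        X_tr = [X[i] for i in train]
        coef = fit_krr(X_tr, [y[i] for i in train], alpha, gamma)
        preds = predict_krr(X_tr, coef, X[lo:hi], gamma)
        sq_err += sum((p - t) ** 2 for p, t in zip(preds, y[lo:hi]))
        count += hi - lo
    return sq_err / count


def grid_search(X, y, alphas, gammas, k: int = 5) -> Tuple[float, float, float]:
    """Return (cv_mse, alpha, gamma) for the grid point with the lowest k-fold CV error."""
    return min((cv_mse(X, y, a, g, k), a, g) for a, g in product(alphas, gammas))


# Toy data: one input feature and a smooth target, standing in for real PV features.
X = [[i / 10] for i in range(30)]
y = [math.sin(row[0]) for row in X]
best_mse, best_alpha, best_gamma = grid_search(X, y, alphas=[1e-3, 1e-1, 10.0], gammas=[0.1, 1.0])
print(f"alpha={best_alpha}, gamma={best_gamma}, 5-fold CV MSE={best_mse:.4g}")
```

Contiguous folds were chosen here because adjacent hours are strongly correlated, so shuffled folds can leak information between training and validation sets; whether the paper shuffled its folds is not stated in the summary. The same cross-validation loop would apply to Random Forest Regression by swapping the fit and predict functions.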