
Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France

Published online by Cambridge University Press:  15 September 2025

Eloi Lindas*
Affiliation:
Laboratoire des Sciences du Climat et de l’Environnement (LSCE), IPSL, CEA/CNRS/UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France Atos Inno’Lab TS Bezons, Atos, Bezons, France
Yannig Goude
Affiliation:
Laboratoire de Mathématiques d’Orsay (LMO), Faculté des Sciences d’Orsay, CNRS, Université Paris-Saclay, Orsay, France EDF R&D Lab, OSIRIS, EDF, Palaiseau, France
Philippe Ciais
Affiliation:
Laboratoire des Sciences du Climat et de l’Environnement (LSCE), IPSL, CEA/CNRS/UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
*
Corresponding author: Eloi Lindas; Email: eloi.lindas@lsce.ipsl.fr

Abstract

Accurate prediction of nondispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually obtained indirectly, through a bottom-up aggregation of plant-level forecasts; they often incorporate lagged power values and do not exploit the potential of spatially resolved data. This study presents a comprehensive methodology for predicting solar and wind power production at the country scale in France, using machine learning models trained on spatially explicit weather data combined with spatial information about production sites’ capacity. A dataset is built spanning 2012 to 2023, using daily power production data from Réseau de Transport d’Electricité (the national grid operator) as the target variable, with daily weather data from ECMWF Re-Analysis v5, production sites’ capacity and location, and electricity prices as input features. Three modeling approaches are explored to handle the spatially resolved weather data: spatial averaging over the country, dimension reduction through principal component analysis, and a computer vision architecture that exploits complex spatial relationships. The study benchmarks state-of-the-art machine learning models as well as hyperparameter tuning approaches based on cross-validation methods on daily power production data. Results indicate that cross-validation tailored to time series is best suited to reach low error. We found that neural networks tend to outperform traditional tree-based models, which face challenges in extrapolation due to the increasing renewable capacity over time. Model performance ranges from 4% to 10% in normalized root-mean-squared error for the midterm horizon, matching the error metrics of local models established at the single-plant level and highlighting the potential of these methods for regional power supply forecasting.

Information

Type
Application Paper
Creative Commons
CC BY-NC
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact Statement

Accurate power production forecasts, particularly for solar and wind power, which are sensitive to weather conditions, are critical for grid stability, optimizing renewable energy integration, and supporting the transition to cleaner energy. We predict national power output in France by taking advantage of time-varying images of weather and power generation units’ capacity as input data for different machine learning models. The key finding is that image-based models outperform time series-based models. The results of this research provide a practical model benchmark for practitioners and policymakers.

1. Introduction

To meet the 2050 net-zero scenario of the European Union (EU) (United Nations Convention on Climate Change, 2015), reinforced by the European Green Deal, which aims at decreasing net greenhouse gas emissions by 55% by 2030 (European Commission, 2019), sustainable energy sources have become key to clean power production and reduced emissions from the energy sector in Europe. As power demand increases, however, reliance on fossil fuels remains high, accounting for 68% of the global primary energy consumed in 2023 and 40% of the electricity produced in the EU (British Petroleum [BP], 2024; Ritchie and Rosado, 2020). Electrification, coupled with more renewable and other low-carbon power supplies, is needed to reduce dependence on fossil fuels. To meet the $ {\mathrm{CO}}_2 $ emissions goals of the EU, solar and wind power generation need to double their capacity by 2030 to produce 48% of Europe’s energy share (International Renewable Energy Agency (IRENA), 2020b).

France has set a target of reducing its emissions by 33% by 2030 compared to 1990 and has pledged to reach greenhouse gas neutrality by 2050 (Ministère de la Transition Ecologique, 2020). This involves an increase in renewable power capacity installed throughout the country. The capacity of solar and wind power plants has tripled since 2012, and this growth is expected to accelerate, with capacity planned to double between 2017 and 2028 (Ministère de la Transition Ecologique, 2019). Increasing renewable capacity comes with grid distribution challenges to prevent gaps between supply and demand, especially during the day when production may exceed consumption (Liu et al., 2023a). Accurate forecasts of power generation can improve the stability, reliability, quality, and penetration level of renewable energy (International Renewable Energy Agency (IRENA), 2020a). Solar and wind power sources depend on environmental and climate variables such as temperature, solar radiation, and wind speed, making their load highly variable (Engeland et al., 2017; Wang et al., 2019b). This variability creates obstacles for grid operators, who need to constantly balance demand with supply. This is one of the reasons why specific models for understanding and predicting day-to-day renewable power generation have attracted interest from researchers and practitioners.

Many studies have addressed the problem of short-term (10 min–1 h) to medium-term (3 h–3 days) forecasting of renewable power using weather data from stations or numerical weather prediction (NWP). The impact of weather data and variable importance on forecasting energy supply, photovoltaic (PV), and wind power has been studied thoroughly (Vladislavleva et al., 2013; De Giorgi et al., 2014; Zhong and Wu, 2020; Liu et al., 2023b). At the local scale, Malvoni et al. (2016) used solar radiation and temperature to predict the generation of a Mediterranean PV plant. The effect of various climates throughout the planet on hourly PV production was also investigated by Alcañiz et al. (2023). Other works, such as Ahmad and Hossain (2020), made use of weather forecasts to maximize hydropower generated from dams, while Couto and Estanqueiro (2022) examined model-based predictive features for wind power predictions. Frequently, the availability of accurate weather observations is a bottleneck when working on a dedicated local area, not to mention their inherent sparsity and noise, leading researchers to prefer NWP. Yet, when both types of weather data are available, they can be combined (Sharma et al., 2011; López Gómez et al., 2020).

Recent advances in forecasting variable renewable energy generation have seen statistical, machine learning, and deep learning models gain popularity among practitioners (Wang et al., 2019a; Iheanetu, 2022; Krechowicz et al., 2022; Tsai et al., 2023). Thanks to the increase in weather and power data availability and quality, these models have proven useful in revealing driving factors and learning complex patterns (Sweeney et al., 2020). Depending on the spatial and temporal scale, statistical models can outperform traditional physics-based models, which has motivated the development of hybrid models (Bellinguer et al., 2020; Castillo-Rojas et al., 2023; Gijón et al., 2023). The link function between weather conditions and the power output of PV panels or wind turbines has been thoroughly investigated through different types of models (Dolara et al., 2015; Mayer and Gróf, 2021; Zhou et al., 2022; Bilendo et al., 2023). Still, challenges remain when developing models for a large region or country.

Statistical data-driven models such as the auto-regressive moving average (ARMA) and its variants (ARIMA, ARIMAX, SARIMA, and SARIMAX) have demonstrated reasonable performance, as shown in recent work (Chen and Folly, 2018; Ryu et al., 2022). Support vector machines, k-nearest neighbors, generalized additive models (GAM), and tree-based and boosted models have also performed well in forecasting power output from weather data (Kim et al., 2019; Condemi et al., 2021). Current trends favor artificial neural networks, computer vision (CV), and natural language processing models, whose application in renewable power forecasting shows promising performance. Multilayer perceptrons (MLP), convolutional neural networks (CNN), vision transformers (ViT) (Lim et al., 2022; Keisler and Naour, 2025), and sequence architectures such as recurrent neural networks or long short-term memory deep learning models have also been applied in various renewable energy (solar and wind) forecasting frameworks (Elsaraiti and Merabet, 2022; Abdul Baseer et al., 2023). A key advantage is their flexibility and ability to combine several data sources to make predictions, not to mention the different ways they can exploit complex spatiotemporal data.

Research on statistical models is not limited to model architectures. Data preprocessing techniques are also important to improve forecast performance. Principal component analysis (PCA), wavelet decomposition, time series detrending, and exponential smoothing can be applied to extract relevant features, reduce dimension, remove noise, or reveal pertinent phenomena in the data (Liu and Chen, 2019; Iheanetu, 2022). These techniques are mainly used as a first step to improve the robustness and performance of a model. It is important to point out that such techniques can be applied regardless of the type of data at hand, whether time series or gridded data over a region, although the latter option is less explored.

Besides the methodology and models used for forecasting, differences between studies arise from the input and output data. Depending on the purpose and the availability of the data, the time and space resolution, as well as the temporal and spatial ranges, differ between studies (Engeland et al., 2017). Research works span scales from short-term single-plant forecasts with a time resolution of 5–10 minutes (Malvoni et al., 2017; Ryu et al., 2022; Gijón et al., 2023) to medium-term daily forecasts of a region (Kim et al., 2017). However, due to the lack of available good-quality data, regional forecasts are often built from single-plant forecasts aggregated to the desired region, that is, an indirect prediction of the regional power supply. Moreover, the temporal scale rarely exceeds a few years’ worth of data (Chen and Folly, 2018; Iheanetu, 2022). Thus, gaps exist between short- to medium-term and regional forecasts, leading to difficulties in comparing results between studies and improving modeling performance.

Most prior studies have used a bottom-up approach based on single-plant models, which neglects the integration of spatial information for prediction. Additionally, many existing models enhanced their performance by incorporating lagged values of the target time series itself, such as the power supply from the previous day or hour. To overcome these limitations, in this study, we use supervised machine learning models and test the impact of using spatially resolved data as model inputs. We also exclude lagged values of the target time series from the model inputs. The first goal is to assess the influence of the model calibration procedure, especially the cross-validation protocol, on time series-based model error estimation. The second goal is to compare models ingesting explicit weather “images” against models using spatially averaged variables as inputs.

We first explain how we build input datasets for wind and PV production integrating spatially resolved weather data and generation units’ capacity and locations. These input images span the period from January 1, 2012, to December 31, 2023, at hourly resolution as presented in Section 2. Second, we present three different modeling approaches to handle the weather-gridded data to forecast daily wind and PV power production in Section 3.1. Finally, we explore cross-validation and hyperparameter optimization procedures in Section 3.3 to give insights and recommendations for model calibration before benchmarking widespread state-of-the-art machine learning models on our different modeling approaches in Section 4.

2. Data

In this section, we describe the target power supply data, the input weather data and power units data, and other input data sources, with the processing workflow to prepare them as input for supervised learning approaches. Figure 1 presents the overall approach, with more details given in the following sections.

Figure 1. Global framework of this study represented schematically.

2.1. Target data

We used as targets the wind and solar power from the RTE $ {\mathrm{eCO}}_2\mathrm{mix} $ database. RTE is the public French national Transmission System Operator (TSO) managing the whole electrical grid. RTE provides near-real-time data on electrical consumption, production, flows, and $ {\mathrm{CO}}_2 $ emissions within the $ {\mathrm{eCO}}_2\mathrm{mix} $ application.[1] Electricity production data from RTE cover eight sectors: coal, oil, gas, nuclear, hydro, solar, wind, and bioenergy. We recovered production data for nondispatchable renewable wind and solar power. Solar refers to photovoltaic solar panels and wind to both onshore and offshore turbines.

Time-wise, data are available since January 1, 2012, and were retrieved until December 31, 2023. The resolution is half-hourly from January 1, 2012, to January 31, 2023, and quarter-hourly from February 1, 2023, to December 31, 2023.[2] We aggregated the data to an hourly resolution to be consistent with the time resolution of our inputs (see Section 2.2). With data available at the country (NUTS0) or regional (NUTS1) scale, we chose to work directly with country-scale data. This dataset excludes Corsica and other French islands or overseas territories, which are considered self-sufficient in electricity.

France is part of the EU electricity market and the EU grid interconnection. In this work, we aimed to model the electrical power produced using solar and wind from France only, without taking into account any connection with neighboring countries. Therefore, we did not integrate imports and exports into our power supply target and retained only the production data, presented in Figure 2.

Figure 2. Power supply and capacity time series for wind and solar in France for the period of interest. The power capacity curves have been smoothed to a yearly resolution.

2.2. Input data

Our input data are based on gridded weather data weighted by the power capacity available at the given time and location, electricity day-ahead spot price, and other temporal features such as time or day of the year. We combined several different high-quality open-access databases from French governmental or government-affiliated organizations to create coherent inputs.

2.2.1. Weather data

We recovered hourly weather data from the ERA5 reanalysis (Hersbach et al., 2020) on single levels for the period of interest from January 1, 2012, to December 31, 2023. We used the domain covering France, bounded by 42.5° and 51° North in latitude and $-$4.55° and 7.95° East in longitude, on the original spatial grid of 0.25° $ \times $ 0.25°, or about 30 km $ \times $ 30 km. The weather variables we selected are those usually used for renewable power prediction: temperature at 2 m, northward and eastward wind speed at 10 and 100 m, instantaneous wind gust speed at 10 m, surface solar radiation downwards, total precipitation, evaporation, and runoff (Table A1). To select the variables relevant to wind and solar power, we used the mutual information between weather variables and power supply targets (Kraskov et al., 2004). We normalized the mutual information so that the highest score equals one and kept only variables with a score higher than 20%. This leads to hourly maps with 35 latitude and 51 longitude points for each considered variable, stored in netCDF files.
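To make this screening step concrete, the sketch below shows how such a filter could be implemented with scikit-learn’s kNN-based estimator, which follows the Kraskov et al. (2004) approach cited above. The names (`weather_df`, `power`) and the max-normalization are illustrative assumptions, not the paper’s code.

```python
# Hedged sketch of the mutual-information variable screening, assuming each
# candidate ERA5 variable has been reduced to a 1-D series (e.g., a spatial
# average over France) aligned with the hourly power target.
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def select_weather_variables(weather_df: pd.DataFrame, power: pd.Series,
                             threshold: float = 0.20) -> list:
    """Keep variables whose normalized mutual information with the target
    exceeds `threshold` (20% in the paper)."""
    # kNN-based MI estimator (Kraskov et al., 2004).
    mi = mutual_info_regression(weather_df.values, power.values, n_neighbors=3)
    mi = mi / mi.max()                     # normalize so the best score is 1
    return list(weather_df.columns[mi >= threshold])

# selected = select_weather_variables(era5_series, solar_power)  # illustrative
```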

2.2.2. Power units location, capacity, and activity

To get information on the location of facilities with installed solar panels or wind turbines, we used yearly released data from the Opérateurs Réseaux Energies (ORE)[3] agency database of all electrical facilities used for producing or storing electricity in France. The inventory published on December 31, 2023, contained around 84,000 electricity-producing units, among which 2,183 are wind facilities and 72,703 are PV farms. Rooftop PV panels dedicated to autoconsumption are not included. Because the ORE dataset did not provide the exact location of each facility, we merged it with the French governmental city database[4] using the city ID, to allocate each facility to a 30-km grid cell of our weather maps. A city refers to a NUTS4 entity, and the city ID is a unique identifier assigned to every French city by the Institut National de la Statistique et des Etudes Economiques. Facilities with city IDs missing in ORE accounted for less than 2% of the data and were discarded. We assigned facilities to their corresponding wind or solar sector, keeping only PV panels for solar and including both offshore and onshore turbines for wind. The maximum power in MW that each facility can produce, as provided by ORE, was used as its capacity. Records with missing power capacity, representing 0.25% of the data, were also discarded. To account for the activity period of each facility, we added its start and stop dates. If the stop date was not given in the ORE inventory, we assumed that the facility was still in activity. For the start date, we used the start-up date or the date the plant was connected to the grid; we verified that these two dates were close to each other for facilities where both were reported. After adding latitude, longitude, sector, power capacity, and start/stop dates for each facility, we had dropped only 4.4% of the initial ORE dataset, mostly plants located overseas or in Corsica.

2.2.3. Power-weighted weather maps

We generated power capacity-weighted weather maps, by assigning each power facility to the nearest grid cell in the gridded hourly weather data. The weather parameters are thus multiplied by the power capacity weights defined as:

(2.1) $$ {w}_{i,j}^t=\frac{P_{i,j}^t}{\sum_t{\sum}_{i,j}{P}_{i,j}^t} $$

where $ {P}_{i,j}^t $ is the power capacity in MW at time $ t $ and latitude–longitude cell $ \left(i,j\right) $. We use a spatiotemporal normalization of the weights to account for the fact that nondispatchable renewable energy sources have seen their available production capacity increase over the last few years (see Figure 2). Because this behavior is expected to continue, it is important to account for it in the model’s inputs. Figure 3 summarizes the weighted weather map creation schematically.
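A minimal NumPy sketch of Eq. (2.1), assuming `capacity` holds the installed capacity in MW on the same (time, lat, lon) grid as the weather variable; the array names are illustrative.

```python
import numpy as np

def weight_weather_by_capacity(weather: np.ndarray, capacity: np.ndarray) -> np.ndarray:
    """Multiply weather maps by capacity weights normalized over the whole
    space-time domain, as in Eq. (2.1), so that cells and periods with more
    installed capacity contribute more to the model inputs."""
    weights = capacity / capacity.sum()    # w_{i,j}^t
    return weather * weights

# weighted = weight_weather_by_capacity(wind_speed_100m, wind_capacity)
```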

Figure 3. Illustration of power-weighted weather maps creation for wind.

2.2.4. Additional input features

To ensure that models could capture the full seasonality and trend, we added two temporal features, as is usually done in the electricity forecasting literature (Chatfield, 1986; Taylor, 2010; Goude et al., 2014): the time step converted to a numerical integer, and the day of the year encoded using a cosine, $ {doy}_{cos}=\cos \left(\frac{2\pi {doy}_{int}}{365}\right) $, where $ {doy}_{int} $ is the day of the year encoded as an integer between 1 and 365. We used these two temporal features for the wind and solar sectors. However, to be more consistent with the physical process of producing electricity with PV panels, we replaced $ {doy}_{cos} $ for solar with the sunshine duration of the day, computed from sunrise and sunset times for every grid cell and time step.
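The sketch below illustrates these temporal features, assuming a pandas DatetimeIndex and a single representative latitude. The simple counter for the time step and the day-length approximation (Cooper’s declination formula) are our assumptions, as the paper does not spell out its sunrise/sunset computation (which it performs per grid cell).

```python
import numpy as np
import pandas as pd

def temporal_features(idx: pd.DatetimeIndex, lat_deg: float) -> pd.DataFrame:
    doy = idx.dayofyear.values
    doy_cos = np.cos(2 * np.pi * doy / 365)          # used for wind
    # Day length (hours) from a standard declination approximation:
    decl = 23.45 * np.sin(2 * np.pi * (284 + doy) / 365)
    cos_h = -np.tan(np.radians(lat_deg)) * np.tan(np.radians(decl))
    sunshine_h = 24 / np.pi * np.arccos(np.clip(cos_h, -1, 1))  # used for solar
    return pd.DataFrame({"t": np.arange(len(idx)),   # numeric time step
                         "doy_cos": doy_cos,
                         "sunshine_h": sunshine_h}, index=idx)

# feats = temporal_features(pd.date_range("2012-01-01", "2023-12-31"), 46.5)
```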

Even though PV and wind power supply to the grid are related to weather conditions, they also depend on the demand that electricity providers need to meet. The last few years have seen negative electricity prices on the market soar when electrical demand was low and the available renewable power was in oversupply. This led to a practice by electricity providers called curtailment, which consists of deliberately restricting electricity generation from renewable energy sources to prevent negative prices (De Vita et al., 2020; Biber et al., 2022; Yasuda et al., 2022). Thus, we added as input the hourly electricity spot price for France from ENTSO-E.[5] Participants trade electricity on the market in different ways, and there are therefore different electricity prices. We chose the auction day-ahead spot price, as it is the only one that can be freely retrieved through ENTSO-E; it is the price of one $ \mathrm{MW}\;\mathrm{h} $, decided the day before delivery through an auction.

The above-described data processing methodology and workflow allowed us to have input and target datasets for Solar and Wind power, designed for a supervised learning approach, and consisting of a set of $ \left(X,Y\right) $ observations. $ X $ refers to hourly weather maps gridded over France for each selected weather variable, weighted by the power capacity of plants located in the corresponding cells. It also includes day-ahead spot price and temporal features such as the time and day of the year or sunshine duration. $ Y $ refers to the corresponding electrical power produced during this hour.

3. Models and calibration

This section describes the models we tested to predict electricity power production from weather variables. It also includes a discussion on model calibration techniques.

3.1. Modeling choices and approaches

As we aimed to develop models able to predict the power production of PV and wind for a given day from the weather conditions, day-ahead price, and temporal features of that same day, we aggregated all input data from hourly to daily resolution. Aggregation also helped to increase the signal-to-noise ratio and prevent overfitting when predicting daily power from hourly data. This leads to a day-to-day prediction approach that does not use values from previous days. In operation, real forecasts could then easily be obtained with our model by plugging in daily weather forecasts from numerical weather prediction models.
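As a minimal illustration, this aggregation is a one-liner with pandas; taking the daily mean for the target (consistent with the RMSE later being reported in MW) is our assumption.

```python
import pandas as pd

def to_daily(hourly: pd.DataFrame) -> pd.DataFrame:
    # Daily means of weather/price features and of the power target (MW).
    return hourly.resample("1D").mean()
```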

3.1.1. Model architectures

We chose to test three modeling architectures of increasing complexity, as summarized in Figure 4: first, using power-weighted weather images averaged over the whole French territory; second, applying a dimension reduction method to the power-weighted weather maps; and third, applying a vision (image-based) technique.

Figure 4. Representation of the three modeling approaches used in this work to make use of weather maps.

Models using spatially averaged images as input

The first approach is to train models on spatially averaged input data, yielding a time series-to-time series regression framework. After averaging, the weather time series are combined with the price and temporal feature series to leverage one-to-one models (models using one input point to predict the corresponding target point). In this family of models, we tested linear regressions, generalized additive models, tree-based models, boosting, and artificial neural networks, all proven capable of reaching state-of-the-art performance (Wood et al., 2014; Gaillard et al., 2016; Krechowicz et al., 2022; Chen et al., 2023; Liu et al., 2023b).

Models using dimensionally reduced input images

The second approach is to use dimension reduction techniques to extract key features from our high-dimensional power-weighted weather maps before combining them with price and other time features for training a model (Teste et al., 2024). Several dimension reduction methods exist, ranging from empirical orthogonal functions, widely used in the earth sciences community, to autoencoders based on deep network architectures. These methods reduce the dimension of the input space while providing rich features. In this work, we focused on PCA and optimized the number of principal components like any other model hyperparameter. After obtaining the principal components, which behave as time series, we applied the same models as for the spatial average: tree-based models, GAM, and NN.
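A minimal scikit-learn sketch of this approach, assuming the weighted maps for one variable form a (days, lat, lon) array; in practice, the PCA should be fit on the training period only to avoid leakage.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_features(maps: np.ndarray, n_components: int) -> np.ndarray:
    """Flatten each daily map into a vector and return the leading principal
    components as time series features."""
    X = maps.reshape(len(maps), -1)        # (n_days, n_lat * n_lon)
    return PCA(n_components=n_components).fit_transform(X)

# pcs = pca_features(weighted_radiation, n_components=10)
# X_model = np.column_stack([pcs, day_ahead_price, doy_cos])  # illustrative
```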

Models using images as input

The third approach consists of building models capable of directly ingesting the power-weighted weather maps alongside price and temporal features. Here, we used a CNN architecture, previously shown to perform well in image classification, segmentation, and regression tasks, even though CNNs are now slowly being replaced by better-performing ViTs (Keisler and Naour, 2025).
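A minimal PyTorch sketch of such a model: a small convolutional encoder ingests the stacked weighted weather maps, and its features are concatenated with the scalar inputs (price and temporal features) before a regression head. Layer sizes and depths are illustrative assumptions, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class WeatherCNN(nn.Module):
    def __init__(self, n_channels: int, n_scalars: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # (B, 32, 1, 1)
        )
        self.head = nn.Sequential(
            nn.Linear(32 + n_scalars, 64), nn.ReLU(),
            nn.Linear(64, 1),                      # daily power (MW)
        )

    def forward(self, maps: torch.Tensor, scalars: torch.Tensor) -> torch.Tensor:
        z = self.encoder(maps).flatten(1)          # (B, 32)
        return self.head(torch.cat([z, scalars], dim=1)).squeeze(1)

# model = WeatherCNN(n_channels=5, n_scalars=3)
# y_hat = model(torch.randn(8, 5, 35, 51), torch.randn(8, 3))  # 35x51 grid
```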

3.2. Train, validation, and test subsets

We split our dataset into a training and a test subset for the evaluation of model performance. Our data are time-dependent: power production changed throughout the years, mainly due to the opening of new facilities. We chose the period from January 1, 2012, to December 31, 2022, as the train set and January 1, 2023, to December 31, 2023, as the test set. Hyperparameter tuning is a key step of model development, as it often makes the difference between poor and high-performing models. To perform hyperparameter optimization (HPO), we can use different cross-validation (CV) methods as well as different optimization frameworks. To ensure the robustness of our model selection procedure, we kept a validation set dedicated to the investigation of cross-validation and optimization methods, spanning the period from January 1, 2022, to December 31, 2022. Once a proper model selection and HPO procedure is chosen, the validation set is included in the train set for final HPO and model calibration before evaluation on the test set, as described later.

3.3. Cross-validation and HPO

Cross-validation is used to approximate the generalization error, that is, the error of the trained model exposed to new unseen data (Hyndman and Athanasopoulos, 2018). Different techniques are used for splitting the training set into a new training set to train the model and a new left-out test set to evaluate its performance, yielding the approximated generalization error. This step is usually combined with HPO to select the best set of hyperparameters for a given model architecture. Selecting the best-suited calibration procedure is a complicated process (Arlot and Celisse, 2009; Bergstra et al., 2011), and we explain the proposed optimization scheme later.

3.3.1. Procedures inspected

Our data are time-dependent because our target is a power supply time series. Different studies have investigated which cross-validation procedure is best suited in this case (Tashman, 2000; Bergmeir and Benítez, 2012; Cerqueira et al., 2019). However, those studies mainly considered synthetic, stationary, and small time series of a few hundred points. Another major limitation is that, even when real datasets were used, the modeling approaches involved lagged values of the target time series as predictors, which are excluded in our case. Therefore, we chose to study different cross-validation procedures and HPO algorithms to guide the calibration choices for our models. We ran these experiments using only the models based on spatial averages of the input weather images. The following cross-validation procedures were used (a code sketch of the order-preserving and blocking splitters follows the list):

  • Hold-out: Split the training set into a train set and a test set.

  • K-fold: Split the training set into $ K $ folds. At each iteration, one fold is chosen as the test set while the $ K-1 $ others form the train set. Iterate until every fold has been used as the test set once. After all the iterations, the approximated generalization error is taken to be the average of the errors made on each test fold.

  • Expanding: Split the training set into $ K $ folds following the order of the samples. During the $ {i}^{th} $ iteration, the first $ i $ folds are used as the train set and the $ i+1 $ fold is used as the test. Repeat until the entire training set has been used. After all the iterations, the approximated generalization error is taken to be the average of the errors made on each test fold.

  • Sliding: Split the training set into $ K $ folds following the order of the samples. During the $ {i}^{th} $ iteration, the $ i $ th fold is used as the train set, and the $ i+1 $ fold is used as the test set. Repeat until the entire training set has been used. After all the iterations, the approximated generalization error is taken to be the average of the errors made on each test fold.

  • Blocking: Choose a block length $ l $ based on the temporal structure to conserve most of the correlation between neighboring samples. Split the training set into blocks of length $ l $. Attribute blocks to the train or test set at random (inspired by Wood, 2024).
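As announced above, here is a compact sketch of these splitters. Expanding and sliding correspond to scikit-learn’s TimeSeriesSplit without and with `max_train_size`, respectively; the blocking splitter and its test fraction are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n, k, block_len = 4018, 10, 7   # ~11 years of daily samples, yearly folds, 7-day blocks

expanding = TimeSeriesSplit(n_splits=k)                             # folds 1..i train, i+1 test
sliding = TimeSeriesSplit(n_splits=k, max_train_size=n // (k + 1))  # fold i train, i+1 test

def blocking_split(n: int, block_len: int, test_frac: float = 0.2, seed: int = 0):
    """Attribute contiguous blocks of `block_len` samples to train or test at random."""
    rng = np.random.default_rng(seed)
    block_of = np.arange(n) // block_len           # block index of each sample
    is_test_block = rng.random(block_of.max() + 1) < test_frac
    test_mask = is_test_block[block_of]
    return np.where(~test_mask)[0], np.where(test_mask)[0]

train_idx, test_idx = blocking_split(n, block_len)
```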

Figure 5 shows the scheme of these five cross-validation methods. We used a 1-year test set for the hold-out method, 10 splits yielding yearly folds for every fold-based method, and blocks of 7 days for the blocking method. The block size was chosen to keep most of the temporal structure, using autocorrelation and partial autocorrelation analysis. We also considered the shuffling variants of the K-fold and hold-out methods, which randomly shuffle the samples before the fold or subset attribution.

Figure 5. Different cross-validation procedures considered in this work represented schematically. For Hold-Out and K-Fold, only the method without prior random shuffling is represented.

Regarding hyperparameter optimization, we compared two optimization algorithms: random search and Bayesian search using Gaussian processes (Bergstra et al., 2011; Bischl et al., 2023).
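A hedged sketch of the two strategies over a toy two-parameter space, using scikit-optimize’s Gaussian-process-based `gp_minimize` for the Bayesian search; the objective below is a placeholder standing in for the cross-validated RMSE of one hyperparameter set.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [Integer(100, 1000, name="n_estimators"),
         Real(0.3, 1.0, name="max_features")]

def objective(params):
    n_estimators, max_features = params
    # Placeholder: in practice, return the CV-estimated RMSE of a model
    # trained with these hyperparameters (e.g., via an expanding split).
    return (n_estimators - 500) ** 2 * 1e-4 + (max_features - 0.7) ** 2

# Random search: 100 uniform draws from the space.
rng = np.random.default_rng(0)
random_scores = [objective([rng.integers(100, 1001), rng.uniform(0.3, 1.0)])
                 for _ in range(100)]

# Bayesian search: a GP surrogate proposes each next point to evaluate.
result = gp_minimize(objective, space, n_calls=100, random_state=0)
best_score, best_params = result.fun, result.x
```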

To assess the impacts of cross-validation and HPO for different model architectures, we repeated the experiments using three models: a random forest, a tree-based boosting scheme (XGBoost), and a feed-forward neural network (MLP). In total, this led to 7 cross-validation methods × 2 HPO algorithms × 3 models = 42 estimators of the generalization error. At first glance, one might expect cross-validation procedures that respect the temporal order of the data to be best suited to our approach; still, we wanted to make an informed decision through experiments. Our final goal is to choose the pairs of cross-validation technique and HPO algorithm that give the “best” estimator of the generalization error, where best refers to criteria ranging from the precision of the generalization error estimate to computational resource usage.

3.3.2. Cross-validation experiments

As cross-validation’s main goal is to obtain an approximation $ \hat{\varepsilon} $ of the generalization error, we monitored how far the estimate was from the real error. To do so, we recorded, for each of the 100 optimization iterations, the test error made during cross-validation on the training part of the data for a given set of hyperparameters. Then, we compared it to the real generalization error $ \varepsilon $ made on the validation set. Here, the training and validation parts refer to those visible in Figure 1. Since we are dealing with a regression task, the error $ \varepsilon $ was taken to be the root-mean-squared error (RMSE) of the modeled and observed daily power production (see Appendix B for metric definitions); our target being a daily power production time series, the unit of RMSE is MW. Given the real generalization error $ \varepsilon $ and its estimate $ \hat{\varepsilon} $ from cross-validation, we computed, for each procedure, the difference $ \Delta \varepsilon =\varepsilon -\hat{\varepsilon} $ and analyzed its average $ \overline{\Delta \varepsilon } $ and standard deviation $ \sigma \left(\Delta \varepsilon \right) $ across the hyperparameter sets. We also recorded the optimum value of $ \hat{\varepsilon} $ reached after optimization and compared it with the corresponding real error, denoting the difference $ \Delta {\varepsilon}_{min} $.
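The monitored quantities reduce to a few lines of NumPy; the arrays below are placeholders standing in for the 100 recorded CV estimates and their validation counterparts.

```python
import numpy as np

rng = np.random.default_rng(0)
eps_hat = rng.normal(500.0, 50.0, size=100)        # placeholder CV estimates (MW)
eps = eps_hat + rng.normal(60.0, 30.0, size=100)   # placeholder validation errors (MW)

delta = eps - eps_hat                                # delta_eps per hyperparameter set
mean_delta, std_delta = delta.mean(), delta.std()    # mean and std across HP sets
delta_min = eps[np.argmin(eps_hat)] - eps_hat.min()  # delta_eps at the CV optimum
```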

During the experiments, we monitored the time taken to perform one iteration and the permutation feature importance of each feature obtained during cross-validation compared to the one obtained on the validation set. These times of computation tell us how costly each error estimation method was. The feature importance tells us if the cross-validation technique impacted the interpretability of the model. Last, we experimented with different dataset sizes to inspect the influence of data size on cross-validation methods since the literature only deals with small sample sizes. As the dataset size increases, older and older data are utilized for training. Computation times can be found in Table 1 and results for random forest on solar are presented in Figures 6 and 7. Results for other models on solar are in Appendix C and on wind in Appendix D, Figures D1–D6. Results about permutation feature importance showed that despite the different cross-validation methods, the ranking of the features stayed the same for the different hyperparameter combinations explored, meaning that the method does not impact the model interpretability.

Table 1. Average and standard deviation of computing times for 1 iteration for each cross-validation method in seconds

Note. The (S) indicates the shuffling variant of the method. Medals indicate the top three fastest methods for each model and dataset.

Figure 6. Results of different cross-validation techniques for random forest on solar. Each axis represents a monitored quantity for a given HPO optimization procedure. The values for each method are plotted as points, and only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure 7. Robustness of cross-validation procedure regarding the dataset size for random forest on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $ , while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

On the radar chart of Figure 6, we can see that $ \Delta \varepsilon $ is positive both on average and at the optimum. This means that our generalization error estimate $ \hat{\varepsilon} $ is lower than the real error $ \varepsilon $; in other words, cross-validation tends to overestimate the model performance, leading to overconfidence in the model. We can also see that methods that do not preserve the chronological order, or that shuffle the data, perform worse than those that do. Specifically, hold-out, expanding, and sliding lead to the closest estimates on average and at the optimum for both searches. However, sliding is the most sensitive to the set of hyperparameters, as its variability $ \sigma \left(\Delta \varepsilon \right) $ is the highest. This might stem from its small training set size, which never exceeds 1 year of data, and is confirmed by the error bars of Figure 7. The same figure shows that increasing the dataset size by appending older and older data leads to a slight increase in $ \mid \Delta \varepsilon \mid $, meaning that our generalization error estimate moves away from the real one. This is because older data, such as 2012, carry less meaningful information than more recent data, such as 2020, for predicting the validation set, which is the year 2022. This behavior also explains why some methods display an inflection point at a certain dataset size, meaning that there is an optimum past period to consider for making better predictions on the validation set.

The same conclusions hold for boosting and feed-forward neural networks on the solar dataset (see Figures C1–C4). It is worth mentioning that the neural network shows a high variability and a high $ \Delta \varepsilon $ for the Bayesian search HPO, suggesting that this algorithm might not be the best for optimizing neural network hyperparameters. For the wind dataset (see Appendix D, Figures D1–D6), hold-out, sliding, and expanding are the best methods to estimate the generalization error for all three model architectures. Yet, we can see for the random forest and boosting models that increasing the dataset size with older data does help to better approximate the generalization error with the expanding and sliding methods. This means that in the wind dataset, older data still carry meaningful information for predicting the most recent validation set, even if there is a pronounced annual trend in the wind power production time series (see Figure 2).

Finally, Table 1 shows that cross-validation procedures involving folds are more computationally intensive per iteration, as one would expect. Combined with the previous graphs, we conclude that the longer computing times arising from the use of K-fold methods are not worth it, since hold-out and sliding perform better and are between 5 and 10 times faster to compute per iteration.

From the results of these experiments testing different cross-validations, HPO algorithms, and model architectures, we can make recommendations on how to choose a model selection procedure when forecasting a time series from covariate time series. We found that dedicated procedures that keep the chronological order during cross-validation perform better than standard K-fold or shuffled hold-out. Depending on the model architecture and the underlying data, some techniques tend to overestimate or underestimate model performance, leading to overconfidence or underconfidence in the model. This systematic work could be extended to deep learning models that directly ingest images as inputs, to derive recommendations that push their performance even further.

4. Benchmark results and discussion

In this section, we present the results of our models calibrated on the training + validation set and evaluated on the test set. The best hyperparameters for each model were selected according to the best estimated generalization error, using, based on the experiments of the previous section, Bayesian search with either an expanding or hold-out cross-validation method (depending on the model complexity) to save computing time. Expanding was preferred over sliding cross-validation due to the high sensitivity of sliding to the hyperparameter sets. We assessed model performance using the RMSE, mean absolute error (MAE), mean absolute percentage error (MAPE), normalized root-mean-squared error (nRMSE), and R2 score (R2); the definitions of these metrics are given in Appendix B. Table 2 contains all our results on the solar dataset, while results for wind can be found in Appendix E, Table E1.

Table 2. Benchmark results for different models using three different modeling approaches on the solar dataset

Note. Medals indicate the top three best-performing models on the test set for each metric.

As nondispatchable renewable capacity increased throughout our study period, the solar and wind power production time series have an increasing trend from 2012 to 2023, as highlighted in Figure 2. This trend requires the models to extrapolate on the test set. Despite reaching state-of-the-art performance in many tasks, tree-based models such as random forest and boosting are known to face difficulties when extrapolating outside of the training domain (Hengl et al., 2018; Malistov and Trushin, 2019). Our case is no exception: despite low errors on the train set, the random forest and boosting errors soared on the test set (see Tables 2 and E1). To address this issue, many research works propose alternatives such as stochastic or linear trees (Gama and Brazdil, 1999; Zhang et al., 2019; Numata and Tanaka, 2020; Ilic et al., 2021; Raymaekers et al., 2024). We chose to apply two different methods to tackle this extrapolation problem: linear trees and detrending of the time series.

Our detrending scheme consisted of applying a trend estimation method, such as seasonal-trend decomposition using loess (STL), to the entire dataset. Once the trend is estimated, we remove it from the data, and the transformed data are passed to the model for calibration. Predictions are obtained by reconstruction from the model’s output and the trend estimate. The detrending was done on both the weather inputs and the power output data, as the weighting scheme introduced trends in the covariates.
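A minimal sketch of this scheme with statsmodels’ STL; since the trend is estimated on the whole dataset, the test-period trend is directly available for the reconstruction step. Function and variable names are illustrative.

```python
import pandas as pd
from statsmodels.tsa.seasonal import STL

def detrend(series: pd.Series, period: int = 365):
    """Remove an STL-estimated trend; return the detrended series and trend."""
    trend = STL(series, period=period).fit().trend
    return series - trend, trend

# y_detr, y_trend = detrend(daily_power)            # trend fit on the whole dataset
# model.fit(X_train_detr, y_detr.loc[train_idx])    # calibrate on detrended data
# y_pred = model.predict(X_test_detr) + y_trend.loc[test_idx]  # reconstruct
```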

Linear trees did not prove to be a silver bullet on the solar dataset, as their performance was only marginally better for the forest and worse in the case of boosting. In contrast, on the wind dataset, they were useful in enhancing the extrapolation performance. However, their performance was still far from that of the tree-based models predicting detrended power supply from detrended weather averages before reconstructing the proper production time series. Despite the error induced by the trend estimation and reconstruction step, this method displays some of the best results on both solar and wind, within the spatial-average approach and even beyond it. Such behavior could be expected because the trend is estimated on the whole dataset. The extrapolation problem is weaker for GAM and MLP, as they manage to better grasp the trend, achieving better performance on the test set.

Compared with the spatial input averaging approach, using tree-based models with PCA did not achieve better performance, because the extracted principal components exhibit the same trend as the spatial averages. This time, we only applied linear trees, as detrending principal components was more challenging. They yielded a small improvement on the solar dataset but a larger decrease in performance when used to predict wind power supply. Combining PCA with GAM does not seem to improve performance on either dataset. For MLP, it depends on the sector, but one thing we noticed after calibration is that networks combined with PCA are deeper than networks without it, meaning that more layers are required to extract meaningful information from the principal components.

Although the increase in complexity from the spatial average to the dimension reduction approach did not lead to clear improvements in model performance for every model architecture, leveraging the entire weather maps with a more complex computer vision architecture, such as a CNN, clearly did. This phenomenon stems from the unsupervised nature of PCA compared to the supervised CNN. In fact, the CNN is the best-performing model on the wind dataset and the second best on the solar dataset. By utilizing our spatiotemporal weighting scheme, the CNN has a better grasp of the trends in renewables deployment, as highlighted in Figure 8, and avoids extrapolation difficulties. Combined with the MLP results, this highlights the versatility and suitability of neural network-based models for predicting power production from renewable sources.

Figure 8. Power capacity, occlusion attribution, and regional realized power supply for early and late 2023 for wind. Occlusion is an interpretation method that hides part of the input and measures how this impacts the CNN prediction: the higher the impact, the higher the hidden part’s importance (Zeiler and Fergus, 2014). Power supply data are obtained from RTE for all of France’s regions (NUTS1).
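For reference, occlusion attribution needs only a few lines for a model like the CNN sketched in Section 3.1: zero out one patch of the input maps at a time and record how much the prediction changes. The patch size and zero baseline are illustrative choices.

```python
import torch

@torch.no_grad()
def occlusion_map(model, maps, scalars, patch: int = 5):
    """maps: (1, C, H, W) input; returns an (H, W) importance map."""
    base = model(maps, scalars).item()
    _, _, H, W = maps.shape
    importance = torch.zeros(H, W)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            occluded = maps.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.0   # hide one patch
            importance[i:i + patch, j:j + patch] = abs(
                model(occluded, scalars).item() - base)
    return importance
```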

Tables 2 and E1 illustrate the challenges that tree-based models face with extrapolation. Without the detrending scheme, these models would not rank among the top three performers. Instead, neural networks would dominate the podium, with the rankings reflecting the increasing complexity of the modeling approaches. Specifically, as models incorporate more spatially explicit data, their performance improves, with vision models outperforming MLPs combined with PCA, which in turn surpass MLPs on time series. Therefore, we recommend that practitioners incorporate spatial information when designing forecasting models.

The work conducted on cross-validation procedures and HPO schemes allowed us to push the state-of-the-art machine learning architectures to their best performance. Such a study could be extended to deep learning models such as CNNs; since deep convolutional neural networks are already among the best models for both solar and wind, we did not pursue this path here, but a systematic study would likely benefit deep learning models and strengthen their edge.

5. Conclusion

This study presented datasets and tested a modeling framework based on machine learning, with weather data and facility locations as inputs, for predicting daily solar and wind supply at the country level in France. Several machine learning models of different complexities were applied to create a benchmark. Attention was paid to the methods used for calibrating the models to avoid reporting overconfident metrics. The proposed method was applied over France and could be extended to any other country or region.

Our model calibration experiments showed that there is no “silver bullet” procedure, as the best choice depends on the data and the model at hand. Under- or overconfidence can arise depending on the calibration, leading to disappointment if the model is put into operation based on the calibration results. Thus, a thorough validation procedure and analysis are required to avoid such phenomena and to secure the transition to production. Still, a general recommendation can be made in favor of cross-validation methods that keep the temporal structure of the data intact, as they are both more computationally efficient and less biased, leading to more robust models.

Trying to model renewable power supply from weather inputs without including the power capacity at facility locations is pointless, as some state-of-the-art models failed to correctly capture the trend even with this added information. Models that are able to ingest the entire high-dimensional weather input can learn from spatial patterns to achieve better predictions, improving the forecasts. This means that being spatially explicit, in both the data curation and preparation and in the modeling process, is key to achieving good predictions. Therefore, we encourage other practitioners to include geospatial data in their frameworks. However, one must bear in mind that power capacity inventories are not available everywhere and can be of varying quality depending on the data source.

In summary, geospatial weather information is key for renewable energy forecasting. By providing an open dataset and benchmark, we hope to foster research and improve comparison between studies.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.10021.

Author contribution

Conceptualization: E.L., Y.G., P.C.; Methodology: E.L., Y.G.; Data curation: E.L.; Data visualization: E.L.; Supervision: Y.G., P.C.; Writing original draft: E.L.; Writing review and editing: E.L., Y.G., P.C. All authors approved the final submitted draft.

Competing interests

The authors declare none.

Data availability statement

The datasets built for this work can be accessed at https://doi.org/10.5281/zenodo.14287949.

Ethics statement

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Funding statement

This research was supported by a grant from the Association Nationale de la Recherche et de la Technologie (ANRT) No. 2024/0010.

A. Appendix A: Weather variables

Table A1. Description of climate variables

Source: ERA5 Documentation (https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation).

Note. There are 110,808 hourly weather observations spanning 4383 days with a 35 $ \times $ 51 grid for each time step.

B. Appendix B: Metrics Definition

(B.1) $$ \mathrm{MAE}=\frac{1}{N}\sum_{i=1}^N\mid {y}_i-{\hat{y}}_i\mid $$
(B.2) $$ \mathrm{MAPE}=100\times \frac{1}{N}\sum_{i=1}^N\left|\frac{y_i-{\hat{y}}_i}{y_i}\right| $$
(B.3) $$ \mathrm{nRMSE}=100\times \frac{\sqrt{\frac{1}{N}\sum_{i=1}^N{\left({y}_i-{\hat{y}}_i\right)}^2}}{y_{max}-{y}_{min}} $$
(B.4) $$ {\mathrm{R}}^2=1-\frac{\sum_{i=1}^N{\left({y}_i-{\hat{y}}_i\right)}^2}{\sum_{i=1}^N{\left({y}_i-\overline{y}\right)}^2} $$

where $ {y}_{max} $ , $ {y}_{min} $ , and $ \overline{y} $ are the maximum, minimum, and the average of the true target $ y $ , respectively.
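A direct NumPy transcription of these metrics (RMSE is included for completeness, since nRMSE is its range-normalized version):

```python
import numpy as np

def metrics(y: np.ndarray, y_hat: np.ndarray) -> dict:
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100 * np.mean(np.abs(err / y)),
        "nRMSE": 100 * rmse / (y.max() - y.min()),
        "R2": 1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "RMSE": rmse,
    }
```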

C. Appendix C: Cross-validation experiment results for solar

C.1. Boosting

Figure C1. Results of different cross-validation techniques for boosted trees on solar. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure C2. Robustness of cross-validation procedure regarding dataset size for boosted trees on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $, while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

C.2. Feed-forward neural network (MLP)

Figure C3. Results of different cross-validation techniques for feed-forward neural network on solar. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure C4. Robustness of cross-validation procedure regarding dataset size for feed-forward neural network on solar. The marker indicates the average $ \mid \Delta \varepsilon \mid $ , while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

D. Appendix D: Cross-validation experiment results for wind

D.1. Random forest

Figure D1. Results of different cross-validation techniques for random forest on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D2. Robustness of cross-validation procedure regarding dataset size for random forest on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $ , while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

D.2. Boosting

Figure D3. Results of different cross-validation techniques for boosted trees on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D4. Robustness of cross-validation procedure regarding dataset size for boosted trees on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $ , while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

D.3. Feed-forward neural network (MLP)

Figure D5. Results of different cross-validation techniques for feed-forward neural network on wind. Only the worst and best values for each axis are printed. The (S) indicates the shuffling variant of the method.

Figure D6. Robustness of cross-validation procedure regarding dataset size for feed-forward neural network on wind. The marker indicates the average $ \mid \Delta \varepsilon \mid $ , while the error bars display the standard deviation. The (S) indicates the shuffling variant of the method.

E. Appendix E: Benchmark results for wind

Table E1. Benchmark results for different models using three different modeling approaches on the wind dataset

Note. Medals indicate the top three best-performing models on the test set for each metric.

F. Appendix F: Comparison with ENTSO-E day-ahead forecasts

In the renewable energy forecasting literature, most studies use numerical weather predictions, that is, forecasted weather, as model inputs and mainly focus on a local scale, such as a single solar or wind farm. In this work, we used ERA5 re-analysis data as the weather inputs, which do not account for the weather forecasting error, and we directly predict the supply at the regional scale without any lags. These aspects make comparison with other work difficult. However, we provide in Table F1 a comparison of the spatially explicit CNN results with the ENTSO-E day-ahead forecasts for wind and solar generation in France.[6] The day-ahead forecasts available on ENTSO-E are sourced from each TSO and, since they are run operationally, they must use numerical weather forecasts. Since the available forecast data granularity is hourly, we aggregated it to daily forecasts for the sake of comparison. Our approach, combined with the use of re-analysis data, improved the forecasts by 18% on solar and around 20% on wind.

Table F1. Comparison of ENTSO-E day-ahead renewable forecast performance for France with our model forecast performance in 2023 (test set)

Note. The hourly ENTSO-E forecasts were aggregated to daily to match our work’s granularity.

G. Appendix G: Sensitivity of CNN model to Gaussian noise applied to the weather inputs

Since this study's weather inputs are based on re-analysis data rather than forecasted data, we study the degradation of the CNN model performance when mimicking weather forecasts as inputs. To do so, Gaussian white noise, uncorrelated across the different weather variables, is added to each weather map. The noise level is controlled, and the resulting performance degradation is reported in Table G1. Note that adding the same noise level to all the weather predictors does not translate into the same error for every weather variable. Even though this analysis is simple, we can notice that the solar model is less sensitive to the added noise than the wind model. Looking at the verification scores published by the European Centre for Medium-Range Weather Forecasts in their reference document (footnote 7), Figure 26 therein shows that the RMSE for wind at 10 m is below $ 0.5\;\mathrm{m}\;{\mathrm{s}}^{-1} $ (around $ 0.25\;\mathrm{m}\;{\mathrm{s}}^{-1} $) for 60- and 72-hour-ahead forecasts. Our wind speed components range between $ -14 $ and $ 14\;\mathrm{m}\;{\mathrm{s}}^{-1} $, with an average of 3–$ 4\;\mathrm{m}\;{\mathrm{s}}^{-1} $ where the wind turbines are located. This corresponds to an error of around 5–10% on the forecast variables, which would translate into a 10–40% degradation of our predictions for a mimicked 3-day-ahead forecast.
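As an illustration of this perturbation protocol, the following Python sketch adds zero-mean Gaussian noise to a stack of weather maps and recomputes the error over repeated draws. The array layout (time, variable, latitude, longitude), the trained model object, and the scaling of the noise by each variable's standard deviation are assumptions made for illustration, not the exact implementation used in the study.

    # Hedged sketch of the Appendix G noise experiment. The array layout
    # (time, variable, lat, lon), the trained `model`, and scaling the noise
    # by each variable's standard deviation are illustrative assumptions.
    import numpy as np

    def perturb_weather(maps: np.ndarray, noise_level: float,
                        rng: np.random.Generator) -> np.ndarray:
        """Add uncorrelated N(0, (noise_level * sigma_v)^2) noise per variable."""
        noisy = maps.copy()
        for v in range(maps.shape[1]):        # one sigma per weather variable
            sigma = maps[:, v].std()
            noisy[:, v] += rng.normal(0.0, noise_level * sigma, maps[:, v].shape)
        return noisy

    def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    # Repeating the draw yields the mean degradation and an empirical 95%
    # confidence interval, as reported in Table G1:
    # rng = np.random.default_rng(0)
    # scores = [rmse(y_test, model.predict(perturb_weather(x_test, 0.05, rng)))
    #           for _ in range(100)]
    # lo, hi = np.percentile(scores, [2.5, 97.5])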

Table G1. Comparison of our model performance when adding Gaussian noise to the weather inputs to mimic weather forecast data

Note. The RMSE is computed on 2023 (test set). The relative change compares the metric with noise to the metric without noise; negative values mean improvement. The experiment was repeated 100 times before computing the mean and a 95% empirical confidence interval.

Footnotes

This research article was awarded Open Data badge for transparent practices. See the Data Availability Statement for details.

1 RTE $ {\mathrm{eCO}}_2\mathrm{mix} $ website, available at https://www.rte-france.com/en/eco2mix (accessed 19 September 2024).

2 Resolutions might change for 2023 in future releases. Current resolutions and types of data are given for the September 2024 release.

3 The dataset used can be retrieved from the ORE website, available at https://opendata.agenceore.fr/pages/accueil/.

4 This database can be found on the French government Open-Data platform, available at https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-referentiel-geographique/export/.

5 ENTSO-E Transparency Platform, available at https://transparency.entsoe.eu/.

6 Generation Day-Ahead Forecasts for wind and solar can be accessed at: https://transparency.entsoe.eu/.

References

Abdul Baseer, M, Almunif, A, Alsaduni, I and Tazeen, N (2023) Electrical power generation forecasting from renewable energy systems using artificial intelligence techniques. Energies 16, 6414. https://doi.org/10.3390/en16186414
Ahmad, SK and Hossain, F (2020) Maximizing energy production from hydropower dams using short-term weather forecasts. Renewable Energy 146, 1560–1577. https://doi.org/10.1016/j.renene.2019.07.126
Alcañiz, A, Lindfors, AV, Zeman, M, Ziar, H and Isabella, O (2023) Effect of climate on photovoltaic yield prediction using machine learning models. Global Challenges 7(1), 2200166. https://doi.org/10.1002/gch2.202200166
Arlot, S and Celisse, A (2009) A survey of cross-validation procedures for model selection. Statistics Surveys 4, 40–79.
Bellinguer, K, Mahler, V, Camal, S and Kariniotakis, G (2020) Probabilistic forecasting of regional wind power generation for the EEM20 competition: A physics-oriented machine learning approach. In 17th European Energy Market Conference (EEM 2020). Stockholm, Sweden: KTH, IEEE.
Bergmeir, C and Benítez, JM (2012) On the use of cross-validation for time series predictor evaluation. Information Sciences 191, 192–213. https://doi.org/10.1016/j.ins.2011.12.028
Bergstra, J, Bardenet, R, Bengio, Y and Kégl, B (2011) Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems (NeurIPS) 24, 1–9.
Biber, A, Felder, M, Wieland, C and Spliethoff, H (2022) Negative price spiral caused by renewables? Electricity price prediction on the German market for 2030. The Electricity Journal 35(8), 107188. https://doi.org/10.1016/j.tej.2022.107188
Bilendo, F, Meyer, A, Badihi, H, Lu, N, Cambron, P and Jiang, B (2023) Applications and modeling techniques of wind turbine power curve for wind farms: A review. Energies 16(1), 180. https://doi.org/10.3390/en16010180
Bischl, B, Binder, M, Lang, M, Pielok, T, Richter, J, Coors, S, Thomas, J, Ullmann, T, Becker, M, Boulesteix, A, Deng, D and Lindauer, M (2023) Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. WIREs Data Mining and Knowledge Discovery 13(2), e1484. https://doi.org/10.1002/widm.1484
British Petroleum (BP) (2024) Energy Outlook 2024: Exploring the key trends and uncertainties surrounding the energy transition. Available at https://www.bp.com/en/global/corporate/energy-economics/energy-outlook.html (accessed 26 August 2024).
Castillo-Rojas, W, Medina Quispe, F and Hernández, C (2023) Photovoltaic energy forecast using weather data through a hybrid model of recurrent and shallow neural networks. Energies 16(13), 5093. https://doi.org/10.3390/en16135093
Cerqueira, V, Torgo, L and Mozetic, I (2019) Evaluating time series forecasting models: An empirical study on performance estimation methods. Machine Learning 109, 1997–2028. https://doi.org/10.1007/s10994-020-05910-7
Chatfield, C (1986) Comparative models for electrical load forecasting. Journal of the Royal Statistical Society, Series A (General) 149(3), 272. https://doi.org/10.2307/2981560
Chen, Q and Folly, K (2018) Wind power forecasting. IFAC-PapersOnLine 51(28), 414–419. https://doi.org/10.1016/j.ifacol.2018.11.738
Chen, G, Hu, Q, Wang, J, Wang, X and Zhu, Y (2023) Machine-learning-based electric power forecasting. Sustainability 15(14).
Condemi, C, Casillas-Pérez, D, Mastroeni, L, Jiménez-Fernández, S and Salcedo-Sanz, S (2021) Hydro-power production capacity prediction based on machine learning regression techniques. Knowledge-Based Systems 222, 107012. https://doi.org/10.1016/j.knosys.2021.107012
Couto, A and Estanqueiro, A (2022) Enhancing wind power forecast accuracy using the weather research and forecasting numerical model-based features and artificial neuronal networks. Renewable Energy 201, 1076–1085. https://doi.org/10.1016/j.renene.2022.11.022
De Giorgi, MG, Congedo, PM and Malvoni, M (2014) Photovoltaic power forecasting using statistical methods: Impact of weather data. IET Science, Measurement & Technology 8(3), 90–97. https://doi.org/10.1049/iet-smt.2013.0135
De Vita, A, Capros, P, Evangelopoulou, S, Kannavou, M, Siskos, P, Zazias, G, Boeve, S, Bons, M, Winkel, R, Cilhar, J, De Vos, L, Leemput, N and Mandatova, P (2020) Sectoral Integration: Long-Term Perspective in the EU Energy System. European Commission Directorate-General for Energy, E3 Modelling, Ecofys, Tractebel, Publications Office. https://data.europa.eu/doi/10.2833/347937
Dolara, A, Leva, S and Manzolini, G (2015) Comparison of different physical models for PV power output prediction. Solar Energy 119, 83–99. https://doi.org/10.1016/j.solener.2015.06.017
Elsaraiti, M and Merabet, A (2022) Solar power forecasting using deep learning techniques. IEEE Access 10, 31692–31698. https://doi.org/10.1109/ACCESS.2022.3160484
Engeland, K, Borga, M, Creutin, J-D, François, B, Ramos, M-H and Vidal, J-P (2017) Space-time variability of climate variables and intermittent renewable electricity production – A review. Renewable and Sustainable Energy Reviews 79, 600–617. https://doi.org/10.1016/j.rser.2017.05.046
European Commission (2019) The European Green Deal: Striving to be the first climate-neutral continent. Available at https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal_en (accessed 26 August 2024).
Gaillard, P, Goude, Y and Nedellec, R (2016) Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. International Journal of Forecasting 32(3), 1038–1050. https://doi.org/10.1016/j.ijforecast.2015.12.001
Gama, J and Brazdil, P (1999) Linear tree. Intelligent Data Analysis 3(1), 1–22. https://doi.org/10.3233/IDA-1999-3102
Gijón, A, Pujana-Goitia, A, Perea, E, Molina-Solana, M and Gómez-Romero, J (2023) Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification. arXiv:2307.14675. https://arxiv.org/abs/2307.14675
Goude, Y, Nedellec, R and Kong, N (2014) Local short and middle term electricity load forecasting with semi-parametric additive models. IEEE Transactions on Smart Grid 5(1), 440–446. https://doi.org/10.1109/TSG.2013.2278425
Hengl, T, Nussbaum, M, Wright, M, Heuvelink, G and Graeler, B (2018) Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.26693v2
Hersbach, H, Bell, B, Berrisford, P, Hirahara, S, Horányi, A, Muñoz-Sabater, J, Nicolas, J, Peubey, C, Radu, R, Schepers, D, Simmons, A, Soci, C, Abdalla, S, Abellan, X, Balsamo, G, Bechtold, P, Biavati, G, Bidlot, J, Bonavita, M, De Chiara, G, Dahlgren, P, Dee, D, Diamantakis, M, Dragani, R, Flemming, J, Forbes, R, Fuentes, M, Geer, A, Haimberger, L, Healy, S, Hogan, RJ, Hólm, E, Janisková, M, Keeley, S, Laloyaux, P, Lopez, P, Lupu, C, Radnoti, G, De Rosnay, P, Rozum, I, Vamborg, F, Villaume, S and Thépaut, J (2020) The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 146(730), 1999–2049. https://doi.org/10.1002/qj.3803
Hyndman, R and Athanasopoulos, G (2018) Forecasting: Principles and Practice, 2nd Edn. Australia: OTexts.
Iheanetu, KJ (2022) Solar photovoltaic power forecasting: A review. Sustainability 14(24), 17005. https://doi.org/10.3390/su142417005
Ilic, I, Görgülü, B, Cevik, M and Baydoğan, MG (2021) Explainable boosted linear regression for time series forecasting. Pattern Recognition 120, 108144. https://doi.org/10.1016/j.patcog.2021.108144
International Renewable Energy Agency (IRENA) (2020a) Advanced forecasting of variable renewable power generation: Innovation landscape brief. Available at https://www.researchgate.net/profile/Kien-Vu-6/post/Which_techniques_can_be_used_for_renewable_energy_prediction/attachment/6078ac650f39c7000141ebc3/AS%3A1012965440487428%401618521189613/download/IRENA_Advanced_weather_forecasting_2020.pdf (accessed 26 August 2024).
International Renewable Energy Agency (IRENA) (2020b) Renewable Energy Prospects for Central and South-Eastern Europe Energy Connectivity (CESEC). Available at https://www.irena.org/Publications/2020/Oct/Renewable-Energy-Prospects-for-Central-and-South-Eastern-Europe-Energy-Connectivity-CESEC (accessed 26 August 2024).
Keisler, J and Le Naour, E (2025) WindDragon: Automated deep learning for regional wind power forecasting. Environmental Data Science 4, e19. https://doi.org/10.1017/eds.2025.10
Kim, S-G, Jung, J-Y and Sim, M (2019) A two-step approach to solar power generation prediction based on weather data using machine learning. Sustainability 11(5), 1501. https://doi.org/10.3390/su11051501
Kim, J, Kim, D, Yoo, W, Lee, J and Kim, YB (2017) Daily prediction of solar power generation based on weather forecast information in Korea. IET Renewable Power Generation 11(10), 1268–1273. https://doi.org/10.1049/iet-rpg.2016.0698
Kraskov, A, Stögbauer, H and Grassberger, P (2004) Estimating mutual information. Physical Review E 69, 066138. https://doi.org/10.1103/PhysRevE.69.066138
Krechowicz, A, Krechowicz, M and Poczeta, K (2022) Machine learning approaches to predict electricity production from renewable energy sources. Energies 15(23), 9146. https://doi.org/10.3390/en15239146
Lim, S-C, Huh, J-H, Hong, S-H, Park, C-Y and Kim, J-C (2022) Solar power forecasting using CNN-LSTM hybrid model. Energies 15(21), 8233. https://doi.org/10.3390/en15218233
Liu, H and Chen, C (2019) Data processing strategies in wind energy forecasting models and applications: A comprehensive review. Applied Energy 249, 392–408. https://doi.org/10.1016/j.apenergy.2019.04.188
Liu, L, He, G, Wu, M, Liu, G, Zhang, H, Chen, Y, Shen, J and Li, S (2023a) Climate change impacts on planned supply–demand match in global wind and solar energy systems. Nature Energy 8(8), 870–880. https://doi.org/10.1038/s41560-023-01304-w
Liu, L, He, G, Wu, M, Liu, G, Zhang, H, Chen, Y, Shen, J and Li, S (2023b) Climate change impacts on planned supply–demand match in global wind and solar energy systems. Nature Energy 8, 1–11. https://doi.org/10.1038/s41560-023-01304-w
López Gómez, J, Ogando Martínez, A, Troncoso Pastoriza, F, Febrero Garrido, L, Granada Álvarez, E and Orosa García, JA (2020) Photovoltaic power prediction using artificial neural networks and numerical weather data. Sustainability 12(24), 10295. https://doi.org/10.3390/su122410295
Malistov, A and Trushin, A (2019) Gradient boosted trees with extrapolation. In 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 783–789.
Malvoni, M, De Giorgi, M and Congedo, P (2016) Data on photovoltaic power forecasting models for Mediterranean climate. Data in Brief 7, 1639–1642. https://doi.org/10.1016/j.dib.2016.04.063
Malvoni, M, De Giorgi, MG and Congedo, PM (2017) Forecasting of PV power generation using weather input data-preprocessing techniques. Energy Procedia 126, 651–658. https://doi.org/10.1016/j.egypro.2017.08.293
Mayer, MJ and Gróf, G (2021) Extensive comparison of physical models for photovoltaic power forecasting. Applied Energy 283, 116239. https://doi.org/10.1016/j.apenergy.2020.116239
Ministère de la Transition Ecologique (2019) Programmations pluriannuelles de l'énergie (PPE). Available at https://www.ecologie.gouv.fr/politiques-publiques/programmations-pluriannuelles-lenergie-ppe (accessed 26 August 2024).
Ministère de la Transition Ecologique (2020) Stratégie nationale bas-carbone (SNBC). Available at https://www.ecologie.gouv.fr/politiques-publiques/strategie-nationale-bas-carbone-snbc (accessed 26 August 2024).
Numata, K and Tanaka, K (2020) Stochastic threshold model trees: A tree-based ensemble method for dealing with extrapolation. arXiv:2009.09171.
Raymaekers, J, Rousseeuw, P, Verdonck, T and Yao, R (2024) Fast linear model trees by PILOT. Machine Learning 113, 1–50. https://doi.org/10.1007/s10994-024-06590-3
Ritchie, H and Rosado, P (2020) Electricity mix. Our World in Data. Available at https://ourworldindata.org/electricity-mix (accessed 26 August 2024).
Ryu, J-Y, Lee, B, Park, S, Hwang, S, Park, H, Lee, C and Kwon, D (2022) Evaluation of weather information for short-term wind power forecasting with various types of models. Energies 15(24), 9403. https://doi.org/10.3390/en15249403
Sharma, N, Sharma, P, Irwin, D and Shenoy, P (2011) Predicting solar generation from weather forecasts using machine learning. In 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, pp. 528–533. https://doi.org/10.1109/SmartGridComm.2011.6102379
Sweeney, C, Bessa, RJ, Browell, J and Pinson, P (2020) The future of forecasting for renewable energy. WIREs Energy and Environment 9(2), e365. https://doi.org/10.1002/wene.365
Tashman, LJ (2000) Out-of-sample tests of forecasting accuracy: An analysis and review. International Journal of Forecasting 16(4), 437–450. https://doi.org/10.1016/S0169-2070(00)00065-0
Taylor, JW (2010) Triple seasonal methods for short-term electricity demand forecasting. European Journal of Operational Research 204(1), 139–152. https://doi.org/10.1016/j.ejor.2009.10.003
Teste, F, Makowski, D, Bazzi, H and Ciais, P (2024) Early forecasting of corn yield and price variations using satellite vegetation products. Computers and Electronics in Agriculture 221, 108962. https://doi.org/10.1016/j.compag.2024.108962
Tsai, W-C, Hong, C-M, Tu, C-S, Lin, W-M and Chen, C-H (2023) A review of modern wind power generation forecasting technologies. Sustainability 15(14), 10757. https://doi.org/10.3390/su151410757
United Nations Framework Convention on Climate Change (2015) Paris Agreement: Climate Change Conference (COP21). Available at https://unfccc.int/documents/184656 (accessed 26 August 2024).
Vladislavleva, K, Friedrich, T, Neumann, F and Wagner, M (2013) Predicting the energy output of wind farms based on weather data: Important variables and their correlation. Renewable Energy 50, 236–243. https://doi.org/10.1016/j.renene.2012.06.036
Wang, H, Lei, Z, Zhang, X, Zhou, B and Peng, J (2019a) A review of deep learning for renewable energy forecasting. Energy Conversion and Management 198, 111799. https://doi.org/10.1016/j.enconman.2019.111799
Wang, J, Zhong, H, Lai, X, Xia, Q, Wang, Y and Kang, C (2019b) Exploring key weather factors from analytical modeling toward improved solar power forecasting. IEEE Transactions on Smart Grid 10(2), 1417–1427. https://doi.org/10.1109/TSG.2017.2766022
Wood, SN (2024) On neighbourhood cross validation. arXiv:2404.16490. https://arxiv.org/abs/2404.16490
Wood, SN, Goude, Y and Shaw, S (2014) Generalized additive models for large data sets. Journal of the Royal Statistical Society Series C: Applied Statistics 64(1), 139–155. https://doi.org/10.1111/rssc.12068
Yasuda, Y, Bird, L, Carlini, EM, Eriksen, PB, Estanqueiro, A, Flynn, D, Fraile, D, Gómez Lázaro, E, Martín-Martínez, S, Hayashi, D, Holttinen, H, Lew, D, McCam, J, Menemenlis, N, Miranda, R, Orths, A, Smith, JC, Taibi, E and Vrana, TK (2022) C-E (curtailment – energy share) map: An objective and quantitative measure to evaluate wind and solar curtailment. Renewable and Sustainable Energy Reviews 160, 112212. https://doi.org/10.1016/j.rser.2022.112212
Zeiler, MD and Fergus, R (2014) Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8689, pp. 818–833. https://doi.org/10.1007/978-3-319-10590-1_53
Zhang, H, Nettleton, D and Zhu, Z (2019) Regression-enhanced random forests. arXiv:1904.10416.
Zhong, Y-J and Wu, Y-K (2020) Short-term solar power forecasts considering various weather variables. In 2020 International Symposium on Computer, Consumer and Control (IS3C). IEEE, pp. 432–435. https://doi.org/10.1109/IS3C50286.2020.00117
Zhou, H, Qiu, Y, Feng, Y and Liu, J (2022) Power prediction of wind turbine in the wake using hybrid physical process and machine learning models. Renewable Energy 198, 568–586. https://doi.org/10.1016/j.renene.2022.08.004

Author comment: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR1

Comments

Cover Letter

Eloi LINDAS

Commissariat à l’Energie Atomique (CEA)/ Laboratoire des Sciences du Climat et de l’Environnement (LSCE)

Orme des Merisiers, Bat 714, 91190 Saint-Aubin

06/12/2024

Dear Editors,

I am writing to submit our manuscript titled “Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France” for consideration in Environmental Data Science as an Application paper.

Our work addresses a challenge in the transition to renewable energy: accurate forecasting of regional solar and wind power supply. Using over a decade of spatially resolved weather and production data, we developed a dataset and benchmarked state-of-the-art machine learning models across 3 modeling approaches. Our findings give insights and recommendations on how to select a cross-validation procedure to estimate model generalization error as precisely as possible. The work also demonstrates the effectiveness of vision-based models in capturing complex spatial relationships, significantly enhancing forecasting accuracy at a national scale for France.

This study contributes to the journal’s focus on data-driven approaches for sustainable decision-making with:

1. A dataset that integrates spatially explicit weather, generation, and market data spanning 2012–2023.

2. Insights for practitioners, such as recommendations for cross-validation methods tailored to time series forecasting.

3. A benchmark of state-of-the-art machine learning models, exploring techniques from dimension reduction to computer vision.

We believe this research will interest a wide audience, including data scientists, energy forecasters, and policymakers, as it advances methodologies to support the integration of renewable energy into the grid.

We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. We have no conflicts of interest to disclose and all authors approved this submission.

Thank you for considering our manuscript. We look forward to the possibility of contributing to Environmental Data Science. To address any questions or provide additional materials, please contact me at: eloi.lindas@lsce.ipsl.fr

Sincerely,

Eloi LINDAS (On behalf of the co-authors)

Review: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

TITLE: Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France

Manuscript ID : EDS-2024-0110

Manuscript Type : Application Paper

Summary

In the paper, the authors propose a machine learning approach to national-scale French solar and wind power generation forecasting from geographically explicit weather and production data. The method applies heterogeneous modelling approaches and rigorous validation and demonstrates the superior performance of neural networks compared with traditional approaches. I found this paper interesting and important for the energy sector. However, the manuscript requires some improvements. Some general comments are:

General Comments

1. Provide a list of abbreviations used in the paper.

2. Arrange keywords in alphabetical order.

3. I suggest you put table captions above tables and figure captions below figures.

4. Using MAPE and R^2 as evaluation metrics is usually not recommended when working with renewable energies such as solar and wind energy production. The authors are advised to read Hong et al. (2020), Energy Forecasting: A Review and Outlook (https://doi.org/10.1109/OAJPE.2020.3029979); see, for example, the section on Common Issues. Instead, the authors can use other evaluation metrics such as MBE, MASE, etc.

5. While achieving error levels comparable to single-plant models is a welcome development, the 4–10% nRMSE for a midterm horizon still suggests room for greater accuracy.

6. Page 14, table: it appears that combining PCA with the models proposed in the study did not improve performance on the datasets. I suggest that the authors combine PCA with shrinkage methods such as Lasso. Combining these techniques, such as using PCA for initial dimensionality reduction followed by Lasso for feature selection on the reduced set, can often yield the most robust and accurate time series forecasting models.

7. The fact that neural networks have been found to outperform tree-based models due to the latter’s susceptibility to extrapolation difficulties, which rise with increasing renewable capacity, suggests a need for alternative methods to counter this inherent weakness of tree-based algorithms.

8. In the “Conclusion” section, authors should avoid summarising the aspects they have already stated in the body of the manuscript. Instead, they should interpret their findings at a higher level of abstraction than in the previous sections of the manuscript. The authors should highlight whether or to what extent they have addressed the necessity identified within the “Introduction” section (the identified gap). The authors should avoid restating everything they did once again. However, instead, they should emphasise what their findings mean to the readers, making the “Conclusions” section interesting and memorable to them. The authors should not restate what they have done or what the article does. They should focus instead on what they have discovered and, most importantly, on what their findings mean.

Specific Comments

1. Page 3, line 28, proper referencing is required on Malvoni et al. On line 47 (ARIMA, SARIMAX…), what are the three dots for?

2. Page 5, line 42, change figure 2 to Figure 2.

3. Page 6, line 21, change “grrid” to grid. Attend to all minor typos in the manuscript.

Review: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The study presents a comprehensive methodology for predicting solar and wind power production at country scale in France using machine learning models trained with spatially explicit weather data combined with spatial information about production sites' capacity.

The authors present three different modeling approaches to handle the gridded weather data to forecast daily wind and PV power production. What is the reason for selecting these three specific modeling techniques?

It would be better to mention the size of the dataset in Table 3.

Recommendation: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR4

Comments

Both reviewers recognized the relevance of the article and provided only minor comments for improvement. Based on their evaluations, my recommendation is for a minor revision.

Decision: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R0/PR5

Comments

No accompanying comment.

Author comment: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR6

Comments

Cover Letter

Eloi LINDAS

Commissariat à l’Energie Atomique (CEA)/ Laboratoire des Sciences du Climat et de l’Environnement (LSCE)

Orme des Merisiers, Bat 714, 91190 Saint-Aubin

06/12/2024

Dear Editors,

I am writing to submit our manuscript titled “Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France” for consideration in Environmental Data Science as an Application paper.

Our work addresses a challenge in the transition to renewable energy: accurate forecasting of regional solar and wind power supply. Using over a decade of spatially resolved weather and production data, we developed a dataset and benchmarked state-of-the-art machine learning models across 3 modeling approaches. Our findings give insights and recommendations on how to select a cross-validation procedure to estimate model generalization error as precisely as possible. The work also demonstrates the effectiveness of vision-based models in capturing complex spatial relationships, significantly enhancing forecasting accuracy at a national scale for France.

This study contributes to the journal’s focus on data-driven approaches for sustainable decision-making with:

1. A dataset that integrates spatially explicit weather, generation, and market data spanning 2012–2023.

2. Insights for practitioners, such as recommendations for cross-validation methods tailored to time series forecasting.

3. A benchmark of state-of-the-art machine learning models, exploring techniques from dimension reduction to computer vision.

We believe this research will interest a wide audience, including data scientists, energy forecasters, and policymakers, as it advances methodologies to support the integration of renewable energy into the grid.

We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. We have no conflicts of interest to disclose and all authors approved this submission.

Thank you for considering our manuscript. We look forward to the possibility of contributing to Environmental Data Science. To address any questions or provide additional materials, please contact me at: eloi.lindas@lsce.ipsl.fr

Sincerely,

Eloi LINDAS (On behalf of the co-authors)

Recommendation: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR7

Comments

The authors have fully addressed the reviewers’ comments, and I am pleased to recommend the manuscript for acceptance. Congratulations

Decision: Toward accurate forecasting of renewable energy: Building datasets and benchmarking machine learning models for solar and wind power in France — R1/PR8

Comments

No accompanying comment.