Sage Journals: Discover world-class research

Abstract

The world is becoming increasingly reliant on renewable energy sources to satisfy its growing energy demand. The primary disadvantage of such sources is the significant uncertainty in their power production. Because appropriate energy production planning and scheduling require a solid and confident assessment of renewable power production, the need for reliable prediction models grows by the day. This paper proposes an adaptive ensemble-based approach for 1-day ahead production prediction of solar Photovoltaic (PV) systems. Different ensembles of Artificial Neural Network (ANN) prediction models are established, whose architectures (number of ANNs that comprise the ensembles) and configurations (number of hidden nodes of the ANN models in the ensembles) change adaptively at each hour h, h∈ [1, 24], of a day to accommodate the hourly seasonality in the solar PV data and, thus, enhance the accuracy of the 1-day ahead predictions. The suggested approach is tested on a 264 kW solar PV system installed at Applied Science Private University, Jordan. Its prediction performance is evaluated with standard performance metrics, particularly under the different weather conditions (seasons) experienced by the PV system. Results show the effectiveness of the suggested approach in predicting solar PV power production and its superiority over a prediction approach from the literature that uses a single ANN at each hour h of the day. Specifically, for 1-day ahead prediction, the accuracy improvement was, on average, around 8%–10% on the “unseen” test datasets.

Keywords

Adaptive ensemble of models; artificial neural network; BAGGING; optimization; photovoltaics; power prediction

Introduction

Background

Energy has a significant influence on human life, industry, agriculture, transportation, and communication. In general, electricity generated from non-renewable resources, such as coal, crude oil, and natural gas, covers most of the energy demand. On the other hand, in most developing countries, renewable energy resources, for example, solar, wind, hydro, geothermal, and biomass, are not sufficiently exploited. Due to population growth and rising living standards, energy demand and consumption continuously rise, which represents an economic challenge, especially for countries with limited energy sources. For example, Spain’s overall electrical energy consumption was 268,808 GWh in 2018, an increase of 0.4% compared to 2017.¹

Moreover, in the last few years, the cost of energy demand in Jordan has been about 10% of its Gross Domestic Product (GDP). Burning oil, natural gas, and coal causes water and air pollution and can lead to serious health issues. Therefore, most countries are turning towards renewable energy resources, as they are clean, economically viable, and sustainable. Hence, the production, storage, and delivery of electricity generated from renewable energy resources have attracted significant attention in the last decade.^2–4

Solar energy has recently been developing rapidly compared to wind and other renewable energy resources. Several factors noticeably influence the electrical power generated by Photovoltaic (PV) plants, for example, solar irradiation, wind speed, and ambient temperature.⁵ Hence, the amount of electricity generated by PV modules is variable and intermittent, which makes it challenging to predict the plant’s power production.^6,7 The integration of PV systems into electrical networks as distributed generators has created challenges for the operators due to the generation intermittency and the variability of the weather conditions. Reliable prediction therefore reduces the deviations between the PV power plant’s expected and generated power. Accurate prediction of the PV output power is essential for several applications, such as Smart Grids (SGs),⁸ load management,⁹ the reliability of virtual power plants,¹⁰ and the charging of electric vehicles.¹¹ Therefore, it is essential to predict PV generation in SG applications at different time horizons.^7,12 Based on the time horizon, PV prediction for SG applications can be divided into three types: short-term (10–30 min), middle-term (1 h to a few hours), and long-term (24 h to 2 weeks).¹³ Short-term prediction helps significantly limit the voltage rise on the load side and suppress the grid’s power variations. In contrast, middle-term prediction is essential for balancing power production and load demand, as well as for scheduling the charging of electric vehicles. Finally, long-term prediction plays a vital role in load scheduling and dispatching.

Literature review

Various methods for predicting solar PV generation have appeared in the state of the art. These methods can be categorized into statistical, physical, and ensemble methods.^14–16 Statistical methods rely on reconstructing the relations between historical meteorological parameters and hourly irradiance, without requiring information about the system’s internal state for modeling.^17,18 Further, statistical approaches have been classified into time series and learning methods. Time series methods include the Kalman filter,¹⁹ Support Vector Regression (SVR),²⁰ the Grey Forecasting Method,²¹ Auto-Regressive Integrated Moving Average (ARIMA),²² and Hidden Markov Models (HMM).²³ Learning methods comprise the Artificial Neural Network (ANN),^24,25 Support Vector Machine (SVM),²⁶ Wavelet Analysis (WA),²⁷ and Fuzzy Logic (FL).²⁸ The ANN is deemed one of the most popular statistical methods for predicting PV generation with a 24-h ahead prediction horizon.^29,30

For example, Kushwaha and Pindoriya³¹ adopted the Seasonal ARIMA (SARIMA) model for multi-step ahead power production prediction of solar PV systems installed at IIT Gandhinagar, India. Results showed that SARIMA outperformed the persistence model, but its performance degraded on cloudy or rainy days, making it unsuitable for very short-term prediction in such weather conditions. AlShafeey and Csáki³² investigated and compared the performance of Multiple Linear Regression (MLR) and ANN for 1-day ahead power production prediction of a 546 kWp grid-connected solar PV farm in Hungary, considering structural, time-series, and hybrid input data methods. According to the findings, the models achieved different prediction accuracies depending on the inputs used.

Further, Machine Learning (ML) has been deployed to predict PV generation; this approach relies on Artificial Intelligence (AI) to learn from historical data and strengthen its prediction capabilities during the training stage. Powerful computers are needed to run many iterations before a final prediction can be achieved, but ML can learn complex representations without any predetermined equations. Besides the previous methods, the Recurrent Neural Network (RNN),³³ Feed-Forward Neural Network (FFNN),³⁴ and Feed-Back Neural Network (FBNN) have been deployed to predict PV generation at various time horizons.³⁵ For example, Kumar et al.³⁶ developed three real-time prediction models, namely the Elman Neural Network, FFNN, and Generalized Regression Neural Network (GRNN), for short-term power production prediction of a Semi-Transparent PV (STPV) system. The three models used ambient temperature, solar irradiance, and wind speed as input parameters to forecast the output power of an STPV system in India. The results revealed a small error between the forecast and actual output power, with a Root Mean Square Error (RMSE) of 0.25, 0.30, and 0.426 for the three models, respectively. Sharadga et al.³⁷ presented and compared several time-series statistical (e.g. SARIMA) and AI-based (e.g. Neural Networks (NNs)) models for hourly power output prediction of a large-scale 20 MW grid-connected solar PV station in China. For time series prediction of solar PV power production, the results revealed that NNs were more accurate and required less computational effort than statistical models. However, the authors concluded that both the NN and statistical prediction models were superior for 1-h forecasting without using any meteorological parameters as inputs.
Li et al.³⁸ individually coupled two optimization techniques, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), with the ANN model for short-term power output prediction of a solar PV plant in China. Specifically, the two techniques were investigated to verify whether they could overcome the limitations of the classic Back-Propagation (BP) learning algorithm and effectively adjust the ANN’s internal parameters (i.e. weights and biases) to improve prediction outcomes. According to the findings, PSO-BP outperformed both the classic BP-ANN and GA-BP. Moreover, Khademi et al.³⁹ used a Multi-Layer Perceptron (MLP) combined with the Artificial Bee Colony (ABC) method to predict the power production of a 3.2 kW solar PV plant. The collected data were divided into sunny and cloudy subsets, each used to build an MLP-ABC model. Results showed that separating the data by weather condition enhanced the production prediction accuracy compared to that of the same model built on the entire dataset.

The Extreme Learning Machine (ELM) is an ML prediction and statistical method characterized by faster training and simpler implementation than ANNs. Further details on the ELM method can be found in Ahmed et al.⁴⁰ For instance, the ELM was employed to predict 24 h-ahead solar PV power production.⁴¹ The developed prediction approach was optimized with respect to the number of hidden nodes and the historical weather conditions (i.e. solar irradiations and ambient temperatures). Results showed that the ELM model provided slightly more accurate predictions, with a lower computational burden, than the BP-ANN model.

Low prediction accuracy is one of the shortcomings of statistical methods. Therefore, combining two or more prediction approaches has been proposed to reduce the error and improve the prediction accuracy.⁴² This approach is known as the hybrid prediction method (i.e. an ensemble of models); for more details, readers may refer to Guermoui et al.⁴³ For example, Kaffash and Deconinck⁴⁴ proposed a prediction approach combining ANN and SVR for 1-day ahead PV power generation prediction without using any exogenous inputs. To this aim, the Random Forest (RF) technique was employed to select the relevant historical data (i.e. feature selection). Results showed that the prediction accuracy improved relative to each individual base model (ANN and SVR). Khan et al.⁴⁵ proposed an improved, generally applicable stacked ensemble algorithm with ANN and Long Short-Term Memory (LSTM) base models, whose predictions were aggregated by an Extreme Gradient Boosting (XGBoost) algorithm, to enhance the predictability of solar PV production. Results showed that the proposed ensemble approach outperformed each base model (ANN and LSTM) and the Bagging ensemble learning method on two different case studies.

In addition to the statistical and hybrid approaches, the physical approach known as Numerical Weather Prediction (NWP), presented in Tiwari et al.⁴⁶ and Bacher et al.,⁴⁷ has been applied to predict PV generation by solving mathematical equations based on meteorological variables, that is, temperature, humidity, pressure, and wind. Further, detecting clouds and estimating their behavior is used to predict PV generation in what is known as the Sky Imagery Forecasting Method (SIFM). Another method for predicting PV plant generation relies on Satellite Images (SI) to trace cloud motion. However, this approach lacks high performance because clouds can form and scatter quickly.^48–53 The SI method can be categorized into global and mesoscale physical methods, depending on the area of the atmosphere covered in the simulation. Generally, the physical method is mainly used for forecasting PV generation over very short to long time horizons; one of its disadvantages is that its performance is reliable only when the weather conditions are stable.⁵⁴

Original contributions

This work aims to develop an adaptive prediction approach based on ensembles of data-driven models for solar PV power production prediction. More specifically, the proposed approach is structured in two phases: (i) partitioning the solar PV system dataset into $H = 24$ different datasets, one for each hour of the day, as previously suggested by some of the authors,³⁰ and (ii) developing an adaptive prediction approach based on ensembles of ANNs to enhance the production prediction. The ANN is employed as the base prediction model of the ensembles because of its simplicity, ease of understanding and development, efficacy demonstrated in different industrial applications, and convenient computational effort.³⁰ However, the proposed ensemble-based approach is general, and other data-driven techniques from the literature, such as SVR, ELM, RF, and RNN, could be employed as the base prediction model. Once the overall available dataset is partitioned/clustered, the proposed prediction approach amounts to building $H = 24$ different ensembles of ANN models using the corresponding training datasets: each $h$th ensemble is built to predict the $h$th solar PV power production of the next day, $h \in [1, 24]$ (i.e. 1-day ahead). Diversity among the base models of each ensemble is established by resorting to the Bootstrapping AGGregatING (BAGGING) technique⁴² to enhance the $h$th prediction accuracy and, thus, the overall predictability of the next day. The built ensembles are optimized on the corresponding validation datasets with respect to the number of ANNs that constitute each ensemble (i.e. the architecture) and the number of hidden nodes of the ANN models in each ensemble (i.e. the configuration) at each hour $h$. The proposed approach is therefore adaptive: different ensembles of data-driven prediction models are built whose architectures and configurations change adaptively at each hour $h$, $h \in [1, 24]$, of a day to accommodate the hourly seasonality that appears in the solar PV data, thus enhancing the prediction accuracy while reducing its variability. To the best of the authors’ knowledge, no previous effort has been dedicated to establishing ensembles of prediction models adaptively at each hour of the day to accommodate the hourly seasonality in solar PV datasets, ultimately boosting the power production prediction accuracy while reducing its variability with convenient computational effort.

The capability of the proposed approach is tested on a solar PV system located on the rooftop of the Faculty of Engineering at the Applied Science Private University (ASU) (Amman, Jordan)⁵⁵ and compared with the benchmark prediction approach of Al-Dahidi et al.³⁰ The comparison is carried out using standard performance metrics, namely the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and its weighted version (WMAE). In addition, the prediction stability/variability of the two prediction approaches is studied by examining various statistical measures of the performance metric distributions obtained over different simulation trials.

Thus, the original contributions of this work are:

The development of an adaptive prediction approach based on ensembles of ANN models. The adaptivity entails building, optimizing, and evaluating ensembles of ANN models locally at each cluster/hour $h$ , $h \in [1, 24]$ ;

Comparing the proposed prediction approach with a benchmark approach based on a single ANN that has been built, optimized, and evaluated at each hour $h$ , $h \in [1, 24]$ .³⁰

The remainder of this paper is structured as follows. Section “Problem formulation and work objectives” formulates the problem and defines the work objectives. Section “Case study and methodology” presents the real case study investigated in this work and illustrates the proposed prediction approach. Section “Results and discussion” shows the results of applying the proposed approach to the real case study and compares them with those obtained by the benchmark prediction approach from the literature. Finally, Section “Conclusions” concludes the work and draws some recommendations for future work.

Problem formulation and work objectives

For a solar PV system, let us assume that we have the following information collected at each hour $h$ , $h \in [1, 24]$ for $D$ days:

The time stamp, represented by the hour counter ( $hour$ ) and day counter ( $day$ );

The weather conditions experienced by the PV system, represented by the wind speed ( $WS$ ), relative humidity ( $RH$ ), ambient temperature ( $T_{amb}$ ), and solar irradiation ( $Irr$ );

The associated power productions ( $P$ ).

The overall available dataset could be structured in a matrix $X$ , as follows:

X = [\vec{hour}, \vec{day}, \vec{WS}, \vec{RH}, {\vec{T}}_{amb}, \vec{Irr} | \vec{P}]

(1)

Due to the inherent hourly seasonality in the solar PV dataset, the objective of this work is to accommodate such seasonality/variability to ultimately enhance the 1-day ahead solar PV power production prediction performance with convenient computational effort. To this aim, the available dataset, $X$, is initially divided/clustered into $H = 24$ datasets, each comprising the time stamp, weather conditions, and associated power productions collected at hour $h, h \in [1, 24]$.³⁰ Once the $H$ datasets are established, $H$ different ensembles of data-driven prediction models are built, optimized, and evaluated adaptively at each hour $h$, $h \in [1, 24]$. This work aims to boost the prediction performance (accuracy and stability) obtained by Al-Dahidi et al.³⁰ by exploring the capability of ensembles of prediction models, whose effectiveness has been shown in different industrial applications.⁵⁶

Case study and methodology

In this section, the real solar PV case study investigated in this work and the proposed methodology are illustrated in section “Case study” and section “Methodology,” respectively.

Case study

This section illustrates the case study used to demonstrate the effectiveness of the proposed prediction approach. The solar PV system installed on the rooftop of the Faculty of Engineering (ASU09) at the Applied Science Private University (ASU) is studied (Figure 1).⁵⁵

Figure 1.

Solar PV system installed on the rooftop of the Faculty of Engineering (ASU09) and the ASU weather station.

The dataset comprises hourly historical weather conditions and the associated power productions collected from 16th May 2015 to 31st December 2018 (i.e. $D =$ 1326 days with 31,824 data patterns or rows).³⁰ In particular, following the procedure reported in Al-Dahidi et al.,³⁰ the weather conditions are collected from a nearby weather station (as shown in Figure 1) and consist of wind speed ($WS$) (m/s), relative humidity ($RH$) (%), ambient temperature ($T_{amb}$) (°C), and solar irradiation ($Irr$) (W/m²), in addition to the time stamp at which the weather conditions and the corresponding power productions are measured, that is, the hour counter ($hour$) and the day counter ($day$). The corresponding power productions ($P$) (kW) are collected from the inverters installed in the solar PV system.

The overall dataset is stored in a matrix $X$, which is later used to develop the proposed and the benchmark approaches for 1-day ahead production prediction of the solar PV system. It is worth highlighting that the dataset was pre-processed according to the guidelines outlined in Al-Dahidi et al.,³⁰ as follows:

Negative solar irradiations (due to offsets in the solar irradiation sensors) and missing associated power productions (due to inverter failures) were identified in the early and late hours of the day; they were therefore set to zero;

Missing weather conditions (due to sensor failures) and power productions (due to inverter failures or network disruptions) were identified in some midday hours; these records were therefore excluded from the analysis.

For illustration purposes, Figure 2 shows the pre-processed weather conditions and the associated power productions on four particular days representative of the four seasons (shown in different colors). Note the large variability across the hours of a day (i.e. hourly seasonality), in addition to that across the days of the different seasons (i.e. annual seasonality). This implies that a prediction model capable of handling such variability will help improve prediction accuracy and stability.^6,30

Lastly, the overall dataset $X$ was normalized to the range [0, 1] so that each input to the ANNs has the same range of values (equation (2)). This helps ensure stable convergence of the ANNs’ weights and biases (i.e. internal parameters).

X_{norm} = \frac{X - {\vec{X}}_{\min}}{{\vec{X}}_{\max} - {\vec{X}}_{\min}},

(2)

Figure 2.

The hourly weather conditions: (a) wind speed, (b) relative humidity, (c) ambient temperature, (d) solar irradiation and (e) the associated power productions for four different days representative of the four seasons.

where $X$ and $X_{norm}$ are the overall dataset (comprising the time stamp, weather conditions, and corresponding power production values) and its normalized version, respectively, and ${\vec{X}}_{\min}$ and ${\vec{X}}_{\max}$ are the vectors of minimum and maximum values of the input and output columns that constitute the matrix $X$.
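A minimal Python sketch of equation (2), assuming the dataset is held as a list of rows with one column per variable. The function names `min_max_normalize` and `denormalize` are illustrative, not from the original work, and the guard for a constant column is an added assumption, since equation (2) is undefined when a column's maximum equals its minimum.

```python
def min_max_normalize(rows):
    """Scale every column of a row-major matrix to [0, 1], as in equation (2)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = [
        [
            # Assumed guard: a constant column (max == min) is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ]
        for row in rows
    ]
    return normalized, mins, maxs


def denormalize(value, lo, hi):
    """Map a normalized value (e.g. a predicted P) back to physical units."""
    return lo + value * (hi - lo)
```

Keeping `mins` and `maxs` allows normalized production predictions to be mapped back to kW before the kW-based metrics are computed.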

It is worth mentioning that the dataset used in this work to develop the proposed adaptive ensemble-based prediction approach consists of the historical weather conditions (collected from a nearby weather station) experienced by the PV system under study and the associated power productions. For 1-day ahead prediction, one requires as inputs the weather conditions that the PV system will experience on the next day. To this aim, one could use NWP data and/or forecast the next day’s weather conditions by resorting to data-driven techniques from the literature. However, the intention of this work is solely to investigate the capability of ensembles of models built, optimized, and evaluated locally at each hour $h$ of a day using such “historical” pairs of weather conditions and their corresponding real power productions.

Methodology

This section presents the suggested approach for 1-day ahead prediction of the solar PV system power production. As shown in Figure 3, the proposed approach is structured in two phases. In Phase 1, the overall available dataset $X$ is partitioned/clustered into $H = 24$ different datasets of equal size; each $h$th dataset, $X_{h}$, comprises the time stamp (${\vec{hour}}_{h}$, ${\vec{day}}_{h}$), weather variables (${\vec{WS}}_{h}, {\vec{RH}}_{h}, {\vec{T}}_{{amb}_{h}}, {\vec{Irr}}_{h}$), and the corresponding power productions (${\vec{P}}_{h}$) collected at hour $h$, $h \in [1, 24]$.³⁰ Therefore, a generic $h$th dataset can be written as follows (equation (3)):

X_{h} = [{\vec{hour}}_{h}, {\vec{day}}_{h}, {\vec{WS}}_{h}, {\vec{RH}}_{h}, {\vec{T}}_{{amb}_{h}}, {\vec{Irr}}_{h} | {\vec{P}}_{h}], h \in [1, 24]

(3)
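The Phase 1 partitioning can be sketched as follows, assuming each row of $X$ is a tuple `(hour, day, WS, RH, T_amb, Irr, P)` with hours numbered 1 to 24. The function name `partition_by_hour` is hypothetical. The sketch already drops the hour column from each $X_h$, anticipating the remark in Step 1 that its value is constant within each hourly dataset.

```python
from collections import defaultdict


def partition_by_hour(rows, hours=range(1, 25)):
    """Split row-major data X into the H = 24 hourly datasets X_h of equation (3).

    Each row is assumed to be (hour, day, WS, RH, T_amb, Irr, P); the hour
    column is dropped from every X_h because it is constant within it.
    """
    datasets = defaultdict(list)
    for hour, *rest in rows:
        datasets[hour].append(tuple(rest))
    # Return every hour, including hours with no rows, in a fixed order.
    return {h: datasets[h] for h in hours}
```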

Figure 3.

The proposed adaptive prediction approach-based ensemble of data-driven models.

Once the $H = 24$ datasets are established, Phase 2 entails three main steps: $H = 24$ different ensembles of models will be built (Step 1), optimized (Step 2), and evaluated (Step 3) adaptively at each hour $h$ , $h \in [1, 24]$ .

Accordingly, each established hourly dataset is further divided into three subsets for the three purposes mentioned above:

Training datasets ($X_{h}^{train}$). The $h$th training dataset is used to build the $h$th ensemble of models dedicated to predicting the $h$th solar PV power production. The datasets comprise the time stamp, weather conditions, and corresponding productions collected from 16th May 2015 to 30th November 2016 (i.e. 565 days). Each dataset consists of $N_{h}^{train} = 565$ training patterns.

Validation datasets ($X_{h}^{valid}$). The $h$th validation dataset is used to optimize the $h$th ensemble of models in terms of its configuration and architecture. The datasets comprise the time stamp, weather conditions, and corresponding productions collected from 1st December 2016 to 30th November 2017 (i.e. 365 days). Each dataset consists of $N_{h}^{valid} = 365$ validation patterns.

Test datasets ($X_{h}^{test}$). The $h$th test dataset is used to evaluate the capability of the $h$th ensemble of models in predicting the $h$th solar PV power production. The datasets comprise the time stamp, weather conditions, and corresponding productions collected from 1st December 2017 to 31st December 2018 (i.e. 396 days). Each dataset consists of $N_{h}^{test} = 396$ test patterns.
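A sketch of this chronological split for one hourly dataset, assuming each record carries a calendar date (the original dataset uses a day counter, so a mapping from counter to date is an assumption here). The function names are hypothetical; the date boundaries are those stated above, and `count_days` confirms that they yield the 565, 365, and 396 patterns per hour.

```python
from datetime import date

# Inclusive date ranges stated in the text.
SPLITS = {
    "train": (date(2015, 5, 16), date(2016, 11, 30)),
    "valid": (date(2016, 12, 1), date(2017, 11, 30)),
    "test": (date(2017, 12, 1), date(2018, 12, 31)),
}


def assign_split(day):
    """Return the subset a given date belongs to, or None if out of range."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


def split_hourly_dataset(records):
    """records: list of (date, inputs, production) tuples for one hour h."""
    subsets = {name: [] for name in SPLITS}
    for record in records:
        name = assign_split(record[0])
        if name is not None:
            subsets[name].append(record)
    return subsets


def count_days(start, end):
    """Number of days in an inclusive date range."""
    return (end - start).days + 1
```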

In detail:

Step 1 (Build $H = 24$ different ANN-based ensembles using the training datasets): This step builds $H = 24$ different ensembles of ANN prediction models. Each $h$th ensemble ($Ensemble_{h}$) is built (${Ensemble}_{h}^{'}$) using the training dataset $X_{h}^{train}$ to predict the $h$th solar PV power production of the next day, $h \in [1, 24]$; thus, one ensemble is built from each of the 24 available training datasets.

It is worth mentioning that diversity among the base models of each $h$th ensemble, $h \in [1, 24]$, is generated by resorting to the Bootstrapping AGGregatING (BAGGING) technique.⁴² This is intended to enhance the accuracy and stability of the predictions obtained by each ensemble and, thus, the overall predictability of the next day. At each hour $h$, $h \in [1, 24]$, the BAGGING technique entails sampling, randomly with replacement, an arbitrary fraction (e.g. 70%) of the available training patterns ($N_{h}^{train} = 565$) to build each base model of the $h$th ensemble ($Ensemble_{h}$).
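The sampling step can be sketched with Python's standard `random` module. This is an illustrative reading of the text rather than the authors' code: each base model receives its own sample of `round(fraction * N)` patterns drawn uniformly with replacement, with `fraction = 0.7` mirroring the 70% example. The names and the fixed seed are assumptions added for reproducibility.

```python
import random


def bootstrap_sample(patterns, fraction=0.7, rng=None):
    """Draw round(fraction * N) patterns uniformly at random WITH replacement."""
    rng = rng or random.Random()
    k = max(1, round(fraction * len(patterns)))
    return rng.choices(patterns, k=k)


def bagging_training_sets(patterns, n_models, fraction=0.7, seed=0):
    """One independent bootstrap sample per base model of an ensemble."""
    rng = random.Random(seed)
    return [bootstrap_sample(patterns, fraction, rng) for _ in range(n_models)]
```

Because sampling is with replacement, some patterns appear several times in a base model's training set and others not at all, which is what makes the base models differ.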

Each $i$th ANN base prediction model of the $h$th ensemble ($Ensemble_{h}$), denoted ${ANN}_{h}^{i}$, comprises three main layers (i.e. input, hidden, and output layers), as depicted in Figure 4 and described hereafter:

Input layer. This layer receives as input the $j$th training pattern, $x_{h}^{j}$, which consists of the $h$th inputs ($[{hour}_{h}^{j}, {day}_{h}^{j}, {WS}_{h}^{j}, {RH}_{h}^{j}, {T_{amb}}_{h}^{j}, {Irr}_{h}^{j}], h \in [1, 24]$), $j = 1, \dots, N_{h}^{train}$. Note that ${hour}_{h}$ is excluded from the inputs of each $h$th ensemble ($Ensemble_{h}$), since its value is constant within each $h$th dataset;

Hidden layer. This layer comprises $Nh_{h}$ hidden neurons that process the received inputs via a continuous non-polynomial function (called the hidden neuron activation function), such as the radial basis function $g_{1}(\cdot)$, and send the processed information to the output layer;

Output layer. This layer provides the prediction of the corresponding $h$th power production (${\hat{P}}_{h}^{j}$) via a linear transfer function (called the output neuron activation function), $g_{2}(\cdot)$.

Figure 4.

The configuration of the ANN base prediction model.

In summary, the predicted $h$th power production (${\hat{P}}_{h}^{j}$) for the $j$th input training pattern can be written as follows (equation (4)):

{\hat{P}}_{h}^{j} = g_{2}\left(\sum_{nh_{h} = 1}^{Nh_{h}} {\vec{\beta}}_{nh_{h}}\, g_{1}\left({\vec{w}}_{nh_{h}} x_{h}^{j} + b_{h_{h}}\right) + b_{o_{h}}\right), \quad j = 1, \dots, N_{h}^{train}, \; h \in [1, 24]

(4)

where $nh_{h}$ and $j$ index the hidden nodes, $nh_{h} = 1, \dots, Nh_{h}$, and the training patterns available at each hour $h$, $j = 1, \dots, N_{h}^{train}$, $h \in [1, 24]$, respectively. $b_{h_{h}}$ and ${\vec{w}}_{nh_{h}}$ are the weights of the connections from the input bias neuron and from the input neurons, respectively, to every $nh_{h}$th hidden neuron. Similarly, $b_{o_{h}}$ and ${\vec{β}}_{nh_{h}}$ are the weights of the connections to the output neuron from the hidden bias neuron and from every $nh_{h}$th hidden neuron, respectively.
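A direct transcription of equation (4) for a single pattern, assuming the Gaussian radial basis $g_1(z) = e^{-z^2}$ (the exact radial basis form is not stated above) and an identity $g_2$. The function names are illustrative; in this work, $x_h^j$ would hold the five inputs day, WS, RH, T_amb, and Irr once the hour is dropped.

```python
import math


def radial_basis(z):
    """Hidden activation g1, assumed Gaussian: exp(-z^2)."""
    return math.exp(-z * z)


def identity(z):
    """Linear output activation g2."""
    return z


def ann_predict(x, W, b_hidden, beta, b_out, g1=radial_basis, g2=identity):
    """Equation (4) for one input pattern x.

    W[k]        input-weight vector of hidden node k (w_{nh_h})
    b_hidden[k] bias weight of hidden node k (b_{h_h})
    beta[k]     weight from hidden node k to the output node (beta_{nh_h})
    b_out       output bias weight (b_{o_h})
    """
    total = 0.0
    for w_k, b_k, beta_k in zip(W, b_hidden, beta):
        z_k = sum(w_d * x_d for w_d, x_d in zip(w_k, x)) + b_k
        total += beta_k * g1(z_k)
    return g2(total + b_out)
```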

The internal parameters of ${ANN}_{h}^{i}$ (i.e. its weights and biases) are initialized randomly and then updated iteratively to minimize the error between the production prediction (${\hat{P}}_{h}^{j}$) and its actual value ($P_{h}^{j}$) over the entire training dataset, $j = 1, \dots, N_{h}^{train}$, by means of error Back-Propagation (BP) learning algorithms. Levenberg-Marquardt (LM), Bayesian Regularization, and Scaled Conjugate Gradient are typical BP learning algorithms.
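The sketch below illustrates the iterative update of the weights and biases of the network in equation (4). For brevity, it swaps in plain full-batch gradient descent on the mean squared error instead of Levenberg-Marquardt or the other algorithms named above, and it assumes $g_1(z) = e^{-z^2}$, a linear $g_2$, and uniform random initialization in [-1, 1]. The learning rate, epoch count, and function names are arbitrary illustrative choices.

```python
import math
import random


def forward(x, W, b, beta, b_o):
    """Equation (4) with g1(z) = exp(-z^2) and linear g2; also returns z and g1(z)."""
    z = [sum(w_d * x_d for w_d, x_d in zip(w_k, x)) + b_k for w_k, b_k in zip(W, b)]
    a = [math.exp(-z_k * z_k) for z_k in z]
    y = sum(beta_k * a_k for beta_k, a_k in zip(beta, a)) + b_o
    return y, z, a


def mse(data, params):
    """Mean squared error of the network over (x, target) pairs."""
    W, b, beta, b_o = params
    return sum((forward(x, W, b, beta, b_o)[0] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden, epochs=200, lr=0.05, seed=0):
    """Full-batch gradient descent, used here as a stand-in for Levenberg-Marquardt."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Random initialization of the internal parameters.
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    beta = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_o = 0.0
    n = len(data)
    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(n_hidden)]
        gb = [0.0] * n_hidden
        gbeta = [0.0] * n_hidden
        gbo = 0.0
        for x, t in data:
            y, z, a = forward(x, W, b, beta, b_o)
            dy = 2.0 * (y - t) / n  # d(MSE)/dy for this pattern
            gbo += dy
            for k in range(n_hidden):
                gbeta[k] += dy * a[k]
                # Chain rule through g1: d/dz exp(-z^2) = -2 z exp(-z^2).
                dz = dy * beta[k] * (-2.0 * z[k] * a[k])
                gb[k] += dz
                for d in range(n_in):
                    gW[k][d] += dz * x[d]
        # Step every parameter against its accumulated gradient.
        for k in range(n_hidden):
            beta[k] -= lr * gbeta[k]
            b[k] -= lr * gb[k]
            for d in range(n_in):
                W[k][d] -= lr * gW[k][d]
        b_o -= lr * gbo
    return W, b, beta, b_o
```

Calling `train(data, n_hidden, epochs=0, seed=s)` returns the untrained initial parameters for the same seed, which makes it easy to compare the error before and after training.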

Step 2 (Optimize the built ensembles using the validation datasets): Once the $H = 24$ ensembles are built, this step optimizes their architectures and configurations using the validation datasets to accommodate the hourly seasonality in the data and, thus, ensure accurate and stable prediction results. Specifically:

• The number of base models of each ensemble (i.e. the architecture) needs to be optimized. The optimum number of base models ($M_{h}^{opt}$) of the $h$th ensemble ($Ensemble_{h}$) is selected among different candidate values ($M_{h}^{candidate}$), where $M_{h}^{candidate} = [M_{h}^{\min}, M_{h}^{\max}]$, and $M_{h}^{\min}$ and $M_{h}^{\max}$ are the minimum and maximum possible numbers of base models.

• The number of hidden nodes of the ANN models that comprise the different ensembles (i.e. the configuration) needs to be optimized. The optimum number of hidden nodes (${Nh}_{h}^{opt}$) of the base models of the $h$th ensemble ($Ensemble_{h}$) is selected among different candidate values (${Nh}_{h}^{candidate}$), where ${Nh}_{h}^{candidate} = [{Nh}_{h}^{\min}, {Nh}_{h}^{\max}]$, and ${Nh}_{h}^{\min}$ and ${Nh}_{h}^{\max}$ are the minimum and maximum possible numbers of hidden nodes.

Different architectures and configurations are built using the training datasets and evaluated using the validation datasets to select the optimum architecture and configuration adaptively at each hour $h$, $h \in [1, 24]$. To this aim, 25 different simulation trials are performed for each possible architecture and configuration. Each simulation trial builds ensembles with different initial parameters of the base models (i.e. weights and biases) and with different training datasets (i.e. a 70% random selection from the corresponding training dataset using the BAGGING technique). Three standard, well-known performance metrics are calculated for each built ensemble (at each hour $h$) and for each simulation trial (of the 25). The performance metrics calculated in this work at each hour $h$, $h \in [1, 24]$, on the validation datasets are the following^30,40,58,59:

1. Root mean square error ($RMSE$) (kW): It measures the average magnitude of the errors over all validation patterns as the square root of the mean squared mismatch between the actual and predicted power productions provided by the proposed and benchmark prediction approaches (equation (5)). Because the mismatches are squared, this metric penalizes large mismatches more heavily. It ranges from 0 to ∞;

$$RMSE_{h} = \sqrt{\frac{\sum_{j=1}^{N_{h}^{valid}} \left(P_{h}^{j} - \hat{P}_{h}^{j}\right)^{2}}{N_{h}^{valid}}} \tag{5}$$

2. Mean absolute error ($MAE$) (kW): It describes the average magnitude of the errors over all validation patterns by linearly averaging the absolute mismatches (all individual errors have equal weights in the average) between the actual power productions and those predicted by the proposed and the benchmark prediction approaches (equation (6)). It ranges from 0 to ∞; and

$$MAE_{h} = \frac{\sum_{j=1}^{N_{h}^{valid}} \left|P_{h}^{j} - \hat{P}_{h}^{j}\right|}{N_{h}^{valid}} \tag{6}$$

3. Weighted mean absolute error ($WMAE$): It is the sum of the absolute mismatches normalized by the sum of the actual power productions (equation (7)), which makes it dimensionless. This metric is of interest to PV system owners for comparing prediction accuracy when power production capacities change. It ranges from 0 to ∞.

$$WMAE_{h} = \frac{\sum_{j=1}^{N_{h}^{valid}} \left|P_{h}^{j} - \hat{P}_{h}^{j}\right|}{\sum_{j=1}^{N_{h}^{valid}} P_{h}^{j}} \tag{7}$$
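Equations (5)–(7) translate directly into code. The plain-Python sketch below assumes that `actual` and `predicted` hold the $N_{h}^{valid}$ actual and predicted productions (kW) for one hour $h$; the function names and the toy numbers are illustrative only.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation (5): square root of the mean squared mismatch (kW)."""
    n = len(actual)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation (6): mean absolute mismatch (kW)."""
    return sum(abs(p - q) for p, q in zip(actual, predicted)) / len(actual)


def wmae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation (7): total absolute mismatch over total actual production."""
    return sum(abs(p - q) for p, q in zip(actual, predicted)) / sum(actual)


actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(round(rmse(actual, predicted), 4),
      round(mae(actual, predicted), 4),
      round(wmae(actual, predicted), 4))  # 2.3805 2.3333 0.1167
```

The $RMSE$ exceeds the $MAE$ here because squaring gives the single 3 kW mismatch more weight than the two 2 kW mismatches.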

Small values of these metrics indicate more accurate predictions, and vice versa. Then, various statistical measures (across the 25 simulation trials) are calculated to evaluate the prediction accuracy and stability of the proposed approach. The optimum configuration and architecture (${Ensemble}_{h}^{opt}$) at each hour $h$ are those that minimize the overall metric ($RMSE_{h} \times MAE_{h} \times WMAE_{h}$).³⁰ It is worth mentioning that the ANN is selected to serve as the base model of the built ensembles due to its simplicity, ease of understanding and development, and modest computational effort with sufficient predictability, as demonstrated particularly for solar PV power production prediction³⁰ and, more generally, for load/demand forecasting⁶⁰ and electricity price forecasting.⁶¹ However, other data-driven techniques from the literature, such as SVR, ELM, RF, and RNN, might be used in place of the ANN. In that case, the predictability of the resulting ensemble-based prediction approach would, in fact, be determined by the predictability of the employed base prediction model. This is out of the scope of this work, whose sole objective is to investigate the effectiveness of using ensembles of ANNs instead of a single ANN prediction model at each hour $h$.
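The selection rule can be sketched as a search over the candidate pairs $(M_{h}, {Nh}_{h})$. This is a minimal sketch under an assumption: the text does not specify which statistic across the 25 trials enters the product, so the code averages each metric over the trials before multiplying. The names `overall_metric` and `select_optimum`, the dictionary layout, and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Metrics = Tuple[float, float, float]  # (RMSE_h, MAE_h, WMAE_h) of one trial


def overall_metric(trials: List[Metrics]) -> float:
    """RMSE_h x MAE_h x WMAE_h computed from trial-averaged metrics (assumption)."""
    rmse_avg = mean(t[0] for t in trials)
    mae_avg = mean(t[1] for t in trials)
    wmae_avg = mean(t[2] for t in trials)
    return rmse_avg * mae_avg * wmae_avg


def select_optimum(results: Dict[Tuple[int, int], List[Metrics]]) -> Tuple[int, int]:
    """Return the (M_h, Nh_h) pair with the smallest overall metric."""
    return min(results, key=lambda key: overall_metric(results[key]))


# Toy validation results for one hour: {(M, Nh): [metrics of each trial]}
results = {
    (9, 1): [(2.7, 2.2, 0.68), (2.8, 2.1, 0.70)],
    (2, 15): [(3.0, 2.5, 0.90), (3.1, 2.4, 0.80)],
}
print(select_optimum(results))  # (9, 1)
```

For the benchmark approach, the same rule applies with $M_{h}$ fixed to 1, so only the hidden-node candidates are compared.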

Step 3 (Evaluate the optimized ensembles on the test datasets): Once the $H = 24$ ensembles are built and optimized, their performance is evaluated on the test datasets ( $X_{h}^{test}$ ) at each hour $h$ , $h \in [1, 24]$ , by quantifying the considered performance metrics, and compared with another benchmark prediction approach from the literature.³⁰

On the other hand, the benchmark approach entails building $H = 24$ different ANN prediction models. Specifically, at each hour $h$, $h \in [1, 24]$, a single ANN model is built (using the training datasets ($X_{h}^{train}$)), optimized (in terms of the number of hidden nodes ($Nh$), using the validation datasets ($X_{h}^{valid}$)), and evaluated (using the test datasets ($X_{h}^{test}$)), similar to Steps 1–3 of the proposed approach. As in the proposed approach, the optimum number of hidden nodes (${Nh}_{h}^{opt}$) of each single ANN built at hour $h$ is the one that minimizes the overall metric ($RMSE_{h} \times MAE_{h} \times WMAE_{h}$) computed at the $h$-th hour.³⁰

Results and discussion

The application results of the proposed approach (section “Methodology”) on the ASU case study (section “Case study”) are presented here and compared with those obtained by the benchmark approach.³⁰ Note that the early morning hours ($h \in [1, 7]$) and late evening hours ($h \in [21, 24]$) are excluded from the analysis because there is no significant power production at these times. Thus, results are reported solely for $h \in [8, 20]$.

Application results of the proposed approach

At each hour $h$, the $h$-th ensemble ($Ensemble_{h}$) of the proposed prediction approach is built using the corresponding training datasets ($X_{h}^{train}$) (Step 1), and its optimum configuration and architecture are identified on the validation datasets ($X_{h}^{valid}$) (Step 2).

Table 1 reports the optimum number of base models ($M_{h}^{opt}$) obtained for each $h$-th ensemble ($Ensemble_{h}$) and the corresponding optimum number of hidden nodes (${Nh}_{h}^{opt}$), $h \in [8, 20]$, together with the metrics’ values obtained over the 25 simulation trials. Specifically, at each hour $h$, $h \in [8, 20]$, $M_{h}^{opt}$ and ${Nh}_{h}^{opt}$ are selected according to the overall metric ($RMSE_{h} \times MAE_{h} \times WMAE_{h}$), computed for different combinations of candidate numbers of base models ($M_{h}^{candidate}$) and hidden nodes (${Nh}_{h}^{candidate}$), which span the intervals [1, 30] and [1, 20], respectively.
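The size of this search can be worked out from the stated intervals. The count below is an upper bound that assumes every pair in the two intervals is evaluated and every candidate ensemble is trained from scratch in each of the 25 trials (the text does not say whether base models are reused across candidate ensemble sizes):

$$\begin{aligned}
\text{candidate pairs per hour} &= 30 \times 20 = 600,\\
\text{ensembles evaluated per hour} &= 600 \times 25 = 15{,}000,\\
\text{ANN trainings per hour} &= 25 \times 20 \times \sum_{M=1}^{30} M = 25 \times 20 \times 465 = 232{,}500,\\
\text{ANN trainings for } h \in [8, 20] &= 13 \times 232{,}500 = 3{,}022{,}500.
\end{aligned}$$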

Table 1.

The optimum configurations and architectures obtained on the validation datasets for the built ensembles from $h$ = 8 to $h$ = 20.

$h$	$M_{h}^{opt}$	${Nh}_{h}^{opt}$	$RMSE$ [kW]	$MAE$ [kW]	$WMAE$
8	9	1	2.7095	2.1849	0.6800
9	24	1	11.6311	9.9372	0.5284
10	2	15	22.6355	17.1404	0.3148
11	6	14	26.8843	19.1005	0.1950
12	26	17	27.2548	17.4070	0.1351
13	4	2	25.4299	14.6441	0.1007
14	18	2	24.9994	13.1864	0.0920
15	27	3	24.5680	13.0756	0.0998
16	8	2	20.6430	13.9123	0.1315
17	13	3	19.7243	14.4168	0.2011
18	2	14	15.2370	11.6427	0.2977
19	1	17	9.8521	7.8518	0.3729
20	1	10	3.3037	2.2333	0.3904

Looking at Table 1, one can recognize that the inherent hourly variability of the solar PV power production requires building different ensembles characterized by different configurations and architectures (hence, adaptive). In this way, each of the built ensembles handles the variability specific to its hour, enhancing the corresponding hourly predictions. Notice that:

For the midday hours (e.g. $h = 14$), the optimum number of base models is, in general, large, and the corresponding optimum number of hidden nodes is small, so as to capture the specific hour variability;

For the late morning (e.g. $h = 9$) and early evening (e.g. $h = 19$) hours, the optimum number of base models is large or small, respectively, whereas the corresponding optimum number of hidden nodes is small or large, respectively, so as to capture the specific hour variability. More specifically, at $h = 19$, a single ANN base model with a relatively large number of hidden nodes (similar to the benchmark approach, as shown in section “Application results of the benchmark approach”) is sufficient to capture the specific hour variability.

It is worth noting that an exhaustive search could be used to identify different optimum numbers of hidden nodes for the individual base models that constitute each $h$-th ensemble, $h \in [8, 20]$. This could indeed enhance the prediction accuracy further, because more diversity would be injected among the ensembles’ base models.⁶² However, to reduce the computational effort required by the optimization task (Step 2), a single optimum number of hidden nodes (${Nh}_{h}^{opt}$) has been selected for all base models of each $h$-th ensemble, as reported in Table 1. In this work, developing the base models on different training datasets (i.e. BAGGING) was adequate to establish diversity among the ensembles’ base models and to demonstrate the potential of the suggested approach-based ensemble.

Table 2.

The detailed characteristics of the ANN models of the built ensembles.

ANN characteristics	Description
Configuration	Three layers (input-hidden-output)
Transfer function (hidden layer)	Radial basis
Transfer function (output layer)	Linear
Training optimization algorithm	Levenberg-Marquardt
Performance function	Mean square error (MSE)

For replication purposes, Table 2 reports the detailed characteristics of the ANN models of the built ensembles. Once the proposed prediction approach is built and optimized, it is evaluated on the test datasets and compared with the benchmark prediction approach.³⁰
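For readers replicating the base models, the forward pass implied by Table 2 can be sketched as below. This is an illustrative sketch only: it assumes the radial basis transfer function $\mathrm{radbas}(n) = e^{-n^{2}}$ (the convention of MATLAB's `radbas`), applied to the weighted sum of the inputs plus a bias, and it omits the Levenberg-Marquardt training entirely. The function names and the toy weights are hypothetical.

```python
import math
from typing import Sequence


def radbas(n: float) -> float:
    """Radial basis transfer function, assumed to be exp(-n^2)."""
    return math.exp(-n * n)


def ann_forward(x: Sequence[float],
                w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
                w_out: Sequence[float], b_out: float) -> float:
    """Three-layer network: radbas hidden layer followed by a linear output neuron."""
    hidden = [radbas(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out


# Toy network with 2 inputs and Nh = 2 hidden nodes (arbitrary weights).
y = ann_forward(x=[0.5, -0.5],
                w_hidden=[[1.0, 1.0], [2.0, 0.0]], b_hidden=[0.0, 0.0],
                w_out=[3.0, 1.0], b_out=0.5)
print(y)
```

The first hidden node receives a net input of 0 and outputs 1, the second receives 1 and outputs $e^{-1}$, so the linear output is $3 + e^{-1} + 0.5$. Training would then tune the weights and biases to minimize the MSE performance function of Table 2.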

Application results of the benchmark approach

The benchmark prediction approach entails building, optimizing, and evaluating a single ANN model (whose detailed characteristics are reported in Table 2) at each hour $h$ (i.e. $M_{h}^{opt} = 1$) for predicting the corresponding power production, $h \in [8, 20]$, following the earlier work.³⁰

The optimum configurations (i.e. ${Nh}_{h}^{opt}$) of the single ANN prediction models obtained on the validation datasets ($X_{h}^{valid}$) at each hour $h$, $h \in [8, 20]$, are reported in Table 3, together with the metrics’ values obtained over the 25 simulation trials. As for the proposed approach, the optimum number of hidden nodes is the one that minimizes the overall metric ($RMSE_{h} \times MAE_{h} \times WMAE_{h}$), computed for candidate numbers of hidden nodes (${Nh}_{h}^{candidate}$) spanning the interval [1, 20].

Table 3.

The optimum configurations obtained on the validation datasets for the single ANN models built from $h$ = 8 to $h$ = 20.

$h$	${Nh}_{h}^{opt}$	$RMSE$ [kW]	$MAE$ [kW]	$WMAE$
8	1	4.6143	3.3571	1.0449
9	1	17.3063	13.1890	0.7012
10	20	29.7785	23.3723	0.4292
11	11	31.3369	23.7429	0.2423
12	6	30.3182	20.6133	0.1600
13	3	27.7697	17.6575	0.1214
14	2	27.9150	16.6927	0.1165
15	2	27.9612	16.9400	0.1293
16	2	24.1191	17.2164	0.1627
17	4	22.5488	17.4864	0.2439
18	8	19.5818	15.3186	0.3917
19	17	11.6245	9.3780	0.4454
20	10	4.5276	2.8132	0.4918

Notice that:

The proposed approach significantly outperforms the benchmark approach on the validation datasets ( $X_{h}^{valid}$ ) using the three performance metrics at each $h$ , $h \in [8, 20]$ ;

The metric values obtained by both the proposed and the benchmark approaches reflect the variability level inherent in the solar PV data at hand (e.g. the small and large variability at $h = 8$ and $h = 14$, respectively, entails small and large $RMSE$ and $MAE$ values for both approaches).
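The first observation can be checked directly from the validation $RMSE$ columns of Tables 1 and 3 by computing the relative reduction $(RMSE^{Benchmark} - RMSE^{Proposed}) / RMSE^{Benchmark} \times 100$ at each hour. The short script below does this for the reported values; the variable names are illustrative.

```python
# Validation RMSE [kW] at h = 8..20: (proposed, Table 1; benchmark, Table 3)
RMSE_VALID = {
    8: (2.7095, 4.6143), 9: (11.6311, 17.3063), 10: (22.6355, 29.7785),
    11: (26.8843, 31.3369), 12: (27.2548, 30.3182), 13: (25.4299, 27.7697),
    14: (24.9994, 27.9150), 15: (24.5680, 27.9612), 16: (20.6430, 24.1191),
    17: (19.7243, 22.5488), 18: (15.2370, 19.5818), 19: (9.8521, 11.6245),
    20: (3.3037, 4.5276),
}

# Relative RMSE reduction of the proposed approach, in percent
reduction = {h: (bench - prop) / bench * 100
             for h, (prop, bench) in RMSE_VALID.items()}

for h, r in reduction.items():
    print(f"h = {h:2d}: {r:5.1f} %")
```

The reduction is positive at every hour, ranging from about 8% (at $h = 13$) to about 41% (at $h = 8$).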

Comparisons and discussions

Figure 5 shows the box plots of the distributions of the three performance metrics, that is, RMSE (Figure 5(a)), MAE (Figure 5(b)), and WMAE (Figure 5(c)), obtained by the proposed and benchmark approaches on the test datasets at each hour $h$, $h \in [8, 20]$, over the 25 simulation trials. The box plot provides a visual summary of several statistical measures (the 25th percentile or first quartile (Q1); the 50th percentile, second quartile (Q2), or median; and the 75th percentile or third quartile (Q3)) of the metric distributions obtained across the 25 simulation trials by the two prediction approaches. To offer extra information about the variability of the results, the box plot also shows the minimum and maximum values (restricted to within 1.5 interquartile ranges (IQR) below Q1 and above Q3, respectively), connected to the box by vertical lines (called “whiskers”), and the distributions’ outliers, represented as circles.⁶³ Such statistical information is useful for evaluating and comparing the prediction accuracy and variability of the two prediction approaches. Figure 5 shows the advantage of the ensemble of models (proposed approach) over the single models (benchmark approach)³⁰ at each hour of the day:

The proposed approach provides more accurate predictions than the benchmark approach at each hour $h$, $h \in [8, 20]$, that is, lower RMSE, MAE, and WMAE values, as indicated by the median shown inside the box;

The proposed approach provides more stable predictions than the benchmark approach at each hour $h$, $h \in [8, 20]$, that is, smaller boxes, which indicate less variation among the RMSE, MAE, and WMAE values. In contrast, the benchmark approach produces bigger boxes and more outliers, that is, less stable prediction results.

Figure 5.

Hourly prediction comparison for the proposed approach and benchmark approach: box plot of (a) RMSE, (b) MAE, and (c) WMAE.
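The statistics summarized by each box in Figure 5 can be computed as sketched below. This is an illustration under assumptions: quartile conventions differ between software packages (the sketch uses Python's `statistics.quantiles` with `method="inclusive"`, which may not match the tool used to draw the figure), and the helper name `box_plot_stats` and the toy values are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def box_plot_stats(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, 1.5*IQR whiskers, and outliers of one metric distribution.

    Assumption: quartiles use statistics.quantiles(method="inclusive");
    other tools may interpolate differently for small samples.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers: List[float] = sorted(v for v in values
                                   if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        # whiskers end at the most extreme values still inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


# Toy distribution standing in for per-trial metric values at one hour
print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])["outliers"])  # [100]
```

A more stable approach in this sense is one whose trial-to-trial metric values give a smaller `iqr` and fewer `outliers`.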

To further show the effectiveness of the suggested approach, the Performance Gain ($pg_{METRIC}$) of each performance metric ($METRIC$) is calculated according to equation (8) below³⁰ using the 25 simulation trials:

$$pg_{METRIC} = \frac{METRIC^{Benchmark} - METRIC^{Proposed}}{METRIC^{Benchmark}} \times 100\% \tag{8}$$

In practice, the performance gain describes the improvement in the prediction achieved by the proposed approach compared to the benchmark approach for each of the three performance metrics: positive values mean that the proposed approach outperforms the benchmark approach and vice versa.³⁰
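Equation (8) is a relative reduction of each metric. The sketch below applies it to the overall row and to the winter $RMSE$ of Table 4 as a consistency check on the reported means; `performance_gain` is a hypothetical helper name. Because the paper computes the gains over the 25 simulation trials, gains recomputed from averaged metrics need not match every entry of Table 4 exactly.

```python
def performance_gain(benchmark: float, proposed: float) -> float:
    """Equation (8): positive when the proposed approach has the lower metric."""
    return (benchmark - proposed) / benchmark * 100.0


# Overall row of Table 4: (benchmark, proposed)
overall = {"RMSE": (16.7351, 15.2452),
           "MAE": (12.8068, 11.7885),
           "WMAE": (0.1957, 0.1759)}
gains = {name: performance_gain(b, p) for name, (b, p) in overall.items()}
print({name: round(g, 1) for name, g in gains.items()})
# {'RMSE': 8.9, 'MAE': 8.0, 'WMAE': 10.1}
```

Rounded to whole percentages, these reproduce the ~9%, ~8%, and ~10% overall gains reported in Table 4.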

Figure 6 shows the performance gains calculated on the test datasets for the three performance metrics over the 25 simulation trials. One can see that the proposed approach enhances the production predictions (i.e. positive gain values) for the three performance metrics for most hours of the day with respect to the benchmark. On average, the enhancements reach around 5% (for the $RMSE$, Figure 6(a)) and 6.5% (for the $MAE$ and $WMAE$, Figure 6(b) and (c), respectively) (dashed lines) compared to the benchmark approach.³⁰

Figure 6.

The performance gains of the (a) RMSE, (b) MAE and (c) WMAE performance metrics obtained by the proposed approach and the benchmark approach on the test datasets over the 25 simulation trials.

In addition, the computational times required by the proposed approach and the benchmark approach on the test datasets over the 25 simulation trials are 8.32 and 1.02 min, respectively. Thus, moving from a single ANN model to an ensemble of ANN models at each hour $h$ increases the computational effort roughly eightfold (8.32/1.02 ≈ 8.2). Still, the computational effort required by the proposed approach is acceptable for the PV system owner for online 1-day ahead prediction, considering the gains in prediction accuracy and stability that the proposed approach achieves over the benchmark approach.

To further clarify the effectiveness of the proposed approach relative to the benchmark approach, Figure 7 shows the box plots of the RMSE distributions (as a representative of the other two performance metrics) computed over the 25 simulation trials on the test datasets for each season (i.e. Winter (Figure 7(a)), Spring (Figure 7(b)), Summer (Figure 7(c)), and Autumn (Figure 7(d))). By observing Figure 7, one can recognize the following:

The proposed approach provides more accurate predictions than the benchmark approach at each hour $h$ , $h \in [8, 20]$ in each season, that is, lower RMSE values as represented by the median statistical measure shown inside the box;

The proposed approach provides more stable predictions than the benchmark approach at each hour $h$, $h \in [8, 20]$, in each season, that is, smaller boxes, which indicate less variation among the RMSE values. In contrast, the benchmark approach produces bigger boxes and more outliers, that is, less stable prediction results.

Figure 7.

Box plots of the RMSE performance metric distributions obtained by the proposed approach and benchmark approach at each season of the test datasets using the 25 simulation trials: (a) winter, (b) spring, (c) summer, and (d) autumn.

Specifically, the superiority of the proposed approach can be clearly recognized by computing the achieved performance gains for the 1-day ahead solar PV power production prediction in each season. In this regard, Table 4 reports the performance gains of the three performance metrics computed for each season in the test datasets as well as for the entire test datasets. Looking at Table 4, one can notice the following:

Throughout the four seasons, and hence across the full test datasets, the suggested approach significantly outperforms the benchmark approach for the three performance indicators;

The suggested approach achieves the highest performance gains using the three performance metrics during the winter season followed by autumn, summer, and spring;

Over the whole test datasets, the suggested approach achieves performance gains of around 9%, 8%, and 10% over the benchmark approach for the RMSE, MAE, and WMAE performance metrics, respectively.

Table 4.

Seasonal 1-day ahead prediction comparison for the proposed and benchmark approaches.

Season	$RMSE$ [kW] ($pg_{RMSE}$ [%])		$MAE$ [kW] ($pg_{MAE}$ [%])		$WMAE$ ($pg_{WMAE}$ [%])	
	Benchmark	Proposed	Benchmark	Proposed	Benchmark	Proposed
Winter	13.8744 (0)	11.3298 (~18)	10.2533 (0)	8.6896 (~15)	0.2678 (0)	0.2271 (~15)
Spring	21.3992 (0)	20.4058 (~5)	16.2920 (0)	15.6940 (~4)	0.1832 (0)	0.1758 (~4)
Summer	18.4078 (0)	17.5214 (~5)	14.5685 (0)	13.6213 (~7)	0.1312 (0)	0.1230 (~6)
Autumn	14.0380 (0)	12.8038 (~9)	10.8135 (0)	10.0054 (~8)	0.1754 (0)	0.1598 (~9)
Overall	16.7351 (0)	15.2452 (~9)	12.8068 (0)	11.7885 (~8)	0.1957 (0)	0.1759 (~10)

For clarification purposes, Figure 8 shows the production predictions obtained by the proposed approach (circles) and by the benchmark approach (diamonds) in one of the best simulation trials for four different days (one from each season in the test datasets), together with the actual productions (squares). In addition, Table 5 reports the corresponding performance gains computed for the three performance metrics on these particular days. One can notice the closer match (i.e. higher prediction accuracy and, thus, higher performance gain) between the proposed approach’s predictions and the actual productions, compared with the benchmark approach’s predictions, on all four days.

Figure 8.

The power production predictions obtained by the proposed and the benchmark prediction approaches for four different days selected from the test datasets: (a) winter, (b) spring, (c) summer, and (d) autumn.

Table 5.

The corresponding performance gains computed for the three performance metrics on the four different days.

Day	$pg_{RMSE}$ [%]	$pg_{MAE}$ [%]	$pg_{WMAE}$ [%]
Day 1 (19 December 2017)	67.0121	67.6619	67.6619
Day 2 (15 March 2018)	33.3573	29.4374	29.4374
Day 3 (7 June 2018)	17.3157	17.8602	17.8602
Day 4 (3 November 2018)	38.3905	35.8190	35.8190
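The identical $pg_{MAE}$ and $pg_{WMAE}$ columns of Table 5 follow from equations (6)–(8). When both approaches are scored on the same set of $N$ patterns with actual productions $P^{j}$, the normalizations by $N$ and by $\sum_{j} P^{j}$ cancel in the ratio. Writing $e_{B}^{j} = P^{j} - \hat{P}_{B}^{j}$ and $e_{P}^{j} = P^{j} - \hat{P}_{P}^{j}$ for the benchmark and proposed errors:

$$pg_{WMAE} = \frac{\dfrac{\sum_{j}|e_{B}^{j}|}{\sum_{j}P^{j}} - \dfrac{\sum_{j}|e_{P}^{j}|}{\sum_{j}P^{j}}}{\dfrac{\sum_{j}|e_{B}^{j}|}{\sum_{j}P^{j}}} \times 100\% = \frac{\sum_{j}|e_{B}^{j}| - \sum_{j}|e_{P}^{j}|}{\sum_{j}|e_{B}^{j}|} \times 100\% = \frac{\dfrac{\sum_{j}|e_{B}^{j}|}{N} - \dfrac{\sum_{j}|e_{P}^{j}|}{N}}{\dfrac{\sum_{j}|e_{B}^{j}|}{N}} \times 100\% = pg_{MAE}$$

The seasonal gains in Table 4 differ slightly between $MAE$ and $WMAE$ (e.g. summer, ~7% vs. ~6%), which suggests that those values are aggregated differently, for instance per hour or per trial before averaging; the text does not state the aggregation.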

Conclusions and future recommendations

This work proposes an adaptive approach-based ensemble for the 1-day ahead production prediction of solar Photovoltaic (PV) systems. The proposed prediction approach entails building ensembles of Artificial Neural Networks (ANNs) at each hour h of a day, h ∈ [8, 20], using the corresponding training datasets. Then, the built ensembles are optimized using the related validation datasets in terms of (1) the number of ANNs that constitute the ensembles (i.e. architecture) and (2) the number of hidden nodes required by the ANN models of the ensembles at each hour h (i.e. configuration). Thus, the proposed prediction approach is adaptive. Finally, the suggested prediction methodology is verified on a real case study of a 264 kW solar PV system installed at Applied Science Private University (Amman, Jordan). Three standard, well-known performance metrics were computed to evaluate the effectiveness of the proposed approach, namely the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the Weighted MAE (WMAE). Compared with another prediction approach from the literature that uses single ANNs at each hour h of the day, the proposed approach showed, for instance:

An improved prediction accuracy, characterized by lower RMSE, MAE, and WMAE values (Figure 5) and positive corresponding performance gains (Figure 6) on the test datasets. Specifically, the overall improvement in prediction accuracy is around 9%, 8%, and 10% for the RMSE, MAE, and WMAE performance metrics, respectively, on the test datasets for the 1-day ahead prediction (Table 4);

More stable predictions, characterized by smaller boxes with fewer outliers in the distributions of the three performance metrics over the 25 simulation trials on the test datasets (Figure 5).

Furthermore, the effectiveness of the proposed approach was demonstrated under the different weather conditions (i.e. seasons) experienced by the ASU PV system under study. The proposed approach outperforms the benchmark approach for the three performance indicators across all four seasons. The highest performance gains were obtained for the winter season (about 18%, 15%, and 15% for the RMSE, MAE, and WMAE, respectively), whereas the lowest performance gains were obtained for the spring season (about 5%, 4%, and 4% for the RMSE, MAE, and WMAE, respectively).

Additional diversity injection techniques will be investigated in the future to improve the efficacy of the suggested adaptive ensemble approach in the context of solar PV power production prediction, for example, by using a mixture of different types of base prediction models that constitute the ensembles.

Footnotes

Appendix

Acknowledgements

The authors would like to acknowledge the Renewable Energy Center at the Applied Science Private University for sharing with us the Solar PV data. The authors would like to thank the reviewers for their valuable comments to improve the quality of the paper.

Handling Editor: Chenhui Liang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Mohammad Alrbai

Data availability

The data that support the findings of this work are available from the Renewable Energy Center at Applied Science Private University (ASU), but restrictions apply to the availability of these data, which were used under license for the current work, and so are not publicly available. However, data are available from the authors upon reasonable request and with permission of the Renewable Energy Center at ASU.

References

1. Red Eléctrica de España (RED). The Spanish Electricity System 2017. Madrid: External Communication Department, Department of Access to Information on the Electricity System, 2018, www.ree.es/en

2. Reddy, Bijwe, Abhyankar. Joint energy and spinning reserve market clearing incorporating wind power and load forecast uncertainties. IEEE Syst J 2015; 9: 152–164.

3. Reddy, Abhyankar, Bijwe. Market clearing for a wind-thermal power system incorporating wind generation and load forecast uncertainties. In: IEEE power and energy society general meeting, San Diego, CA, USA, pp.1–8. IEEE.

4. Wang, Dong, et al. Renewable Energy and economic growth: new insight from country risks. Energy 2022; 238: 122018.

5. Wang, Zhang, Rezazadeh. Hydrogen fuel and electricity generation from a new hybrid energy system based on wind and solar energies and alkaline fuel cell. Energy Rep 2021; 7: 2594–2604.

6. Gupta, Singh. PV power forecasting based on data-driven models: a Review. Int J Sustain Eng 2021; 14: 1733–1755.

7. Aslam, Herodotou, Mohsin, et al. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renew Sustain Energ Rev 2021; 144: 110992.

8. Qadir, Khan, Khalaji, et al. Predicting the energy output of hybrid PV–Wind renewable energy system using feature selection technique for smart grids. Energy Rep 2021; 7: 8465–8475.

9. Shivam, Tzou. A multi-objective predictive energy management strategy for residential grid-connected PV-battery hybrid systems based on machine learning technique. Energy Convers Manag 2021; 237: 114103.

10. Bhuiyan, Hossain, Muyeen, et al. Towards next generation virtual power plant: technology review and frameworks. Renew Sustain Energ Rev 2021; 150: 111358.

11. Lan, Liu, Wang, et al. An advanced machine learning based energy management of renewable microgrids considering hybrid electric vehicles’ charging demand. Energies 2021; 14: 569.

12. Di Piazza, La Tona, et al. An artificial neural network-based forecasting model of energy-related time series for electrical grid management. Math Comput Simul 2021; 184: 294–305.

13. Aman, Simmhan, Prasanna. Holistic measures for evaluating prediction models in smart grids. IEEE Trans Knowl Data Eng 2015; 27: 475–488.

14. Sobri, Koohi-Kamali, Rahim. Solar photovoltaic generation forecasting methods: a review. Energy Convers Manag 2018; 156: 459–497.

15. Wang, Lei, Zhang, et al. A review of deep learning for renewable energy forecasting. Energy Convers Manag 2019; 198: 111799.

16. Garud, Jayaraj, Lee. A review on modeling of solar photovoltaic systems using artificial neural networks, fuzzy logic, genetic algorithm and hybrid models. Int J Energy Res 2021; 45: 6–35.

17. Lan, Zhang, Hong, et al. Day-ahead spatiotemporal solar irradiation forecasting using frequency-based hybrid principal component analysis and neural network. Appl Energy 2019; 247: 389–402.

18. Antonanzas, Osorio, Escobar, et al. Review of photovoltaic power forecasting. Sol Energy 2016; 136: 78–111.

19. Yang, Zhao, et al. Kalman filter photovoltaic power prediction model based on forecasting experience. Front Energy Res 2021; 9: 1–9.

20. Lima MAFB, Fernández Ramírez, Carvalho PCM, et al. A comparison between deep learning and support vector regression techniques applied to solar forecast in Spain. J Sol Energy Eng 2022; 144: 010802.

21. Wang, Zhang, Zhao, et al. Photovoltaic system power forecasting based on combined grey model and BP neural network. In: 2011 international conference on electrical and control engineering, ICECE 2011, proceedings, Yichang, China, 2011, pp.4623–4626.

22. Das. Short term forecasting of solar radiation and power output of 89.6 kWp solar PV power plant. Mater Today Proc 2021; 39: 1959–1969.

23. Eniola, Suriwong, Sirisamphanwong, et al. Validation of genetic algorithm optimized hidden Markov model for short-term photovoltaic power prediction. Int J Renew Energ Res 2021; 11: 796–807.

24. Fathi, Parian. Intelligent MPPT for photovoltaic panels using a novel fuzzy logic and artificial neural networks based on evolutionary algorithms. Energy Rep 2021; 7: 1338–1348.

25. Ajayi, Heymann. Data centre day-ahead energy demand prediction and energy dispatch with solar PV integration. Energy Rep 2021; 7: 3760–3774.

26. Liu, Feng, et al. An improved whale algorithm for support vector machine prediction of photovoltaic power generation. Symmetry 2021; 13: 212.

27. Sharma, Mangla, Yadav, et al. A sequential ensemble model for photovoltaic power forecasting. Comput Electr Eng 2021; 96: 107484.

28. Dec, Drałus, Mazur, et al. Forecasting models of daily energy generation by PV panels using fuzzy logic. Energies 2021; 14: 1676.

29. Benali, Notton, Fouilloy, et al. Solar radiation forecasting using artificial neural network and random forest methods: application to normal beam, horizontal diffuse and global components. Renew Energy 2019; 132: 871–884.

30. Al-Dahidi, Louzazni, Omran. A local training strategy-based artificial neural network for predicting the power production of solar photovoltaic systems. IEEE Access 2020; 8: 150262–150281.

31. Kushwaha, Pindoriya. Very short-term solar PV generation forecast using SARIMA model: a case study. In: 2017 7th international conference on power systems, ICPS 2017, Pune, India, 2017, pp.430–435. IEEE.

32. AlShafeey, Csáki. Evaluating neural network and linear regression photovoltaic power forecasting models based on different input methods. Energy Rep 2021; 7: 7601–7614.

33. Khan, Shaikh, Siddiqui, et al. Hourly forecasting of solar photovoltaic power in Pakistan using recurrent neural networks. Int J Photoenergy 2022; 2022: 1–11.

34. Durrani, Balluff, Wurzer, et al. Photovoltaic yield prediction using an irradiance forecast model based on multiple neural networks. J Mod Power Syst Clean Energy 2018; 6: 255–267.

35. Zhong, Liu, Sun, et al. Prediction of photovoltaic power generation based on general regression and back propagation neural network. Energy Proc 2018; 152: 1224–1229.

36. Kumar, Saravanakumar, Karthick, et al. Artificial neural network-based output power prediction of grid-connected semitransparent photovoltaic system. Environ Sci Pollut Res 2022; 29: 10173–10182.

37. Sharadga, Hajimirza, Balog. Time series forecasting of solar power generation for large-scale photovoltaic plants. Renew Energy 2020; 150: 797–807.

38. Zhou, Gao, et al. Short-term power generation forecasting of a photovoltaic plant based on PSO-BP and GA-BP neural networks. Front Energy Res 2022; 9: 824691.

39. Khademi, Moadel, Khosravi. Power prediction and technoeconomic analysis of a solar PV power plant by MLP-ABC and COMFAR III, considering cloudy weather conditions. Int J Chem Eng 2016; 2016: 1–8.

40. Ahmed, Sreeram, Mishra, et al. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renew Sustain Energ Rev 2020; 124: 109792.

41. Al-Dahidi, Ayadi, Adeeb, et al. Extreme learning machines for solar photovoltaic power predictions. Energies 2018; 11: 2725.

42. Al-Dahidi, Ayadi, Alrbai, et al. Ensemble approach of optimized artificial neural networks for solar photovoltaic power prediction. IEEE Access 2019; 7: 81741–81758.

43. Guermoui, Melgani, Gairaa, et al. A comprehensive review of hybrid models for solar radiation forecasting. J Clean Prod 2020; 258: 120357.

44. Kaffash, Deconinck. Ensemble machine learning forecaster for day ahead PV system generation. In: Proceedings of 2019 the 7th international conference on smart energy grid engineering, SEGE 2019, Oshawa, ON, Canada, 2019, pp.92–96.

45. Khan, Walker, Zeiler. Improved solar photovoltaic energy generation forecast using deep learning-based ensemble stacking approach. Energy 2022; 240: 122812.

46. Tiwari, Sabzchgar, Rasouli. Short term solar irradiance forecast using numerical weather prediction (NWP) with gradient boost regression. In: 2018 9th IEEE international symposium on power electronics for distributed generation systems, PEDG 2018, Charlotte, NC, USA, 2018, pp.1–8.

47. Bacher, Madsen, Nielsen. Online short-term solar power forecasting. Sol Energy 2009; 83: 1772–1783.

48. Zhen, Pang, Wang, et al. Pattern classification and PSO optimal weights based sky images cloud motion speed calculation method for solar PV power forecasting. IEEE Trans Ind Appl 2019; 55: 3331–3342.

49. Kim, Kang, et al. Toward improved solar irradiance forecasts: comparison of the global horizontal irradiances derived from the COMS satellite imagery over the Korean Peninsula. Pure Appl Geophys 2017; 174: 2773–2792.

50. Yang, Gao, Hua, et al. Very short-term surface solar irradiance forecasting based on FengYun-4 geostationary satellite. Sensors 2020; 20: 2606.

51. Ayet, Tandeo. Nowcasting solar irradiance using an analog method and geostationary satellite images. Sol Energy 2018; 164: 301–315.

52. Martínez-Chico, Batlles, Bosch. Cloud classification in a mediterranean location using radiation data and sky images. Energy 2011; 36: 4055–4062.

53. Peng, Huang, et al. 3D cloud detection and tracking system for solar forecast using multiple sky imagers. Sol Energy 2015; 118: 496–519.

54. Wolff, Kühnert, Lorenz, et al. Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Sol Energy 2016; 135: 197–208.

55. Applied Science Private University (ASU). PV system ASU09: faculty of engineering, http://energy.asu.edu.jo/ (2021, accessed 19 December 2021).

56. Pintelas, Livieris (eds). Ensemble algorithms and their applications. Basel: MDPI, 2020.

57. Google Earth. Solar PV system and weather station installed at Applied Science Private University (ASU). Amman: Google Earth, 2021.

58. Matteri, Ogliari, Nespoli. Enhanced day-ahead PV power forecast: dataset clustering for an effective artificial neural network training. Eng Proc 2021; 5: 16.

59. Ogliari, Nespoli. Photovoltaic plant output power forecast by means of hybrid artificial neural networks. In: Mellit, Benghanem (eds) A practical guide for advanced methods in solar photovoltaic systems. Advanced Structured Materials. Cham: Springer, 2020, pp.203–222.

60. Reddy. Bat algorithm-based back propagation approach for short-term load forecasting considering weather factors. Elect Eng 2018; 100: 1297–1303. DOI: 10.1007/s00202-017-0587-2.

61. Reddy, Jung, Seog. Day-ahead electricity price forecasting using back propagation neural networks and weighted least square technique. Front Energy 2016; 10: 105–113.

62. Polikar. Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006; 6: 21–45.

63. Massart, Smeyers-Verbeke, Capron, et al. Visual presentation of data by means of box plots. LC-GC Europe 2005; 18: 215–218.

An adaptive approach-based ensemble for 1 day-ahead production prediction of solar PV systems
