Abstract
The world is becoming more reliant on renewable energy sources to satisfy its growing energy demand. The primary disadvantage of such sources is the significant uncertainty in their power production. As appropriate energy production planning and scheduling necessitate a solid and confident assessment of renewable power production, the need for reliable prediction models grows by the day. This paper proposes an adaptive approach-based ensemble for 1-day ahead production prediction of solar Photovoltaic (PV) systems. Different ensembles of Artificial Neural Network (ANN) prediction models are established, whose architectures (number of the ANNs that comprise the ensembles) and configurations (number of hidden nodes required by the ANN models of the ensembles) change adaptively at each hour h, h∈ [1, 24] of a day, to accommodate the hourly seasonality in the solar PV data and, thus, enhance the 1-day ahead prediction accuracy. The suggested approach is tested on a 264 kW solar PV system installed at Applied Science Private University, Jordan. Its prediction performance is evaluated, particularly for different weather conditions (seasons) experienced by the concerned PV system, using standard performance metrics. Results show the effectiveness of the suggested approach in predicting solar PV power production and its superiority compared to another prediction approach from the literature that uses single ANNs at each hour h of the day. Specifically, for 1-day ahead prediction, the accuracy enhancement obtained was, on average, around 8%–10% on the test “unseen” datasets.
Keywords
Introduction
Background
Energy has a significant influence on the life of humans, industry, agriculture, transportation, and communication. In general, the generated electricity from non-renewable resources, such as coal, crude oil, and natural gas, covers most energy demand. On the other hand, it is known that – in most of the developing countries – the renewable energy resources, for example, solar, wind, hydro, geothermal, and biomass, are not exploited sufficiently. Due to the population growth and rising living standards, the energy demand and consumption continuously rise, which represents an economic challenge, especially for the countries with limited energy sources. For example, Spain’s overall consumed electrical energy was 268,808 GWh in 2018, and it increased by 0.4% compared to 2017. 1
Moreover, in the last few years, the cost of energy demand in Jordan has been about 10% of its Gross Domestic Product (GDP). It is known that burning oil, natural gas, and coal causes water and air pollution and can cause serious health issues. Therefore, most countries are moving towards renewable energy resources, as they are clean, economically viable, and sustainable. Hence, the production, storage, and delivery of electricity generated from renewable energy resources have attracted significant attention in the last decade.2–4
Solar energy has recently been a rapidly developing technology compared to wind and other renewable energy resources. Several factors have a noticeable influence on the electrical power generated from Photovoltaic (PV) plants, for example, solar irradiation, wind speed, and ambient temperature. 5 Hence, the amount of electricity generated from the PV modules is variable and intermittent, which makes it challenging to predict the power production from the plant.6,7 The integration of PV systems into electrical networks as distributed generators has created challenges for operators due to the generation intermittency and the variability of the weather conditions. Therefore, reliable prediction reduces the deviations between the PV power plant’s expected and generated power. The accuracy of predicting the PV output power is essential for several applications, such as Smart Grids (SGs), 8 load management, 9 the virtual power plant’s reliability, 10 and the charging of electric vehicles. 11 Therefore, it is essential to predict the PV generation in the SG application at different time horizons.7,12 The prediction of PV SG applications can be divided into three types based on the time horizon: the short-term (10–30 min), the middle-term (1 h to a few hours), and the long-term (24 h to 2 weeks). 13 The short-term prediction significantly limits the voltage rise on the load side and suppresses the grid’s power variation. In contrast, middle-term prediction is essential to balance the power production and load demand besides scheduling the charging of electric vehicles. Finally, long-term prediction plays a vital role in load scheduling and dispatching.
Literature review
Various methods have appeared in the state of the art for predicting solar PV generation. These methods can be categorized into statistical, physical, and ensemble methods.14–16 The statistical methods rely on reconstructing the relations between the former meteorological parameters and hourly irradiance, which does not involve determining the system’s internal state information for modeling.17,18 Further, the statistical approaches have been classified into time series and learning methods. The time series methods include the Kalman filter, 19 Support Vector Regression (SVR), 20 Grey Forecasting Method, 21 Auto-Regressive Integrated Moving Average (ARIMA), 22 and Hidden-Markov Models (HMM). 23 The learning methods comprise the Artificial Neural Network (ANN),24,25 Support Vector Machine (SVM), 26 Wavelet Analysis (WA), 27 and Fuzzy Logic (FL). 28 The ANN is deemed one of the most popular statistical methods adopted to predict the PV generation with a prediction horizon of 24-h ahead.29,30
For example, Kushwaha and Pindoriya 31 adopted the Seasonal ARIMA (SARIMA) model for multi-step ahead power production prediction of solar PV systems installed in IIT Gandhinagar University, India. Results showed that the SARIMA outperformed the persistence model, but its performance degraded on cloudy or rainy days, making it not suitable for very short-term prediction in such weather conditions. AlShafeey and Csáki 32 investigated the capability and compared the performance of Multiple Linear Regression (MLR) and ANN for 1-day ahead power production prediction considering structural, time-series, and hybrid input data methods for a 546 kWp grid-connected solar PV farm located in Hungary. According to the findings, the models obtained different prediction accuracy depending on the inputs being used.
Further, Machine Learning (ML) was deployed to predict the PV generation; this approach depends on Artificial Intelligence (AI) to learn from previous historical data to strengthen its prediction capabilities during the training stage. Powerful computers are needed to run many iterations before a final prediction can be achieved. It can distinguish impossible representations without any predetermined equations. Besides the previous methods, Recurrent Neural Network (RNN), 33 Feed-Forward Neural Network (FFNN), 34 and Feed-Back Neural Network (FBNN) have been deployed to predict the PV generation at various time horizons. 35 For example, Kumar et al. 36 developed three real-time prediction models, namely the Elman Neural Network, FFNN, and Generalized Regression Neural Network (GRNN), for the short-term power production prediction of a Semi-Transparent PV (STPV) system. The three developed models used the ambient temperature, solar irradiance, and wind speed as the input parameters to forecast the output power for an STPV system in India. The final results revealed a small error between the forecast and actual output power with a Root Mean Square Error (RMSE) of 0.25, 0.30, and 0.426 for the three investigated models, respectively. Sharadga et al. 37 presented and compared several time-series statistical (e.g. SARIMA) and AI-based (e.g. Neural Networks (NNs)) models for the hourly solar PV power output prediction of a large-scale 20 MW grid-connected solar PV station located in China. For time series prediction of solar PV power production, the results revealed that NNs were more accurate and required less computational effort than statistical models. However, the authors concluded that the NNs and statistical prediction models were both superior for 1-h forecasting without using any meteorological parameters as inputs.
Li et al. 38 individually coupled two powerful optimization techniques, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), with the ANN model for the short-term power output prediction of a solar PV plant located in China. Specifically, the two techniques were investigated to verify if they could overcome the restrictions of the classic Back-Propagation (BP) learning algorithm and effectively adjust the ANN’s internal parameters (i.e. weights and biases) to improve prediction outcomes. According to the findings, the PSO-BP outperformed the classic BP-ANN and GA-BP. Moreover, Khademi et al. 39 used the Multi-Layer Perceptron (MLP) supported with the Artificial Bee Colony (ABC) method to predict the power production of a 3.2 kW solar PV plant. The collected data was divided into sunny and cloudy days, each being used to build the MLP-ABC model. Results showed that separation of the data into different weather conditions enhanced the production prediction accuracy compared to that obtained by the same model built on the entire dataset.
The Extreme Learning Machine (ELM) is one of the ML prediction and statistical methods, distinguished from ANNs by its faster training and simpler implementation. Further details on the ELM method can be found in Ahmed et al. 40 For instance, the ELM was employed to predict the 24 h-ahead solar PV power production. 41 The developed prediction approach was optimized concerning the number of hidden nodes and historical weather conditions (i.e. solar irradiations and ambient temperatures). Results showed that the ELM model provided slightly more accurate predictions with less computational burden than the BP-ANN model.
The low prediction accuracy is one of the shortcomings of the statistical methods. Therefore, the combination of two or more prediction approaches was proposed to minimize the error and improve the prediction accuracy. 42 This approach is known as the hybrid prediction method (i.e. an ensemble of models). For more details, the readers may refer to Guermoui et al. 43 For example, Kaffash and Deconinck 44 proposed a prediction approach based on the combination of ANN and SVR for 1-day ahead PV power generation prediction without using any exogenous inputs. To this aim, the Random Forest (RF) technique was employed for selecting the relevant historical data (i.e. feature selection). Results showed that the prediction accuracy improved relative to each individual base model (ANN and SVR). Khan et al. 45 proposed an improved, generally applicable stacked ensemble algorithm of ANN and Long Short-Term Memory (LSTM) as base models, whose predictions were aggregated using an Extreme Gradient Boosting (XGBoost) algorithm, to enhance the predictability of the solar PV production. Results showed that the proposed ensemble approach outperformed each base model (ANN and LSTM) and the Bagging ensemble learning method on two different case studies.
In addition to the statistical and hybrid approaches, the physical approach presented in Tiwari et al. 46 and Bacher et al. 47, known as the Numerical Weather Prediction (NWP) approach, was applied to predict the PV generation by solving mathematical equations based on the meteorological variables, that is, temperature, humidity, pressure, and wind. Further, detecting the clouds and estimating their behavior are used to predict the PV generation, an approach known as the Sky Imaginary Forecasting Method (SIFM). One of the methods used to predict the PV plant’s generation relies on Satellite Images (SI) to trace the cloud motion. Still, this approach lacks high performance due to the clouds’ capability to reform and scatter quickly.48–53 The SI method can be categorized into global and mesoscale physical methods, depending on the covered area of the atmosphere in the simulation. Generally, the physical method is mainly used for forecasting the PV generation with very short to long time horizons, and one of its disadvantages is that its performance is reliable only when the weather conditions are stable. 54
Original contributions
This work aims to develop an adaptive prediction approach-based ensemble of data-driven models for solar PV power production prediction. More specifically, the proposed approach is structured in two phases: (i) partitioning the solar PV system dataset into
The capability of the proposed approach is tested on a solar PV system located at the rooftop of the Faculty of Engineering at the Applied Science Private University (ASU) (Amman, Jordan) 55 and compared with the benchmark prediction approach of Al-Dahidi et al. 30 The comparison is carried out using standard performance metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and its Weighted version (WMAE). In addition, the prediction stability/variability provided by the two prediction approaches has also been studied by examining various statistical measures of the performance metrics distributions obtained over different simulation trials.
Thus, the original contributions of this work are:
The development of an adaptive prediction approach based on ensembles of ANN models. The adaptivity entails building, optimizing, and evaluating ensembles of ANN models locally at each cluster/hour
Comparing the proposed prediction approach with a benchmark approach based on a single ANN that has been built, optimized, and evaluated at each hour
The remainder of this paper is structured as follows. Section “Problem formulation and work objectives” formulates the problem and defines the work objectives. Section “Case study and methodology” presents the real case study being investigated in this work and illustrates the proposed prediction approach. Section “Results and discussion” shows the application results of the proposed approach on the real case study and compares them with the results obtained by the benchmark prediction approach from the literature. Finally, section “Conclusions” concludes the work and draws some future recommendations.
Problem formulation and work objectives
For a solar PV system, let us assume that we have the following information collected at each hour
The time stamp, represented by the hour counter (
The weather conditions experienced by the PV system, represented by the wind speed (
The associated power productions (
The overall available dataset could be structured in a matrix
Due to the inherent hourly seasonality appearing in the solar PV dataset, the objective of this work is to accommodate such seasonality/variability for ultimately enhancing the 1-day ahead solar PV power production prediction performance with convenient computational effort. To this aim, the available dataset,
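The hourly partitioning described above can be illustrated with a minimal sketch. It assumes each record is a tuple of the hour counter (1 to 24), the four weather inputs (wind speed, relative humidity, ambient temperature, solar irradiation), and the associated power production. The function name `partition_by_hour` and the sample values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def partition_by_hour(records):
    """Group (hour, features, power) records into one dataset per hour of the day."""
    hourly = defaultdict(list)
    for hour, features, power in records:
        if not 1 <= hour <= 24:
            raise ValueError(f"hour must be in 1..24, got {hour}")
        hourly[hour].append((features, power))
    return dict(hourly)


# Hypothetical records:
# (hour, (wind speed, relative humidity, ambient temperature, irradiation), power)
records = [
    (12, (3.1, 40.0, 25.0, 900.0), 180.0),
    (13, (2.8, 38.0, 26.0, 950.0), 190.0),
    (12, (4.0, 55.0, 18.0, 600.0), 120.0),
]
hourly = partition_by_hour(records)
```

Each entry of `hourly` can then be treated as an independent dataset on which a prediction model, or an ensemble of models, is built for that hour only.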
Case study and methodology
In this Section, the real case study of a solar PV system investigated in this work and the proposed methodology are illustrated in section “Case study” and section “Methodology,” respectively.
Case study
This Section illustrates the case study used to prove the proposed prediction approach’s effectiveness. The solar PV system installed at the Applied Science Private University (ASU) on the rooftop of the Faculty of Engineering (ASU09) is studied (Figure 1). 55

Figure 1. Solar PV system installed on the rooftop of the Faculty of Engineering (ASU09) and the ASU weather station.
The dataset being used comprises hourly historical weather conditions and the associated power productions collected from 16th May 2015 to 31st December 2018 (i.e.
The overall dataset is recorded in a matrix
Negative solar irradiations (due to offset in the solar irradiation sensors) and missing associated power productions (due to inverter failures) were recognized in the early and late hours of the day. Thus, they were set to zero;
Missing weather conditions (due to sensor failures) and power productions (due to inverter failures or network disruptions) were recognized in some midday hours. Thus, they were excluded from the analysis;
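The two pre-processing rules above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the record layout, the field names, and the daylight window (`day_start=8`, `day_end=20`) used to separate early and late hours from midday hours are assumptions. Dropping early or late records whose weather readings are missing is also an assumption, since the text does not address that case.

```python
WEATHER_KEYS = ("wind_speed", "humidity", "temperature", "irradiation")


def preprocess(records, day_start=8, day_end=20):
    """Apply the zero-filling and exclusion rules to hourly records.

    day_start and day_end are assumed bounds of the daylight window.
    """
    cleaned = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is left untouched
        outside_daylight = rec["hour"] < day_start or rec["hour"] > day_end
        if outside_daylight:
            # Sensor offset: negative irradiation is set to zero.
            if rec["irradiation"] is not None and rec["irradiation"] < 0:
                rec["irradiation"] = 0.0
            # Inverter not producing: missing power is set to zero.
            if rec["power"] is None:
                rec["power"] = 0.0
        # Any record that still has a gap is excluded from the analysis.
        has_gap = rec["power"] is None or any(rec[k] is None for k in WEATHER_KEYS)
        if not has_gap:
            cleaned.append(rec)
    return cleaned


# Hypothetical raw records
raw = [
    {"hour": 5, "wind_speed": 2.0, "humidity": 60.0, "temperature": 12.0,
     "irradiation": -2.0, "power": None},
    {"hour": 12, "wind_speed": 3.0, "humidity": 40.0, "temperature": None,
     "irradiation": 900.0, "power": 150.0},
    {"hour": 12, "wind_speed": 3.0, "humidity": 40.0, "temperature": 25.0,
     "irradiation": 900.0, "power": 180.0},
    {"hour": 13, "wind_speed": 3.0, "humidity": 40.0, "temperature": 26.0,
     "irradiation": 950.0, "power": None},
]
cleaned = preprocess(raw)
```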
For illustration purposes, Figure 2 shows the pre-processed weather conditions and the associated power productions of four particular days representative of the four seasons (shown in different colors). Notice the large variability that appears at different hours of a day (i.e. hourly seasonality) in addition to that appearing across the different days of the seasons (i.e. annual seasonality). This implies that building a prediction model capable of handling such variability will be beneficial in improving prediction accuracy and stability.6,30
Lastly, the overall dataset

Figure 2. The hourly weather conditions: (a) wind speed, (b) relative humidity, (c) ambient temperature, (d) solar irradiation and (e) the associated power productions for four different days representative of the four seasons.
where
It is worth mentioning here that the dataset used in this work to develop the proposed adaptive ensemble-based prediction approach consists of the historical weather conditions (collected from a nearby weather station) experienced by the PV system under study and the associated power productions. For 1-day ahead prediction, one requires as inputs the weather conditions to be experienced by the PV system on the next day. To this aim, one could utilize the NWP data and/or forecast the weather conditions of the next day by resorting to data-driven techniques from the literature. However, the intention of this work is solely to investigate the capability of the ensemble of models built, optimized, and evaluated locally at each hour
Methodology
This Section presents the suggested prediction approach for the 1-day ahead solar PV system power production. As shown in Figure 3, the proposed approach can be structured in two phases: In Phase 1, the overall available dataset

Figure 3. The proposed adaptive prediction approach-based ensemble of data-driven models.
Once the
Thus, the established datasets are divided into three different datasets at each hour
Training datasets (
Validation datasets (
Test datasets (
In detail:
It is worth mentioning that diversity among the base models of each
Each

Figure 4. The configuration of the ANN base prediction model.
In summary, the predicted
where
The internal parameters of the
• The number of base models of each ensemble needs to be optimized (i.e., the architectures). The optimum number of the base models (
• The number of hidden nodes of the ANNs models that comprise the different ensembles needs to be optimized (i.e. the configurations). The optimum number of the hidden nodes (
Different architectures and configurations are built using the training datasets and evaluated using the validation datasets to select the optimum architecture and configuration adaptively at each hour
Small values of these metrics indicate the superiority of the obtained predictions and vice versa. Then, various statistical measures (across the 25 simulation trials) are calculated to evaluate the prediction accuracy and stability of the proposed approach. The optimum configuration and architecture (
On the other hand, the benchmark approach entails building
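To make the evaluation and selection steps of this Section concrete, the following minimal sketch computes the three performance metrics, aggregates the outputs of the base models, and selects the optimum architecture and configuration from validation results. Several points are assumptions rather than details confirmed by the text: the WMAE is taken as the total absolute error normalized by the total actual production, the ensemble output is taken as the simple average of the base model outputs, and the selection criterion is the lowest validation RMSE averaged over the simulation trials. All function names and numerical values are hypothetical.

```python
import math
from statistics import mean


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def wmae(actual, predicted):
    """Weighted MAE (assumed definition: total absolute error / total actual)."""
    total = sum(actual)
    if total == 0:
        raise ValueError("total actual production must be non-zero")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / total


def ensemble_predict(base_predictions):
    """Aggregate base model outputs per sample (assumed: simple average)."""
    return [mean(sample) for sample in zip(*base_predictions)]


def select_optimum(validation_rmse):
    """Pick the (n_models, n_hidden) pair with the lowest mean validation RMSE.

    validation_rmse maps (n_models, n_hidden) to a list of RMSE values,
    one per simulation trial.
    """
    return min(validation_rmse, key=lambda key: mean(validation_rmse[key]))


# Hypothetical validation data for a single hour
actual = [100.0, 200.0]
base_outputs = [[100.0, 200.0], [120.0, 180.0]]  # two base models
predicted = ensemble_predict(base_outputs)

# Hypothetical validation RMSE per candidate, over two trials
candidates = {(5, 10): [1.0, 1.2], (10, 15): [0.9, 1.0]}
best = select_optimum(candidates)
```

In the adaptive setting, `select_optimum` would be called once per hour, so that each hourly ensemble receives its own number of base models and hidden nodes.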
Results and discussion
The application results of the proposed approach (section “Methodology”) on the ASU case study (section “Case study”) are here presented and compared with the results obtained by the benchmark. 30
It is important to state here that the early morning hours (
Application results of the proposed approach
At each hour
Table 1 reports the optimum number of base models (
The optimum configurations and architectures obtained on the validation datasets for the built ensembles from
Looking at Table 1, one can recognize that the inherent hourly variability of the solar PV power production requires building different ensembles characterized by different configurations and architectures (thus adaptive). In this way, each of the built ensembles will handle the inherent hourly variability of the solar PV power production for enhancing the corresponding hourly predictions. Notice that:
For the middle days hours (e.g.
For the late morning (e.g.
It is worth noting that an exhaustive searching procedure could be used to identify different optimum numbers of hidden nodes for the base models that constitute each
Table 2. The detailed characteristics of the ANN models of the built ensembles.
For replication purposes, Table 2 reports the detailed characteristics of the ANN models of the built ensembles. Once the proposed prediction approach is built and optimized, it is evaluated on the test datasets and compared with the benchmark prediction approach of Al-Dahidi et al. 30
Application results of the benchmark approach
The benchmark prediction approach entails building, optimizing, and evaluating single ANN models (whose detailed characteristics are reported in Table 2) at each hour
The optimum configurations (i.e.
The optimum configurations obtained on the validation datasets for the single ANN models built from
Notice that:
The proposed approach significantly outperforms the benchmark approach on the validation datasets (
The prediction performances obtained by both the proposed and the benchmark approaches are proportional to the variability level inherent in the solar PV data at hand (e.g. small and large variability entails small and large metrics’ values obtained at
Comparisons and discussions
Figure 5 shows the box plots of the distributions of the three performance metrics, that is, RMSE (Figure 5(a)), MAE (Figure 5(b)), and WMAE (Figure 5(c)), obtained by the proposed and benchmark approaches on the test datasets at each hour
The proposed approach provides more accurate predictions than the benchmark approach at each hour
The proposed approach provides more stable predictions than the benchmark approach at each hour

Figure 5. Hourly prediction comparison for the proposed approach and benchmark approach: box plot of (a) RMSE, (b) MAE, and (c) WMAE.
To further show the effectiveness of the suggested approach, Performance Gain (
In practice, the performance gain describes the improvement in the prediction achieved by the proposed approach compared to the benchmark approach for each of the three performance metrics: positive values mean that the proposed approach outperforms the benchmark approach and vice versa. 30
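As an illustration of the sign convention just described, the performance gain can be sketched as the relative reduction of a metric with respect to the benchmark, expressed in percent. This formula is an assumption consistent with the description above (positive when the proposed approach has the lower error) and is not a reproduction of the paper's own equation. The function name and the sample values are hypothetical.

```python
def performance_gain(metric_benchmark, metric_proposed):
    """Relative improvement (in %) of the proposed approach over the benchmark.

    Assumed form: 100 * (benchmark - proposed) / benchmark, so that a
    positive value means the proposed approach has the lower error.
    """
    if metric_benchmark == 0:
        raise ValueError("benchmark metric must be non-zero")
    return 100.0 * (metric_benchmark - metric_proposed) / metric_benchmark


# Hypothetical RMSE values for one hour
gain_better = performance_gain(10.0, 9.0)   # proposed approach is more accurate
gain_worse = performance_gain(10.0, 11.0)   # proposed approach is less accurate
```

The same function applies unchanged to the MAE and WMAE, since all three metrics are error measures for which smaller values are better.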
Figure 6 shows the performance gains calculated on the test datasets for the three performance metrics over the 25 simulation trials. One can see that the proposed approach enhances the production predictions (i.e., positive gain values) for the three performance metrics for most of the day hours with respect to the benchmark. On average, enhancements reach around 5% (for the

Figure 6. The performance gains of the (a) RMSE, (b) MAE and (c) WMAE performance metrics obtained by the proposed approach and the benchmark approach on the test datasets over the 25 simulation trials.
In addition, the computational efforts required by the proposed approach and the benchmark approach on the test datasets over the 25 simulation trials are 8.32 and 1.02 min, respectively. Thus, the transition from having a single ANN model to an ensemble of ANN models at each hour
To further clarify the effectiveness of the proposed approach relative to the benchmark approach, Figure 7 shows the box plots of the RMSE distributions (as a representative of the other two performance metrics) computed over the 25 simulation trials on the test datasets in each season (i.e. Winter (Figure 7(a)), Spring (Figure 7(b)), Summer (Figure 7(c)), and Autumn (Figure 7(d))). By observing Figure 7, one can recognize the following:
The proposed approach provides more accurate predictions than the benchmark approach at each hour
The proposed approach provides more stable predictions than the benchmark approach at each hour

Figure 7. Box plots of the RMSE performance metric distributions obtained by the proposed approach and benchmark approach at each season of the test datasets using the 25 simulation trials: (a) winter, (b) spring, (c) summer, and (d) autumn.
Specifically, the superiority of the proposed approach can be clearly recognized by computing the achieved performance gains for the 1-day ahead solar PV power production prediction in each season. In this regard, Table 4 reports the performance gains of the three performance metrics computed for each season in the test datasets as well as for the entire test datasets. Looking at Table 4, one can notice the following:
Throughout the four seasons, and hence across the full test datasets, the suggested approach significantly outperforms the benchmark approach for the three performance indicators;
The suggested approach achieves the highest performance gains using the three performance metrics during the winter season followed by autumn, summer, and spring;
Across the whole test datasets, the suggested approach achieves around 9%, 8%, and 10% performance gains over the benchmark approach using the RMSE, MAE, and WMAE performance metrics, respectively.
Table 4. Seasonal 1-day ahead prediction comparison for the proposed and benchmark approaches.
For clarification purposes, Figure 8 shows the production predictions obtained by the proposed approach (circles) compared to those obtained by the benchmark approach in one of the best simulation trials (diamonds) for four different days (selected from the four different seasons in the test datasets) together with the actual productions (squares). In addition, Table 5 reports the corresponding performance gains computed for the three performance metrics of these particular days. One can notice the better matching (i.e. higher prediction accuracy and, thus, higher performance gain) between the proposed approach’s predictions and the actual productions compared to the benchmark approach’s predictions on the four different days.

Figure 8. The power production predictions obtained by the proposed and the benchmark prediction approaches for four different days selected from the test datasets: (a) winter, (b) spring, (c) summer, and (d) autumn.
Table 5. The corresponding performance gains computed for the three performance metrics on the four different days.
Conclusions and future recommendations
This work proposes an adaptive approach-based ensemble for the 1-day ahead production prediction of solar Photovoltaic (PV) systems. The proposed prediction approach entails building ensembles of Artificial Neural Networks (ANNs) at each hour h of a day, h∈ [8, 20] using the corresponding training datasets. Then, the built ensembles are optimized using the related validation datasets in terms of (1) the number of the ANNs that constitute the ensembles (i.e. architecture) and (2) the number of hidden nodes required by the ANN models of the ensembles at each hour h (i.e. configuration). Thus, the proposed prediction approach is adaptive. Finally, the suggested prediction methodology is verified on a real case study of a 264 kW solar PV system installed at Applied Science Private University (Amman, Jordan). Three standard, well-known performance metrics were computed to evaluate the effectiveness of the proposed approach, namely the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Weighted MAE (WMAE). In contrast to other prediction approaches from the literature that use single ANNs at each hour h of the day, the proposed approach showed, for instance:
An improved prediction accuracy characterized by lower RMSE, MAE, and WMAE values (Figure 5) and higher corresponding performance gains (Figure 6) on the test datasets. Specifically, an improved prediction accuracy reaches up to 8%, 9%, and 10% for the RMSE, MAE, and WMAE performance metrics, respectively, on the test datasets for the 1-day ahead prediction (Table 4);
More stable predictions (characterized by small boxes with a minimum number of outliers computed for the distributions of the three performance metrics over the entire 25 simulation trials on the test datasets (Figure 5)).
Furthermore, the effectiveness of the proposed approach was proved regarding the different weather conditions (i.e. seasons) experienced by the ASU PV system under study. The proposed approach significantly outperforms the benchmark approach for the three performance indicators across the four seasons. The highest performance gains were obtained for the winter season (with about 18%, 15%, and 15% for the RMSE, MAE, and WMAE, respectively), whereas the lowest performance gains were obtained for the spring season (with about 5%, 4%, and 4% for the RMSE, MAE, and WMAE, respectively).
Additional diversity injection techniques will be investigated in the future to improve the efficacy of the suggested adaptive ensemble approach in the context of solar PV power production prediction, for example, by using a mixture of different types of base prediction models that constitute the ensembles.
Footnotes
Appendix
Acknowledgements
The authors would like to acknowledge the Renewable Energy Center at the Applied Science Private University for sharing with us the Solar PV data. The authors would like to thank the reviewers for their valuable comments to improve the quality of the paper.
Handling Editor: Chenhui Liang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data availability
The data that support the findings of this work are available from the Renewable Energy Center at Applied Science Private University (ASU), but restrictions apply to the availability of these data, which were used under license for the current work, and so are not publicly available. However, data are available from the authors upon reasonable request and with permission of the Renewable Energy Center at ASU.
