Abstract
Photovoltaic power is now a major green energy resource, and the power it generates can be fed directly into the power grid. However, the stability of the power grid may be affected by the random and intermittent characteristics of photovoltaic power. To address this problem, a forecasting model based on deep belief nets is proposed. First, the factors affecting photovoltaic power generation are studied, including solar radiation intensity, air temperature, relative humidity, and wind speed. Based on the correlation coefficient between the output power and each factor, the most influential factors are determined and used as inputs for training the proposed forecasting model. Second, the forecasting model is established and applied to predict the photovoltaic output power for 2 weeks, one in summer and one in winter. The mean absolute percentage error, mean squared error, and Theil’s inequality coefficient are used to compare the performance of the proposed deep belief net model with that of a back propagation neural network model. The results show that the proposed deep belief net model significantly reduces the prediction errors and converges faster than the back propagation model.
Introduction
Nowadays, the scale of grid-connected photovoltaic (PV) systems is growing steadily, and the impact of grid-connected PV power generation on the grid is increasing accordingly.1,2 For this reason, accurate prediction of PV output power can effectively improve power grid dispatching, enhance system security and stability,3 and reduce the operating reserve.3–5
Recently, many methods have been reported for short-term prediction of PV output power.6 They can be divided into two categories. The first predicts the PV panel temperature, solar radiation intensity,7–10 and other factors related to PV power generation, and then feeds the predicted results into a physical PV model to obtain the PV output power.11 The second establishes a prediction model based on historical data, including output power, atmospheric temperature, and solar radiation, and forecasts the output power directly. The first type is thus an indirect forecasting method with a complex process, and its prediction accuracy is low. By contrast, the second type has been studied and applied widely.12–15
Tascikaraoglu et al.16 used three factors related to the PV output power for prediction: radiant intensity, ambient temperature, and wind speed. The output power prediction model was applied to the PV output power conversion. However, the prediction could not achieve a high accuracy because of the low correlation between the wind speed and the output power. Liu et al.17 introduced a new factor, the aerosol index, to improve PV power forecasting. However, the model easily falls into a local minimum and may also converge slowly. Giorgi et al.18 applied an Elman artificial neural network (ANN) to predict the PV output power. Although this method achieved a high prediction accuracy, it did not take into account the influence of weather and season, so the results may not be convincing. Zhong et al.19 combined particle swarm optimization (PSO) with a back propagation (BP) neural network to carry out the prediction. The prediction performance of the traditional BP algorithm was thus improved, but the calculation was too complex to be applied in practice. Shi et al.20 proposed a PV output power prediction model based on the support vector machine (SVM), using weather forecast data and actual power generation data from the PV system. However, it was unable to reach a good prediction result because of the complexity of parameter selection. Jang et al.21 presented a new prediction method based on satellite images and SVM. This method used satellite images of atmospheric motion to predict the movement of clouds, and the satellite image data were collected for SVM learning. Unfortunately, the intermittent and random behavior of the PV output power made the prediction uncertain. Neural networks and SVMs are shallow-structure algorithms, whose limited capacity makes it difficult to represent complex nonlinear functions effectively. It is therefore difficult to obtain accurate prediction results for the output power of a PV system, which is intermittent and random.
To address these problems in predicting the output power of PV systems, the deep belief network (DBN),22 which has developed rapidly in recent years, is adopted. The DBN is a deep learning algorithm, a class of algorithms with great potential for solving regression problems. It can approximate more complex functions through its deep nonlinear network structure and has a strong ability to learn the essential features of data sets from a small number of samples. CY Zhang et al.23 proposed a predictive deep Boltzmann machine (PDBM) method that predicts wind speed with high accuracy, demonstrating an effective processing capability for nonlinear time series. T Hirata et al.24 reported a method combining DBN with the autoregressive integrated moving average (ARIMA) model to predict time series, especially chaotic time series. Deep learning algorithms have also made breakthroughs in speech recognition, computer vision, and other classification problems. However, deep learning has seen relatively few applications in the field of PV output power prediction. Considering the intermittent and random characteristics of the output power, the DBN algorithm is adopted here for its powerful nonlinear mapping ability to predict the output power of the PV generation system. In addition, two factors, weather type and month, are considered, and their influence on the prediction is analyzed.
DBN
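The DBN model is built by stacking restricted Boltzmann machines (RBMs), which are based on an energy function.25,26 For a given set of states (v, h), where v represents the visible layer and h represents the hidden layer, the energy function is defined as follows
$$E(v,h\mid\theta) = -\sum_{i=1}^{n_v} a_i v_i - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v}\sum_{j=1}^{n_h} v_i W_{ij} h_j \tag{1}$$
where $\theta = \{W_{ij}, a_i, b_j\}$ denotes the RBM parameters, $W_{ij}$ is the weight between visible unit i and hidden unit j, $a_i$ and $b_j$ are the biases of the visible and hidden units, and $n_v$ and $n_h$ are the numbers of visible and hidden units. The joint probability distribution of (v, h) is
$$P(v,h\mid\theta) = \frac{e^{-E(v,h\mid\theta)}}{Z(\theta)} \tag{2}$$
Z is a normalization factor, that is, a partition function, and its expression is shown as follows
$$Z(\theta) = \sum_{v,h} e^{-E(v,h\mid\theta)} \tag{3}$$
The probability distribution of v and h, namely, P(v) and P(h), also known as the likelihood function, can be expressed as follows
$$P(v\mid\theta) = \frac{1}{Z(\theta)}\sum_{h} e^{-E(v,h\mid\theta)} \tag{4}$$
$$P(h\mid\theta) = \frac{1}{Z(\theta)}\sum_{v} e^{-E(v,h\mid\theta)} \tag{5}$$
Assume that the state of all neurons in the visible layer is known; the activation probability of the jth neuron in the hidden layer can then be expressed as
$$P(h_j = 1\mid v,\theta) = \sigma\Big(b_j + \sum_{i=1}^{n_v} v_i W_{ij}\Big), \qquad \sigma(x) = \frac{1}{1+e^{-x}} \tag{6}$$
The probability of the whole hidden layer can be expressed as
$$P(h\mid v,\theta) = \prod_{j=1}^{n_h} P(h_j\mid v,\theta) \tag{7}$$
Similarly, when the state of all neurons in the hidden layer is known, the activation probability of the ith neuron in the visible layer can be expressed as
$$P(v_i = 1\mid h,\theta) = \sigma\Big(a_i + \sum_{j=1}^{n_h} W_{ij} h_j\Big) \tag{8}$$
The probability of the whole visible layer can be expressed as
$$P(v\mid h,\theta) = \prod_{i=1}^{n_v} P(v_i\mid h,\theta) \tag{9}$$
When the training sample is as follows
$$S = \{v^{1}, v^{2}, \ldots, v^{n}\} \tag{10}$$
where n is the number of samples, the parameters are obtained by maximizing the likelihood
$$L(\theta) = \prod_{k=1}^{n} P(v^{k}\mid\theta) \tag{11}$$
The above formula is difficult to resolve, so it is converted to logarithmic form as follows
$$\ln L(\theta) = \sum_{k=1}^{n} \ln P(v^{k}\mid\theta) \tag{12}$$
When considering a single training sample v, the gradient of the log-likelihood with respect to the parameters is
$$\frac{\partial \ln P(v\mid\theta)}{\partial\theta} = -\sum_{h} P(h\mid v,\theta)\,\frac{\partial E(v,h\mid\theta)}{\partial\theta} + \sum_{v',h} P(v',h\mid\theta)\,\frac{\partial E(v',h\mid\theta)}{\partial\theta} \tag{13}$$
The first item on the right side corresponds to the expectation of the gradient of the energy function under the conditional distribution $P(h\mid v,\theta)$.
The second item on the right side corresponds to the expectation of the gradient of the energy function under the joint distribution $P(v,h\mid\theta)$, which is intractable because it sums over all states of v and h.
The contrastive divergence (CD) algorithm is used to calculate the approximate value.
The algorithm process is shown as follows:
Set the initial state of the visible layer, namely, $v^{(0)} = v$.
Perform K-step Gibbs sampling. The t step (t = 1, ..., K) samples $h^{(t-1)} \sim P(h\mid v^{(t-1)},\theta)$ and then $v^{(t)} \sim P(v\mid h^{(t-1)},\theta)$.
The approximate value of the second item is evaluated at the sample $v^{(K)}$, which gives
$$\frac{\partial \ln P(v\mid\theta)}{\partial W_{ij}} \approx P(h_j = 1\mid v^{(0)})\,v_i^{(0)} - P(h_j = 1\mid v^{(K)})\,v_i^{(K)}$$
$$\frac{\partial \ln P(v\mid\theta)}{\partial a_i} \approx v_i^{(0)} - v_i^{(K)}, \qquad \frac{\partial \ln P(v\mid\theta)}{\partial b_j} \approx P(h_j = 1\mid v^{(0)}) - P(h_j = 1\mid v^{(K)})$$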
The approximate value of formula (13) can be calculated by the CD-k algorithm, from which the parameter matrix is obtained. In practice, a good result can usually be reached with a single step of the CD algorithm (k = 1).
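The CD procedure with a single Gibbs step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes binary units in the standard RBM formulation, one training vector per update, and plain Python lists that mirror the index notation i, j. The function names (hidden_probs, visible_probs, cd1_update), the learning rate lr, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, a, b, lr=0.1, rng=random):
    """One CD-1 step on a single vector v0; updates W, a, b in place."""
    ph0 = hidden_probs(v0, W, b)               # positive phase, driven by data
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, a), rng)  # one Gibbs step: reconstruction
    ph1 = hidden_probs(v1, W, b)               # negative phase
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return v1


# Toy usage (illustrative data, assumed sizes: 4 visible and 3 hidden units).
rng = random.Random(0)
n_visible, n_hidden = 4, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
a = [0.0] * n_visible
b = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for _ in range(200):
    for v in data:
        cd1_update(v, W, a, b, lr=0.1, rng=rng)
```

The update uses the hidden probabilities rather than sampled hidden states in both phases, a common variance-reduction choice; sampling only the reconstruction v1 keeps the sketch close to the Gibbs step described above.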
The network parameters of each DBN layer are initialized during the pre-training process, which helps reach a better local optimum or even the region of the global optimum. At the top two levels, the weights are connected together so that the output of the lower layers can be associated with the top layer. The DBN then uses the label units to fine-tune its discriminative performance by means of the BP algorithm: bottom-up forward propagation is carried out first, followed by multiple rounds of top-down supervised backward propagation. The network of the proposed prediction model is shown in Figure 1.

Prediction model.
As can be seen from Figure 1, DBN has a total of m layers. Vector x represents the input variables, v represents the visible layer,
Analysis of influence factors on PV output power
Four crucial factors are found to affect the PV output power: solar radiation intensity, temperature, relative humidity, and wind speed. In this research, the relationship between each factor and the PV output power is examined both qualitatively and quantitatively. Note that the data used in this article come from the Desert Knowledge Australia Solar Center (DKASC).
Solar radiation intensity
PV systems generate electric power from solar energy. The solar radiation intensity is defined as the amount of solar radiation energy received per unit area per unit time, in W m−2. To verify the effect of solar radiation intensity on the output power, data were collected on two different days between 9 February 2015 and 13 May 2015, on which the solar radiation intensity varied differently. The curves of radiation intensity and output power indicate a roughly linear relationship, as shown in Figure 2(a) and (b).

(a) Curves of radiation intensity and (b) output power.
Figure 2 shows that the correlation between the solar radiation intensity and the output power is very high. Therefore, the solar radiation intensity is regarded as one of the main factors that affect the PV output power.
Atmospheric temperature
The atmospheric temperature affects the PV conversion efficiency. When the air temperature is relatively high, the reduction in PV output voltage is greater than the increase in output current. This results in a relatively low output power and, therefore, a relatively low PV conversion efficiency. Considering that the solar radiation intensity varies little between adjacent days, the historical data of atmospheric temperature and output power between 9 February 2015 and 11 February 2015 are selected, which largely eliminates the influence of radiation intensity on the output power. The corresponding curves of atmospheric temperature and output power are shown in Figure 3.

Curves of atmospheric temperature and output power.
In Figure 3, it can be seen that there is a slight positive correlation between the atmospheric temperature curve and the output power curve. However, the local maxima of the atmospheric temperature curve do not coincide with those of the output power of the PV system.
Relative humidity
The relative humidity of the air is the ratio, expressed as a percentage, of the water vapor pressure to the saturated vapor pressure at the same temperature. The curves of relative humidity and PV output power are shown in Figure 4, for which data were likewise collected between 9 February 2015 and 11 February 2015. The results reveal that the relative humidity and the PV output power are negatively correlated: the solar radiation absorbed by the PV panel is reduced when the relative humidity increases. Consequently, the PV system output power decreases.

Curves of relative humidity and output power.
Wind speed
The wind speed can change the surface temperature of PV modules by changing the efficiency of heat transfer. In addition, changes in wind speed can also affect the cleanliness of the module surface. The curves of wind speed and PV output power are shown in Figure 5; the data were again collected between 9 February 2015 and 11 February 2015. It can be seen from Figure 5 that the wind speed curve is more volatile than the output power curve, and the correlation between the two curves is weak. The wind speed is therefore regarded as an indirect factor affecting the PV output power.

Curves of wind speed and output power.
Correlation coefficient between factor and the output power of PV
In this study, the Pearson correlation coefficient r is used to evaluate the strength of the correlation between each factor and the PV output power, which is defined as follows
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^{2} - \left(\sum X\right)^{2}}\,\sqrt{N\sum Y^{2} - \left(\sum Y\right)^{2}}}$$
where X and Y represent two data sets, and N represents the number of paired observations in each set. Here, X is the PV output power, and Y is the factor that may affect the PV output power. The correlation coefficient r lies between −1 and +1, namely, |r| ≤ 1. The relation between the correlation coefficient r and the degree of correlation is shown in Table 1.
Relation between correlation coefficient and correlation degree.
From Table 1, when
Relationship of correlation coefficient and correlation degree between variables.
As can be seen from Table 2, the solar radiation intensity and the PV output power are strongly linearly correlated, with r = 0.9496. The correlation between the atmospheric temperature and the PV output power is moderately positive, with r = 0.5011. The relative humidity is moderately negatively correlated with the PV output power, with r = −0.4355. The correlation between the wind speed and the PV output power, r = 0.3392, lies in the low linear range, indicating the lowest influence on the output power.
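The correlation analysis can be reproduced with a short function. The following is a minimal sketch that assumes the sum-based form of the Pearson coefficient; the function name pearson_r and the sample values are illustrative only and are not taken from the DKASC data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    denominator = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return (n * sxy - sx * sy) / denominator


# Illustrative values only: output power (kW) against two candidate factors.
power = [1.0, 2.0, 3.0, 4.0]
irradiance = [200.0, 400.0, 600.0, 800.0]   # rises with power
humidity = [80.0, 60.0, 40.0, 20.0]         # falls as power rises

r_irradiance = pearson_r(power, irradiance)
r_humidity = pearson_r(power, humidity)
```

With these perfectly linear toy series, the coefficient is +1 for the irradiance and −1 for the humidity (up to floating-point rounding); real measurements would fall between these extremes, as the reported values do.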
Short-term output power prediction of PV system
Determination of model inputs
Based on the above analysis, the prediction model using the DBN algorithm is established. The historical PV output power data and the physical quantities affecting PV power are used as the inputs of the prediction model. The input variables of the forecasting model include the solar radiation intensity, the atmospheric temperature, and the relative humidity, which were shown above to have the strongest correlations with the output power. The input variables of the prediction model are shown in Table 3.
Input variable of prediction model.
Data normalization
The inputs of the forecasting model comprise the historical PV output power, solar radiation intensity, air temperature, and relative humidity, whose units are kW, W m−2, °C, and %, respectively. The input values should be normalized to the range [0, 1] for model implementation. Because the unit of relative humidity is %, it can be normalized directly by dividing the value by 100. The other inputs are processed as follows.
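The following formula is used for normalization
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x^{*}$ is the normalized value, x represents the current value of the variable, and $x_{\max}$ and $x_{\min}$ represent its maximum and minimum values in a day, respectively.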
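The normalization step can be sketched as follows. This is an illustration that assumes standard min-max scaling over the values of one day; the function names and sample values are hypothetical, and the handling of a constant series (mapped to 0) is a choice made here, not one stated in the text.

```python
def min_max_normalize(values):
    """Scale one day's values to [0, 1] using that day's minimum and maximum."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant series carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(x - lowest) / (highest - lowest) for x in values]


def normalize_humidity(percent_values):
    """Relative humidity is given in %, so dividing by 100 is sufficient."""
    return [p / 100.0 for p in percent_values]


# Illustrative values only.
temperature_c = [10.0, 20.0, 30.0]
humidity_percent = [50.0, 25.0]

scaled_temperature = min_max_normalize(temperature_c)   # [0.0, 0.5, 1.0]
scaled_humidity = normalize_humidity(humidity_percent)  # [0.5, 0.25]
```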
Prediction model evaluation
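In order to verify the effectiveness of the proposed model, three measures are used to evaluate the error of the output power prediction: the mean absolute percentage error (MAPE), the mean squared error (MSE), and Theil’s inequality coefficient (TIC), which are defined as follows
$$\mathrm{MAPE} = \frac{1}{l}\sum_{i=1}^{l}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$
$$\mathrm{MSE} = \frac{1}{l}\sum_{i=1}^{l}\left(y_i - \hat{y}_i\right)^{2}$$
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{l}\sum_{i=1}^{l}\left(y_i - \hat{y}_i\right)^{2}}}{\sqrt{\frac{1}{l}\sum_{i=1}^{l} y_i^{2}} + \sqrt{\frac{1}{l}\sum_{i=1}^{l} \hat{y}_i^{2}}}$$
where l represents the number of samples of the test sample set, $y_i$ is the actual output power of the ith sample, and $\hat{y}_i$ is the corresponding predicted value.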
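The three error measures can be computed with a few helper functions. The sketch below assumes the standard textbook definitions of MAPE, MSE, and TIC, with actual values in the denominator of MAPE; the exact forms used in this study may differ in detail. Function names and sample values are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def tic(actual, predicted):
    """Theil's inequality coefficient: 0 for a perfect forecast, at most 1."""
    n = len(actual)
    rmse = math.sqrt(mse(actual, predicted))
    scale = math.sqrt(sum(y * y for y in actual) / n) \
        + math.sqrt(sum(p * p for p in predicted) / n)
    return rmse / scale


# Illustrative values only (kW).
actual_power = [2.0, 4.0]
predicted_power = [1.0, 5.0]

example_mape = mape(actual_power, predicted_power)  # 37.5
example_mse = mse(actual_power, predicted_power)    # 1.0
example_tic = tic(actual_power, predicted_power)
```

Because MAPE divides by the actual value, it grows quickly when the output power is close to zero, which is one reason low-power periods tend to dominate this measure.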
Prediction example
The data of PV output power and meteorological factors from DKASC were selected as training and test samples for the two prediction models. The first group of data was used for training and was collected from 1 January 2016 to 22 February 2016 in summer and from 1 March 2016 to 24 May 2016 in winter. Data were collected between 9:00 and 17:00 at half-hour intervals. Other time periods are excluded from the study because the amount of solar radiation in them is very small. The second group of data was used for testing and was collected from 23 February 2016 to 29 February 2016 in summer and from 25 May 2016 to 31 May 2016 in winter. A total of 17 output power values between 9:00 and 17:00 are predicted for each day at half-hour intervals, giving a prediction time horizon of 30 min, which belongs to short-term forecasting.
Table 4 gives the weather information for the test samples, which is taken from the Meteorological Bureau of the Australian Government and includes air temperature, rainfall, relative humidity, and cloud information.
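The count of 17 values per day follows from sampling both end points of the 9:00–17:00 window at half-hour intervals. The short sketch below enumerates the time stamps; the function name half_hour_slots is hypothetical.

```python
from datetime import datetime, timedelta


def half_hour_slots(start="09:00", end="17:00", step_minutes=30):
    """List the sampling times between start and end, both end points included."""
    current = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    slots = []
    while current <= stop:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_minutes)
    return slots


slots = half_hour_slots()
points_per_day = len(slots)                # 17
points_per_test_week = 7 * points_per_day  # 119 predicted values per test week
```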
Weather information for test samples.
RH: relative humidity; cld.: cloud information.
In the DBN prediction model, the number of hidden layers and the number of hidden nodes have a great influence on the prediction results. Here, three hidden layers are used, and the number of nodes in each hidden layer is set to 10. To compare the DBN and BP algorithms, the BP model is trained on the same samples as the DBN model. The number of training iterations is set to 200. The training convergence curves of DBN and BP in February are shown in Figure 6. Over the whole training process, DBN converges faster than BP: DBN reaches a low error from about the 10th iteration, whereas the BP curve only begins to converge at about the 20th iteration. In addition, the training error of DBN is smaller.
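To make this configuration concrete, the sketch below lists the weight-matrix shapes of a network with three hidden layers of 10 nodes each. It assumes four input variables (historical output power, solar radiation intensity, air temperature, and relative humidity) and a single output node; these sizes are assumptions for illustration, since the actual input layout is the one given in Table 3.

```python
def layer_shapes(n_inputs, hidden_sizes, n_outputs):
    """Return the (rows, columns) shape of each weight matrix, bottom to top."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Weights plus one bias per unit on the receiving side of each matrix."""
    return sum(rows * cols + cols for rows, cols in shapes)


# Assumed sizes: 4 inputs and 1 output; the three hidden layers of 10 nodes
# follow the configuration described in the text.
shapes = layer_shapes(n_inputs=4, hidden_sizes=[10, 10, 10], n_outputs=1)
total_parameters = parameter_count(shapes)

print(shapes)            # [(4, 10), (10, 10), (10, 10), (10, 1)]
print(total_parameters)  # 281
```

Under these assumptions, the first three matrices would be pre-trained as RBMs and the last one would be the supervised output layer added for BP fine-tuning.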

Training convergent curves for both DBN and BP models.
After the training process, the test samples were used to carry out the prediction. Taking February as an example, Figure 7 shows the prediction error curves for different numbers of hidden layers; the prediction error over the 7 days in February is smallest with three hidden layers. Increasing the number of hidden layers further can lead to over-learning, which reduces the prediction accuracy.

Prediction error under different hidden layers in February.
The short-term output power forecast results in May and February are shown in Figures 8 and 9, respectively. From the results, it can be seen that DBN gives better results than BP. The advantage is most evident at points where the output power is low but varies significantly.

Short-term output power forecast results in May.

Short-term output power forecast results in February.
To further verify the rationality of the proposed model, its prediction performance is evaluated. From Table 5, it can be seen that in the period from 25 May 2016 to 31 May 2016, the average MAPE, MSE, and TIC obtained from the DBN prediction model are 8.92%, 0.37, and 0.039, respectively. By contrast, the average MAPE, MSE, and TIC of the BP prediction model are 16.74%, 1.00, and 0.166, respectively. The DBN model is clearly superior to the BP model in this case. However, the MAPE values on 28 May, 30 May, and 31 May are larger than those of the other forecast days because of the cloudy and rainy weather conditions, whereas on sunny days the prediction error is relatively low. In other words, bad weather, especially rain, tends to increase the prediction error. Similarly, from Table 6, it can be seen that in the period from 23 February 2016 to 29 February 2016, the average MAPE, MSE, and TIC obtained by the DBN prediction model are 5.02%, 0.28, and 0.025, respectively, whereas the corresponding values of the BP model are 9.22%, 0.89, and 0.046.
Results of short-term output power prediction in May.
DBN: deep belief network; BP: back propagation; MAPE: mean absolute percentage error; MSE: mean squared error; TIC: Theil’s inequality coefficient.
Results of short-term output power prediction in February.
DBN: deep belief network; BP: back propagation; MAPE: mean absolute percentage error; MSE: mean squared error; TIC: Theil’s inequality coefficient.
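The average errors reported for the two test weeks can be compared directly. The following sketch takes the averages quoted in the text for Tables 5 and 6 and computes the absolute reduction and the reduction relative to the BP value; the function and variable names are illustrative.

```python
# Average errors quoted in the text, stored as (DBN, BP) pairs.
MAY = {"MAPE (%)": (8.92, 16.74), "MSE": (0.37, 1.00), "TIC": (0.039, 0.166)}
FEBRUARY = {"MAPE (%)": (5.02, 9.22), "MSE": (0.28, 0.89), "TIC": (0.025, 0.046)}


def reductions(results):
    """Return {metric: (absolute reduction, reduction in % of the BP value)}."""
    summary = {}
    for metric, (dbn, bp) in results.items():
        absolute = bp - dbn
        summary[metric] = (round(absolute, 3), round(100.0 * absolute / bp, 1))
    return summary


for month, results in (("May", MAY), ("February", FEBRUARY)):
    for metric, (absolute, relative) in reductions(results).items():
        print(f"{month:8s} {metric:8s} reduced by {absolute} ({relative}% of the BP value)")
```

The absolute reductions are 7.82 percentage points, 0.63, and 0.127 in May, and 4.2 percentage points, 0.61, and 0.021 in February, for MAPE, MSE, and TIC, respectively.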
Comparing the predicted results for the same weather types in Tables 5 and 6, the following conclusions can be drawn. On rainy days, the prediction accuracy differs between months. According to Table 4, 30 and 31 May are the rainy days among the seven forecast days in May, and 26 February is the rainy day among the seven forecast days in February. The forecast accuracy on the rainy day in February is higher than that on the rainy days in May. The output power of the PV system on such days depends on the month: PV power generation in the forecast area is high in February and its fluctuation is small, so the prediction error in February is not significant. In addition, according to Table 4, 28 May is cloudy, and 24 and 25 February are a cloudy day and a sunny-to-cloudy day, respectively. The forecast accuracy in cloudy weather also differs between months. In sunny weather, by contrast, the prediction accuracy differs little between months. However, PV power generation varies with the month, as can be seen in Figures 8 and 9: 27 May and 29 May are sunny, but the output power fluctuates to some extent, so the forecast error on these two days is slightly higher than on the other sunny forecast days.
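The relative error (RE) is used to compare the prediction accuracy between different months, and it is defined as follows
$$\mathrm{RE}_i = \frac{\hat{y}_i - y_i}{y_i} \times 100\%$$
where $y_i$ is the actual output power at the ith forecast point and $\hat{y}_i$ is the corresponding predicted value.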
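A point-by-point relative error curve can be computed as sketched below. The sign convention (predicted minus actual, divided by actual) is an assumption made for this illustration; the function name and sample values are hypothetical.

```python
def relative_error(actual, predicted):
    """Relative error in percent for each forecast point (actual must be nonzero)."""
    return [100.0 * (p - y) / y for y, p in zip(actual, predicted)]


# Illustrative values only (kW): one over-prediction and one under-prediction.
actual_power = [2.0, 4.0]
predicted_power = [2.5, 3.0]

re_curve = relative_error(actual_power, predicted_power)  # [25.0, -25.0]
```

Keeping the sign shows whether the model over- or under-predicts at each point; taking the absolute value instead would give the per-point terms that are averaged in the MAPE.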

Prediction error curves in February.

Prediction error curves in May.
Conclusion
PV has become an important power generation resource in the electric power industry, even though its inherent instability may affect the power grid. The proposed DBN prediction model has been applied successfully to short-term output power forecasting. The main contributions are summarized as follows.
The correlation strength between four major factors and the PV output power has been quantified, as detailed in Table 2. The solar radiation intensity is highly correlated with the output power, and the atmospheric temperature and relative humidity are moderately correlated, the latter negatively. The wind speed shows only a low linear correlation.
During the 2-week prediction tests, replacing BP with DBN reduces the MAPE by 7.82 and 4.20 percentage points in May and February, respectively. Likewise, the MSE is reduced by 0.63 and 0.61, and the TIC by 0.127 and 0.021, respectively. The proposed DBN model is therefore superior to the BP model for short-term output power forecasting.
In sunny weather, the forecast accuracy shows no significant difference between months and is highly reliable. By contrast, in rainy or cloudy weather, the prediction accuracy changes considerably and the reliability is low. This finding shows that the weather condition does affect the prediction results.
Cloudy and rainy weather have a large influence on the prediction accuracy and increase the prediction error. By contrast, sunny weather with small fluctuations yields a high output power and a small prediction error.
Footnotes
Academic Editor: Kuei Hu Chang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science and Technology Supporting Plan of China (No. 2015BAA09B01), the Science and Technology Plan of Hebei Province of China (No. 14214503D), the Program of Tianjin Science and Technology Commissioner of China (No. 16JCTPJC50700), and the Colleges and Universities in Hebei Province Science and Technology Research Youth Fund (No. QN2015111).
