Abstract
Accurate prediction of the generation capacity of photovoltaic (PV) systems is fundamental to ensuring grid stability and to scheduling generation correctly. To address the lack of temporal modeling and the local-minimum problem of the back-propagation (BP) neural network, a power generation forecasting method based on a long short-term memory-back-propagation (LSTM-BP) neural network is proposed. On this basis, the traditional prediction data set is improved: building on the three traditional methods described in this article, we propose a fourth method for constructing the data set used in short-term power generation prediction for photovoltaic power stations. Compared with the traditional methods, the LSTM-BP neural network trained on the improved data set has a lower prediction error. A horizontal comparison with multiple linear regression and the support vector machine further shows the advantages of the LSTM-BP method. The proposed LSTM-BP short-term forecasting method for the generating capacity of photovoltaic power stations can provide a basis for dispatch planning and for optimizing power grid operation.
Keywords
Introduction
According to the latest statistics released by the International Energy Agency's Photovoltaic Power Systems Programme (IEA PVPS), 1 in 2018, the newly installed PV capacity worldwide was 98 GW, about 29% higher than in 2017. By then, the total installed PV capacity worldwide had reached 300 GW. In 2017, China contributed 53 GW of newly installed capacity, and its total installed capacity is 130.42 GW, ranking first in the world in both new and cumulative installed capacity. Photovoltaic power generation offers many advantages, such as easy installation, 2 and its clean production process does not cause environmental pollution. 3 However, connecting PV power generation to the grid poses great challenges for grid scheduling and operation. To ensure grid stability and to schedule and optimize operation correctly, the generation capacity of PV systems must be predicted accurately.
At present, there are two main approaches to predicting photovoltaic power generation: classical statistical time series methods and machine learning methods. 4 Prediction methods based on time series statistics require less computation; specific models include autoregressive (AR) models, 5 vector autoregressive (VAR) models, 6 spatio-temporal aware methods, 7 and so on. J Tastu et al. 8 proposed multivariate conditional parametric models for a spatio-temporal analysis of short-term wind power forecast errors using VAR. Their main limitation is a non-sparse coefficient matrix, since feature selection is not performed. L Cavalcante et al. 9 then improved these models by proposing least absolute shrinkage and selection operator (LASSO) vector autoregression structures. These models have the following limitations: (1) the time series must be stationary or made stationary by differencing; (2) in essence, they capture only linear relations, not nonlinear ones. 10 Prediction methods based on machine learning are mainly based on neural networks. Specific models include the back-propagation neural network model, 11 the self-organizing map model, 12 the genetic algorithm neural network prediction model, 13 and the particle swarm back-propagation (BP) neural network prediction model. 14 A Malondkar et al. 15 proposed an algorithm using the growing hierarchical self-organizing map (GHSOM); the main difficulty lies in computing a hierarchical distance function that can be used to modify the optimization function of the GHSOM. S Bouktif et al. 16 proposed a long short-term memory (LSTM) model for electric load forecasting using feature selection and a genetic algorithm. C Ye et al. 17 proposed a data-driven bottom-up approach for spatial and temporal electric load forecasting. M Ceci et al. 18 presented a spatial autocorrelation and entropy method for renewable energy forecasting.
However, these prediction models are suited to particular fixed PV power stations and do not fully consider the temporal correlation of the data or the variability of climatic factors. None of them analyzed or improved the meteorological data set, which matters most for power prediction. When meteorological conditions change dramatically, their predictions become distorted, and the models do not adapt well to large PV systems.
Building on the existing literature on photovoltaic power generation prediction, and carefully considering meteorological and environmental factors as well as the electrical characteristics of photovoltaic generation devices, we propose a power generation model that uses long short-term memory (LSTM) to optimize the BP neural network. On this basis, the traditional prediction data set is improved: building on three traditional methods, a fourth method for constructing the short-term power generation prediction data set of a photovoltaic power station is proposed. A large volume of power generation data from recent years is used as training data. Through comparison with multiple linear regression (MR) and the support vector machine (SVM), we show that the proposed forecasting scheme is more accurate and more comprehensive, and that it is suitable for large photovoltaic power stations.
Output power model
At present, research focuses on improving forecasting algorithms for photovoltaic power generation and on obtaining predictions by repeatedly training on the original data. However, when the variable environmental factors, meteorological factors, and electrical characteristics of the photovoltaic device are uniformly normalized and used as training parameters, they are not precise enough. As a result, photovoltaic power generation forecasting technology still leaves much room for improvement.
Photovoltaic power generation depends on weather conditions such as cloud cover, temperature, and humidity. When these conditions change sharply, photovoltaic output becomes uncertain and intermittent. When the PV system is connected to the grid, these fluctuations pass into the grid and unbalance the output available to grid dispatch. A good photovoltaic power generation prediction scheme must improve the reliability of grid dispatch and support the optimization of grid operation.
Considering the factors that influence photovoltaic output power, the output power of the engineering model 19 is given by equation (1).
where
In equation (1),
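The exact form of equation (1) is not reproduced in this text. Purely to illustrate how such an engineering model is evaluated, the Python sketch below assumes a widely used temperature-derated form, P = η·A·G·[1 + γ(T_c − 25 °C)], where η is the module efficiency, A the array area, G the irradiance, T_c the cell temperature, and γ a negative temperature coefficient. The function name and default values are hypothetical and may differ from the model of reference 19.

```python
def pv_output_power(irradiance_w_m2, cell_temp_c, area_m2, efficiency,
                    temp_coeff=-0.0045, ref_temp_c=25.0):
    """Illustrative PV output power in watts (assumed model, not necessarily equation (1)).

    P = efficiency * area * G * (1 + temp_coeff * (T_cell - T_ref))
    temp_coeff is negative: output drops as the cells heat up.
    """
    derate = 1.0 + temp_coeff * (cell_temp_c - ref_temp_c)
    return efficiency * area_m2 * irradiance_w_m2 * derate


# 10 m^2 of 18 %-efficient modules under 1000 W/m^2:
print(pv_output_power(1000.0, 25.0, 10.0, 0.18))  # about 1800 W at 25 °C
print(pv_output_power(1000.0, 45.0, 10.0, 0.18))  # about 1638 W at 45 °C
```

The linear temperature term captures the main effect that matters for forecasting: at a fixed irradiance, hotter modules deliver less power.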
LSTM-BP neural network
The BP neural network is a multi-layer feedforward network trained by error back-propagation. It generally consists of an input layer, a hidden layer, and an output layer, as shown in Figure 1. The network had

Three-layer structure of BP neural network.
Activation functions
The output of the
Therefore, the relationship between the input
Neural network training refers to adjusting the connection weights of the network so that the output is as close as possible to the expected output
Among them, the actual output of the neural network under the input
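Because the equations for the BP forward pass and weight update are missing from this text, the following minimal NumPy sketch shows one standard way a three-layer BP network operates: a sigmoid hidden layer, a linear output layer, and batch gradient descent on the mean square error. The class name, layer sizes, learning rate, and toy data are assumptions for illustration, not the configuration used in this article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerBP:
    """Minimal input-hidden-output network trained by error back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random initial weights drawn from (0, 1), then shifted to be centred on zero
        self.W1 = rng.random((n_in, n_hidden)) - 0.5
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.random((n_hidden, n_out)) - 0.5
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)   # hidden-layer output
        return self.h @ self.W2 + self.b2          # linear output layer

    def train_step(self, X, y):
        """One batch gradient-descent step; returns the MSE before the update."""
        y_hat = self.forward(X)
        err = y_hat - y
        n = X.shape[0]
        # Gradients of 0.5 * (squared error averaged over samples)
        dW2 = self.h.T @ err / n
        db2 = err.mean(axis=0)
        dh = (err @ self.W2.T) * self.h * (1.0 - self.h)   # back-propagate through sigmoid
        dW1 = X.T @ dh / n
        db1 = dh.mean(axis=0)
        for p, g in ((self.W2, dW2), (self.b2, db2), (self.W1, dW1), (self.b1, db1)):
            p -= self.lr * g                                # in-place weight update
        return float(np.mean(err ** 2))

# Toy usage: learn the mean of three inputs (hypothetical data).
rng = np.random.default_rng(1)
X = rng.random((64, 3))
y = X.mean(axis=1, keepdims=True)
net = ThreeLayerBP(n_in=3, n_hidden=8, n_out=1, lr=0.2)
first_loss = net.train_step(X, y)
for _ in range(500):
    last_loss = net.train_step(X, y)
print(first_loss, last_loss)   # the loss should shrink over training
```

All gradients are computed before any weight changes, so the hidden-layer error term uses the same output weights as the forward pass.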
The BP neural network does not consider correlation in the data, in particular the temporal correlation between samples: shuffling the same data set still yields the same result. In addition, traditional neural network prediction is prone to vanishing gradients and to becoming trapped in local optima. 20 In this article, a method that optimizes the BP neural network using LSTM is proposed.
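The point that a feedforward network ignores temporal order can be checked directly. In the NumPy sketch below, with hypothetical weights and data, reversing the order of the inputs merely reverses the outputs of a feedforward layer, whereas a minimal recurrent unit (the structure LSTM builds on) ends in a different state, because its state carries information from one time step to the next.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.standard_normal((6, 1))   # short univariate series, one value per time step
W = rng.standard_normal((1, 4))        # hypothetical feedforward weights

def feedforward(x):
    """Feedforward layer: each row is mapped independently of the others."""
    return np.tanh(x @ W)

def recurrent_final_state(seq, w_in=0.8, w_rec=0.5):
    """Minimal recurrent unit: the state h carries information forward in time."""
    h = 0.0
    for x in seq[:, 0]:
        h = np.tanh(w_in * x + w_rec * h)
    return h

reversed_series = series[::-1]
# Reordering the inputs only reorders the feedforward outputs...
same_up_to_order = np.allclose(feedforward(reversed_series), feedforward(series)[::-1])
# ...but changes the state the recurrent unit reaches at the last step.
order_matters = not np.isclose(recurrent_final_state(series),
                               recurrent_final_state(reversed_series))
print(same_up_to_order, order_matters)
```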
LSTM differs greatly from the BP neural network. 21 It is a special type of recurrent neural network (RNN). LSTM improves on the RNN by adding a state C, called the cell state, to the hidden layer. The cell state is regulated by three special structures: the forget gate, the input gate, and the output gate. The long-term state control of LSTM is shown in Figure 2.

LSTM long-term state control diagram.
Short-term photovoltaic power generation prediction based on the LSTM-BP neural network first feeds the training data set into the LSTM network, which processes the original data. Exploiting the ability of LSTM to capture temporal correlation, the output of its hidden layer is taken as the input of the BP neural network. Finally, the predicted output is obtained from the optimized BP neural network. The prediction model is shown in Figure 3.
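The NumPy sketch below illustrates the data flow just described, under stated assumptions: a standard LSTM cell (forget, input, and output gates acting on the cell state C) reads a window of time steps, and its final hidden-layer output is passed to a three-layer BP network that produces the prediction. Layer sizes, random weights, and the function names `lstm_encode` and `bp_head` are hypothetical; only the forward pass is shown, not training, and this is not the authors' MATLAB implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_encode(window, W, b, n_hidden):
    """Run a standard LSTM cell over `window` (T x n_in); return the last hidden output."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)                              # cell state C
    for x_t in window:
        z = W @ np.concatenate([h, x_t]) + b            # all four gate pre-activations
        f = sigmoid(z[:n_hidden])                       # forget gate
        i = sigmoid(z[n_hidden:2 * n_hidden])           # input gate
        g = np.tanh(z[2 * n_hidden:3 * n_hidden])       # candidate cell content
        o = sigmoid(z[3 * n_hidden:])                   # output gate
        c = f * c + i * g                               # keep part of C, write new content
        h = o * np.tanh(c)                              # expose part of C as hidden output
    return h

def bp_head(h, W1, b1, W2, b2):
    """Three-layer BP network on the LSTM output: sigmoid hidden layer, linear output."""
    return sigmoid(h @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n_in, n_hidden, n_bp = 6, 8, 10                          # hypothetical sizes
W = rng.standard_normal((4 * n_hidden, n_hidden + n_in)) * 0.1
b = np.zeros(4 * n_hidden)
W1 = rng.standard_normal((n_hidden, n_bp)) * 0.1
b1 = np.zeros(n_bp)
W2 = rng.standard_normal((n_bp, 1)) * 0.1
b2 = np.zeros(1)

window = rng.random((5, n_in))                           # 5 time steps of normalized inputs
h_last = lstm_encode(window, W, b, n_hidden)
prediction = bp_head(h_last, W1, b1, W2, b2)
print(h_last.shape, prediction.shape)                    # (8,) (1,)
```

Stacking the four gate weight matrices into one `W` keeps each time step to a single matrix product; the slices then recover the individual gates.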

Prediction model of LSTM-BP.
The training process of the LSTM-BP neural network is as follows:
1. The photovoltaic data are sorted, and the standardized matrix and its eigenvalues and eigenvectors are obtained
In the formula,
2. The LSTM network is initialized. The initial weight matrix is filled with random numbers in (0, 1). The target (least) error of the network and the maximum number of epochs are set.
3.
where
4. The output of the LSTM is used as the input of the BP neural network model. The mean square error between the actual output and the theoretical output of the BP network is used as the error function. The weight
5. Training stops when the maximum number of epochs or the target error is reached.
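Step 1 does not state which matrix is decomposed. Assuming it refers to z-score standardization followed by an eigen-decomposition of the feature correlation matrix, as in principal component analysis, the step could be sketched in NumPy as follows. The function name, the choice of the correlation matrix, and the synthetic data are assumptions.

```python
import numpy as np

def standardize_and_decompose(data):
    """Z-score each feature, then eigen-decompose the correlation matrix.

    data: (n_samples, n_features) array of PV and meteorological records.
    Returns the standardized matrix Z, the eigenvalues in descending order,
    and the matching unit eigenvectors as columns.
    """
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    Z = (data - mean) / std                        # standardized matrix
    R = Z.T @ Z / (data.shape[0] - 1)              # correlation matrix of the features
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    return Z, eigvals[order], eigvecs[:, order]

# Hypothetical data: 200 records of 4 features, two of them correlated.
rng = np.random.default_rng(0)
base = rng.standard_normal((200, 1))
data = np.hstack([base, 2 * base + rng.standard_normal((200, 1)),
                  rng.standard_normal((200, 2))])
Z, eigvals, eigvecs = standardize_and_decompose(data)
print(eigvals)   # sums to 4, the number of features
```

Because every standardized feature has unit variance, the eigenvalues of the correlation matrix always sum to the number of features; large eigenvalues mark directions in which several inputs move together.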
LSTM-BP prediction effect comparison
Training data
In the experiment, a series of meteorological elements, including solar radiation and temperature, are calculated from SolarGIS satellite remote-sensing radiation data (Meteosat © EUMETSAT, DE; GOES © NOAA, USA), combined with cloud and snow indices from Meteosat (© EUMETSAT, DE) and GOES (© NOAA, USA) and water vapor data from the Global Forecast System (GFS) database (© NOAA, USA). Taking photovoltaic power station no. 11282 in Zhangdian district, Zibo city, Shandong province, China (118° east longitude, 32° north latitude) as an example, the generation data from 2012 to 2017 are studied. The meteorological and environmental data collected are shown in Table 1.
Training data format.
GHI: global horizontal irradiation; DNI: direct normal irradiance; RH: relative humidity; AP: atmospheric pressure; WS: wind speed; WD: wind direction.
Solar radiation: global horizontal irradiation (GHI), direct normal irradiance (DNI), and diffuse radiation (DIFF). Meteorological parameters: air temperature at 2 m, RH, average WS and WD at 10 m, precipitation, and AP.
Implementation of LSTM-BP neural network and parameter selection
A three-layer neural network with an input layer, a hidden layer, and an output layer is set up in MATLAB. Using a trial-and-error method, that is, repeatedly testing and adjusting the model topology, the memory length of the LSTM is set to 5. The target error is set to 0.001 and the maximum number of epochs to 1000.
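The parameter choices above can be expressed as a small configuration plus two helpers. The Python sketch assumes one-step-ahead prediction, in which each sample consists of the previous five time steps (the memory length) of normalized inputs, and a generic loop that stops when the epoch error reaches the 0.001 goal or after 1000 epochs. The helper names and the one-step-ahead framing are assumptions; the article's MATLAB code is not shown.

```python
import numpy as np

MEMORY_LENGTH = 5     # LSTM memory length chosen by trial and error
TARGET_ERROR = 1e-3   # least-error training goal
MAX_EPOCHS = 1000     # maximum number of epochs

def make_windows(features, target, memory_length=MEMORY_LENGTH):
    """Pair each window of `memory_length` consecutive rows with the next target value.

    features: (T, n_features) time-ordered array; target: (T,) array.
    Returns X with shape (T - memory_length, memory_length, n_features) and y.
    """
    X = np.stack([features[t - memory_length:t]
                  for t in range(memory_length, len(features))])
    y = target[memory_length:]
    return X, y

def train_until_converged(run_epoch, target_error=TARGET_ERROR, max_epochs=MAX_EPOCHS):
    """Call run_epoch() (returns that epoch's MSE) until the goal or the epoch limit is met."""
    epoch, mse = 0, float("inf")
    while epoch < max_epochs and mse > target_error:
        mse = run_epoch()
        epoch += 1
    return epoch, mse

features = np.arange(24, dtype=float).reshape(12, 2)   # hypothetical 12 days, 2 features
target = np.arange(12, dtype=float)
X, y = make_windows(features, target)
print(X.shape, y.shape)   # (7, 5, 2) (7,)

errors = iter([0.5, 0.05, 0.004, 0.0008, 0.0001])       # hypothetical per-epoch MSE values
print(train_until_converged(lambda: next(errors)))      # (4, 0.0008)
```

Passing the epoch as a callable keeps the stopping rule independent of how the LSTM-BP weights are actually updated.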
Empirical results analysis of LSTM-BP neural network
With the LSTM-BP neural network, the power generation prediction model of the photovoltaic power station does not overfit its error function. To test how effectively the LSTM-BP neural network predicts power generation, we compare its predictions with those of BP and GA (genetic algorithm)-BP neural networks. Data from 2014 to 2016 are used as training data, and data from early October 2017 as validation (prediction) data. The predicted results and relative errors are shown in Table 2.
Power generation prediction results of early October 2017 based on LSTM-BP, BP, and GA-BP (kW h).
LSTM-BP: long short-term memory-back-propagation; BP: back-propagation; GA-BP: genetic algorithm-back-propagation.
Table 2 shows the daily power generation (in kW h) of a certain area of the photovoltaic power station. The relative errors of the three neural networks in predicting the power output of the photovoltaic power station show that the LSTM-BP neural network improves prediction accuracy compared with both the BP and GA-BP neural networks.
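The relative errors compare each day's prediction with the measured generation. The defining equation is not reproduced in this text; assuming the common definition |P_pred − P_actual| / P_actual × 100 %, the computation can be sketched in NumPy as below. The function name and sample values are hypothetical and are not taken from Table 2.

```python
import numpy as np

def relative_error_percent(predicted, actual):
    """Per-day relative error in percent: |predicted - actual| / actual * 100 (assumed definition)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(predicted - actual) / actual * 100.0

# Hypothetical daily generation in kW h
actual = np.array([200.0, 250.0, 180.0])
predicted = np.array([210.0, 240.0, 180.0])
errors = relative_error_percent(predicted, actual)
print(errors)          # approximately [5. 4. 0.]
print(errors.mean())   # mean relative error over the period, about 3 %
```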
Improvement of forecasting accuracy
Many factors affect photovoltaic power generation forecasting. In this article, several schemes are proposed that take into account the influence of historical power generation data, including generation in the days before the forecast day, on the current short-term power generation forecast.
where

The relationship between the average power generation of the previous 5, 10, and 20 days and the actual power generation on the forecast day.
In the formula,

Comparison of the mean of historical data with the actual output.
Building on the three schemes above, this article adopts scheme 4, which adds both the average data of previous days and the average data of the same day in previous years to the original data. The four schemes are compared in Table 3. The LSTM-BP neural network is trained on data collected from 2014 to 2016 to predict the data for October 2017 and November 2017. The relative prediction errors of the four schemes are shown in Table 4, and their prediction performance is shown in Figure 6. All data are normalized. The relative error is calculated using equation (17), multiplied by 100, and expressed as a percentage.
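To illustrate how the two averages added in scheme 4 might be computed for a forecast day, the Python sketch below derives the mean generation over the preceding 5, 10, and 20 days (the window lengths of Figure 4) and the mean generation on the same calendar day in earlier years. The text does not state which window length scheme 4 uses or how leap days are handled (here, 29 February only matches earlier 29 Februaries); the function names and these choices are assumptions.

```python
from datetime import date, timedelta

import numpy as np

def previous_days_mean(generation, t, n_days):
    """Mean daily generation over the n_days days before day index t (None if too little history)."""
    if t < n_days:
        return None
    return float(np.mean(generation[t - n_days:t]))

def same_day_previous_years_mean(generation, dates, t):
    """Mean generation on the same month and day in earlier years (None if there is none)."""
    target = dates[t]
    values = [g for g, d in zip(generation[:t], dates[:t])
              if (d.month, d.day) == (target.month, target.day)]
    return float(np.mean(values)) if values else None

def scheme4_features(generation, dates, t, windows=(5, 10, 20)):
    """Hypothetical extra inputs appended to the original data for forecast day t."""
    features = {f"prev_{n}d_mean": previous_days_mean(generation, t, n) for n in windows}
    features["same_day_prev_years_mean"] = same_day_previous_years_mean(generation, dates, t)
    return features

# Hypothetical daily series from 2014-01-01 whose value equals the day index
dates = [date(2014, 1, 1) + timedelta(days=k) for k in range(3 * 365 + 1)]
generation = np.arange(len(dates), dtype=float)
t = dates.index(date(2016, 10, 1))
print(scheme4_features(generation, dates, t))
```

In this toy series the same-day average uses 1 October 2014 and 1 October 2015, while the previous-days averages use only the days immediately before 1 October 2016.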
Comparison of four schemes.
Comparison of power prediction in November 2017 based on four schemes.

The predictive effect of four schemes.
With the MR 22 and support vector machine (SVM) 23 prediction models as references for a horizontal comparison, the relative errors of scheme 4 based on LSTM-BP and of the other two prediction methods are given in Table 5. As shown in Figure 7, the short-term power generation prediction of the photovoltaic power station based on LSTM-BP outperforms the other two methods.
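For reference, a multiple linear regression baseline of the kind used in this comparison can be fitted by ordinary least squares. The NumPy sketch below assumes that the MR model takes the same input features as LSTM-BP; the article does not specify its MR or SVM configuration, and an SVM baseline would typically come from a library such as scikit-learn's `SVR`, which is not shown here. The function names `fit_mr` and `predict_mr` are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_mr(X, y):
    """Ordinary least-squares fit of y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mr(coef, X):
    """Linear prediction from the fitted intercept and slopes."""
    return coef[0] + X @ coef[1:]

# Synthetic check: the noise-free relation y = 2 + 3*x1 - x2 is recovered.
rng = np.random.default_rng(0)
X = rng.random((50, 2))
y = 2 + 3 * X[:, 0] - X[:, 1]
coef = fit_mr(X, y)
print(np.round(coef, 6))   # approximately [ 2.  3. -1.]
```

Because MR can only represent linear relations, it serves as a useful lower bound for how much the nonlinear LSTM-BP model gains.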
Lateral comparison of power prediction among scheme 4, MR, and SVM in November 2017.
MR: multiple linear regression; SVM: support vector machine.

Lateral comparison among scheme 4, MR, and SVM.
Conclusion
Because the LSTM-BP neural network accounts for temporal correlation and converges well, it is unlikely to overfit during sample training, which reduces the relative error of its predictions. In PV power generation forecasting, meteorological conditions, especially extreme weather, distort the forecast. Since the generation of a power station always fluctuates around its average, this average provides a basis for improving the accuracy of the prediction model. In scheme 4, historical power generation data of the photovoltaic power station are used as a reference, and the mean of the historical power generation data is obtained from both horizontal and vertical data. The LSTM-BP neural network is then trained on these sample data to construct a highly accurate prediction model. The accurate prediction results provide a reliable basis for power grid scheduling and optimal operation.
Footnotes
Acknowledgements
Some of the authors of this publication are also working on these related projects: (1) Brand Profession Project of Colleges and Universities of JiangSu Province under grant no. PPZY2015C239 and (2) “333” High-Level Talents Training Project of JiangSu Province under grant 2016(17).
Handling Editor: Michelangelo Ceci
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
