A Novel Dynamic Weight Neural Network Ensemble Model

Abstract

Neural networks are prone to falling into local minima and to overfitting in application. This paper proposes a novel dynamic weight neural network ensemble model (DW-NNE). The Bagging algorithm generates a number of neural network individuals, which are then selected by the K-means clustering algorithm. To solve the problem that the K-value cannot be selected automatically in the K-means clustering algorithm during individual selection, a K-value optimization algorithm based on a distance cost function is put forward to find the optimal K-value. In addition, for the integrated output problem, the paper proposes a dynamic weight model based on a fuzzy neural network, following the idea of dynamic weighting. The experimental results show that the integrated approach achieves better prediction accuracy than the traditional single model and the traditional neural network ensemble model.

1. Introduction

With the continuous development of artificial intelligence technology, the neural network, as an important method in the machine learning field, has become a new and favorable choice in many fields. Its main advantages are self-learning and adaptation, which make it possible to obtain an efficient model whenever sufficient training data is available, a clear advantage over traditional models. Any single model, however, often lacks robustness. In some cases a single model cannot adapt to subtle changes in a project, so the prediction error increases sharply. Shortcomings such as overfitting also create many problems when a single model is used. At the same time, growing demand calls for models with both generality and high accuracy. How to improve the generalization ability of models while guaranteeing their accuracy has therefore become a main research topic.

The neural network ensemble was formally put forward by Hansen and Salamon [1] in 1990. Hansen and Salamon conducted an experiment in which multiple neural networks were used to study the same problem and their prediction results were combined, which effectively improved the generalization ability of neural networks. Sollich and Krogh [2] presented a similar definition of the neural network ensemble in 1996. Because of its convenience and good results, this method won wide approval from experts. T. G. Dietterich, an international authority, ranked it first among the four most important current research directions in machine learning [3]. Currently, neural network ensembles are widely used in various fields, such as medical diagnosis [4], exchange rate prediction [5], image classification [6], and software reliability prediction [7].

The research on neural network ensemble is mainly concentrated on the following two aspects: individual network generation and integrated output. Neural network generalization error is mainly affected by the network individual accuracy and the degree of difference. Generally, it is, to some extent, difficult to improve the network individual “accuracy.” Therefore, the research mainly focuses on how to increase the neural network individual differences. Currently, the research on how to obtain greater individual difference in general can be divided into three categories: data transformation, changing of the network characteristics, and individual neural network optimization. As for the regression problems of “integrated output,” a simple average or weighted average method is usually adopted.

In this paper, the research on the software reliability prediction based on neural network ensemble also concentrates on two aspects: the individual neural network optimization and the integration output. Meanwhile, the optimization algorithm of K-value based on the distance cost function is put forward to solve the problem that clustering number needs to be manually set in clustering algorithm. This paper also proposes the method to solve the problem of the integrated output of neural network ensemble through the establishment of dynamic weighting model based on fuzzy neural network.

2. Related Work

2.1. Individual Generation Method

The generalization error of a neural network ensemble is mainly affected by the accuracy of the network individuals and the degree of difference between them. Generally, it is difficult to improve the “accuracy” of the network individuals, because if some effective method existed, the integration algorithm would be unnecessary. Therefore, the generalization error can only be reduced by increasing the differences between the neural network individuals. However, there is currently no unified way to obtain greater individual difference, and this question has become a hot research topic. Given the characteristics of neural networks, the simplest and most practical way to obtain different network individuals is to construct and train each network independently. In general, the methods can be divided into three categories: data transformation, changing the network characteristics, and individual neural network optimization.

2.1.1. Transform Training Data

For the same set of training data, using different methods to extract samples for training, or removing the redundant attributes in the data, is a common method for generating the neural network integrated individuals and is also the kind of method being most commonly studied. This kind of method can be divided into the following two cases.

(a) Resampling the Sample Set. In this method, each training set is a subset extracted from the original sample data. The commonly used methods are the Boosting method [8] proposed by Schapire and the Bagging method [9] proposed by Breiman.

The Boosting method was first put forward by Schapire [8] and then improved by Freund [10]. It can be described as follows: give the same weight to each training sample, obtain a learning machine by training on the samples, test the learning effect on these training samples, and then increase the weights of the samples with a poor learning effect. In this way, the probability that these samples appear in the next training set is increased, which strengthens the learning of these samples and ensures a better performance on them in the next round. By repeating the above process, an ideal learning machine is eventually obtained.

“Bagging,” as an ensemble learning technique, was first put forward by Breiman. Its main idea is to generate multiple training sets through Bootstrap resampling and thus train multiple classifiers or predictors. In this method, the differences between ensemble members are obtained mainly through Bootstrap resampling, that is, through random sampling. Given a learning machine and a training sample set, a maximum number of iterations T is set; in every iteration several samples are randomly selected from the training samples to form a new training set, and a prediction function $h_{t}$ is obtained after training on it. After T iterations, T prediction functions are obtained, and a new sample is predicted by integrating these prediction functions through a weighted combination.

(b) Changing the Input Variables Set. The input variable set can be changed by choosing different attributes. Because the attributes of the training sets differ, the models emphasize different aspects, which leads to differences between models. For example, when predicting software quality, the software can be described by many different attributes, and different attribute subsets produce different training samples for model training. In comparison with the Bagging algorithm, this method tends to reduce the input dimension, thus simplifying the structure of the network model and, to some extent, accelerating the training of individuals. In addition, this method can remove some redundant attributes to avoid their influence on individual precision.

2.1.2. Changing of the Network Characteristics

Besides changing the neural network training data, another method commonly used to produce neural network individuals is to change the internal characteristics of the neural network. The internal characteristics here mainly refer to the initial weights of the neural network, the topology structure [11], the learning algorithm, and the target function.

(a) Changing the initial weights of individual neural network training: because the weights of a neural network are corrected by the gradient descent method, different initial weights lead to different final convergent extreme values, causing differences between individuals.

(b) Changing the individual neural network topology structure: individuals are generated and trained mainly by changing the number of hidden layer nodes [12, 13]. Differences in the network topology lead to comparatively large differences in the trained models, thus generating individuals with differences.

(c) Changing the objective function of network training: the most representative method of producing individuals by changing the objective function is negative correlation learning. By adding a penalty term to the traditional error function, it produces individual differences while guaranteeing individual accuracy [14].

2.1.3. Neural Networks Individual Optimization

After a certain number of neural network individuals are generated comes another question of equal importance: which network individual should be chosen and how many network individuals should be selected as more appropriate? This question has gradually evolved into an individual selection problem in neural network ensemble and has drawn increasing attention from many experts.

Neural network individual selective problem is defined as generating a certain number of individual neural networks by a certain method, and then, according to the analysis of the neural network ensemble generalization ability in this paper, selecting the individuals with huge difference and high precision for the final integration.

Individual selection can be viewed as choosing a number of individuals from multiple candidates to achieve the optimal effect, which is essentially a global optimization problem. Such problems can be solved by global search methods such as intelligent algorithms. Take the GASEN algorithm put forward by Zhou et al. [15] as an example. It adopts the neural network integration error as the objective function and uses the genetic algorithm to select individuals with differences from the trained neural network individuals, thus helping the neural network ensemble to achieve ideal results.

Individual selection for neural network ensembles can also be based on clustering algorithms. A selective algorithm can be interpreted as selecting the network individuals with large differences for integration. By analyzing the degree of similarity between individuals and classifying the individual candidates according to their degree of difference, the clustering algorithm produces differences between individuals of different categories. Li and Huang put forward a neural network integration method based on clustering selection named CLU_ENN, which classifies the individual candidates through the clustering algorithm and then selects individuals with differences to integrate [16]; Giacinto and Roli also used a similar algorithm for the selection of neural network individuals and chose one individual from each category for integration [17].

2.2. Integrated Output Method

After the above generation and selection of individuals, the final step is network individual integration. In the process of neural network ensemble, the problem of combination of individual output is also a very important research direction, which, to a certain extent, determines the quality of integration. Assuming that all the independent outputs of the neural network individuals are available, then the next necessity is to proceed from relevant information of network individuals to comprehensively utilize the predicted information provided by each individual network and then to integrate these individuals with appropriate weighting method; thus the accuracy of prediction will be improved.

Unlike for classification problems, the integrated output for regression problems is usually produced by the simple average or the weighted average method. Perrone and Cooper [18], among others, held that a weighted average with appropriately chosen weights could achieve better generalization ability than the simple average. However, some experts [19] believed that the simple average method is suitable when the number of individuals is large, while the weighted average method is better when the number of individuals is relatively small.

In recent years, many researchers are using different optimization methods to determine the weights of the network during the process of integrating output. After obtaining a certain number of neural network individuals, Pan and Wu dynamically solved the weight coefficient of the integrated individuals with genetic algorithm, researched integrated modeling, and then achieved good results [20]. Shen and Kong proposed a neural network ensemble construction method based on dynamic weight [21]. After the selection of the network individuals by genetic algorithms, the training sample is established according to the fitting error of the selected network individuals and then the generalized regression neural network is trained to predict the future time in order to calculate the weight of network individuals during different periods.

3. Selective Neural Network Ensemble Algorithm Based on the Dynamic Weight

3.1. Individual Neural Networks Optimization Based on K-Means Clustering

As discussed above, Bagging is based on random sampling with replacement (Bootstrap sampling) to generate multiple predictors. Its main idea is to choose different training samples by random sampling so that the trained models differ, which reinforces the differences between individual neural networks and thereby improves the generalization ability. Here the Elman neural network is adopted as the individual neural network.

Software reliability prediction is adopted as the example for the DW-NNE model. The procedure can be described as follows.

(a) Initialization: the given original training set is $R = \{(x_{1}, x_{2}, x_{3}, x_{4}), (x_{2}, x_{3}, x_{4}, x_{5}), \dots, (x_{n - 3}, x_{n - 2}, x_{n - 1}, x_{n})\}$, where n is the number of failures in the original training set. The next failure occurrence time is predicted from the previous three failure occurrence times. Assume that the capacity of the neural network ensemble is T and set the initial neural network ensemble E to empty.

(b) For $t = 1,2, \dots, T$:

(i) draw $R_{t}$, a training set with m samples, from the training set R by sampling with replacement;

(ii) train a neural network individual $h_{t}$ of the type described above on the training set $R_{t}$;

(iii) put the neural network $h_{t}$ into the neural network ensemble.

(c) Return the neural network ensemble $E = \{h_{t}\}$, $t = 1,2, \dots, T$.

(d) Predict with $h_{ens} (x_{i}) = (1 / T) \sum_{t = 1}^{T} h_{t} (x_{i})$.
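
Steps (a) to (d) can be sketched as follows. This is only an illustrative sketch: a least-squares linear predictor stands in for the Elman network, and the names `make_windows`, `fit_linear`, `bagging`, and `predict_ensemble`, as well as the toy series, are assumptions introduced here rather than part of the original method.

```python
import numpy as np

def make_windows(series, lag=3):
    """Build samples (x_i, x_{i+1}, x_{i+2}) -> x_{i+3} from a 1-D series."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def fit_linear(X, y):
    """Stand-in base learner (least squares with bias), NOT an Elman network."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w

def bagging(X, y, T=10, m=None, seed=0):
    """Step (b): train T individuals, each on m samples drawn with replacement."""
    rng = np.random.default_rng(seed)
    m = len(X) if m is None else m
    ensemble = []
    for _ in range(T):
        idx = rng.integers(0, len(X), size=m)  # bootstrap indices
        ensemble.append(fit_linear(X[idx], y[idx]))
    return ensemble

def predict_ensemble(ensemble, X):
    """Step (d): simple average of the individual predictions."""
    return np.mean([h(X) for h in ensemble], axis=0)

# Toy cumulative "failure time" series (hypothetical data).
series = np.cumsum(np.linspace(1.0, 2.0, 30))
X, y = make_windows(series)
E = bagging(X, y, T=5)
y_hat = predict_ensemble(E, X)
```

Replacing `fit_linear` with a routine that trains an Elman network would leave the rest of the loop unchanged.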

Clustering algorithm is an important method in the field of data mining. It is mainly used to solve classification problems and has been widely adopted in many fields for its superior performance. Its main idea is to partition data objects into several classes to make the data objects with high similarity flock in the same class, while the data objects in different classes differentiate with each other obviously.

In the neural network ensemble, clustering algorithms are mainly used for individual choice. After a certain number of neural network individuals have been produced, this clustering algorithm divides these individual candidates into several classes based on their differences and then, respectively, selects one individual from each class as a representative of the whole category to participate in the integration. Generally, the individual with the lowest fitting error of each class would be chosen to participate in the integration.

3.1.1. K-Value Optimization Problem in K-Means Clustering Algorithm

The traditional K-means clustering algorithm is efficient, simple, and easy to operate. Particularly for large data sets with comparatively complex structures, the algorithm scales well and executes efficiently. However, using this algorithm exposes a critical problem: the selection of the K-value. The K-value in the K-means clustering algorithm must be given in advance, which restricts its applicability. In particular, for the selection of individual neural networks in this paper, the exact number of individuals that should be selected from the candidates is hard to determine.

Based on this problem, in order to realize the optimization algorithm of K-value, this paper puts forward the concept of distance cost function. Suppose that n spatial objects are divided into K clusters; respectively, define the distance between classes as L and the distance inside a class as D. Consider that

\begin{matrix} L = \sum_{i = 1}^{k} |m_{i} - m|, \\ D = \sum_{i = 1}^{k} \sum_{p \in C_{i}} |p - m_{i}| . \end{matrix}

(1)

Herein, L represents the sums of the distances from all the clustering centers to the global center, D represents the sums of the total internal clustering distances, K represents clustering number, m represents the mean of all samples, $m_{i}$ is the mean of the samples in cluster $C_{i}$ , and p is any possible spatial object.

Then the distance cost function $F (s, k)$ is the sum of the distance between classes L and class internal distance D. Therefore,

\begin{matrix} F (s, k) = L + D = \sum_{i = 1}^{k} |m_{i} - m| + \sum_{i = 1}^{k} \sum_{p \in C_{i}} |p - m_{i}| . \end{matrix}

(2)

The comparison of the $F (s, k)$ values at different K-values helps to select K in the condition when $F (s, k)$ is at its minimum; then the results can be considered as the optimal clustering results.
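
As an illustration of (1) and (2), the distance cost can be computed for a given partition as in the sketch below. It assumes that $|\cdot|$ denotes the Euclidean norm; the function name `distance_cost` and the four sample points are hypothetical.

```python
import numpy as np

def distance_cost(points, labels):
    """F(s, k) = L + D for one partition, with |.| taken as the Euclidean norm."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    m = points.mean(axis=0)                      # global center
    L = 0.0                                      # between-class distance
    D = 0.0                                      # within-class distance
    for c in np.unique(labels):
        members = points[labels == c]
        m_i = members.mean(axis=0)               # center of cluster C_i
        L += np.linalg.norm(m_i - m)
        D += np.linalg.norm(members - m_i, axis=1).sum()
    return L + D

pts = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
F2 = distance_cost(pts, [0, 0, 1, 1])   # L = 4, D = 4, so F = 8
F1 = distance_cost(pts, [0, 0, 0, 0])   # L = 0, D = 4 * sqrt(5)
```

For these points the two-cluster partition has the smaller cost, so it would be preferred over a single cluster.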

However, there is currently no theoretical approach for determining the range of K. Reference [22] derives a bound under an assumption about how the sample space is distributed, as follows:

\begin{matrix} \frac{\bar{d}}{D / k} \leq \frac{\bar{l}}{L} . \end{matrix}

(3)

In the formula, $\bar{d}$ represents the average distance from sample to the clustering center, $\bar{d} = D / n$ ; $\bar{l}$ represents the average distance from clustering center to the global center, $\bar{l} = L / k$ .

When $L = D$, that is, $k \bar{l} = n \bar{d}$, inequality (3) reduces to $k \bar{d} \leq \bar{l} = n \bar{d} / k$, so $k^{2} \leq n$; that is to say, $k \leq n^{1 / 2}$.

The K-value optimization algorithm can be described as follows.

(a) Set the upper limit of K (T) as $n^{1 / 2}$.

(b) For $k = 1,2, \dots, T$:

(i) perform the spatial clustering into k clusters by using the K-means algorithm;

(ii) calculate the $F (s, k)$ value for the current k.

(c) Find the minimum $F (s, k)$; the corresponding K-value is the required value.
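
The loop above can be sketched as follows. The sketch is self-contained and makes several assumptions that are not in the original: a minimal K-means with a deterministic farthest-point initialization (chosen only so the example is reproducible), the Euclidean norm for $|\cdot|$, and the hypothetical names `kmeans`, `distance_cost`, and `select_k`.

```python
import math
import numpy as np

def kmeans(points, k, iters=100):
    """Minimal Lloyd iteration with farthest-point initialization (assumed)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def distance_cost(points, labels):
    """F(s, k) = L + D as in (2), using the Euclidean norm."""
    m = points.mean(axis=0)
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        m_i = members.mean(axis=0)
        total += np.linalg.norm(m_i - m)
        total += np.linalg.norm(members - m_i, axis=1).sum()
    return total

def select_k(points):
    """Steps (a)-(c): try k = 1..floor(sqrt(n)) and keep the k minimizing F."""
    k_max = math.isqrt(len(points))
    costs = {k: distance_cost(points, kmeans(points, k)) for k in range(1, k_max + 1)}
    return min(costs, key=costs.get), costs

# Two well-separated groups of hypothetical points (n = 9, so k <= 3).
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 0], [10, 1], [11, 0], [11, 1], [10.5, 0.5]], dtype=float)
best_k, costs = select_k(points)
```

On this toy data the cost is smallest for two clusters, which matches the visible grouping of the points.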

3.1.2. Individual Selection Method Based on K-Means Clustering Algorithm

After a plurality of neural network individuals have been produced with Bagging algorithm, each neural network will produce a set of predictions on the training data $y_{i} = (y_{i}^{1}, y_{i}^{2}, \dots, y_{i}^{5})$ (“i” is an ordinal number meaning the ith neural network individual). K-means clustering will divide data from different categories into different clusters to meet the individual differences standard of neural network individuals selection principle.

The K-means clustering algorithm and the K-value optimization method are used here for model clustering; the main steps are as follows.

(a) Generate n neural network individuals with the Bagging algorithm.

(b) Organize the outputs of every individual neural network on the training set, $y_{i} = (y_{i}^{1}, y_{i}^{2}, \dots, y_{i}^{m})$ (m represents the number of the training output samples), as a matrix $Y (y_{1}, y_{2}, \dots, y_{n})$. Then conduct a cluster analysis of the matrix Y.

(c) For $k = 1,2, \dots, T$:

(i) divide matrix Y into k groups with the K-means clustering algorithm;

(ii) calculate the $F (s, k)$ value for the current k.

(d) Search for the minimum of the distance cost $\min (F (s, k))$; the corresponding k is the optimal K-value.

(e) Cluster matrix Y by using the optimal K-value and select the individual with the highest fitting accuracy from each cluster as a candidate individual. The individuals selected in this way meet both requirements for individual neural networks: high precision and large differences.
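
Step (e) can be illustrated with the short sketch below. It assumes that the cluster labels of the n individuals are already available and that "fitting accuracy" is measured by the mean squared error on the training targets; both the measure and the name `select_individuals` are assumptions made for illustration.

```python
import numpy as np

def select_individuals(Y, targets, labels):
    """Return, for each cluster, the index of the network with the lowest MSE.

    Y       : (n_networks, m) outputs of each individual on the training set
    targets : (m,) actual training outputs
    labels  : (n_networks,) cluster label of each individual
    """
    Y = np.asarray(Y, dtype=float)
    targets = np.asarray(targets, dtype=float)
    labels = np.asarray(labels)
    errors = np.mean((Y - targets) ** 2, axis=1)   # fitting error per network
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        chosen.append(int(members[np.argmin(errors[members])]))
    return chosen

targets = [1.0, 2.0, 3.0]
Y = [[1.1, 2.1, 3.1],   # cluster 0
     [1.0, 2.0, 3.2],   # cluster 0
     [2.0, 3.0, 4.0],   # cluster 1
     [1.9, 3.0, 3.9]]   # cluster 1
chosen = select_individuals(Y, targets, labels=[0, 0, 1, 1])
```

Here network 0 is kept from the first cluster and network 3 from the second, one representative per cluster.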

The flow chart of individual neural networks optimization algorithm based on K-means clustering is as shown in Figure 1.

Figure 1

The flow chart of individual neural networks optimization algorithm based on K-clustering.

3.2. Neural Network Ensemble Method Based on Dynamic Weighting

After a certain number of neural network individuals being generated, how to comprehensively utilize the prediction offered by each individual network as well as to achieve neural network ensemble predictive model with appropriate weighted average form has become another important research direction. For regression problems, the integrated approach such as the arithmetic average method and the simple weighted average method is commonly used. These methods, to different extent, can effectively solve the problem about the neural network ensemble output.

However, the weights calculated by these algorithms are fixed, so they reflect only the overall prediction accuracy of each model, whereas the prediction accuracy of a single model is not constant over time. Thus, the prediction accuracy will inevitably decline if the fixed weights method is adopted directly. In this section, a dynamic integrated output method based on the fuzzy neural network is adopted to solve the dynamic weight problem under time-varying conditions.

3.2.1. T-S Fuzzy Neural Network

T-S fuzzy neural network structure, as shown in Figure 2, mainly includes four layers: input layer, fuzzification layer, calculation of fuzzy rules layer, and output layer.

Figure 2

The structure of T-S fuzzy neural network.

The first layer is the input layer, in which each node is directly connected with the input vector $x = [x_{1}, x_{2}, \dots, x_{n}]$ for receiving the input information. Here, the selection of n varies according to the actual applications.

The second layer is the fuzzification layer, which transforms the information from the input layer into fuzzy quantities. It works mainly by calculating the membership function μ, which gives the degree to which each input belongs to each node (linguistic variable) of the fuzzy set. The Gaussian function is usually used here.

The third layer is fuzzy rules calculation layer. Each node of the layer represents a fuzzy rule. The fitness of each rule is calculated with the fuzzy multiplication formula.

The fourth layer is output layer. The weighted average method is used in this layer to realize normalized calculation, namely, to calculate the neural network output by using the fitness ω.
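
The forward pass through the four layers can be sketched as follows. The text does not give the exact formulas, so the sketch assumes a Gaussian membership of the form $\exp(-(x_{j} - c_{j}^{i})^{2} / b_{j}^{i})$ and a first-order consequent $y^{i} = p_{0}^{i} + \sum_{j} p_{j}^{i} x_{j}$; the function name `ts_forward` and the parameter values are hypothetical.

```python
import numpy as np

def ts_forward(x, c, b, p):
    """Forward pass of a T-S fuzzy network (assumed formulas, see lead-in).

    x : (n,) input vector
    c : (r, n) membership centers, b : (r, n) membership widths
    p : (r, n + 1) consequent coefficients, column 0 is the bias term
    """
    x = np.asarray(x, dtype=float)
    c, b, p = (np.asarray(a, dtype=float) for a in (c, b, p))
    mu = np.exp(-((x - c) ** 2) / b)      # layer 2: Gaussian membership degrees
    omega = mu.prod(axis=1)               # layer 3: rule fitness by fuzzy product
    y_rule = p[:, 0] + p[:, 1:] @ x       # per-rule linear output
    return float(omega @ y_rule / omega.sum())  # layer 4: normalized weighted average

x = [1.0, 2.0]
# One rule centered exactly on x: the output equals that rule's consequent.
y_one = ts_forward(x, c=[[1.0, 2.0]], b=[[1.0, 1.0]], p=[[0.5, 1.0, 1.0]])
# Two rules with identical memberships: the output is the mean of their consequents.
y_two = ts_forward(x, c=[[1.0, 2.0], [1.0, 2.0]], b=[[1.0, 1.0], [1.0, 1.0]],
                   p=[[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]])
```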

The main learning parameters of the fuzzy neural network are the network coefficients $p_{j}^{i}$, the centers of the membership functions $c_{j}^{i}$, and the widths of the membership functions $b_{j}^{i}$. The learning algorithm can be described as follows.

(a) Error cost function:

\begin{matrix} E = \frac{1}{2} {(y_{d} - y_{c})}^{2}, \end{matrix}

(4)

where $y_{d}$ represents the desired output and $y_{c}$ represents the actual output value.

(b) Correction of the coefficients $p_{j}^{i}$:

\begin{matrix} p_{j}^{i} (k + 1) = p_{j}^{i} (k) - α \frac{\partial e}{\partial p_{j}^{i}}, \\ \frac{\partial e}{\partial p_{j}^{i}} = \frac{(y_{d} - y_{c}) ω^{i}}{\sum_{i = 1}^{m} ω^{i} \cdot x_{j}}, \end{matrix}

(5)

where α represents the neural network learning rate, $x_{j}$ is the jth network input, and $ω^{i}$ is the fitness of the ith rule, that is, the product of the membership degrees of the inputs.

(c) Correction of the parameters $c_{j}^{i}$ and $b_{j}^{i}$:

\begin{matrix} c_{j}^{i} (k + 1) = c_{j}^{i} (k) - β \frac{\partial e}{\partial c_{j}^{i}}, \\ b_{j}^{i} (k + 1) = b_{j}^{i} (k) - β \frac{\partial e}{\partial b_{j}^{i}} . \end{matrix}

(6)

3.2.2. Dynamic Weight Model Based on Fuzzy Neural Network

For the dynamic weight model based on neural network, the solution of weights still mainly depends on the computation of the error. As to the error solution, in reference to the previous method and combined with fuzzy neural network, this paper designs a dynamic weight model with three inputs and a single output. Therefore, the error variation can be reflected more objectively from both overall and partial perspectives.

Assume the following. For a prediction problem, the actual value at moment t $(t = 1,2, \dots, n)$ is $Y (t)$ and there are m models for this prediction problem. The predicted value of the ith model ( $i = 1,2, \dots, m$ ) at moment t is $f_{i} (t)$, and its relative error at moment t is $e_{i} (t)$. The average absolute value of the relative error of the ith model over the k most recent moments up to and including t is $E_{i} (t)$ (representing the overall prediction performance of the model over these k moments), and the change rate of the absolute value of the relative error of the ith model at moment t is $c_{i} (t)$. Consider

\begin{matrix} e_{i} (t) = |\frac{Y (t) - f_{i} (t)}{Y (t)}|, \\ E_{i} (t) = \frac{1}{k} \sum_{j = t - k + 1}^{t} e_{i} (j), \\ c_{i} (t) = |e_{i} (t) - e_{i} (t - 1)| . \end{matrix}

(7)
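
The three quantities in (7) can be computed as in the sketch below. Two boundary conventions are assumptions made here, since the text does not specify them: for $t < k$ the average $E_{i} (t)$ uses only the moments available so far, and $c_{i}$ at the first moment is set to 0. The function name `error_features` and the sample values are hypothetical.

```python
import numpy as np

def error_features(actual, predicted, k=3):
    """Return e_i(t), E_i(t), c_i(t) of (7) for one model over all moments t."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    e = np.abs((actual - predicted) / actual)          # absolute relative error
    # Moving average over the k most recent moments (shorter window at the start).
    E = np.array([e[max(0, t - k + 1):t + 1].mean() for t in range(len(e))])
    # Absolute change of the error; the first entry has no predecessor, so it is 0.
    c = np.abs(np.diff(e, prepend=e[0]))
    return e, E, c

actual = [10.0, 20.0, 40.0, 50.0]
predicted = [9.0, 22.0, 40.0, 45.0]
e, E, c = error_features(actual, predicted, k=2)
# e = [0.1, 0.1, 0.0, 0.1], E = [0.1, 0.1, 0.05, 0.05], c = [0.0, 0.0, 0.1, 0.1]
```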

The overall neural network can thus be represented as a model with three inputs and a single output. The three inputs are $e_{i} (t)$, $E_{i} (t)$, and $c_{i} (t)$, and the output is the predicted absolute value of the relative error at moment $t + 1$, denoted $\bar{e_{i}} (t)$. From $\bar{e_{i}} (t)$, the weight of the ith model is calculated as $ω_{i} (t) = (1 / \bar{e_{i}} (t)) / (\sum_{j = 1}^{m} 1 / \bar{e_{j}} (t))$; the single models are then combined to generate the prediction result $y (t) = \sum_{i = 1}^{m} ω_{i} (t) f_{i} (t)$. Because $e_{i} (t)$, $E_{i} (t)$ (reflecting the current overall level of the single model), and $c_{i} (t)$ (reflecting the recent change rate of the error) together describe the error from both overall and partial perspectives, the predicted error is more reliable and the resulting weights are more accurate.

The main process of this algorithm is as follows.

(a) Form a set of m neural network individuals after the optimal individual selection.

(b) Predict the training samples by using the m trained neural networks and then calculate $e_{i} (t)$, $E_{i} (t)$, and $c_{i} (t)$, respectively.

(c) Establish the dynamic weight model based on the fuzzy neural network, with the three inputs $e_{i} (t)$, $E_{i} (t)$, and $c_{i} (t)$ and the output $e_{i} (t + 1)$, and train the model.

(d) Feed the test data into the dynamic weight model to predict the absolute value of the relative error $\bar{e_{i}} (t)$ of the network individuals at different times. The weight of the ith model at moment t is then

\begin{matrix} ω_{i} (t) = \frac{1 / \bar{e_{i}} (t)}{\sum_{j = 1}^{m} 1 / \bar{e_{j}} (t)}, \quad \sum_{i = 1}^{m} ω_{i} (t) = 1 . \end{matrix}

(8)

(e) Predict the test data by using the m trained models and integrate the predicted results with the weights obtained from step (d). The final prediction result of the neural network ensemble is $y (t) = \sum_{i = 1}^{m} ω_{i} (t) f_{i} (t)$.
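
The weighting and integration in the last two steps can be illustrated as follows. The predicted errors are taken as given here (in the method they come from the fuzzy neural network); the small floor `eps` that guards against division by zero, the function names, and the numbers are assumptions made for illustration.

```python
import numpy as np

def dynamic_weights(pred_err, eps=1e-12):
    """Inverse-error weights: rows are models i, columns are moments t."""
    inv = 1.0 / np.maximum(np.asarray(pred_err, dtype=float), eps)
    return inv / inv.sum(axis=0)          # each column sums to 1

def combine(weights, preds):
    """Weighted integration y(t) = sum_i w_i(t) * f_i(t)."""
    return (weights * np.asarray(preds, dtype=float)).sum(axis=0)

pred_err = [[0.1, 0.2],      # model 1: predicted relative error at t = 1, 2
            [0.3, 0.2]]      # model 2
preds = [[100.0, 50.0],      # model 1 predictions f_1(t)
         [120.0, 70.0]]      # model 2 predictions f_2(t)
w = dynamic_weights(pred_err)   # [[0.75, 0.5], [0.25, 0.5]]
y = combine(w, preds)           # [105.0, 60.0]
```

At the first moment the model with the smaller predicted error receives three quarters of the weight; at the second moment both models are weighted equally.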

The dynamic weight integration based on the fuzzy neural network is shown in Figure 3.

Figure 3

The flow chart of dynamic weight integration based on fuzzy neural network.

3.2.3. Selective Neural Network Ensemble Based on the Dynamic Weight

Selective neural network ensemble algorithm based on the dynamic weights improves the traditional neural network ensemble algorithm by combining the two improvements mentioned in this paper.

The idea of this algorithm can be described as follows. A number of individual neural networks are generated by using the Bagging algorithm; the individuals are then clustered by the K-means clustering algorithm, with the optimal K-value selected by the K-value optimization algorithm; finally, the selected individuals are integrated by using the dynamic weight model based on the fuzzy neural network to obtain the final solution.

The main steps of this algorithm are as follows.

(a) Generate n neural network individuals by the Bagging algorithm.

(b) Form a matrix $Y (y_{1}, y_{2}, \dots, y_{n})$ from the outputs of each individual neural network on the training set, $y_{i} = (y_{i}^{1}, y_{i}^{2}, \dots, y_{i}^{m})$ (m represents the number of the training output samples), and then carry out a cluster analysis of the matrix Y.

(c) For each $k = 1,2, \dots, T$, divide matrix Y into k groups through the K-means clustering algorithm and calculate $F (s, k)$.

(d) Search for the minimum of the distance cost $\min (F (s, k))$ over k from 1 to T; the corresponding k is the optimal K-value.

(e) Cluster matrix Y by using the optimal K-value, and select the individual with the highest fitting accuracy in each cluster as a candidate individual; k neural network individuals are thus obtained.

(f) Predict the training samples by using the selected neural networks, and then calculate $e_{i} (t)$, $E_{i} (t)$, and $c_{i} (t)$, respectively.

(g) Establish the dynamic weight model based on the fuzzy neural network, with inputs $e_{i} (t)$, $E_{i} (t)$, and $c_{i} (t)$ and output $e_{i} (t + 1)$, and train the model.

(h) Feed the test data into the dynamic weight model, and predict the absolute value of the relative error $\bar{e_{i}} (t)$ of the network individuals at different times; the weight of the ith model at moment t can then be calculated:

\begin{matrix} ω_{i} (t) = \frac{1 / \bar{e_{i}} (t)}{\sum_{j = 1}^{m} 1 / \bar{e_{j}} (t)}, \quad \sum_{i = 1}^{m} ω_{i} (t) = 1 . \end{matrix}

(9)

(i) Predict the test data by using the m selected models, and integrate the predicted results with the weights obtained from step (h). The final forecast result of the neural network ensemble is

\begin{matrix} y (t) = \sum_{i = 1}^{m} ω_{i} (t) f_{i} (t) . \end{matrix}

(10)

The flow chart of the selective neural network ensemble algorithm based on the dynamic weight is shown in Figure 4.

Figure 4

The flow chart of selective dynamic weight neural network ensemble algorithm.

4. Simulation Experiment and Analysis

This paper chooses Data11 and Data12 from [1]. Data11 contains 118 groups of sampled data, and Data12 consists of 180 groups of sampled data. The sampling points in each group include the cumulative number of defects and the cumulative execution time. First, the data are normalized to $[0,1]$ by using the following formula:

\begin{matrix} M_{i} = \frac{X_{i} - X_{\min}}{X_{\max} - X_{\min}}, \end{matrix}

(11)

where $X_{i}$ is the input fault data $(i = 1,2, \dots, n)$ , $X_{\max}$ is the maximum fault data, $X_{\min}$ is the minimum fault data, and $M_{i}$ is the data after normalization.
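As a minimal sketch of the normalization in (11), the following applies the formula to a short list. The function name and the sample values are hypothetical and are not taken from Data11 or Data12.

```python
def min_max_normalize(values):
    """Eq. (11): map each X_i to M_i = (X_i - X_min) / (X_max - X_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("all values are equal; the normalization is undefined")
    span = x_max - x_min
    return [(x - x_min) / span for x in values]


# Illustrative cumulative defect counts (hypothetical, not from the paper's data).
cumulative_defects = [2.0, 5.0, 9.0, 12.0]
normalized = min_max_normalize(cumulative_defects)
```

The smallest value maps to 0 and the largest to 1, so both input series share the same scale before they are fed to the networks.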

The Elman individual neural network has a three-layer structure: an input layer, a hidden layer, and an output layer. It has 3 input nodes, 7 hidden-layer nodes, and 1 output node. The number of training iterations is 2000 with a target error of 0, and the LM algorithm is adopted as the network learning algorithm.

To quantitatively compare the performance of different methods, the MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percent Error) are used in our simulation as the evaluation criteria of network prediction:

\begin{matrix} MAE = \frac{\sum_{i = 1}^{n} |e_{i}|}{n} = \frac{\sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|}{n}, \\ MSE = \frac{\sum_{i = 1}^{n} {(e_{i})}^{2}}{n} = \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{n}, \\ MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{|y_{i} - {\hat{y}}_{i}|}{|y_{i}|} . \end{matrix}

(12)
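The three criteria in (12) translate directly into code. The sketch below uses hypothetical function names and illustrative values; MAPE is returned as a fraction rather than a percentage, which appears to match the magnitudes reported in the tables, and it assumes no true value is zero.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percent Error as a fraction; assumes every y is nonzero."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative observed and predicted values (hypothetical).
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

scores = (mae(observed, predicted), mse(observed, predicted), mape(observed, predicted))
```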

4.1. Experimental Verification of the Individual Optimization Based on Clustering

4.1.1. Experiment 1 (Data11 Set)

Following the K-value optimization algorithm described in this paper, the maximum value of K in the K-means clustering algorithm is limited to 5; clustering is performed for $k = 1,2, 3,4, 5$ , and $F (s, k)$ is calculated for each k. The results are $F (s, 1) = 183.81$ , $F (s, 2) = 166.03$ , $F (s, 3) = 134.23$ , $F (s, 4) = 157.67$ , and $F (s, 5) = 152.18$ .

Since $F (s, 3)$ is the minimum, the optimal value of K is 3.
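The selection of K amounts to taking the argument of the minimum over the candidate values. A small sketch using the distance costs reported above for Data11 follows; the helper name `select_k` is hypothetical, and the cost function itself is not reimplemented here, only looked up from the reported values.

```python
def select_k(cost_of_k, k_max):
    """Return the k in 1..k_max with the smallest distance cost F(s, k)."""
    return min(range(1, k_max + 1), key=cost_of_k)


# Distance costs F(s, k) reported in the text for Data11.
reported_cost = {1: 183.81, 2: 166.03, 3: 134.23, 4: 157.67, 5: 152.18}

best_k = select_k(reported_cost.get, 5)
```

In a full implementation, `cost_of_k` would run K-means for the given k and evaluate the distance cost function on the resulting clusters.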

Here the individual optimization method based on the K-value optimization algorithm is denoted KF-NNE. To verify the performance of the KF-NNE algorithm, K is also set to 1, 5, 7, and 9, giving the models K1-NNE, K5-NNE, K7-NNE, and K9-NNE, each of which is compared with KF-NNE. To verify the effectiveness of the individual optimization method, it is also compared with the traditional integration method (NNE). As introduced previously, the individuals with the minimum fitting error in each cluster (the number of individuals in each cluster may differ) are integrated by the arithmetic average method.
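The selection and integration step just described can be sketched as follows. The cluster labels, fitting errors, and predictions are hypothetical placeholders; in the paper the labels come from K-means on the output matrix Y, which is not reproduced here.

```python
def select_per_cluster(labels, fitting_errors):
    """Return the index of the lowest-error network in each cluster."""
    best = {}
    for index, (label, error) in enumerate(zip(labels, fitting_errors)):
        if label not in best or error < fitting_errors[best[label]]:
            best[label] = index
    return sorted(best.values())


def arithmetic_average(predictions, selected):
    """Average the selected networks' predictions; predictions[i][t] is network i at step t."""
    steps = len(predictions[0])
    return [sum(predictions[i][t] for i in selected) / len(selected) for t in range(steps)]


# Hypothetical example: six networks assigned to three clusters.
labels = [0, 0, 1, 2, 1, 2]
errors = [0.30, 0.10, 0.25, 0.40, 0.20, 0.15]
preds = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [1.5, 2.5], [0.9, 2.1], [1.2, 1.7]]

selected = select_per_cluster(labels, errors)
combined = arithmetic_average(preds, selected)
```

Keeping one representative per cluster removes near-duplicate networks, so the average is taken over individuals that differ from one another in their outputs.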

4.1.2. Experiment 2 (Data12 Set)

Following the method of experiment 1, the results are $F (s, 1) = 425.62$ , $F (s, 2) = 427.18$ , $F (s, 3) = 410.98$ , $F (s, 4) = 394.64$ , and $F (s, 5) = 457.98$ . Since $F (s, 4)$ is the minimum, the optimal K-value is 4.

K is also set to 2, 6, 8, and 10, giving the models K2-NNE, K6-NNE, K8-NNE, and K10-NNE. These are compared with KF-NNE and NNE; the performance is shown in Table 2.

The performance comparison in Tables 1 and 2 shows that when K is 1 in experiment 1, and when K is 2 or 10 in experiment 2, the clustering-based neural network ensemble algorithm performs worse than the traditional integrated approach. This does not negate the validity of the clustering algorithm, because the goal of this paper is to achieve the best clustering effect by selecting an appropriate k value. Tables 1 and 2 also show that the KF-NNE algorithm proposed in this paper achieves lower MAE, MSE, and MAPE than both the traditional integration method and the clustering methods with other k values, so the integrated prediction is significantly improved.

Table 1

The performance comparison of different individual optimization models in Data11.

Model	MAE	MSE	MAPE
NNE	2.54512	10.2480	0.02383
K1-NNE	3.0502	14.0627	0.0468
KF-NNE	1.1414	2.5436	0.0184
K5-NNE	1.7855	4.9332	0.0271
K7-NNE	1.7153	4.5709	0.0265
K9-NNE	1.8755	5.9026	0.0294

Table 2

The performance comparison of different individual optimization models in Data12.

Model	MAE	MSE	MAPE
NNE	2.531	9.9302	0.0175
K2-NNE	5.3977	32.8675	0.0375
KF-NNE	1.6527	5.4079	0.01153
K6-NNE	2.0861	9.0021	0.01439
K8-NNE	2.1856	8.4719	0.0151
K10-NNE	3.3726	14.7083	0.0235

In addition, the comparison of the relative error in Figures 5 and 6 shows that, apart from a few individual points, the relative error of the KF-NNE method proposed in this paper is lower than that of the other methods at most points. It also shows that the KF-NNE algorithm is effective in the field of software reliability prediction and improves the prediction performance compared with the NNE algorithm.

Figure 5

The relative error comparison of different individual optimization models in Data11.

Figure 6

The relative error comparison of different individual optimization models in Data12.

4.2. Verification of Integrated Output Method Based on Dynamic Weighting

4.2.1. Experiment 1 (Data11 Set)

The experiment in the previous section showed that K = 3 gives the best result on the Data11 set. Here the individuals selected with k = 3 are used to calculate the weight values of the square error method, inverse method, and the simple weighted square sum method, which are then compared with the dynamic weight method.

The weight values of each model of square error method and inverse method were $w_{1} = 0.30$ , $w_{2} = 0.34$ , and $w_{3} = 0.35$ .

The weight values of each model of simple weighted square sum method were $w_{1} = 0.17$ , $w_{2} = 0.33$ , and $w_{3} = 0.5$ .
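This section does not restate how the fixed weights are computed, so the following is only a sketch under two assumptions: that the inverse-type weights are proportional to the inverse of each model's squared fitting error, and that the simple weighted weights follow a rank rule (worst model gets rank 1, best gets rank m, weight equal to rank divided by 1 + 2 + ... + m). The second assumption is suggested by the reported values 0.17, 0.33, and 0.5, which coincide with 1/6, 2/6, and 3/6. Function names and error values are hypothetical.

```python
def inverse_error_weights(squared_errors):
    """Assumed form: weight proportional to 1 / squared fitting error."""
    inverse = [1.0 / e for e in squared_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def rank_weights(squared_errors):
    """Assumed rank rule: worst model has rank 1, best has rank m; weight = rank / (1 + ... + m)."""
    m = len(squared_errors)
    worst_first = sorted(range(m), key=lambda i: squared_errors[i], reverse=True)
    total = m * (m + 1) / 2
    weights = [0.0] * m
    for rank, index in enumerate(worst_first, start=1):
        weights[index] = rank / total
    return weights


# Hypothetical squared fitting errors for three selected networks (not from the paper).
errors = [3.0, 2.0, 1.0]
fixed_inverse = inverse_error_weights(errors)
fixed_rank = rank_weights(errors)
```

Both rules are computed once from the fitting errors and then held constant, which is the property the dynamic weight method is designed to relax.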

The MAE (Mean Absolute Error), the MSE (Mean Squared Error), and the MAPE (Mean Absolute Percent Error) are again used as the evaluation criteria of network prediction in this simulation experiment. The performance comparison of different integration methods is shown in Table 3.

Table 3

The performance comparison of different integration models in Data11.

Model	MAE	MSE	MAPE
Arithmetic average method	1.1414	2.5436	0.0184
Square error method and inverse method	1.0915	2.6181	0.0178
Simple weighted average method	1.7553	4.6493	0.0274
Dynamic weighting method	0.7214	1.0478	0.0114

4.2.2. Experiment 2 (Data12 Set)

For Data12, the previous section showed that the best results are obtained when $K = 4$ . The experiment is therefore conducted with $k = 4$ : the weight values of the square error method, inverse method, and the simple weighted square sum method are calculated and then compared with the dynamic weight method.

The weight values of each model of square error method and inverse method were $w_{1} = 0.28$ , $w_{2} = 0.29$ , $w_{3} = 0.14$ , and $w_{4} = 0.29$ .

The weight values of each model of simple weighted square sum method were $w_{1} = 0.20$ , $w_{2} = 0.30$ , $w_{3} = 0.10$ , and $w_{4} = 0.40$ .

The comparison of the error performance in Tables 3 and 4 shows that the dynamic weight method proposed in this paper performs best under all three performance indexes. In addition, except for the dynamic weighting method, the methods are not stable in performance. For instance, among the fixed-weight methods, the square error method and inverse method gives the lowest MAE and MAPE in experiment 1, whereas the arithmetic average method performs best in experiment 2. In other words, fixed weights make the integrated output model unstable, so the user may end up choosing the number of neural network ensemble individuals blindly. The cause of this situation, in the final analysis, is the difference between the fitting performance and the prediction performance of a single model: although the traditional fixed-weight methods give different weights, those weights reflect only the error on the fitting data. For future data, the fixed-weight approach is at a clear disadvantage because the predictive effect of a single model may change at every moment.

Table 4

The performance comparison of different integration models in Data12.

Model	MAE	MSE	MAPE
Arithmetic average method	1.6527	5.4079	0.0115
Square error method and inverse method	2.1304	8.3798	0.0151
Simple weighted average method	2.1942	8.0576	0.0155
Dynamic weighting method	0.8023	1.3781	0.0055

In addition, the error comparison in Figures 7 and 8 shows that after using dynamic weight, the fluctuation of error is significantly decreased. Particularly when the traditional method shows drastic change, the dynamic weight method still has a relatively stable effect. In other words, the dynamic weight method proposed in this paper is an effective neural network ensemble output method.

Figure 7

The relative error comparison of different integration models in Data11.

Figure 8

The relative error comparison of different integration models in Data12.

4.3. Verification of the Selective Dynamic Weights of Neural Network Ensemble

In this section, the two improvements described above are combined, and the resulting selective dynamic weight neural network ensemble algorithm, denoted DW-NNE, is compared with BP, Elman, and the traditional NNE algorithm.

The performance comparison of each model in Data11 and Data12 is shown in Tables 5 and 6.

Table 5

The performance comparison of different neural network ensemble models in Data11.

Model	MAE	MSE	MAPE
BP	4.4346	23.4861	0.0653
Elman	3.0502	14.0627	0.0468
NNE	1.5969	4.5621	0.0247
DW-NNE	0.7213	1.0478	0.01139

Table 6

The performance comparison of different neural network ensemble models in Data12.

Model	MAE	MSE	MAPE
BP	4.134	25.6080	0.0279
Elman	3.3693	13.4693	0.0232
NNE	2.531	9.9302	0.0175
DW-NNE	0.8023	1.3781	0.0055

The comparison of relative error in Figures 9 and 10 shows that BP is relatively stable. Apart from a few points with relatively large deviation, the error of the Elman neural network is relatively small at the vast majority of points. The overall prediction accuracy of the neural network ensemble model is steady and its error fluctuation is small, although at a very few points its prediction error is higher than that of a single model. After further optimization with the method proposed in this paper, the fluctuation of the error is significantly reduced.

Figure 9

The relative error comparison of different neural network ensemble models in Data11.

Figure 10

The relative error comparison of different neural network ensemble models in Data12.

Besides, the comparison of the predictive indexes of each model in Tables 5 and 6 shows that the integrated methods significantly improve the three indicators MAE, MSE, and MAPE compared with the single models. In addition, because the DW-NNE algorithm proposed in this paper combines two improvements on the traditional integration method, the integration effect is further improved and its generalization ability is enhanced.

The above analysis shows that DW-NNE algorithm proposed in this paper can achieve better prediction effect than other methods.

5. Conclusion

This paper proposed a novel dynamic weight neural network ensemble model (DW-NNE), which solves the problem that the number of clusters cannot be selected automatically when a clustering algorithm is used for individual optimization in a neural network ensemble. In addition, an integration output model based on dynamic weights was established. The experiments show that both improvements raise the prediction performance of the neural network ensemble in software reliability prediction, with a clear improvement over the traditional models.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors acknowledge the support of the Natural Science Foundation of Shandong Province, China (no. ZR2013FL034).

References

1. Hansen L. K., Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(10), 993–1001. doi: 10.1109/34.58871. 2-s2.0-0025507176.

2. Sollich, Krogh. Learning with ensembles: how over-fitting can be useful. In: Advances in Neural Information Processing System 8. Denver, Colo, USA: MIT Press, 1996, 190–196.

3. Dietterich T. G. Machine-learning research: four current directions. AI Magazine, 1997, 18(4), 97–136. 2-s2.0-0031361611.

4. Zhou Z.-H., Jiang, Yang Y.-B., Chen S.-F. Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 2002, 24(1), 25–36. doi: 10.1016/s0933-3657(01)00094-x. 2-s2.0-0036146402.

5. Lai K. K., Wang S. Y. Multistage RBF neural network ensemble learning for exchange rates forecasting. Neurocomputing, 2008, 71(16–18), 3295–3302. doi: 10.1016/j.neucom.2008.04.029. 2-s2.0-56549097547.

6. Han, Zhu, Yao. Remote sensing image classification based on neural network ensemble algorithm. Neurocomputing, 2012, 78(1), 133–138. doi: 10.1016/j.neucom.2011.04.044. 2-s2.0-82655173888.

7. Wenying, Kang, Kewen, Chenxi. A software reliability prediction model of combining multiple neural networks based on support vector regression. Proceedings of the International Conference on Electrical, Control and Automation Engineering (ECAE '13), December 2013, Hong Kong, China.

8. Schapire R. E. The strength of weak learnability. Machine Learning, 1990, 5(2), 197–227. doi: 10.1007/BF00116037. 2-s2.0-0025448521.

9. Breiman. Bagging predictors. Machine Learning, 1996, 24(2), 123–140. 2-s2.0-0030211964.

10. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 1995, 121(2), 256–285. doi: 10.1006/inco.1995.1136. 2-s2.0-58149321460. MR1348530.

11. Yates W. B., Partridge. Use of methodological diversity to improve neural network generalisation. Neural Computing & Applications, 1996, 4(2), 114–128. doi: 10.1007/bf01413747. 2-s2.0-0030549306.

12. Zhao, Zhang, Liao. Design of ensemble neural network using the Akaike information criterion. Engineering Applications of Artificial Intelligence, 2008, 21(8), 1182–1188. doi: 10.1016/j.engappai.2008.02.007. 2-s2.0-54049096726.

13. Zhao, Zhang. Design of ensemble neural network using entropy theory. Advances in Engineering Software, 2011, 42(10), 838–845. doi: 10.1016/j.advengsoft.2011.05.027. 2-s2.0-79960845368.

14. Liu, Yao. Simultaneous training of negatively correlated neural networks in an ensemble. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 1999, 29(6), 716–725. doi: 10.1109/3477.809027. 2-s2.0-0033280266.

15. Zhou Z.-H., Tang. Ensembling neural networks: many could be better than all. Artificial Intelligence, 2002, 137(1-2), 239–263. doi: 10.1016/s0004-3702(02)00190-x. MR1906477. 2-s2.0-0036567392.

16. Huang. Selective approach to neural network ensemble based on clustering technology. Journal of Computer Research and Development, 2005, 42(4), 594–598. doi: 10.1360/crad20050410. 2-s2.0-17944381253.

17. Giacinto, Roli. Design of effective neural network ensembles for image classification purposes. Image and Vision Computing, 2001, 19(9-10), 699–707. doi: 10.1016/s0262-8856(01)00045-2. 2-s2.0-0035420134.

18. Perrone M. P., Cooper L. N. When networks disagree: ensemble method for neural networks. In: Artificial Neural Networks for Speech and Vision. New York, NY, USA: Chapman & Hall, 1993, 126–142.

19. Opitz D. W., Shavlik J. W. Actively searching for an effective neural network ensemble. Connection Science, 1996, 8(3-4), 337–353. doi: 10.1080/095400996116802. 2-s2.0-0030356238.

20. Pan. Study on the stock market prediction model of neural ensemble based on genetic algorithms. Journal of Guangxi Teachers Education University (Natural Science), 2007, 24, 77–83.

21. Shen, Kong. Dynamically weighted ensemble neural networks which selected by genetic algorithm for solving regression problems. Computer Engineering and Applications, 2005, 41, 8–11.

22. Yang, Pan. Optimization study on K-value of K-means algorithm. System Engineering—Theory & Practice, 2006, 2, 97–101.