Sparse long short-term memory for information fusion in wireless sensor networks

Abstract

Wireless sensor networks are designed to perceive, gather, and process external environmental information and send it to the observer. However, the transmission of mass information is a challenge to the sensor nodes. To address this challenge, information fusion technologies are proposed to reduce mass redundant data. However, these techniques rarely consider the historical information, and thereby often encounter the difficulty of low prediction accuracy. In order to solve this difficulty, we propose a novel information fusion approach for the cluster heads. The proposed approach is based on time-recurrent neural network, called sparse long short-term memory, which is derived from the long short-term memory network. The sparse long short-term memory uses sparse matrix to reduce the dimension for a high-dimensional coefficient matrix. Therefore, the computational cost of the fusion algorithm is reduced in wireless sensor networks. The simulation results show that the sparse long short-term memory algorithm increases the survival number of sensor nodes in wireless sensor networks. Furthermore, the prediction accuracy of the sparse long short-term memory algorithm is almost the same as the other comparison algorithms.

Keywords

Information fusion long short-term memory sparse wireless sensor networks

Introduction

Wireless sensor networks (WSNs) adapt to many demanding environments due to the development of sensor technology and microelectronics industry, for example, military affairs, environmental monitoring, forest fire prevention, and traffic control.^1,2 However, the transmission of mass observed data shortens the life of sensor nodes, which is a challenge of WSNs.³ To meet this challenge, information fusion technology is used in WSNs, since it reduces redundant information by integrating multiple sources information.

In the past few years, some information fusion methods have been proposed. Sun and Deng⁴ used the likelihood ratio to fuse information. However, the method requires perfect knowledge of local decision performance indices. Therefore, Chen et al.⁵ proposed a fusion method combining channel perception and likelihood ratio, which does not need the perfect knowledge. In order to avoid random packet loss in sensor networks, John and Vorontsov⁶ and Tsanas et al.⁷ considered information fusion in terms of an estimation method. Furthermore, Zhang et al.⁸ proposed a distributed fusion estimation method. In order to solve the data fusion problem of WSNs for the greenhouse monitoring system, Bai et al.⁹ proposed a weighted based data fusion estimation algorithm for WSNs. However, these estimation fusion methods require a good estimation. Without requirement of estimation, some fusion technologies based on the clustering algorithm were proposed in previous works.^10–12 Furthermore, some online algorithms were proposed in sensor networks.¹³

The methods cited above do not consider the impact of previous observations. However, the external environment is continuously changing, and the current state of external environment is related to the historical state. There is a lot of useful information in the previous observations. For this reason, this article proposes a novel neural network approach based on the historical information. The approach utilizes long short-term memory (LSTM) networks to retain the useful parts of historical data and discard the useless parts. The simulation experiments demonstrate that we not only reduce a lot of redundant information but also maintain the prediction accuracy.

In this article, we have the following main contributions: our algorithm compresses the high-dimensional coefficient matrix, so it reduces the amount of information transmitted by the sensors, and ultimately lengthens the life of WSNs and reduces the number of sensor deaths; the algorithm has low complexity and low energy consumption for calculations; and there is large room for scalability (no additional cost of calculations).

The content of this article to be introduced next is as follows. Section 2 is about some related work. Section 3 introduces the model and algorithm of sparse long short-term memory (SLSTM). Our simulation experiments and results are in Section 4, and Section 5 is the conclusion.

Related work

Artificial neural network (ANN) is a physically realizable system that simulates the structure and the function of human brain neural cells. The network has various forms, such as feedforward neural network, convolution neural network, and recurrent neural network (RNN). RNN has fixed weights, external inputs, and internal states. Nowadays, RNN plays an important role in neural networks.¹⁴ In recent years, RNN has performed well in image recognition and text analysis. For this reason, many people utilized it to improve the efficiency of information fusion.^15,16 Therefore, many information fusion technologies based on neural networks are widely used.^17–20

The LSTM network is a kind of time RNN. One characteristic of LSTM is suitable for processing and predicting events with relatively long intervals and delays in the time series.²¹ Nowadays, the LSTM method is also of great interest in information fusion situations. Zhen et al.²² proposed a novel LSTM Context Fusion Model. The model captures and fuses contextual information from multiple channels of photometric and depth data. Although the model achieves a fusion rate of around 49% on the SUNRGBD data set, this is not ideal in WSNs. Gandhi et al.²³ proposed a novel temporally hybrid RNN system for multi-modal information fusion. In his system, the LSTM algorithm was used for data fusion. For all we konw, LSTM has the advantage of preserving historical information, but training high-dimensional coefficient matrix is a great challenge.²⁴ To meet this challenge, the sparse technology was proposed to deal with high-dimensional coefficient matrix. Rahmani and Atia²⁵ solved the high-dimensional low rank problem of big data by sparse matrix. In this article, we propose a simple and sparse model named SLSTM. The SLSTM utilizes the delta network to generate sparse matrices. Then, the sparse matrices are used to reduce the dimension of the high-dimensional coefficient matrix in SLSTM. We not only accelerate the training process but also maintain the accuracy of the prediction. Another advantage of this approach is that the thresholds of the delta network can be adjusted for different scenarios.

In the simulation, our algorithm is compared with the original model (i.e. full transmission of information), the LSTM algorithm (i.e. used in),²³ and the proposed algorithm by Bai et al.⁹ The simulation results show that the realization of SLSTM is excellent.

Model and algorithm of SLSTM

Normally, the WSNs installed in external environment do not have information fusion modules, as shown in Figure 1(a). There are lots of sensor nodes in a WSN. Some sensor nodes, closer together, have a common cluster head. The sensors send all the observed data to their cluster heads, and then each cluster head sends its received data to the computing center. Therefore, the power consumption of cluster heads is very fast. Moreover, the frequent collection of environmental information by sensors leads to a large amount of redundant information. In order to reduce the energy consumption of the redundant data transmissions, many people compressed the detected data through information fusion technology in cluster heads.^26,27 The transport model is shown in Figure 1(b).

Figure 1.

(a) The original wireless sensor network (WSN). It consists of sensor nodes, cluster heads, and computing centers. In the WSN, a large number of sensor nodes are installed in various locations. Generally, sensor nodes that are closer together share a cluster head. The cluster head sends the information collected by the sensors to the computing center. Finally, all information collected is processed by the computing center. (b) The WSN network with a information fusion module, and the SLSTM algorithm is used in this module.

Figure 2 shows the methodology of our proposed model. We add a memory device in each cluster head to store useful information that comes from the previous round. The memory device is a storage component added to the sensors, which is a simple implementation of the current hardware technology. The useful information is the part of the historical records, which is generated by the SLSTM, that is, the output of SLSTM. However, we focus on the effect of information fusion on the energy consumption of WSN. Therefore, we assume that the sensors do not consume energy without collecting information. Furthermore, we denominate collecting and sending of information once as a round. At the beginning of each round, we get the difference between the current round information and the previous round information. Then, we judge whether this difference exceeds the threshold. If not, the round ends. Otherwise, the proposed algorithm processes the message in the next step. Finally, the cluster heads send the final fused message to the computing center. In the fusion section, our SLSTM algorithm is used to fuse the detected information. Next, we will analyze the model in detail.

Figure 2.

Overview of our method: the sensors get external environment information. In the cluster head, the fusion compress information if threshold is yes, otherwise the round is end. The SLSTM works in the fusion.

Model analysis

Our model applies to the cluster head which has a memory device. The input and output of the previous round are recorded by the memory device. The SLSTM unit is shown in Figure 3. The middle portion is the unit state of the current round. The left portion is the unit state of the last round, and the right portion is the unit state of the next round. Obviously, there is only one unit from beginning to end. However, the previous round output of the unit is memorized, and it participates in the calculation of the current round. The current input $x_{t}$ and the previous input $x_{t - 1}$ are combined to generate a delta matrix. Then, the delta matrix is used to sparse the weight matrix. The output of the previous unit is involved in the current round calculation. The final output is controlled by both the forgetting gate and the output gate in the SLSTM algorithm. The principle and mathematical model of the LSTM and the SLSTM algorithms are described in the following sections.

Figure 3.

The working principle of SLSTM unit.

LSTM

The RNN is a special neural network which has the ability to memorize sequences. In other words, the RNN is a recurrent network that allows information to be persistent. Moreover, the RNN handles the problem between the previous moment and the next moment of inputs. At present, the RNN has been successful in natural language processing (NLP).²⁸ Unfortunately, the RNN is difficult to train for long-term dependence problems, because the training gradient either tends to disappear or to explode.²⁹ The disappearance of gradient is caused by the exponential decay of the update information. In addition, the exponential increment of the update information leads to an explosion of gradient. The topology of the RNN is shown in Figure 4(a), where $x_{t}$ is an input vector which has n dimensions, and $C_{t}$ is a cell unit which processes input information and exports the result $h_{t}$ . Therefore, $h_{t}$ only depends on the output of the previous round.

Figure 4.

(a) Recurrent neural networks and (b) long short-term memory.

In order to solve the aforementioned disadvantages of RNNs, Hochreiter and Schmidhuber²¹ proposed an innovative recurrent unit called LSTM. At time t in the LSTM networks, a unit learns the knowledge of time $(t - s)$ which can be a long time away from t. A cell unit in LSTM stores the most useful knowledge that has been processed, and it determines whether the knowledge is to be covered or retained in the following processing (as shown in Figure 4(b)). The LSTM network helps us to deal with the problems of information fusion in WSNs. Moreover, the problems of node positioning are more energy efficient by this network.

The memory cell $h_{t}$ is updated by two portions. One is to forget some useless parts of existing memories, and the other is to add new memory content $c_{t}$

h_{t} = f_{t} h_{t - 1} + i_{t} c_{t}

(1)

where the new memory content is

c_{t} = \tanh (W_{c} [x_{t}, h_{t - 1}] + b_{c})

(2)

When some new information enters the memory cell, it is determined by the forgetting gate f whether to forget, that is, if the input $x_{t}$ meets the learning conditions, then it will be allowed to enter the next step, otherwise, it will be forgotten by f. Similarly, the input gate i modulates the added parts of the memory unit according to the input $x_{t}$ and $h_{t - 1}$ . The two gates are computed by

f_{t} = σ (W_{f} [x_{t}, h_{t - 1}] + b_{f})

(3)

i_{t} = σ (W_{i} [x_{t}, h_{t - 1}] + b_{i})

(4)

where $b_{i}$ , $b_{f}$ , and $b_{c}$ are the bias terms.

Sparse method

In recent years, the sparse optimization method is used to reduce dimensions of the selected features. The sparse optimization method constructs a sparse optimization model by the sparseness of solutions. Furthermore, this method is based on the structural characteristics of the data. Finally, the sparse optimization method obtains a sparse numerical solution. This method has two different decision variables. One is vector sparse optimization model and the other is matrix joint sparse optimization model.

In this article, we use the delta network³⁰ to sparse optimize the LSTM algorithm. The delta network uses a cached neuron activation threshold to skip unnecessary calculations, where the input changes slightly compared to the previous cache. The purpose of the delta network is to transform a dense matrix-vector multiplication into a sparse one.

In order to understand the delta network better, we explain it through formulas. This delta network formulation is from³⁰

s_{t} = W x_{t}

(5)

s_{t} = W (x_{t} - x_{t - 1}) + s_{t - 1}

(6)

According to equation (6), we know that there was a recursive relationship between $s_{t}$ and $s_{t - 1}$ , and we iterate over equation (6) for $t - 1$ times

\begin{matrix} s_{t} = W (x_{t} - x_{t - 1}) + s_{t - 1} \\ = W (x_{t} - x_{t - 1}) + W (x_{t - 1} - x_{t - 2}) + \dots + s_{0} \\ = W x_{t} \end{matrix}

(7)

It can be seen from the results that the sparse optimization of the delta network is consistent with the original formula.

SLSTM algorithm

The computational cost of the LSTM is very high, which is caused by a large number of parameters. In order to utilize the LSTM to fuse information in WSN, we optimize the LSTM to reduce its computational cost. Neil et al.³⁰ proposed a method to optimize the gated recurrent units (GRUs)³¹ using delta network. The GRU is a variant of the LSTM with only two gates, which are the update gate and the reset gate. The purpose of the delta network is to sparse the coefficient matrices, thereby greatly reducing the computational cost. In this article, we do the sparse processing by combining the LSTM algorithm and the delta network.

Algorithm 1 introduces the training process of the SLSTM algorithm. Each iteration updates the forgetting gate f, the input gate i, and the cell memory c. The $\tanh$ function scales the input in $(- 1, 1)$ , and the $σ$ function compresses the information into $(0, 1)$ . The SLSTM algorithm updates the output h when the forgetting information exceeds $ε$ , where $ε$ is a fixed value in the SLSTM networks, and then, the algorithm enters the next iteration.

Algorithm 1 Sparse long short-term memory
Input: $x_{t}$
Output: $h_{t}$
1: if $x_{t} - x_{t - 1}$ $< threshold$ then
2: End;
3: else
4: $δ_{t} = x_{t} - x_{t - 1}$ ;
5: $\tilde{W}$ $\leftarrow W$ $δ_{t}$ ;
6: $c \leftarrow$ $\tanh$ ( ${\tilde{W}}_{c}$ [ $x_{t}$ , $h_{t - 1}$ ] + $b_{c}$ );
7: $i_{t}$ $\leftarrow σ$ ( ${\tilde{W}}_{i}$ [ $x_{t}$ , $h_{t - 1}$ ] + $b_{i}$ );
8: $f_{t}$ $\leftarrow σ$ ( ${\tilde{W}}_{f}$ [ $x_{t}$ , $h_{t - 1}$ ] + $b_{f}$ );
9: if $f_{t} < ϵ$ then
10: Forget;
11: else
12: $h_{t}$ $\leftarrow f_{t}$ $h_{t - 1}$ + $i_{t}$ $c_{t}$ ;
13: Remember $h_{t}$
14: end if
15: end if

It is worth noting that the output of LSTM is generally quantified by probability, and many output layers are implemented by the function softmax

S_{i} = \frac{e^{i}}{\sum_{j} e^{j}}

In many realistic scenarios, it is useful for learning when the difference between prior knowledge and current knowledge exceeds a certain threshold; for example, in the monitoring of water quality pollution, it is worthy of our attention when the content of water-affecting elements changes sufficiently. On the contrary, it does little to help learning when the difference is on a small scale. From the above analysis, we have been inspired to handle the coefficient matrix W.

If $W_{c}$ is a $n \times n$ matrix, then the difference matrix is

{\tilde{W}}_{c} = W_{c} δ_{x}

(8)

where $δ_{x}$ is the measure of the difference between $x_{t}$ and $x_{t - 1}$

δ_{x} = g (x_{t})

(9)

g (x_{t}) = {\begin{matrix} x_{t} - x_{t - 1}, & if | x_{t} - x_{t - 1} | > Θ \\ 0, & otherwise \end{matrix}

(10)

where $Θ$ is a chosen threshold value for a state i at time t.

In the same way

{\tilde{W}}_{f} = W_{f} δ_{x}, {\tilde{W}}_{i} = W_{i} δ_{x}

(11)

where $x_{t - 1}$ and $h_{t - 2}$ are stored in memory cell.

Each term in equations (2), (3), and (4) can be replaced by the delta update defined in equations (5) and (6), and we have

c_{t} = \tanh ({\tilde{W}}_{c} [x_{t}, h_{t - 1}] + b_{c})

(12)

f_{t} = σ ({\tilde{W}}_{f} [x_{t}, h_{t - 1}] + b_{f})

(13)

i_{t} = σ ({\tilde{W}}_{i} [x_{t}, h_{t - 1}] + b_{i})

(14)

h_{t} = f_{t} h_{t - 1} + i_{t} c_{t}

(15)

Here, we introduce the time complexity of Algorithm 1. We define $γ$ to represent the ratio of nonzero elements in the sparse vector $δ$ . n denotes the dimension of a vector. According to the original model equation (5), the total cost for calculating a $n \times n$ matrix is $n^{2}$ (i.e. n operations for a n-dimensional vector). The calculation cost of the standard formula before sparse is $o (n^{2})$ , that is

N_{original, computation} = n^{2}

(16)

Furthermore, the total memory cost of equation (5) is

N_{original, memory} = n^{2} + n

(17)

After the standard formula is sparse, we get a new formula (6) and calculate its cost as $o (γ n^{2})$ , that is

N_{new, computation} = γ n^{2} + 2 n

(18)

Moreover, the total memory cost of the formula (6) is

N_{new, memory} = γ n^{2} + 4 n

(19)

In many scenarios of WSNs, a certain monitoring value changes very little over a long period of time, such as forest fire monitoring and gas monitoring. Therefore, the value of the $γ$ is small. We compare the total cost of the two calculations, $N_{original}$ and $N_{new}$

\frac{N_{original}}{N_{new}} = \frac{2 n^{2} + n}{2 γ n^{2} + 6 n} \approx \frac{1}{γ}

(20)

For example, if $γ = 5 %$ , then the calculating costs are theoretically accelerated by 20 times, and our algorithm is faster than the original algorithm without the delta network optimization.

Accuracy improvement

There are two important principles for information fusion in WSN. One is to reduce network energy consumption and extend network life. The other is to ensure that the accuracy of information fusion is not affected; for example, in water quality monitoring, in most cases, water quality changes slowly, that is, water quality data change little. However, once the pollution occurs, the water quality data will change, thereby we should report to the data center in time.

Many threshold methods have been used in neural networks and have achieved good results.^32–34 The selection of threshold is a learning process, and we need to make multiple substitutions to determine a good value. Too big threshold may lead to inaccurate results, and too small threshold cannot avoid unnecessary calculation. In addition, the threshold is selected according to the actual scenario. Based on the above conditions, our algorithm balances fusion rate and fusion accuracy with threshold values. In the simulation experiments, we compare the different selection of threshold value. Finally, we choose an appropriate threshold value, which achieves the similar accuracy as the other three algorithms.

Simulations and results

In this section, we compare the original method, LSTM, the proposed algorithm of Bai et al.,⁹ and SLSTM in the task of water quality which is monitored by WSN. The task aims at monitoring the content of various elements in water, which devotes to timely and effectively warning the pollution degree of water quality. There are many indicators to measure water pollution, and we select the most common indicators: pH value, dissolved oxygen, alkalinity, heavy metal content, and hardness (i.e. total calcium and magnesium ion concentration). Furthermore, we record the value of these indicators every 2 h. The relationship of these indicators and water pollution is not linear. Therefore, the linear regression cannot capture the relationship. Moreover, it is extremely difficult to determine a nonlinear regression model, thereby the nonlinear regression model is not suitable to deal with the large data fusion problem. In order to capture the relationship and accurate prediction, we utilize the SLSTM algorithm to capture the hidden relationships between indicator values at different time. Finally, we predict the pollution value of the next time.

The predictions of water quality depend on the previous information, which is similar to time series process. We cannot accurately predict the facts just by the values of the current indicators, because this is a complicated gradual change process.^35,36 For instance, the anomaly of dissolved oxygen may be caused by weather or the movement of the earth’s crust instead of pollution; the change of pH value may also be caused by the activity of fishes or the growth of aquatic grasses. Therefore, using the SLSTM model can comprehensively consider the previous information and the current indicator values, which can provide a new idea for the prediction of water pollution.

Model training

In this section, we use 10,000 labeled data of water quality monitoring to train the SLSTM model. We define five weights to represent pH value, dissolved oxygen, alkalinity, heavy metal content, and hardness, which is denoted by the symbol $w_{i}$ , $i \in {1, 2, 3, 4, 5}$ . We set the initial value of $W_{i}$ to 0.5. The algorithm updates the value of $W_{i}$ after each iteration. In the training process of the SLSTM, we set the number of input layers to 20, the number of hidden layers to 128, and the number of output layers to 2. The input is a $20 \times 6$ matrix, where 20 rows of data are taken at each iteration, and the $6$ denotes $5$ features and $1$ label. Moreover, the output has two classes, $0$ represents no pollution and $1$ denotes pollution. We iterate 10 epochs over the entire data set.

The updated policy is as follows

w_{i + 1} = w_{i} + η Δ w_{i}

Δ w_{i} = \frac{\partial ε}{\partial w_{i}}

ε (y_{T}, S) = - Σ_{i = 1}^{T} y_{i} \log S_{i}

where T denotes the number of iterations, $1 \leq i \leq T$ , $η$ denotes the learning rate, and $S_{i}$ represents the output of the softmax function. Furthermore, the loss function is denoted by $ε$ , and $y_{i}$ denotes the $i th$ target value of the sample.

After 10,000 iterations, we get the final value of weights. Table 1 is the final result of each weight. After training of model, our SLSTM model will be used to solve the problem of information fusion in WSN. Next, we analyze the performance of SLSTM in WSN, which is used to predict water pollution. There are two aspects of performance indicators in our simulations. One aspect is the number of alive nodes. The other aspect is the total consumption of network energy.

Table 1.

The final result of each weight value.

Weight	Initial value	Final value
$w_{1}$	0.5	0.7
$w_{2}$	0.5	0.69
$w_{3}$	0.5	0.67
$w_{4}$	0.5	0.57
$w_{5}$	0.5	0.54

Task and data sets

The parameters of our simulation are set as follows. A total of 100 sensor nodes were randomly distributed in a plane area of $100 m \times 100 m$ . The distance between each node and the cluster head is random, as shown in Figure 5. We use 20,000 data of water quality monitoring in our simulated experiments. Among them, 10,000 data are used as a training set, 1500 data are used as a validation set, and 8500 data are used as a test data. This data set is collected by a cluster-based WSN every 2 h. In the data set, a very small amount of data is lost due to transmission or device problems. Therefore, we utilize the mean of the values at the previous time and the values at the next time to replace the lost value. We assume that the energy consumption of sending unit data is $50 \times 10^{- 9}$ .

Figure 5.

The distance from each node to the cluster head.

The consumption of computational energy is much lower than that of information transmission. In order to eliminate other influencing factors, we raise the benchmark of energy consumption for calculation. To this end, we assume the computational energy is $20 \times 10^{- 9}$ . Moreover, the initial energy of each node is $0.5 J$ .

The softwares of the numerical simulations used in this section are MATLAB 2016a and MySQL 5.7. Our simulation data are stored in MySQL. Therefore, we use MySQL to perform a preliminary cleaning of the data source. The ML module functions in MATLAB implement the original algorithm and generate data results.^37–39 Furthermore, our proposed algorithm is also written using MATLAB. In the end, the MATLAB generates intuitive graphs of data results.

Results and discussion

The SLSTM algorithm has two outstanding advantages. One is that it saves useful information for a long time. The other is that the algorithm is optimized by adjustment of threshold size. We set four different thresholds in Figure 6. Furthermore, we choose the best threshold from the generated results.

Figure 6.

Selected different thresholds and performance of SLSTM algorithm (energy attenuation trend).

We set four thresholds of 1, 5, 10, and 20. We find that when the threshold is equal to 20, the energy consumption of entire network is the smallest. In addition, when the threshold is 1, the total energy consumption is the largest. We consider the accuracy and energy consumption synthetically. Finally, we choose 10 as the value of threshold, which is used to do a comparative simulation.

In the simulation experiment, we compare four models (i.e. original, LSTM, the proposed algorithm of Bai et al.,⁹ and SLSTM). As show in Figure 7, we iterate a total of 100 rounds of information transmission. The initial total energy of these four networks is $50 J$ .

Figure 7.

We compared four different models (original, the proposed algorithm of Bai et al.,⁹ LSTM, and SLSTM), and their respective residual energy.

As the number of rounds increases, the total energy of the four models decreases. Moreover, the deceleration of the original is the fastest. At the $100 th$ round, total residual energy of the original is almost zero, while LSTM remains about $17 J$ (i.e. 34% remaining), and SLSTM performs best which remains nearly $19 J$ (i.e. 38% remaining). Therefore, the energy consumption of the whole network is very low in the SLSTM model, which is vital for the life of WSN. Another indicator to judge the quality of WSN is the death rate of nodes in the network. We compare the number of death nodes of four models after 100 rounds. We define a single node as a dead node when its energy is less than $0.15 J$ . In general, the normal information sending and receiving cannot be carried out after the energy of node is less than $0.15 J$ .

As shown in Figure 8, at the $60 th$ round, dead nodes begin to appear in the original. In the proposed algorithm of Bai et al.,⁹ the dead nodes appear at the $68 th$ round, while dead nodes appear in LSTM at the $70 th$ round and dead nodes appear in SLSTM at the $74 th$ round. At the $100 th$ round, the death rate of nodes of original is almost 69%, while the LSTM is 35% and the SLSTM is 42%. Therefore, in terms of the number of dead nodes, SLSTM performs best in these four models.

Figure 8.

We compared four different models (original, the proposed algorithm of Bai et al.,⁹ SLSTM, and LSTM), and their number of dead nodes in a network.

The above simulations verify that our model is excellent on saving node energy. Next, our simulation verifies the accuracy of SLSTM. We get the model parameters by training of data set. Whereafter, the fusion modal compresses the test data by SLSTM and sends the results to the computing center. By comparing the predicted results to the real results, we get the prediction accuracy of SLSTM in fusion modal.

From Table 2, we can see that our SLSTM model achieves the same accuracy as the LSTM model when the thresholds are 1 and 5. They were both 98.32% accurate under the same training set and the same test set. When the threshold is 20, the accuracy of SLSTM is as low as 97.50%. We can also see that the SLSTM model needs an appropriate threshold to achieve good reliability. To this end, we should adjust the value of threshold repeatedly.

Table 2.

Comparison of LSTM algorithm, the proposed algorithm of Bai et al.,⁹ and SLSTM calculation accuracy under different thresholds.

Model	Trainingset	Validationset	Test set	Accuracy
The algorithmof (9)	10,000	1500	8500	98.22%
LSTM	10,000	1500	8500	98.32%
SLSTM(1)	10,000	1500	8500	98.32%
SLSTM(5)	10,000	1500	8500	98.32%
SLSTM(10)	10,000	1500	8500	97.62%
SLSTM(20)	10,000	1500	8500	97.50%

LSTM: long short-term memory; SLSTM: sparse long short-term memory.

Conclusion

In this article, we propose a sparse model based on LSTM algorithm, that is, SLSTM. We introduce the mathematical basis of the algorithm. In the simulation experiment section, we use 20,000 water quality monitoring data and choose MATLAB and MySQL as our simulation tools. We evaluate the performance of four models based on the total residual energy of the system and the total number of node deaths. In the simulation results, we compare the performance of the original, LSTM, and SLSTM models. The results show that our SLSTM is excellent, which saves a lot of energy for WSN and thereby lengthens the life of WSN. One advantage of our approach is that thresholds in sparse matrix can be adjusted. Therefore, our SLSTM model can be used in more wireless networks. Furthermore, it will be helpful for information fusion in other networks.

Footnotes

Handling Editor: Janos Botzheim

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported in part by the National Natural Science Foundation of China (NSFC; grant no. 61801171), in part by the Industry university research project of Henan Province (grant no. 172107000005), and in part by the basic research projects in the University of Henan Province (grants no. 19zx010).

ORCID iD

Mingchuan Zhang

References

Cao

Choi

Masazade

et al . Sensor selection for target tracking in wireless sensor networks with uncertainty. IEEE Trans Sig Process 2016; 64(20): 5191–5204.

Zheng

Fei

A smart collaborative routing protocol for reliable data diffusion in IoT scenarios. Sensors 2018; 18(6): 1926–1947.

Gavalas

Mpitziopoulos

Pantziou

et al . An approach for near-optimal distributed data fusion in wireless sensor networks. Wire Netw 2010; 16(5): 1407–1425.

Sun

Deng

ZL.

Multi-sensor optimal information fusion Kalman filter. Automatica 2004; 40(6): 1017–1023.

Chen

Jiang

Kasetkasem

et al . Channel aware decision fusion in wireless sensor networks. IEEE Trans Sig Process 2004; 52(12): 3454–3458.

John

Vorontsov

MA.

Multiframe selective information fusion from robust error estimation theory. IEEE Trans Image Process 2005; 14(5): 577–584.

Tsanas

Zaartu

Little

et al . Robust fundamental frequency estimation in sustained vowels: detailed algorithmic comparisons and information fusion with adaptive Kalman filtering. J Acoust Soc Am 2014; 135(5): 2885–2901.

Zhang

Feng

Multi-rate distributed fusion estimation for sensor networks with packet losses. Automatica 2012; 48(9): 2016–2028.

Bai

Wang

Sheng

et al . Reliable data fusion of hierarchical wireless sensor networks with asynchronous measurement for greenhouse monitoring. IEEE Trans Contr Systems Technology 2018; 99: 1–11.

10.

Jia

Kong

et al . Locally weighted fusion of structural and attribute information in graph clustering. IEEE Trans Cybernet 2017; 49(1): 247–260.

11.

Quan

Wang

Software-defined collaborative offloading for heterogeneous vehicular networks. Wire Comm Mob Comput 2018; 2018: 3810350.

12.

Qian

Wang

et al . Multigranulation information fusion: a Dempster-Shafer evidence theory based clustering ensemble method. In: International conference on machine learning & cybernetics, Guangzhou, China, 12–15 July 2015. New York: IEEE.

13.

Liu

Wang

Optimal online data dissemination for resource constrained mobile opportunistic networks. IEEE Trans Veh Tech 2017; 66(6): 5301–5315.

14.

Qin

Yang

Xue

et al . A one-layer recurrent neural network for pseudoconvex optimization problems with equality and inequality constraints. IEEE Trans Cybernet 2017; 47(10): 3063–3074.

15.

Mesin

Aram

Pasero

A neural data-driven algorithm for smart sampling in wireless sensor networks. EURASIP J Wire Comm Netw 2014; 2014(1): 23–31.

16.

Chen

Wang

et al . BP neural network based continuous objects distribution detection in WSNs. Wire Netw 2016; 22(6): 1917–1929.

17.

Suk

Lee

Shen

Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. Neuroimage 2014; 101: 569–582.

18.

Koesdwiady

Soua

Karray

Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE Trans Veh Technol 2016; 65(12): 9508–9517.

19.

Han

Chen

Liu

et al . CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE Trans Cybernet 2017; 48: 3171–3183.

20.

Chen

Jahanshahi

MR.

NB-CNN: deep learning-based crack detection using convolutional neural network and Naive Bayes data fusion. IEEE Trans Ind Electron 2018; 65(5): 4392–4400.

21.

Hochreiter

Schmidhuber

Long short-term memory. Neur Comput 1997; 9(8): 1735–1780.

22.

Zhen

Gan

Liang

et al . LSTM-CF: unifying context modeling and fusion with LSTMs for RGB-D scene labeling. In: European conference on computer vision, Amsterdam, 11–14 October 2016, pp.541–557. Cham: Springer.

23.

Gandhi

Sharma

Biswas

et al . GeThR-Net: a generalized temporally hybrid recurrent neural network for multimodal information fusion. In: European conference on computer vision, Amsterdam, 8–10 October 2016, pp.883–899. Cham: Springer.

24.

Gers

Schmidhuber

Cummins

Learning to forget: continual prediction with LSTM. Neur Comput 2014; 12(10): 2451–2471.

25.

Rahmani

Atia

GK.

High dimensional low rank plus sparse matrix decomposition. IEEE Trans Sig Process 2017; 65(8): 2004–2019.

26.

Kreibich

Neuzil

Smid

Quality-based multiple-sensor fusion in an industrial wireless sensor network for MCM. IEEE Trans Ind Electron 2014; 61(9): 4903–4911.

27.

Gao

Chen

et al . Discriminative multiple canonical correlation analysis for information fusion. IEEE Trans Image Process 2017; 27: 1951–1965.

28.

Huang

Qian

Zhu

Encoding syntactic knowledge in neural networks for sentiment classification. ACM Trans Inform Syst 2017; 35(3): 1–27.

29.

Zhang

Xiong

et al . Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE Trans Veh Technol 2018; 67(7): 5695–5705.

30.

Neil

Lee

Delbruck

et al . Delta networks for optimized recurrent network computation. PMLR 2017; 70: 2584–2593.

31.

Cho

Merrienboer

Bahdanau

et al . On the properties of neural machine translation: encoder-decoder approaches. In: Proceedings of the 8th workshop on syntax, semantics and structure in statistical translation, Doha, Qatar, 25 October 2014. Stroudsburg, PA: Association for Computational Linguistics.

32.

Simen

Cohen

Holmes

Rapid decision threshold modulation by reward rate in a neural network. Neural Netw 2006; 19(8): 1013–1026.

33.

Baldi

Neural networks, orientations of the hypercube, and algebraic threshold functions. IEEE Trans Inform Theor 1988; 34(3): 523–530.

34.

Zhang

et al . Activity invariant sets and exponentially stable attractors of linear threshold discrete-time recurrent neural networks. IEEE Trans Auto Contr 2009; 54(6): 1341–1347.

35.

Shortle

Dunn

JW.

The relative efficiency of agricultural source water pollution control policies. Am J Agri Econ 1986; 68(3): 668–677.

36.

Puckett

LJ.

Identifying the major sources of nutrient water pollution. Environ Sci Technol 1995; 29(9): 408A–414A.

37.

Vedaldi

Lenc

. MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of the 23rd international conference on multimedia, Brisbane, QLD, Australia, 26–30 October 2015, pp.689–692. New York: ACM Press.

38.

Grewal

Andrews

AP.

Kalman filtering: theory and practice using MATLAB. 3rd ed. Hoboken, NJ: Wiley-Interscience, 2008.

39.

Liu

SQ.

Research and application on MATLAB BP neural network. Comput Eng Des 2003; 11(3): 025.