Abstract
Wireless sensor networks are designed to perceive, gather, and process external environmental information and send it to the observer. However, the transmission of massive amounts of information is a challenge for the sensor nodes. To address this challenge, information fusion technologies have been proposed to reduce the mass of redundant data. However, these techniques rarely consider historical information and thereby often suffer from low prediction accuracy. To solve this problem, we propose a novel information fusion approach for the cluster heads. The proposed approach is based on a time-recurrent neural network called sparse long short-term memory, which is derived from the long short-term memory network. Sparse long short-term memory uses sparse matrices to reduce the dimensionality of a high-dimensional coefficient matrix; therefore, the computational cost of the fusion algorithm in wireless sensor networks is reduced. The simulation results show that the sparse long short-term memory algorithm increases the number of surviving sensor nodes in wireless sensor networks. Furthermore, its prediction accuracy is almost the same as that of the comparison algorithms.
Introduction
Wireless sensor networks (WSNs) adapt to many demanding environments thanks to developments in sensor technology and the microelectronics industry, for example, military affairs, environmental monitoring, forest fire prevention, and traffic control.1,2 However, the transmission of massive amounts of observed data shortens the life of sensor nodes, which is a challenge for WSNs.3 To meet this challenge, information fusion technology is used in WSNs, since it reduces redundant information by integrating information from multiple sources.
In the past few years, several information fusion methods have been proposed. Sun and Deng4 used the likelihood ratio to fuse information. However, the method requires perfect knowledge of local decision performance indices. Therefore, Chen et al.5 proposed a fusion method combining channel perception and the likelihood ratio, which does not need such perfect knowledge. In order to handle random packet loss in sensor networks, John and Vorontsov6 and Tsanas et al.7 considered information fusion in terms of an estimation method. Furthermore, Zhang et al.8 proposed a distributed fusion estimation method. In order to solve the data fusion problem of WSNs for greenhouse monitoring systems, Bai et al.9 proposed a weighting-based data fusion estimation algorithm for WSNs. However, these estimation fusion methods require an accurate estimate. Without requiring estimation, several fusion technologies based on clustering algorithms were proposed in previous works.10–12 Furthermore, some online algorithms were proposed for sensor networks.13
The methods cited above do not consider the impact of previous observations. However, the external environment changes continuously, and its current state is related to its historical state; there is much useful information in the previous observations. For this reason, this article proposes a novel neural network approach based on historical information. The approach utilizes long short-term memory (LSTM) networks to retain the useful parts of historical data and discard the useless parts. The simulation experiments demonstrate that we not only remove a large amount of redundant information but also maintain the prediction accuracy.
The main contributions of this article are as follows: our algorithm compresses the high-dimensional coefficient matrix, which reduces the amount of information transmitted by the sensors and ultimately lengthens the life of the WSN and reduces the number of sensor deaths; the algorithm has low computational complexity and low energy consumption; and it leaves large room for scalability (no additional computational cost).
The remainder of this article is organized as follows. Section 2 reviews related work. Section 3 introduces the model and algorithm of sparse long short-term memory (SLSTM). Our simulation experiments and results are presented in Section 4, and Section 5 concludes.
Related work
Artificial neural network (ANN) is a physically realizable system that simulates the structure and function of human brain neural cells. Such networks take various forms, such as the feedforward neural network, the convolutional neural network, and the recurrent neural network (RNN). An RNN has fixed weights, external inputs, and internal states, and nowadays plays an important role in neural networks.14 In recent years, RNNs have performed well in image recognition and text analysis. For this reason, many researchers have utilized them to improve the efficiency of information fusion,15,16 and many information fusion technologies based on neural networks are now widely used.17–20
The LSTM network is a kind of time-recurrent neural network. One characteristic of LSTM is that it is suitable for processing and predicting events with relatively long intervals and delays in a time series.21 Nowadays, the LSTM method is also of great interest in information fusion settings. Zhen et al.22 proposed a novel LSTM context fusion model, which captures and fuses contextual information from multiple channels of photometric and depth data. Although the model achieves a fusion rate of around 49% on the SUN RGB-D data set, this is not ideal for WSNs. Gandhi et al.23 proposed a novel temporally hybrid RNN system for multi-modal information fusion, in which the LSTM algorithm was used for data fusion. As far as we know, LSTM has the advantage of preserving historical information, but training its high-dimensional coefficient matrices is a great challenge.24 To meet this challenge, sparse techniques have been proposed to deal with high-dimensional coefficient matrices. Rahmani and Atia25 solved the high-dimensional low-rank problem of big data with sparse matrices. In this article, we propose a simple sparse model named SLSTM. SLSTM utilizes the delta network to generate sparse matrices, which are then used to reduce the dimensionality of the high-dimensional coefficient matrix. We not only accelerate the training process but also maintain the accuracy of the prediction. Another advantage of this approach is that the thresholds of the delta network can be adjusted for different scenarios.
In the simulations, our algorithm is compared with the original model (i.e. full transmission of information), the LSTM algorithm (i.e. the one used by Gandhi et al.23), and the algorithm proposed by Bai et al.9 The simulation results show that SLSTM performs excellently.
Model and algorithm of SLSTM
Normally, WSNs installed in an external environment do not have information fusion modules, as shown in Figure 1(a). There are many sensor nodes in a WSN, and nodes that are closer together share a common cluster head. The sensors send all the observed data to their cluster heads, and each cluster head then sends the received data to the computing center. As a result, the cluster heads deplete their energy very quickly. Moreover, the frequent collection of environmental information by sensors produces a large amount of redundant information. In order to reduce the energy consumed by redundant data transmissions, many researchers compress the detected data through information fusion technology in the cluster heads.26,27 The transport model is shown in Figure 1(b).

Figure 1. (a) The original wireless sensor network (WSN). It consists of sensor nodes, cluster heads, and computing centers. In the WSN, a large number of sensor nodes are installed in various locations. Generally, sensor nodes that are closer together share a cluster head. The cluster head sends the information collected by the sensors to the computing center. Finally, all collected information is processed by the computing center. (b) The WSN with an information fusion module; the SLSTM algorithm is used in this module.
Figure 2 shows the methodology of our proposed model. We add a memory device to each cluster head to store useful information from the previous round. The memory device is a storage component added to the sensors, which is simple to implement with current hardware technology. The useful information is the part of the historical records generated by the SLSTM, that is, the output of the SLSTM. Since we focus on the effect of information fusion on the energy consumption of the WSN, we assume that the sensors do not consume energy when not collecting information. Furthermore, we refer to one cycle of collecting and sending information as a round. At the beginning of each round, we compute the difference between the current round's information and the previous round's information and judge whether this difference exceeds the threshold. If not, the round ends; otherwise, the proposed algorithm processes the message in the next step. Finally, the cluster heads send the final fused message to the computing center. In the fusion step, our SLSTM algorithm fuses the detected information. Next, we analyze the model in detail.
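The round logic above can be sketched as follows; the list representation of a round's readings, the threshold value, and the placeholder fuse routine are illustrative assumptions, with fuse standing in for the SLSTM fusion step:

```python
def run_round(current, previous, threshold, fuse):
    """One collection round at a cluster head.

    current/previous: lists of sensor readings from this round and the
    last round; threshold: minimum absolute change that justifies
    fusing and forwarding; fuse: the fusion routine (SLSTM in the
    article). Returns the message to send to the computing center, or
    None when the change stays below the threshold and the round ends.
    """
    delta = max(abs(c - p) for c, p in zip(current, previous))
    if delta <= threshold:
        return None          # round ends, nothing is transmitted
    return fuse(current)     # fused message goes to the computing center

# a small change between rounds: nothing is sent
msg = run_round([7.1, 6.4], [7.0, 6.4], threshold=0.5, fuse=lambda x: x)
```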

Figure 2. Overview of our method: the sensors collect external environment information; in the cluster head, the fusion module compresses the information if the change exceeds the threshold, otherwise the round ends; the SLSTM works in the fusion module.
Model analysis
Our model applies to cluster heads, each of which has a memory device. The input and output of the previous round are recorded by the memory device. The SLSTM unit is shown in Figure 3: the middle portion is the unit state of the current round, the left portion is the unit state of the previous round, and the right portion is the unit state of the next round. In fact, there is only one unit from beginning to end; however, the output of the previous round is memorized and participates in the calculation of the current round together with the current input.

Figure 3. The working principle of the SLSTM unit.
LSTM
The RNN is a special neural network that has the ability to memorize sequences; in other words, it is a recurrent network that allows information to persist. Moreover, the RNN captures the dependence between the inputs of successive moments. At present, the RNN has been successful in natural language processing (NLP).28
Unfortunately, the RNN is difficult to train for long-term dependence problems, because the training gradient tends either to vanish or to explode.29 The vanishing of the gradient is caused by the exponential decay of the update information; conversely, the exponential growth of the update information leads to an exploding gradient. The topology of the RNN is shown in Figure 4(a), where the hidden state at each time step is fed back as an input at the next time step.

Figure 4. (a) Recurrent neural network and (b) long short-term memory.
In order to solve the aforementioned disadvantages of RNNs, Hochreiter and Schmidhuber21 proposed an innovative recurrent unit called LSTM. At time t in the LSTM networks, a unit updates its state using the knowledge of time t − 1 through three gates: the forgetting gate f_t, the input gate i_t, and the output gate o_t.

The memory cell c_t retains part of the old memory and absorbs the new memory content

c_t = f_t ∘ c_{t−1} + i_t ∘ c̃_t, (1)

where the new memory content is

c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c). (2)

When some new information enters the memory cell, it is determined by the forgetting gate f_t whether to forget; that is, if the input is judged useless, the gate output approaches zero and the corresponding old memory is discarded. The gates are computed as

f_t = σ(W_f x_t + U_f h_{t−1} + b_f), (3)

i_t = σ(W_i x_t + U_i h_{t−1} + b_i), o_t = σ(W_o x_t + U_o h_{t−1} + b_o), (4)

and the output of the unit is h_t = o_t ∘ tanh(c_t), where σ(·) denotes the logistic sigmoid function, ∘ denotes element-wise multiplication, x_t is the current input, h_{t−1} is the output of the previous time step, and W, U, and b are the coefficient matrices and bias vectors of each gate.
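As an illustration of these gate equations, the following toy single-unit LSTM step uses scalar weights; the weight values are illustrative, not trained:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM cell (scalar weights for clarity).

    W holds a (w_x, w_h, b) triple for each gate: 'f' (forget),
    'i' (input), 'g' (candidate memory content), 'o' (output).
    """
    f = sigmoid(W['f'][0] * x + W['f'][1] * h_prev + W['f'][2])
    i = sigmoid(W['i'][0] * x + W['i'][1] * h_prev + W['i'][2])
    g = math.tanh(W['g'][0] * x + W['g'][1] * h_prev + W['g'][2])
    o = sigmoid(W['o'][0] * x + W['o'][1] * h_prev + W['o'][2])
    c = f * c_prev + i * g       # old memory kept by f, new content gated by i
    h = o * math.tanh(c)         # exposed output of the unit
    return h, c

# illustrative weights; starting from an empty memory
W = {k: (0.5, 0.5, 0.0) for k in 'figo'}
h, c = lstm_step(1.0, 0.0, 0.0, W)
```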
Sparse method
In recent years, the sparse optimization method has been used to reduce the dimensions of selected features. It constructs a sparse optimization model from the sparseness of the solutions, based on the structural characteristics of the data, and finally obtains a sparse numerical solution. According to the decision variable, there are two kinds of models: the vector sparse optimization model and the joint matrix sparse optimization model.
In this article, we use the delta network30 to sparsify the LSTM algorithm. The delta network caches neuron activations and uses a threshold to skip unnecessary calculations when the input changes only slightly compared to the cached value. The purpose of the delta network is to transform a dense matrix–vector multiplication into a sparse one.
In order to understand the delta network better, we explain it through formulas; this formulation is from Neil et al.30 Consider a matrix–vector product z_t = W x_t that is recomputed at every time step. The delta network instead computes

Δ_t = x_t − x_{t−1}, (5)

z_t = W Δ_t + z_{t−1}, (6)

with the initial conditions x_0 = 0 and z_0 = 0. According to equation (6), there is a recursive relationship between z_t and z_{t−1}: the result of the previous step is cached, and only the change Δ_t is multiplied by W. Components of Δ_t whose magnitude does not exceed the threshold are set to zero, so the corresponding multiplications are skipped. Unrolling the recursion gives z_t = W(Δ_t + Δ_{t−1} + ⋯ + Δ_1) + z_0 = W x_t. It can be seen from this result that the sparse optimization of the delta network is consistent with the original formula.
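This consistency can be checked numerically. The sketch below implements the delta recursion of equations (5) and (6) with a cached input; the matrix W, the inputs, and the threshold are arbitrary illustrative values:

```python
def matvec(W, x):
    """Plain dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def delta_step(W, x, x_cache, z_prev, theta):
    """One delta-network update: only components of x whose change
    exceeds theta contribute to the (sparse) matrix-vector product."""
    delta, new_cache = [], []
    for xi, ci in zip(x, x_cache):
        if abs(xi - ci) > theta:
            delta.append(xi - ci)   # propagate the change
            new_cache.append(xi)    # refresh the cached activation
        else:
            delta.append(0.0)       # skip: no multiplication needed
            new_cache.append(ci)
    z = [zp + d for zp, d in zip(z_prev, matvec(W, delta))]
    return z, new_cache

W = [[1.0, 2.0], [3.0, 4.0]]
x0, x1 = [1.0, 1.0], [1.5, 1.0]
# with theta = 0 the recursion reproduces the dense product W @ x1
z0 = matvec(W, x0)
z1, cache = delta_step(W, x1, x0, z0, theta=0.0)
# z1 == matvec(W, x1), even though only the changed component was multiplied
```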
SLSTM algorithm
The computational cost of the LSTM is very high, which is caused by its large number of parameters. In order to utilize the LSTM to fuse information in a WSN, we optimize it to reduce its computational cost. Neil et al.30 proposed a method to optimize gated recurrent units (GRUs)31 using the delta network. The GRU is a variant of the LSTM with only two gates, the update gate and the reset gate. The purpose of the delta network is to sparsify the coefficient matrix computations, thereby greatly reducing the computational cost. In this article, we perform the sparse processing by combining the LSTM algorithm with the delta network.
Algorithm 1 introduces the training process of the SLSTM algorithm. Each iteration updates the forgetting gate f, the input gate i, and the cell memory c. The output of the unit is then computed from the updated cell memory.
It is worth noting that the output of an LSTM is generally quantified as a probability, and many output layers are implemented with the softmax function.
In many realistic scenarios, learning is useful when the difference between prior knowledge and current knowledge exceeds a certain threshold; for example, in the monitoring of water quality pollution, a sufficiently large change in the content of water-affecting elements is worthy of our attention. Conversely, when the difference is small, it contributes little to learning. This analysis inspires our handling of the coefficient matrix W.
If the change of an input component between two rounds does not exceed the threshold Θ, the cached value is reused and no multiplication is performed for that component; otherwise, the change is propagated and the cache is refreshed. Formally, the i-th component of the delta vector is

Δx_{t,i} = x_{t,i} − x̂_{t−1,i} if |x_{t,i} − x̂_{t−1,i}| > Θ, and Δx_{t,i} = 0 otherwise,

where x̂_{t−1,i} denotes the cached value of the i-th input, which is updated only when the change is propagated. In the same way, a delta vector Δh_{t−1} with its own cache is formed for the recurrent input h_{t−1}. Each matrix–vector product in equations (2), (3), and (4) can then be replaced by the delta update defined in equations (5) and (6), and we have, for example, for the forgetting gate

f_t = σ(W_f Δx_t + U_f Δh_{t−1} + z^f_{t−1}),

where z^f_{t−1} is the cached pre-activation of the forgetting gate from the previous round (with the bias b_f absorbed into the initial value z^f_0).
Here, we introduce the time complexity of Algorithm 1. We define n as the dimension of the input vector and m as the number of hidden units. A dense matrix–vector product in the standard formula then costs on the order of nm multiply–accumulate operations per gate, and the total memory cost of the standard computation is the nm stored weights per matrix. After the standard formula is sparsified, we obtain the new formula (6), whose cost is calculated as follows: only the components of the delta vector that exceed the threshold are multiplied, so if a fraction s of the components changes, the computational cost drops to approximately s·nm operations per gate. Moreover, the additional memory cost of formula (6) is only the cached input, output, and pre-activation vectors, which is proportional to n + m.
In many WSN scenarios, such as forest fire monitoring and gas monitoring, a monitored value changes very little over a long period of time. Therefore, most components of the delta vector are zero and the corresponding multiplications are skipped: the smaller the fraction of inputs whose change exceeds the threshold, the greater the computational saving.
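As a rough illustration of this saving, the following sketch counts multiply–accumulate operations for a dense product versus a delta update; the 256 × 256 layer size and the 10% change rate are hypothetical figures, not values from the article:

```python
def dense_macs(n, m):
    """Multiply-accumulate operations for a dense n x m matrix-vector product."""
    return n * m

def delta_macs(n, m, occupancy):
    """With the delta network, only the occupied (changed) fraction of the
    input vector triggers column multiplications."""
    return int(n * m * occupancy)

n = m = 256            # assumed layer size, for illustration only
full = dense_macs(n, m)
sparse = delta_macs(n, m, occupancy=0.1)   # assume 10% of inputs changed
# sparse is roughly one tenth of full: most multiplications are skipped
```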
Accuracy improvement
There are two important principles for information fusion in a WSN. One is to reduce network energy consumption and extend network life. The other is to ensure that the accuracy of information fusion is not affected. For example, in water quality monitoring, water quality usually changes slowly, that is, the water quality data change little; however, once pollution occurs, the data change noticeably, and we should report to the data center in time.
Many threshold methods have been used in neural networks and have achieved good results.32–34 The selection of the threshold is a learning process, and we need to try multiple values to determine a good one. Too large a threshold may lead to inaccurate results, while too small a threshold cannot avoid unnecessary calculations. In addition, the threshold should be selected according to the actual scenario. Based on these considerations, our algorithm balances fusion rate and fusion accuracy via the threshold value. In the simulation experiments, we compare different threshold selections and finally choose an appropriate value that achieves accuracy similar to that of the other algorithms.
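The repeated threshold substitution described here can be viewed as a simple selection loop. In the sketch below, the evaluate function and the toy energy/accuracy table are hypothetical stand-ins for running the full simulation:

```python
def choose_threshold(candidates, evaluate, min_accuracy):
    """Pick the threshold with the lowest energy cost whose accuracy
    stays acceptable.

    evaluate(theta) -> (energy_used, accuracy) is assumed to run one
    simulation; larger thresholds skip more updates and save energy,
    but may reduce accuracy below the acceptable floor.
    """
    best = None
    for theta in sorted(candidates):
        energy, acc = evaluate(theta)
        if acc >= min_accuracy and (best is None or energy < best[1]):
            best = (theta, energy)
    return best[0] if best else min(candidates)

# toy table mimicking the energy/accuracy trade-off (hypothetical numbers)
table = {1: (100, 0.9832), 5: (80, 0.9832), 10: (60, 0.9820), 20: (40, 0.9750)}
theta = choose_threshold([1, 5, 10, 20], table.__getitem__, min_accuracy=0.98)
# -> 10: the largest saving that keeps accuracy above the floor
```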
Simulations and results
In this section, we compare the original method, LSTM, the algorithm proposed by Bai et al.,9 and SLSTM on the task of water quality monitoring by a WSN. The task aims at monitoring the content of various elements in water in order to warn of the degree of water pollution in a timely and effective manner. Among the many indicators of water pollution, we select the most common ones: pH value, dissolved oxygen, alkalinity, heavy metal content, and hardness (i.e. total calcium and magnesium ion concentration), and we record the value of these indicators every 2 h. The relationship between these indicators and water pollution is not linear, so linear regression cannot capture it; moreover, it is extremely difficult to specify a nonlinear regression model, so nonlinear regression is not suitable for this large data fusion problem either. In order to capture the relationship and predict accurately, we utilize the SLSTM algorithm to capture the hidden relationships between the indicator values at different times. Finally, we predict the pollution value at the next time step.
The prediction of water quality depends on previous information, similar to a time series process. We cannot make accurate predictions from the current indicator values alone, because water pollution is a complicated, gradual process.35,36 For instance, an anomaly in dissolved oxygen may be caused by the weather or by movement of the earth's crust instead of pollution, and a change in pH value may be caused by the activity of fish or the growth of aquatic grasses. Therefore, the SLSTM model can comprehensively consider the previous information together with the current indicator values, which provides a new idea for the prediction of water pollution.
Model training
In this section, we use 10,000 labeled data points of water quality monitoring to train the SLSTM model. We define five weights corresponding to pH value, dissolved oxygen, alkalinity, heavy metal content, and hardness.
The weights are updated iteratively, where T denotes the number of iterations.
After 10,000 iterations, we obtain the final values of the weights, shown in Table 1. After training, our SLSTM model is used to solve the problem of information fusion in the WSN. Next, we analyze the performance of SLSTM in a WSN used to predict water pollution. There are two performance indicators in our simulations: the number of alive nodes and the total consumption of network energy.
Table 1. The final result of each weight value.
Task and data sets
The parameters of our simulation are set as follows. A total of 100 sensor nodes were randomly distributed in a plane area of

Figure 5. The distance from each node to the cluster head.
The consumption of computational energy is much lower than that of information transmission. In order to eliminate other influencing factors, we raise the benchmark of energy consumption for calculation. To this end, we assume the computational energy is
The software used for the numerical simulations in this section is MATLAB 2016a and MySQL 5.7. Our simulation data are stored in MySQL, which we use to perform a preliminary cleaning of the data source. The ML module functions in MATLAB implement the original algorithm and generate the data results.37–39 Furthermore, our proposed algorithm is also written in MATLAB. Finally, MATLAB generates intuitive graphs of the results.
Results and discussion
The SLSTM algorithm has two outstanding advantages. One is that it preserves useful information for a long time. The other is that the algorithm can be tuned by adjusting the threshold. We test four different thresholds in Figure 6 and choose the best one from the generated results.

Figure 6. Performance of the SLSTM algorithm under different selected thresholds (energy attenuation trend).
We set four thresholds of 1, 5, 10, and 20. We find that when the threshold is 20, the energy consumption of the entire network is the smallest, and when the threshold is 1, the total energy consumption is the largest. Weighing accuracy and energy consumption together, we finally choose 10 as the threshold value for the comparative simulations.
In the simulation experiment, we compare four models (i.e. original, LSTM, the proposed algorithm of Bai et al.,9 and SLSTM). As shown in Figure 7, we iterate a total of 100 rounds of information transmission. The initial total energy of these four networks is

Figure 7. Comparison of the residual energy of four different models (original, the proposed algorithm of Bai et al.,9 LSTM, and SLSTM).
As the number of rounds increases, the total energy of the four models decreases, and the energy of the original model decreases the fastest. At the
As shown in Figure 8, at the

Figure 8. Comparison of the number of dead nodes of four different models (original, the proposed algorithm of Bai et al.,9 SLSTM, and LSTM).
The above simulations verify that our model is excellent at saving node energy. Next, our simulation verifies the accuracy of SLSTM. We obtain the model parameters by training on the data set. Thereafter, the fusion module compresses the test data using SLSTM and sends the results to the computing center. By comparing the predicted results with the real results, we obtain the prediction accuracy of SLSTM in the fusion module.
From Table 2, we can see that our SLSTM model achieves the same accuracy as the LSTM model when the threshold is 1 or 5: both are 98.32% accurate under the same training and test sets. When the threshold is 20, the accuracy of SLSTM drops to 97.50%. We can also see that the SLSTM model needs an appropriate threshold to achieve good reliability; to this end, the threshold value should be adjusted repeatedly.
Table 2. Comparison of the calculation accuracy of the LSTM algorithm, the proposed algorithm of Bai et al.,9 and SLSTM under different thresholds.
LSTM: long short-term memory; SLSTM: sparse long short-term memory.
Conclusion
In this article, we propose a sparse model based on the LSTM algorithm, namely SLSTM, and introduce its mathematical basis. In the simulation experiments, we use 20,000 water quality monitoring data points and choose MATLAB and MySQL as our simulation tools. We evaluate performance based on the total residual energy of the system and the total number of node deaths, comparing the original, LSTM, the algorithm of Bai et al.,9 and SLSTM models. The results show that our SLSTM is excellent: it saves a lot of energy for the WSN and thereby lengthens its life. One advantage of our approach is that the thresholds in the sparse matrix can be adjusted, so our SLSTM model can be used in more wireless networks. Furthermore, it will be helpful for information fusion in other networks.
Footnotes
Handling Editor: Janos Botzheim
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported in part by the National Natural Science Foundation of China (NSFC; grant no. 61801171), in part by the Industry university research project of Henan Province (grant no. 172107000005), and in part by the basic research projects in the University of Henan Province (grants no. 19zx010).
