Abstract
A core factor to consider when designing wireless sensor networks is the reliable and efficient transmission of massive data from source to destination. In practical situations, data transmission is often disrupted by link interference and interruption resulting in the data losses. Link quality prediction is an important approach to solve this problem. By estimating the link quality based on the past knowledge and information, link quality prediction is essential for routing decisions of future data transmission. Traditional link quality prediction algorithms are simply based on the statistical information of the links in the wireless sensor network. By introducing complex network theory and machine learning techniques, we propose a neighborhood-based nonnegative matrix factorization model to predict link quality in wireless sensor networks. Our model learns latent features of the nodes from the information of past data transmissions combing with local neighborhood structures of the underlying network topology and then estimates the link quality depending on the common latent features of the two nodes between the link. Extensive experiments on both real-world networks and simulation networks demonstrate the effectiveness and efficiency of our proposed model.
1. Introduction
Wireless sensor network (WSN) is a special kind of mobile communication network, which exhibits the properties of dynamic network topology, multihop communication, and limited energy [1, 2]. In wireless sensor network, stable links have great significance since many critical applications fundamentally rely on efficient and reliable data transmission from source node to sink node [3, 4]. However, in practical situations, data transmission is often disrupted by link interference and interruption resulting in the data losses [5, 6]. When the link quality deteriorates, data losses are inevitable since the size of internal buffers of intermediate nodes is limited. In the presence of poor link quality, the sender could easily fill up the buffer so that there is no sufficient time and space to transmit all packets reliably. This problem is especially severe for those applications which strongly require real-time data transmission or high-fidelity data transmission [7]. The main reason for this problem is the lack of feedback of link layer information to the upper layer applications and protocols.
Link quality prediction is an important approach to decrease the probability of data losses in wireless sensor network. For most applications in wireless sensor networks [8–10], it is necessary that each node has thorough knowledge about its direct neighbors. This information is collected and provided by neighborhood management protocols. One important criterion used by neighborhood management protocols to determine the importance of a node is the quality of the communication between nodes, which is provided by link quality prediction [11]. By estimating the link quality based on the past knowledge and information, link quality prediction is essential for routing decisions of future data transmission.
However, link quality prediction remains a challenging task due to the dynamic nature of wireless links. Firstly, the correlations between nodes cannot be directly obtained from the network. Secondly, in some situations, network data is quite sparse that the links used for data transmission are only a small proportion of all possible links.
Motivated by the great practical significance, many algorithms for link quality prediction have been proposed recently. The most existing algorithms simply use statistical information of the links to evaluate the link quality. In this paper, we proposed a nonnegative matrix factorization model for predicting link quality in wireless sensor networks, by introducing complex network theory and machine learning techniques. Our model is a supervised learning model. In our model, we associate the probability of a link with a nonnegative strength variable, which is related with the latent features of the nodes between the link. We also consider the influence of the neighborhood structure and make the latent features of each node learnt from the past data transmissions combing with local neighborhood structures of the underlying network topology.
The rest of the paper is organized as follows. In Section 2, we introduce the related background of link quality prediction in wireless sensor networks. Our proposed neighborhood-based nonnegative matrix factorization model for link quality prediction is described in detail in Section 3. The experimental results and discussions are reported in Section 4. Finally, Section 5 gives the conclusion of this paper.
2. Related Work
In recent years, various approaches have been proposed to provide a meaningful metric describing the actual link-quality and then predicting its future behavior. In this section, an overview of popular link quality prediction algorithms is presented at first. Then, we make an introduction of the link prediction theory in complex networks.
2.1. Link Quality Prediction for Wireless Sensor Networks
Basically, the measurements of link quality can be classified into two categories: physical metrics and logical metrics. Physical metrics depend on the radio hardware and evaluate link quality by the signal strength of a received packet. No additional costs are required by physical metrics, since the measurement is performed by the receiver hardware every time, a packet is received. Common physical metrics include the received signal strength indication (RSSI), the link-quality indication (LQI), and signal-to-noise ratio (SNR). On the other hand, logical metrics estimate the link quality by keeping track of message losses. They do not depend on specific hardware so that they are not influenced by the characteristics of hardware. Typical examples of logical metrics are packet success rate (PSR) [12], required number of packets (RNP) [13], and expected transmission count (ETX) [14].
Woo et al. [12] made the first attempt to estimate the link quality by proposing the window mean with exponentially weighted moving average (WMEWMA). This metric computes the average success rate of the link over a time period and smoothens the average with an EWMA. WMEWMA predicts the PSR of the link and has been widely adopted in WSNs. de Couto et al. [14] proposed ETX which uses the number of expected transmissions for a successful packet transmission as the evaluation of link quality. The ETX of a link is calculated using the forward and reverse delivery ratios of the link. The number of received packets within a fixed time period is counted and compared to the number of expected packets that are periodically broadcasted by each node. RNP introduced by Cerpa et al. [13] incorporates the distribution of losses within the time period. This metric is based on their observation fact that a link with consecutive losses should be rated lower than links with discrete losses.
Weyer et al. [15] proposed adaptive link estimator (ALE), which is an EWMA filter of the measurement PSR. Links with different qualities have different weights in the EWMA filter. For good links, ALE uses a higher weight for more stable estimation, while links with a lower quality are estimated in an agile fashion for a faster reaction.
A Kalman Filter Based Link-Quality Estimation was proposed by Senel et al. [16]. A Kalman filter is used to smoothen the RSSI of successfully received packets and the noise floor is subtracted to obtain an estimation of the SNR. The final PSR is derived by applying a hardware specific SNR-PSR mapping for the transceiver. This approach is very complicated and only applicable to the cases in which SNR and PSR are strictly correlated.
Chen et al. [17] proposed a model to predict link interruption and route interruption in wireless sensor networks by the historical link information and channel state obtained by periodic detection. The periodicity of environmental changes is utilized to help predict link interruption.
LISP proposed by Ma et al. [18] is based on the premise that, in order to achieve the best performance, the application layer behavior should be aware of the link layer conditions and adjust its behavior accordingly. They used the state space model to predict link quality and then provided these estimates as a system level service to application developers.
Wang et al. [19] introduced the supervised learning techniques to predict link quality in wireless sensor network. In their approach, link quality prediction is modeled as a classification problem with the features of the link, including forward probability, backward probability, channel load, and node depth from the source node.
2.2. Link Prediction for Complex Networks
In complex network theory, the research field of link prediction is very similar to the link quality prediction in wireless sensor network. Link prediction in complex networks focuses on estimating the likelihood of the existence of links in the network rather than the link quality.
Link prediction algorithms can be roughly classified into two classes: unsupervised models and supervised models. In unsupervised models, the probability of the existence of a link is measured by some specific similarity indices between the two nodes. Local similarity indices [20–24] only depend on the information of the neighborhoods, such as the common neighbors. Global similarity indices [25–27] require the entire topological information of the network from the perspectives of paths, random walks, and other properties. The similarity in unsupervised models is predefined and is invariant to the specific structure of the input networks.
On the other hand, supervised models use the supervised learning approaches. They usually propose some patterns of the link behaviors and learn a series of parameters according to the observed links. Some popular approaches are hierarchical structure model (HSM) [28], stochastic block model (SBM) [29], and latent factor model (LFM) [30, 31]. The former two models use the explicit topological properties of the network, while latent factor models depend on latent features of the network, which can be viewed as an implicit representation of the network topological information. The main drawback of the former two models is the high calculation complexity, which makes them only applicable to small networks. By contrast, the latent factor models can be trained in linear time with the number of observed links.
3. Our Model
In our model, link quality prediction is formally modeled as a supervised learning problem. The information of past data transmissions is the training set of the model. Given a wireless sensor network, let the link between node
3.1. Basic Latent Factor Model
In basic latent factor model (LFM), each node
The latent features of each node can be learnt by solving the following optimization problem:
Stochastic gradient descent method is usually used to solve this optimization problem. The total training process exhibits linear time with the number of transmissions in the training set.
3.2. Neighborhood-Based Nonnegative Matrix Factorization Model
In basic latent factor model, the latent features of some nodes may have negative values, which may mislead the whole approach. Moreover, the influences of the neighborhood structure in the network are not considered in basic latent factor model.
We first assume that the latent feature matrix
Let latent feature matrix
Now, let us consider the influence of the neighborhoods on the link probability between the nodes. In unsupervised models, several link metrics are defined in the following form:
In order to reduce the number of parameters, we factorize the matrix
Finally, we combine the previous two models and predict the link quality by
An optimal solution of this optimization problem can be obtained using stochastic gradient descent method. Let the prediction error where
Due to the link function, the predicted score of link quality in our model is in the range
4. Experimental Results
4.1. Evaluation on Complex Networks
We first apply our neighborhood-based nonnegative matrix factorization model to several real-world networks, which are widely used in link prediction literature. General information of these real-world networks is shown in Table 1. We also make comparisons with some unsupervised link prediction models, including Common Neighbors Index (CN) [20], Salton Index [22], Preferential Attachment Index (PA) [21], Adamic-Adar Index (AA) [23], and Karz Index [25]. The experiments are implemented by MATLAB 2009b running on a PC with a 3.0 GHz processor and 3 GB memory.
General information of the real-world networks.
To evaluate the accuracy of different models, we adopt AUC proposed by Hanley and McNeil [33] as the basic measure for the experiments reported in this paper. The AUC value is defined as the probability in which a randomly chosen high-quantity link is assigned with a higher score than a randomly chosen low-quality link. If among
For each network, the present links are partitioned into training set (90%) and test set (10%). The performances of different models on real-world networks are shown in Table 2.
The AUC values of different models on real-world networks.
As is shown in Table 2, our model performs the best among all the other models on most real-world networks and is only inferior to basic LFM on Email network and Karz Index on Powergrid network. For Karate network, Blog network, and PGP network, our model shows superiority over the unsupervised link prediction models and obvious improvement over basic LFM.
We also find out the reason why the performance of our model is inferior to that of Karz Index on Powergrid network. This is due to the fact that Powergrid network is a highly sparse network in which about 60% of the nodes only have one or two links connecting with other nodes. The sparsity makes many model parameters get insufficient training, which results in the fact that our model does not perform well on this network.
4.2. Simulation on Wireless Sensor Networks
To verify the prediction model, we also make simulations of our model on wireless sensor networks. Here, we use the Matlab as the simulation platform. The channel model is flat Rayleigh fading, carrier frequency is 20 kHz, the modulation is (QAM), and data transmission rate is set to 1000 bits/s. The packet size is set to 1000 bits. The topology of the wireless sensor network is randomly generated by the LFR approach [34]. An example network topology is shown in Figure 1, where the red node denotes the source node and the black node denotes the sink node. The training set contains all the packet transmission records in the wireless sensor network in a period of 1000 seconds. The packet transmission records in the next period of 200 seconds constitute the test set of the experiment.

An example topology of wireless sensor network. The red node denotes the source node and the black node denotes the sink node.
Besides AUC, three other measures, known as precision, recall, and
The recall is defined as the ratio of the number of true positives to total number of instances that are actually positive:
Figure 2 shows the precision, recall, and

The precision, recall, and
We also use our model to select the next hop for the data transmission in the network. The neighbor node with larger predicted score has a higher probability to become the next hop of the packet transmission. The improvements of data transmission with our model in a certain period of time are shown in Table 3.
The improvements of data transmission with our model.
From the above experimental results, we can see that our model is effective for link quality prediction and very suitable for practical applications in wireless sensor networks.
5. Conclusion
In this paper, we propose a neighborhood-based nonnegative matrix factorization model for solving the problem of link quality prediction in wireless sensor networks. We extend link prediction model in complex networks to wireless sensor networks and use the supervised learning techniques to predict the link quality in wireless sensor networks. In our model, the quality of a link is associated with a nonnegative strength variable, which is related with the latent features of the nodes. The influence of the neighborhood structure is also taken into consideration. Thus, the latent features of each node are learnt from the overall topological structure combing with local neighborhood structures of the underlying network. We test our model on several real-world complex networks and also make simulations on wireless sensor networks. The experimental results demonstrate the effectiveness and efficiency of our model for link quality prediction in wireless sensor networks.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research work is funded by the National Science Foundation of China (61271316), Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, and Chinese National Engineering Laboratory for Information Content Analysis Technology.
