Abstract
A wireless sensor network (WSN) is a self-organizing multihop network composed of a large number of wireless sensor nodes, usually deployed in an unattended environment to collect target information. WSNs face a growing range of security threats owing to their wireless and resource-constrained characteristics. Compromised sensor nodes can easily corrupt data accuracy and integrity by falsifying sensed information or by selectively forwarding or misdirecting received data packets during data aggregation. To solve these security problems, we propose a smart reputation-based data aggregation protocol for WSNs (SRDA) that protects data accuracy and integrity. The simulation results indicate that SRDA, through its smart reputation system, can effectively identify compromised nodes and isolate them from the network.
1. Introduction
Wireless sensor networks, owing to their low cost, convenient deployment, and high degree of automation, are widely used in areas such as military surveillance, industrial production, medical monitoring, and hazardous materials transportation. Wireless sensor networks have some unique characteristics that distinguish them from traditional networks [1]. One is the significant amount of redundant data generated by overlapping sensing ranges. In a large-scale wireless sensor network, redundant data can cause a large amount of unnecessary data traffic, place a processing burden on the base station, and delay the decisions of the system. Moreover, node energy is limited and cannot be renewed; the energy consumed in handling redundant data dramatically shortens the lifetime of the entire network. To improve the energy efficiency of WSNs, data aggregation was introduced, which increases operational efficiency and prolongs network lifetime by reducing redundant data [2]. In essence, data aggregation is the process of eliminating redundant data and providing the processed data to the base station.
Meanwhile, WSNs face a growing range of security threats, which are exacerbated by characteristics such as the shared wireless channel and constrained resources. Wireless sensor networks are usually deployed in unattended environments that lack physical security, so intruders can easily compromise sensor nodes and use them to perform malicious activities. An energy-efficient data aggregation protocol should therefore be able to aggregate data securely in a network that may contain compromised sensor nodes, which requires an effective mechanism for detecting them; this detection has become a focus in the design of secure data aggregation protocols. Some existing data aggregation protocols were designed under the assumption that all sensor nodes in the network are trustworthy, which leaves them vulnerable to intruders [2–4]. Some security-sensitive data aggregation protocols are cryptography based [3, 5]. However, cryptography alone is not sufficient to solve the security problems caused by compromised sensor nodes: according to the operating principles of these cryptosystems [3, 5], a compromised sensor node can pose as an authorized node and access the cryptographic keys legitimately, and the cryptosystem cannot distinguish compromised sensor nodes from normally functioning ones [6, 7].
Previous studies have addressed the security problems of WSNs and produced many results. To ensure data integrity in WSNs, Ozdemir and Çam proposed a secure data aggregation protocol integrated with a false data detection mechanism [8]. This detection mechanism can verify the integrity of the aggregated data from the aggregator node, but the protocol does not protect data privacy during data aggregation. To preserve data privacy in the aggregation phase, He et al. applied data slicing and assembling to the design of a data aggregation protocol and proposed a secure data aggregation protocol with data privacy preservation [9]. Zhang et al. proposed an energy-efficient data aggregation protocol applicable to WMSNs [10]. Papadopoulos et al. designed the secure data aggregation protocol SIES, which uses homomorphic encryption to ensure data integrity and confidentiality during data aggregation while keeping energy consumption low [11].
In this paper, building on earlier research, we further study the trust management system of wireless sensor networks and analyze the vulnerabilities of existing reputation systems from an intruder's point of view. Based on these vulnerabilities, we propose a new possible attack strategy, “alternating attack,” which lets compromised nodes avoid detection by the reputation system and prolong their malicious activities. The details of the alternating attack are described in Section 2.2. Meanwhile, we found that full-range monitoring and information sharing between nodes generate a large amount of data transfer, increase energy consumption, and reduce the lifetime of the sensor nodes.
Therefore, the problem addressed in this paper is twofold: to eliminate the vulnerabilities of existing reputation systems that can be exploited to degrade network security, and to reduce the data traffic generated by the monitoring and information sharing mechanisms of the reputation system.
To solve these issues, we propose a smart reputation-based data aggregation protocol for wireless sensor networks (SRDA). Existing reputation-based data aggregation protocols, such as RSDA and RDAT, monitor the behaviors of some important nodes and calculate a trust value for each behavior. Each trust value has a corresponding threshold; when one of the trust values falls below its threshold, the node is identified as compromised and isolated from the network [12, 13]. These protocols can identify compromised nodes effectively but also have some shortcomings. In SRDA, a new method of calculating the trust value is designed that evaluates the overall status of sensor nodes. This method evaluates node behaviors more accurately and detects intelligent compromised nodes effectively. Meanwhile, the monitoring range is limited to the node's own unit (cell), which means that a sensor node only monitors node behaviors that occur in the same unit. In this way, the energy consumed by monitoring and information sharing is greatly reduced. The details of the proposed protocol are introduced in the sections below.
The rest of this paper is organized as follows. Section 2 describes the network model and the design requirements of the data aggregation protocol. Section 3 proposes the smart reputation-based data aggregation protocol, SRDA. Section 4 describes the performance simulations of SRDA. Section 5 concludes the paper.
2. Network Model and Requirements
In this section, we will describe the details of the network model and the design requirements of SRDA. The goal of the reputation system is to detect the compromised nodes as soon as possible, while keeping a balance between energy efficiency and security performance. Therefore, the energy conservation measures will also be considered in the design of the protocol.
2.1. Network Model
This section details a network architecture suitable for the proposed secure data aggregation protocol. The network architecture of SRDA is grid based and follows the geographical adaptive fidelity (GAF) algorithm [14]. In GAF, the network area is divided into many small virtual rectangular cells of equal size based on the nodes' geographic locations and communication range. Every node can communicate directly with the nodes in the neighboring cells, which means that the maximum distance between any two points in two neighboring cells must be no greater than the communication radius of the nodes. If r is the communication radius of the nodes and l is the side length of each cell, the geometric relationship between l and r is illustrated in Figure 1.

Figure 1: The relationship of l and r.
As we can see from Figure 1, the maximum distance within any two adjacent cells can be seen as the hypotenuse of a right-triangle, whose lengths of the right-angle sides are l and

Figure 2: The network architecture of SRDA.
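The grid layout described in this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes square cells and the usual GAF bound, in which the longest distance across two edge-adjacent cells is the hypotenuse of a right triangle with legs l and 2l, so that l ≤ r/√5. The function names are hypothetical.

```python
import math


def cell_side(comm_radius: float) -> float:
    """Largest cell side l such that nodes in edge-adjacent cells stay in range.

    Assumption: the standard GAF bound r >= sqrt(l^2 + (2l)^2) = l * sqrt(5).
    """
    return comm_radius / math.sqrt(5)


def max_distance_adjacent(side: float) -> float:
    """Longest possible distance between two points in edge-adjacent cells."""
    return math.hypot(side, 2 * side)


def cell_of(x: float, y: float, side: float) -> tuple:
    """Map a node position to the (column, row) index of its virtual cell."""
    return (int(x // side), int(y // side))


def are_adjacent(c1: tuple, c2: tuple) -> bool:
    """True when two cells share an edge (horizontal or vertical neighbors)."""
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1]) == 1
```

For example, with a communication radius of 50 m the cell side is about 22.4 m, and a node at (30, 70) falls into cell (1, 3).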
2.2. Adversary Model
In this paper, we assume that all compromised sensor nodes are intelligent and have acquired the preset threshold values of the reputation system. The compromised sensor nodes are capable of three kinds of basic malicious activity: tampering with the raw sensed data, falsifying the data aggregation result, and falsifying the forwarding path of received data packets. Any of these activities can seriously damage the reliability and integrity of the aggregated data and thus distort the final aggregation result at the base station. Based on these basic malicious activities, we propose a new attack strategy, the alternating attack, which works as follows. First, a compromised node that has acquired the threshold values of the reputation system continually performs only one type of malicious behavior to degrade the security performance of the network. When the trust value of that behavior is about to fall below the corresponding threshold, the compromised node stops performing it and switches to another type. In this way, the compromised node performs only one type of malicious activity at any time, and all of the detection indexes stay within normal limits. Under the design principles of the existing secure data aggregation protocols [12, 13], such compromised nodes can effectively conceal themselves and prolong their malicious activities; the simulation results in Section 4 confirm this. All compromised sensor nodes in this model try to disrupt the network with the alternating attack.
Thus, the basic idea behind SRDA is to evaluate the trustworthiness of sensor nodes by using three types of functional reputation, namely, sensing, forwarding, and aggregation, and combine these functional reputation indexes with an improved detection algorithm which can effectively identify alternating attack while keeping a balance between security and energy efficiency.
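The alternating attack described in this section can be illustrated with a small simulation. It is a sketch under stated assumptions rather than the adversary used in the paper's experiments: the trust value is assumed to be the beta expectation (a + 1)/(a + b + 2), every behavior is performed once per round, and the names `simulate_alternating_attack` and `warmup` are hypothetical.

```python
BEHAVIORS = ("sensing", "forwarding", "aggregation")


def trust(correct: int, false: int) -> float:
    """Assumed trust estimate: beta expectation with alpha=correct+1, beta=false+1."""
    return (correct + 1) / (correct + false + 2)


def simulate_alternating_attack(rounds: int, threshold: float, warmup: int = 20):
    """Simulate an attacker that knows `threshold`; return (history, counts)."""
    # counts[behavior] = [correct, false]; `warmup` honest actions build initial trust.
    counts = {b: [warmup, 0] for b in BEHAVIORS}
    current = 0  # index of the behavior currently being abused
    history = []
    for _ in range(rounds):
        attacked = None
        # Keep abusing the current behavior while one more misbehavior is safe;
        # otherwise switch to the next behavior that can absorb a misbehavior.
        for step in range(len(BEHAVIORS)):
            idx = (current + step) % len(BEHAVIORS)
            correct, false = counts[BEHAVIORS[idx]]
            if trust(correct, false + 1) > threshold:
                attacked, current = BEHAVIORS[idx], idx
                break
        # At most one behavior is malicious per round; the others are correct.
        for b in BEHAVIORS:
            counts[b][1 if b == attacked else 0] += 1
        history.append({b: trust(*counts[b]) for b in BEHAVIORS})
    return history, counts
```

In this sketch every trust value stays above the threshold in every round even though the node misbehaves repeatedly, which is the weakness of per-behavior thresholds that the adversary model exploits.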
2.3. The Mathematical Model of SRDA
In this paper, we consider a target node behavior in the network as a stochastic binary event with two possible outcomes, “correct” and “false,” and we take a and b as the observed numbers of “correct” and “false” target behaviors of a sensor node. We employ a Bayesian formulation, namely the beta reputation system, to model the reputation system of SRDA. The a posteriori probabilities of a binary event can be expressed as a beta distribution,
Then, the expectation value of probability can be expressed as below:
With the above functions, we can calculate the probability that the outcome of a target behavior is “correct” in the future with two parameters: α and β. Therefore, the monitored number of “correct” target behaviors and “false” target behaviors of a sensor node, a and b, can be applied into beta function as below:
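For reference, the textbook forms of the expressions described in this subsection are sketched here. The mapping α = a + 1, β = b + 1 is the usual beta reputation convention and is an assumption, not taken from the text; no equation numbers are assigned.

```latex
\begin{align*}
f(p \mid \alpha, \beta) &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
  p^{\alpha - 1} (1 - p)^{\beta - 1},
  \qquad 0 \le p \le 1,\ \alpha > 0,\ \beta > 0 \\
E(p) &= \frac{\alpha}{\alpha + \beta} \\
E(p) &= \frac{a + 1}{a + b + 2}
  \qquad \text{with } \alpha = a + 1,\ \beta = b + 1
\end{align*}
```

Under this convention, a node observed with a = 8 correct and b = 2 false behaviors has an expected value of 9/12 = 0.75, and a node with no observations (a = b = 0) starts at 0.5.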
As we can see in formula (4), when
2.4. The Design Requirements of the Protocol
2.4.1. Data Integrity
The protection of data integrity is a vital issue in the design of secure data aggregation protocols; it aims to prevent transmitted data packets from being maliciously or accidentally altered during data aggregation. In this paper, we focus on setting up a smart reputation system to prevent malicious alteration.
2.4.2. Data Accuracy
Data accuracy is determined by the success rate of data transfers in the network, and it directly affects the final aggregation result. Node misbehaviors and data collisions are the two major factors that influence the success rate of data transfers [1]. Compromised nodes can distort data accuracy by falsifying sensed data or selectively forwarding received data packets during data aggregation. Therefore, the proposed protocol sets up a reputation system that can effectively detect compromised nodes while requiring less data traffic, which reduces the probability of data collisions.
2.4.3. Efficiency
Energy consumption is always a major concern in the research of WSN, and the technique of data aggregation emerged and rapidly developed for solving this problem. But the deployment of security mechanisms for data aggregation would inevitably cause some additional communication overhead. Therefore, we need to maintain a balance between network security and efficiency in the design of secure data aggregation protocol. In this paper, the proposed secure data aggregation protocol SRDA can maintain a low level of energy consumption while providing a good security performance.
3. The Proposed Secure Data Aggregation Protocol SRDA
SRDA includes a reputation-based trust management system whose main capabilities are monitoring, analysis, and isolation. The monitoring mechanism targets three kinds of node activity: sensing, forwarding, and aggregation. These activities are recorded and classified by the neighboring nodes within the same cell. Based on the monitoring data, each node generates an activity table that lists the numbers of correct and false node activities occurring within its own cell. In the data sharing phase, the initial activity tables are exchanged among the sensor nodes in the same cell. Each node then compares and combines the received tables with its own and generates a new activity table that replaces the initial one. In the analysis phase, each node applies the monitoring data in the activity table to the reputation value formula and generates a functional reputation table from the results. Nodes whose reputation values are lower than the preset threshold are judged to be compromised and are isolated from the network. Additionally, to deal with the alternating attack, we propose the notions of “trend index (TI)” and “volatility index (VI)” to record the changing trend and the volatility of the functional reputation value. By analyzing these indexes, alternating attacks can be identified effectively. The detection mechanisms used in the monitoring phase are described below.
3.1. The Detection Mechanism of Malicious Forwarding Behavior
Compromised sensor nodes can easily degrade network performance through malicious forwarding behaviors, such as tampering with the forwarding path of received data packets or selectively forwarding them. To detect malicious forwarding behaviors, we employ an existing detection mechanism, Watch Dog [15]. The Watch Dog mechanism requires nodes to keep a buffer of their sent data packets in order to verify the forwarding behaviors of their neighboring nodes. Take a simple example; node

Figure 3: The work mechanism of Watch Dog.
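The buffer-and-compare idea behind Watch Dog can be sketched as follows. This is a minimal illustration of the general mechanism, not the code used in SRDA: the class name, the field names, and the timeout parameter are hypothetical, and one `Watchdog` instance is assumed to monitor a single neighbor.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Tracks whether one neighbor forwards the packets handed to it."""
    timeout: float                                # seconds allowed for forwarding
    pending: dict = field(default_factory=dict)   # packet_id -> (payload, next_hop, sent_at)
    correct: int = 0                              # verified correct forwards
    false: int = 0                                # dropped, altered, or misdirected packets

    def sent(self, packet_id, payload, expected_next_hop, now):
        """Buffer a packet that was just handed to the monitored neighbor."""
        self.pending[packet_id] = (payload, expected_next_hop, now)

    def overheard(self, packet_id, payload, next_hop):
        """Compare an overheard retransmission with the buffered copy."""
        entry = self.pending.pop(packet_id, None)
        if entry is None:
            return  # not a packet this node is waiting on
        expected_payload, expected_hop, _ = entry
        if payload == expected_payload and next_hop == expected_hop:
            self.correct += 1
        else:
            self.false += 1  # payload tampered with or packet misdirected

    def expire(self, now):
        """Count packets never overheard within the timeout as selective drops."""
        stale = [pid for pid, (_, _, t) in self.pending.items() if now - t > self.timeout]
        for pid in stale:
            del self.pending[pid]
            self.false += 1
```

The `correct` and `false` tallies correspond to the counts of correct and false forwarding behaviors that feed the reputation calculation.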
3.2. The Detection Mechanism of Malicious Sensing Behavior
To undermine the reliability of the aggregation result, compromised nodes forge or tamper with the raw sensed data and send the false data to their ancestor nodes [5, 12]. In a dense network, neighboring sensor nodes always have overlapping sensing areas, which means that the information in an overlapping area can be sensed by multiple neighboring nodes. To take advantage of this feature, we applied the technique of density based local outlier detection with
Formulas (5) and (6) are the original density based local outlier factor formula and the reformed formula for the application in WSN. In formula (5),

Figure 4: The detection mechanism of malicious sensing behavior.
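To make the outlier-detection idea concrete, the sketch below computes the original density based local outlier factor (LOF) for scalar readings taken over an overlapping sensing area. It implements the classic LOF definition only; the reformed formula adapted for WSNs is not reproduced here, and the function name, the choice of k, and the treatment of duplicate readings are assumptions.

```python
def lof_scores(values, k):
    """Local outlier factor of each scalar reading (classic LOF definition)."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    def dist(i, j):
        return abs(values[i] - values[j])

    # k-distance and k-neighborhood of each reading (ties at the k-distance included).
    k_dist, neigh = [], []
    for i in range(n):
        ordered = sorted(dist(i, j) for j in range(n) if j != i)
        kd = ordered[k - 1]
        k_dist.append(kd)
        neigh.append([j for j in range(n) if j != i and dist(i, j) <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist(i, j)) for j in neigh[i])
        lrd.append(len(neigh[i]) / reach if reach > 0 else float("inf"))

    # LOF: mean ratio of the neighbors' density to the reading's own density.
    scores = []
    for i in range(n):
        if lrd[i] == float("inf"):
            scores.append(1.0)  # simplification: duplicates are treated as inliers
        else:
            scores.append(sum(lrd[j] / lrd[i] for j in neigh[i]) / len(neigh[i]))
    return scores
```

For the readings `[20.0, 20.5, 21.0, 20.2, 35.0]` with `k = 2`, the four consistent readings score close to 1 while the reading 35.0 scores far higher, so a node reporting it would be counted as performing a false sensing behavior.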
3.3. The Detection Mechanism of Malicious Aggregation
During data aggregation, compromised sensor nodes can falsify or tamper with the aggregated data and upload the fake result to their ancestor nodes to disrupt the final aggregation result. In SRDA, a neighboring sensor node of the aggregator node sends its sensed data to the aggregator node and keeps monitoring it. The neighboring node then aggregates its own sensed data with the overheard data packets that the other neighboring nodes sent to the aggregator node and compares its own aggregated result with the aggregated data overheard from the aggregator node. If the difference is too large, the aggregation behavior of the aggregator node is judged to be malicious. This detection mechanism is very similar to the Watch Dog monitoring mechanism introduced previously.
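The comparison step can be sketched as below. The aggregation function (a mean by default) and the absolute tolerance are assumptions for illustration; the text does not specify either, and the function name is hypothetical.

```python
def aggregation_is_consistent(own_reading, overheard_readings, reported_aggregate,
                              tolerance, aggregate=None):
    """Check an aggregator's overheard output against a locally recomputed value.

    Assumptions: the aggregate is the mean unless another function is supplied,
    and "too large" means an absolute difference above `tolerance`.
    """
    if aggregate is None:
        aggregate = lambda xs: sum(xs) / len(xs)
    local = aggregate([own_reading, *overheard_readings])
    return abs(local - reported_aggregate) <= tolerance
```

A monitoring node that sensed 20.0 and overheard 21.0 and 19.0 would accept a reported aggregate of 20.0 and reject 25.0 with a tolerance of 0.5; each result adds one correct or one false aggregation behavior to the aggregator's record.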
3.4. The Calculation Mechanism of the Reputation System
In this section, we take the reputation value of data sensing as the example to introduce the calculation mechanism of the reputation system in SRDA. We consider
Therefore, we take the reputation value of data sensing
After the aggregation and monitoring period, each sensor node generates an activity table that records the node activities that occurred within the cell. An example of the activity table is shown in Table 1.
Table 1: Activity table.
In Table 1,
In the phase of data sharing, the initial activity tables will be exchanged among the sensor nodes in the same cell. After this, each node compares and merges the received initial activity tables with its own activity table and then generates a new activity table to replace the initial one. In the phase of analysis, each node applies the monitoring data listed in the activity table into the calculation formula of reputation value and generates a functional reputation table based on the results of computation. Table 2 is an example.
Table 2: Reputation table.
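The step from an activity table to a reputation table can be sketched as follows. The data layout, the function names, and the use of the beta expectation (a + 1)/(a + b + 2) as the reputation value are assumptions for illustration; the sketch shows only the basic per-behavior threshold check, not the volatility and trend analysis.

```python
def reputation(correct: int, false: int) -> float:
    """Assumed reputation value: beta expectation with alpha=correct+1, beta=false+1."""
    return (correct + 1) / (correct + false + 2)


def reputation_table(activity_table: dict) -> dict:
    """Turn {node: {behavior: (correct, false)}} into {node: {behavior: reputation}}."""
    return {
        node: {behavior: reputation(c, f) for behavior, (c, f) in rows.items()}
        for node, rows in activity_table.items()
    }


def nodes_to_isolate(rep_table: dict, thresholds: dict) -> list:
    """Nodes with at least one functional reputation below its threshold."""
    return sorted(
        node
        for node, reps in rep_table.items()
        if any(value < thresholds[behavior] for behavior, value in reps.items())
    )
```

With counts of (18, 2) correct and false sensing behaviors a node scores 19/22, about 0.86, whereas (5, 15) gives 6/22, about 0.27, which would fall below a threshold of 0.5.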
To detect the compromised sensor nodes which perform alternating attacks, we proposed the notion of
As we can see in the formula,
Table 3: Volatility index and trend index table.
An important feature of compromised nodes that perform the alternating attack is the increased volatility of their reputation. Therefore, we can determine the normal range of the volatility index by observing normally functioning sensor nodes and then set a threshold on the volatility index to identify the compromised nodes. Compared with the RSDA protocol, this algorithm is simpler and faster, which reduces the computation cost.
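The intuition behind the two indexes can be illustrated with stand-in definitions. These are not the formulas of SRDA, which are not reproduced here: in this sketch the trend index is assumed to be the net change of a reputation series over an observation window, and the volatility index is assumed to be the number of times the series reverses direction. Both function names are hypothetical.

```python
def trend_index(reputations):
    """Assumed stand-in: net change of the reputation over the window."""
    return reputations[-1] - reputations[0]


def volatility_index(reputations):
    """Assumed stand-in: number of direction reversals in the reputation series."""
    # Successive changes, ignoring samples where the value did not move.
    diffs = [b - a for a, b in zip(reputations, reputations[1:]) if b != a]
    # A reversal is a rise followed by a fall, or a fall followed by a rise.
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
```

Under these stand-ins, a steadily improving series such as `[0.50, 0.60, 0.70, 0.80]` has no reversals and a positive trend, while an oscillating series such as `[0.90, 0.82, 0.90, 0.82, 0.90]` has three reversals and no net trend, which is the kind of pattern a volatility threshold is meant to catch.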
3.5. The Work Flow of SRDA
As shown in Figure 5, the work flow of SRDA can be divided into four phases: monitoring, data sharing, analyzing, and isolating. After the aggregation tree is formed, the network enters the monitoring phase. In this phase, as requested by the query, the sensor nodes start collecting the target data, which involves data sensing, data forwarding, and data aggregation. Meanwhile, each node monitors the activities of its neighboring nodes and stores the monitored information as required by the detection mechanisms. In the data sharing phase, the nodes exchange their activity tables and the monitored information within the cell. In the analyzing phase, the nodes compare and combine the received activity tables, compute the reputation value, volatility index, and trend index of the nodes, and generate the reputation tables and the volatility index and trend index tables. In the final phase, isolating, the network isolates the detected malicious sensor nodes and updates the routing paths to exclude them.

Figure 5: The sequence flow diagram of SRDA.
4. Simulation and Analysis
The proposed data aggregation protocol SRDA is simulated in the TinyOS 2.0 simulator (TOSSIM), and the simulation environment is set as a square of
4.1. The Detection of Alternating Attack
In this section, the average reputation value and the volatility index for sensing behavior are monitored during the simulation to verify the security performance of the proposed protocol under the alternating attack. We set 10% of the sensor nodes as compromised nodes that perform alternating attacks against the network. RDAT and SRDA are simulated separately in the same network environment, and the simulation time is set to 5000 seconds. We assume that the threshold value of

Figure 6: The comparison between RDAT and SRDA under alternating attack.

Figure 7: The average value of
As we can see in Figure 6, the average reputation value of the normal nodes grows steadily and eventually stabilizes at around 0.89 in both protocols. However, the average reputation value in RDAT grows relatively quickly. This is because RDAT combines both first-hand and second-hand information in computing the reputation value, which makes its reputation system more sensitive to changes in the quantity of node activity. In contrast, the average reputation value of the compromised nodes fluctuates frequently, which can be explained by the strategy of the alternating attack. The alternating attack requires a compromised node to switch to another kind of misbehavior when
4.2. The Analysis of Energy Consumption
In this section, we compare the energy consumption of the two protocols in the same simulation environment. The simulation time is set to 4000 seconds, and 10% of the nodes are compromised and perform alternating attacks against the network. The average ratios of consumed energy of the nodes in RDAT and SRDA were recorded separately.
As shown in Figure 8, the average ratio of consumed energy in RDAT is always higher than in SRDA during the simulation. There are two main reasons. First, the monitoring and information sharing among the nodes of SRDA are confined to the cell, so the amount of data traffic in SRDA is lower than in RDAT. Second, the reputation algorithm of RDAT is more complicated than that of SRDA, which incurs more computation cost in the analyzing phase. At a running time of 3500 s, some of the sensor nodes in RDAT failed because they ran out of energy, which reduced the amount of data traffic in the network; its average ratio of consumed energy then stabilized at around 93.7%. In SRDA, all of the compromised nodes had been isolated from the network after 2500 s. However, most of the normal nodes were still functioning and were also assigned the data sensing and sharing tasks that would otherwise have gone to the isolated nodes. At the end of the simulation, the average ratio for SRDA was about 83.2%, roughly 10.5 percentage points lower than the level at which RDAT stabilized.

Figure 8: The comparison of energy consumption between RDAT and SRDA.
4.3. The Analysis of Aggregating Accuracy
The accuracy of aggregation is an important performance index for a data aggregation protocol. To evaluate the aggregating accuracy of the two protocols, we set the simulation time to 4000 seconds and 10% of the nodes as compromised nodes. Half of the compromised nodes perform the basic malicious behaviors during data aggregation, such as false sensing, selective forwarding, and false aggregation. The other half perform alternating attacks against the network. The threshold for the volatility index is set to 2. The simulation results are presented in Figure 9.
As shown in Figure 9, the aggregating accuracy of SRDA grew faster than that of RDAT throughout the simulation. Because the wireless channel is shared, some data loss is inevitable in wireless sensor network transmission: the larger the amount of data traffic in the network, the higher the probability of data collisions, which lead to data loss [17]. The simulation results in the previous section indicated that RDAT generates much more data traffic than SRDA. Therefore, even apart from the negative influence of the compromised nodes, the aggregating accuracy of SRDA should be higher than that of RDAT, which is consistent with the simulation results in this section. Moreover, in SRDA, most of the compromised nodes had been isolated from the network by a running time of 2500 s. Without their negative influence, the aggregating accuracy of SRDA improved significantly and finally reached about 83.5%. In contrast, the detection mechanisms of RDAT could not detect the compromised nodes that performed alternating attacks, which slowed the growth of its aggregating accuracy; it finally reached about 77.3%. The simulation results in this section indicate that the aggregating accuracy of SRDA is higher than that of RDAT.

Figure 9: The comparison of aggregating accuracy between RDAT and SRDA.
5. Conclusions
In this paper, we analyzed the existing reputation-based data aggregation protocols and proposed a new attack strategy, the alternating attack, which can evade the existing detection mechanisms and distort data aggregation results. To solve this problem, we presented a smart reputation-based secure data aggregation protocol, SRDA. The reputation system of SRDA identifies the behavioral characteristics of compromised nodes by means of characteristic indexes, namely the volatility index and the trend index. The simulation results showed that SRDA can effectively detect the alternating attack with lower energy consumption. Meanwhile, the lower communication overhead leads to a lower probability of data collision, which improves the accuracy of aggregation. Therefore, SRDA meets its design requirements: it aggregates the target data efficiently while protecting data accuracy and integrity. In future work, we will focus on identifying other kinds of malicious behavior by means of node behavioral characteristic indexes and on improving the detection mechanisms for secure data aggregation.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is supported by National Natural Science Foundation under Grant 61371071, Beijing Natural Science Foundation under Grant 4132057, Beijing Science and Technology Program under Grant Z121100007612003, and Academic Discipline and Postgraduate Education Project of Beijing Municipal Commission of Education.
