Abstract
Large-scale wireless sensor networks follow the two-tiered architecture, where master nodes take charge of storing data and processing queries. However, if a master node is compromised, the information stored in it may be exposed, and query results can be juggled. This paper presents a novel scheme called SEF for secure range queries. To preserve privacy, SEF employs the order-preserving symmetric encryption which not only supports efficient range queries, but also maintains a strong security standard. To preserve authenticity and integrity of query results, we propose a novel data structure called Authenticity & Integrity tree. Moreover, SEF is flexible since it allows users to include or exclude the authenticity and integrity guarantee. To the best of our knowledge, this paper is the first to use the characteristic of NAND flash to achieve high storage utilization and query processing efficiency. The efficiency of the proposed scheme is demonstrated by experiments on real sensor platforms.
1. Introduction
The traditional architecture of wireless sensor networks (WSNs) always assumes that a trusted base station (or sink) is present and responsible for collecting data from the sensor nodes and processing query requests from the users. However, many WSNs are deployed in hostile and harsh environments such as battlefields, forests, and oceans where it is impossible or difficult to establish a stable communication link from sensor nodes to the base station. Summarizing the above, we can see that such an architecture only makes sense for experimental and small networks. As illustrated in Figure 1, large-scale WSNs follow a two-tiered architecture, where a large number of regular resource-poor sensor nodes are at the lower tier and fewer resource-rich master nodes are at the upper tier. The two-tiered architecture has been widely adopted [1, 2] because of the benefits of energy and storage saving as well as the efficiency of query processing. Compared with sensor nodes, master nodes have more powerful computing power and more energy supply, additionally they are equipped with several gigabytes of NAND flash for tens of dollars. The sensor nodes are organized as cells and each cell includes a master node referred to as the cell header. The sensor nodes are responsible for sensing data and submitting data to the master nodes, while the master nodes take charge of storing data and performing resource-demanding tasks such as query processing. In this paper, we focus on range queries, namely, once the base station launches a query request to a master node to query the data items in the range [low, high], the master node must return all data items in the query range. Note that if low is equal to high, the query is actually an equality query that requires the master node to return all data items that are equal to low (or high). Our scheme also supports equality queries. For simplicity, we refer to the sensor node, master node, and base station as SN, MN, and BS, respectively, in the rest of this paper.

Architecture of a two-tiered WSN.
The inclusion of MNs also incurs significant security challenges. Since both data storage and query processing rely on MNs, MNs are prone to be the target of attacks, especially in some unattended and hostile environments such as military scenarios. And the adversary is more attracted to compromise MNs to greatly damage WSNs. If MNs are compromised, the adversary can launch at least three types of attacks. First, the adversary can read all data stored in the compromised MNs and breach data privacy. A sound defense to this attack should ensure data privacy and still enable efficient query processing. Second, the compromised MNs may send tampered or forged data in response to queries, which breaches the authenticity of query results. Thirdly, the returned query results may not be integral, which means that some qualified data items are omitted by the compromised MNs. Therefore, a solution is required urgently, which can offer efficient and effective range queries. At the same time it can also preserve data privacy as well as the authenticity and integrity of query results at the presence of compromised MNs.
In some cases, the authenticity and integrity guarantee might not be always necessary, for example, out of consideration for reducing communication and verification cost. Therefore, a range query scheme should be flexible, namely, the users can choose to include or exclude the authenticity and integrity guarantee according to different applications.
The main contributions of this paper include the following: (1) To the best of our knowledge, this paper, for the first time in the literature, considers the characteristic of NAND flash when designing query schemes in WSNs. Our work explores the storage media concerns in general WSNs and provides a novel approach for data storage and query processing. (2) We propose a secure, efficient, and flexible range query scheme called SEF for two-tiered WSNs, in which only the order information of the encrypted data is exposed to MNs. (3) Since some applications may require authenticity and integrity guarantee of query results, and others may not, by keeping encrypted data separated from verification information, our scheme provides much flexibility. (4) We evaluate our scheme in a test bed; the results show that compared with existing schemes, our scheme performs better on both energy consumption and storage consumption.
The rest of this paper is organized as follows. Section 2 gives a brief review of the related work. Section 3 presents the network and adversary models. Section 4 introduces some techniques and notations used in this paper. Section 5 describes our scheme in detail. Section 6 gives a detailed security analysis of the proposed scheme. Section 7 reports the performance evaluation results. Section 8 concludes the paper.
2. Related Work
Data security has gained some attention in outsourced database community [3–5]. In an outsourced database system, a data owner publishes his/her data through a remote server which may be untrusted or compromised. The remote server answers user queries on behalf of the data owner. However, due to the unique and challenging characteristics and security requirements of WSNs, such as limited computing power and communication capability, these approaches are not suitable for WSN applications.
Despite significant progress in WSN security such as [6, 7], secure range queries have started receiving people's attention very recently [8–12]. In [8] Sheng and Li first considered the privacy issue in the design of range query schemes in WSNs. We call it “SL” scheme in the rest of this paper. They apply the bucketing technique introduced in [13] to fulfill secure range queries. The basic idea of bucketing technique is to partition the domain of sensed data values into buckets and each bucket is assigned with a bucket tag. The bucket tags are exposed to SNs, MNs, and the BS, but how to partition the domain of sensed data values into buckets is only known to SNs and the BS. A data item sensed by a SN is placed into a bucket according to the partition. Then all data items falling into the same bucket are encrypted as a whole, at last the SN sends the buckets as well as the bucket tags to its MN. In order to make the BS verify whether the returned query result is integral, [8] introduces the encoding technique. The bucket into which no data item falls is attached with an encoding number.
In [9, 10], Shi et al. proposed an optimized integrity verification version of SL scheme, to reduce the communication cost between the SN and its MN caused by encoding technique. The basic idea is that each SN uses a bit map to represent which buckets have data and broadcasts its bit map to nearby SNs. Each SN attaches the bit maps received from other SNs to its own data items, then encrypts them together. However, the optimization technique causes a serious problem that is a compromised SN can easily breach the integrity verification of the network by sending false bit maps. On the contrary, SL scheme and our scheme do not have the problem. As the techniques used in [9, 10] are similar to [8] except the optimization for integrity verification, they inherit the same weakness with SL scheme.
There are many common drawbacks in the schemes of [8–10]. (1) They require in advance the accurate distribution of sensed data items,
In order to address the drawbacks of these schemes, Chen and Liu proposed a novel privacy and integrity preserving rang query protocol called SafeQ-Basic in [11, 12]. To preserve privacy, their protocol uses the prefix membership verification scheme first introduced in [15]. The basic idea of the prefix membership verification scheme is to convert the verification of whether a number is in a range to several verifications of whether two numbers are equal. Interested readers can refer to [15] for more detailed information. To preserve integrity, they propose a data structure called neighborhood chaining. The main idea of neighborhood chaining is to sort the sensed data items in ascending order first, and each data item
SafeQ-Basic causes large computation, communication, and storage overhead, because the prefix membership verification scheme requires many HMAC operations and produces a lot of HMACs. To reduce the communication and storage overhead, [11, 12] also proposed an optimized version of SafeQ-Basic, called SafeQ-Bloom. SafeQ-Bloom uses a Bloom filter to represent HAMCs. Thus, an SN only needs to send the Bloom filter instead of HMACs. The number of bits needed to represent the Bloom filter is much smaller than that needed to represent HMACs. However, to produce the Bloom filter further leads to more computation overhead. These features make SafeQ-Basic and SafeQ-Bloom not efficient and not suitable for WSNs. We find that though SafeQ-Basic and SafeQ-Bloom apply the ordinary symmetric encryption algorithm to encrypt sensed data items, such as DES in their experiments, the order of encrypted data items is still exposed to MNs because all encrypted data items are sorted in ascending order. This security standard is the same as that of SEF. Moreover, neither SafeQ-Basic nor SafeQ-Bloom considers the storage media issue.
3. Network and Adversary Models
3.1. Network Model
We consider a large-scale two-tiered WSN as illustrated in Figure 1. The WSN consists of SNs and MNs; the network is partitioned into cells, according to the geographic location, each SN belongs to one cell, and each cell contains an MN which is in charge of other SNs in the cell. In some scenarios, two adjacent cells maybe overlap; in this case the SNs in the overlapping regions are affiliated with both MNs. The SNs are limited in storage, energy supply, and computing power; in contrast the MNs are resource-rich devices.
Time is assumed to be divided into time slots and all SNs and MNs are loosely synchronized with the BS. We assume that every SN senses the environment data in a fixed rate and periodically submits sensed data to its MN.
Contrary to conventional WSNs, our WSN has no stable always-on communication link to the external BS. MNs store all data collected from SNs which they are in charge of. The BS can query MNs for data on demand. We assume that MNs have enough storage to store data. Nowadays, most embedded devices use NAND flash as the storage media. In this paper, we follow the assumption that the storage media of MNs is the most conventional NAND flash [16, 17].
In this paper, for sake of simplicity, we assume that any query request can be converted into multiple range query primitives having the format:
3.2. Adversary Model
Due to the broadcast nature of wireless communication environments, the adversary may launch many attacks such as jamming, passive eavesdropping, and bogus-message injection. There exist rich elegant defenses to these attacks such as in [18–23]. We resort to the existing rich literature for these attacks. This paper particularly focuses on SNs compromise and MNs compromise attacks.
SNs compromise: if an SN is compromised, the data values stored in the SN will be exposed to the adversary and the compromised SN may send forged data to its MN. Unfortunately, it is very difficult to prevent such attack without the use of tamper proof hardware. We follow the same SNs compromise assumption with [11, 12]. If an SN is compromised, the adversary can just access the data stored in the compromised SN. It must be guaranteed that the adversary cannot make any damage to other SNs and MNs by the compromised SN. MNs compromise: once an MN is compromised, the adversary can read all data stored in the compromised MN and try to obtain the data values, which violates data privacy. Second, the compromised MN can return tampered, forged or not integral query results in response to queries. As compromising an MN can cause greater damage to WSNs than compromising an SN, the adversary is more attracted to compromise MNs. We propose a novel data structure called Authenticity&Integrity tree (AI tree for brevity) to guarantee the authenticity and integrity of query results from MNs.
4. Preliminaries and Notations
In this section we review some cryptographic essentials used by our system.
Order-Preserving Symmetric Encryption
The Order-Preserving Symmetric Encryption (OPSE) is a deterministic encryption scheme whose encryption function preserves the numerical order of the plaintexts [24]. More specifically OPSE has the property that given two numbers x and y,
HMAC
In cryptography, HMAC (Hash-based Message Authentication Code) is a specific construction for calculating a message authentication code (MAC) involving a collision-resistant hash function in combination with a secret key. As with any MAC, it may be used to simultaneously verify both authenticity and integrity of a message. Any cryptographic hash function, such as MD5 or SHA-1, may be used in the calculation of an HMAC. The cryptographic strength of the HMAC depends upon the cryptographic strength of the underlying hash function.
Merkle Hash Tree
Merkle hash tree (MHT) first invented in 1989 by Merkle [26] is a tree of digests in which the leaf is the digest of a data block. Nodes further up in the tree are the digests of their respective children. For example, as shown in Figure 2,
Table 1 lists the primary notations used in this paper.
Primary notations.

A Merkle hash tree example.
5. The Proposed Scheme
In this section, our scheme, SEF, is described in detail. Without loss of generality, we focus on one cell consisting of L SNs referred to as
5.1. System Initialization
A simple approach for data privacy is to require each SN to encrypt each data item using ordinary symmetric encryption algorithm (such as DES and AES) before sending it to MN. Although this approach leaks the least information to MN and provides strong privacy, it is not suitable for efficient range queries because MN has no knowledge to locate the encrypted data items which exactly match the range. MN has to send the entire encrypted data collected in the specified time slot to the BS. Obviously, this approach is very inefficient and wastes much energy. To address this issue, we employ OPSE in our scheme, which only exposes the order information of the encrypted data to MN. OPSE not only allows efficient range queries, but also maintains a strong security standard.
The BS loads an OPSE function and a hash function to each
5.2. Sensor Node Behavior
Sensor node behavior is divided into three phases: sensing phase, AItree-construction phase, and communication phase. In sensing phase,
In sensing phase,
Since the I/O interface of NAND flash does not provide a random-access external address bus, data must be read on a pagewise basis, our scheme makes full use of this characteristic to achieve high storage utilization and query processing efficiency. In AItree-construction phase, on top of

An AI tree example.
In order to record the structure of the AI tree, two labels are attached to each digest, called descendant label and level label. We use the notation
For visualization, here we use an example to describe the process. Figure 3 illustrates an AI tree in a scenario where there are
In order to minimize the energy consumption,

Average energy consumption of communication and hash.
5.3. Master Node Behavior
After receiving all encrypted data items and HMAC, MN rebuilds the AI tree, then stores all encrypted data items, HMAC and partial digests of the AI tree in NAND flash. Considering the characteristic of NAND flash, that is, NAND flash is divided into pages, in Section 5.5, we will discuss which digests of the AI tree are stored, and how to store these digests.
If the BS wants to query MN for the data items in range [low, high] sensed by
For example, suppose the BS wants to query the data sensed by
In a real-world scenario not all applications require the authenticity and integrity guarantee. For example, some resource-critical applications may favor an energy-saving response over a verified one. In these situations, MN only returns the query result S, achieving the performance of a nonauthenticated range query scheme. But in some cases the authenticity and integrity guarantee is required, MN returns
5.4. Network Owner Behavior
After receiving
The BS uses a digest array to store the temporal digests computed on the fly. For simplicity of expression, we call the digest inserted into the digest array first the most left digest denoted by There are two digests to which the level label η is attached, denoted by There is only one digest to which the level label η is attached, denoted by There is no digest to which the level label η is attached, the BS takes no action.
Then, the BS processes the unprocessed digests (including
Recalling the example in Figure 3, the BS wants to query the data sensed by
5.5. AI Tree Compression
To reduce the storage overhead and the cost of reading data from NAND flash, MN does not materialize the entire AI tree. That is some internal nodes in AI tree will not be stored in MN, but recomputed on the fly when necessary. Since the I/O interface of NAND flash does not provide a random-access external address bus, data must be read on a pagewise basis, our scheme makes full use of this characteristic to achieve the optimal storage utilization and query processing efficiency.
Taking into account the characteristic of NAND flash, with respect to the lower trees, MN only stores the leaves of the lower trees (i.e., the encrypted data items). For example, in Figure 3, when the BS wants to query
With regard to the upper tree, a basic approach is that only the leaves of the upper tree are stored. The leaves of the upper tree are read from NAND flash immediately, and the internal nodes of the upper tree can be computed on the fly using the leaves of the upper tree. For instance, in Figure 3, assume that a page of NAND flash can accommodate more than 3 digests, we use a single page to store
To address the above issue, we apply the optimized approach of [4]. Motivated by the characteristic of NAND flash, that is, reading data from NAND flash is on a pagewise basis, MN stores all digests of the upper tree required to construct

An upper tree compression example.
As reading data from NAND flash is in pages, when one page is loaded, all digests of the upper tree required to construct
But there exists an extreme case in which N is so huge that m is equal to 0. We define the page utilization ratio as
6. Security Analysis
6.1. Privacy Analysis
In a two-tiered WSN protected by our scheme, even if an arbitrary number of MNs is compromised, the adversary cannot obtain the actual values of data items collected by SNs and the actual queries issued by the BS. The reasons are given as follows. Both in the submission and in the query, the data items and the query range sent to the MNs are all encrypted using order-preserving encryption algorithm. Without knowing the keys used in the encryption, it is computationally infeasible for the adversary to get the actual values of data items and query range.
If a SN is compromised, the collected data and key are exposed to adversaries, and the compromised SN may send forged data to its MN; it is hard to prevent such an attack without the use of tamper proof hardware. In our scheme, the key is updated every time slot and the old keys are erased, even a SN and its MN are compromised, adversaries cannot obtain the data values before the compromise. Additionally, as each SN shares a distinct key with the BS, even an SN is compromised, other SNs will not be affected.
6.2. Integrity Analysis
Let If there exists Since If
7. Implementation and Performance Evaluation
In this section, we evaluate the performance of our scheme and perform side-by-side comparison with SL scheme SafeQ-Basic, and SafeQ-Bloom in a test-bed. We analyze the performance comparison in five aspects: the energy consumption of processing sensed data for an SN; the energy consumption of submission from an SN to MN; the storage consumption for MN; the energy consumption of query processing for MN; the energy consumption of sending response to a query to the BS for MN. As the BS is a powerful device, the cost for the BS is neglectable. In our experiments, we do not choose the schemes of [9, 10] for side-by-side comparison for two reasons. First the techniques used in [9, 10] are similar to [8] except the optimization for integrity verification. Second the optimization incurs serious security problems, a compromised SN can easily breach the integrity verification of the network by sending false bit maps.
7.1. Experimental Test Bed and Setup
The test-bed consists of eleven TelosB motes as SNs and one TelosB mote as MN. Note that the number of SNs in a cell is application dependent. If the number of SNs is large, the storage or the energy of MN is exhausted quickly and reduces the life cycle of the WSN. While, if the number of SNs is small, the efficiency of the WSN is very low. SNs should be carefully deployed to achieve the best equilibrium. The TelosB mote is equipped with an 8 MHz CPU, 10 KB RAM, 48 KB ROM, 1024 KB flash, and an 802.15.4/ZigBee radio [28]. The flash of the TelosB mote is divided into pages of 256 bytes. These TelsoB motes run on the operating system Tinyos-2.1.0 [29].
We use SHA-1 [30] as the hash function. Note that it has been pointed out that there are collision attacks on SHA-1 [31]. However, since it requires about 269 hash operations, it is difficult to launch the attacks in practice [32]. The HMAC operation also uses SHA-1 as the underlying hash function, both the size of a digest and an HMAC is 20 bytes. The size of an encrypted data item is set to 4 bytes. An 8-bits array is used to represent the descendant label and level label, which can record as much as 2128 encrypted data items. We use the DES encryption algorithm in implementing SL scheme, SafeQ-Basic and SafeQ-Bloom as in [8, 11, 12]. We employ the OPSE function proposed in [24] as the underlying OPSE function.
To present which portion of the data are queried, we introduce the query selectivity. Assume that the number of all possible sensed data items is Z, the accuracy of sensed data is a. If the BS wants to query the data items in range [low, high],
As in [8], we use their optimal bucket partition algorithm for computing the optimal bucket partition in implementing SL scheme. Note that in some situations, it is hard to get the accurate distribution of sensed data in advance, the BS is unable to obtain the optimal bucket partition. This factor will greatly influence the performance of SL scheme. We use SEF-basic and SEF-opt to denote our schemes with basic and optimized AI tree compression, respectively.
7.2. Result Summary
(1) Figure 6 depicts the energy consumption of processing sensed data for an SN. Our experiments show that, compared with SL scheme, SEF-basic/SEF-opt consume 3 times more energy since it has to perform some operations (such as hash operation) to construct the AI tree. But they have great advantage compared with SafeQ-Basic and SafeQ-Bloom. Compared with SEF-basic/SEF-opt, SafeQ-Basic consumes 1.67 times more energy and SafeQ-Bloom consumes 4.3 times more energy, because SafeQ-Basic and SafeQ-Bloom adopt the prefix membership verification scheme which requires a large number of hash operations. As SafeQ-Bloom has to perform a lot of additional hash operations to obtain the Bloom filter, it consumes the most energy.

Average energy consumption of processing data for a SN.
(2) Figure 7 shows the energy consumption of submitting data from an SN to MN. Note that the submission involves two parties: the SN and MN, so the energy consumption is the total energy consumption of the SN and MN. Our experiments show that SL scheme performs almost as well as SEF-basic/SEF-opt. In comparison with SEF-basic/SEF-opt, SafeQ-Basic consumes

Average energy consumption of submission.
(3) Figure 8 illustrates the storage consumption for MN. As we can see that the storage consumption of SEF-basic and SEF-opt is very close to that of SL scheme. Compared with SL scheme, SEF-opt incurs only 11% storage overhead on average. And compared with SEF-opt, SafeQ-Basic requires up to

Average storage consumption for MN.
(4) We carry out two kinds of experiments to measure the energy consumption of query processing for MN. One is to vary the time slot and generate range queries randomly. The other is to fix the time slot (we set the time slot to 80 minutes) and vary selectivity. Given a selectivity, low is generated randomly.
In the first case, as shown in Figure 9, SEF-opt consumes the least energy, it saves 21% energy compared with SEF-basic and 6.7% energy compared with SL scheme. SafeQ-Basic/SafeQ-Bloom is always consuming the most energy. Compared with SEF-opt, they consume as much as 2 times more energy. When time slot is 80 minutes, SL scheme consumes 2.37 mJ energy, SafeQ-Basic/SafeQ-Bloom consumes 4.47 mJ energy, SEF-basic consumes 2.69 mJ energy, while SEF-opt consumes only 2.23 mJ energy.

Average energy consumption of query processing for MN.
In the second case, as shown in Figure 10, SEF-opt is overall the most efficient scheme for any ratio. With the increase of selectivity, the energy consumption of SEF-basic is close to that of SEF-opt more and more. When the selectivity is 0.1, SEF-opt consumes the least energy, 0.7 mJ, SEF-basic consumes 1.1 mJ energy, SafeQ-Basic/SafeQ-Bloom consumes 1.18 mJ energy, SL scheme consumes 0.8 mJ energy. It is apparent that SafeQ-Basic/SafeQ-Bloom consumes the most energy except selectivity is less than 0.07. Compared with SEF-opt, SafeQ-Basic/SafeQ-Bloom consumes nearly 2 times more energy.

Average energy consumption of query processing for MN.
(5) Figure 11 presents the energy consumption of sending response to a query to the BS for MN. The response includes the query result and verification information. Since the false positive problem incurred by bucketing technique, the performance of SL scheme is a little worse than that of SEF-basic/SEF-opt. Both our schemes consume about 2 times less energy than SafeQ-Basic/SafeQ-Bloom.

Average energy consumption of sending response for MN.
8. Conclusions
In this paper, we have proposed a secure, efficient, and flexible scheme, called SEF, for range queries in two-tiered WSNs. Our scheme employs the order-preserving symmetric encryption algorithm to preserve data privacy. We have proposed a novel data structure called AI tree to guarantee the authenticity and integrity of query results. We first consider the storage media issue in the design of range query schemes and present how to use the characteristic of NAND flash to achieve high storage utilization and query processing efficiency. The experiments have shown that SEF significantly achieves efficiency.
Footnotes
Acknowledgments
This work is supported by the National Science Foundation of China under Grants no. 61070155 and no. 60903153, Program for New Century Excellent Talents in University (NCET-09-0685), the Fundamental Research Funds for the Central Universities (DUT10ZD110), and the SRF for ROCS, SEM.
