Abstract
The existing privacy-preserving aggregation query processing methods in sensor networks rely on pre-established network topology and require all nodes in the network to participate in query processing. Maintaining the topology results in a large amount of energy overhead, and in many cases, the user is interested only in the aggregated query results of some areas in the network, and thus, the participation of the entire network node is not necessary. Aiming to solve this problem, this article proposes a spatial range aggregation query algorithm for a dynamic sensor network with privacy protection (energy-efficient privacy-preserving data aggregation). The algorithm does not rely on the pre-established topology but considers only the query area that the user is interested in, abandoning all nodes to participate in distributing the query messages while gathering the sensory data in the query range. To protect node data privacy, Shamir’s secret sharing technology is used to prevent internal attackers from stealing the sensitive data of the surrounding nodes. The analysis and experimental results show that the proposed algorithm outperforms the existing algorithms in terms of energy and privacy protection.
Introduction
A wireless sensor network (WSN) is a multihop self-organizing network composed of multiple sensor nodes. An important feature of a WSN is data-centricity, and the nodes of the network cooperate with each other to store and process the sensing data of the monitoring area. Sensor networks are widely used in military defense,1,2 intelligent transportation, 3 medicine and health, 4 and environmental monitoring.5,6
In a WSN, users often submit a spatial range aggregation query to obtain statistical information of an area in the network, such as the average temperature and the maximum humidity of an area in a forest.
Sensor nodes are battery powered, having limited energy, and in many cases, their replacements of battery are difficult. Therefore, how to reduce the energy cost of the sensor network has always been a difficult problem; in the application scenario with information confidentiality requirements, how to improve the security of the sensor network information, prevent the sensor nodes from being captured and reduce the leakage of sensitive private data are additional problems. Therefore, it is necessary to study the privacy-preserving and energy-efficient spatial range aggregation query processing technique to solve these two problems.
The existing spatial range aggregation query algorithms Tiny AGgregation (TAG), 7 R tree, 8 and STWinFlood 9 rely on preconfigured network topologies such as the tree rooted at the base station. But they do not take privacy protection into account. In addition, due to factors such as the environment and node failure, the network topology frequently changes. Maintaining a pre-established topology consumes considerable energy. In addition, the existing aggregation query privacy protection algorithm, Slice-Mix-AggRegaTe (SMART), 10 and its variants privacy preserving, energy-efficient and scalable continuous data aggregation (PECDA) 11 and adaptive slice-based secure data aggregation (ASSDA), 12 hide original and sensing data by slicing the data. Although these algorithms have certain privacy protection capabilities, they all assume that the query area is the entire monitoring area, and all nodes in the network are required to participate in query processing. Obviously, these methods require considerable unnecessary energy overhead, and it is not necessary for all nodes to participate in query processing.
The existing privacy-preserving aggregation query processing methods rely on pre-established network topology and require all nodes in the network to participate in query processing. Maintaining the topology results in a large amount of energy overhead, and in many cases, the user is only interested in the aggregated query results of some areas in the network, and thus, the participation of the entire network node is not necessary. The main idea of the algorithm is as follows. The base station first sends a query message to a node in the query area, starting from the node and distributing the query message while gathering the sensing data in the query area. The algorithm uses the point-by-point overlay strategy to aggregate the data in the query area according to the established route and finally returns the final query result to the base station. The algorithm does not rely on the preconfigured topology, and the aggregation node sends the aggregation result to its next node instead of the original data, which ensures that the node’s sensing data will not be leaked.
The main contributions of this article are as follows:
We propose a route-based dynamic data collection protocol, which does not depend on the pre-established topology.
The privacy protection strategy proposed in this article does not require nodes to perform data slicing, confusion, and encryption, thus saving energy consumption.
The method proposed in this article only needs the nodes in the in query area to participate in query processing and does not require the nodes in the whole network, thus saving energy consumption.
The method proposed in this article provides efficient data aggregation while preserving data privacy.
The organization of this article is as follows. Section “Existing research works and their shortcomings” summarizes the related work, and section “Preliminaries” introduces the preliminaries. Section “Energy-efficient and privacy-preserving spatial range aggregation query processing in WSNs—E2PDA” proposes an energy-efficient and privacy-preserving spatial range aggregation query algorithm. Section “Analysis of experimental results” presents an analysis of the algorithm and its results, and section “Summary” summarizes the paper and presents future research directions.
Existing research works and their shortcomings
With the development of Internet of Things, IoT devices play an important role in smart cities. IoT devices in e-payment services are crucial for modernization.13–15 Meanwhile, a large amount of data generated by these devices face the threats of privacy leakage. 16 There are many works considering the protecting of sensing data. Due to the distrust between server and user, 17 blind signature is crucial to protect user privacy. Blind signature can protect user’s anonymous instead of encrypting all the data and searching on the ciphertexts.18–20 In addition, researchers proposed some methods to protect security in previous works,21–24 which can provide us with new methods to make blind signature scheme in practice. Yun Wang et al. proposed a privacy-preserving framework for signature-based intrusion detection in a distributed network based on fog devices. 25 To prevent breaches of privacy, Yang et al. 26 proposed a novel blockchain-based privacy-preserving crowdsensing system. They use the anonymous nature of blockchains to protect the real identity of workers.
The objective of Data aggregation processes is to reduce the packet overhead and improve energy efficiency. Vodel and Hardt 27 discuss usual ways for data aggregation, including the adapted communication process. Guo et al. 28 use discrete particle swarm optimization (DPSO) to save node energy expenses and build an optimal routing tree assembly considering communication and aggregation cost.
Attacks against WSNs could be broadly considered from two different levels of views. One is the attack against the security mechanisms and another is against the basic mechanisms. Denial of Service (DoS29,30) is produced by the unintentional failure of nodes or malicious action. Attacks proposed in Pfleeger and Pfleeger 31 provide wrong information to the base stations or sinks.
The existing privacy-preserving aggregation query algorithms rely on pre-established topologies, which can be divided into two categories according to their topology type: tree-based algorithms and cluster-based algorithms.
Tree-based algorithms
He et al. 10 proposed data aggregation privacy protection technology, SMART, based on data fragmentation. Each node divides its perceptual data into several fragments to hide its original sensing data and sends the data fragments to different intermediate nodes. After distributing the data fragmentations, the final aggregation result is finally derived at the base station. Considering that the data sent by the non-leaf node to its parent node is the result of the aggregation of the sub-tree where it is rooted rather than the original data, there is no need to protect the privacy of the data sent by the non-leaf node. Wang et al. 11 proposed a method for fragmenting the sensing data of the leaf nodes based on SMART called PECDA. The data fragments of the leaf nodes are sent to the neighbor nodes through the secure channel to protect the privacy of the leaf nodes. Ozdemir et al. 32 proposed a data aggregation protocol, PRDA (polynomial regression-based privacy-preserving data aggregation), based on polynomial regression. Sensor nodes represented their data as polynomial functions to reduce data transmission, and the nodes sent coefficients of polynomial functions to intermediate aggregation nodes to avoid directly sending the sensing data of the nodes, thus protecting the privacy of the node data. Akila and Sheela 33 proposed the PDKP (preserving data and key privacy in data aggregation) protocol, which takes advantage of homomorphic encryption. The node’s data in the routing tree are encrypted for transmission, and the intermediate node gathers its children’s data and its own encrypted data until the final result is gathered at the sink node. Only the sink node can decrypt the encrypted data of each node. Hua et al. 12 proposed the ASSDA protocol, which dynamically determines the number of slices and the size of each piece of data, improving the performance of the SMART protocol.
Cluster-based algorithms
The CPDA (cluster-based private data aggregation) scheme in He et al. 10 used the algebraic properties of the polynomial to calculate the required aggregate value. Ozdemir and Xiao 34 proposed the IPHCDA (integrity protecting hierarchical concealed data aggregation) protocol, which divides the network into multiple regions. The protocol allows hierarchical aggregation of encrypted sensor data while providing confidentiality and integrity. The data returned to the base station can be classified according to the encrypted key. The protocol uses homomorphic encryption to protect the sensing data privacy and authenticates the aggregated data by verifying the MAC to ensure the privacy of the data. LPDA (link-based privacy-preserving data aggregation) protocol was proposed in Zhang et al. 35 The sensing data of the nodes in the cluster subtract the base value that given by cluster head, then add a random number generated by the node, and then the data aggregation operation is performed. Elhoseny et al. 36 used the elliptical curve encryption algorithm to generate binary strings for each sensor, combined with a node ID, distance to the cluster head, and the index of the transmission to form a unique encryption key. Using XOR and other operations, encryption and decryption could be effectively implemented. Fang et al. 37 proposed a privacy-enhancing and energy-efficient data aggregation scheme, CSDA (cluster-based private data aggregation), which adopts privacy shard assembly technology, and the number of shards varies with the dynamic change of the network size, thus reducing communication and energy costs.
Many researchers have conducted very effective work and have laid a good foundation for subsequent research; however, their algorithms mainly rely on a pre-established tree or cluster topology, and the network topology frequently changes due to the influence of the environment, node failure, node movement, and other factors. Maintaining the topology incurs considerable energy overhead and reduces the lifetime of the network.
Preliminaries
To facilitate the discussion, we first present some basic concepts and preliminaries, including the attack model and the privacy protection protocol. In addition, this article mainly discusses the aggregation query and typical data aggregation functions including SUM, AVG, COUNT, and MAX/MIN. Without loss of generality, takes SUM as an example for discussion. 38
Attack model
Common attack modes in sensor networks include external attacks and internal attacks. External attacks refer to the private data of sensors obtained by attackers through eavesdropping. Because sensor networks use wireless communication, attackers acquire private data through eavesdropping on the data link layer when transmitting information between nodes. The eavesdropping attack is a common external attack mode in the sensor network. This article assumes that attackers can eavesdropping the entire network. An internal attack refers to an attacker acquiring information such as the sensing data and the key stored by the node through the capture node, which not only destroys the privacy of the captured node but also can infer the data flowing through the captured node by obtaining the secret key or colluding with multiple compromised nodes.
SMART protocol
He et al. 10 proposed the privacy protection protocol SMART based on data slicing. SMART adopted the random secret key allocation strategy proposed in Eschenauer and Gligor, 39 which is mainly composed of three stages:
Secret key pre-allocation: There are
Discovery of shared secret keys: Each node finds a node in its neighbor that has the same secret key and establishes a secure connection if both nodes have the same secret key.
Establish the path secret key: Neighbor nodes without the same secret key can establish a secure connection through multi-hop routing.
Figure 1 shows the execution procedure of the SMART algorithm, the whole process of SMART is divided into three steps: data slicing, data mixing, and data aggregation. As shown in Figure 1, each node slices its own data into three pieces, keep a piece for itself, and then transmits the rest two pieces to its neighbor nodes. After all pieces are received, each node mixes its own piece and the received pieces to obtain a new result:
Data slicing: Each node randomly selects some nodes within its own adjacent or
Date mixing: After receiving the encrypted data slicing, the node uses the key of the sending node to decrypt it. The receiving node waits for a period of time to ensure that the remaining shard data arrive and then accumulate all received shard data.
Data aggregation: After the node’s data are mixed, the final query result is aggregated at the sink node according to the tree topology.

The execution procedure of the SMART algorithm.
Based on SMART, Wang et al.
11
proposed an improved sensor network privacy protection protocol called PECDA. Figure 2 shows the execution procedure of the PECDA algorithm. There is no need to slice data for all the nodes in the query region. Only the leaf nodes slice their data in the data slicing stage. As shown in Figure 2, the leaf node

The execution procedure of the PECDA algorithm.
Both SMART and PECDA are privacy protection protocols based on topology trees. Because the corresponding topology will change due to node failure and movement, maintenance of the topology will incur additional overhead.
Energy-efficient and privacy-preserving spatial range aggregation query processing in WSNs—E2PDA
The idea of E2PDA
To avoid relying on a pre-constructed topology and to reduce the energy consumption of the nodes in query processing, this article proposes an energy-efficient and privacy-preserving spatial range aggregation query algorithm, named E2PDA.
Figure 3 shows the execution procedure of the E2PDA algorithm, assuming that the average temperature of the query region

The execution procedure of the E2PDA algorithm.
The E2PDA algorithm is divided into three stages: (1) the query message’s geographic routing stage. The base station sends the query message to a node in the query region; (2) the query distribution and sensory data aggregation stage, the query message is sent to all nodes in the query region, and the sensing data is collected and aggregated to calculate the query results; (3) the query results generated in the second phase are returned to the query initiation node by geographic routing protocol.
Without loss of generality, the query area is described as a rectangle. The sink node of the initiating query node sends the query message to node
The entire dynamic query process of E2PDA does not depend on the pre-constructed topology, which aggregates the sensing data while ensuring the privacy of the node’s data. Taking node

A concrete diagram of the simulation route.
The design of E2PDA
This section describes the specific design of our algorithm, including four parts: the grid-based data collection strategy, the design of grid size, the method of handling voids and calculating the next node. The sensor network in the query region is represented as
Grid-based data collection strategy
To be able to traverse the entire query area, a grid data collection strategy is proposed. The query area is divided into multiple grids, and the query message is sent to all nodes in the entire query area by means of the progression of the grid. Thus, the sensing data of the region of interest of the user are collected.
Grid size design
The width and height of the grid are denoted as
Method of handling voids
As shown in Figure 5, after the query node

Bypassing the void region.
Calculating the next node
If the query message and partial query results reach a node
Energy consumption analysis
The energy of sensor nodes is mainly consumed by processing sensory data and transmitting data, and the energy consumed to transmit 1 bit of data is equivalent to the energy consumed to execute 800 computing instructions.
41
Therefore, the greater the traffic, the greater the energy consumption. In the process of data aggregation, the energy consumption of data transmission takes up a large proportion. Therefore, we mainly focuses on the energy consumed in the transmission of data. Suppose that each node has an average of
The energy consumed by the SMART and PECDA schemes includes the establishment of secure channels for data transmission, the transmission of query messages, and the query results. Assume that the proportion of leaf nodes in SMART and PECDA is
Theorem 1
Proof
E2PDA
where
SMART
where
PECDA
where
From equations (2), (5), and (8), it can be deduced that
From equations (3), (6), and (9), it can be deduced that
The conclusion can be drawn from equations (10) and (11)
Privacy analysis
The literature
39
contributes two important properties of key distribution: the probability of at least one identical key between any two neighbors is
In the SMART algorithm, when establishing a secure connection, node
Analysis of experimental results
In this section, the energy consumption of the E2PDA is analyzed through experiments. E2PDA, SMART, and PECDA are implemented in the simulator. 42 The experiments are conducted on a PC with a P4 2.6 GHz CPU and 8 GB memory running Ubuntu operating system. The sensor nodes are deployed in a rectangular monitoring area in a uniformly distributed and non-uniform distribution, respectively. And the experiment uses dataset provided by the University of Southern California ANRG Laboratory.
The experimental parameters were selected as follows. According to Rappaport,
43
the energy consumption formula of wireless communication transmission and reception of 1 byte is
Experimental parameters.
Figure 6 shows the effect of the number of network nodes on energy loss. When the number of network nodes is fixed, PECDA and SMART consume significantly more energy than E2PDA. There are a certain number of nodes. Under the same query area, the SMART algorithm needs to distribute the slicing data to the neighboring nodes in the data slicing stage, and the energy consumed in this process accounts for a large proportion of the total energy consumption. The PECDA algorithm conducts slicing transmission only of leaf node perception data, while the SMART algorithm requires data slicing transmission of all nodes in the network.

Influence of the number of network nodes on energy loss: (a) uniformly distributed network and (b) non-uniformly distributed network.
Figure 7 shows the energy consumption of E2PDA, SMART, and PECDA in different network topologies. As shown in the figure, the change in network topology has little effect on the energy loss of E2PDA. However, SMART and PECDA are susceptible to the network topology due to their dependence on pre-constructed topology trees. Different topology structures have different energy losses. In addition, according to the figure, E2PDA performs better than SMART and PECDA in energy consumption when the topology is certain.

Influence of different network topologies on energy loss: (a) uniformly distributed network and (b) non-uniformly distributed network.
Figure 8 shows the energy consumption performance of E2PDA, SMART, and PECDA in the case of query messages of different sizes. According to the figure, the energy loss of SMART and PECDA is greater than that of E2PDA. The energy consumed by the three algorithms in the process of sending a message is the same, and the energy consumption difference between the three algorithms is mainly derived from data slicing and distribution in SMART and PECDA.

Impact of the query message size on energy loss: (a) uniformly distributed network and (b) non-uniformly distributed network.
As shown in Figure 9, when the perceived data size is certain, the energy consumption of E2PDA is lower than that of SMART and PECDA. When the sensing data stored by the node increases, the energy consumption of SMART and PECDA in the data slicing and distribution phase occupies a certain proportion in the total energy consumption because SMART and PECDA include three stages: slicing, mixing, and aggregation. As the sensing data increase, the energy consumption of the data slicing and distribution also increases. Compared with SMART and PECDA, there is no need to divide and distribute the node-aware data during E2PDA data aggregation, which decreases the energy overhead.

Influence of the perceived data size on energy loss: (a) uniformly distributed network and (b) non-uniformly distributed network.
As shown in Figure 10, the energy consumption of the three algorithms increases with the increase in the query region, and when the query region is constant, the energy consumption of E2PDA is lower than that of SMART and PECDA. In a uniformly distributed sensor network, the query area becomes larger, the number of nodes to be aggregated increases, and the energy consumption increases as the number of nodes in the query area increases.

Influence of the query area size on energy loss: (a) uniformly distributed network and (b) non-uniformly distributed network.
Summary
We proposed an energy-efficient and privacy-preserving spatial range aggregation query processing algorithm in WSNs. It does not depend on the pre-constructed topology structure and collects the sensing data of the specified area while ensuring the data privacy of the sensor nodes. In the query area, if attacker wants to get the original data of a node
Footnotes
Handling Editor: Weizhi Meng
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research work was supported by the Aeronautical Science Foundation of China (grant no. 20165515001), the National Natural Science Foundation of China (grant no. 61402225), State Key Laboratory for smart grid protection and operation control Foundation, Science and Technology Funds from National State Grid Ltd. (The Research on Key Technologies of Distributed Parallel Database Storage and Processing based on Big Data).
