Abstract
In recent years, the big data emerged as a hot topic because of the rapid growth of the information and wireless communication technology. One of the significant sources of the big data is wireless sensor networks. Due to the power limitation of sensor nodes, energy-efficient big data collecting is a challenging task in wireless sensor networks. Being considered as an effective solution to address this challenge is to utilize sink’s mobility to assist data collecting. Although this method can reduce the volume of data transfer between sensor nodes and thus save energy consumption of nodes, the low speed of mobile sink hinders its use in data-intensive sensing applications with time constraint. In this article, we propose a four-phase mobility-assisted data collecting protocol consisting of network clustering, routes planning, routes combination, and data collecting. Two heuristic routes planning algorithms are presented to build a set of trajectories which satisfy the deadline constraint and have the minimum overall movement cost. Numeric results show that our approaches have better performance in terms of energy saving, latency, and movement cost.
Keywords
Introduction
Due to the rapid development of various information acquisition and wireless communication technology, the volume of data grows in an explosive way every day all over the world. Recently, a report published by IBM 1 indicates that 90% of the data in the world was generated in the previous 2 years. Therefore, the concept of big data has emerged as a hot research area and is attracting much attention from government, industry, and academic. The big data exhibits three remarkable characteristics which are high volume, high velocity, and high variety. 2 The variety means that the data generated by a wide range of sources such as radio frequency identification (RFID), Global Positioning System (GPS), sensors is of highly varied structures. The velocity refers to the high speed of processing and analysis. The volume refers to a large amount of data needs to be gathered, stored, and analyzed. Although services like social network, cloud storage, and Internet application are generating much volume of the big data, wireless sensor networks (WSNs) will produce a large portion of the big data, the volume of data generated by sensors is expected to reach the order of petabytes. WSNs have been well suited for use with a variety of applications including environmental monitoring, biological detection, smart spaces, and battlefield surveillance. Collecting the large volume of sensed data generated in these applications is crucial because the sensed information plays an important role on various human activities.3,4 In order to gather these data, a routing tree along which the sensors relay their data to the sink is constructed.5–7 Wireless transmission consumes significant amount of power of sensor nodes only equipped with limited battery. Even though the volume of data generated by an individual sensor is not significant, each sensor needs a lot of energy to relay the data generated by surrounding sensors. Therefore, energy-efficient method to collect big data in high-density large-scale WSNs is necessary and crucial.
Many data collection approaches aimed to reduce energy consumption have been proposed for WSNs recently. For example, data aggregation is one of the most effective approaches to reduce the total communication cost.8,9 However, in large-scale WSNs, the number of aggregated data is still significant and a huge volume of data is required to be transferred to the base station (BS); furthermore, the precision of aggregation should be traded off with energy saving. Another approach of energy-efficient data collection in WSNs is data prediction technique leveraging the spatial or temporal correlation between sensor data. The data compression technology is capable of reducing the volume of transmitted data. Although it is easy to be implemented, the data compression technology requires the nodes to be equipped with large storage and powerful computation. Recently, the mobile sink schemes have received great attention in literatures. In such schemes, ordinary sensor nodes are static and densely deployed in the sensing area. One or multiple data collectors, referred to as the sinks, are assumed to be capable of moving. As the sinks traveling around the sensing area, the sensors send their data to the sinks when the sinks appear in their proximity. In such way, significant network energy saving can be achieved by reducing the number of wireless transmissions. However, the primary disadvantage of this approach is the increased latency. For instance, the typical speed of several practical Mobile Sink systems is about
In this article, we are concerned with the problem of energy-efficient method to collect big data in mobility-assisted WSNs with delay requirements. We propose a four-phase mobile data collecting protocol called mobility-assisted big data collecting protocol (MDCP). It consists of network clustering, route planning, route combination, and data collecting. In order to minimize the network energy consumption, the mobile sinks are required to visit all the source nodes along constraint routes. This problem is related to the vehicle routing and scheduling problem with time window (VRSPTW) which falls into the category of non-deterministic polynomial (NP)-hard problem. Even finding a feasible solution to the VRSPTW with fixed number of vehicles is NP-complete. Therefore, we propose two heuristic algorithms to solve the route planning problem. The first heuristic algorithm builds routes by adding a link to the partially formed routes between two end nodes based on a measure of cost savings. The second heuristic algorithm belongs to the class of sequential route building algorithm. After initializing the current route, the heuristic sequentially adds a new node to the current partial route guided by a cost metric accounting for both geographic and temporal closeness of nodes. If no feasible nodes could be added, a new route will be started from the BS. In large-scale sensor network, the heuristic algorithms will build a large number of distinct routes for mobile elements (MEs) in order to visit all sensors within given deadline. In order to trade off between energy saving and movement cost, we first divide the source nodes into clusters and all nodes within a single cluster are fully connected. The MEs only collect data from cluster head (CH). Figure 1 shows an example of sensor nodes distribution, nodes clustering, and two optimal tours for MEs. We analyze the impact of time constraint, network scale, and movement speed on energy consumption through extensive simulations. We also compare the performance of two heuristic algorithms with rendezvous-based data collection (RDC) protocol and traditional data collection method without using MEs (NET). Simulation results show that our algorithms outperform RDC and NET in terms of energy saving, movement cost, and deadline misses.

Example for route combination.
The rest of this article is organized as follows. Section “Related works” reviews related work and compares different ME-based data collection approaches. Section “Introduction” introduces our assumptions and formulates the problem studied. The two heuristic algorithms are given in section “MDCP.” Simulation results of the proposed algorithm are discussed in section “Performance evaluation.” Finally, section “Conclusion and future work” draws the conclusion of this article.
Related works
Recent work has provided mobility-assisted method to WSN aiming at enhancing the connectivity of sparse ad hoc networks and reducing the energy consumption of WSN. Francesco and Das 10 present a comprehensive literature survey on WSNs with MEs.
Many solutions available in the literature have addressed the design of a static trajectory for MEs. For instance, Luo and Hubaux 11 consider a single ME which collects data from a circular dense WSN. It is showed that the optimal path of MEs is the perimeter of the sensing field. However, the average network energy consumption is high as nodes must communicate with the mobile sinks through multi-hop routes. Several heuristics are proposed by Gu and colleagues12,13 to schedule the movement of MEs such that the source nodes can be visited before buffer overflow. While this approach minimizes energy consumption by avoiding multi-hop wireless transmissions, it incurs high latency when collecting data from large sensing field due to slow speed of ME. Xing et al. 14 present an RDC approach jointly considering the motion control of ME. Several mobility patterns are considered, depending on whether the ME can move freely or not.
Data mule scheduling (DMS) is presented in Sugihara and Gupta 15 for sparse WSNs with only single-hop communications and then extended in Sugihara and Gupta 16 for a hybrid multi-hop communication scheme. The network-assisted data collection (NADC) problem is considered in Rao and Biswas 17 as an extension of network-assisted sink navigation (NAN). In this case, a hop-bound factor k can be specified, so that data can be forwarded up to k hops. Abdullah et al. 18 present a connectivity-based data collection (CBDC) algorithm. The CBDC algorithm utilizes the connectivity between sensor nodes so as to determine the trajectory of the mobile sink while satisfying its path constraint and minimizing the number of multi-hop communications. We proposed a genetic algorithm to determine the optimal travel routes for mobile sinks; it is not suitable for big data collection in large-scale sensor networks. 19
MDCP
In this section, we first outline the MDCP in large-scale WSNs. After that, we introduce the network clustering and two route planning algorithms in detail.
Overview of MDCP
In order to energy efficiently collect huge volume of data in large-scale WSNs, we propose an MDCP. We assume that the network consists of many static ordinary sensor nodes and several mobile sink nodes. The static sensors sample the measurements of surroundings periodically and store the sensed data locally. The mobile sink nodes can move around the network to collect sensor data from static sensor nodes. MDCP consists of four phases: clustering, route planning, route combination, and data collecting. Initially, the static sensor nodes in network are grouped into many clusters and then route planning algorithm generates optimal trajectories for mobile sinks. If the number of routes beyond the number limit of mobile sinks, route combination is exploited to reduce the number of routes. At last, mobile sinks move along their trajectories to collect data.
Network clustering
Network clustering is the first phase of MDCP. As the movement speed of mobile sink is relatively much lower than wireless transmission, it is impossible for mobile sink to visit all the static nodes within a given deadline. Therefore, network clustering phase of MDCP aims at dividing the network into clusters and deciding the data collection points (CHs). Sensors in the same cluster transfer their data to their CH. The CHs store the aggregated data in their buffer memory until mobile sinks approach them. The information is transferred to mobile sink by wireless communication.
We can exploit any of the existing clustering algorithms like low-energy adaptive clustering hierarchy (LEACH). Sensors are grouped into clusters based on their connectivity to each other. Every node can communicate with all the other nodes in the same cluster via one-hop communication. Let
Route planning problem formulation
Since random walk cannot guarantee the deadline requirements for data collecting, route planning for mobile sink is necessary and important in MDCP. In this section, we formulate the problem of route planning for big data collection in mobility-assisted WSN. This problem involves the building of a set of optimal routes for mobile sinks, each route originates and terminates at the BS. Before we formally give the mathematical formulation of this problem, let us first state the assumptions and introduce the concepts that will be needed throughout this article.
We assume that sensor nodes are randomly and uniformly distributed in the sensing field. Each node periodically generates a chunk of data and must transfer the data to the BS before its deadline. The delivery deadline is often determined by the data freshness or the buffer size of sensors. Mobile sinks move along their specific routes and collect data from sensors on the routes. Each sensor needs to be visited by mobile sink exactly once and all nodes in network must be assigned to specific mobile sink. Sensor nodes and sinks are aware of their physical locations by GPS.
Definition 1
An undirected graph
Definition 2
For convenience, let us define the following notions and variables:
L the time constraint for all routes;
R a sequence of
We now formally formulate our problem as follows.
Let G be a given undirected graph corresponding to the ME route planning problem. Vertex 0 corresponds to the BS and vertex i corresponds to node i for
Then, the route planning problem can be formulated as the following optimization problem
where
In order to minimize the network energy consumption, the mobile sinks are required to visit all the data collection nodes (CHs) along constraint routes. This problem is related to the VRSPTW which is NP-hard problem. Even finding a feasible solution to the VRSPTW with fixed number of vehicles is NP-complete. We propose two heuristic algorithms to solve the route planning problem in the following two sections.
Cost-saving algorithm
The first heuristic algorithm builds the optimal routes by a route combination procedure. This procedure begins with n distinct partial routes in which each sensor is visited by a dedicated ME. Then, two partial routes with common end node are combined by adding an edge between them. The combination is guided by a measure of cost saving given by the following formula
As shown in Figure 1, node 0 represents the BS, node 1 to node 4 is the ordinary static sensor node in network. The Saving Heuristic algorithm first constructs four initial routes from BS to each node S = {R1, R2, R3, R4}. After that, the algorithm computes
In order to guarantee the time constraint of each route, we must consider the route direction when we combine two partial routes. Two partial routes with end node i and j, respectively, have compatible directions if node i is the first node on one partial route and node j is the last node on another partial route or on the opposite. We only add link to the compatible partial routes.
Furthermore, we must check time constraint in the heuristic algorithm. Let us examine the necessary and sufficient conditions for time feasibility when inserting a node x between node
We compute the difference between these two start times as follows
If
Lemma 1
The necessary and sufficient conditions for time feasibility when inserting a node x between node
Here, L is the time constraint of all routes.
The saving heuristic is described in the following algorithm.
Nearest-neighbor algorithm
The nearest-neighbor heuristic starts every route by finding the unrouted node closest to the BS. At every subsequent iteration, the heuristic searches for the node closest to the last node added to the route. This search is performed among all nodes which can be added to the end of the partial route without violating the time constraint. If the search fails, a new route is started at any time, unless there are no more nodes to schedule.
The metric used in this method tries to account for both geographical and temporal closeness of nodes. Assume the last node on the current partial route is i and j denotes any unrouted node,
The nearest-neighbor heuristic is described in the following algorithm.
Performance evaluation
In this section, we evaluate the performance of MDCP protocol.
Experimental setup
In this simulation, sensor nodes are randomly and uniformly deployed within 800 × 800 m2 area. The BS is located at the top left corner of the region. The movement speed of mobile sink varies from 0.2 to 1 m/s. In this experiment, energy consumption is considered for data transmission and reception only; while during other time periods, sensor nodes are supposed to be in the sleep mode for which no energy consumption is assumed. A source generates and stores a data sample of 2 bytes every second. It sends all the accumulated data to the BS every 20 min (i.e. the deadline is 20 min). Thus, the route length L must not be longer than 1200 m.
We compare the performance of the proposed heuristic algorithm referred to as cost-save based algorithm (SAV) and nearest-neighbor based algorithm (NN), respectively, with RDC and NET algorithm. RDC 9 is a rendezvous-based data collection protocol that facilitates reliable data transfers from the network to mobile sinks. NET represents the network collecting data through multi-hop transmission without any mobile sinks.
Scalability of sensor nodes
An important factor that should be considered while designing a sensor network is the scalability, which measures how efficient a WSN is for small and large numbers of sensor nodes. The scalability of our algorithm is evaluated by testing the algorithm at varying number of sensor nodes (from 200 to 1000 nodes). The simulation is repeated for MEs of different deadline constraints (i.e. L = 800, 1000, and 1200). We also compare the energy consumption of our heuristic algorithms with NET and RDC when varying the number of sensor nodes. The main difference between our algorithm and RDC is that we minimize the total cost (distance, time) of the MEs’ tours which is not considered in RDC. Therefore, we compare the ME cost for these two algorithms by varying the number of nodes. The results are shown in Figures 2–4, respectively.

The impact of number of nodes on energy consumption for different deadline constraints.

The impact of number of nodes on energy consumption for different algorithms.

The impact of number of nodes on ME’s cost for different algorithms.
When the number of sensor nodes increased, larger number of clusters is formulated. This, in turn, leads to increase in the number of multi-hop nodes and therefore the network energy consumption increased as shown in Figure 2. It is also clear from this figure that the increase in the ME path constraint leads to less energy consumption of sensor nodes. This is due to the reason that increasing the path constraint allowed the MEs to travel larger area of the sensor field and hence reduced the number of multi-hop sensor nodes.
Figure 3 shows that all algorithm yield larger energy consumption when increasing the number of nodes. The NET algorithm incurred the most energy consumption because the multi-hop transmissions increase a lot with the larger number of nodes. Since our algorithms first group all the sensor nodes into clusters, the number of multi-hop nodes increases slowly compared with RDC algorithm. Therefore, our algorithm has the least energy consumption. The nearest-neighbor heuristic consumes less energy than saving heuristic a little because NN considers the distance from BS.
From Figure 4 we can see that the movement cost (total distance) varies with the number of nodes. In order to cover more nodes, the MEs must travel a longer path which leads to the increase in ME’s cost. The goal of our algorithm is to minimize the total cost of mobile sink’s routes within deadline constraint and the genetic algorithm presented in this article can obtain a least-cost feasible solution. In contrast, RDC protocol did not take into consideration the cost of mobile sink.
Impact of movement speed
In this section, we evaluate the network energy consumption when the speed of mobile sinks varies from 0.1 to 2 m/s. Only radio transmission energy is counted in the simulations. Figure 5 shows that all algorithms yield lower energy consumption when the mobile sinks move faster except NET. This is because the mobile sinks are able to collect data on a longer tour within the deadline. NET collects data through multi-hop routing without using MEs. The results of this simulation show that our algorithms can effectively take advantage of speed increase of the mobile sink.

The impact of movement speed on energy consumption for different algorithms.
Impact of number of mobile sink
In this section, we evaluate the performance in terms of energy consumption and percentage of deadline misses by varying the mobile sink number constraint. From Figure 6 we can see that the network energy consumption decreases when the maximum ME number increases. This is because the more the ME, the less the multi-hop transmissions. Figure 7 shows that the percentage of deadline misses decreases when the mobile sink number increases. This is because the more the mobile sink, the more the sensor nodes would be covered within deadline. The NET algorithm does not use any ME; therefore, the deadline misses would not be affected by the number of mobile sinks.

The impact of maximum number of MEs on energy consumption for different deadline.

The impact of maximum number of MEs on percentage of deadline misses for different algorithms.
Conclusion and future work
In this article, we study the challenging problem of energy-efficient big data collection in WSNs. Although the mobile sink approaches can significantly reduce energy consumption of sensor nodes and increase the network lifetime, they increase the delay of response. This makes the mobile sink schemes not suitable for real-time and emergency applications. To address this challenge, we propose a four-phase mobility-assisted data collecting protocol called MDCP. It first divides the network into connected clusters, and the mobile sinks only collect data from CH nodes. Then, it builds a set of optimal travel routes for mobile sinks by running two heuristic algorithms presented in this article. To meet the number limitation of mobile sinks, a route combination procedure is conducted in MDCP. Extensive simulation results show that MDCP can provide substantial energy saving and incur the least movement cost compared with traditional multi-hop transmission data collecting protocols and the existing mobility-assisted data collecting protocols.
In this article, we assume that sensor nodes have infinite storage memory and the same data generation rate. In future work, we will consider the more practical scenario that sensor nodes have buffer limitation and different buffer overflow rate.
Footnotes
Academic Editor: Donghyun Kim
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported, in part, by the National Science Foundation for Young Scholars of China (61100048, 61102105, and 81273649) and the Natural Science Foundation of Heilongjiang Province (F2016034).
