Abstract
Data aggregation is a useful technology that can decrease the communication bandwidth cost in the process of data gathering in VANETs. However, data aggregation may lose some data accuracy. Current data aggregation schemes in VANETs only consider saving bandwidth cost while ignoring the application requirement, which may result in the inaccuracy of aggregated data for dynamic routing application. Therefore, we propose a framework in which application demands are considered in the process of data aggregation to ensure the accuracy of aggregated data for dynamic routing application. The framework consists of three parts: extracting QoI constraints of aggregated data, distributing the QoI-based data gathering queries, routing and aggregating data with QoI constraints. First, we propose average allocation method to handle the demand of single user and utilize convex optimization to handle multiuser demands. Then, we distribute the QUERY message with QoI constraints in the interested area. Last, we propose QoI-DG protocol to do two kinds of data aggregation operation, namely, AVERAGE aggregation and HISTOGRAM aggregation. Simulation experiments show that our proposed method can increase about 20 percent in the collected data rate and save 15 percent communication bandwidth cost in the process of data gathering in VANETs.
1. Introduction
Over the last decades, the technology of vehicular ad hoc networks (VANETs) [1] has developed rapidly. VANETs have a broad application prospect in intelligent transportation system (ITS) [2]. The application scene in VANETs is shown in Figure 1. There are two kinds of wireless communication equipment in VANETs, vehicle nodes on the roads and access points (APs) fixed on the roadsides. In VANETs, one vehicle node communicates with another vehicle node through V2V [3] communication mode, while it utilizes V2I [4] communication mode to communicate with APs. Many sorts of sensors are installed at the vehicles to get information about the environment of the drivers, such as velocity, GPS, acceleration and available parking place. Then such information is gathered to APs through data transmission and aggregation. APs integrate such information to provide diversified applications (real-time navigation, safe warning of intersections, discovery of free parking places, etc.) [5] in order to make driving safer, more efficient, and comfortable. As shown in Figure 1, there are three applications, which are dynamic routing navigation application, accident warning, and request for free parking place. The data sensed by vehicle nodes is always streamy and large, so decreasing the bandwidth consumption in the process of data gathering is one key research point.

VANET application scene.
Data aggregation techniques can effectively decrease the amount of data transmission. However, they will also make the data accuracy decrease and the latency added. The accuracy and the latency of aggregated data are called the quality of information (QoI). The QoI of aggregated data in VANETs has a direct effect on the performance of routing guidance application. Aggregated data with high QoI can make estimated travel time more accurate, and then the result provided by dynamic routing guidance application is better. At the same time, the higher the QoI of aggregated data can be reached, the more wireless communication bandwidth in the process of data gathering need to be costed. Susceptible situations in VANETs (communication disruption, the mobility of vehicles, etc.) make the available bandwidth in VANET limited. Therefore, data gathering protocol in VANETs should cost wireless communication bandwidth as little as possible under the premise of satisfying the requirements of routing guidance application proposed by the users. Thus, our basic idea is that the QoI requirements of routing guidance applications should be considered in the process of data gathering while communication bandwidth consumption can be decreased as much as possible.
The contributions of this paper are summarized as follows.
We propose a framework, which considers the application demands in the process of data aggregation in VANETs. QoI constraints of aggregated data are extracted from the requirement of routing guidance raised by the users using convex optimization. The strategy of distributing the QoI-based data gathering queries is proposed, and the request message with QoI requirements is designed. QoI-DG protocol is proposed to gather data in the interest area to the requesting AP through routing and aggregating data with QoI constraints. We conduct extensive experimental study. Simulation experiments show that our proposed method can increase about 20 percent in the collected data rate and save 15 percent communication bandwidth cost in the process of data gathering in VANETs.
The rest of the paper is organized as follows. Related works are reviewed and analyzed in Section 2. Section 3 describes the system model. Section 4 describes the overview of QoI-based data gathering and routing guidance in VANETs. Extracting QoI constraints of data gathering from the requirements of routing guidance application is presented in Section 5. QoI-based query propagation is stated in Section 6. In Section 7, QoI-DG protocol is introduced in detail. Section 8 contains the results and the analysis of an experimental study on QoI-based data gathering and routing guidance in VANETs. Finally, in Section 9, conclusions are given and some future researches are outlined.
2. Related Works
Data gathering is one of the most important issues in VANETs which gathers a mass of real-time data used for ITS applications. Many researches and implementation efforts [6–14] have been involved in the process of data gathering.
The research about data gathering in VANETs focuses on two main aspects: routing-related and aggregation-related aspects. The routing-related research considers how to design routing protocols, which decide when and where which vehicle nodes broadcast information to whom. The other research focuses on data compression and aggregation techniques, which mainly consider how to decrease the amount of data transmission.
Great efforts have been made in data routing protocol in the process of data gathering in VANETs. According to the communication mode, data routing in the process of data gathering can classify three sorts, which are routing only through V2I communication, only V2V communication, and both V2I and V2V communication. Data collection in [15] only utilizes the communication between vehicle nodes and roadside units, so it does not effectively use the communication between vehicle nodes. The approaches given by [11, 16, 17] are based on the idea of clustering. The nodes in the cluster send data to cluster heads, and then cluster heads utilize V2I communication to forward all data to APs. Soua et al. proposed ADOPEL protocol based on a distributed Q learning techniqueto make the collecting operation more reactive to nodes mobility and topology changes [18]. Yuan et al. proposed an infrastructure-free data aggregation algorithm by restricting forwarders to limit the number of forwarders in VANETs [19]. Rangari et al. Summarized forwarding control of data aggregation in VANETs [6].
Data aggregation in the process of data gathering can effectively decrease bandwidth consumption. At the same time, it may cause the decrease of data accuracy and the delay. References [20, 21] chose to do data compression that is a no-loss accuracy aggregation. According to the difference of the data and the residual space of transmitted packet the aggregation decision in [10] was made. ParkNet and Lochert et al. aggregated the information of the free parking places [9, 22]. Fuzzy Aggregation utilized fuzzy logic reasoning to make aggregation decision about the data in the adaptive zone partition [23]. The longer data is retained in the node, the more opportunity data can be aggregated. However it also leads to the delay increase. References [24–26] considered the tradeoff between the delay and the data amount decrease. Catch-up applied distributed MDP model to control the detention time of the message [25, 26]. The idea in [24] is that transmitted data should be carried as much as possible until the delay might exceed the deadline requested by the user. Compressive sensing is introduced to data gathering in VANETs [12, 27]. We surveyed existing data aggregation protocols in VANETs in Table 1.
Summary of data aggregation protocols in VANETs.
To sum up, existing research about data gathering did not proceed from the requirements of applications. Therefore, they did not optimize communication bandwidth consumption under the premise of satisfying the data accuracy demands of the users. It incurs either that QoI of aggregated data is higher than the requirements and communication bandwidth is wasted or that QoI of data collected is lower than the requirement and QoS of the application is greatly affected.
3. The System Model
In this section, we introduce the system model applied in the process of data gathering with QoI guarantee in VANETs. The model includes three parts, basic assumptions, details about guidance routing application, and details about data aggregation.
3.1. Basic Assumption
Assume the system is a time-slotted and synchronized system. It means all the nodes in the system have the synchronized clock. We model urban road map as a graph. Graph
Next, we introduce the definition of the accuracy requirement of estimated travel time of road r.
Definition 1.
The accuracy requirement of estimated travel time of road
It means that the requirement is that the possibility that the error of estimated travel time
On MAC layer we consider one-hop interference model, which is that links within one hop of each other cannot be scheduled to transmit at the same time. A node can either send or receive data at a time and it can receive a data packet correctly when it hears only this packet at that time slot. If a node hears two or more messages at the same time, it cannot receive any of them correctly due to the interference.
3.2. Routing Guidance Application
In the urban routing guidance application, many users raise navigation requests, and routing guidance system (GRS) utilizes real-time traffic information to get responses to requests from the users. Generally, the request of GRS only includes the place of departure and the destination. However, it is not enough. Usually we need a path along which we can reach the destination in the expected time. So we need GRS that can inquire requests with QoS requirements. At first, we define guidance routing request with QoS demand, which describes the degree of accuracy of travel time provided by shortest time navigation algorithm.
Definition 2.
Guidance routing request with QoS demand
GRS receives automotive navigation request with QoS demand; then, it will reply the query. So we introduce the definition of guidance routing response.
Definition 3.
Guidance routing response
There are many users who will raise the routing guidance requests with QoS demand to GRS. We denote all guidance routing requests with QoS demands raised by many users by the request set
3.3. Data Aggregation
Data aggregation can effectively decrease communication bandwidth cost in the process of data gathering, but it also makes aggregated data imprecise. Data impreciseness is measured by the quantitative difference between an approximate value and the exact value. The application specifies the precision constraint of data aggregation by an upper bound ε on the data impreciseness (called the error bound). That is, on receiving an aggregate data value
We assume “perfect” aggregation. It means that intermediate nodes can gather data from predecessors and aggregate all data into a single packet. The smaller the error bound need to be, the more accurate collected data need to be, and the higher the bandwidth cost will be. We model the cost of data aggregation with error bound of ε as function It is a decreasing function. When error bound ε is 0, the function value is a maximum value. The function value depends on the data distribution.
Assume the velocity is distributed normally
4. Overview of QoI-Based Data Gathering and Routing Guidance
While QoI constraints are used to guide the process of data gathering, extracting QoI constraints from the requirements is the basic step. The next step is to create QoI-based data gathering query and distribute it in the interested area. At last, data is gathered by QoI-DG protocol to the requesting AP.
In the first step, we need to extract the QoI bound of data aggregation from the set
Figure 2 presents the overview of collecting the travel time information and routing guidance application in VANET environment. The requesting APs need to extract QoI constraints of data aggregation from the requirements of the users, distribute the queries with QoI constraints, collect aggregated data to estimate the travel times of roads, and calculate the shortest time routing guidance. Vehicle nodes get the QUERY message, distribute it to other nodes, aggregate data with QoI constraints, and route it to APs. The pseudocode of QoI-based data gathering and routing guidance is followed as in Algorithm 1.
(1) Extracting QoI bound of estimated road travel time from the request set RS on the requesting AP (2) Extract the amount of data collected (3) Create QoI-based QUERY message on the requesting AP (4) Send QoI-based QUERY message from the requesting AP to the vehicle nodes in the interested area (5) Distribute QoI-based QUERY message between the vehicle nodes in the interested area (6) QoI-based AVERAGE/HISTOGRAM data aggregation from (7) Choose the forwarding strategy CARRY or FORWARD on the vehicle nodes (8) Complete wireless channel between the vehicle nodes to forward the data on the vehicle nodes (9) Collect aggregated data on the requesting AP (10) Run Routing Guidance Application to get the shortest time path (11)

The framework of QoI-based data gathering and routing guidance.
5. Extracting QoI Constraints of Data Gathering
The data sensed by the vehicles is utilized to run the applications, and it will be tackled through three stages. The first step is data gathering, in which data is routed and aggregated to AP. The next step is road travel time estimating, whose input aggregated data from vehicle nodes so that estimated travel time has deviation. The last step is routing guidance algorithm that utilizes estimated road travel time to get the best routing path with travel time. In the whole process, data gathering is the basic. The QoI of collected data has a direct impact on the performance of guidance routing. Therefore, QoI is an important factor that should be considered in the process of data gathering. QoI constraints of data gathering should be extracted from the requirement of the users. Extracting the QoI constraints is divided into two steps, one of which is to extract QoI bound of estimated road travel time from the requirements of the users and the other is to extract error bounds of data aggregation from QoI bound of estimated road travel time.
5.1. Extracting QoI Bound of Estimated Road Travel Time
We consider two situations: one is to handle the requirement by the single user and the other is to handle the requirement set of many users. The method of extracting from the requirement of the single user applies equipartition idea. To solve the problem of extracting from the requirements of multiusers, we formalize it as convex optimization problem, denoted by ERs problem. Through the solution of convex optimization it is solved.
5.1.1. Extracting the Requirement of the Single User
The user inputs guidance routing request with QoS demand
5.1.2. Extracting from Multiuser Requirements
In the guidance routing system many users will raise the queries. Assume the error bound of estimated travel time of road r is
ERs problem is as follows:
Theorem 4.
ERs problem is convex optimization.
Proof.
We first prove the feasible zone is convex set.
Set
So the feasible set D constrained by formula (10) is convex set. Now we prove optimization function is a convex function.
Consider optimization function
So the feasible set and optimization function of ERs problem are convex. ERs problem is convex problem.
For a convex optimization problem, we can use Lagrange dual from [29] to solve it.
Assume the possibility of satisfying the error bound of road r is
5.2. Extracting the Error Bound of Data Aggregation
The error bound of data aggregation
According to Definition 1, we can get (15) and use (13) to deduce (16). Then, we can get (17), and it is independent and identically distributed, so we can calculate error bound
6. Distributing the QoI-Based Queries
In this section, we use QoI constraints to create QUERY message and distribute it in the interested area. At first, requesting AP creates QoI-based data gathering query. QUERY message includes the information of the sponsor of the query, the creation time and life time of the query, the interested area, and QoI bound. Table 1 describes the structure of QoI-based data gathering QUERY message. The key items in QUERY message are information about QoI constraints which are the amount of data collected and error bound of aggregated data.
The structure of QUERY message.
After creating QUERY message, requesting AP distributes this QUERY message to the vehicles in the interested area. Query propagation deals with the diffusion of the query message from the requesting AP to the interest region. Until now there are many researches on broadcasting protocol in VANETs [24, 30]. But these researches are different from our work, in which response nodes are chosen during query propagation in our work. Our query diffusion need consider how to choose partial nodes to respond to this query.
The process of QUERY message diffusion is as follows. AP broadcasts a QoI-based data gathering QUERY message. When a vehicle node receives a QUERY message, it stores the query only if it is in the interested area. Then, it prepares to schedule the dispatches of QUERY message. It checks the neighbor list, the information which is got by beacon messages in cycle times. A random number between 0 and 1 is generated for each node in its neighbor list, and if this number is above and beyond the value of P, which is a constant parameter, put this node ID into set S and decrease the value of
7. Gathering the Data with QoI Constraints
The crucial task of collecting the data and carrying them toward the AP is managed by our QoI-based data gathering protocol (QoI-DG). The protocol need solve three questions as follows.
Which part of data can be aggregated? When and how data aggregation is done? When does the node forward data, retain data, and discard data? Which node should broadcast when several nodes in the conflict range all want to broadcast data?
To solve these problems, QoI-DG protocol should include three parts: QoI-based data aggregation, strategy selection, and wireless communication schedule. QoI-based data aggregation is to maximize the amount of decreasing the data communication cost with QoI constraints of aggregated data. In QoI-DG protocol, data aggregation algorithm on node is based on the dynamic programming idea to choose optimal data to aggregate. Strategy selection points out that whether retain strategy or forward strategy the node should adapt according to the current state of the node. The main idea of strategy selection in QoI-DG protocol is that data is retained until the delay is close to the bound. Wireless communication schedule in QoI-DG protocol provides a method based on priority that deals with communication conflicts between vehicle nodes.
7.1. QoI-Based Data Aggregation
There are many kinds of data aggregation operation, such as AVERAGE, HISTOGRAM, SUM, and COUNT. We consider the real-time traffic condition, so we choose to consider two kinds of aggregation operations to gather the velocity data, one of which is AVERAGE aggregation and the other is HISTOGRAM aggregation.
7.1.1. AVERAGE Aggregation
We analyze the problem of AVERAGE aggregation on every vehicle node. It defined QoI-based data aggregation on every node as AVERAGEAgg problem.
AVERAGEAgg problem: assume data set is
To solve AVERAGEAgg problem, we propose a dynamic programming algorithm. We define
Analyzing QoI-based AVERAGE aggregation, we get
We use (19) to construct the dynamic programming matrix. The pseudocode of this algorithm is as shown in Algorithm 2.
(1) (2) (3) Calculate (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16)
7.1.2. HISTOGRAM Aggregation
For HISTOGRAM aggregation operation, we remark QoI-based HISTOGRAM data aggregation problem as HISTOGRAMAgg problem. In the HISTOGRAM aggregation, the velocity data interval is divided by error bound ε; then, the data is put into the related interval. The data in one interval is aggregated data. HISTOGRAM aggregation need consider the partition of the data zone and the classification of the data. The pseudocode of QoI-based HISTOGRAM aggregation algorithm is shown in Algorithm 3.
(1) PartitionDataZone(MinD, MaxD, ε) (2) (3) temp = PutDIntoDataInterval( (4) B[temp] = B[temp] ∪ (5)
7.2. Strategy Selection
Vehicle nodes have three choices to deal with data, which is to forward, retain, or discard the data. Carrying the data is convenient because it can reduce bandwidth consumption while the node itself brings the data closer to the requesting AP. Moreover when a vehicle is carrying data, it can receive and aggregate data from other vehicles. The vehicle needs to decide whether to retain the data they are carrying or to forward the data to another node on which aggregation can be done to save more bandwidth cost.
It depends on the direction and value of the velocity of the node and its neighbors whether the node is fit for forwarding data. If the moving direction of the node is away from the requesting AP, the node must choose to forward the data. If there is HELP node in its neighbors, the node also should forward data. We define HELP node as follows.
Definition 5.
HELP node: node j is HELP node of node i, if and only if it satisfies the conditions as follows.
The moving direction of node j is towards AP.
The information of node speed and moving direction is got by BEACON message.
Whether the node discards the data depends on available storage space. When the proportion of available storage space in entire storage space is less than a constant parameter f (set f is 20%), the node will aggregate the stored data. If there is no free storage space when node receives new data, the data with the oldest time stamp is discarded.
7.3. Wireless Communication Schedule
When several nodes in the communication range all need to forward data, only one node in every time slot can utilize wireless channel. It depends on the priority of the node which node can utilize the wireless channel. The node exchanges the priority of itself with other nodes. The node with highest priority broadcasts the data, while other neighbor nodes are ready to receive the data. The priority is determined by the distinctive feature of data set on the node. We can get the bit vector that identifies which data the node has through BEACON message. We define the repetition frequency of data k, denoted by
Definition 6.
Repetition frequency of data k: assume the neighbor node set of node i is
Definition 7.
Distinctive feature of data set S on the node i,
8. Simulation Results and Analysis
In this section, we compare QoI-DG protocol with DB-VDG protocol [24] from the aspects of effectiveness of data gathering, bandwidth consumption, and the rate of data collection. DB-VDG protocol was proposed by Palazzi et al. and it was a protocol of data gathering with deadline constraint on wireless mobile sensor networks. This protocol has two strategies: SBSS and DBSS. We use SBSS in the DB-VDG protocol, because SBSS is better than DBSS.
8.1. Experimental Setup
We use Singapore expressway mentioned in [31] as the scene of experiment. The Singapore expressway has 11 intersections and 28 links (both directions). The structure of the expressway is shown in Figure 3.

Singapore expressway.
We use traffic simulation software “Sumo” to generate the traces of vehicles. In the experiment we apply discrete time simulator. The nodes are synchronized. The communication model is disc model that when the distance between one node and the other is less than 100 m, they can communicate.
The requirements of guidance routing are generated as follows.
Enumerate all intersect pairs Calculate the time Randomly choose
So we get the requirement set
Simulation parameters.
8.2. Analysis of Results
In order to analyze the performance of QoI-DG protocol, we introduce two different metrics which measure the effectiveness and efficiency of the protocol.
Effectiveness measures the ability of QoI-DG to gather enough data under error bound in order to provide application service with enough quality. We use the percent of guidance routing with QoS satisfied to measure it. Efficiency is an index that measures the level of bandwidth optimization. We use the number of all DATA messages to measure it.
Firstly, we analyze the effectiveness of QoI-DG protocol. As shown in Figure 4, QoI-DG protocol guarantees that the percent of the result of guidance routing with QoS satisfied changes with the QoS requirement. Regarding the effectiveness, QoI-DG protocol can almost reach the requirement proposed by the user. From Figure 4, we can find out the proportion of the result with QoS satisfied increases with the constant parameters

Comparisons on QoS requirements and the results of QoI-based routing guidance.
Then we compare the efficiency of the protocol with DB-VDG protocol. From Figure 5, it is clear to demonstrate that QoI-DG protocol need less communication bandwidth cost. As the time increases, the proportion of deduced bandwidth consumption is rapidly increasing. Comparing AVERAGE aggregation and HISTOGRAM aggregation, we can find out that the two kinds of data aggregation save more or less bandwidth consumption.

Comparisons on communication bandwidth consumption.
Another aspect of the protocol assessed is the rate of data collection. Figure 6 shows the proportion of data collected in the whole required data increases by the time. We can find out that the rate in the whole process of data gathering is relatively stable. Moreover, the rate is always higher than DB-VDG protocol.

Comparisons on the rate of data collection.
9. Conclusion and Future Works
This paper analyzes the disadvantage of current research on data gathering in VANETs and points out that data gathering should be guided by the requirement of ITS applications. Then, we design the framework of data gathering with QoI constraints. We divide it into three steps, which are extracting of QoI bound, query propagation, and QoI-based data routing and aggregation. The main mechanism of our solution is to utilize QoI bound to control the optimization of data aggregation and routing. The core work is that we propose QoI-DG protocol which can use least communication bandwidth consumption under the premise of satisfying the requirement of application provided by the users. Simulation experiments are carried out to prove the effectiveness and efficiency of QoI-DG protocol in the aspect of both satisfying user demands and saving bandwidth consumption.
In this paper we focus on one kind of ITS applications, dynamic routing guidance. For future work, it would be interesting to add other ITS applications into it. That will mainly complicate the extraction of QoI fit for all demands. Furthermore, the delay will be one key aspect that need to be researched.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (NSFC) Grant funded by the Chinese government (no. 61370214 and no. 60803148).
