Data Aggregation and Routing Guidance with QoS Guarantee in VANETs

Abstract

Data aggregation is a useful technique for reducing the communication bandwidth cost of data gathering in VANETs. However, data aggregation may lose some data accuracy. Current data aggregation schemes in VANETs consider only bandwidth savings and ignore application requirements, which may leave the aggregated data too inaccurate for dynamic routing applications. We therefore propose a framework in which application demands are considered during data aggregation, so that the aggregated data used for dynamic routing are accurate enough. The framework consists of three parts: extracting quality-of-information (QoI) constraints on the aggregated data, distributing QoI-based data gathering queries, and routing and aggregating data under the QoI constraints. First, we propose an average allocation method to handle the demand of a single user and use convex optimization to handle the demands of multiple users. Then, we distribute the QUERY message carrying the QoI constraints in the area of interest. Finally, we propose the QoI-DG protocol, which performs two kinds of data aggregation operations, namely, AVERAGE aggregation and HISTOGRAM aggregation. Simulation experiments show that our method increases the collected data rate by about 20 percent and saves 15 percent of the communication bandwidth cost of data gathering in VANETs.

1. Introduction

Over the last decades, vehicular ad hoc network (VANET) technology [1] has developed rapidly, and VANETs have broad application prospects in intelligent transportation systems (ITS) [2]. A typical VANET application scenario is shown in Figure 1. VANETs contain two kinds of wireless communication equipment: vehicle nodes on the roads and access points (APs) fixed at the roadsides. A vehicle node communicates with other vehicle nodes in vehicle-to-vehicle (V2V) mode [3] and with APs in vehicle-to-infrastructure (V2I) mode [4]. Many kinds of sensors are installed on vehicles to capture information about the driving environment, such as velocity, GPS position, acceleration, and available parking places. This information is gathered to the APs through data transmission and aggregation. The APs integrate it to provide diverse applications (real-time navigation, safety warnings at intersections, discovery of free parking places, etc.) [5] that make driving safer, more efficient, and more comfortable. Figure 1 shows three such applications: dynamic routing navigation, accident warning, and requests for free parking places. Because the data sensed by vehicle nodes arrive as continuous, large-volume streams, reducing bandwidth consumption during data gathering is a key research problem.

Figure 1

VANET application scene.

Data aggregation techniques can effectively reduce the amount of data transmitted. However, they also reduce data accuracy and add latency. The accuracy and latency of aggregated data are together called the quality of information (QoI). The QoI of aggregated data in VANETs directly affects the performance of routing guidance applications: aggregated data with high QoI make travel time estimates more accurate, which in turn improves the routes returned by dynamic routing guidance. At the same time, the higher the QoI to be achieved, the more wireless bandwidth the data gathering process consumes. Adverse conditions in VANETs (communication disruptions, vehicle mobility, etc.) limit the available bandwidth. Therefore, a data gathering protocol in VANETs should consume as little wireless bandwidth as possible while still satisfying the requirements that users place on the routing guidance application. Our basic idea is thus to take the QoI requirements of routing guidance applications into account during data gathering while reducing bandwidth consumption as much as possible.

The contributions of this paper are summarized as follows.
(i) We propose a framework that considers application demands in the process of data aggregation in VANETs.
(ii) QoI constraints on aggregated data are extracted from the users' routing guidance requirements using convex optimization.
(iii) We propose a strategy for distributing QoI-based data gathering queries and design a request message that carries the QoI requirements.
(iv) We propose the QoI-DG protocol, which gathers data from the area of interest to the requesting AP by routing and aggregating data under QoI constraints.
(v) We conduct an extensive experimental study. Simulation experiments show that our method increases the collected data rate by about 20 percent and saves 15 percent of the communication bandwidth cost of data gathering in VANETs.

The rest of the paper is organized as follows. Section 2 reviews and analyzes related work. Section 3 describes the system model. Section 4 gives an overview of QoI-based data gathering and routing guidance in VANETs. Section 5 presents how QoI constraints on data gathering are extracted from the requirements of the routing guidance application. Section 6 describes QoI-based query propagation, and Section 7 introduces the QoI-DG protocol in detail. Section 8 presents and analyzes the results of an experimental study on QoI-based data gathering and routing guidance in VANETs. Finally, Section 9 concludes the paper and outlines future research.

2. Related Works

Data gathering, which collects large volumes of real-time data for ITS applications, is one of the most important issues in VANETs. Many research and implementation efforts [6–14] have addressed data gathering.

Research on data gathering in VANETs focuses on two main aspects: routing and aggregation. Routing-related research designs routing protocols that decide which vehicle nodes broadcast information, when, where, and to whom. Aggregation-related research focuses on data compression and aggregation techniques that reduce the amount of data transmitted.

Great efforts have been made on data routing protocols for data gathering in VANETs. According to the communication mode, data routing for data gathering can be classified into three types: routing through V2I communication only, through V2V communication only, and through both V2I and V2V communication. Data collection in [15] uses only communication between vehicle nodes and roadside units, so it does not exploit communication between vehicle nodes. The approaches in [11, 16, 17] are based on clustering: the nodes in a cluster send data to the cluster head, which then forwards all data to the APs via V2I communication. Soua et al. proposed the ADOPEL protocol, based on a distributed Q-learning technique, to make data collection more reactive to node mobility and topology changes [18]. Yuan et al. proposed an infrastructure-free data aggregation algorithm that restricts the number of forwarders in VANETs [19]. Rangari et al. summarized forwarding control for data aggregation in VANETs [6].

Data aggregation during data gathering can effectively reduce bandwidth consumption, but it may also reduce data accuracy and increase delay. References [20, 21] apply data compression, which is lossless aggregation. In [10], the aggregation decision is made according to the difference between data values and the residual space of the transmitted packet. ParkNet and Lochert et al. aggregate information about free parking places [9, 22]. Fuzzy Aggregation uses fuzzy logic reasoning to make aggregation decisions about data within adaptively partitioned zones [23]. The longer data are retained in a node, the more opportunities there are to aggregate them; however, retention also increases delay. References [24–26] consider the tradeoff between delay and the reduction in data volume. Catch-up applies a distributed Markov decision process (MDP) model to control how long a message is held [25, 26]. The idea in [24] is that data should be carried as long as possible, until the delay is about to exceed the deadline requested by the user. Compressive sensing has also been introduced to data gathering in VANETs [12, 27]. Table 1 summarizes existing data aggregation protocols in VANETs.

Table 1

Summary of data aggregation protocols in VANETs.

Protocol	Data type	Aggregation accuracy	Communication mode	Routing structure	Forwarding time
SOTIS	Speed and position	No loss	V2V	Ad hoc	No waiting
TrafficView	Speed	Loss	V2V	Ad hoc	No waiting
Pro. Agg.	Parking info	No loss	V2V & V2I	Ad hoc	No waiting
Catch-up	Speed	Loss	V2V	Ad hoc	Waiting
CASCADE	Speed	No loss	V2V	Clustering	No waiting
Fuzzy Agg.	Speed	Loss	V2V	Ad hoc	No waiting
ParkNet	Parking info	No loss	V2V & V2I	Ad hoc	No waiting
Two-tier Agg.	Speed	No loss	V2V & V2I	Two-tier	No waiting
Adaptive For.	Speed	Loss	V2V	Ad hoc	Waiting
DB-VDG	Speed	Loss	V2V & V2I	Ad hoc	Waiting
CS	Speed	Loss	V2V & V2I	Ad hoc	No waiting

In summary, existing research on data gathering does not start from the requirements of applications, so it does not minimize communication bandwidth consumption subject to the data accuracy demands of users. As a result, either the QoI of the aggregated data exceeds the requirements and bandwidth is wasted, or the QoI of the collected data falls below the requirements and the QoS of the application suffers greatly.

3. The System Model

In this section, we introduce the system model for data gathering with QoI guarantees in VANETs. The model has three parts: basic assumptions, the routing guidance application, and data aggregation.

3.1. Basic Assumptions

We assume a time-slotted, synchronized system; that is, all nodes share a synchronized clock. We model the urban road map as a graph $G = (X, R)$, in which X is the set of intersections and R is the set of roads. The travel time of road r is denoted by $t_{r}$, the length of road r by $l_{r}$, and the number of vehicles on road r by $n_{r}$. The travel time estimated from aggregated data is denoted by ${\hat{t}}_{r}$, and the error bound of the estimated travel time of road r is $δ_{r}$. The error of the estimated travel time of road r, denoted by $e_{r}$, is calculated by (1):

\begin{matrix} e_{r} = | {\hat{t}}_{r} - t_{r} | . \end{matrix}

(1)

Next, we introduce the definition of the accuracy requirement of estimated travel time of road r.

Definition 1.

The accuracy requirement $(δ_{r}, p_{r})$ on the estimated travel time of road r is as follows:

\begin{matrix} p (e_{r} \leq δ_{r}) \geq p_{r} . \end{matrix}

(2)

That is, the probability that the error $e_{r}$ of the estimated travel time ${\hat{t}}_{r}$ does not exceed $δ_{r}$ must be at least $p_{r}$.

At the MAC layer, we adopt the one-hop interference model: links within one hop of each other cannot be scheduled to transmit at the same time. A node can either send or receive data at a time, and it receives a data packet correctly only if it hears that packet alone in the time slot. If a node hears two or more messages at the same time, it cannot receive any of them correctly because of interference.

3.2. Routing Guidance Application

In the urban routing guidance application, many users raise navigation requests, and the guidance routing system (GRS) uses real-time traffic information to respond to them. Generally, a request to a GRS includes only the place of departure and the destination. However, this is not enough: we usually need a path along which we can reach the destination within the expected time. We therefore need a GRS that can answer requests with QoS requirements. First, we define the guidance routing request with QoS demand, which specifies the required accuracy of the travel time provided by the shortest-time navigation algorithm.

Definition 2.

Guidance routing request with QoS demand $(s, d, δ, p)$: s is the place of departure and d is the destination. Suppose the GRS responds that the shortest-time path from s to d is path l, with estimated travel time ${\hat{t}}_{l}$, and let $t_{l}$ be the actual travel time of path l. The response must satisfy

\begin{matrix} p (| {\hat{t}}_{l} - t_{l} | < δ) > p . \end{matrix}

(3)

When the GRS receives a navigation request with QoS demand, it replies to the query. We therefore define the guidance routing response.

Definition 3.

Guidance routing response $(l, {\hat{t}}_{l})$: the GRS calculates the shortest-time path $l = (r_{1}, r_{2}, \dots, r_{| l |})$, whose estimated travel time is ${\hat{t}}_{l}$.

Many users raise routing guidance requests with QoS demands to the GRS. We denote the set of all such requests by $RS$.

3.3. Data Aggregation

Data aggregation can effectively reduce the communication bandwidth cost of data gathering, but it also makes the aggregated data imprecise. Data imprecision is measured by the quantitative difference between an approximate value and the exact value. The application specifies the precision constraint of data aggregation by an upper bound ε on the imprecision, called the error bound. That is, on receiving an aggregate value $A^{'}$, the application wants to be sure that the exact aggregate value A lies in the interval $[A^{'} - ε, A^{'} + ε]$. We refer to the actual velocity measurements as raw data. Assume there are n vehicles and the velocity of vehicle i is $v_{i}$. AVERAGE aggregation of these raw data produces the aggregated values $v_{i}^{'}$ given by (4):

\begin{matrix} v_{1}^{'} = v_{2}^{'} = \dots = v_{n}^{'} = \frac{\sum_{i = 1}^{n} v_{i}}{n} . \end{matrix}

(4)

We assume "perfect" aggregation, meaning that intermediate nodes can gather data from their predecessors and aggregate all of it into a single packet. The smaller the error bound, the more accurate the collected data must be, and the higher the bandwidth cost. We model the cost of data aggregation with error bound ε as a function $C (ε)$. Analyzing approximate data aggregation gives the following properties of $C (ε)$:
(1) It is a decreasing function.
(2) It attains its maximum value when the error bound ε is 0.
(3) Its value depends on the data distribution.

Assume that the velocity is normally distributed, $N (μ, σ)$, and that the velocities of different nodes are independent. Then (5) holds, so almost all velocities lie in $[μ - 3 σ, μ + 3 σ]$, and we divide this interval evenly into subintervals of width ε. The cost of data aggregation is therefore defined as (6):

\begin{matrix} p (μ - 3 σ \leq v_{i} \leq μ + 3 σ) \geq 99.7 \%, \end{matrix}

(5)

\begin{matrix} C (ε) = \frac{6 σ}{ε} . \end{matrix}

(6)
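As a quick numeric check of (5) and (6), the following Python sketch evaluates the 3σ coverage of a normal distribution and the cost $C (ε)$ for an assumed velocity distribution. The function name `aggregation_cost` and the example numbers (μ = 50, σ = 5, ε = 2) are illustrative assumptions, not values from this paper.

```python
from statistics import NormalDist


def aggregation_cost(eps: float, sigma: float) -> float:
    """Cost model (6): number of width-eps intervals covering [mu - 3 sigma, mu + 3 sigma]."""
    if eps <= 0:
        # (6) itself is undefined at eps = 0, so the sketch rejects it
        raise ValueError("error bound eps must be positive")
    return 6 * sigma / eps


# Coverage in (5): probability mass of N(mu, sigma) inside mu +/- 3 sigma.
dist = NormalDist(mu=50.0, sigma=5.0)          # illustrative velocity distribution
coverage = dist.cdf(65.0) - dist.cdf(35.0)
print(f"coverage = {coverage:.4f}")             # coverage = 0.9973
print(aggregation_cost(eps=2.0, sigma=5.0))     # 15.0
```

Halving ε doubles $C (ε)$, which is the decreasing behavior listed as property (1).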

4. Overview of QoI-Based Data Gathering and Routing Guidance

Since QoI constraints are used to guide data gathering, extracting them from the users' requirements is the first step. The next step is to create a QoI-based data gathering query and distribute it in the area of interest. Finally, data are gathered to the requesting AP by the QoI-DG protocol.

In the first step, we extract the QoI bounds of data aggregation from the set $RS$: we calculate the QoI bound $(Δ_{r}, P_{r})$ of the estimated travel time of each road and then derive the amount $n_{r}$ and the error bound $ε_{r}$ of the data that need to be collected on each road. In the second step, QUERY messages are created according to these QoI bounds and distributed. In the third step, data are gathered to the requesting AP by the QoI-DG protocol. The collected data are used to estimate the travel time of each road, which in turn is used to answer the routing guidance requests.

Figure 2 presents an overview of travel time collection and routing guidance in a VANET environment. The requesting AP extracts the QoI constraints of data aggregation from the users' requirements, distributes the queries with QoI constraints, collects aggregated data to estimate road travel times, and computes the shortest-time routing guidance. Vehicle nodes receive the QUERY message, distribute it to other nodes, aggregate data under the QoI constraints, and route the data to the AP. The pseudocode of QoI-based data gathering and routing guidance is given in Algorithm 1.

Algorithm 1: QoI-based Data Agg. and GRS.

Input: the request set RS and the sensor data $d_{1}, d_{2}, \dots, d_{n}$

Output: The shortest time path $l_{i}$ for the user i

(1) Extract the QoI bound of estimated road travel time from the request set RS on the requesting AP

(2) Extract the amount of data to be collected $X_{r}$ and the error bound of data aggregation $ε_{r}$ from the QoI bound $(Δ_{r}, P_{r})$ on the requesting AP

(3) Create QoI-based QUERY message on the requesting AP

(4) Send QoI-based QUERY message from the requesting AP to the vehicle nodes in the interested area

(5) Distribute QoI-based QUERY message between the vehicle nodes in the interested area

(6) QoI-based AVERAGE/HISTOGRAM data aggregation from $d_{1}, d_{2}, \dots, d_{n}$ to $B_{1}, B_{2}, \dots, B_{m}$ on the vehicle nodes

(7) Choose the forwarding strategy CARRY or FORWARD on the vehicle nodes

(8) Contend for the wireless channel among the vehicle nodes to forward the data

(9) Collect aggregated data on the requesting AP

(10) Run Routing Guidance Application to get the shortest time path $l_{i}$ for the user i on the requesting AP

(11) return the shortest time path $l_{i}$ for the user i

Figure 2

The framework of QoI-based data gathering and routing guidance.

5. Extracting QoI Constraints of Data Gathering

The data sensed by vehicles are used to run applications and are processed in three stages. The first stage is data gathering, in which data are routed and aggregated to the AP. The second stage is road travel time estimation, which takes the aggregated data from vehicle nodes as input, so the estimated travel times contain errors. The last stage is the routing guidance algorithm, which uses the estimated road travel times to find the best route and its travel time. Data gathering is the foundation of the whole process: the QoI of the collected data directly affects the performance of routing guidance. QoI is therefore an important factor in data gathering, and the QoI constraints of data gathering should be extracted from the users' requirements. Extraction proceeds in two steps: first, the QoI bound of the estimated road travel time is extracted from the users' requirements; second, the error bound of data aggregation is extracted from the QoI bound of the estimated road travel time.

5.1. Extracting QoI Bound of Estimated Road Travel Time

We consider two situations: handling the requirement of a single user and handling the requirement set of many users. For a single user, we apply the idea of equal partition. For multiple users, we formulate the extraction as a convex optimization problem, called the ERs problem, and solve it with convex optimization methods.

5.1.1. Extracting the Requirement of the Single User

The user submits a guidance routing request with QoS demand $(s, d, δ, p)$ to the guidance routing system, and the QoI bound $(Δ_{r}, P_{r})$ of each estimated road travel time is extracted from this request. Our method is based on equal partition. First, we use a shortest path algorithm to find the shortest path from the departure to the destination and its length $l_{s d}^{\min}$. Then we use [28] to find all paths whose length is smaller than $(1 + τ) l_{s d}^{\min}$, where τ is a constant parameter; these paths compose the path set $PS$. The roads in a path p compose the set $R S^{p}$, and the set of all roads of the paths in $PS$ is denoted by $RPS$. Let $n_{s d}^{\max}$ be the largest number of roads on any path in $PS$. For any $r \in RPS$, the error bound $Δ_{r}$ is obtained by dividing the QoS error bound δ evenly as in (7), and the probability $P_{r}$ of satisfying the error bound of road r is obtained from the QoS probability p as in (8). This yields the QoI bound $(Δ_{r}, P_{r})$ of each estimated road travel time from the request $(s, d, δ, p)$:

\begin{matrix} Δ_{r} = \frac{δ}{n_{s d}^{\max}}, \end{matrix}

(7)

\begin{matrix} P_{r} = \sqrt[n_{s d}^{\max}]{p} . \end{matrix}

(8)
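The equal partition in (7) and (8) can be sketched in a few lines of Python. On any path in $PS$ with at most $n_{s d}^{\max}$ roads, the per-road error bounds sum to at most δ, and if the road errors are independent (our assumption), all roads meet their bounds with probability at least $P_{r}^{n_{s d}^{\max}} = p$. The function `split_single_request` is hypothetical, and the path set is passed in directly rather than computed with the method of [28].

```python
from typing import Dict, List, Tuple


def split_single_request(delta: float, p: float,
                         paths: List[List[str]]) -> Dict[str, Tuple[float, float]]:
    """Equal partition of a request (s, d, delta, p) over the roads in PS.

    `paths` stands in for the path set PS (each path is a list of road ids).
    """
    n_max = max(len(path) for path in paths)   # n_sd^max: most roads on one path
    delta_r = delta / n_max                    # (7): per-road error bound
    p_r = p ** (1.0 / n_max)                   # (8): n_max-th root of p
    return {road: (delta_r, p_r) for path in paths for road in path}


bounds = split_single_request(10.0, 0.9, [["a", "b"], ["c", "d", "e", "f", "g"]])
print(bounds["a"])   # delta_r = 2.0, P_r close to 0.9791
```

Every road in $RPS$ receives the same bound, so shorter paths in $PS$ are satisfied with slack.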

5.1.2. Extracting from Multiuser Requirements

In the guidance routing system, many users raise queries. Let n be the number of users and $RS$ the request set; let $Δ_{i}$ denote the travel-time error bound requested by user i, and let $δ_{r}$ denote the error bound of the estimated travel time of road r, which is the decision variable. Extracting QoI bounds of road travel times from $RS$ must consider all queries jointly. A shortest path algorithm finds the shortest path $p_{i}$ from departure $s_{i}$ to destination $d_{i}$ and its length $l_{i}^{\min}$; this path consists of roads $r_{i}^{1}, \dots, r_{i}^{m}$, which compose the road set $R S^{p_{i}}$. The optimization problem is to minimize the bandwidth cost of data aggregation while satisfying the request set RS. It is formalized as follows.

ERs problem is as follows:

\begin{array}{l} \underset{\vec{δ}}{\text{minimize}} \sum_{r \in \bigcup_{i} R S^{p_{i}}} C (δ_{r}) \end{array}

(9)

\begin{array}{l} \text{subject to} \sum_{r \in R S^{p_{i}}} δ_{r} \leq Δ_{i}, \quad 1 \leq i \leq n, \\ δ_{r} \geq 0 . \end{array}

(10)

Theorem 4.

ERs problem is convex optimization.

Proof.

We first prove that the feasible set is convex.

Let $f_{i} (\vec{δ}) = \sum_{r \in R S^{p_{i}}} δ_{r}$ for $1 \leq i \leq n$, and let $g_{r} (\vec{δ}) = δ_{r}$ for each road r. These functions are linear and therefore convex.

Each constraint in (10), $f_{i} (\vec{δ}) \leq Δ_{i}$ or $g_{r} (\vec{δ}) \geq 0$, defines a half-space, so the feasible set D, as the intersection of these half-spaces, is convex. We now prove that the objective function is convex.

Consider the objective function $f_{0} (\vec{δ}) = \sum_{r} C (δ_{r}) = \sum_{r} \frac{6 σ}{δ_{r}}$, defined for $δ_{r} > 0$. For any α with $0 \leq α \leq 1$ and any ${\vec{δ}}_{1}, {\vec{δ}}_{2} \in D$ with positive components, each term satisfies

\begin{array}{l} \frac{6 σ}{α δ_{1 r} + (1 - α) δ_{2 r}} - (α \frac{6 σ}{δ_{1 r}} + (1 - α) \frac{6 σ}{δ_{2 r}}) \\ = \frac{- 6 σ α (1 - α) {(δ_{1 r} - δ_{2 r})}^{2}}{(α δ_{1 r} + (1 - α) δ_{2 r}) δ_{1 r} δ_{2 r}} \leq 0 . \end{array}

(11)

Summing (11) over all roads r gives $f_{0} (α {\vec{δ}}_{1} + (1 - α) {\vec{δ}}_{2}) \leq α f_{0} ({\vec{δ}}_{1}) + (1 - α) f_{0} ({\vec{δ}}_{2})$, so $f_{0} (\vec{δ})$ is a convex function.

Since both the feasible set and the objective function are convex, the ERs problem is a convex optimization problem.

Because ERs is a convex optimization problem, it can be solved with the Lagrange duality method in [29].
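The solver of [29] is not reproduced here. As a hedged illustration of how the Lagrangian of (9) and (10) can be used, the sketch below applies the stationarity condition, which gives $δ_{r} = \sqrt{6 σ / \sum_{i : r \in R S^{p_{i}}} λ_{i}}$, and updates each multiplier $λ_{i}$ multiplicatively by the ratio of path i's total error bound to $Δ_{i}$. This multiplicative update is our own heuristic choice, not the authors' solver, and we do not claim it converges on every instance; a final uniform scaling step makes the returned bounds feasible regardless. The function name `solve_ers` and its parameters are hypothetical.

```python
import math
from typing import Dict, List


def solve_ers(paths: List[List[str]], bounds: List[float], sigma: float,
              iters: int = 500) -> Dict[str, float]:
    """Heuristic dual solver for the ERs problem (9)-(10) with C(d) = 6*sigma/d.

    paths[i]  : roads on user i's shortest path (the set RS^{p_i})
    bounds[i] : Delta_i, the travel-time error bound of request i
    """
    roads = sorted({r for path in paths for r in path})
    lam = [1.0] * len(paths)                  # one multiplier per constraint in (10)
    delta: Dict[str, float] = {}
    for _ in range(iters):
        # Stationarity: -6*sigma/delta_r^2 + sum of lam_i over paths containing r = 0
        for r in roads:
            lam_r = sum(l for l, path in zip(lam, paths) if r in path)
            delta[r] = math.sqrt(6 * sigma / max(lam_r, 1e-12))
        # Multiplicative dual update: grow lam_i if path i exceeds its budget,
        # shrink it if the constraint is slack (lam_i stays positive).
        lam = [max(l * sum(delta[r] for r in path) / b, 1e-12)
               for l, path, b in zip(lam, paths, bounds)]
    # Scale all bounds down uniformly so that every constraint in (10) holds.
    worst = max(sum(delta[r] for r in path) / b for path, b in zip(paths, bounds))
    if worst > 1:
        delta = {r: d / worst for r, d in delta.items()}
    return delta


print(solve_ers([["a", "b"]], [4.0], sigma=1.0))   # both roads close to 2.0
```

In the two-request instance `[["a", "b"], ["b"]]` with bounds `[4.0, 1.0]`, the shared road b is pinned to 1 by the second request and road a receives the remaining 3, which matches the KKT solution of (9) and (10).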

The probability $P_{r}$ of satisfying the error bound of road r is also derived from the request set RS. Write the i-th request as $(s_{i}, d_{i}, Δ_{i}, P_{i}) \in RS$, let $p_{i}$ be the shortest path from $s_{i}$ to $d_{i}$, let $R S^{p_{i}}$ be the set of its roads, and let $m_{i}$ be the number of roads on $p_{i}$. $P_{r}$ is given by (12), where the maximum is taken over all requests whose path contains road r:

\begin{matrix} P_{r} = \max_{i : r \in R S^{p_{i}}} \sqrt[m_{i}]{P_{i}} . \end{matrix}

(12)
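Rule (12) can be sketched in Python as follows: each request spreads its probability requirement evenly (as an $m_{i}$-th root) over the roads of its path, and taking the maximum means the strictest request touching a road determines that road's requirement. The function name `road_probabilities` and the input format are hypothetical.

```python
from typing import Dict, List, Tuple


def road_probabilities(requests: List[Tuple[List[str], float]]) -> Dict[str, float]:
    """(12): P_r = max over requests whose path uses road r of P_i ** (1 / m_i).

    requests: (roads on shortest path p_i, probability requirement P_i) pairs.
    """
    p_road: Dict[str, float] = {}
    for roads, p_i in requests:
        share = p_i ** (1.0 / len(roads))   # m_i-th root of P_i
        for r in roads:
            p_road[r] = max(p_road.get(r, 0.0), share)
    return p_road


print(road_probabilities([(["a", "b"], 0.81), (["b", "c", "d"], 0.512)]))
# roughly a: 0.9, b: 0.9, c: 0.8, d: 0.8
```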

5.2. Extracting the Error Bound of Data Aggregation

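The error bound $ε_{r}$ of data aggregation is extracted from $(Δ_{r}, P_{r})$. Let $t_{1}, t_{2}, \dots, t_{k}$ be the travel times sensed by the vehicles on road r and ${\tilde{t}}_{1}, {\tilde{t}}_{2}, \dots, {\tilde{t}}_{k}$ the values received by the AP, and assume that the travel time of road r is normally distributed. Each received value deviates from the sensed value by less than the aggregation error bound, as in (13), and the estimated travel time ${\hat{t}}_{r}$ of road r (written $\hat{t}$ below) is the mean of the received values, as in (14):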

\begin{matrix} | {\tilde{t}}_{i} - t_{i} | < ε_{r}, \end{matrix}

(13)

\begin{matrix} \hat{t} = \frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} . \end{matrix}

(14)

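By Definition 1, we require $p (| \hat{t} - t | < Δ_{r}) \geq P_{r}$, where t is the travel time of road r. The events $\{- Δ_{r} < \hat{t} - t\}$ and $\{\hat{t} - t < Δ_{r}\}$ together cover all outcomes, so the probability of their intersection equals the sum of their probabilities minus 1, which gives (15). From (13) and (14), $\frac{\sum_{i = 1}^{k} t_{i}}{k} - ε_{r} < \hat{t} < \frac{\sum_{i = 1}^{k} t_{i}}{k} + ε_{r}$, which yields the lower bounds in (16). Combining them gives the sufficient condition (17). Because the sensed values $t_{i}$ are independent and identically distributed, (17) can be evaluated to compute the error bound $ε_{r}$:

\begin{array}{l} p (| \hat{t} - t | < Δ_{r}) \\ = p (- Δ_{r} < \hat{t} - t < Δ_{r}) \\ = p (- Δ_{r} < \frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} - t < Δ_{r}) \\ = p (- Δ_{r} < \frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} - t) + p (\frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} - t < Δ_{r}) - 1, \end{array}

(15)

\begin{array}{l} p (- Δ_{r} < \frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} - t) \geq p (- Δ_{r} < \frac{\sum_{i = 1}^{k} t_{i}}{k} - t - ε_{r}), \\ p (\frac{\sum_{i = 1}^{k} {\tilde{t}}_{i}}{k} - t < Δ_{r}) \geq p (\frac{\sum_{i = 1}^{k} t_{i}}{k} - t + ε_{r} < Δ_{r}), \end{array}

(16)

\begin{array}{l} p (- Δ_{r} < \frac{\sum_{i = 1}^{k} t_{i}}{k} - t - ε_{r}) + p (\frac{\sum_{i = 1}^{k} t_{i}}{k} - t + ε_{r} < Δ_{r}) - 1 \geq P_{r} . \end{array}

(17)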
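To turn this into a number, note that (13) and (14) give $| \hat{t} - \bar{t} | < ε_{r}$, where $\bar{t}$ is the mean of the sensed values, so the requirement of Definition 1 holds whenever $p (| \bar{t} - t | < Δ_{r} - ε_{r}) \geq P_{r}$. The Python sketch below solves this for the largest $ε_{r}$, assuming (our assumption) that the sensed values $t_{i}$ are i.i.d. $N (t, σ_{t}^{2})$, so that $\bar{t} - t \sim N (0, σ_{t}^{2} / k)$. The function name `max_aggregation_error` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def max_aggregation_error(delta_r: float, p_r: float,
                          sigma_t: float, k: int) -> Optional[float]:
    """Largest eps_r with P(|mean(t_i) - t| < delta_r - eps_r) >= p_r.

    Assumes t_i ~ N(t, sigma_t^2) i.i.d., so mean(t_i) - t ~ N(0, sigma_t^2 / k).
    Returns None when no eps_r >= 0 meets the requirement.
    """
    z = NormalDist().inv_cdf((1.0 + p_r) / 2.0)   # two-sided standard normal quantile
    eps_r = delta_r - z * sigma_t / math.sqrt(k)
    return eps_r if eps_r >= 0 else None


print(max_aggregation_error(delta_r=10.0, p_r=0.95, sigma_t=20.0, k=25))  # about 2.16
print(max_aggregation_error(delta_r=10.0, p_r=0.95, sigma_t=20.0, k=1))   # None
```

Collecting more samples k shrinks the sampling term $z σ_{t} / \sqrt{k}$ and leaves more of $Δ_{r}$ for aggregation error, which is the tradeoff between collected data amount and error bound that the QUERY message encodes.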

6. Distributing the QoI-Based Queries

In this section, we use the QoI constraints to create a QUERY message and distribute it in the area of interest. First, the requesting AP creates a QoI-based data gathering query. The QUERY message contains the sponsor of the query, its creation time and lifetime, the area of interest, and the QoI bounds. Table 2 describes the structure of the QoI-based data gathering QUERY message. Its key items are the QoI constraints, namely, the amount of data to be collected and the error bound of the aggregated data; $(e_{r}, X_{r})$ is a structure that stores temporary information used to select response nodes.

Table 2

The structure of QUERY message.

Name	Meaning
AP	The ID of the AP that creates this query
CreateTime	The time at which this query was created
Lifetime	The remaining lifetime of the query
Interested Area	The area from which data are collected
$n_{r}$	The amount of data that needs to be collected
$ε_{r}$	The error bound that must be satisfied in data aggregation
$(e_{r}, X_{r})$	The values used to select response nodes

After creating the QUERY message, the requesting AP distributes it to the vehicles in the area of interest. Query propagation deals with diffusing the query message from the requesting AP to the region of interest. There has been much research on broadcasting protocols in VANETs [24, 30]. Our work differs in that response nodes are chosen during query propagation: our query diffusion must decide which subset of nodes responds to the query.

QUERY message diffusion proceeds as follows. The AP broadcasts a QoI-based data gathering QUERY message. When a vehicle node receives a QUERY message, it stores the query only if it is in the area of interest. It then prepares to forward the QUERY message by checking its neighbor list, which is built from periodically received beacon messages. For each node in its neighbor list, it generates a random number between 0 and 1; if this number exceeds P, a constant parameter, it adds the neighbor's ID to set S and decrements $X_{r}$. It then continues diffusing the query, together with set S, to its neighbors. When a node receives a QUERY message whose set S contains its ID, it marks itself as a response node of the query. If the query's lifetime has expired, the node aborts propagation of the query.
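The diffusion rule above can be sketched as a per-node handler in Python. The `Query` fields mirror a subset of Table 2, and the names `Query` and `on_query` are hypothetical. Two behaviors are our assumptions rather than statements of the protocol: selection stops once $X_{r}$ reaches zero, and nodes outside the area of interest neither keep nor relay the query.

```python
import random
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class Query:
    """Subset of the QUERY fields in Table 2 (field names are illustrative)."""
    ap_id: int
    create_time: float
    lifetime: float
    x_r: int                                        # response nodes still to be chosen
    selected: Set[int] = field(default_factory=set)  # the set S carried with the query


def on_query(node_id: int, in_area: bool, neighbors: Iterable[int], q: Query,
             now: float, p: float, rng=random) -> Tuple[bool, bool]:
    """Handle a received QUERY; returns (is_response_node, rebroadcast)."""
    if now > q.create_time + q.lifetime:   # query expired: abort propagation
        return False, False
    if not in_area:                        # assumption: only in-area nodes keep and relay it
        return False, False
    is_response = node_id in q.selected    # chosen earlier by some neighbor
    for nb in sorted(neighbors):           # neighbor list built from beacons
        if q.x_r <= 0:                     # assumption: stop once the quota is met
            break
        if nb not in q.selected and rng.random() > p:
            q.selected.add(nb)             # put the neighbor's ID into S
            q.x_r -= 1                     # decrement X_r
    return is_response, True


q = Query(ap_id=1, create_time=0.0, lifetime=10.0, x_r=2)
print(on_query(5, True, [7, 8, 9], q, now=1.0, p=0.5, rng=random.Random(42)))  # (False, True)
```

The handler mutates the query in place, so the updated S and $X_{r}$ travel with the rebroadcast copy.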

7. Gathering the Data with QoI Constraints

The crucial task of collecting data and carrying them toward the AP is handled by our QoI-based data gathering protocol (QoI-DG). The protocol must answer three questions:
(a) Which data can be aggregated, and when and how is aggregation performed?
(b) When does a node forward, retain, or discard data?
(c) Which node should broadcast when several nodes within the interference range all want to broadcast data?

To address these questions, the QoI-DG protocol consists of three parts: QoI-based data aggregation, strategy selection, and wireless communication scheduling. QoI-based data aggregation maximizes the reduction in communication cost subject to the QoI constraints on aggregated data; on each node, the aggregation algorithm uses dynamic programming to choose the optimal data to aggregate. Strategy selection determines, according to the node's current state, whether the node should adopt the retain strategy or the forward strategy; its main idea is that data are retained until the delay approaches the bound. Wireless communication scheduling resolves communication conflicts between vehicle nodes with a priority-based method.

7.1. QoI-Based Data Aggregation

There are many kinds of data aggregation operations, such as AVERAGE, HISTOGRAM, SUM, and COUNT. Since we target real-time traffic conditions, we consider two aggregation operations on velocity data: AVERAGE aggregation and HISTOGRAM aggregation.

7.1.1. AVERAGE Aggregation

We first analyze AVERAGE aggregation on each vehicle node and define QoI-based AVERAGE aggregation on a node as the AVERAGEAgg problem.

AVERAGEAgg problem: given a data set $S = \{d_{1}, d_{2}, \dots, d_{n}\}$, find a partition $B_{1}, B_{2}, \dots, B_{m}$ with the minimum number of blocks m, subject to the following constraint, where each value is replaced by the mean of its block:

\begin{matrix} \forall B_{i}, \forall d_{j}, | d_{j}^{'} - d_{j} | ⩽ ε, d_{j} \in B_{i}, \\ d_{j}^{'} = \frac{\sum_{d_{k} \in B_{i}}^{} d_{k}}{| B_{i} |} . \end{matrix}

(18)

To solve the AVERAGEAgg problem, we propose a dynamic programming algorithm. Let $A (i, j)$ denote the minimum number of blocks in a valid partition of $d_{i}, \dots, d_{j}$, and let $B (i, j)$ be a flag indicating whether aggregating $d_{i}, \dots, d_{j}$ into a single block satisfies the constraint: its value is 1 if the constraint is satisfied and ∞ otherwise.

Analyzing QoI-based AVERAGE aggregation, we get

\begin{array}{l} A (1, n) = \min \{ A (1,1) + A (2, n), A (1,2) + A (3, n), \\ \dots, A (1, n - 1) + A (n, n), B (1, n) \} . \end{array}

(19)

We use (19) to construct the dynamic programming matrix. The pseudocode of this algorithm is as shown in Algorithm 2.

Algorithm 2: QoI-based AVERAGE Data Agg.

Input: sensor data $S = d_{1}, d_{2}, \dots, d_{n}$ and error bound $ε$

Output: $B_{1}, B_{2}, \dots, B_{m}$

(1) for $i \leftarrow 1$ to n do

(2) for $j \leftarrow i$ to n do

(3) Calculate $B (i, j)$

(4) for $i \leftarrow 1$ to n do

(5) $A (i, i) = 1$

(6) for $k \leftarrow 2$ to n do

(7) for $i \leftarrow 1$ to $n - k + 1$ do

(8) $j = i + k - 1$ ;

(9) if $B (i, j) = 1$ then

(10) $A (i, j) = 1$

(11) else

(12) $A (i, j) = ∞$

(13) for $y \leftarrow i$ to $j - 1$ do

(14) $q = A (i, y) + A (y + 1, j)$

(15) if $q < A (i, j)$ then

(16) $A (i, j) = q$ ; record y as the split point of $(i, j)$

(17) Trace the recorded split points back from $(1, n)$ to obtain $B_{1}, B_{2}, \dots, B_{m}$

(18) return $B_{1}, B_{2}, \dots, B_{m}$
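Algorithm 2 can be sketched as runnable Python. The recurrence (19) only forms contiguous blocks, so this sketch sorts the data first, which is our assumption since the paper does not state an ordering. $B (i, j)$ is evaluated on demand instead of in a separate pass, and split points are recorded so that the blocks can be recovered. Like Algorithm 2, it runs in $O (n^{3})$ time apart from the block checks. The name `average_agg` is hypothetical.

```python
import math
from typing import List


def average_agg(data: List[float], eps: float) -> List[List[float]]:
    """Minimum partition into blocks whose mean lies within eps of every member.

    Follows recurrence (19) / Algorithm 2 on 0-based indices. The data are
    sorted first (our assumption), because the recurrence forms contiguous blocks.
    """
    d = sorted(data)
    n = len(d)
    if n == 0:
        return []

    def ok(i: int, j: int) -> bool:
        """B(i, j) == 1: d[i..j] can be replaced by its mean within eps."""
        block = d[i:j + 1]
        mean = sum(block) / len(block)
        return max(abs(x - mean) for x in block) <= eps

    A = [[math.inf] * n for _ in range(n)]    # A[i][j]: min number of blocks for d[i..j]
    split = [[None] * n for _ in range(n)]    # best split y; None means a single block
    for i in range(n):
        A[i][i] = 1
    for k in range(2, n + 1):                 # length of the sub-range
        for i in range(n - k + 1):
            j = i + k - 1
            if ok(i, j):
                A[i][j] = 1
                continue
            for y in range(i, j):
                q = A[i][y] + A[y + 1][j]
                if q < A[i][j]:
                    A[i][j], split[i][j] = q, y

    def trace(i: int, j: int) -> List[List[float]]:
        y = split[i][j]
        if y is None:
            return [d[i:j + 1]]
        return trace(i, y) + trace(y + 1, j)

    return trace(0, n - 1)


print(average_agg([62, 80, 60, 81, 61], eps=1.5))   # [[60, 61, 62], [80, 81]]
```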

7.1.2. HISTOGRAM Aggregation

For HISTOGRAM aggregation, we refer to the QoI-based HISTOGRAM data aggregation problem as the HISTOGRAMAgg problem. In HISTOGRAM aggregation, the velocity range is divided into intervals of width ε, and each value is placed into its interval; all data in one interval are aggregated together. HISTOGRAM aggregation must therefore handle both the partition of the data range and the classification of the data. The pseudocode of the QoI-based HISTOGRAM aggregation algorithm is shown in Algorithm 3.

Algorithm 3: QoI-based HISTOGRAM Data Agg.

Input: sensor data $S = \{d_{1}, d_{2}, \dots, d_{n}\}$ and error bound $ε$

Output: $B_{1}, B_{2}, \dots, B_{m}$

(1) PartitionDataZone(MinD, MaxD, ε), where MinD and MaxD are the lower and upper bounds of the data range

(2) for $i \leftarrow 1$ to n do

(3) temp = PutDIntoDataInterval( $d_{i}$ )

(4) $B[temp] = B[temp] \cup \{d_{i}\}$

(5) return $B_{1}, B_{2}, \dots, B_{m}$
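A minimal Python sketch of Algorithm 3 follows. The paper does not fix the interval width or define MinD and MaxD precisely, so we assume intervals of width ε that start at the smallest reading and drop empty intervals; `histogram_agg` is a hypothetical name. Under this assumption every value in an interval lies less than ε from that interval's mean, so the constraint in (18) holds.

```python
import math
from typing import Dict, List, Sequence


def histogram_agg(data: Sequence[float], eps: float) -> List[List[float]]:
    """Sketch of Algorithm 3: bucket values into intervals of width eps.

    Assumption: interval t covers [MinD + t*eps, MinD + (t+1)*eps), with MinD
    taken as the smallest reading; only non-empty intervals are returned.
    """
    if not data:
        return []
    if eps <= 0:
        raise ValueError("error bound eps must be positive")
    min_d = min(data)
    buckets: Dict[int, List[float]] = {}
    for d in data:
        t = math.floor((d - min_d) / eps)      # PutDIntoDataInterval(d)
        buckets.setdefault(t, []).append(d)    # B[t] = B[t] ∪ {d}
    # Return the non-empty intervals in increasing order of value.
    return [buckets[t] for t in sorted(buckets)]
```

For instance, `histogram_agg([60, 60.5, 61.2, 63], 1.0)` returns `[[60, 60.5], [61.2], [63]]`. Unlike the AVERAGE sketch, this runs in a single pass, at the cost of possibly producing more groups.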

7.2. Strategy Selection

Vehicle nodes have three options for handling data: forward, retain, or discard it. Carrying data is attractive because it reduces bandwidth consumption while the node itself moves the data closer to the requesting AP. Moreover, a vehicle that is carrying data can receive and aggregate data from other vehicles. Each vehicle therefore has to decide whether to retain the data it is carrying or to forward the data to another node where aggregation can save more bandwidth.

Whether a node should forward its data depends on the direction and magnitude of the velocities of the node and its neighbors. If the node is moving away from the requesting AP, it must forward the data. If there is a HELP node among its neighbors, the node should also forward the data. We define a HELP node as follows.

Definition 5.

HELP node: node j is a HELP node of node i if and only if it satisfies both of the following conditions.

(1) The moving direction of node j is towards the AP.

(2) $v_{j} - v_{i} > V$, where $v_{j}$ is the current speed of node j, $v_{i}$ is the current speed of node i, and $V$ is a system parameter.

Node speed and moving direction are obtained from BEACON messages.
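The forward-or-carry rule together with Definition 5 can be sketched as below. `NodeState`, its fields, and both functions are hypothetical; we also simplify "moving away from the requesting AP" to "not moving towards it", a boolean assumed to be derived from BEACON messages.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeState:
    """Hypothetical per-node view built from BEACON messages."""
    node_id: int
    speed: float        # current speed v (e.g. m/s)
    towards_ap: bool    # True if the node is moving towards the requesting AP


def is_help_node(j: NodeState, i: NodeState, V: float) -> bool:
    """Definition 5: node j moves towards the AP and v_j - v_i > V."""
    return j.towards_ap and (j.speed - i.speed > V)


def should_forward(node: NodeState, neighbors: Iterable[NodeState], V: float) -> bool:
    """Forward if the node is not heading to the AP or a neighbor is a HELP node; otherwise carry."""
    if not node.towards_ap:
        return True
    return any(is_help_node(nb, node, V) for nb in neighbors)
```

With V = 5, a node driving towards the AP at 20 m/s hands its data to a neighbor heading the same way at 30 m/s, but keeps carrying it if the only neighbor is driving away from the AP.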

Whether a node discards data depends on its available storage space. When the fraction of free storage space falls below a constant parameter f (set to 20% here), the node aggregates the data it stores. If no free storage space remains when the node receives new data, the data item with the oldest timestamp is discarded.
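A sketch of this storage rule, assuming a buffer whose capacity is counted in records (the paper does not give a storage unit) and a pluggable aggregation step such as AVERAGE or HISTOGRAM aggregation; `DataBuffer` and its methods are hypothetical. The default f = 0.2 matches the 20% threshold above.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Record = Tuple[float, float]  # (timestamp, value)


class DataBuffer:
    """Hypothetical bounded store following the discard/aggregate rule."""

    def __init__(self, capacity: int,
                 aggregate: Callable[[List[Record]], List[Record]],
                 f: float = 0.2) -> None:
        self.capacity = capacity
        self.f = f                    # free-space threshold (20% in the paper)
        self.aggregate = aggregate    # aggregation step applied to stored records
        self.items: Deque[Record] = deque()

    def free_fraction(self) -> float:
        return (self.capacity - len(self.items)) / self.capacity

    def receive(self, record: Record) -> None:
        if len(self.items) >= self.capacity:
            # No free space: discard the record with the oldest timestamp.
            oldest = min(self.items, key=lambda r: r[0])
            self.items.remove(oldest)
        self.items.append(record)
        if self.free_fraction() < self.f:
            # Free space below f: aggregate the stored data to reclaim space.
            self.items = deque(self.aggregate(list(self.items)))
```

Aggregation normally keeps the buffer from filling up; the oldest-first discard only applies when aggregation cannot free enough space.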

7.3. Wireless Communication Schedule

When several nodes within communication range all need to forward data, only one of them can use the wireless channel in each time slot. Which node gets the channel depends on node priority: the nodes exchange their priorities, the node with the highest priority broadcasts its data, and the other neighbor nodes prepare to receive it. A node's priority is determined by the distinctive feature of the data set it holds. Through BEACON messages, a node learns the bit vectors that identify which data its neighbors hold. We define the repetition frequency of data k at node i, denoted by $TF_{k}^{i}$, and the distinctive feature of data set S on node i, denoted by $DF(S)^{i}$.

Definition 6.

Repetition frequency of data k: let $N_{i}$ be the set of neighbor nodes of node i, $BV_{j}$ the bit vector of the data held by node j, and $BV_{j}^{k}$ the entry of that bit vector corresponding to data k (1 if node j holds data k, 0 otherwise). Then

\begin{matrix} TF_{k}^{i} = \frac{\sum_{j \in N_{i}} BV_{j}^{k}}{| N_{i} |} . \end{matrix}

(20)

Definition 7.

Distinctive feature of the data set S on node i, $DF(S)^{i}$:

\begin{matrix} DF(S)^{i} = \frac{\sum_{k \in S} \left( 1 - TF_{k}^{i} \right)}{| S |} . \end{matrix}

(21)
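Equations (20) and (21) and the resulting channel contention can be illustrated with a small Python sketch. Bit vectors are modeled as lists of 0/1 values indexed by data id; the function names, the tie-breaking rule (lowest node id wins), and the convention that a node without neighbors gets TF = 0 are our assumptions. For simplicity the sketch scores all nodes in one place, whereas in the protocol each node computes its own priority and exchanges it with its neighbors.

```python
from typing import Dict, List, Sequence


def repetition_frequency(k: int, neighbor_bvs: Sequence[Sequence[int]]) -> float:
    """TF_k^i, eq. (20): fraction of node i's neighbors whose bit vector marks data k."""
    if not neighbor_bvs:
        return 0.0  # assumption: with no neighbors, data k is not repeated
    return sum(bv[k] for bv in neighbor_bvs) / len(neighbor_bvs)


def distinctive_feature(own_bv: Sequence[int], neighbor_bvs: Sequence[Sequence[int]]) -> float:
    """DF(S)^i, eq. (21): mean of (1 - TF_k^i) over the data k that node i holds."""
    held = [k for k, bit in enumerate(own_bv) if bit]
    if not held:
        return 0.0
    return sum(1 - repetition_frequency(k, neighbor_bvs) for k in held) / len(held)


def channel_winner(bvs: Dict[int, List[int]], neighbors: Dict[int, List[int]]) -> int:
    """Node with the highest DF broadcasts in this slot (ties: lowest node id)."""
    scores = {i: distinctive_feature(bvs[i], [bvs[j] for j in neighbors[i]]) for i in bvs}
    return max(sorted(scores), key=lambda i: scores[i])
```

With three mutually adjacent nodes holding data {0, 1}, {0}, and {1, 2, 3}, the third node scores DF = (0.5 + 1 + 1)/3 ≈ 0.83 against 0.5 for the other two, so it broadcasts first.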

8. Simulation Results and Analysis

In this section, we compare the QoI-DG protocol with the DB-VDG protocol [24] in terms of data gathering effectiveness, bandwidth consumption, and data collection rate. DB-VDG, proposed by Palazzi et al., is a data gathering protocol with deadline constraints for urban vehicular sensor networks. It offers two strategies, SBSS and DBSS; we use SBSS in our comparison because it performs better than DBSS.

8.1. Experimental Setup

We use the Singapore expressway network described in [31] as the experimental scenario. The network has 11 intersections and 28 links (both directions). Its structure is shown in Figure 3.

Figure 3

Singapore expressway.

We use the traffic simulator SUMO to generate vehicle traces. The experiment runs on a discrete-time simulator in which all nodes are synchronized. We adopt a disc communication model: two nodes can communicate when the distance between them is less than 100 m.

The routing guidance requirements are generated as follows.

(1) Enumerate all intersection pairs $(s, d)$, excluding pairs connected by a single road.

(2) Calculate the shortest-path travel time $t_{sd}$ for every intersection pair $(s, d)$.

(3) Randomly choose $Δ_{sd}$ from the interval $((1 + a_{1}) t_{sd}, (1 + a_{2}) t_{sd})$, where $a_{1}$ and $a_{2}$ are constant parameters.

This yields the requirement set $RS$. Table 3 lists the simulation parameters.
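The three-step requirement generation can be sketched as follows, with the road network given as an adjacency list of travel times between intersections. We assume directed edges, Dijkstra's algorithm for computing $t_{sd}$, a uniform draw for $Δ_{sd}$ (the paper only says "randomly"), and that "connected by a single road" means the two intersections are directly adjacent; the function names are hypothetical. Python's `random.uniform` may return an endpoint of the interval, a negligible difference from the open interval above.

```python
import heapq
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # intersection -> [(neighbor, travel time)]


def shortest_times(g: Graph, s: int) -> Dict[int, float]:
    """Dijkstra: shortest travel time from s to every reachable intersection."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def make_requirements(g: Graph, a1: float, a2: float,
                      rng: random.Random) -> Dict[Tuple[int, int], float]:
    """Steps (1)-(3): Delta_sd drawn uniformly from ((1+a1) t_sd, (1+a2) t_sd)."""
    rs: Dict[Tuple[int, int], float] = {}
    for s in g:
        for d, t_sd in shortest_times(g, s).items():
            # Skip s itself and pairs joined directly by one road (our reading of step (1)).
            if d == s or any(v == d for v, _ in g[s]):
                continue
            rs[(s, d)] = rng.uniform((1 + a1) * t_sd, (1 + a2) * t_sd)
    return rs
```

On a three-intersection chain 0-1-2 with travel times 10 and 20, only the pairs (0, 2) and (2, 0) remain, each with $t_{sd} = 30$, so with $a_{1} = 0.2$ and $a_{2} = 0.5$ their deadlines fall between 36 and 45.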

Table 3

Simulation parameters.

Parameter	Value
The number of intersections	18
The number of roads	41
The number of cars	60
Interested area	All roads
Communication radius	100 m

8.2. Analysis of Results

To analyze the performance of the QoI-DG protocol, we introduce two metrics that measure its effectiveness and efficiency.

(i) Effectiveness measures the ability of QoI-DG to gather enough data within the error bound to provide the application service with sufficient quality. We measure it as the percentage of routing guidance results whose QoS requirement is satisfied.

(ii) Efficiency measures the degree of bandwidth optimization. We measure it by the total number of DATA messages.

First, we analyze the effectiveness of the QoI-DG protocol. As shown in Figure 4, the percentage of routing guidance results that satisfy the QoS requirement follows the QoS requirement itself, and QoI-DG nearly reaches the level requested by the user. Figure 4 also shows that the proportion of results with QoS satisfied increases with the constant parameters $a_{1}$ and $a_{2}$: larger values of $a_{1}$ and $a_{2}$ give a larger time budget $Δ_{sd}$, that is, a looser QoS requirement from the user, and hence a higher percentage of results that satisfy it.

Figure 4

Comparisons on QoS requirements and the results of QoI-based routing guidance.

Next, we compare the efficiency of QoI-DG with that of the DB-VDG protocol. Figure 5 shows that QoI-DG requires less communication bandwidth, and that the proportion of reduced bandwidth consumption increases rapidly over time. Comparing AVERAGE aggregation with HISTOGRAM aggregation, we find that both kinds of data aggregation reduce bandwidth consumption to some extent.

Figure 5

Comparisons on communication bandwidth consumption.

Another aspect assessed is the rate of data collection. Figure 6 shows that the proportion of the required data that has been collected increases over time. The collection rate remains relatively stable throughout the data gathering process and is consistently higher than that of the DB-VDG protocol.

Figure 6

Comparisons on the rate of data collection.

9. Conclusion and Future Work

This paper analyzes the shortcomings of current research on data gathering in VANETs and argues that data gathering should be guided by the requirements of ITS applications. We then design a framework for data gathering with QoI constraints, which consists of three steps: extraction of the QoI bound, query propagation, and QoI-based data routing and aggregation. The main mechanism of our solution is to use the QoI bound to control the optimization of data aggregation and routing. The core contribution is the QoI-DG protocol, which uses as little communication bandwidth as possible while satisfying the application requirements specified by users. Simulation experiments demonstrate the effectiveness and efficiency of the QoI-DG protocol in both satisfying user demands and saving bandwidth.

In this paper, we focus on one kind of ITS application, dynamic routing guidance. For future work, it would be interesting to incorporate other ITS applications into the framework; the main difficulty will be extracting QoI bounds that fit all of their demands. Furthermore, delay is a key aspect that needs further research.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC) Grant funded by the Chinese government (no. 61370214 and no. 60803148).

References

1. Jakubiak, Koucheryavy. State of the art and research challenges for VANETs. Proceedings of the 5th IEEE Consumer Communications and Networking Conference (CCNC '08), Las Vegas, Nev, USA, January 2008, pp. 912-916. doi:10.1109/ccnc08.2007.212

2. Beresford, Bacon. Intelligent transportation systems. IEEE Pervasive Computing, 2006, 5(4), pp. 63-67.

3. Wang C.-X., Cheng, Laurenson D. I. Vehicle-to-vehicle channel modeling and measurements: recent advances and future challenges. IEEE Communications Magazine, 2009, 47(11), pp. 96-103. doi:10.1109/MCOM.2009.5307472

4. Cho, Kim S. I., Choi H. K., H. S., Kwak D. Y. Performance evaluation of V2V/V2I communications: the effect of midamble insertion. Proceedings of the 1st International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology (Wireless VITAE '09), Aalborg, Denmark, May 2009, pp. 793-797. doi:10.1109/WIRELESSVITAE.2009.5172551

5. Hartenstein, Laberteaux K. P. A tutorial survey on vehicular ad hoc networks. IEEE Communications Magazine, 2008, 46(6), pp. 164-171. doi:10.1109/MCOM.2008.4539481

6. Rangari A. B., Girish Agrawal P. B. Data aggregation control for VANET - a reviews. International Journal of Engineering Research and Technology, 2013, 2, pp. 556-569.

7. Chang W.-R., Lin H.-T., Chen B.-X. TrafficGather: an efficient and scalable data collection protocol for vehicular ad hoc networks. Proceedings of the 5th IEEE Consumer Communications and Networking Conference (CCNC '08), Las Vegas, Nev, USA, January 2008, pp. 365-369. doi:10.1109/ccnc08.2007.88

8. Delot, Ilarri. Data gathering in vehicular networks: the VESPA experience. Proceedings of the 36th Annual IEEE Conference on Local Computer Networks (LCN '11), Bonn, Germany, October 2011, pp. 797-804. doi:10.1109/LCN.2011.6115554

9. Mathur, Kaul, Gruteser, Trappe. ParkNet: a mobile sensor network for harvesting real time vehicular parking information. Proceedings of the MobiHoc S3 Workshop on MobiHoc S3 (MobiHoc S3 '09), New York, NY, USA: ACM, 2009, pp. 25-28.

10. Nadeem, Dashtinezhad, Liao, Iftode. TrafficView: traffic data dissemination using car-to-car communication. SIGMOBILE Mobile Computing and Communications Review, 2004, 8(3), pp. 6-19.

11. Salhi, Cherif M. O., Senouci S. M. A new architecture for data collection in vehicular networks. Proceedings of the IEEE International Conference on Communications (ICC '09), Dresden, Germany, June 2009, pp. 1-6. doi:10.1109/ICC.2009.5198637

12. Wang, Zhu, Zhang. Compressive sensing based monitoring with vehicular networks. Proceedings of the IEEE INFOCOM, Turin, Italy, April 2013, pp. 2823-2831. doi:10.1109/INFCOM.2013.6567092

13. Wischhof, Ebner, Rohling, Lott, Halfmann. SOTIS - a self-organizing traffic information system. Proceedings of the 57th IEEE Semiannual Vehicular Technology Conference (VTC '03), April 2003, pp. 2442-2446.

14. Zarmehri M. N., Aguiar. Data gathering for sensing applications in vehicular networks (poster). Proceedings of the IEEE Vehicular Networking Conference (VNC '11), Amsterdam, The Netherlands, November 2011, pp. 222-229. doi:10.1109/VNC.2011.6117104

15. Arbabi M. H., Weigle M. C. Using vehicular networks to collect common traffic data. Proceedings of the 6th ACM International Workshop on VehiculAr Inter-NETworking (VANET '09), New York, NY, USA, September 2009, pp. 117-118. doi:10.1145/1614269.1614289

16. Brik, Lagraa, Cherroun, Lakas. Token-based clustered data gathering protocol (TCDGP) in vehicular networks. Proceedings of the 9th International Wireless Communications and Mobile Computing Conference (IWCMC '13), Sardinia, Italy, July 2013, pp. 1070-1074. doi:10.1109/IWCMC.2013.6583705

17. Song, Cuckov. A mobility-aware general-purpose vehicular ad-hoc network clustering scheme. Journal of Information Science and Engineering, 2010, 26(3), pp. 897-911.

18. Soua, Afifi. Adaptive data collection protocol using reinforcement learning for VANETs. Proceedings of the 9th International Wireless Communications and Mobile Computing Conference (IWCMC '13), Sardinia, Italy, July 2013, pp. 1040-1045. doi:10.1109/IWCMC.2013.6583700

19. Yuan, Luo, Yan, Zhao. DA2RF: a data aggregation algorithm by restricting forwarders for VANETs. Proceedings of the International Conference on Computing, Networking and Communications (ICNC '14), Honolulu, Hawaii, USA, February 2014, pp. 393-397. doi:10.1109/ICCNC.2014.6785366

20. Ibrahim, Weigle M. C. Accurate data aggregation for VANETs. Proceedings of the 4th ACM International Workshop on Vehicular Ad Hoc Networks (VANET '07), New York, NY, USA, September 2007, pp. 71-72. doi:10.1145/1287748.1287761

21. Ibrahim, Weigle M. C. CASCADE: cluster-based accurate syntactic compression of aggregated data in VANETs. Proceedings of the IEEE Globecom Workshops (GLOBECOM '08), New Orleans, La, USA, December 2008, pp. 1-10. doi:10.1109/GLOCOMW.2008.ECP.59

22. Lochert, Scheuermann, Mauve. A probabilistic method for cooperative hierarchical aggregation of data in VANETs. Ad Hoc Networks, 2010, 8(5), pp. 518-530. doi:10.1016/j.adhoc.2009.12.008

23. Dietzel, Schoch, Bako, Kargl. A structure-free aggregation framework for vehicular ad hoc networks. Proceedings of the 6th International Workshop on Intelligent Transportation (WIT '09), Hamburg, Germany, March 2009.

24. Palazzi C. E., Pezzoni, Ruiz P. M. Delay-bounded data gathering in urban vehicular sensor networks. Pervasive and Mobile Computing, 2012, 8(2), pp. 180-193. doi:10.1016/j.pmcj.2011.06.008

25. Gong, C.-Z. Catch-up: a data aggregation scheme for VANETs. Proceedings of the 5th ACM International Workshop on VehiculAr Inter-NETworking (VANET '08), New York, NY, USA: ACM, September 2008, pp. 49-57. doi:10.1145/1410043.1410053

26. C.-Z., Guo. Adaptive forwarding delay control for VANET data aggregation. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(1), pp. 11-18. doi:10.1109/TPDS.2011.102

27. Liu, Chigan, Gao. Compressive sensing based data collection in VANETs. Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '13), Shanghai, China, April 2013, pp. 1756-1761. doi:10.1109/WCNC.2013.6554829

28. Aneja Y. P., Aggarwal, Nair K. P. K. Shortest chain subject to side constraints. Networks, 1983, 13(2), pp. 295-302.

29. Brandimarte. Convex optimization. In: Numerical Methods in Finance and Economics: A MATLAB-Based Introduction, New York, NY, USA: John Wiley & Sons, 2006, pp. 327-398. doi:10.1002/9780470080498.ch6

30. Fernandes, Boukerche, Pazzi, Samarah. Efficient data gathering and position dissemination protocols for heterogeneous vehicle ad hoc and sensor networks. Proceedings of the 5th IEEE GCC Conference and Exhibition (GCC '09), Kuwait City, Kuwait, March 2009, pp. 1-4. doi:10.1109/IEEEGCC.2009.5734337

31. Alves, van Ast, Cong, de Schutter, Babuška. Ant colony optimization for traffic dispersion routing. Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems (ITSC '10), Funchal, Portugal, September 2010, pp. 683-688. doi:10.1109/ITSC.2010.5625146