A Secure and Efficient Data Aggregation Framework in Vehicular Sensing Networks

Abstract

Vehicular ad hoc networks support a wide range of promising applications, including vehicular sensing networks, which enable vehicles to cooperatively collect and transmit aggregated traffic data for traffic monitoring. The existing literature mainly focuses on how to achieve data aggregation in a dynamic vehicular environment, while security, especially the authenticity and integrity of aggregation results, has received less attention. In this study, we introduce a basic aggregation scheme that aggregates both the data and the message authentication codes by using syntactic aggregation and cryptographic aggregation. To tolerate duplicate messages and further improve aggregation performance, we introduce a secure probabilistic data aggregation scheme based on the Flajolet-Martin sketch and a sketch proof technique. We also discuss the tradeoff between bandwidth efficiency and estimation accuracy. Extensive simulations and analysis demonstrate the efficiency and effectiveness of the proposed scheme.

1. Introduction

With the advancement of wireless technology, vehicular communication networks, also known as vehicular ad hoc networks (VANETs), are emerging as a promising approach to increase road safety, efficiency, and convenience [1, 2]. Although the primary purpose of vehicular networks is to enable communication-based automotive safety applications, VANETs also support a wide range of promising applications such as traffic monitoring and data collection, which are regarded as important components of future intelligent transportation systems (ITSs). The rising popularity of smartphones with onboard sensors (e.g., GPS, compass, accelerometer) and always-on mobile Internet connections also makes smartphones a candidate platform for large-scale vehicular sensing. Recent reports indicate that smartphone users had surpassed feature phone users in the USA by 2012. According to figures released by IDC, 207.6 million Android and Apple smartphones were shipped in the fourth quarter of 2012. This further supports the feasibility of vehicular sensing.

As shown in [3–10], Departments of Transportation in the USA must collect various types of data (e.g., average speed or traffic density) for traffic monitoring purposes. Traditionally, these data are collected by technologies such as inductive loop detectors (ILDs), video detection systems, acoustic tracking systems, or microwave radar sensors. However, most of these technologies suffer from a high maintenance cost. In contrast, cooperative data collection and dissemination in VANETs allow traffic monitoring to be performed in a more cost-effective way [11]. Specifically, each vehicle collects its own or neighboring information (e.g., its current speed or neighboring traffic) and then transmits it to the remote roadside units (RSUs) via vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. The RSUs can be deployed at various points of interest along the roadway and can be used to collect data from locations up to tens of kilometers away. In this study, we refer to vehicular networks that are designed for traffic sensing and monitoring as vehicular sensing networks.

One of the major challenges of vehicular sensing networks is the high overhead of the transmitted sensing data. Each sensing result consists of spatial-temporal measured values (speed, traffic density), together with the position of the vehicle (i.e., a road segment or a small area) and the observation time. Such sensing data is periodically broadcast. Upon reception of such a broadcast, the intermediate receivers/forwarders incorporate the received data into their local reports and then broadcast them again. Unfortunately, such periodic broadcasting causes a high traffic load or even a broadcast storm. This problem is more serious when the vehicle density is high, as on multilane highways in congestion situations. On the other hand, in most cases, drivers or monitors do not need exact individual reports, but only an overview of the general average speed on the road ahead [12]. This motivates data aggregation in vehicular networks, including Flajolet-Martin sketch based probabilistic aggregation [13], fuzzy aggregation [12], and others [14, 15]. However, most of these works focus on how to achieve data aggregation in a dynamic vehicular environment, while the authenticity and integrity of aggregation results receive less attention. Since the aggregation operation can be performed by any intermediate forwarding vehicle, a malicious attacker can easily attack the data aggregation process by modifying the aggregated result or simply inserting invalid sensing data.

Secure data aggregation is a great challenge in vehicular sensing networks due to their unique network characteristics including highly dynamic network topology, intermittent connectivity, and potentially huge numbers of VANET nodes. These unique characteristics make the secure data aggregation in traditional wireless sensor networks such as [16], which always assume either a static network topology or aggregation structure, unsuitable for vehicular sensing networks.

Therefore, to achieve secure and efficient data sensing and collection, we present SAS, a secure data aggregation scheme for vehicular sensing networks that consists of a basic scheme and an advanced scheme. The basic scheme achieves efficient data and MAC authentication via syntactic aggregation and cryptographic aggregation. However, the basic scheme needs to keep the original sensing data, which prevents more efficient data aggregation. Further, it cannot handle duplicate messages. To overcome these problems, we propose an advanced scheme based on the Flajolet-Martin sketch and a series of sketch proof techniques. We also discuss the tradeoff between bandwidth efficiency and estimation precision. Finally, extensive simulations and analysis demonstrate the efficiency and effectiveness of the proposed scheme.

The remainder of the paper is organized as follows. In Section 2, we introduce the related work. In Section 3, we present the system model and the design goals. In Section 4, we present some preliminaries. In Section 5, we present a secure data aggregation scheme in vehicular sensing networks by using the syntactic aggregation and cryptographic aggregation approach. In Section 6, we propose a probabilistic data aggregation scheme. Performance analysis is given in Section 7, followed by the conclusion in Section 8.

2. Related Work

Vehicular sensing networks represent a promising way to cooperatively collect useful information in order to increase road safety and driver convenience for future intelligent transportation system. By being integrated with the traditional digital map system, vehicular sensing networks provide the functionality of real-time automatic route scheduling [14], decentralized free parking places discovery [15], traffic monitoring [3], and so forth. In these applications, data aggregation is necessary for efficient data propagation and reduced transmission overhead.

There are quite a few research proposals for data aggregation in vehicular sensing networks [14, 15]. Most of them are based on group formation and vehicle clustering, which can dramatically reduce the communication overhead due to the increased aggregation level. In addition to the above proposals, structure-free aggregation frameworks that do not define aggregation structures have also been proposed, including Flajolet-Martin sketch-based aggregation [13] and fuzzy aggregation [12]. However, the aforementioned studies focus on the data aggregation itself and do not take security issues into consideration.

The most related research study for secure data aggregation in VANETs is the voting scheme, including [17, 18], which involves multiple vehicles to collect information towards a specific event (e.g., collision or traffic jam). Each witness (or observer) of this event will submit a message to a group leader. The group leader will take the responsibility of collecting more than a threshold k of proofs from k distinct witnesses to prove the validity of an emergency event by the voting scheme. References [17, 18] discuss how to further improve the aggregation efficiency by exploiting cryptographic tools such as onion signature [18] and aggregate signature [17]. Note that, in this study, we consider a more general data aggregation scenario: collecting data within a certain area and, at the same time, providing security guarantee for the aggregation functionality.

3. System Model and Design Goal

This section describes our system model, attack model, security assumptions, and design goals.

3.1. Network Model

In this paper, we consider a general vehicular sensing network model, which is mainly comprised of three components: traffic monitoring centre (TMC), RSUs, and vehicles. As shown in Figure 1, RSUs could be selectively deployed at some positions (e.g., intersections) to collect the traffic information (e.g., average speed) within a certain area. Due to high maintenance cost, RSUs could be only deployed intermittently to reduce the deployment cost. We assume that each vehicle, which is equipped with an on-board unit (OBU), has the capability of data collecting and reporting. The transmitted sensing data are propagated via V2V and V2I communications to the RSUs, which then forward them to the TMC. SAS is based on the distributed aggregation model similar to [13], which does not require any group/cluster formulation.

Figure 1: Overview of vehicular sensing network.

3.2. Security Assumptions

We assume that each OBU either shares a distinct secret symmetric key with TMC or obtains a public/private key pair, which is issued by TMC. Whether using shared secret key or public key depends on different system requirements.

3.3. Attack Model

In this study, we assume that the TMC and RSUs are trusted while vehicles (including the sensing vehicles and aggregator vehicles) are potentially malicious and can thus launch various attacks including fabricating, duplicating, and computing the aggregation incorrectly. We do not consider denial-of-service attacks where aggregator vehicles fail to or refuse to provide any acceptable result. A malicious sensor can always report an arbitrary sensing report, which fundamentally cannot be prevented. So we do not aim at preventing such an attack.

3.4. Design Goals

(i) Security Goal. The security goal of SAS is to enable the TMC to verify whether an aggregate sensing report is correct or not. Specifically, TMC should accept a reported aggregate report if and only if it is equal to the output of a correct execution of the aggregation function over all of the sensing reports provided by the qualified vehicles in the most recent epoch.

(ii) Efficiency and Effectiveness Goal. The efficiency goal of SAS is to minimize the transmission overhead and, at the same time, to ensure a certain sensing accuracy. However, computational cost is not a major concern of this paper since VANET is generally assumed to have unlimited computational capability [17].

4. Preliminaries

4.1. One-Way Chains and MAX Protocols

A one-way chain is a widely used cryptographic primitive based on a one-way function F and a secret seed s. The one-way function F is easy to compute but computationally infeasible to invert. The chain is the sequence of values $F (s)$ , $F (F (s))$ , $F (F (F (s)))$ , …. Throughout this paper, we use $F^{x} ()$ to denote applying the function F recursively x times; thus, the xth value of the sequence is $F^{x} (s)$ . For example, given two positive integers m and n, where $m < n$ , it is easy to compute $F^{n} (s)$ by applying F to $F^{m} (s)$ for $(n - m)$ times. However, it is infeasible to compute $F^{m} (s)$ backward from $F^{n} (s)$ . One-way chains have been widely used in security applications such as micropayment. Recently, in [16], the authors take advantage of one-way chains to construct a MAX protocol, which ensures that the aggregated maximum cannot be inflated or deflated. However, the MAX protocol is not designed for probabilistic aggregation. Further, the network considered in [16] is a sensor network with a static topology. In SAS, we consider a dynamic network topology and a probabilistic aggregation model.
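The forward-only property of the chain can be illustrated with a short sketch. It assumes SHA-256 as the one-way function F, which is a choice made here for illustration and not one fixed by the scheme; the seed and the indices m and n are hypothetical.

```python
import hashlib

def F(value: bytes) -> bytes:
    """One-way function F, instantiated with SHA-256 (an assumption)."""
    return hashlib.sha256(value).digest()

def chain(value: bytes, steps: int) -> bytes:
    """Apply F to `value` `steps` times, i.e. compute F^steps(value)."""
    for _ in range(steps):
        value = F(value)
    return value

seed = b"secret seed s"          # hypothetical seed
m, n = 3, 7                      # hypothetical chain indices with m < n
f_m = chain(seed, m)             # F^m(s)
f_n = chain(seed, n)             # F^n(s)

# Anyone holding F^m(s) can roll it forward (n - m) times to reach F^n(s);
# going from F^n(s) back to F^m(s) would require inverting F.
rolled = chain(f_m, n - m)
```

Here `rolled` equals `f_n`, which is the forward direction that the MAX protocol relies on.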

4.2. Pairing Technique

The proposed basic scheme is based on bilinear pairing, which is briefly introduced below. Let 𝔾 be a cyclic additive group and $𝔾_{T}$ a cyclic multiplicative group of the same prime order q; that is, $| 𝔾 | = | 𝔾_{T} | = q$ . Let g be a generator of 𝔾 and $e : 𝔾 \times 𝔾 \to 𝔾_{T}$ an efficient admissible bilinear map with the following properties:

(i) bilinear: for $a, b \in ℤ_{q}^{*}$ , $e (g^{a}, g^{b}) = e (g, g)^{a b}$ ;

(ii) nondegenerate: $e (g, g) \neq 1$ .

4.3. Aggregate Signature and Batch Verification

The major computation cost for authenticating an emergency message comes from verifying a set of supporting signatures issued by different emergency witnesses. The corresponding public key certificates of the signers also need to be verified. Together, these incur a significant amount of transmission and verification cost. In this study, we use aggregate signatures to reduce the transmission cost of supporting signatures and certificates, and batch verification to realize efficient signature verification.

An aggregate signature is a digital signature that supports aggregation of n distinct signatures issued by n distinct signers to a single short signature [19]. This single signature (and the n original messages) will convince the verifier that the n signers indeed sign the n original messages. In addition to the benefit of the reduced transmission size, aggregate signature technique supports batch verification, which enables the receivers to quickly verify a set of digital signatures on different messages by different signers. In this study, we adopt the aggregate signature and batch verification introduced in [20] as our basic cryptographic aggregation technique to improve the aggregation performance.

5. A General Secure Data Aggregation Framework in Vehicular Sensing Networks

In this section, we introduce a general data aggregation framework in vehicular sensing networks by using the syntactic aggregation and cryptographic aggregation approach.

5.1. System Setup

The TMC generates a tuple $(q, g, 𝔾, 𝔾_{T}, e)$ as the system parameters. The TMC selects a random $sk \in ℤ_{q}^{*}$ as its secret key and computes its public key $pk = g^{sk}$ . It also selects four hash functions: $H : \{0,1\}^{*} \to 𝔾$ , $H_{1} : \{0,1\}^{*} \to 𝔾$ , $H_{2} : \{0,1\}^{*} \to 𝔾$ , and $H_{3} : \{0,1\}^{*} \to ℤ_{q}$ . The group public key and secret key are $(q, g, 𝔾, 𝔾_{T}, e, pk, H, H_{1}, H_{2}, H_{3})$ and $sk$ , respectively.

An important task of the setup procedure is to determine the format of emergency report message. In our study, the format of a secure sensing report (SSR) is defined as follows. For a sensed event, the sensor vehicle i will generate an SSR:

$$SSR_{i} = (ID_{i}, Type\#, v_{i}, Loc\#, epoch\#, MAC_{i}, Cert_{i}), \qquad (1)$$

where

$ID_{i}$ denotes the identity of the vehicle that generates the claim.

$Type\#$ denotes the type of SSR reported in this report.

$v_{i}$ denotes the sensing value provided by i.

$Loc\#$ denotes the sensing area.

$epoch\#$ denotes the sensing period.

$MAC_{i}$ denotes the message authentication code generated by vehicle i on this SSR. It has two modes: symmetric key mode (Mode I) or public key mode (Mode II).

$Cert_{i}$ denotes the certificate held by vehicle i.

For a specific event, it is reasonable to assume that the relevant SSRs will share the same $Type #$ , $Loc #$ , and $epoch #$ .

5.2. Registration

A vehicle can join the network by performing one of the following steps, depending on whether Mode I or Mode II is used.

(1) Private Key Generation for Mode I. In the symmetric key mode, a vehicle i randomly chooses $x_{i}$ as its secret key, which it shares with the TMC.

(2) Private/Public Key Generation for Mode II. In the public key mode, a vehicle i randomly chooses $x_{i} \in ℤ_{q}^{*}$ as its secret key and generates its public key $X_{i} = g^{x_{i}}$ . After verifying the legitimacy of this vehicle, the TMC issues the public key certificate by signing $(i, X_{i})$ . The certificate generation follows the Boneh, Lynn, and Shacham signature scheme in [19]: the TMC computes $h_{i} \leftarrow H (i || X_{i})$ and $σ_{i} \leftarrow h_{i}^{sk}$ , and $Cert_{i} = (i, X_{i}, σ_{i})$ is the public key certificate of i. To verify a certificate $Cert_{i}$ , the verifier computes $h_{i} \leftarrow H (i || X_{i})$ and accepts if $e (σ_{i}, g) = e (h_{i}, pk)$ .
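The correctness of the certificate check follows from bilinearity alone. The short derivation below assumes that the certificate is the TMC's signature under its own secret key, that is, $σ_{i} = h_{i}^{sk}$ with $pk = g^{sk}$ , which is the form the verification equation requires:

$$e (σ_{i}, g) = e (h_{i}^{sk}, g) = e (h_{i}, g)^{sk} = e (h_{i}, g^{sk}) = e (h_{i}, pk) .$$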

5.3. SSR Generation and Broadcasting

Once an event is sensed by one or multiple vehicles and the observation is $(Type\#, Loc\#, epoch\#)$ , the sensing vehicles $\{i \mid i = 1, 2, \dots\}$ may independently generate their SSRs as follows.

(1) Mode I SSR Generation. In terms of Mode I SSR generation, given the type and observation time of the emergency message $TL = Type\# || epoch\#$ as well as the location information $ℓ = Loc\#$ , a witness vehicle i with its private key $x_{i}$ could compute the message authentication code as follows:

$$MAC_{i} = H (x_{i}, i, Type\#, Loc\#, epoch\#) . \qquad (2)$$

Thus, $(i, Type\#, Loc\#, epoch\#, MAC_{i})$ constitutes an SSR claim generated by vehicle i towards the sensing event. After that, i will broadcast this SSR to its neighbors.

(2) Mode II SSR Generation. For Mode II SSR, given the type and observation time of the emergency message $TL = Type\# || epoch\#$ as well as the location information $ℓ = Loc\#$ , a witness vehicle i with its public and private key pair $(X_{i}, x_{i})$ can compute $w_{i} \leftarrow H_{3} (TL || ℓ)$ , $a \leftarrow H_{1} (ℓ)$ , and $b \leftarrow H_{2} (ℓ)$ and generate the signature $MAC_{i} = a^{x_{i}} b^{x_{i} w_{i}}$ . Thus, $(i, Type\#, Loc\#, epoch\#, MAC_{i}, Cert_{i})$ constitutes an SSR claim generated by vehicle i towards the sensing event. After that, i will broadcast this SSR to its neighbors.

A single SSR verification can be performed as follows: given $SSR = (i, Type\#, Loc\#, epoch\#, MAC_{i}, Cert_{i})$ , the verifier first checks the validity of the certificate included in this SSR. It then checks the validity of the supporting signature by computing $w_{i} \leftarrow H_{3} (TL || ℓ)$ , $a \leftarrow H_{1} (ℓ)$ , and $b \leftarrow H_{2} (ℓ)$ . Since the verifier does not hold $x_{i}$ , the SSR is accepted if $e (MAC_{i}, g) = e (a, X_{i}) \times e (b, X_{i}^{w_{i}})$ , which holds exactly when $MAC_{i} = a^{x_{i}} b^{x_{i} w_{i}}$ .
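A verifier that holds only the public key $X_{i} = g^{x_{i}}$ can check the relation $MAC_{i} = a^{x_{i}} b^{x_{i} w_{i}}$ through the pairing. The derivation below is a sketch that uses only the bilinearity of e and shows the publicly checkable form of this test:

$$\begin{aligned} e (MAC_{i}, g) &= e (a^{x_{i}} b^{x_{i} w_{i}}, g) \\ &= e (a, g)^{x_{i}} \, e (b, g)^{x_{i} w_{i}} \\ &= e (a, g^{x_{i}}) \, e (b, (g^{x_{i}})^{w_{i}}) \\ &= e (a, X_{i}) \, e (b, X_{i}^{w_{i}}) . \end{aligned}$$

Multiplying this identity over $i = 1, \dots, n$ yields the batch equation used for signature batch verification in Section 5.5.2.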

5.4. SSR Opportunistic Forwarding

In VANETs, the network topology could be very dynamic and diversified in shape from time to time, even sometimes sparse and frequently partitioned. The communication between vehicles is expected to be performed in an opportunistic manner. This means a vehicle can carry packets when routes do not exist but forward the packets to the new receivers when they move into its vicinity [21]. To enable the opportunistic data propagation, vehicles that are within a range r and maintain connectivity for a minimum time t with each other can be arranged to form a cluster. The detailed discussion on cluster creation and maintenance can be found in [21]. We refer to the node at the head of every cluster as header, which is responsible for forwarding the data to the next cluster in a typical opportunistic data forwarding algorithm such as [21, 22]. The messages will be buffered at the header until they are forwarded to the next cluster, which is also referred to as the “Carry and forward” strategy. In this study, it is considered that the header can also play the role of emergency message aggregator because of the following two reasons. (1)

If taking a header of a cluster as the aggregator, the aggregation process will be merged into a part of data forwarding process. Therefore, there is no need to elect another cluster head to perform the data aggregation operations.

(2)

The process of message propagation between two clusters is referred to as a catch-up process, where a message traverses along with its carrying vehicles until it reaches within the radio range of the vehicle at the end of another cluster, which obviously presents a considerable propagation interval depending on the speed of vehicles and the gap between clusters. Therefore, we can use such an interval to aggregate the related emergency messages to minimize the aggregation latency.

In the following sections, a cluster head will be taken as the aggregator of the cluster, which will perform the following SSR aggregated authentication algorithm.

5.5. SSR Secure Aggregation

For any specific emergency event, each aggregator maintains two local message lists, which keep the forwarded SSRs and the ReadytoForward SSRs, respectively. The forwarded message list, denoted as ℱ, contains all the SSRs that have been forwarded by this vehicle before, while the ReadytoForward message list, denoted as ℛ, stores messages that have not been transmitted yet but can be forwarded later. The SSR set $ℱ \cup ℛ$ includes all the SSRs related to a specific event. Whenever it receives an SSR, the aggregator checks whether this SSR is a duplicate. If so, the SSR is dropped; otherwise it is put into the message list ℛ. Before forwarding, the aggregator performs the SSR aggregation (Aggregate_SSR) and SSR batch verification (BatchVerify_SSR) operations as follows.

5.5.1. SSR Aggregation

Aggregate_SSR is used to aggregate multiple SSRs into a single SSR, which includes two steps: syntactic aggregation step and cryptographic aggregation step. (i)

Syntactic Aggregation. For a specific event, given n SSRs $(i, Type #, Loc #, epoch #, MA C_{i}, Cer t_{i})$ by vehicles $1, \dots, n$ , we can obtain syntactically aggregated SSR as $SS R_{agg} = (1, \dots, n, Type #, Loc #, epoch #, MA C_{1}, \dots, MA C_{n}, Cer t_{1}, \dots, Cer t_{n})$ .

(ii)

MAC Aggregation. It is used to aggregate multiple MACs into a single MAC, which includes the following two modes: Mode I and Mode II.

(1) Mode I Aggregation. Mode I aggregation is

$$MAC_{agg} = H (x_{1}, 1, Type\#, Loc\#, epoch\#) \otimes \dots \otimes H (x_{n}, n, Type\#, Loc\#, epoch\#), \qquad (3)$$

where ⊗ can be the XOR operation.

(2) Mode II Aggregation. Mode II aggregation includes the certificate aggregation $Cert_{agg} \leftarrow (i, X_{i}, σ_{agg})$ with $σ_{agg} \leftarrow \prod_{i = 1}^{n} σ_{i}$ , and the MAC aggregation $MAC_{agg} \leftarrow \prod_{i = 1}^{n} MAC_{i}$ .

After syntactic aggregation and cryptographic aggregation, we obtain the aggregated SSR as $SSR_{agg} = (1, \dots, n, Type\#, Loc\#, epoch\#, MAC_{agg}, Cert_{agg})$ .

5.5.2. SSR Batch Verification

In this section, we exploit batch verification to further reduce the computational cost.

(i) Mode I Verification. For Mode I verification, the TMC, which holds the shared keys $x_{1}, \dots, x_{n}$ , verifies the sensing reports by recomputing the aggregated MAC and checking the following equation:

$$MAC_{agg} = H (x_{1}, 1, Type\#, Loc\#, epoch\#) \otimes \dots \otimes H (x_{n}, n, Type\#, Loc\#, epoch\#) . \qquad (4)$$

(ii) Mode II Verification. For Mode II verification, the TMC performs certificate batch verification as well as signature batch verification.

(1) Certificate Batch Verification. Given an aggregated certificate $Cert_{agg} \leftarrow (i, X_{i}, σ_{agg})$ , the verifier accepts if $e (\prod_{i = 1}^{n} σ_{i}, g) = e (\prod_{i = 1}^{n} h_{i}, pk)$ holds.

(2) Signature Batch Verification. Given $MAC_{agg}$ , the message set $\{SSR_{i} \mid 1 \leq i \leq n\}$ , and the public keys $\{X_{i} \mid 1 \leq i \leq n\}$ of all the vehicles in set 𝒱, the verifier accepts if $e (MAC_{agg}, g) = e (a, \prod_{i = 1}^{n} X_{i}) \times e (b, \prod_{i = 1}^{n} X_{i}^{w_{i}})$ .

If the batch verification holds, the aggregator will accept SSRs in list ℛ as valid SSRs. Then the aggregated SSR in ℛ will be forward propagated. Meanwhile, the aggregator will put all the SSRs in ℛ to message list ℱ.
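The Mode I aggregation in (3) and the matching check in (4) can be sketched in a few lines. The sketch assumes SHA-256 as the hash H and XOR as ⊗; the keys, vehicle IDs, and field values are hypothetical, and the length-prefixed field encoding is a choice made here, not part of the scheme.

```python
import hashlib
import hmac
from functools import reduce

def mac(key: bytes, vehicle_id: int, type_no: str, loc_no: str, epoch_no: str) -> bytes:
    """MAC_i = H(x_i, i, Type#, Loc#, epoch#), with H taken as SHA-256 (assumption)."""
    h = hashlib.sha256()
    for field in (key, str(vehicle_id).encode(), type_no.encode(),
                  loc_no.encode(), epoch_no.encode()):
        # Length-prefix every field so that field boundaries are unambiguous.
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def aggregate(macs) -> bytes:
    """MAC_agg = MAC_1 (x) ... (x) MAC_n with (x) = XOR, as in (3)."""
    return reduce(xor_bytes, macs)

# Hypothetical keys that each vehicle shares with the TMC.
keys = {1: b"x1", 2: b"x2", 3: b"x3"}
event = ("speed", "segment-12", "epoch-7")

# Vehicle side: individual MACs; aggregator side: one aggregated MAC.
macs = [mac(keys[i], i, *event) for i in sorted(keys)]
mac_agg = aggregate(macs)

# TMC side: recompute the right-hand side of (4) from the shared keys.
expected = aggregate(mac(keys[i], i, *event) for i in sorted(keys))
valid = hmac.compare_digest(mac_agg, expected)

# A MAC that is folded in twice cancels out under XOR.
with_duplicate = aggregate(macs + [macs[0]])
```

Here `valid` is true for the honest aggregate, while `with_duplicate` equals the aggregate of vehicles 2 and 3 only, which shows that XOR aggregation is sensitive to duplicated reports.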

However, the solution proposed above faces two problems. First, it needs to carry the original input of each sensing node for future verification, because MAC authentication requires the original input. Second, duplicate messages must be carefully removed before aggregation; otherwise they will be aggregated several times. Duplicates are difficult to prevent in a VANET, which is a highly dynamic and distributed environment. In the next section, we introduce a probabilistic data aggregation scheme that automatically filters duplicate messages.

6. A Probabilistic Data Aggregation Scheme for Vehicular Sensing Networks

In this section, we firstly introduce the concept of FM sketch, which is the foundation of probabilistic data aggregation in vehicular networks. We then propose a secure data aggregation scheme based on our proposed sketch proof technique.

6.1. FM Sketches-Based Data Aggregation in VANETs

A Flajolet-Martin sketch (or “FM sketch”) is a data structure for probabilistic counting of distinct elements that has been introduced in [23]. FM sketch represents an approximation of a positive integer by a bit field $s = s_{1}, \dots, s_{w}$ of length w, where $w \geq 1$ . The bit field is initialized to zero at all positions. To add an element x to the sketch, it is hashed by a hash function h with geometrically distributed positive integer output, where $P (h (x) = i) = 2^{- i}$ . The entry $s_{h (x)}$ is then set to one. After processing all objects, FM finds the first bit of the sketch that is still 0. Let the position of this bit be k; then the number of distinct objects is estimated as $n = 1.29 \times 2^{k}$ .
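A single FM sketch can be sketched as follows. The code assumes SHA-256 as the underlying hash and derives the geometrically distributed bit from the number of trailing zero bits of the digest; it uses zero-based bit indices, so that the first-zero index k plugs directly into the estimate $1.29 \times 2^{k}$ . The sketch length and element names are hypothetical.

```python
import hashlib

W = 32  # sketch length w (an assumed value)

def bit_index(element: str) -> int:
    """Zero-based index j with P(j) = 2^-(j+1): trailing zero bits of a hash."""
    h = int.from_bytes(hashlib.sha256(element.encode()).digest(), "big")
    if h == 0:
        return W - 1
    return min((h & -h).bit_length() - 1, W - 1)

def add(sketch: int, element: str) -> int:
    """Set the bit selected by the hash of `element`."""
    return sketch | (1 << bit_index(element))

def first_zero(sketch: int) -> int:
    """Zero-based index of the lowest bit that is still 0."""
    k = 0
    while (sketch >> k) & 1:
        k += 1
    return k

def estimate(sketch: int) -> float:
    """Estimated number of distinct elements, 1.29 * 2^k."""
    return 1.29 * 2 ** first_zero(sketch)

sketch = 0
for vehicle in range(500):
    sketch = add(sketch, f"vehicle-{vehicle}")

# Re-adding an element that is already present leaves the sketch unchanged.
again = add(sketch, "vehicle-42")
```

Because an element always maps to the same bit, `again` is identical to `sketch`, which is the duplicate insensitivity that the aggregation scheme builds on.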

The variance of n is quite significant [13], and thus the approximation is not very accurate. To overcome this, instead of using only one sketch, a set of sketches can be used to represent a single value, trading memory for accuracy. The respective technique is called probabilistic counting with stochastic averaging (PCSA) in [23]. With PCSA, each added element is first mapped to one of the sketches by using a uniformly distributed hash function and is then added there. If m sketches $S_{1}, \dots, S_{m}$ are used and $a_{1}, a_{2}, \dots, a_{m}$ are the positions of the first 0 in the m sketches, respectively, the estimate for the total number of distinct items added is given by $n = 1.29 \times m \times 2^{k_{a}}$ , where $k_{a} = (1 / m) \sum_{i = 1}^{m} a_{i}$ . The factor m accounts for the fact that each sketch receives only about $1 / m$ of the elements.

Sketches can be merged to obtain the total number of distinct elements added to any of them by a simple bitwise OR. Important here is that, by their construction, repeatedly combining the same sketches or adding already present elements again does not change the results, no matter how often or in which order these operations occur. FM sketch summaries are naturally composable: simply OR-ing independently built bitmaps (e.g., over data sets $a_{1}$ and $a_{2}$ ) for the same hash function gives precisely the sketch of the union of the underlying sets (i.e., $a_{1} \cup a_{2}$ ). This makes FM sketches ideally suited for VANET aggregation.
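The merge property can be checked with a small PCSA sketch. The code below assumes SHA-256 as the hash, m = 16 sketches, and zero-based bit indices; its estimator multiplies by m, as in the standard PCSA formulation, because each sketch sees only about 1/m of the elements. All element names are hypothetical.

```python
import hashlib

M = 16  # number of sketches m (assumed; a power of two keeps the split simple)
W = 32  # bits per sketch (assumed)

def _hash(element: str) -> int:
    return int.from_bytes(hashlib.sha256(element.encode()).digest(), "big")

def add(sketches: list, element: str) -> None:
    h = _hash(element)
    bucket, rest = h % M, h // M   # low bits pick the sketch, the rest pick the bit
    index = (rest & -rest).bit_length() - 1 if rest else W - 1
    sketches[bucket] |= 1 << min(index, W - 1)

def build(elements) -> list:
    sketches = [0] * M
    for element in elements:
        add(sketches, element)
    return sketches

def merge(a: list, b: list) -> list:
    """Bitwise OR of corresponding sketches."""
    return [x | y for x, y in zip(a, b)]

def first_zero(sketch: int) -> int:
    k = 0
    while (sketch >> k) & 1:
        k += 1
    return k

def estimate(sketches: list) -> float:
    k_a = sum(first_zero(s) for s in sketches) / M
    return 1.29 * M * 2 ** k_a

seen_by_a = [f"vehicle-{i}" for i in range(0, 300)]
seen_by_b = [f"vehicle-{i}" for i in range(200, 500)]  # overlaps A on 100 vehicles
sketch_a, sketch_b = build(seen_by_a), build(seen_by_b)
merged = merge(sketch_a, sketch_b)
```

Merging the two overlapping observations gives exactly the sketch of the 500 distinct vehicles, and merging `sketch_a` into `merged` a second time changes nothing.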

For the purpose of discussion, let us consider a specific application. Assume that we are interested in monitoring the average speed within a certain area. As the first step, we use a sketch for each road segment and approximate the sum of speeds of vehicles within this road segment. For the second step, we will calculate the average speed by dividing the speed sum by the number of vehicles involved. In the following sections, we will discuss how to generate the sketch proof and secure sketch aggregation.

6.2. Sketch Proof Generation

According to the FM sketch definition, given the ID i and speed $v_{i}$ , a vehicle may add the tuples $(i, 1), \dots, (i, v_{i})$ to the sketch by hashing them and setting the respective bit position $h (i, 1), \dots, h (i, v_{i})$ to 1. The malicious attackers may launch two kinds of attacks towards the FM sketch: inflation attack and deflation attack.

We start from three basic pieces of information that each sensor generates in our protocol. Let $Λ^{i} = \{ℓ_{1}, \dots, ℓ_{v_{i}}\}$ denote the $v_{i}$ 1-bit positions generated by i. Given that $ψ_{i}$ is the position of the first 0-bit, $Λ^{i}$ can be represented as the union of two subsets $Λ_{ψ_{i}}^{i} = \{1, \dots, ψ_{i} - 1\}$ and $\bar{Λ_{ψ_{i}}^{i}} = \{ℓ_{ψ_{i}}, \dots, ℓ_{v_{i}}\}$ , where $ℓ_{ψ_{i}}$ represents the first 1-bit position larger than $ψ_{i}$ . Thus, each vehicle i generates the following:

(1) $s_{i}^{+} = \{i, ψ_{i}, Loc\#, epoch\#, MAC_{K_{i}} (ω || Loc\# || epoch\#) \mid ω \in Λ_{ψ_{i}}^{i}\}$ , which is called vehicle i's inflation-free proof. Here, $Loc\#$ and $epoch\#$ refer to the road segment number and time slot number, respectively.

(2) $s_{i}^{-} = MAC_{K_{i}} (Loc\# || epoch\#)$ , which is called vehicle i's deflation-free proof. This is basically the authentication code generated by the vehicle on the common information $Loc\#, epoch\#$ .

(3) $s_{i}^{\times} = \{\bar{Λ_{ψ_{i}}^{i}}, MAC_{K_{i}} (ω || Loc\# || epoch\#) \mid ω \in \bar{Λ_{ψ_{i}}^{i}}\}$ , which is called vehicle i's supplement security proof.
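The three proofs can be sketched for a single vehicle as follows. The sketch assumes HMAC-SHA-256 as the MAC, SHA-256 for the sketch hash h, one-based bit positions, and a simple string encoding of the MAC inputs; the key, ID, speed, and labels are hypothetical.

```python
import hashlib
import hmac

def position(vehicle_id: int, j: int) -> int:
    """One-based bit position h(i, j) with P(position = p) = 2^-p."""
    h = int.from_bytes(hashlib.sha256(f"{vehicle_id}:{j}".encode()).digest(), "big")
    if h == 0:
        return 256
    return (h & -h).bit_length()  # trailing zero bits + 1

def mac(key: bytes, *fields) -> bytes:
    """MAC_K(f1 || f2 || ...), instantiated with HMAC-SHA-256 (assumption)."""
    message = "||".join(str(f) for f in fields).encode()
    return hmac.new(key, message, hashlib.sha256).digest()

def make_proofs(key: bytes, vehicle_id: int, speed: int, loc: str, epoch: str):
    # 1-bit positions set by the tuples (i, 1), ..., (i, v_i).
    ones = {position(vehicle_id, j) for j in range(1, speed + 1)}
    psi = 1
    while psi in ones:
        psi += 1                                   # position of the first 0-bit
    low = [p for p in sorted(ones) if p < psi]     # positions 1 .. psi - 1
    high = [p for p in sorted(ones) if p > psi]    # 1-bits above the first 0-bit
    s_plus = (vehicle_id, psi, loc, epoch, {p: mac(key, p, loc, epoch) for p in low})
    s_minus = mac(key, loc, epoch)
    s_times = {p: mac(key, p, loc, epoch) for p in high}
    return s_plus, s_minus, s_times

s_plus, s_minus, s_times = make_proofs(b"K_7", 7, 60, "segment-12", "epoch-7")
```

In this layout the inflation-free proof carries one MAC per position below the first 0-bit, and the supplement proof keeps the MACs of the scattered 1-bits above it so that they are available after a merge.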

In the following, we will introduce these three security proofs one by one.

6.2.1. Inflation-Free Proof

Inflation-free proof is basically the authentication code generated by the vehicles on the 1-bit positions, which are smaller than the position of first 0. To prevent the inflation attacks, it is sufficient to require that each 1-bit, whose position is less than $ψ_{i}$ , should be authenticated by a single signed value from one of the sensing vehicles that turn it on. We define two extra operations for inflation-free proofs. (i)

Merging Operation ⊕. Consider two sketches $Λ^{i}$ and $Λ^{j}$ (for simplicity of presentation, we assume $ψ_{i} > ψ_{j}$ ). Let $ψ_{m}$ be the globally maximum position of the first 0-bit after sketch merging, which corresponds to the new $Λ_{ψ_{m}} = \{1, \dots, ψ_{m} - 1\}$ and $\bar{Λ_{ψ_{m}}} = (Λ^{i} \cup Λ^{j}) \setminus Λ_{ψ_{m}}$ . We define

$$\oplus_{ω = i, j} s_{ψ_{ω}}^{+} = s_{ψ_{i}}^{+} \cup s_{i}^{\times} (Λ_{ψ_{m}}) \cup s_{j}^{\times} (Λ_{ψ_{m}}), \qquad (5)$$

where $s_{i}^{\times} (Λ_{ψ_{m}})$ is the operation that picks up all the supplement security proofs whose positions are less than $ψ_{m}$ . In other words, to generate the inflation-free proof for the merged sketches, the aggregator first picks up the inflation-free proof $s_{ψ_{i}}^{+}$ of the sketch with the higher 0-bit position $ψ_{i}$ . For the remaining 1-bit positions $ψ_{i}, \dots, ψ_{m} - 1$ , the aggregator picks up the proofs from either $s_{i}^{\times}$ or $s_{j}^{\times}$ . Note that, if a 1-bit is authenticated by multiple MACs generated by multiple vehicles, the aggregator can choose the inflation-free proof of the vehicle with the lower ID.

(ii)

Aggregation Operation ⊗. The MACs of $s_{i}^{+}$ can be further aggregated. For example, if the MAC is generated by a symmetric key-based hash function (e.g., MD5 or SHA-1), then ⊗ can be a simple XOR; if the MACs are signatures, ⊗ can be achieved by using an aggregate signature technique such as [19].

With the merging operation and the aggregation operation, the size of the inflation-free proof can be reduced to $| ID | \times N_{1 - bit} + | MAC |$ , where $| ID |$ and $| MAC |$ refer to the size of a vehicle ID and a MAC, respectively, and $N_{1 - bit}$ denotes the number of 1-bits.

6.2.2. Deflation-Free Proof

A deflation attack is one in which malicious aggregators try to turn 1-bits into 0-bits by removing the corresponding MACs from the security proofs. To prevent deflation attacks, SAS adopts the hash-chain-based MAX protocol introduced in [16]. The basic idea is to construct one-way chains whose seeds are the $s_{i}^{-}$ . Specifically, given the one-way function $F ()$ , vehicle node $i$ reports $F^{ψ_{0}} (s_{i}^{-})$ to the aggregator. In the case of multiple sketch aggregation, let $ψ_{m}$ be the maximum position observed by the aggregator. The aggregator can obtain $F^{ψ_{m}} (s_{i}^{-})$ by applying $F$ to $F^{ψ_{0}} (s_{i}^{-})$ a further $ψ_{m} - ψ_{0}$ times. After all the $F^{ψ_{m}} (s_{i}^{-})$ have been obtained, a new operation introduced in [16] is used to reduce the transmission cost, as follows.

(i) Hash Chain Folding Operation ⊙. The aggregator could use the folding function ⊙ to fold all the hash chains into a single value $⊙ F^{ψ_{m}} (s_{i}^{-})$ . Because $F$ is a one-way function, it is infeasible for attackers to generate a new security proof for $ψ_{i} < ψ_{m}$ , which prevents the deflation attack.

Note that one-way chains must still be rolled forward after they have been folded together with an operation like ⊙. The one-way function is therefore required to be homomorphic, in the sense that $F (x_{1} ⊙ x_{2}) = F (x_{1}) ⊙ F (x_{2})$ . A range of cryptographic tools, such as RSA encryption, support this kind of homomorphic property. In this case, ⊙ could be defined as modular multiplication.

The size of the deflation-free proof is a constant $| F () |$ , which is the size of the one-way function output. If RSA is chosen as the cryptographic tool, $| F () | = 1024$ bits.
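The folding idea can be checked numerically with a small sketch. It assumes that $F$ is textbook RSA exponentiation $x \mapsto x^{e} \bmod N$ and that ⊙ is multiplication modulo $N$. The modulus, exponent, seeds, and function names below are toy values chosen for illustration only; they are far too small to be secure and are not taken from SAS or [16].

```python
# Toy RSA parameters, assumed for illustration only (not secure).
P, Q = 61, 53
N = P * Q   # modulus 3233
E = 17      # public exponent, coprime with (P - 1) * (Q - 1) = 3120


def F(x, times=1):
    """Apply the one-way step x -> x^E mod N the given number of times."""
    for _ in range(times):
        x = pow(x, E, N)
    return x


def fold(values):
    """Fold chain values into one value by modular multiplication."""
    result = 1
    for v in values:
        result = (result * v) % N
    return result


def aggregate_deflation_free(reports, psi_m):
    """Roll each reported (psi_0, chain_value) forward to psi_m, then fold."""
    return fold(F(value, psi_m - psi_0) for psi_0, value in reports)


# Three hypothetical vehicles report chain values at positions 2, 3, and 5.
seeds = [123, 456, 789]
reports = [(2, F(seeds[0], 2)), (3, F(seeds[1], 3)), (5, F(seeds[2], 5))]
proof = aggregate_deflation_free(reports, psi_m=5)

# A verifier that knows the seeds recomputes the same folded value.
print(proof == fold(F(s, 5) for s in seeds))
# Folded chains can still be rolled forward, thanks to the homomorphism.
print(F(fold(seeds), 3) == fold(F(s, 3) for s in seeds))
```

Both comparisons hold because $(x_{1} x_{2})^{e} \equiv x_{1}^{e} x_{2}^{e} \pmod{N}$, which is exactly the homomorphic property required of $F$.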

6.2.3. Supplement Security Proof

The supplement security proof enables the aggregator to derive the new inflation-free proof when $ψ_{0}$ changes because of the merging of sketches. Therefore, SAS records all 1-bits whose positions are larger than $ψ_{m}$ , together with their corresponding MACs, as the supplement security proof. Since these positions are not contiguous, the supplement security proof cannot be aggregated. Further, we denote by $s_{i}^{\times} (\bar{Λ_{ψ_{m}}})$ the set of all the supplement security proofs whose positions are not less than $ψ_{m}$ .

6.3. Sketch Proof Aggregation

As shown in Figure 2, multiple sketches could be aggregated as they propagate, and sketch proofs could be aggregated as the sketches are merged. Without loss of generality, we describe the aggregation algorithm for two sketch proofs only; more than two sketch proofs can be aggregated by applying it repeatedly.

Figure 2

Sketch generation and sketch proof.

Consider two sketches $Λ^{i}$ and $Λ^{j}$ and their corresponding sketch proofs $s_{i}^{+}, s_{i}^{-}, s_{i}^{\times}$ and $s_{j}^{+}, s_{j}^{-}, s_{j}^{\times}$ . Let $ψ_{m}$ be the globally maximum value of first 0-bit after sketch merging. The sketch proofs could be aggregated by performing the following steps:

(i) inflation-free proof aggregation: $\otimes (\oplus_{ω = i, j} s_{ψ_{ω}}^{+})$ ;

(ii) deflation-free proof aggregation: $⊙_{ω = i, j} F^{ψ_{m}} (s_{ω}^{-})$ ;

(iii) supplement security proof updating:

\begin{matrix} s_{i}^{\times} (\bar{Λ_{ψ_{m}}}) \cup s_{j}^{\times} (\bar{Λ_{ψ_{m}}}) . \end{matrix}

(6)

Note that this sketch proof aggregation process could be performed in a fully distributed manner: it naturally supports hierarchical aggregation without requiring any particular aggregation architecture.

6.4. Sketch Proof Verification and Average Calculation

After the aggregation results and the security proofs arrive at the TMC, the TMC should verify the correctness of the inflation-free proof and the deflation-free proof. To check the validity of the inflation-free proof, the TMC should perform the following operations, depending on the MAC mode.

(i) Symmetric Key Mode. In this mode, the TMC should recalculate the MAC of each 1-bit and then aggregate them into a single value. After that, the TMC should check whether the obtained result is equal to the received one.

(ii) Signature Mode. In this mode, the TMC could verify the aggregated signatures in a batch by using a batch verification technique [19].
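The symmetric key mode check can be sketched as follows. The sketch assumes that each MAC is an HMAC-SHA1 tag truncated to 8 bytes and computed over a hypothetical message format `"<vehicle_id>:<position>"`, and that ⊗ is a byte-wise XOR. The message format, key values, and function names are assumptions made for illustration and are not specified by SAS.

```python
import hashlib
import hmac

MAC_LEN = 8  # bytes; assumed truncation length


def bit_mac(key, vehicle_id, position):
    """HMAC-SHA1 over the (vehicle ID, bit position) pair, truncated to MAC_LEN bytes."""
    msg = f"{vehicle_id}:{position}".encode()
    return hmac.new(key, msg, hashlib.sha1).digest()[:MAC_LEN]


def aggregate(macs):
    """XOR all MACs together into a single MAC_LEN-byte value."""
    acc = bytes(MAC_LEN)
    for mac in macs:
        acc = bytes(x ^ y for x, y in zip(acc, mac))
    return acc


def tmc_verify(keys, claimed, received):
    """Recompute the MAC of every claimed (vehicle_id, position) and compare.

    keys maps each vehicle ID to the key it shares with the TMC.
    """
    expected = aggregate(bit_mac(keys[v], v, p) for v, p in claimed)
    return hmac.compare_digest(expected, received)


# Hypothetical keys and claimed 1-bits.
keys = {3: b"key-of-vehicle-3", 7: b"key-of-vehicle-7"}
claimed = [(3, 1), (7, 2), (3, 3)]
received = aggregate(bit_mac(keys[v], v, p) for v, p in claimed)

print(tmc_verify(keys, claimed, received))             # honest aggregate
print(tmc_verify(keys, claimed + [(7, 4)], received))  # extra 1-bit without a MAC
```

An aggregator that claims an additional 1-bit without holding the matching MAC changes the value the TMC recomputes, so the comparison fails.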

To verify the correctness of the deflation-free proof, the TMC needs to compute all individual $s_{ω}^{-}$ and fold them together to create $⊙_{ω = 1,2, \dots} F^{ψ_{m}} (s_{ω}^{-})$ . The answer is accepted if and only if the calculated result is equal to the received one. Finally, given $ψ_{m}$ , the average speed could be computed as follows:

\begin{matrix} {speed}_{average} = 1.29 \times \frac{2^{ψ_{m}}}{N_{ID}}, \end{matrix}

(7)

where $N_{ID}$ refers to the number of vehicles involved. Similar to the original FM sketch, the accuracy of this average speed estimation could be further improved by introducing multiple sketches.
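As a worked instance of (7), the following minimal sketch evaluates the estimate for a single sketch. The function name `average_speed` and the sample values are hypothetical; the constant 1.29 and the formula are those of (7).

```python
def average_speed(psi_m, n_id, correction=1.29):
    """Estimate the average speed as correction * 2**psi_m / n_id, following (7)."""
    if n_id <= 0:
        raise ValueError("the number of vehicles must be positive")
    return correction * (2 ** psi_m) / n_id


# Hypothetical values: first 0-bit position 10 and 20 vehicles involved.
estimate = average_speed(psi_m=10, n_id=20)
print(estimate)
```

With $ψ_{m} = 10$ and $N_{ID} = 20$, the estimate is $1.29 \times 1024 / 20 \approx 66.05$. Because $ψ_{m}$ enters as an exponent, a change of one position doubles or halves the result, which is why multiple sketches are needed for a usable accuracy.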

6.5. Further Discussion

In this subsection, we give an extended discussion on some issues closely related to the proposed SAS protocol.

6.5.1. Symmetric Key versus Asymmetric Key

As mentioned in Section 3, the MAC in this study can be generated in one of two modes: a symmetric key-based mode and an asymmetric key- (or signature-) based mode. Each mode has its own advantages and disadvantages.

From the performance point of view, a symmetric key-based MAC has the advantage over the asymmetric key-based approach that it is shorter and does not require computationally expensive operations. Symmetric key-based MACs are expected to play an important role in vehicular sensing applications where sensing information is sent directly to the TMC, since they can be processed faster than signatures and introduce less transmission overhead. However, on-path vehicles cannot verify a MAC's authenticity, since only the TMC shares the key with the MAC generator. On the other hand, the signature-based approach provides extra features such as nonrepudiation and public verifiability. In the context of vehicular sensing networks, this means that the aggregated information could be verified by any on-path vehicle, which gives drivers fast access to authenticated traffic information instead of waiting for a response from the RSUs.

6.5.2. Size of Sketch Proofs

There are three kinds of sketch proofs in SAS. The first two, the inflation-free proof and the deflation-free proof, could be aggregated and thus introduce minimal transmission overhead. The third, the supplement security proof, does not support proof aggregation, since its entries will later be merged into the inflation-free proof. This means that the supplement security proof may incur a higher transmission overhead. However, we argue that its size is still acceptable because, during the aggregation process, the size of the supplement security proof decreases as the first 0-bit position $ψ_{m}$ increases. In the performance evaluation part, we give a more detailed discussion of the size of sketch proofs.

7. Performance Evaluations

In this section, we evaluate the performance of the proposed SAS in terms of communication cost and approximation accuracy. To demonstrate the advantage of SAS, we also compare it with the nonaggregation transmission approach. In this part, we consider SHA-1 as the building block of the MAC. Note that the asymmetric key-based MAC mode will have a similar communication cost if a short aggregate signature is chosen as the building block.

7.1. Transmission Overhead

One of the major advantages of SAS is the reduction in transmission cost. The communication cost is determined by the size of the aggregated security proof, which includes the inflation-free proof, the deflation-free proof, and the supplement security proof. Although the MAC in this study may be symmetric key-based or asymmetric key- (or signature-) based, here we discuss only the symmetric key-based MAC owing to space limitations. As a typical example, we choose a 64-bit SHA-1-based MAC as the basic MAC technique and RSA-1024 as the basic one-way function. Table 1 summarizes the size of the different components as well as the overall transmission overhead for nonaggregation transmission and SAS transmission. Here, we consider the worst case of our aggregation, in which the size of the supplement security proofs is bounded by $\log_{2} (v_{\max} \times n)$ [13], where $v_{\max}$ is the maximum speed for this road segment and $n$ is the maximum number of vehicles in this area. In practice, however, the supplement security proof should be much smaller than this bound, since it shrinks as aggregation proceeds.

Table 1

The size of each component of SAS (bytes).

Component	No SAS	SAS
T&L	$8 \times n$	8
ID	$8 \times n$	$8 \times n$
Data v	$8 \times n$	0
${Sketch}_{i}$	0	$8 \times \log_{2} (v_{\max} \times n)$
Sketch proofs	$8 \times n$	$8 \times \log_{2} (v_{\max} \times n) + 136$
Total size	$32 \times n$	$8 \times n + 16 \times \log_{2} (v_{\max} \times n) + 144$
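The totals in Table 1 can be reproduced with a short sketch. The function names and the sample values of $n$ and $v_{\max}$ are hypothetical; the per-component sizes are those listed in the table for a single sketch in the worst case.

```python
import math


def no_sas_bytes(n):
    """Nonaggregation: each of n vehicles sends T&L, ID, data, and a MAC (8 bytes each)."""
    return 8 * n + 8 * n + 8 * n + 8 * n


def sas_bytes(n, v_max):
    """SAS worst case for one sketch, following the SAS column of Table 1."""
    bits = math.log2(v_max * n)
    time_and_location = 8
    ids = 8 * n
    sketch = 8 * bits
    sketch_proofs = 8 * bits + 136  # supplement bound + 8-byte MAC + 128-byte RSA value
    return time_and_location + ids + sketch + sketch_proofs


# Hypothetical scenario: 128 vehicles and v_max = 128, so log2(v_max * n) = 14.
n, v_max = 128, 128
print(no_sas_bytes(n))      # 32 * n
print(sas_bytes(n, v_max))  # 8 * n + 16 * log2(v_max * n) + 144
```

For these sample values the nonaggregation total is 4096 bytes and the SAS total is 1392 bytes. The $8 \times n$ term for the vehicle IDs dominates the SAS total as $n$ grows, while the logarithmic sketch and proof terms stay small.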

By varying the number of sketches, we obtain the communication cost of SAS under different vehicle numbers and different sketch numbers, as shown in Figure 3. The probabilistic aggregation does not show its advantage when the number of vehicles is small. However, as the number of vehicles grows, the proposed SAS aggregation scheme could dramatically reduce the communication cost when the sketch number is small. The number of sketches also plays an important role in the overall system performance: a small sketch number such as 4 gives SAS a clear advantage, whereas with a large sketch number such as 16 the advantage is less pronounced. Therefore, provided that an acceptable accuracy is guaranteed, the number of sketches should be as small as possible. In the next section, we discuss the tradeoff between the accuracy and the number of sketches.

Figure 3

Transmission overhead of various secure FM sketches.

7.2. Tradeoff of the Accuracy and Number of Sketches

According to [13], PCSA yields a standard error of approximately $0.78 / \sqrt{m}$ , where $m$ is the number of sketches. By choosing different sketch numbers, we obtain the corresponding standard error, which is plotted in Figure 4. The standard error decreases sharply at first as the number of sketches increases, and it stays relatively stable beyond a certain threshold (e.g., 4 in Figure 4). As pointed out in the previous section, in vehicular sensing networks a small number of sketches (e.g., 4) already yields an acceptable standard error (e.g., $0.39$ ). This further demonstrates the effectiveness of the proposed SAS.
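The relation between the number of sketches and the standard error can be tabulated with a few lines of Python. The function name `pcsa_standard_error` and the chosen sketch numbers are illustrative; the formula $0.78 / \sqrt{m}$ is the one quoted from [13].

```python
import math


def pcsa_standard_error(m):
    """Approximate PCSA standard error for m sketches: 0.78 / sqrt(m)."""
    if m <= 0:
        raise ValueError("the number of sketches must be positive")
    return 0.78 / math.sqrt(m)


# Tabulate the error for a few sketch numbers.
for m in (1, 2, 4, 8, 16):
    print(m, round(pcsa_standard_error(m), 3))
```

Going from 1 to 4 sketches halves the error from 0.78 to 0.39, whereas going from 4 to 16 sketches quadruples the sketch overhead for a further reduction of only 0.195. This diminishing return is the tradeoff discussed above.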

Figure 4

Standard error of SAS secure sketch.

8. Conclusion and Future Work

Vehicular sensing networks have been envisioned to play an important role in future traffic monitoring applications. In this study, we propose a secure and efficient aggregation method based on the FM sketch and security proof techniques. Extensive performance evaluations have demonstrated the efficiency and effectiveness of the proposed scheme. Our future work includes implementing SAS in a specific application scenario and evaluating its performance with more realistic simulations or even experiments.

Acknowledgments

The corresponding author of this paper is H. Zhu. This research is supported by the National Natural Science Foundation of China (Grants no. 61003218, 70971086, 61272444, 61161140320, and 61033014), the Doctoral Fund of Ministry of Education of China (Grant no. 20100073120065), the JSPS A3 Foresight Program, and the NEC C&C Foundation.

References

1. Zhu, Lin, Shi, P. H., Shen. PPAB: a privacy-preserving authentication and billing architecture for metropolitan area sharing networks. IEEE Transactions on Vehicular Technology, vol. 58, no. 5, pp. 2529-2543, 2009. doi:10.1109/TVT.2008.2007983. Scopus: 2-s2.0-66449108023.

2. Zhu, Gao, Dong, Cao. A probabilistic misbehavior detection scheme towards efficient trust establishment in delay-tolerant networks. IEEE Transactions on Parallel and Distributed Systems, 2013. doi:10.1109/TPDS.2013.36.

3. Fontaine. Traffic monitoring. Vehicular Networks from Theory to Practice. CRC Press, 2009.

4. Zhu, Lin, Fan, Shen. SMART: a secure multilayer credit-based incentive scheme for delay-tolerant networks. IEEE Transactions on Vehicular Technology, vol. 58, no. 8, pp. 4628-4639, 2009. doi:10.1109/TVT.2009.2020105. Scopus: 2-s2.0-70350238158.

5. Zhu, Shen, Lin. Security in service-oriented vehicular networks. IEEE Wireless Communications, vol. 16, no. 4, pp. 16-22, 2009. doi:10.1109/MWC.2009.5281251. Scopus: 2-s2.0-70350330400.

6. Zhu, Lin, P. H., Shen, X. S. SLAB: a secure localized authentication and billing scheme for wireless mesh networks. IEEE Transactions on Wireless Communications, vol. 7, no. 10, pp. 3858-3868, 2008. doi:10.1109/T-WC.2008.07418. Scopus: 2-s2.0-55149100626.

7. Liu, Zheng, Zhang, Chen, Shen. Secure and energy-efficient disjoint multipath routing for WSNs. IEEE Transactions on Vehicular Technology, vol. 61, no. 7, pp. 3255-3265, 2012. doi:10.1109/TVT.2012.2205284.

8. Liu, Jin, Cui, Chen. Deployment guidelines for achieving maximal lifetime and avoiding energy holes in sensor network. Information Sciences, vol. 230, pp. 197-226, 2013. doi:10.1016/j.ins.2012.12.037.

9. Liu, Ren, Chen, Shen. Design principles and improvement of cost function based energy aware routing algorithms for wireless sensor networks. Computer Networks, vol. 56, no. 7, pp. 1951-1967, 2012.

10. Liu, Zhang, Chen. Theoretical analysis of the lifetime and energy hole in cluster based wireless sensor networks. Journal of Parallel and Distributed Computing, vol. 71, no. 10, pp. 1327-1355, 2011.

11. Arbabi, M. H., Weigle. Using vehicular networks to collect common traffic data. Proceedings of the 6th ACM International Workshop on Vehicular Ad Hoc Networks (VANET '09), Beijing, China, pp. 117-118, 2009. doi:10.1145/1614269.1614289.

12. Dietzel, Bako, Schoch, Kargl. A fuzzy logic based approach for structure-free aggregation in vehicular ad-hoc networks. Proceedings of the 6th ACM International Workshop on Vehicular Ad Hoc Networks (VANET '09), Beijing, China, pp. 79-88, 2009. doi:10.1145/1614269.1614283.

13. Lochert, Scheuermann, Mauve. Probabilistic aggregation for data dissemination in VANETs. Proceedings of the 4th ACM International Workshop on Vehicular Ad Hoc Networks (VANET '07), pp. 1-8, September 2007. doi:10.1145/1287748.1287750. Scopus: 2-s2.0-37849005338.

14. Nadeem, Dashtinezhad, Liao, Iftode. TrafficView: traffic data dissemination using car-to-car communication. Mobile Computing and Communications Review, vol. 8, no. 3, pp. 6-19, 2004. doi:10.1145/1031483.1031487.

15. Caliskan, Graupner, Mauve. Decentralized discovery of free parking places. Proceedings of the 3rd ACM International Workshop on Vehicular Ad Hoc Networks (VANET '06), ACM, New York, NY, USA, pp. 30-39, September 2006. doi:10.1145/1161064.1161070. Scopus: 2-s2.0-34247386346.

16. Nath, Chan. Secure outsourced aggregation via one-way chains. Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '09), Providence, RI, USA, pp. 31-44, 2009.

17. Zhu, Lin, P. H., Shen. AEMA: an aggregated emergency message authentication scheme for enhancing the security of vehicular ad hoc networks. Proceedings of IEEE International Conference on Communications (ICC '08), Beijing, China, pp. 1436-1440, May 2008. doi:10.1109/ICC.2008.278.

18. Raya, Aziz, Hubaux, J. P. Efficient secure aggregation in VANETs. Proceedings of the 3rd ACM International Workshop on Vehicular Ad Hoc Networks (VANET '06), pp. 67-75, September 2006. doi:10.1145/1161064.1161076. Scopus: 2-s2.0-34247336902.

19. Boneh, Lynn, Shacham. Short signatures from the Weil pairing. Journal of Cryptology, vol. 17, no. 4, pp. 297-319, 2004. doi:10.1007/s00145-004-0314-9. Scopus: 2-s2.0-23044435711.

20. Camenisch, Hohenberger, Pedersen. Batch verification of short signatures. Advances in Cryptology (EUROCRYPT '07), Lecture Notes in Computer Science, vol. 4515, Springer, New York, NY, USA, pp. 246-263, 2007. doi:10.1007/978-3-540-72540-4_14. Scopus: 2-s2.0-38049146172.

21. Little, T. D. C., Agarwal. An information propagation scheme for VANETs. Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 155-160, September 2005. doi:10.1109/ITSC.2005.1520039. Scopus: 2-s2.0-33747386499.

22. Fujimoto, Guensler, Hunter. MDDV: a mobility-centric data dissemination algorithm for vehicular networks. Proceedings of the 1st ACM International Workshop on Vehicular Ad Hoc Networks (VANET '04), pp. 47-56, October 2004. Scopus: 2-s2.0-14944351780.

23. Flajolet, Martin, G. N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, vol. 31, no. 2, pp. 182-209, 1985. Scopus: 2-s2.0-0020828424.