Abstract
Vehicular ad hoc networks support a wide range of promising applications including vehicular sensing networks, which enable vehicles to cooperatively collect and transmit the aggregated traffic data for the purpose of traffic monitoring. The reported literatures mainly focus on how to achieve the data aggregation in dynamic vehicular environment, while the security issue especially on the authenticity and integrity of aggregation results receives less attention. In this study, we introduce a basic aggregation scheme which could aggregate the data and the message authentication codes by using syntactic aggregation and cryptographic aggregation. To tolerate duplicate messages and further improve the aggregation performance, we introduce a secure probabilistic data aggregation scheme based on Flajolet-Martin sketch and sketch proof technique. We also discuss the tradeoff between the bandwidth efficiency and the estimation accuracy. Extensive simulations and analysis demonstrate the efficiency and effectiveness of the proposed scheme.
1. Introduction
With the advancement of wireless technology, vehicular communication networks, also known as vehicular ad hoc networks (VANETs), are emerging as a promising approach to increase road safety, efficiency, and convenience [1, 2]. Although the primary purpose of vehicular networks is to enable communication-based automotive safety applications, VANETs also allow a wide range of promising applications such as traffic monitoring and data collecting, which are regarded as an important component of future intelligent transportation systems (ITSs). It is also observed that rising popularity of smartphones with onboard sensors (e.g., GPS, compass, accelerometer) and always-on mobile Internet connections sheds light on using smartphones as a platform for large-scale vehicular sensing. Recent reports report that smartphone users have surpassed feature phone users in the USA by 2012. According to figures released by IDC, 207.6 million Android and Apple smart-phones were shipped in the fourth quarter of 2012. This further renders the possibility of vehicular sensing.
As shown in [3–10], Departments of Transportation in the USA must collect various types of data (e.g., average speed or traffic density) for traffic monitoring purposes. Traditionally, these important data are collected by technologies such as inductive loop detectors (ILDs), video detection systems, acoustic tracking systems, or microwave radar sensors. However, these technologies mostly suffer from a high maintenance cost. On the other hand, cooperative data collection and dissemination in VANETs allow the traffic monitoring performed in a more cost-effective way [11]. Specifically, each vehicle collects its own or neighboring information (e.g., its current speed or neighboring traffic) and then transmits it to the remote roadside units (RSUs) via vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. The RSUs can be deployed at various points of interest along the roadway and can be used to collect data from locations up to tens of kilometers away. In this study, we coin the vehicular networks which are designed for traffic sensing and monitoring as the vehicular sensing networks.
One of the major challenges of vehicular sensing networks is high overhead of transmitted sensing data. Each sensing result is essentially some spatial-temporal measured values (speed, traffic density), which record the position of vehicles (i.e., a road segment or a small area) and the observation time. Such sensing data is periodically broadcasted. Upon reception of such a broadcast, the intermediate receivers/forwarders incorporate the received data into their local reports and then broadcast them again. Unfortunately, such a periodical broadcast brings on a high traffic load or even traffic storm. This problem is more serious in the scenario of high vehicle density, which could be found on multilaned highways in congestion situations. On the other hand, in most cases, drivers or monitors do not need exact individual reports, but only an overview of the general average speed on the road ahead [12]. This motivates the data aggregation issues in vehicular networks, including Flajolet-Martin sketch based probabilistic aggregation [13], fuzzy aggregation [12], and others [14, 15]. However, most of them are mainly focusing on how to achieve the data aggregation in dynamic vehicular environment, while the security issues on the aspect of the authenticity and integrity of aggregation results receive less attention. Since aggregation operation could be made by any intermediate forwarding vehicle, any malicious attacker could easily launch the attacks towards the data aggregation process by modifying the aggregated result or simply inserting invalid sensing data.
Secure data aggregation is a great challenge in vehicular sensing networks due to their unique network characteristics including highly dynamic network topology, intermittent connectivity, and potentially huge numbers of VANET nodes. These unique characteristics make the secure data aggregation in traditional wireless sensor networks such as [16], which always assume either a static network topology or aggregation structure, unsuitable for vehicular sensing networks.
Therefore, to achieve secure and efficient data sensing and collection, in this paper, we present the SAS, a secure data aggregation scheme for vehicular sensing networks which includes the basic scheme and advanced scheme. In the basic scheme, it achieves efficient data and MAC authentication via syntactic aggregation and cryptographic aggregation. However, the basic scheme needs to keep the original sensing data, which prevents a more efficient data aggregation. Further, it cannot work in case of the existence of duplicate messages. Thus, to overcome this problem, we propose an advanced scheme based on Flajolet-Martin sketch and a series of sketch proof techniques. We also discuss the tradeoff between the bandwidth efficiency and the estimation precision. Finally, extensive simulations and analysis demonstrate the efficiency and effectiveness of the proposed scheme.
The remainder of the paper is organized as follows. In Section 2, we introduce the related work. In Section 3, we present the system model and the design goals. In Section 4, we present some preliminaries. In Section 5, we present a secure data aggregation scheme in vehicular sensing networks by using the syntactic aggregation and cryptographic aggregation approach. In Section 6, we propose a probabilistic data aggregation scheme. Performance analysis is given in Section 7, followed by the conclusion in Section 8.
2. Related Work
Vehicular sensing networks represent a promising way to cooperatively collect useful information in order to increase road safety and driver convenience for future intelligent transportation system. By being integrated with the traditional digital map system, vehicular sensing networks provide the functionality of real-time automatic route scheduling [14], decentralized free parking places discovery [15], traffic monitoring [3], and so forth. In these applications, data aggregation is necessary for efficient data propagation and reduced transmission overhead.
There are quite a few research proposals for data aggregation in vehicular sensing networks [14, 15]. Most of them are based on group formulation and vehicle clustering, which can dramatically reduce the communication overhead due to the increased aggregation level. In additional to the above proposals, the structure-free aggregation frameworks are also proposed including Flajolet-Martin sketch-based aggregation [13] and fuzzy aggregation [12] without defining aggregate structures. However, the aforementioned studies focus on the data aggregation itself but do not take the security issues into consideration.
The most related research study for secure data aggregation in VANETs is the voting scheme, including [17, 18], which involves multiple vehicles to collect information towards a specific event (e.g., collision or traffic jam). Each witness (or observer) of this event will submit a message to a group leader. The group leader will take the responsibility of collecting more than a threshold k of proofs from k distinct witnesses to prove the validity of an emergency event by the voting scheme. References [17, 18] discuss how to further improve the aggregation efficiency by exploiting cryptographic tools such as onion signature [18] and aggregate signature [17]. Note that, in this study, we consider a more general data aggregation scenario: collecting data within a certain area and, at the same time, providing security guarantee for the aggregation functionality.
3. System Model and Design Goal
This section describes our system model, attack model, security assumptions, and design goals.
3.1. Network Model
In this paper, we consider a general vehicular sensing network model, which is mainly comprised of three components: traffic monitoring centre (TMC), RSUs, and vehicles. As shown in Figure 1, RSUs could be selectively deployed at some positions (e.g., intersections) to collect the traffic information (e.g., average speed) within a certain area. Due to high maintenance cost, RSUs could be only deployed intermittently to reduce the deployment cost. We assume that each vehicle, which is equipped with an on-board unit (OBU), has the capability of data collecting and reporting. The transmitted sensing data are propagated via V2V and V2I communications to the RSUs, which then forward them to the TMC. SAS is based on the distributed aggregation model similar to [13], which does not require any group/cluster formulation.

Overview of vehicular sensing network.
3.2. Security Assumptions
We assume that each OBU either shares a distinct secret symmetric key with TMC or obtains a public/private key pair, which is issued by TMC. Whether using shared secret key or public key depends on different system requirements.
3.3. Attack Model
In this study, we assume that the TMC and RSUs are trusted while vehicles (including the sensing vehicles and aggregator vehicles) are potentially malicious and can thus launch various attacks including fabricating, duplicating, and computing the aggregation incorrectly. We do not consider denial-of-service attacks where aggregator vehicles fail to or refuse to provide any acceptable result. A malicious sensor can always report an arbitrary sensing report, which fundamentally cannot be prevented. So we do not aim at preventing such an attack.
3.4. Design Goals
Security Goal. The security goal of SAS is to enable the TMC to verify whether an aggregate sensing report is correct or not. Specifically, TMC should accept a reported aggregate report if and only if it is equal to the output of a correct execution of the aggregation function over all of the sensing reports provided by the qualified vehicles in the most recent epoch. Efficiency and Effectiveness Goal. The efficiency goal of SAS is to minimize the transmission overhead and, at the same time, to ensure a certain sensing accuracy. However, computational cost is not a major concern of this paper since VANET is generally assumed to have unlimited computational capability [17].
4. Preliminaries
4.1. One-Way Chains and MAX Protocols
One-way chain is a widely used cryptographic primitive, which is based on a one-way function F and a secret seed s. The one-way function F is easy to compute but computationally infeasible to invert. The chain has the sequence of values
4.2. Pairing Technique
The proposed basic scheme is based on bilinear pairing which is briefly introduced as below. Let 𝔾 be a cyclic additive group and bilinear: for nondegenerate:
4.3. Aggregate Signature and Batch Verification
The major computation cost for authenticating an emergency message comes from verifying a set of supporting signatures issued by different emergency witnesses. The corresponding public key certificates of the signers also need to be verified together. All of them will incur a significant amount of transmission and verification cost. In this study, we use aggregate signature to reduce the transmission cost of supporting signatures, certificates, and batch verification to realize efficient signature verification.
An aggregate signature is a digital signature that supports aggregation of n distinct signatures issued by n distinct signers to a single short signature [19]. This single signature (and the n original messages) will convince the verifier that the n signers indeed sign the n original messages. In addition to the benefit of the reduced transmission size, aggregate signature technique supports batch verification, which enables the receivers to quickly verify a set of digital signatures on different messages by different signers. In this study, we adopt the aggregate signature and batch verification introduced in [20] as our basic cryptographic aggregation technique to improve the aggregation performance.
5. A General Secure Data Aggregation Framework in Vehicular Sensing Networks
In this section, we introduce a general data aggregation framework in vehicular sensing networks by using the syntactic aggregation and cryptographic aggregation approach.
5.1. System Setup
The TMC generates a tuple
An important task of the setup procedure is to determine the format of emergency report message. In our study, the format of a secure sensing report (SSR) is defined as follows. For a sensed event, the sensor vehicle i will generate an SSR:
For a specific event, it is reasonable to assume that the relevant SSRs will share the same
5.2. Registration
A vehicle can join the network by performing the following step depending on Mode I or Mode II.
Private Key Generation for Mode I. In the symmetric key mode, a vehicle i can randomly choose Private/Public Key Generation for Mode II. In the public key mode, a vehicle can randomly choose
5.3. SSR Generation and Broadcasting
Once an event is sensed by one or multiple vehicles and the observation is Mode I SSR Generation. In terms of Mode I SSR generation, given the type and observation time of the emergency message Mode II SSR Generation. For Mode II SSR, given the type and observation time of the emergency message A single SSR verification can be performed as follows: given
5.4. SSR Opportunistic Forwarding
In VANETs, the network topology could be very dynamic and diversified in shape from time to time, even sometimes sparse and frequently partitioned. The communication between vehicles is expected to be performed in an opportunistic manner. This means a vehicle can carry packets when routes do not exist but forward the packets to the new receivers when they move into its vicinity [21]. To enable the opportunistic data propagation, vehicles that are within a range r and maintain connectivity for a minimum time t with each other can be arranged to form a cluster. The detailed discussion on cluster creation and maintenance can be found in [21]. We refer to the node at the head of every cluster as header, which is responsible for forwarding the data to the next cluster in a typical opportunistic data forwarding algorithm such as [21, 22]. The messages will be buffered at the header until they are forwarded to the next cluster, which is also referred to as the “Carry and forward” strategy. In this study, it is considered that the header can also play the role of emergency message aggregator because of the following two reasons.
If taking a header of a cluster as the aggregator, the aggregation process will be merged into a part of data forwarding process. Therefore, there is no need to elect another cluster head to perform the data aggregation operations. The process of message propagation between two clusters is referred to as a catch-up process, where a message traverses along with its carrying vehicles until it reaches within the radio range of the vehicle at the end of another cluster, which obviously presents a considerable propagation interval depending on the speed of vehicles and the gap between clusters. Therefore, we can use such an interval to aggregate the related emergency messages to minimize the aggregation latency.
In the following sections, a cluster head will be taken as the aggregator of the cluster, which will perform the following SSR aggregated authentication algorithm.
5.5. SSR Secure Aggregation
For any specific emergency event, each aggregator maintains two local message lists, which keep the forwarded SSRs and ReadytoForward SSRs, respectively. The forwarded message list, denoted as ℱ, contains all the SSRs which have been forwarded by this vehicle before, while the ReadytoForward message list, denoted as ℛ, stores messages which have not been transmitted but can be forwarded some time later. The
5.5.1. SSR Aggregation
Aggregate_SSR is used to aggregate multiple SSRs into a single SSR, which includes two steps: syntactic aggregation step and cryptographic aggregation step.
Syntactic Aggregation. For a specific event, given n SSRs MAC Aggregation. It is used to aggregate multiple MACs into a single MAC, which includes the following two modes: Mode I and Mode II.
Mode I Aggregation. Mode I aggregation is
Mode II Aggregation. Mode II aggregation includes the certificate aggregation
After syntactic aggregation and cryptographic aggregation, we can obtain the aggregated SER as
5.5.2. SSR Batch Verification
In this section, we exploit batch verification to further reduce the computational cost.
Mode I Verification. For Mode I verification, TMC could verify the sensing reports by verifying the following equations:
Mode II Verification. For Mode II verification, TMC could perform the certificate batch verification as well as signature batch verification.
Certificate Batch Verification. Given an aggregated certificate Signature Batch Verification. Given
If the batch verification holds, the aggregator will accept SSRs in list ℛ as valid SSRs. Then the aggregated SSR in ℛ will be forward propagated. Meanwhile, the aggregator will put all the SSRs in ℛ to message list ℱ.
However, the previous proposed solution may face the following two problems. Firstly, it need to carry the original input of each sensing node for future verification. This is because MACs authentication requires the original input. Secondly, the duplicated message should be carefully removed from the aggregation; otherwise many of them will be aggregated for several times. This point is difficult to prevent in the context of VANET, which is a typically dynamic and distributed environment. In the next section, we will introduce a probabilistic data aggregation scheme which could automatically filter duplicate messages.
6. A Probabilistic Data Aggregation Scheme for Vehicular Sensing Networks
In this section, we firstly introduce the concept of FM sketch, which is the foundation of probabilistic data aggregation in vehicular networks. We then propose a secure data aggregation scheme based on our proposed sketch proof technique.
6.1. FM Sketches-Based Data Aggregation in VANETs
A Flajolet-Martin sketch (or “FM sketch”) is a data structure for probabilistic counting of distinct elements that has been introduced in [23]. FM sketch represents an approximation of a positive integer by a bit field
The variance of n is quite significant [13], and thus, the approximation is not very accurate. To overcome this, instead of using only one sketch, a set of sketches can be used to represent a single value to achieve trade-off between the accuracy and memory. The respective technique is called probabilistic counting with stochastic averaging (PCSA) in [23]. With PCSA, each added element is first mapped to one of the sketches by using an equally distributed hash function, and it is then added there. If m sketches are used, denoted by
Sketches can be merged to obtain the total number of distinct elements added to any of them by a simple bitwise OR. Important here is that, by their construction, repeatedly combining the same sketches or adding already present elements again does not change the results, no matter how often or in which order these operations occur. FM sketch summaries are naturally composable: simply OR-ing independently built bitmaps (e.g., over data sets
For the purpose of discussion, let us consider a specific application. Assume that we are interested in monitoring the average speed within a certain area. As the first step, we use a sketch for each road segment and approximate the sum of speeds of vehicles within this road segment. For the second step, we will calculate the average speed by dividing the speed sum by the number of vehicles involved. In the following sections, we will discuss how to generate the sketch proof and secure sketch aggregation.
6.2. Sketch Proof Generation
According to the FM sketch definition, given the ID i and speed
We start from three basic pieces of information that each sensor generates in our protocol. Let
In the following, we will introduce these three security proofs one by one.
6.2.1. Inflation-Free Proof
Inflation-free proof is basically the authentication code generated by the vehicles on the 1-bit positions, which are smaller than the position of first 0. To prevent the inflation attacks, it is sufficient to require that each 1-bit, whose position is less than Merging Operation ⊕. Consider two sketches Aggregation Operation ⊗. The MACs of
With merging operation and aggregation operation, size of inflation-free proof could be minimized to
6.2.2. Deflation-Free Proof
Deflation attack is defined as that the malicious aggregators may try to turn 1-bits into 0-bits, removing the corresponding MACs from the security proofs. To prevent deflation attack, SAS adopts the hash-chain-based MAX protocol, which is introduced in [16]. The basic idea is to construct one-way chains whose seeds are all the Hash Chain Folding Operation ⊙. The aggregator could use the folding function ⊙ to fold all the hash chains into a single one
Note that one-way chains should be rolled forward even after they have been folded together with an operation like ⊙. Therefore, it requires the one-way function to achieve homomorphic property in that
The size of deflation-free proof is a constant number
6.2.3. Supplement Security Proof
Supplement security proof enables the aggregator to derive the new inflation-free proof when
6.3. Sketch Proof Aggregation
As shown in Figure 2, multiple sketches could be aggregated during their propagation process, and sketch proofs could be aggregated along with sketches merging. Without loss of generality, we discuss aggregation algorithm only for two sketch proofs, and more than two sketch aggregations can be aggregated by applying it for multiple times.

Sketch generation and sketch proof.
Consider two sketches inflation-free proof aggregation: deflation-free proof aggregation: supplement security proof updating:
Note that such a sketch proof aggregation process could be performed fully distributed, which means it naturally supports hierarchical aggregation, while it does not require any aggregation architecture.
6.4. Sketch Proof Verification and Average Calculation
After the aggregation results and the security proof arrive at the TMC, TMC should verify the correctness of the inflation-free proof and deflation-free proof. To check the validity of inflation-free proof, TMC should perform the following operations in different MAC modes.
Symmetric Key Mode. In this mode, TMC should recalculate the MAC of each 1-bit and then aggregate them into a single one. After that, TMC should check if the obtained result is equal to the received one. Signature Mode. In this mode, TMC could batch verify the aggregated signatures by performing batch verification technique [19].
To verify the correctness of deflation-free proof, it needs to compute all individual
6.5. Further Discussion
In this subsection, we give an extended discussion on some issues closely related to the proposed SAS protocol.
6.5.1. Symmetric Key versus Asymmetric Key
As we have mentioned in Section 3, MAC in this study represents two modes: symmetric key-based mode and asymmetric key- (or signature-) based mode. Generally speaking, different MAC modes have different advantages as well as disadvantages.
From the performance point of view, symmetric key-based MAC has the advantage on asymmetric key-based approach in that it has shorter size and will not introduce the computational expensive operations. Symmetric key-based MAC is expected to play an important role in the vehicular sensing applications where sensing information is directly sent to the TMC since they could be processed faster than signature-based approach and also introduce less transmission overhead. However, on-path vehicles cannot verify an MAC's authenticity since only TMC shared the key with MAC generator. On the other hand, signature-based approach could provide many extra features such as nonrepudiation and public authentication. In the context of vehicular sensing networks, it means the aggregated information could be verified by any on-path vehicles, which allows the drivers to have fast access to the authenticated traffic information instead of waiting for the response of the RSUs.
6.5.2. Size of Sketch Proofs
There are three kinds of sketch proofs for SAS. The first two sketch proofs including inflation-free proof and deflation-free proof could be aggregated and thus introduce a minimized transmission overhead. The third sketch proof, supplement security proof, does not support proof aggregation since they will be merged with inflation-free proof in the future. This means supplement security proof may incur a higher transmission overhead. However, we argue that size of supplement security proof is still acceptable in that, during the aggregation process, size of supplement security proof will decrease along with the increase of first 0-bit position
7. Performance Evaluations
In this section, we evaluate the performance of the proposed SAS in terms of the resultant communication cost and approximate accuracy. To demonstrate the superiority of SAS, we also compare SAS with nonaggregation transmission approach. In this part, we consider SHA-1 as the building blocks of MAC. Note that asymmetric key-based MAC mode will have a similar communication cost if we choose short aggregate signature as the building blocks.
7.1. Transmission Overhead
One of the major advantages of SAS is the reduction of its transmission cost. The communication cost is determined by the size of aggregated security proof including inflation-free proof, deflation-free proof, and supplement security proof. Note that, since MAC in this study represents two modes: symmetric key-based mode and asymmetric key- (or signature-) based mode, here we only discuss the symmetric key-based MAC due to page limitation. As a typical example, we choose the 64-bits SHA-1 as the basic MAC technique and RSA-1028 as the basic one-way function tool. Table 1 summarizes the size of different components as well as the overall transmission overhead for nonaggregation transmission and SAS transmission. Here, we consider the worst case of our aggregation in that the size of supplement security proofs is bounded by
The size of each component of SAS (bytes).
By choosing different number of sketches, we obtain the different communication cost of SAS under different vehicle numbers as well as different sketch numbers, which has been shown in Figure 3. It is observed that the probabilistic aggregation does not show its advantage when the number of vehicles is small. However, when the number of vehicles grows, the proposed SAS aggregation scheme could dramatically reduce the communication cost when the sketch number is small. It is also observed that the number of sketches plays an important role for the overall system performance in that a small sketch number such as 4 makes the proposed SAS have a better performance while, when the sketches number is large such as 16, the advantage is not so obvious. Therefore, if an acceptable accuracy is guaranteed, the number of sketches should be as small as possible to achieve a better performance. In the next section, we will discuss the tradeoff of accuracy and the number of sketches.

Transmission overhead of various secure FM sketches.
7.2. Tradeoff of the Accuracy and Number of Sketches
According to [13], PCSA yields a standard error of approximately

Standard error of SAS secure sketch.
8. Conclusion and Future Work
Vehicular sensing networks have been envisioned to play an important role for future traffic monitoring applications. In this study, we propose a secure and efficient aggregation method based on FM sketch and security proofs techniques. The extensive performance evaluations have demonstrated the efficiency and effectiveness of the proposed scheme. Our future work includes implementing SAS in a specific application scenario and evaluating its performance with more realistic simulations or even experiments.
Footnotes
Acknowledgments
The correspondence author of this paper is H. Zhu. This research is supported by the National Natural Science Foundation of China (Grant no. 61003218, 70971086, 61272444, 61161140320, and 61033014), the Doctoral Fund of Ministry of Education of China (Grant no. 20100073120065), the JSPS A3 Foresight Program, and the NEC C&C Foundation.
