Abstract
In vehicular ad hoc networks, inside attackers can launch a false information attack by injecting false emergency messages that report bogus events such as traffic accidents. In this article, a false message detection scheme is proposed and evaluated. First, traffic flow theory is employed to analyze vehicular behavior under a traffic accident scenario. The analysis shows that a “bottleneck” phenomenon is triggered because the road capacity is reduced by blocked lanes at the accident site. As a result, traffic parameters such as vehicular density exhibit statistical properties distinct from those of an accident-free scenario. Based on this, a false message detection algorithm is proposed in which traveling vehicles are exploited as witnesses to collect traffic parameters, and their observation data are used as evidence to feed a traffic flow model. A method based on Bayes' theorem is used to calculate the likelihood of each traffic scenario, and the actual traffic condition is estimated to determine whether the reported accident has actually occurred. Finally, the performance of the proposed scheme is verified through simulations in a realistic traffic scenario. The results show that the scheme achieves a higher detection accuracy than a previously proposed approach.
Introduction
In recent years, vehicular ad hoc networks (VANETs) have received much attention from academia and industry because a variety of VANET applications have emerged for road safety, passenger comfort, and traffic efficiency. 1 In VANETs, vehicles are equipped with wireless access in vehicular environments (WAVE) devices, which enable them to communicate with each other (vehicle-to-vehicle, V2V) and with pre-deployed roadside units (RSUs; vehicle-to-infrastructure, V2I). Road safety applications, which include cooperative collision warning (CCW), road hazard notification (RHN), and post-crash notification (PCN), are expected to reduce road accidents. 2 In these applications, vehicles are exploited as “moving sensors” for collecting traffic information, and they are allowed to broadcast two types of messages: (1) periodic beacon messages, which announce the presence of a vehicle in the network, and (2) emergency messages, which report the occurrence of damaging events (such as a traffic accident or congestion). These messages help drivers stay aware of traffic situations and hazardous events that are beyond the horizon. However, these benefits can be realized only if the messages are reliable, in other words, if they reflect the real world honestly and correctly. Inside attackers can inject false messages into the network, motivated by selfishness or malice. 3 For example, by reporting bogus traffic congestion, a selfish vehicle may try to create the illusion of congestion and mislead other vehicles into leaving the current road, so that it can reach its own destination faster. In addition, hardware faults can also result in incorrect data. Drivers misled by unreliable messages may react wrongly, for example by braking hard or switching to an alternative route, which results in journey delays or traffic disturbance and, in extreme cases, an accident. Hence, effectively detecting false messages in VANETs is very important.
Generally, a VANET message can be considered reliable if the following two conditions are satisfied: (1) the sender is a valid node, and the message integrity is well protected against malicious injection, modification, and replay attacks, and (2) the information contained in the message honestly and correctly reflects the real world. Some existing VANET security mechanisms 4 address the first requirement by using digital signatures and authentication technology. However, compared to launching attacks on the protocol stack, it is much easier for an inside attacker to inject false information in validly signed messages, because an inside attacker commonly holds the certificate and key distributed by the certificate authority.
In recent years, various schemes have been proposed to detect false messages injected by an inside attacker. In the literature,5–7 the concept of reputation/trust is used to represent the degree to which a vehicle can be considered trustworthy. Each vehicle observes the behavior of its neighbors and uses a numeric score to represent each node’s credibility. A central reputation server is responsible for storing and updating the reputation/trust values of the vehicles in the network. However, VANETs have some unique features, such as a large geographical range and a network consisting of millions of nodes. High node mobility generally results in frequent topology changes, which make it very difficult to query and update the reputation/trust score in real time. Therefore, applying a centralized reputation/trust mechanism in VANETs has long been debated. To solve this problem, data-centric detection schemes have been proposed to reduce the network overhead and improve efficiency using fully distributed and localized detection algorithms,8–10 in which the vehicles located at the scene are exploited as “witnesses” that cooperatively and locally run the detection algorithm to collect possible evidence for verifying the correctness of safety messages. Examples include the plausibility and consistency checking scheme,11,12 the trajectory-based detection classifier framework, 10 and heartbeat-based detection. 13 However, these efforts focus on how the evidence is collected and used, and few works attempt to analyze in depth the inherent characteristics of the traffic data itself.
In a real-world traffic scenario, the occurrence of abnormal traffic events is commonly reflected by a change in traffic parameters. For example, in a car collision scenario, some of the lanes may be blocked by the crashed cars, which reduces the road capacity. Since newly arriving vehicles cannot pass through in a timely manner, they brake and change lanes to merge into the remaining unobstructed lanes. Queues or traffic jams can form during rush hours, and a “bottleneck” phenomenon can be triggered at the accident site. 14 As a result, traffic parameters such as density and speed exhibit abnormal fluctuations. These fluctuations provide a feasible way to infer whether an accident reported by an emergency message has actually happened. In this article, we extract two typical traffic patterns (lane-blocking and blocking-free) and employ a traffic flow model to estimate the probability density function (PDF) of traffic parameters. Based on this, a Bayesian inference–based false message detection algorithm is proposed. Our contributions in this article are summarized as follows:
Two typical traffic patterns are extracted, and a traffic flow model is built to analyze the characteristics of vehicular behavior and to estimate the PDF of traffic parameters under each traffic pattern.
A false message detection algorithm is proposed, in which the vehicles located at the scene are exploited as witnesses to infer the actual traffic condition. Based on their observational data, a Bayesian approach is used to calculate the likelihood of each traffic pattern and to determine whether the accident reported by the emergency message has actually happened.
A simulation experiment was conducted for this article to validate the proposed algorithm.
The remainder of this article is organized as follows. We discuss related works in section “Related works,” before describing the VANET model and the traffic flow model in section “System models.” In section “The proposed scheme,” we introduce the detailed design of the proposed algorithm. The results of the performance evaluation are presented in section “Simulation.” Finally, we draw our conclusions in section “Conclusion.”
Related works
In recent years, there has been much research aimed at detecting false messages in VANETs. In our opinion, the methods can be classified into two categories: (1) node-centric and (2) data-centric schemes. Node-centric schemes analyze the senders’ behavior, such as their packet and message patterns, and compare it with that of normal nodes to identify the attackers. Nodes that deviate are deemed malicious, for example, nodes whose message sending rate is significantly higher than the average value. Furthermore, a numeric score, such as a reputation value, is used to quantify the trustworthiness of a node. Li et al. 6 proposed a reputation-based announcement scheme. When an announcement arrives, the recipient determines whether to accept or reject it by using the sender’s reputation score, which reflects the extent to which the sender has announced reliable messages in the past. The reputation score of a sender is computed based on feedback from its neighboring vehicles. They give positive feedback for reliable messages and negative feedback for unreliable messages, which increases or decreases the sender’s reputation score. A reputation server is responsible for collecting, updating, and certifying the reputation scores. Even though reputation/trust-based schemes have been well studied in self-organized and peer-to-peer networks, they are impractical in our case because of the large network size and high node mobility of VANETs. During the detection process, reputation-based schemes require multi-round communication (vehicle-to-vehicle, vehicle-to-infrastructure, and vehicle-to-reputation server), which leads to high detection delay and heavy communication overhead.
Data-centric trust has been proposed to solve these problems. It attempts to evaluate the reliability of the messages rather than identify misbehaving nodes. In Ruj et al., 8 each vehicle independently decides whether a received message is false based on locally observed information. To defend against sybil attacks, this scheme does not use any external inter-vehicle cooperation or majority voting mechanism. In LEAVE (local eviction of attacker by voting evaluators), 15 each vehicle runs an intrusion detection system (IDS) to monitor the behavior of its neighbors. If an attacker sends false data, its neighbors will identify a significant deviation from the honest data reported by other neighbors. A voting procedure is triggered, and each vehicle launches an accusation against the sender. The sender is evicted from the network only if the number of accusations against it exceeds a given threshold. Threshold-based schemes16,17 are built on the assumption that each witness sends an alert message to report the observed traffic event. A vehicle accepts a message as true only if the number of identical messages surpasses a predefined threshold. In practice, the main issue with threshold-based schemes is that the threshold is very difficult to choose. Too high a threshold leads to valid messages being rejected; too low a threshold offers little defense against attacks launched by multiple colluding malicious vehicles. In addition, since there is typically some delay between the occurrence of a traffic event and the vehicles reporting it, alert messages are not accepted until a sufficient number of vehicles have sent reports, which delays the warning. Sedjelmaci et al. 9 proposed an efficient and lightweight intrusion detection mechanism for vehicular networks (ELIDV) to detect internal malicious vehicles. ELIDV first evaluates the number of intrusion detection agents located within the wireless communication range and then uses a set of rules to evaluate the credibility of each node. Finally, it calculates and assigns a malicious level to each vehicle, and the vehicles are classified into one of the following classes: trustworthy, uncertain, or untrustworthy. Yao et al. 18 considered the types of applications and the authority levels of nodes and proposed a dynamic entity-centric trust model based on experience and utility theory. The scheme is simple enough for real-time trust evaluation. Their analyses show that it can reflect data trustworthiness objectively and help vehicles detect false or bogus data.
An RSU-aided data-centric trust establishment scheme was proposed in Grover et al., 19 in which the trust relations between the reporting vehicles and the data-consuming vehicles are decoupled. The reported data are first collected by an RSU, rather than by the data-consuming vehicles, and the RSU transforms the data into evidence, calculates a trust value, and provides the data and the trust value to the data-consuming vehicles. In the heartbeat-based scheme, 13 each vehicle continuously parses the beacon messages of its neighbors and tries to detect possible inconsistencies in the disseminated information using consecutive beacons. Because a beacon message includes the position, speed, and steering angle of the sender, the sender’s future position over a short period can be predicted from the speed and steering angle contained in the current beacon. The sender is viewed with suspicion only if its reported position does not match the predicted position.
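The consistency check between consecutive beacons can be illustrated with a minimal sketch. It is not the implementation of the cited scheme: the `Beacon` fields, the constant-velocity prediction, the use of a heading angle in place of the steering angle, and the 5 m tolerance are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Beacon:
    """Hypothetical beacon payload; field names are illustrative."""
    t: float        # timestamp (s)
    x: float        # position (m)
    y: float        # position (m)
    speed: float    # speed (m/s)
    heading: float  # direction of travel (rad); stands in for the steering angle


def predict_position(beacon, t):
    """Extrapolate the sender's position to time t, assuming constant velocity."""
    dt = t - beacon.t
    return (beacon.x + beacon.speed * math.cos(beacon.heading) * dt,
            beacon.y + beacon.speed * math.sin(beacon.heading) * dt)


def is_suspicious(previous, current, tolerance_m=5.0):
    """Flag the sender if its reported position deviates from the prediction."""
    px, py = predict_position(previous, current.t)
    return math.hypot(current.x - px, current.y - py) > tolerance_m


first = Beacon(t=0.0, x=0.0, y=0.0, speed=20.0, heading=0.0)
consistent = Beacon(t=1.0, x=20.5, y=0.0, speed=20.0, heading=0.0)
inconsistent = Beacon(t=1.0, x=60.0, y=0.0, speed=20.0, heading=0.0)
```

Here `is_suspicious(first, consistent)` is false because the reported position is within 0.5 m of the prediction, while `is_suspicious(first, inconsistent)` is true because the sender claims to have moved 60 m in one second at 20 m/s.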
Zaidi et al. 12 proposed a host-based intrusion detection scheme to detect false emergency messages in VANETs, in which the vehicles located at the scene provide their observation data on the traffic condition, and a statistical approach is used to identify the malicious vehicles that broadcast a false message. The scheme is based on the following fact: if an accident occurs, the crashed vehicles block some of the lanes and the traffic flow decreases. In the scheme, a hypothesis test is used to identify changes in the traffic flow statistics. However, this decrease has not been well modeled and analyzed. The extent to which traffic flow drops, and its correlation with parameters such as the number of lanes, the number of blocked lanes, and vehicle density, have not been analyzed quantitatively. In a real-world traffic scenario, traffic conditions vary with the time of day. For example, traffic is light at midnight and heavy at rush hour. This variation should be fully considered and quantitatively analyzed. In this article, traffic flow theory is exploited to conduct a road capacity analysis, and several parameters, such as the in-flow rate, the number of lanes, and the number of blocked lanes, are used as inputs of the model. A more general model is proposed to estimate the actual traffic condition.
System models
Network model
As shown in Figure 1, we consider a road safety application scenario in which the vehicles broadcast an emergency message to report a traffic accident. The message is relayed to all the vehicles located within a predefined geographical range so that their drivers can make a timely reaction. Also, it is assumed that the vehicles are equipped with various kinds of on-board sensors (GPS, accelerometer, radar, etc.) that enable them to obtain the motion status of themselves and the vehicles in their vicinity. For example, a vehicle can count the number of neighboring vehicles and calculate the local traffic density within a perception radius

VANET scenario.
Adversaries model
This article focuses on false message attacks launched by inside adversaries, who are valid VANET nodes and hold the security parameters (key and certificate) distributed by the certificate authority. They launch a false message attack by broadcasting bogus emergency messages that claim a nonexistent accident. They also insert manipulated low speeds into their beacon messages to create an illusion of traffic congestion and make the deception more convincing. In addition, it is assumed that there may be colluding attackers, in which multiple adversaries launch a false message attack cooperatively, for example by reporting bogus messages with similar content to compromise the deployed IDS. In most existing IDSs, multiple reports with the same content are regarded as a stronger signal of credibility than a single report. The detection accuracy may therefore be degraded, or even compromised in some cases, which causes greater damage to the reliability of VANETs.
Moreover, we assume that a multiple pseudonym–based privacy preservation scheme is used in VANETs, in which each vehicle stores multiple pseudonyms and uses a pseudonym to sign the messages it sends. This leads to a risk of sybil attacks, in which an attacker behaves as if it were a large number of nodes. The attacker can send multiple false alert messages to claim a bogus event. It can also use multiple identities to provide false evidence and compromise the IDS. The detection of sybil attacks is beyond the scope of this article, so we assume that a sybil attack detection protocol has already been deployed in VANETs.
Trust model
A trust model is built to represent the degree of trustworthiness of the data used in the detection process. The data are classified into two categories: (1) completely trustworthy and (2) partially trustworthy. For a detecting vehicle, the data collected from its own on-board sensors, such as camera, lidar, and radar, are assumed to be completely trustworthy. However, because we assume colluding attackers, there may be attackers among the detector’s neighbors. The evidence data provided by neighbors are therefore assumed to be only partially trustworthy.
Traffic model
To analyze the “bottleneck” phenomenon thoroughly, we introduce some concepts from traffic flow theory 22 to model vehicular behavior and its macroscopic characteristics. As shown in Figure 2, a freeway with

Traffic model.
The correlation between density and flow rate can be regulated by the following piecewise linear function
where

Fundamental diagram.
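A piecewise linear density–flow relation of the kind shown in the fundamental diagram can be sketched as follows. The sketch assumes the common triangular form, in which flow grows linearly with density in free flow and falls linearly toward the jam density under congestion; the parameter names and numeric values are illustrative and are not taken from this article.

```python
def flow_rate(density, free_speed, wave_speed, jam_density):
    """Triangular fundamental diagram: flow (veh/h) as a function of density (veh/km).

    free_speed  -- free-flow speed (km/h), slope of the uncongested branch
    wave_speed  -- backward wave speed (km/h), slope magnitude of the congested branch
    jam_density -- density at which flow drops to zero (veh/km)
    """
    if not 0 <= density <= jam_density:
        raise ValueError("density must lie between 0 and the jam density")
    return min(free_speed * density, wave_speed * (jam_density - density))


def critical_density(free_speed, wave_speed, jam_density):
    """Density at which the two branches meet, i.e. where flow equals capacity."""
    return wave_speed * jam_density / (free_speed + wave_speed)


# Illustrative values only (assumptions, not parameters from the article).
FREE_SPEED, WAVE_SPEED, JAM_DENSITY = 100.0, 20.0, 150.0
rho_c = critical_density(FREE_SPEED, WAVE_SPEED, JAM_DENSITY)
capacity = flow_rate(rho_c, FREE_SPEED, WAVE_SPEED, JAM_DENSITY)
```

With these values the same flow of 1000 veh/h is reached at two densities, 10 veh/km on the free-flow branch and 100 veh/km on the congested branch, which is why density rather than flow alone is the more informative observation for distinguishing the two regimes.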
To represent the status of the freeway at timeslot
where
The proposed scheme
Overview
First, two typical traffic patterns, lane-blocking and blocking-free, are used to characterize a real-world traffic scenario with and without a traffic accident. Let
As shown in Figure 4, the proposed algorithm is deployed at each vehicle node and it operates in a fully distributed style. When receiving an emergency message, the algorithm is triggered and an evidence collection process (Block I) is first performed, in which the vehicle collects its own sensor data and exchanges these data with the nearby vehicles to calculate

Proposed algorithm.
Evidence collection
First, each witness independently observes the traffic density of the segment in which it is located and sends its observation data
Bayesian inference
Observation data
The output
Inferring the traffic condition
For inferring the actual traffic condition, a remaining unsolved problem is how to obtain the conditional probability
Blocking-free pattern
In this pattern, there are no blocked lanes, and all the segments have the same capacity
where
Lane-blocking pattern
When a car collision occurs, several lanes are blocked and the upstream traffic flow must be merged into the remaining unobstructed lanes. In the blocked segment, the capacity is reduced from its normal value of
Definition 1: Equilibrium
An equilibrium is an N-dimensional state vector
Road segment
Lemma 1
Proof
Define
Lemma 2
Considering a blocking-free scenario in which
Proof
Existence
Define the traffic density under an uncongested state as
and the flow
Uniqueness
Suppose
and if
Lemma 2 gives the equilibrium values of the blocking-free patterns. In this pattern, the traffic flow expelled by a segment always can be accepted by its downstream segment. Thus, all of the segments run at free-flow status and they have the same density. The system status converges to blocking-free equilibrium
Lemma 3
Consider a lane-blocking scenario in which
Proof
If
Then, we consider the case in which
Under the lane-blocking pattern, the traffic densities in the upstream and downstream segments take distinct values. Due to the reduced road capacity, upstream vehicles cannot pass the blocked site in a timely manner. They accumulate and queue at segment
Proposition 1
Suppose
Proof
From Lemma 3, we know that the traffic flow rate of the whole freeway is determined by segment
For
and so
The above analysis gives the equilibrium values of traffic density under the blocking-free and lane-blocking patterns. Using them, the PDF of traffic density can be established easily (Figure 5), where the equilibria are denoted by

PDFs of two traffic patterns.
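To illustrate how the two PDFs can be combined with witness observations, the following minimal sketch computes the posterior probability of the lane-blocking pattern from upstream density observations. It rests on several assumptions that are not stated in the article: each pattern's density PDF is modeled as a Gaussian centered on its equilibrium density, observations are treated as independent, and the means, standard deviation, and prior are illustrative values.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of a Gaussian density; logs avoid underflow with many observations."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def posterior_lane_blocking(observations, free_mean, blocked_mean, std,
                            prior_blocked=0.5):
    """Posterior probability of the lane-blocking pattern via Bayes' theorem.

    observations -- upstream densities reported by witnesses (veh/km)
    free_mean    -- assumed equilibrium density under the blocking-free pattern
    blocked_mean -- assumed upstream equilibrium density under lane-blocking
    """
    log_free = math.log(1.0 - prior_blocked)
    log_blocked = math.log(prior_blocked)
    for rho in observations:
        log_free += gaussian_log_pdf(rho, free_mean, std)
        log_blocked += gaussian_log_pdf(rho, blocked_mean, std)
    # Normalize in a numerically stable way.
    top = max(log_free, log_blocked)
    w_free = math.exp(log_free - top)
    w_blocked = math.exp(log_blocked - top)
    return w_blocked / (w_free + w_blocked)


# Illustrative numbers only: free-flow equilibrium 20 veh/km, queued upstream 80 veh/km.
p_light = posterior_lane_blocking([18.0, 22.0, 21.0], 20.0, 80.0, 10.0)
p_queued = posterior_lane_blocking([76.0, 83.0, 79.0], 20.0, 80.0, 10.0)
```

A detector built this way would treat a reported accident as bogus when the posterior of the lane-blocking pattern stays low, as it does for `p_light`, and as genuine when it is high, as for `p_queued`.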
Simulation
Simulation setup
A simulation was conducted to evaluate the performance of the proposed algorithm. We used Simulation of Urban Mobility (SUMO) version 0.19.0 to generate the traffic scenario. SUMO is a traffic simulation package that can generate highly realistic vehicular behavior from a specified road type, speed limit, and traffic flow rate. In the simulation, a three-lane, one-way, straight freeway with a length of 1 km was used, together with the Krauss car-following model, which is the default vehicular mobility model in SUMO. To obtain heterogeneous traffic, we set up three types of vehicles (small, medium, and large), with vehicle lengths of 5, 7, and 10 m and maximum speeds of 19, 17, and 15 m/s, respectively. SUMO outputs an .xml file that contains the floating car data of all vehicles in the traffic scenario. The .xml file was converted to an NS2 mobility file with the Python script traceExporter.py provided by SUMO. The network simulation was performed using the network simulator NS2 2.35, in which we implemented the proposed algorithm. The two-ray ground reflection model was used as the wireless propagation model, and the wireless communication range was set to 250 m. The other parameter settings used in the simulations are given in Table 1. The simulation was repeated 50 times, and Figure 9 shows the average values of all the results.
Parameter setting.
To evaluate the traffic model and the proposed algorithm, we used different scenarios. The first task was to verify whether the traffic model can accurately predict vehicular behavior. Following the two traffic patterns above, an accident-free scenario and a traffic accident scenario were set up in SUMO, the output .xml file was parsed by a Python script, and the traffic densities were calculated and plotted. Furthermore, we inserted a percentage of attackers into the accident-free scenario to demonstrate how the attackers behave and how well the proposed algorithm works.
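The density calculation from floating car data can be sketched as below. This is not the script used in the article: it assumes SUMO's FCD layout of `timestep` elements carrying a `time` attribute and `vehicle` elements carrying an `x` attribute, assumes a straight road aligned with the x axis, and uses a small inline sample in place of a real output file.

```python
import io
import math
import xml.etree.ElementTree as ET


def segment_densities(source, road_length_m, segment_length_m):
    """Return {time: [density per segment in veh/km]} from FCD-style XML.

    source may be a file path or a file-like object.
    """
    n_segments = math.ceil(road_length_m / segment_length_m)
    segment_km = segment_length_m / 1000.0
    densities = {}
    for timestep in ET.parse(source).getroot().iter("timestep"):
        counts = [0] * n_segments
        for vehicle in timestep.iter("vehicle"):
            # Longitudinal position is taken from x (straight-road assumption).
            index = int(float(vehicle.get("x")) // segment_length_m)
            if 0 <= index < n_segments:  # ignore vehicles outside the road
                counts[index] += 1
        densities[float(timestep.get("time"))] = [c / segment_km for c in counts]
    return densities


# Hypothetical two-timestep sample standing in for a real FCD export.
SAMPLE_FCD = """<fcd-export>
  <timestep time="0.00">
    <vehicle id="a" x="100.0" y="0.0" speed="15.0"/>
    <vehicle id="b" x="200.0" y="0.0" speed="17.0"/>
    <vehicle id="c" x="700.0" y="0.0" speed="19.0"/>
  </timestep>
  <timestep time="1.00">
    <vehicle id="a" x="450.0" y="0.0" speed="15.0"/>
    <vehicle id="b" x="520.0" y="0.0" speed="17.0"/>
    <vehicle id="c" x="900.0" y="0.0" speed="19.0"/>
  </timestep>
</fcd-export>"""

result = segment_densities(io.StringIO(SAMPLE_FCD), 1000.0, 500.0)
```

For the sample, the 1 km road is split into two 500 m segments, so three vehicles in one timestep yield per-segment densities of 4 and 2 veh/km at time 0 and 2 and 4 veh/km at time 1.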
Two metrics were used to evaluate the performance of the proposed scheme: detection rate (DR) and false positive rate (FPR). DR is the ratio of successfully detected attacks to all attacks. FPR is the ratio of false alerts, in which honest data are flagged as bogus, to all alerts.
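The two metrics can be computed from labeled outcomes as in the following minimal sketch. It follows the definitions given above, so FPR is normalized by the number of alerts raised rather than by the number of honest messages; the function name and the sample labels are illustrative.

```python
def detection_metrics(is_attack, flagged):
    """Return (DR, FPR) for parallel lists of ground truth and detector output.

    is_attack -- True where the message was actually bogus
    flagged   -- True where the detector raised an alert
    """
    if len(is_attack) != len(flagged):
        raise ValueError("inputs must have the same length")
    attacks = sum(1 for a in is_attack if a)
    alerts = sum(1 for f in flagged if f)
    detected = sum(1 for a, f in zip(is_attack, flagged) if a and f)
    false_alerts = sum(1 for a, f in zip(is_attack, flagged) if f and not a)
    dr = detected / attacks if attacks else 0.0
    fpr = false_alerts / alerts if alerts else 0.0
    return dr, fpr


# Illustrative labels: 3 attacks, 2 of them detected, plus 1 false alert.
truth = [True, True, True, False, False, False, False, False]
alerts_raised = [True, True, False, True, False, False, False, False]
dr, fpr = detection_metrics(truth, alerts_raised)
```

In this example two of three attacks are caught, so DR is 2/3, and one of the three alerts targets honest data, so FPR is 1/3.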
Simulation results
Verifying traffic model
First, we set up an accident scenario and collected vehicular data from it to examine whether the traffic flow model could accurately predict vehicular behavior. The duration of the simulation was 600 s. At the beginning, vehicles enter the freeway from segment

Observation density under various densities: (a) in-flow rate = 640 vehicles/hour, (b) in-flow rate = 840 vehicles/hour, and (c) in-flow rate = 1040 vehicles/hour.
Furthermore, we define the observation area as 350–650 m, which consists of upstream area (350–500 m) and downstream area (500–650 m). The distribution of

Probability distribution of observed density: (a) traffic density under blocking-free pattern, (b) traffic density under lane-blocking pattern (upstream), and (c) traffic density under lane-blocking pattern (downstream).
Evaluating detection algorithm
A collusion attack scenario is simulated by inserting a percentage of attackers into the accident-free scenario. The attackers send false emergency messages claiming a traffic accident at the 500 m position. They also take part in the detection procedure and inject bogus observed densities to mislead the detection algorithm into wrong results. To eliminate this false evidence, an honest detector first uses its sensor data to calculate the observation density
We run the simulations both with and without the proposed detection algorithm, and the results are shown in Figure 8. First, the average density

Observed density in upstream area: (a) traffic density in upstream: 10% attackers, without detection; (b) traffic density in upstream: 20% attackers, without detection; (c) traffic density in upstream: 20% attackers, with detection.
We calculate the DR and FPR of the proposed algorithm and compare them with those of the previously proposed scheme in Rajesh and Soumya. 13 Figure 9(a) gives the DR of the two algorithms under a varying attacker proportion. It can be observed that the proposed scheme worked well, and all the bogus messages were detected successfully when the proportion of attackers was small. The DR started to drop when the attacker proportion surpassed 0.2. This is mainly because the attackers gain a numerical advantage and provide bogus evidence that misleads honest vehicles into wrong results. In contrast, the heartbeat-based scheme worked well under a small attacker proportion, but its performance was poor when the proportion became larger. The explanation is that the proposed algorithm can exploit the observation data provided by both upstream and downstream vehicles. Hence, our algorithm performs better in resisting collusion attacks.

(a) Detection rate (DR) under varying percentage of attackers and (b) false positive rate (FPR) under varying percentage of attackers.
A similar trend can be seen in Figure 9(b), which gives the FPR of the two schemes. The number of attackers has a significant impact on the performance of the detection algorithm. The FPR starts to rise when the attacker proportion reaches 0.25. However, this is only a worst-case scenario. In practice, it is very hard to place many attackers in a pre-selected scenario and arrange them in strategic positions.
Conclusion
In this work, a traffic flow model–based false message detection scheme was proposed and tested. The simulation results showed that the proposed algorithm performs better under collusion attacks than the previously proposed heartbeat-based scheme. The proposed scheme demonstrates the effectiveness of the traffic flow model in determining whether emergency message data are bogus, based on the observation data collected from traveling vehicles. Using a traffic flow model, vehicular behavior and the values of traffic parameters under free-flow or accident scenarios can be estimated, and the actual traffic pattern can be inferred. The results demonstrate the feasibility of applying the traffic flow model to the false message detection problem in VANETs, without the need for any pre-deployed infrastructure.
Footnotes
Handling Editor: Antonella Molinaro
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant No. 61772173), Key Research Project of Higher Education of Henan Province (Grant No. 18A520052), and Innovation Talent Support Program Project of University Science and Technology of Henan Province (Grant No. 19HASTIT027).
