Replica attack detection method for vehicular ad hoc networks with sequential trajectory segment

Abstract

In vehicular ad hoc networks, attackers can disguise themselves as replicas of legitimate vehicles by cracking credentials or colluding, and then use the replicated identities maliciously. Generating replicas is itself an aggressive act, and the replicas can also enable other insider attacks, such as denial of service, information interception, and replay attacks. To address this issue, researchers have proposed many solutions for wireless sensor networks and mobile ad hoc networks. However, the majority of current schemes either handle conspiring replicas poorly or fail to consider the high mobility that is characteristic of vehicles. To detect identity replicas in vehicular ad hoc networks, we propose a detection method with sequential trajectory segment based on a semi-supervised support vector machine. Using the semi-supervised support vector machine, we establish a detection model that takes the spatio-temporal trajectories of different identities as input samples, which include features of both conspiracy and non-conspiracy attack scenarios. To validate our approach, we apply sequential trajectory segment in a simulation environment. The performance analysis and experimental studies suggest that the proposed method provides high detection accuracy, which is almost unaffected by the ratio of replica identities in vehicular ad hoc networks. Furthermore, the time performance of replica detection is less affected by the distance between compromised nodes and their clones than that of existing solutions.

Keywords

Vehicular ad hoc networks, identity replicas, spatio-temporal trajectory, semi-supervised support vector machine, replica attack detection

Introduction

In vehicular ad hoc networks (VANETs), vehicles act as mobile nodes that send messages to other vehicles and to roadside units (RSUs) via wireless communication. Owing to the open nature of wireless communication in VANETs, the vehicular identity, which is the foundation of this communication, is vulnerable to security threats.^1–3 For example, an adversary may infiltrate virtually any electronic control unit of a vehicle and physically compromise it by means of a suitable communication device. The adversary may then create replicas of its identity with the captured credentials and secretly deploy them on selected real vehicles. This type of attack is known as an identity replica attack or identity clone attack, in which one identity is used by multiple vehicles in multiple places. The replicas pose as legitimate nodes to participate in the network. They can further launch various hazardous attacks depending on the intentions (selfish or destructive) of the attacker. Because it is the root cause of many security problems, the replica attack has important repercussions in VANETs.

The concept of replica attack was originally proposed in wireless sensor networks (WSNs).⁴ An intruder first physically captures a legitimate node within the network, then clones the compromised node by cracking its confidential information and creating replicas that share the same identifier. Generating replicas is itself an aggressive act, and the replicas can also enable other vicious attacks ranging from data injection to routing loop creation.⁵

In VANETs, as in WSNs, data are transmitted through an open and shared communication medium. In addition, vehicular nodes are widely distributed and the network topology changes rapidly,^6,7 so it is difficult to manage all identities in a centralized way. Thus, identity replica attacks are also possible in VANETs. Note that a replica attack differs from a Sybil attack: in the former, a single compromised identity is inserted into multiple physical vehicles, whereas in a Sybil attack a single vehicle impersonates multiple identities. Figure 1 shows an example of a replica attack and its influence in VANETs, where vehicle I is the compromised one and $I_{1}$ and $I_{2}$ are its replicas. Vehicle A wants to request the traffic conditions from vehicle I. With no attack, the request packet is transmitted along the route A-D-E-F-G-J-I. Unfortunately, the appearance of replica identities changes the transmission route, and vehicle $I_{1}$ obtains the request packet. With malicious intent, it could drop the packet to launch a denial-of-service (DoS) attack or return false traffic information to disrupt traffic.

Figure 1.

Example of a replica attack and its influence in VANETs.

Motivation

One of the important security issues in VANETs is identity attack, which includes both Sybil and replica attacks. Previous research has mainly focused on the Sybil attack, for which numerous detection methods^3,7–10 have been proposed, but the detection of replica attacks remains largely unexplored. In addition, current replica detection methods mostly target WSNs.^11–14 We attempt to develop an effective and efficient method for detecting replica identities in VANETs that takes into account the wide distribution and rapid movement of vehicles, which are characteristic of VANETs.

In this article, we treat the compromised identities and their clones as abnormal. Both are called replica identities. Identities that have not been cloned are treated as normal. If a classification model is able to differentiate normal from abnormal identities, the replica identities, and thus the replica attack, can be identified. Here we propose a detection scheme using semi-supervised support vector machine (SVM) theory. Considering the nodal motion features and wide distribution in VANETs, we collect trajectory data of vehicles as the initial sample set. At every sampling time, each moving vehicle records its own location with a timestamp, as well as the information of its direct communication neighbors, which together form a trajectory sample point. A series of sample points thus constitutes a trajectory segment. Regarded as the original sample set, these segments are sent to the control center server (CCS) through the RSU.

Usually, vehicles with replica identities have abnormal movement characteristics compared with normal identities; for instance, there is a certain distance between vehicles with the same identity at the same moment. Based on these abnormal characteristics, we present the definition of trajectory sample similarity and the method of feature extraction, so as to prepare for semi-supervised SVM classification. With the help of the established classification model, we can identify replica identities, that is, replica attacks. In particular, this classification model provides a detection basis for subsequent collections.

Contribution

Compared with the previous work, we have the following contributions in this article:

On the basis of temporal and spatial trajectory segments, we propose a novel replica attack detection method for VANETs. When a replica attack occurs, replica identities have abnormal trajectory features, which help to build the replica detection model.

The proposed scheme can detect conspiring replicas, unlike the schemes in the related literature.^15,16 Furthermore, the time performance of replica detection is almost unaffected by the distance between compromised nodes and their clones, in contrast to the literature.¹⁷

During the entire detection process, the vehicle-to-roadside and inter-vehicle communication overhead does not increase compared with the literature.¹⁷ The detection accuracy is not restricted by the proportion of replica identities in the local area.

Organization

The rest of the paper is organized as follows: section “Related work” reviews the related work, and section “System model” introduces the network architecture and the security models. Section “Replica attack detection scheme with sequential trajectory segment” presents the novel replica detection scheme and its underlying rationale. Performance evaluation and experimental validation are reported in section “Performance evaluation.” Section “Conclusion” summarizes the work and concludes the paper.

Related work

At present, solutions for replica attack detection have been widely researched in WSNs, but the problem has not been addressed in VANETs. Therefore, in this section, we review the detection schemes for related problems in WSNs.

One previous line of research has addressed the issue in stationary WSNs.^4,18–20 The primary idea is that witness nodes check whether the same ID makes conflicting claims at different locations. The witnesses can be randomly selected, probabilistically chosen, or chosen from the routing path. Obviously, the effectiveness of replica attack detection relies mainly on the selection of witness nodes. More importantly, the node locations are required to be fixed. Moreover, replica detection schemes in stationary sensor networks usually assume either that benign nodes are in the majority or that a global sink node is available to make the judgment. This introduces other serious issues, such as bottleneck nodes and single points of failure. Therefore, Wang and Yang²¹ put forward the idea of introducing a mobile node to detect replica attacks in stationary WSNs. Leveraging a mobile node as the mobile sink not only alleviates the bottleneck phenomenon but also avoids a single point of failure. However, the scheme incurs additional communication and computation overhead for simultaneous localization among stationary nodes. To further reduce the detection overhead, Chen et al.²² present a mobile detection method in which the mobile sink patrols the network and gathers data to accomplish replica detection. This solution performs effectively and efficiently with a small and well-balanced overhead. It relies on neither location mechanisms nor time synchronization algorithms. However, this method requires the mobile sink to be absolutely honest.

However, the aforementioned replica detection solutions do not work when nodes are expected to move. This is the focus of another line of research. A number of solutions^15–17,23 have been proposed to resolve the mobile replica detection problem.

In the literature, Yu et al.¹⁵ proposed a localized solution named eXtremely Efficient Detection (XED). The key idea of XED is that the latest random number is preserved between the recorder and the encounter. When meeting again, the recorder checks whether the random number of the previous exchange matches the current record or not. If not, presence of replica attacks will be reported. The scheme has a low computational and communication overhead. However, it loses the detection capability when the replicas conspire to synchronize the random numbers.

Ho et al.¹⁶ proposed a fast and effective mobile replica node detection scheme using the sequential probability ratio test (SPRT). This SPRT scheme is based on the intuition that a benign mobile node should never move faster than the preconfigured maximum speed. Replicas will appear to move much faster than benign nodes, since they turn up in different places at the same time. The base station checks whether the measured speed of a node exceeds the threshold to decide whether replicas exist. Like XED, this scheme lacks the ability to detect collusive replicas, because the replicas may intentionally schedule their own movements so as not to exceed the predefined maximum speed.

To adapt to any mobile scenario, Xing and Cheng¹⁷ proposed two replication detection schemes, one in the time domain (TDD) and one in the space domain (SDD). Their solution mainly uses a low-complexity one-way hash function to force the replica nodes to keep generating paradoxes. Each moving node exchanges messages in its vicinity and looks for paradoxical statements to detect replicas. In particular, the approach is highly resilient against collusive replicas, because the local information exchange protocol prevents compromised nodes and their clones from synchronizing the bindings of relevant information. However, during the detection process, the witness nodes record the information of observed identities and exchange their recorded information when they meet. This causes excessive communication and storage overhead. In addition, the time performance of detection decreases as the distance between compromised nodes and their clones increases.

In order to detect node replicas in mobile WSNs, Deng et al.²⁴ proposed two novel mobility-assisted distributed solutions: Unary-Time-Location Storage and Exchange (UTLSE) and Multi-Time-Location Storage and Diffusion (MTLSD). The former is a basic protocol, and the latter improves the detection performance by storing multiple time-location claims instead of a single one and by introducing more cooperation among witnesses. Both protocols are based on encounters among witness nodes. Once any of the encountered witnesses gets conflicting time-location claims for the same node ID, it announces this and triggers a revocation action against the replicated node. These protocols have high detection performance and low communication overhead. They can also be extended to deal with information sharing among replicas, at the cost of correspondingly higher communication and storage overhead. However, the witnesses must be absolutely trusted in their scheme, which means that detection accuracy decreases if a replica is selected as a witness node.

For detecting replica attacks in 802.11-based ad hoc networks, Faisal et al.⁵ proposed a hybrid approach of local and global detection. In the local scheme, the received signal strength information (RSSI) is used to calculate the distance between nodes and so detect replicas at different locations, based on the principle that each location is bound to a unique identity. The RSSI-based method not only avoids extra hardware overhead but also effectively prevents duplicates from colluding in a local area, since the RSSI is hard to forge. To identify replica attacks in the overall network, each node verifies whether there is a bridge node between nodes that have the same node in their 1-hop lists. The scheme has remarkable accuracy and works especially efficiently when the density of nodes is high. Nevertheless, the nodal storage overhead also increases with density, in that each node needs to store the shared 1-hop lists from other reference nodes in the overall network. Furthermore, RSSI-based methods can keep the local nodes under detection from colluding with each other, yet they cannot prevent nodes from colluding with the verifier. If the verifier does not store and announce a replica’s existence in its local area, the detection accuracy of this scheme decreases.

To summarize, the previous work shows that the majority of current schemes cannot deal with conspiring replicas in mobile environments. Moreover, the detection performance is affected by the distance between a compromised node and its replica. Although node mobility helps replica detection, mobile scenarios also pose many challenges, for instance the wide distribution and high-speed movement of nodes in VANETs. Thus, when a replica attack happens, replicas may appear anywhere in the network, depending mainly on the intentions of the adversaries. To deal with replica nodes appearing in such different positions, we aim to present a novel detection method.

System model

In this section, we describe the system model adopted in our proposed replica detection method and then illustrate the related attack model and assumptions.

VANETs system model

The system model is built on four types of entities: trust authority (TA), CCS, RSU, and onboard unit (OBU), which form a typical two-tier network model architecture together, as shown in Figure 2.

Figure 2.

VANET’s network architecture.

In this system, a two-layer communication model is constituted. The first layer refers to the communication between OBU and OBU or between OBU and RSU, that is, the V2V and V2R modes. Each mobile vehicle equipped with an OBU acts as a data collector, gathering its own traffic information, location, speed, direction, and so on, which are transmitted through the dedicated short-range communication (DSRC) protocol. The second layer is the communication between RSU and TA. They communicate with each other over a cellular network (2G/3G/4G), WiMAX, or WLAN. In this layer, the TA/CCS has sufficient computing and storage resources to support anomaly detection. After they establish an identity classification model based on the collected sample data, they can perform attack detection for subsequent collections. There are two types of messages in the two-layer communication model.

Messages communicating between OBU and OBU

Each vehicle periodically broadcasts its own traffic status to neighbors. We name the message as $M_{v 2 v}$ . Its format is shown as equation (1)

$$M_{v2v}\,(N\_id \parallel M\_id \parallel Azimuth \parallel Vel \parallel Loc \parallel T\_Stamp \parallel Sig^{n})$$

(1)

Here, $N\_id$ and $M\_id$ are the identity ID and message ID, respectively. Azimuth and Vel represent the azimuth and speed of $N\_id$, respectively. Loc is the location of $N\_id$ at $T\_Stamp$. $Sig^{n}$ is the signature on message $M_{v2v}$ produced by n (the vehicle with identity $N\_id$) with n’s private key.

Utilizing received $M_{v 2 v}$ messages from its neighbors, each vehicle simultaneously extracts their identity IDs to form neighbor list. The vehicular neighbor list and its own $M_{v 2 v}$ message are aggregated into $M_{v 2 R}$ message, whose format is shown as equation (2)

$$M_{v2R}\,(N\_id \parallel Loc \parallel Azimuth \parallel Vel \parallel Neb\_list \parallel T\_Stamp \parallel Sig^{n})$$

(2)

Here, $M_{v 2 R} (k, t_{i})$ represents the seven-tuple message of identity k at time $t_{i}$ . These messages are sent to RSU and then to CCS.

Messages communicating between OBU and RSU/CCS

Furthermore, in order to enable the gathered sample data to reflect the movement characteristics of vehicles, we let each vehicle k collect l $M_{v2R}$ messages from a given start time. Assuming that the start time is $t_{1}$, the recorded message sequence is denoted MS, as shown in equation (3)

$$MS(k, t_{1}, l) = \langle M_{v2R}(k, t_{1}), M_{v2R}(k, t_{2}), \dots, M_{v2R}(k, t_{l}) \rangle$$

(3)
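The message formats of equations (1) to (3) can be pictured as plain records. The following is a minimal sketch, not the authors' implementation: the class names, the helper functions `aggregate` and `message_sequence`, and the use of an opaque byte string for the signature are all assumptions made for illustration, and no real signing is performed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class V2VMessage:
    """Fields of M_v2v in equation (1); names are illustrative."""
    n_id: str
    m_id: int
    azimuth: float
    vel: float
    loc: Tuple[float, float]
    t_stamp: float
    sig: bytes  # placeholder for Sig^n, not a real signature


@dataclass(frozen=True)
class V2RMessage:
    """Fields of the seven-tuple M_v2R in equation (2)."""
    n_id: str
    loc: Tuple[float, float]
    azimuth: float
    vel: float
    neb_list: Tuple[str, ...]
    t_stamp: float
    sig: bytes


def aggregate(own: V2VMessage, received: List[V2VMessage], sig: bytes) -> V2RMessage:
    """Combine a vehicle's own beacon with the IDs heard from its neighbors."""
    neighbors = tuple(sorted({m.n_id for m in received}))
    return V2RMessage(own.n_id, own.loc, own.azimuth, own.vel,
                      neighbors, own.t_stamp, sig)


def message_sequence(messages: List[V2RMessage], k: str,
                     t1: float, l: int) -> List[V2RMessage]:
    """MS(k, t1, l): the first l messages of identity k from time t1 onward."""
    own = sorted((m for m in messages if m.n_id == k and m.t_stamp >= t1),
                 key=lambda m: m.t_stamp)
    return own[:l]
```

Keeping the records immutable (`frozen=True`) mirrors the assumption that collected location and time values are not modified after they are recorded.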

Attack models and assumptions

In this article, we assume that attackers initiate replica attack for two possible reasons. On account of attackers’ different intentions, we assume two types of attack scenarios:

Case 1, an attacker intends to exploit the legal identity of a captured vehicle to distribute false information or to escape responsibility for a traffic accident. For this purpose, the cloned identity is stealthily deployed on other vehicles far away from the captured one, at least not within its communication range. This behavior is imperceptible to the captured vehicle. In this case, we assume that cloned identities move independently.

Case 2, the captured vehicle and the attacker reach a consensus on network behaviors. They share their identities with their accomplices and participate in the network together to launch other inside attacks such as DoS attacks. This kind of replica attack is essentially an act of identity sharing. Since each vehicle only records its neighbors’ $N\_id$ during communication, we assume that vehicles do not synchronize the bindings of time and location when colluding. Conspirators with a shared identity appear within each other’s communication range and, over a long time, have the same movement characteristics.

In addition, vehicles act as front-end tools for data acquisition, synthesis, and forwarding, and do not perform replica attack detection themselves. Moreover, we assume that the vehicular location and time collected in real time are not tampered with.

Replica attack detection scheme with sequential trajectory segment

In this section, we illustrate our scheme in detail for detecting identity replicas in VANETs. To be specific, we first describe the collection of sequential trajectory segment (STS), then with definition of the trajectory sample similarity, we present the method of extracting and labeling of trajectory features and finally propose replica detection scheme based on semi-supervised SVM. Table 1 lists the symbols used in the proposed method.

Table 1.

Definition of notations in the scheme.

Symbol	Description
$N_{k}$	A vehicle with identity k
$M_{v2v}$	Messages communicated between OBU and OBU
$M_{v2R}$	Aggregated message of the vehicular neighbor list and the vehicle's own $M_{v2v}$ message
$MS(k, t_{1}, l)$	Message sequence composed of l $M_{v2R}$ messages from a given start time $t_{1}$, recorded by vehicle k
$t_{j}$	Time when the jth sample point was collected in the trajectory sequence
$s_{k}^{t_{j}}$	Coordinate position $(x_{k}^{t_{j}}, y_{k}^{t_{j}})$ at time $t_{j}$
$TR_{i}(k, t_{1}, l)$	The ith sequential trajectory segment with length l from a given start time $t_{1}$, recorded by vehicle k
R	Communication radius of vehicle nodes
$dist(s_{k}^{t_{i}}, s_{p}^{t_{i}})$	Euclidean distance between points $s_{k}^{t_{i}}$ and $s_{p}^{t_{i}}$ at sampling time $t_{i}$
$CNN\_List(s_{k}^{t_{i}})$	Communication neighbor list of $s_{k}$ at time $t_{i}$
$Neb\_List_{i}(k, t)$	The trajectory neighbors of $TR_{i}$ of vehicle k
${dist}_{ij}^{t_{m}}$	Euclidean distance between two trajectories at moment $t_{m}$
$d_{ij}$	The average distance deviation over the consecutive sampling points in the overlapping time zone
$\max(t_{i,1}, t_{j,1})$	The later of the first sampling times of trajectory i and trajectory j
$\min(t_{i,l}, t_{j,l})$	The earlier of the last sampling times of trajectory i and trajectory j
$disT_{ij}$	Time distance between trajectory segments $TR_{i}$ and $TR_{j}$
$s_{k}^{\max(t_{i,1}, t_{j,1})}$	The location corresponding to the time $\max(t_{i,1}, t_{j,1})$
$s_{k}^{\min(t_{i,l}, t_{j,l})}$	The location corresponding to the time $\min(t_{i,l}, t_{j,l})$
$\Delta \min\_T_{ij}$	The set of overlapping sampling times
$s_{k,i}^{t'}$	Vehicular position at sampling time $t'$ of trajectory $TR_{i}$
${\bar{v}}_{k,i}$	Average speed of trajectory i in $\min\_disT_{ij}$
m	The number of labeled samples
n	The number of samples
c	The number of label categories
T	An $n \times n$ transfer probability matrix
Y	An $n \times c$ label probability distribution matrix

OBU: onboard unit.

Collection of STS

Each moving vehicle collects the motion characteristic data of vehicles in a distributed manner to complete data acquisition and aggregation. The collected data are sent to the back-end traffic control center for analysis. The detailed steps are as follows:

Node $N_{k}$ periodically collects and sends its own $M_{v 2 v}$ message.

$N_{k}$ gathers all neighbor nodes’ $M_{v 2 v}$ messages at a given time to integrate into $M_{v 2 R}$ message.

$N_{k}$ collects l $M_{v2R}$ messages over the period $t_{1} \sim t_{l}$ to generate the message sequence $MS(k, t_{1}, l)$.

The spatio-temporal positions are extracted from a message sequence $MS_{i}(k, t, l)$ to form the corresponding trajectory segment $TR_{i}(k, t_{1}, l) = (s_{k}^{t_{1}}, s_{k}^{t_{2}}, \dots, s_{k}^{t_{j}}, \dots, s_{k}^{t_{l}})$, where $t_{j}$ is the time when the jth sample point was collected in the trajectory sequence and $s_{k}^{t_{j}}$ $(j \in 1 \dots l)$ is the coordinate position $(x_{k}^{t_{j}}, y_{k}^{t_{j}})$ at time $t_{j}$.

The message sequence $MS_{i}(k, t, l)$ is sent to the back-end control center through the RSU and stored in the sample database.

Thus, the initial sample data set is composed of trajectory segments at different times and with different identities.
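The extraction step above can be sketched as follows. This is an illustrative reading only: the dictionary keys `N_id`, `T_Stamp`, and `Loc` follow the field names of equation (2), while `SamplePoint`, `Segment`, and `extract_segment` are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Tuple


class SamplePoint(NamedTuple):
    t: float  # sampling time t_j
    x: float
    y: float


class Segment(NamedTuple):
    k: str                           # identity carried by the messages
    points: Tuple[SamplePoint, ...]  # (s_k^{t_1}, ..., s_k^{t_l})


def extract_segment(ms: List[dict]) -> Segment:
    """Turn one message sequence MS(k, t_1, l) into a trajectory segment."""
    ids = {m["N_id"] for m in ms}
    if len(ids) != 1:
        raise ValueError("a message sequence must carry exactly one identity")
    ordered = sorted(ms, key=lambda m: m["T_Stamp"])
    points = tuple(SamplePoint(m["T_Stamp"], *m["Loc"]) for m in ordered)
    return Segment(ids.pop(), points)
```

Because each vehicle reports its own sequence, two vehicles that use the same identity produce two separate segments with the same `k` in the sample database, which is what the later features compare.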

The definition of the trajectory sample similarity

Sample data must be labeled before being classified with the SVM. Here we adopt a semi-supervised method to label the sample data. The idea of semi-supervision is to propagate the labels of the labeled data to the unlabeled data according to their intrinsic relations, which is called label propagation (LP). This means that we can assign labels to the unlabeled samples from a small number of labeled samples, provided that the similarities between them are available. To this end, the first task of our replica detection scheme is to define the trajectory sample similarity according to these relationships. The relevant definitions are as follows.
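To make the idea of label propagation concrete, here is a generic sketch in plain Python. It is not the procedure of this article, whose details are not given in this section: the Gaussian weighting, the parameter `sigma`, the fixed iteration count, and the function name are assumptions. Only the roles of the matrices T ($n \times n$ transfer probabilities) and Y ($n \times c$ label probabilities) follow Table 1.

```python
import math
from typing import List, Optional


def label_propagation(dist: List[List[float]], labels: List[Optional[int]],
                      c: int, sigma: float = 1.0, iters: int = 200) -> List[int]:
    """Propagate labels over n samples given pairwise distances.

    labels[i] is a class index for labeled samples and None otherwise.
    """
    n = len(dist)
    # Transfer probability matrix T: closer samples receive more weight,
    # and each row is normalized to sum to 1.
    T = []
    for i in range(n):
        w = [math.exp(-(dist[i][j] ** 2) / sigma ** 2) for j in range(n)]
        total = sum(w)
        T.append([v / total for v in w])

    def init_row(lab: Optional[int]) -> List[float]:
        # One-hot row for a labeled sample, uniform row otherwise.
        if lab is None:
            return [1.0 / c] * c
        return [1.0 if q == lab else 0.0 for q in range(c)]

    Y = [init_row(lab) for lab in labels]
    for _ in range(iters):
        Y_new = [[sum(T[i][j] * Y[j][q] for j in range(n)) for q in range(c)]
                 for i in range(n)]
        # Clamp the labeled samples back to their known labels.
        for i, lab in enumerate(labels):
            if lab is not None:
                Y_new[i] = init_row(lab)
        Y = Y_new
    return [max(range(c), key=lambda q: Y[i][q]) for i in range(n)]
```

With four samples at positions 0, 1, 10, and 11 on a line and only the two end samples labeled, each unlabeled sample takes the label of its nearby labeled neighbor.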

Definition 1

Trajectory sampling point neighbor. For two trajectory sampling points $s_{k}^{t_{i}}$ and $s_{p}^{t_{i}}$ in trajectories $TR_{i}$ and $TR_{j}$, respectively, if equation (4) is satisfied, that is, point $s_{p}^{t_{i}}$ is both a physical neighbor and a communication neighbor of point $s_{k}^{t_{i}}$,²⁵ we call point $s_{p}^{t_{i}}$ the trajectory sampling point neighbor of point $s_{k}^{t_{i}}$

$$\begin{cases} dist(s_{k}^{t_{i}}, s_{p}^{t_{i}}) \leq R \\ s_{p}^{t_{i}} \in CNN\_List(s_{k}^{t_{i}}) \end{cases}$$

(4)

In equation (4), $t_{i}$ is the common sampling time, $dist(s_{k}^{t_{i}}, s_{p}^{t_{i}})$ denotes the Euclidean distance between points $s_{k}^{t_{i}}$ and $s_{p}^{t_{i}}$ at sampling time $t_{i}$, and R is the communication radius of vehicle nodes. If $dist(s_{k}^{t_{i}}, s_{p}^{t_{i}}) \leq R$, $s_{p}^{t_{i}}$ is a physical neighbor of $s_{k}^{t_{i}}$. $CNN\_List(s_{k}^{t_{i}})$ represents the communication neighbor list of $s_{k}$ at time $t_{i}$.

Definition 2

Trajectory neighbor. $TR_{j}$ is called a trajectory neighbor of $TR_{i}$ with respect to a given proportion r% (here r is a threshold) iff at least r% of the sampling points in both trajectories are trajectory sampling point neighbors. The trajectory neighbors of $TR_{i}$ of vehicle k are denoted as $Neb\_List_{i}(k, t)$.
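Definitions 1 and 2 translate into two short checks. The sketch below rests on two assumptions that the text does not settle: the communication neighbor list is taken to hold neighbor identities (as recorded in $Neb\_list$), and the proportion r% is measured over the sampling times the two trajectories have in common. The function and type names are illustrative.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
# A trajectory maps a sampling time to (position, communication neighbor IDs).
Trajectory = Dict[float, Tuple[Point, Set[str]]]


def is_point_neighbor(pos_k: Point, cnn_list_k: Set[str],
                      id_p: str, pos_p: Point, R: float) -> bool:
    """Equation (4): physically within radius R and heard as a neighbor."""
    return math.dist(pos_k, pos_p) <= R and id_p in cnn_list_k


def is_trajectory_neighbor(tr_i: Trajectory, tr_j: Trajectory,
                           id_j: str, R: float, r: float) -> bool:
    """Definition 2, with r given in percent of the common sampling times."""
    common = sorted(set(tr_i) & set(tr_j))
    if not common:
        return False
    hits = sum(
        is_point_neighbor(tr_i[t][0], tr_i[t][1], id_j, tr_j[t][0], R)
        for t in common
    )
    return 100.0 * hits / len(common) >= r
```

Requiring both conditions means that a vehicle that is geometrically close but never heard on the channel is not counted as a neighbor.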

When replica attack occurs in trajectory neighbors, it is usually a collusive behavior. They have similar movement characteristics. So the similarity of trajectory neighbors is defined as Definition 3.

Definition 3

The similarity of trajectory neighbors. For two trajectory neighbors $TR_{i}$ and $TR_{j}$, abbreviated as i and j, we represent their similarity by the average distance deviation of their trajectory sampling point neighbors. The smaller the distance deviation $d_{ij}$ is, the higher the similarity between i and j is. Assume that two trajectory neighbors have the overlapping time zone $t_{m} \sim t_{m+p}$, that is, there are $p + 1$ sampling points at common times. The distance deviation $d_{ij}$ between i and j is given by equation (5)

$$d_{ij} = \frac{\sum_{q=1}^{p} \left\| {dist}_{ij}^{t_{m+q}} - {dist}_{ij}^{t_{m}} \right\|}{p}, \quad i \neq j$$

(5)

Here, ${dist}_{ij}^{t_{m}}$ indicates the Euclidean distance between the two trajectories at moment $t_{m}$. $\left\| {dist}_{ij}^{t_{m+q}} - {dist}_{ij}^{t_{m}} \right\|$ stands for the distance deviation between two sampling times. $d_{ij}$ represents the average distance deviation over the consecutive sampling points in the overlapping time zone. The similarity is inversely proportional to the distance deviation.
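A small numerical sketch of equation (5) follows. It implements the formula exactly as printed, measuring each deviation against the distance at the first common sampling time $t_{m}$. Representing a trajectory as a mapping from sampling time to position, and the function name, are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def distance_deviation(tr_i: Dict[float, Point], tr_j: Dict[float, Point]) -> float:
    """d_ij of equation (5) over the common sampling times t_m .. t_{m+p}."""
    common = sorted(set(tr_i) & set(tr_j))
    p = len(common) - 1
    if p < 1:
        raise ValueError("need at least two common sampling times")
    # Inter-trajectory distance at every common sampling time.
    dist = [math.dist(tr_i[t], tr_j[t]) for t in common]
    # Average absolute deviation from the distance at t_m.
    return sum(abs(d - dist[0]) for d in dist[1:]) / p
```

Two vehicles that keep a constant gap, as colluding replicas moving together would, give $d_{ij} = 0$. In a case where the gap grows from 3 to 4 to 5, the deviations are 1 and 2, so $d_{ij} = 1.5$.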

Extraction of trajectory sample features

Sample features to be extracted should be helpful to differentiate normal identities from abnormal (replica) ones. In section “Attack models and assumptions,” we have assumed two replica attack scenarios. Under both scenarios, there is a common abnormal characteristic that vehicles with the same identity are separated with a certain distance at the same time. Furthermore, in Case 1, vehicles with the same identity have independent movement; consequently, replicas may appear at any time. When they appear one after another, the trajectory distance between them is not within a reasonable range. In Case 2, there are certain constraints among vehicles with the same identity, accordingly with the similar movement characteristics of trajectories. From above, we can conclude that replica attack in VANETs has the following abnormal characteristics:

Characteristic 1: When the replica identity and the original identity appear at the same time, the distance between their trajectory samples at the same sampling times exceeds 0.

Characteristic 2: When the replica identity and the original identity do not appear at the same time, the drive distance between two adjacent trajectories is beyond the reasonable range.

Characteristic 3: The relative position relation-elements between the original identity and the replica identity are consistent at the same sampling times.

From the above, the summarized abnormal characteristics are related to the time gap between trajectories. In order to extract the features of trajectory samples, we define the time distance between trajectories.

Time distance of trajectory samples

Since each trajectory sample has a corresponding time span, it can be represented as a line segment on a one-dimensional time axis. Thus, the time distance between two trajectory samples is equivalent to the distance between the corresponding time spans, which can be either overlapped or separated, as shown in Figure 3. In Figure 3, $t_{i,1}$ denotes the first sampling time on the ith trajectory segment and $t_{i,l}$ denotes the last one. $t_{j,1}$ and $t_{j,l}$ are defined in the same way for the jth trajectory segment. Based on this, we define the time distance of two trajectories $disT_{ij}$ following the idea of the Jaccard distance,²⁶ which is a diversity measure between two sets.

Figure 3.

The distance between two time spans.

Definition 4

Time distance $disT_{ij}$ of trajectory samples. Consider two trajectories $TR_{i}(k_{i}, t_{i,1}, l)$ and $TR_{j}(k_{j}, t_{j,1}, l)$, each with l sampling points. The time distance $disT_{ij}$ between $TR_{i}(k_{i}, t_{i,1}, l)$ and $TR_{j}(k_{j}, t_{j,1}, l)$ is defined as equation (6)

$$disT_{ij} = \Delta T_{ij} = \max(t_{i,1}, t_{j,1}) - \min(t_{i,l}, t_{j,l})$$

(6)

As shown in Figure 3, if $dis T_{ij}$ is negative, it means that there is time overlap and the two time segments intersect. On the other hand, if $dis T_{ij}$ is positive, it means that the two time segments are separated from each other. The smaller the $dis T_{ij}$ value, the closer the time distance is.

The abnormal characteristics of replica attack are mainly focused on vehicles with the same identity. Therefore, for each trajectory sample i to be studied, we query the same-identity trajectory j with the smallest time distance based on the defined time distance $disT_{ij}$ and record the minimum time distance between i and j as $\min\_disT_{ij}$.
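Equation (6) and the same-identity query can be sketched in a few lines. The representation of a segment as a time-ordered list of `(t, x, y)` tuples paired with an identity, and the names `time_distance` and `nearest_same_identity`, are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Points = List[Tuple[float, float, float]]  # time-ordered (t, x, y) samples


def time_distance(seg_i: Points, seg_j: Points) -> float:
    """Equation (6): later start time minus earlier end time.

    Negative or zero means the time spans overlap; positive means a gap.
    """
    return max(seg_i[0][0], seg_j[0][0]) - min(seg_i[-1][0], seg_j[-1][0])


def nearest_same_identity(i: int, segments: List[Tuple[str, Points]]
                          ) -> Optional[Tuple[int, float]]:
    """Return (j, min_disT_ij) for the same-identity segment closest in time."""
    k_i, pts_i = segments[i]
    best = None
    for j, (k_j, pts_j) in enumerate(segments):
        if j == i or k_j != k_i:
            continue
        d = time_distance(pts_i, pts_j)
        if best is None or d < best[1]:
            best = (j, d)
    return best
```

For a segment spanning times 0 to 2 and another spanning 1 to 3, the time distance is $1 - 2 = -1$ (overlap). Against a segment spanning 5 to 7 it is $5 - 2 = 3$ (gap). A segment whose identity appears only once has no partner, and the query returns `None`.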

Relative position distance $RL\_Dist_{ij}$ of trajectory samples

According to abnormal characteristics 1 and 2, we first define the relative position distance $RL\_Dist_{ij}$ between trajectory i and trajectory j with the same identity. The relative position distance based on the minimum time distance $\min\_disT_{ij}$ is calculated as equation (7)

$$RL\_Dist_{ij} = \begin{cases} \left\| s_{k}^{\max(t_{i,1}, t_{j,1})} - s_{k}^{\min(t_{i,l}, t_{j,l})} \right\|, & \min\_disT_{ij} > 0 \\ \dfrac{\sum_{t'} \left\| s_{k,i}^{t'} - s_{k,j}^{t'} \right\|}{\left| \Delta \min\_T_{ij} \right|}, \; t' \in \Delta \min\_T_{ij}, & \min\_disT_{ij} \leq 0 \end{cases}$$

(7)

In equation (7), $\max(t_{i,1}, t_{j,1})$ selects the later of the first sampling times of trajectories i and j, and $s_{k}^{\max(t_{i,1}, t_{j,1})}$ is the location at that time. In the same way, $\min(t_{i,l}, t_{j,l})$ selects the earlier of the last sampling times of trajectories i and j, and $s_{k}^{\min(t_{i,l}, t_{j,l})}$ is the location at that time. $\min\_disT_{ij} > 0$ indicates that there is a gap between the time spans of trajectories i and j. In that case, $\left\| s_{k}^{\max(t_{i,1}, t_{j,1})} - s_{k}^{\min(t_{i,l}, t_{j,l})} \right\|$ is the relative position distance for the same identity k from its disappearance in the earlier trajectory to its appearance in the later one. $\min\_disT_{ij} \leq 0$ means that the two trajectories overlap on the time axis. The set of overlapping sampling times is denoted as $\Delta \min\_T_{ij}$. $s_{k,i}^{t'}$ is the vehicular position at sampling time $t'$ of trajectory i, and $\left\| s_{k,i}^{t'} - s_{k,j}^{t'} \right\|$ is the location deviation at that sampling time during the overlap of the two trajectories. We sum all $\left\| s_{k,i}^{t'} - s_{k,j}^{t'} \right\|$ over the overlap and take the average to obtain the relative position distance $RL\_Dist_{ij}$.
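The two branches of equation (7) can be sketched as below. The mapping from sampling time to position and the name `rl_dist` are assumptions, as is the choice to raise an error when the time spans overlap but no sampling times coincide, a case the text does not discuss.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def rl_dist(tr_i: Dict[float, Point], tr_j: Dict[float, Point]) -> float:
    """Relative position distance of equation (7) for two same-identity segments."""
    start_i, end_i = min(tr_i), max(tr_i)
    start_j, end_j = min(tr_j), max(tr_j)
    t_late_start = max(start_i, start_j)
    t_early_end = min(end_i, end_j)
    min_dis_t = t_late_start - t_early_end  # equation (6)

    if min_dis_t > 0:
        # Separated spans: where the identity reappears versus where it vanished.
        later, earlier = (tr_i, tr_j) if start_i > start_j else (tr_j, tr_i)
        return math.dist(later[t_late_start], earlier[t_early_end])

    # Overlapping spans: mean location deviation over the shared sampling times.
    overlap = sorted(set(tr_i) & set(tr_j))
    if not overlap:
        raise ValueError("time spans overlap but no sampling times coincide")
    return sum(math.dist(tr_i[t], tr_j[t]) for t in overlap) / len(overlap)
```

For a single honest vehicle, two overlapping segments would coincide and the second branch would give 0, so a clearly positive value there is the signature described by Characteristic 1.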

Vehicle drive distance $drive\_Dist_{ij}$ of trajectory samples

Similarly, another feature, named the vehicle drive distance $drive\_Dist_{ij}$ , is calculated from the minimum time distance $\min\_disT_{ij}$ as follows

$$drive\_Dist_{ij} = \begin{cases} \dfrac{({\bar{v}}_{k, i} + {\bar{v}}_{k, j}) \cdot \min\_disT_{ij}}{2}, & \min\_disT_{ij} > 0 \\ 0, & \min\_disT_{ij} \leq 0 \end{cases}$$

(8)

In equation (8), ${\bar{v}}_{k, i}$ and ${\bar{v}}_{k, j}$ are the average speeds of trajectory i and trajectory j, respectively. If $\min\_disT_{ij} > 0$ , the vehicle drive distance $drive\_Dist_{ij}$ equals the distance traveled during the time gap $\min\_disT_{ij}$ at the mean of the two average speeds. If $\min\_disT_{ij} \leq 0$ , the two trajectories overlap in time and should not deviate from each other, so the vehicle drive distance is set to 0.
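As a small worked example of equation (8), the following sketch computes the drive distance from two average speeds and a time gap. The function name `drive_distance` and the sample values are illustrative assumptions; the speeds are merely chosen inside the 10–35 m/s range used later in the simulation.

```python
def drive_distance(mean_speed_i, mean_speed_j, min_dis_t):
    """Sketch of equation (8): distance an honest vehicle could cover in the gap."""
    if min_dis_t > 0:
        # Mean of the two average speeds multiplied by the time gap.
        return (mean_speed_i + mean_speed_j) * min_dis_t / 2
    # Overlapping trajectories: no time has elapsed between them.
    return 0.0


# Average speeds of 20 m/s and 30 m/s with a 4 s gap between the segments.
print(drive_distance(20.0, 30.0, 4.0))
# The same speeds with overlapping segments (non-positive time distance).
print(drive_distance(20.0, 30.0, -2.0))
```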

Relative position relation-element set (eigenvector)

According to the abnormal characteristics 1 and 3, we can infer that relation-elements of relative position of two trajectory samples, corresponding to the original identity and the replica identity, are consistent at the same sampling time. In view of this, an eigenvector will be defined on the basis of relation-elements of relative position at each same sampling time. Because the number of sampling points is at most l, we define an eigenvector containing l relation-elements of relative position, as shown in equation (9)

$$\begin{matrix} \vec{f} = (Δ s_{k}^{1}, Δ s_{k}^{2}, \dots, Δ s_{k}^{p}, \dots, Δ s_{k}^{l}) \\ \text{where } Δ s_{k}^{p} = s_{k}^{t_{i, p}} - s_{k}^{t_{j, p}}, \; p \in \{1, \dots, l\} \end{matrix}$$

(9)

In equation (9), $Δ s_{k}^{p}$ represents the relative position at the pth common sampling point. If the number of common sampling points is less than l, the remaining elements default to 0.
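The construction of the eigenvector, including the zero padding, might look as follows. This is a minimal sketch that assumes two-dimensional positions stored in a dict keyed by sampling time; the name `relation_vector` is hypothetical, and each relation-element is kept as a (dx, dy) pair.

```python
def relation_vector(traj_i, traj_j, l):
    """Sketch of equation (9): relative positions at common sampling times.

    Trajectories are dicts {sampling_time: (x, y)} (an assumed layout).
    The result always has l elements; missing ones are padded with zeros.
    """
    # Common sampling times in chronological order, at most l of them.
    shared = sorted(set(traj_i) & set(traj_j))[:l]
    vec = [
        (traj_i[t][0] - traj_j[t][0], traj_i[t][1] - traj_j[t][1])
        for t in shared
    ]
    # Pad with zero elements when fewer than l common points exist.
    vec.extend([(0.0, 0.0)] * (l - len(vec)))
    return vec


ti = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (20.0, 0.0)}
tj = {1: (10.0, 4.0), 2: (20.0, 4.0), 3: (30.0, 4.0)}
print(relation_vector(ti, tj, 4))
```

In this example the two non-zero elements are identical, which is the kind of consistent relative position that the text associates with an original identity and its replica moving together.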

To summarize, the trajectory sample features include an eigenvector, $\min\_disT_{ij}$ , $RL\_Dist_{ij}$ , and $drive\_Dist_{ij}$ , as shown in Table 2. These features capture the abnormal characteristics discussed in section “Extraction of trajectory sample features.” The detailed extraction algorithm of trajectory sample features is given in Algorithm 1.

Table 2.

The trajectory sample features.

Features	Description
$Δ s_{k}^{1}, Δ s_{k}^{2}, \dots, Δ s_{k}^{p}, \dots, Δ s_{k}^{l}$	Relation-elements of relative position at the same sampling time
$\min\_disT_{ij}$	The minimum time distance of trajectories containing the same identity
$RL\_Dist_{ij}$	The relative position distance over the minimum time distance
$drive\_Dist_{ij}$	The vehicle drive distance over the minimum time distance

Algorithm 1. Trajectory sample features extraction algorithm.
Require: Sample trajectory data set $\{T R_{i} (k_{i}, t_{i, 1}, l)\}$ , $(i \in 1 \dots n)$ ; //n is the trajectory sample size
Ensure: The features table (FTB)
for all $T R_{i} (i \in 1 \dots n)$ do
  $TR\_SID_{k_{i}} \leftarrow \{ T R_{i} \} \cup TR\_SID_{k_{i}}$ ; //Put $T R_{i}$ into the group $TR\_SID_{k_{i}}$ with the same identity $k_{i}$
end for
for each $TR\_SID_{k_{i}}$ do
  Sort $TR\_SID_{k_{i}}$ by the first sampling time $t_{i, 1}$ of each $T R_{i}$ ;
  for $p \leftarrow 1$ to $| TR\_SID_{k_{i}} | - 1$ do
    $T R_{i} \leftarrow TR\_SID_{k_{i}} [p]$ ;
    $T R_{j} \leftarrow TR\_SID_{k_{i}} [p + 1]$ ;
    $\min\_disT_{ij} \leftarrow disT_{i, j}$ ; //$\min\_disT_{ij}$ is obtained from adjacent elements of the sorted group
    Calculate $RL\_Dist_{ij}$ (equation (7)), $drive\_Dist_{ij}$ (equation (8)), and $\vec{f}$ (equation (9));
    FTB.insert( $\min\_disT_{ij}$ , $RL\_Dist_{ij}$ , $drive\_Dist_{ij}$ , $\vec{f}$ );
    Normalize each feature component value of FTB[i];
  end for
end for
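The grouping and pairing steps of Algorithm 1 can be illustrated with a short sketch. It only covers how trajectories with the same identity are grouped, sorted by first sampling time, and paired with their successor; the dict keys `identity` and `t_start` and the function name `adjacent_pairs` are assumptions made for the example.

```python
from collections import defaultdict


def adjacent_pairs(trajectories):
    """Group trajectories by identity and pair each one with its successor.

    Each trajectory is a dict with the (assumed) keys "identity" and
    "t_start", the latter being its first sampling time.
    """
    groups = defaultdict(list)
    for tr in trajectories:
        groups[tr["identity"]].append(tr)
    pairs = []
    for same_identity in groups.values():
        # Sort by first sampling time so that neighbours in the list are
        # the trajectories closest to each other in time.
        same_identity.sort(key=lambda tr: tr["t_start"])
        pairs.extend(zip(same_identity, same_identity[1:]))
    return pairs


samples = [
    {"identity": "A", "t_start": 20},
    {"identity": "B", "t_start": 0},
    {"identity": "A", "t_start": 0},
    {"identity": "A", "t_start": 10},
]
for tr_i, tr_j in adjacent_pairs(samples):
    print(tr_i["identity"], tr_i["t_start"], "->", tr_j["t_start"])
```

An identity with a single trajectory, such as "B" here, produces no pair and therefore no feature row, which matches the later observation that such identities are hard to check.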

Replica detection with STS

As aforementioned, replica identities, including compromised identities and their duplicates, are abnormal identities; the rest are normal identities. Each identity has a series of STSs, and replica identities have abnormal trajectory features. Thus, replica detection can be transformed into the detection of abnormal trajectories. We extract trajectory features based on the abnormal behavior of replica identities and then use semi-supervised SVM to establish a classification model that differentiates normal trajectories from abnormal ones.

All the above work in section “Replica attack detection scheme with sequential trajectory segment” prepares for using the semi-supervised SVM method to detect replica attacks. The detailed process is described in Algorithm 2, which consists of a training phase and a detecting phase.

Algorithm 2. Replica detection algorithm based on sequential trajectory segment.
Require: The sample trajectory data set $\{T R_{i} (k_{i}, t_{i, 1}, l)\}$ $(i \in 1 \dots n)$ ; m identified sample labels $\{γ_{1}, γ_{2}, \dots, γ_{m}\}$ ( $m \ll n$ )
Training phase:
  for each $T R_{i} (i \in 1 \dots n)$ do
    Calculate the similarity $d_{i, j}$ to its neighbors using equation (5);
  end for
  //Invoke the LP algorithm to label the initial trajectory samples:
  repeat
    Calculate the weight $ω_{ij}$ between $T R_{i}$ and $T R_{j}$ : $ω_{ij} = \exp (- \frac{d_{i, j}^{2}}{σ^{2}})$ ;
    Calculate the transfer probability $T_{ij}$ : $T_{ij} = P (i \to j) = \frac{ω_{ij}}{\sum_{p \in Neb\_List_{i} (k, t)} ω_{ip}}$ ;
    Update the label matrix Y of samples: $Y \leftarrow T_{n \times n} Y_{n \times c}$ ; //c is the number of label categories
    Keep the rows of Y that belong to labeled samples unchanged;
  until Y converges;
  Invoke Algorithm 1 to extract the trajectory features and amend the label matrix Y;
  Use FTB and Y to train the detection model based on SVM theory (equation (10));
Detecting phase:
  Invoke Algorithm 1 to extract the features of the trajectories to be detected;
  Use the features as input of the detection model to distinguish normal trajectories from abnormal ones (i.e. replica identities);

Training phase

In this phase, we first label the initial trajectory samples using a semi-supervised method, the LP algorithm.²⁷ The main idea of this algorithm is to propagate labels from very few labeled nodes to the unlabeled nodes according to the similarity between nodes on a constructed network graph. We build the graph as follows: each trajectory segment is a vertex, and there may be an edge between trajectory neighbors. With the similarity between trajectory neighbors defined in section “The definition of the trajectory sample similarity,” the higher the similarity is, the more likely two neighbors are to receive the same label. For example, when replicas and the captured vehicle conspire to launch an inside attack (Case 2 in our attack scenarios), they move in a similar way, and by Definition 3 they are adjacent nodes in the graph. Their similarity is therefore high, and they are likely to be labeled the same. As a result, we only need to label a very small number of trajectory samples as normal or abnormal. As shown in Algorithm 2, $\{γ_{1}, γ_{2}, \dots, γ_{m}\}$ denotes the labels of the m labeled samples. The label probability distributions of all labeled and unlabeled samples are stored in an $n \times c$ matrix Y, in which row i is the probability distribution of vertex $T R_{i}$ over the c labels. By invoking the LP algorithm, we calculate the weight $ω_{ij}$ and the transfer probability $T_{ij}$ between $T R_{i}$ and $T R_{j}$ based on the similarity $d_{i, j}$ . The transfer probabilities $T_{ij}$ form the transfer probability matrix T. While the labels of labeled samples are kept unchanged, each node iteratively updates its own label from the labels of adjacent nodes with high transfer probability. Consequently, labels propagate from the few labeled nodes to the unlabeled nodes on the constructed graph, which reduces the cost of manual labeling.
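The propagation loop described above can be sketched in plain Python. This is a simplified illustration, not the efficient LP variant of the cited work: neighbor lists are given as dicts from neighbor index to $d_{i, j}$, unlabeled rows start from a uniform distribution, and the names `label_propagation`, `sigma`, `tol`, and `max_iter` are assumptions of the example.

```python
import math


def label_propagation(dist, labels, n_classes=2, sigma=1.0,
                      tol=1e-6, max_iter=1000):
    """Propagate a few known labels over a neighbor graph.

    dist[i] maps each neighbor j of node i to the value d_ij;
    labels maps the index of each labeled node to its class.
    Returns the most probable class for every node.
    """
    n = len(dist)
    # Transfer probabilities, normalized over each node's neighbor list.
    transfer = []
    for i in range(n):
        w = {j: math.exp(-(d ** 2) / sigma ** 2) for j, d in dist[i].items()}
        total = sum(w.values())
        transfer.append({j: wj / total for j, wj in w.items()} if total else {})
    # Label matrix Y: one-hot rows for labeled nodes, uniform rows otherwise.
    Y = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, c in labels.items():
        Y[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]
    for _ in range(max_iter):
        new_Y = []
        for i in range(n):
            if i in labels or not transfer[i]:
                # Labeled rows are clamped; isolated nodes keep their row.
                new_Y.append(Y[i])
            else:
                new_Y.append([
                    sum(p * Y[j][k] for j, p in transfer[i].items())
                    for k in range(n_classes)
                ])
        delta = max(abs(a - b)
                    for row_a, row_b in zip(new_Y, Y)
                    for a, b in zip(row_a, row_b))
        Y = new_Y
        if delta < tol:
            break
    return [max(range(n_classes), key=lambda k: row[k]) for row in Y]


# Chain of four trajectory segments: 0 is labeled normal (class 0) and
# 3 is labeled abnormal (class 1); 1 is close to 0, and 2 is close to 3.
dist = [{1: 0.1}, {0: 0.1, 2: 2.0}, {1: 2.0, 3: 0.1}, {2: 0.1}]
print(label_propagation(dist, {0: 0, 3: 1}))
```

Because each row of the transfer matrix sums to 1, every row of Y stays a probability distribution throughout the iteration.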

After labeling the trajectory samples, we extract the features, including $\min\_disT_{ij}$ , $RL\_Dist_{ij}$ , $drive\_Dist_{ij}$ , and an eigenvector, from the STSs by Algorithm 1. Furthermore, we amend the labels of trajectory samples using relationships among the extracted features, such as the relationship between $RL\_Dist_{ij}$ and $drive\_Dist_{ij}$ . Because the shortest path between two points is a straight line, the relative position distance $RL\_Dist_{ij}$ of a normal trajectory sample is always less than or equal to its drive distance $drive\_Dist_{ij}$ . Once a sample violates this relationship, it should be labeled as abnormal, and its label is updated if the original label is normal. In this way, the accuracy of the sample labels is further improved.
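The label amendment rule reduces to a single comparison, sketched below. The class constants, the function name `amend_label`, and the optional tolerance `tol` (a hypothetical allowance for positioning error that the text does not mention) are assumptions of this example.

```python
NORMAL, ABNORMAL = 0, 1


def amend_label(label, rl_dist, drive_dist, tol=0.0):
    """Relabel a sample as abnormal when it moved farther than it could drive.

    `tol` is a hypothetical slack for measurement error; with the default
    of 0 the rule is exactly "relative position distance <= drive distance".
    """
    if rl_dist > drive_dist + tol:
        return ABNORMAL
    return label


# An identity that reappears 500 m away although it could have driven
# only 100 m in the gap is flagged, whatever label propagation assigned.
print(amend_label(NORMAL, 500.0, 100.0))
# A plausible displacement keeps its propagated label.
print(amend_label(NORMAL, 80.0, 100.0))
```

The rule only ever moves samples from normal to abnormal, so labels that propagation already marked as abnormal are left untouched.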

Finally, we use the dual form of the SVM with slack variables, as shown in equation (10), to construct a classification model. Here, $α_{i}$ is the Lagrange multiplier, $x_{i}$ is the feature vector of the ith trajectory sample, $y_{i}$ is the label of the ith sample, $K (\cdot)$ is the Gaussian kernel function, and C is the penalty factor

$$\begin{matrix} \max_{α} \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i, j = 1}^{n} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j}) \\ \text{s.t.} \; \sum_{i = 1}^{n} α_{i} y_{i} = 0 \\ 0 \leq α_{i} \leq C, \; i = 1, \dots, n \end{matrix}$$

(10)
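For reference, once the multipliers $α_{i}$ have been obtained from equation (10), a new feature vector x is classified with the standard SVM decision rule shown below. The bias term b and the kernel width $σ_{K}$ are standard SVM quantities that are not named in the text above, so they should be read as assumed notation.

```latex
f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_{i} y_{i} K(x_{i}, x) + b \right),
\qquad
K(x_{i}, x_{j}) = \exp\left( -\frac{\lVert x_{i} - x_{j} \rVert^{2}}{2 \sigma_{K}^{2}} \right)
```

Only the samples with $α_{i} > 0$ (the support vectors) contribute to the sum, so the trained model needs to keep only those trajectory samples.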

Detecting phase

In the detecting phase, we collect the subsequent trajectory data set, calculate its features, and then use the classification model trained in the training phase to distinguish normal trajectories from abnormal ones. The identity associated with an abnormal trajectory is a replica identity. The entire replica detection process based on semi-supervised SVM is given in Algorithm 2.

Performance evaluation

In this section, we evaluate the proposed scheme in two respects: a performance analysis of communication cost and node storage overhead, and an experimental validation of the detection performance in terms of classification accuracy rate (CAR), detection rate (DR), and false detection rate (FDR).

Performance analysis

Communication overhead

In the proposed detection scheme, we take full advantage of the basic cooperative communications in VANETs to collect trajectory samples. A vehicle periodically sends its own traffic status data and receives its neighbors’ information, which does not require any additional communication data. The STS is then formed and forwarded to the control center by the RSU through basic V2R communication. Therefore, the detection scheme does not add transmission overhead to VANET communication.

Nodal storage overhead

Each vehicle node $N_{k}$ periodically collects its own spatio-temporal motion characteristic data to generate an $M_{v 2 v}$ message. The vehicle exchanges messages with its neighbor nodes to build the $M_{v 2 R}$ message, which contains the neighbor information, and continuously collects a certain number of $M_{v 2 R}$ messages to form an STS. Therefore, the storage overhead of each vehicle node is that of storing a certain number of $M_{v 2 R}$ messages. Assuming that the number of $M_{v 2 R}$ messages collected continuously is l and the length of the $M_{v 2 R}$ message is $L_{M_{v 2 v}}$ , the nodal storage cost is $S (l * L_{M_{v 2 v}})$ . Because the continuously collected $M_{v 2 R}$ messages form the trajectory of the vehicle, l also represents the number of sampling points (trajectory points), that is, the trajectory length. When the $M_{v 2 R}$ message format is fixed, the smaller the trajectory length l is, the smaller the storage overhead required by the node.

Table 3 compares existing replica detection methods in mobile scenarios in terms of conspiracy attack detection, communication cost, computation cost, and storage cost. Suppose a mobile ad hoc network comprises N nodes. In the TDD and SDD solutions, $θ_{u} (t_{i})$ denotes the number of nodes that a node u meets at time $t \pm Δ$ , and d denotes the number of nodes randomly selected for the paradox check. In UTLSE and MTLSD, each node tracks $\sqrt{N}$ nodes and has $\sqrt{N}$ witnesses. The comparison results are based on the analysis of the reviewed work in section “Related work.” From Table 3, we conclude that in the STS method the computation and communication overhead of each node is no higher than that of the other methods, and that STS is also able to detect conspiracy attacks.

Table 3.

The comparison of existing replica detection methods in mobile scenarios.

Replica detection methods	Conspiracy attack detection	Computation overhead	Communication overhead	Storage overhead
XED¹⁵	×	O(1)	O(1)	O(N)
SPRT¹⁶	×	O(1)	$O (\sqrt{N})$	O(1)
TDD¹⁷	√	$O (\sum_{t_{i} \in (t \pm Δ)} θ_{u} (t_{i}))$	$O (\sqrt{N})$	O(N)
SDD-LC¹⁷	√	O(1)	O(1)	O(N)
SDD-LWC¹⁷	√	O(d)	O(d)	O(N)
UTLSE and MTLSD²⁴	×	$O (\sqrt{N})$	O(N)	$O (\sqrt{N})$
STS	√	O(1)	O(1)	$O (l * L_{M_{v 2 v}})$

XED: eXtremely Efficient Detection; SPRT: sequential probability ratio test; TDD: Time-Domain Detection; UTLSE: Unary-Time-Location Storage and Exchange; MTLSD: Multi-Time-Location Storage and Diffusion; STS: sequential trajectory segment; SDD-LC: Space-Domain Detection for Local Check; SDD-LWC: Space-Domain Detection for Local Witness Check.

Table 4.

Simulation parameters.

Parameter	Value
Node communication radius	250 m
RSU communication radius	1000 m
Vehicle speed	10–35 m/s
Data acquisition cycle T	1–10 s
Trajectory neighbor threshold r	50
Simulation time	388 s

Experimental evaluation

To further verify the performance of the algorithm, Simulation of Urban MObility (SUMO) and Network Simulator version 2 (NS2) are used to simulate the experimental scene of VANETs. SUMO²⁸ is a professional open-source microscopic traffic simulation platform that can generate static road networks and dynamic movement scenes. It creates three configuration script files, which include the travel time of vehicles, the start and end time of the simulation, the number of vehicles, time, speed, location, and so on. The mobility.tcl file records the location coordinates and speed at different times and is loaded into the network simulator NS2 as the main configuration file. NS2 can simulate a variety of network topologies and communication protocols. After that, we acquire the movement characteristic data received by each RSU from fr files and treat them as the original trajectory data to be analyzed. We then perform feature extraction and anomaly detection and present the corresponding detection results in MATLAB.

The simulation area is 12,000 × 82 m². The communication radius of a vehicle node is assumed to be 250 m and that of an RSU 1000 m. To collect vehicle movement characteristics in time, an RSU is placed along the road every 500 m. In the simulated traffic scene, we suppose that stolen replica identities (replicas deployed stealthily) and collusive replica identities (replicas deployed collusively) each account for 50% of the replicas. We observe vehicles’ movement and communication for 388 s and then run the detection algorithm. To improve detection efficiency, we ensure that the training samples contain both attack scenarios. The simulation parameters are listed in Table 4.

CAR

For our proposed method, we extract the eigenvectors to form the sample data set from 17,347 trajectories with 1214 different identities, collected in 388 s. Meanwhile, we fabricate replica identities at a rate of 30%. The sample data with different labeled ratios are trained and tested with different kernel functions. The classification accuracy is shown in Table 5.

Table 5.

Classification accuracy (%) under different kernel functions and different labeled ratios.

Ratio of labeled samples	Linear	Polynomial	RBF	Sigmoid
1%	93.9204	92.8804	97.3451	95.8723
2%	93.8182	92.9204	98.2301	95.9802
3%	95.6853	92.9053	98.1132	93.9089
4%	95.4545	95.9344	99.0909	95.9381
5%	95.9558	94.9543	98.9571	94.9597
6%	97.9778	94.9769	98.9781	97.5
7%	96.9867	91.1504	99.0909	93.9906
8%	96.9871	95.9869	98.2301	95.9895
9%	96.9921	94.9912	98.9932	96.9944
10%	96.992	93.9909	97.7528	95.9948

Table 5 shows that the STS detection algorithm achieves good classification results with a small number of labeled samples, because it takes the overall structure of the samples into account. Under all labeled ratios, the Gaussian (radial basis function (RBF)) kernel has the highest classification accuracy among the four kernel functions. Therefore, the RBF kernel is used as the kernel function of the SVM training model. We further compare the classification accuracy of the STS algorithm, the SVM-based algorithm, and the LP-based algorithm under different labeled sample ratios. Figure 4 leads to the following findings. First, the STS algorithm always has a high CAR. The SVM-based algorithm has a lower classification accuracy than ours when the ratio of labeled samples is less than 65%. This is because the STS algorithm makes full use of the similarity among trajectory samples to quickly assign labels to unlabeled samples, even when there are not enough labeled samples. Meanwhile, as the labeled ratio increases, the classification accuracy of the SVM-based algorithm also improves markedly. Second, the classification accuracy of the LP-based algorithm is only slightly influenced by the ratio of labeled samples and fluctuates around 80%. This is because the sample similarity applied in the LP-based algorithm relates only to collusive replica attacks. As long as one identity of this kind is labeled, its label is likely to propagate to other collusive replica identities. A stolen replica identity, in contrast, can be classified correctly only if it is labeled. The larger the proportion of stolen replica identities among the labeled samples is, the higher the overall classification accuracy is. Consequently, the overall classification accuracy fluctuates up and down.

Figure 4.

Comparison of classification accuracy with different labeled ratios.

Detection performance

The classification accuracy reflects the ratio of correct predictions over the whole sample data. However, when the categories are unequally distributed, the classification accuracy can still look good even if many samples of the minority category are misclassified. Therefore, to validate the effectiveness of our algorithm, we further evaluate its performance in terms of both DR and FDR. We also observe the influence of different ratios of malicious nodes and different vehicle densities on the performance.
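A small numerical example shows why accuracy alone can mislead on imbalanced data. The definitions used here are assumptions, since this section does not spell them out: DR is taken as the share of abnormal samples that are detected, and FDR as the share of normal samples that are wrongly flagged. The counts are invented purely for illustration.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (CAR, DR, FDR) from confusion-matrix counts.

    Assumed definitions: CAR = (TP + TN) / all samples,
    DR = TP / (TP + FN), FDR = FP / (FP + TN),
    where "positive" means an abnormal (replica) trajectory.
    """
    total = tp + fn + fp + tn
    car = (tp + tn) / total
    dr = tp / (tp + fn)
    fdr = fp / (fp + tn)
    return car, dr, fdr


# Illustrative counts: 1000 samples, only 50 of them abnormal, and the
# classifier finds just 10 of those 50 without any false alarm.
car, dr, fdr = detection_metrics(tp=10, fn=40, fp=0, tn=950)
print(f"CAR={car:.2f} DR={dr:.2f} FDR={fdr:.2f}")
```

With these counts the accuracy is 96% although four out of five replicas go undetected, which is the situation the DR and FDR measurements are meant to expose.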

The effect of the replica ratio on the detection performance

In the VANET composed of 1214 vehicles with different identities, replicas were fabricated at rates of 2%, 4%, 6%, 8%, and 10%, respectively. We compare how the DR and the FDR vary with the replica identity ratio under different trajectory lengths, as shown in Figure 5(a) and (b).

Figure 5.

Detection performance under different replica identity ratios: (a) detection rate and (b) false detection rate.

Figure 5 shows that the DR and the FDR of our algorithm vary little with the replica identity ratio. This is because the extraction of the classification features mainly takes the motion characteristic data into account, for instance the vehicular spatio-temporal location. We also define the trajectory features according to abnormal characteristics such as unreasonable distance deviations between occurrences of the same identity that appear simultaneously or successively. Therefore, the DR is almost impervious to the replica identity ratio in VANETs. During the observation period of 388 s, when the vehicle trajectory length is less than 20, the DR is above 96% and the FDR is less than 2.3%. In addition, as the trajectory length increases, the DR decreases whereas the FDR increases. For a given observation period, the longer the trajectory length is, the fewer trajectories there are with the same identity, and the more likely it is that an identity has only one trajectory. Because no other trajectory with the same identity can be found, the identity may be missed or falsely detected, which leads to a lower DR and a higher FDR.

Figure 5 also indicates the impact of the trajectory length on the detection performance: the smaller l is, the higher the DR and the lower the FDR. According to the analysis of the storage cost in section “Performance analysis,” the smaller the length l is, the smaller the storage overhead is. Therefore, l should be set to a small value in consideration of both detection performance and storage cost.
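The link between trajectory length and the number of trajectories per identity can be made concrete with a rough upper bound. The sketch assumes that segments are consecutive and non-overlapping and that a vehicle stays in coverage for the whole observation period; neither assumption is stated in the text, and the function name `max_segments` is hypothetical.

```python
def max_segments(observation_s, cycle_s, length):
    """Upper bound on the number of STSs one identity can produce.

    A segment of `length` sampling points taken every `cycle_s` seconds
    spans length * cycle_s seconds (assuming back-to-back segments).
    """
    return int(observation_s // (cycle_s * length))


# Observation period of 388 s, as in the simulation.
for cycle_s, length in [(1, 10), (1, 20), (10, 20), (10, 40)]:
    print(cycle_s, length, max_segments(388, cycle_s, length))
```

Under these assumptions a long trajectory length combined with a long acquisition cycle leaves at most one segment per identity, or none at all, so there is nothing to compare it with.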

The effect of the vehicle density on the detection performance

Our proposed detection algorithm relies on the received trajectory data, so the reception of data packets by each vehicle and by the RSU affects its detection performance. When the vehicle density changes, the packet reception also changes. Hence, in this part, we conduct several experiments to examine how the packet loss rate among vehicles and at RSUs varies with vehicular density, and then verify the influence of vehicular density on the detection performance.

The vehicle traffic data at different traffic densities are simulated, and replica identities are fabricated at a rate of 10%. With the vehicle trajectory length set to l = 10, the packet loss at the vehicle nodes and at the RSU under different traffic densities is shown in Figure 6. As the vehicle density increases, both packet loss rates show an increasing trend, because the probability of message collision increases. Under the same conditions, the loss rate of packets sent to the RSU is almost twice the loss rate of packets sent among vehicles. The two loss rates depend mainly on the packet length: since the MS message sequence sent to the RSU is longer than the instantaneous state message $M_{v 2 v}$ sent among vehicles, its packet loss rate is also higher. In addition, as the acquisition cycle increases, the packet loss of vehicles and RSUs is alleviated, because the frequency of data transmission decreases, the probability of message collision decreases, and hence the packet loss rate decreases.

Figure 6.

The packet loss rate variation at different vehicle densities: (a) T = 1 s, (b) T = 5 s, and (c) T = 10 s.

Figure 7 compares the DRs and FDRs at different acquisition cycles and vehicle densities. As shown in Figure 7(a), the DR decreases as the vehicular density increases, because the packet loss rate among vehicles and RSUs increases along with the vehicular density. Packet loss at the RSU directly reduces the number of trajectories in the network, so the probability that an identity relates to only one trajectory during the observation period increases. Packet loss among vehicles also affects the determination of trajectory neighbors in the LP phase. If the packet loss rate among vehicles increases, a neighbor trajectory node becomes a non-neighbor trajectory node because the communication neighbor condition can no longer be satisfied. When the trajectory of an unlabeled abnormal identity is determined to be a non-neighbor trajectory of the labeled abnormal node, and at the same time the unlabeled abnormal identity has only one trajectory in the network, a missed detection occurs, which decreases the DR. In addition, the DR does not change significantly under different acquisition cycles. As the acquisition cycle increases, on the one hand the packet loss rate decreases, which raises the DR; on the other hand the number of trajectories decreases, which lowers the DR. Across the different vehicle densities, the minimum DR remains above 94% when the number of neighbor vehicles reaches 90.

Figure 7.

Detection performance at different vehicle densities: (a) detection rate and (b) false detection rate.

Figure 7(b) indicates that the FDR of replica identities increases with vehicular density. A false detection occurs when the relative position between a replica node and its normal neighbor node is consistent and the neighbor node has only one trajectory. As the vehicular density increases, the motion characteristics of neighbors tend to become consistent. Meanwhile, the packet loss rate increases with density, which raises the probability that an identity relates to only one trajectory. That is, the probability of false detection increases. As with the DR, the FDR does not change significantly under different acquisition cycles. When the number of neighbors reaches 90, the maximum FDR is not more than 2.5%.

Detection time

In this section, we evaluate how the time performance of replica detection is affected by the distance between compromised nodes and their clones. The proposed STS scheme is compared with two related mechanisms, TDD and SDD, from the literature.¹⁷ Figure 8 shows that the detection time in our scheme is affected by distance much less than in the TDD and SDD methods. This is because, in the proposed scheme, we collect the STSs of both compromised identities and their duplicates in the CCS simultaneously and then analyze them by semi-supervised SVM. In TDD and SDD, the detection time increases with increasing distance, because the validation node needs to be selected from local areas. It obtains the knowledge that two nodes v and w both encountered u in different ways and then checks whether u is a replica based on three criteria. This depends on local information exchange. If the distance between compromised nodes and their clones increases, detection is delayed and the detection time increases.

Figure 8.

Comparison of detection time with different distances.

Conclusion

In VANETs, vehicle nodes are widely distributed and move at high speed; as a result, their identities can easily be stolen or shared through collusion. Replica attackers use identity replicas maliciously to initiate various inside attacks and can even destroy the whole network. The performance of existing replica detection solutions for mobile environments is prone to be affected by the distance between replica nodes and compromised nodes. To this end, we propose a detection scheme on the basis of STS. Relying on the temporal and spatial characteristics of trajectory data, we define the sample similarity, extract the sample features, and then establish a detection model based on semi-supervised SVM. The performance analysis and experimental studies show that the DR is high and the FDR is low, and that no additional vehicle-to-RSU or inter-vehicle communication overhead is introduced. The proposed replica attack detection method is also insensitive to the replica identity ratio. In future work, to further alleviate the computational cost of centralized analysis, parallel computation can be used to optimize data processing.

Footnotes

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions.

Handling Editor: Syed Hassan Ahmed

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under grant nos 61472001, U1405255, and U1764263, and research and innovation project of Jiangsu Province under grant no. CXLX11_0592.

ORCID iD

Yan Xin

References

Hasrouny

Samhat

Bassil

et al . VANET security challenges and solutions: a survey. Veh Commun 2017; 7: 7–20.

Bariah

Shehada

Salahat

et al . Recent advances in VANET security: a survey. In: Proceedings of the 82nd vehicular technology conference (VTC Fall), Boston, MA, 6–9 September 2015, pp.1–7. New York: IEEE.

De Sales

TBM

Perkusich

De Sales

et al . ASAP-V: a privacy-preserving authentication and Sybil detection protocol for VANETs. Inf Sci 2016; 372: 208–224.

Parno

Perrig

Gligor

. Distributed detection of node replication attacks in sensor networks. In: Proceedings of the IEEE symposium on security and privacy, Oakland, CA, 8–11 May 2005, pp.49–63. New York: IEEE.

Faisal

Abbas

Rahman

HU.

Identity attack detection system for 802.11-based ad hoc networks. EURASIP J Wireless Commun Netw 2018; 2018: 128.

Mishra

Singh

Kumar

VANET security: issues, challenges and solutions. In: Proceedings of the international conference on electrical, electronics, and optimization techniques, Chennai, India, 3–5 March 2016, pp.1050–1055. New York: IEEE.

Feng

Chen

et al . A method for defensing against multi-source Sybil attacks in VANET. Peer-to-Peer Netw Appl 2017; 10(2): 305–314.

Park

Aslam

Turgut

et al . Defense against Sybil attack in the initial deployment stage of vehicular ad hoc network based on roadside unit support. Secur Commun Netw 2013; 6(4): 523–538.

Xiao

Detecting Sybil attacks in VANETs. J Paral Distrib Comput 2013; 73(6): 746–756.

10.

Chang

Zhu

et al . Footprint: detecting Sybil attacks in urban vehicular networks. IEEE Trans Paral Distrib Syst 2012; 23(6): 1103–1114.

11.

Zhu

Zhou

Deng

et al . Detecting node replication attacks in wireless sensor networks: a survey. J Netw Comput Appl 2012; 35(3): 1022–1034.

12.

Khan

Aalsalem

Saad

MNBM

et al . Detection and mitigation of node replication attacks in wireless sensor networks: a survey. Int J Distrib Sens Netw 2013; 2013(2): 718–720.

13.

Mishra

Turuk

AK.

A comparative analysis of node replica detection schemes in wireless sensor networks. J Netw Comput Appl 2016; 61: 21–32.

14.

Ding

Yang

Localization-free detection of replica node attacks in wireless sensor networks using similarity estimation with group deployment knowledge. Sensors 2017; 17(1): 160.

15.

Kuo

SY.

Mobile sensor network resilient against node replication attacks. In: Proceedings of the sensor, mesh and ad hoc communications and networks SECON ‘08, San Francisco, CA, 16–20 June 2008, pp.597–599. New York: IEEE.

16.

Wright

Das

SK.

Fast detection of mobile replica node attacks in wireless sensor networks using sequential hypothesis testing. IEEE Trans Mobile Comput 2011; 10(6): 767–782.

17.

Xing

Cheng

From time domain to space domain: detecting replica attacks in mobile ad hoc networks. In: Proceedings of the INFOCOM, San Diego, CA, 14–19 March 2010, pp.1–9. New York: IEEE.

18.

Conti

Pietro

Mancini

et al . A randomized, efficient, and distributed protocol for the detection of node replication attacks in wireless sensor networks. In: Proceedings of the ACM international symposium on mobile Ad Hoc networking and computing, Montréal, Québec, Canada, 9–14 September 2007, pp. 80–89. New York: IEEE.

19.

Zhu

Addada

VGK

Setia

et al . Efficient distributed detection of node replication attacks in sensor networks. In: Proceedings of the twenty-third annual computer security applications conference ACSAC, Miami Beach, FL, 10–14 December 2007, pp.257–267. New York: IEEE.

20.

Chen

Meng

Zhan

YZ.

Detecting and defending against replication attacks in wireless sensor networks. Int J Distrib Sens Netw 2013; 2013: 55–60.

21.

Wang

Yang

Patrol detection for replica attacks on wireless sensor networks. Sensors 2011; 11(3): 2496–2504.

22.

Chen

Wang

Zhan

YZ.

Mobile detection of replication attacks in wireless sensor network. J Commun 2012; 2012: 178–185.

23.

Cho

Lee

et al . Energy-efficient replica detection for resource-limited mobile devices in the internet of things. IET Commun 2013; 7(18): 2141–2150.

24. Deng, Xiong, Chen. Mobility-assisted detection of the replication attacks in mobile wireless sensor networks. In: Proceedings of the IEEE international conference on wireless and mobile computing, networking and communications, Niagara Falls, ON, 11–13 October 2010, pp.225–232. New York: IEEE.

25. Grover, Laxmi, Gaur MS. Sybil attack detection in VANET using neighbouring vehicles. Int J Secur Netw 2014; 9(4): 222–233.

26. Yunyan, Jiawei, et al. Density-based spatiotemporal clustering analysis of trajectories. J Geo-Inform Sci 2015; 17: 1162–1171.

27. Fujiwara, Irie. Efficient label propagation. In: Proceedings of the 31st international conference on machine learning (ICML-14), Beijing, China, 21–26 June 2014, pp.784–792. New York: IEEE.

28. Behrisch, Bieker, Erdmann, et al. SUMO – simulation of urban mobility: an overview. In: SIMUL 2011, Barcelona, Spain, 23–29 October 2011, pp.63–68. ThinkMind.
