Abstract
In vehicular ad hoc networks, attackers can pose as replicas of legitimate vehicles through cracking or collusion and then use the replicated identities maliciously. Generating replicas is itself an attack, and the replicas also enable other insider attacks such as denial of service, information interception, and replay. Researchers have proposed many solutions to this problem for wireless sensor networks and mobile ad hoc networks. However, the majority of current schemes either handle conspiring replicas poorly or fail to consider the high mobility of vehicles. To detect identity replicas in vehicular ad hoc networks, we propose a detection method that uses sequential trajectory segments and a semi-supervised support vector machine. We establish a detection model that takes the spatio-temporal trajectories of different identities as input samples, which include features of both conspiracy and non-conspiracy attack scenarios. To validate our approach, we apply the sequential trajectory segment method in a simulation environment. The performance analysis and experimental studies suggest that the proposed method provides high detection accuracy that is almost unaffected by the replica identity ratio in vehicular ad hoc networks. Furthermore, the time performance of replica detection is less affected by the distance between compromised nodes and their clones than that of existing solutions.
Introduction
In vehicular ad hoc networks (VANETs), vehicles act as mobile nodes that send messages to other vehicles and to roadside units (RSUs) via wireless communication. Because wireless communication in VANETs is open, the vehicular identity, which is the foundation of this communication, is vulnerable to security threats.1–3 For example, an adversary may infiltrate virtually any electronic control unit of a vehicle by means of a communication device and physically compromise it. The adversary may then create replicas of its identity with the captured credentials and secretly deploy them on selected real vehicles. This type of attack is known as an identity replica attack or identity clone attack, in which one identity is used by multiple vehicles in multiple places. The replicas pose as legitimate nodes to participate in the network. They can further launch various hazardous attacks depending on the intentions (selfish or destructive) of the attacker. Because it is the root cause of many security problems, the replica attack has serious repercussions in VANETs.
The concept of replica attack was originally proposed in wireless sensor networks (WSNs). 4 An intruder first physically captures a legitimate node within the network, then clones the compromised node by cracking its confidential information and creating replicas that share the same identifier. Generating replicas is itself an attack, and the replicas can also enable other malicious attacks ranging from data injection to routing loop creation. 5
In VANETs, data are also transmitted through an open and shared communication medium, as in WSNs. In addition, vehicular nodes are widely distributed and the network topology changes rapidly,6,7 so it is difficult to manage all identities in a centralized way. Thus, identity replica attacks are also possible in VANETs. Note that a replica attack differs from a Sybil attack: in the former, a single compromised identity is inserted into multiple physical vehicles, whereas in a Sybil attack a single vehicle impersonates multiple identities. Figure 1 shows an example of replica attack and its influence in VANETs, where vehicle I is the compromised one and

Example of a replica attack and its influence in VANETs.
Motivation
One of the important security issues in VANETs is identity attack, which includes both Sybil and replica attacks. Previous research has mainly focused on the Sybil attack, for which numerous detection methods3,7–10 have been proposed. However, detection of replica attacks remains largely unexplored. In addition, current replica detection methods mostly target WSNs.11–14 We attempt to develop an effective and efficient method for detecting replica identities in VANETs that takes into account the wide distribution and rapid movement of vehicles, which are characteristic of VANETs.
In this article, we treat the compromised identities and their clones as abnormal; both are called replica identities. Identities that have not been cloned are treated as normal. If a classification model can differentiate normal from abnormal identities, the replica identities, and thus the replica attack, can be identified. Here we propose a detection scheme based on semi-supervised support vector machine (SVM) theory. Considering the motion features and wide distribution of nodes in VANETs, we collect trajectory data of vehicles as the initial sample set. At every sampling time, each moving vehicle records its own location with a timestamp, as well as information about its direct communication neighbors; together these form a trajectory sample point. A series of sample points constitutes a trajectory segment. The trajectory segments, regarded as the original sample set, are sent to the control center server (CCS) through an RSU.
Vehicles with replica identities usually have abnormal movement characteristics compared with normal identities. For instance, vehicles with the same identity are a certain distance apart at the same moment. Based on these abnormal characteristics, we define the trajectory sample similarity and a feature extraction method to prepare for semi-supervised SVM classification. With the established classification model, we can identify replica identities and thus the replica attack. In particular, this classification model provides a detection basis for subsequent collections.
Contribution
Compared with the previous work, we have the following contributions in this article:
On the basis of temporal and spatial trajectory segments, we propose a novel replica attack detection method for VANETs. When a replica attack occurs, replica identities have abnormal trajectory features, which helps build the replica detection model.
Unlike the schemes in the related literature,15,16 the proposed scheme can detect conspiring replicas. Furthermore, compared with the literature, 17 the time performance of replica detection is almost unaffected by the distance between compromised nodes and their clones.
During the entire detection process, the vehicle-to-roadside and inter-vehicle communication overhead does not increase compared with the literature. 17 The detection accuracy is not restricted by the proportion of replica identities in the local area.
Organization
The rest of the paper is organized as follows. Section “Related work” reviews the related work. Section “System model” introduces the network architecture and the security models. Section “Replica attack detection scheme with sequential trajectory segment” presents the proposed replica detection scheme and its underlying rationale. Performance evaluation and experimental validation are reported in section “Performance evaluation.” Section “Conclusion” summarizes the work and concludes the paper.
Related work
At present, replica attack detection has been widely researched in WSNs, but it has not been addressed in VANETs. Therefore, in this section, we review detection schemes for the related problems in WSNs.
One line of previous research addresses the issue in stationary WSNs.4,18–20 The primary idea is that witness nodes check whether the same ID makes conflicting claims at different locations. The witnesses can be randomly selected, probabilistically chosen, or chosen from the routing path. The effectiveness of replica detection therefore relies mainly on the selection of witness nodes. More importantly, the node locations are required to be fixed. In addition, replica detection schemes in stationary sensor networks usually assume that benign nodes are in the majority or that a global sink node makes the judgment. This introduces other serious issues, such as a bottleneck node and a single point of failure. Therefore, Wang and Yang 21 proposed introducing a mobile node to detect replica attacks in stationary WSNs. Using a mobile node as a mobile sink not only alleviates the bottleneck but also avoids the single point of failure. However, the scheme incurs additional communication and computation overhead for simultaneous localization among stationary nodes. To further reduce the detection overhead, Chen et al. 22 presented a mobile detection method in which the mobile sink patrols the network and gathers data to detect replicas. This solution performs effectively and efficiently with a small and well-balanced overhead, and it relies on neither location mechanisms nor time synchronization algorithms. However, this method requires the mobile sink to be absolutely honest.
The aforementioned replica detection solutions do not work when nodes are expected to move. This is the focus of another line of research, in which a number of solutions15–17,23 have been proposed for the mobile replica detection problem.
In the literature, Yu et al. 15 proposed a localized solution named eXtremely Efficient Detection (XED). The key idea of XED is that the latest random number is preserved between the recorder and the encounter. When meeting again, the recorder checks whether the random number of the previous exchange matches the current record or not. If not, presence of replica attacks will be reported. The scheme has a low computational and communication overhead. However, it loses the detection capability when the replicas conspire to synchronize the random numbers.
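The random-number check described above can be sketched as follows. This is a simplified, one-directional toy model written for illustration, not the XED protocol itself: the class `XedNode`, its fields, and the 64-bit number size are hypothetical names and choices. It shows why a fresh replica is caught and why replicas that share state (collusion) pass the check.

```python
import secrets

class XedNode:
    """Toy node: remembers the number it last gave to, and last got from, each peer."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.given = {}     # peer id -> number this node last handed to that peer
        self.received = {}  # peer id -> number this node last received from that peer

    def meet(self, peer):
        """Return True if the encounter is consistent, False if a replica is suspected."""
        expected = self.given.get(peer.node_id)
        presented = peer.received.get(self.node_id)
        consistent = expected is None or expected == presented
        # Refresh the shared number so the next encounter can be checked.
        fresh = secrets.randbits(64)
        self.given[peer.node_id] = fresh
        peer.received[self.node_id] = fresh
        return consistent
```

A replica that was never given the latest number fails the check, whereas two replicas that synchronize their `received` tables are indistinguishable to the recorder, which matches the limitation noted above.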
Ho et al. 16 proposed a fast and effective mobile replica node detection scheme using the sequential probability ratio test (SPRT). The SPRT scheme is based on the intuition that a benign mobile node should never move faster than the preconfigured maximum speed. Replicas appear to move much faster than benign nodes because they turn up in different places at the same time. The base station checks whether the measured speed of a node exceeds the threshold to decide whether replicas exist. Like XED, this scheme cannot detect collusive replicas, because the replicas may intentionally schedule their movements so as not to exceed the predefined maximum speed.
To adapt to any mobile scenario, Xing and Cheng 17 proposed two replica detection schemes, time-domain detection (TDD) and space-domain detection (SDD). Their solution mainly uses a low-complexity one-way hash function to force replica nodes to keep generating paradoxes. Each moving node exchanges messages in its vicinity and looks for paradoxical statements to detect replicas. The approach has excellent resilience against collusive replicas because the local information exchange protocol prevents compromised nodes and their clones from synchronizing the bindings of relevant information. However, during detection the witness nodes record information about observed identities and exchange their records when they meet, which causes excessive communication and storage overhead. In addition, the time performance of detection decreases as the distance between compromised nodes and their clones increases.
To detect node replicas in mobile WSNs, Deng et al. 24 proposed two mobility-assisted distributed solutions, Unary-Time-Location Storage and Exchange (UTLSE) and Multi-Time-Location Storage and Diffusion (MTLSD). The former is a basic protocol; the latter improves detection performance by storing multiple time-location claims instead of a single one and by introducing more cooperation among witnesses. Both protocols are based on encounters among witness nodes. Once any encountered witness receives conflicting time-location claims for the same node ID, it announces this and triggers a revocation action against the replicated node. These protocols have high detection performance and low communication overhead. They can also be extended to handle information sharing among replicas, at the cost of increased communication and storage overhead. However, the witnesses must be fully trusted in their scheme, which means that detection accuracy decreases if a replica is selected as a witness node.
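As an illustration of the speed-based test described above, the sketch below applies Wald's sequential probability ratio test to Bernoulli samples, where a sample is 1 when the speed implied by two consecutive location claims exceeds the maximum. The parameter values (`p0`, `p1`, `alpha`, `beta`) and the function names are assumptions chosen for the example, not values taken from Ho et al.

```python
import math

def speed_exceeds(claim_a, claim_b, v_max):
    """Each claim is (x, y, t). True if moving between the claims needs speed > v_max."""
    (xa, ya, ta), (xb, yb, tb) = claim_a, claim_b
    dist = math.hypot(xb - xa, yb - ya)
    dt = abs(tb - ta)
    if dt == 0:
        return dist > 0  # two different places at the same instant
    return dist / dt > v_max

def sprt_replica_test(samples, p0=0.1, p1=0.9, alpha=0.01, beta=0.01):
    """Wald SPRT on Bernoulli samples; returns 'replica', 'benign', or 'undecided'."""
    upper = math.log((1 - beta) / alpha)  # accept the replica hypothesis at or above this
    lower = math.log(beta / (1 - alpha))  # accept the benign hypothesis at or below this
    llr = 0.0                             # running log-likelihood ratio
    for s in samples:
        llr += math.log(p1 / p0) if s else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "replica"
        if llr <= lower:
            return "benign"
    return "undecided"
```

Colluding replicas that keep every implied speed below `v_max` only ever produce 0 samples, so the test settles on "benign", which is the weakness noted above.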
For detecting replica attacks in 802.11-based ad hoc networks, Faisal et al. 5 proposed a hybrid approach of local and global detection. In the local scheme, received signal strength information (RSSI) is used to calculate the distance between nodes and to detect replicas at different locations, based on the principle that each location is bound to a unique identity. The RSSI-based method not only avoids extra hardware overhead but also effectively prevents duplicates from colluding in a local area, since RSSI is hard to forge. To identify replica attacks in the overall network, each node verifies whether there is a bridge node between nodes that have the same node in their 1-hop lists. The scheme has remarkable accuracy and works especially efficiently when the node density is high. Nevertheless, the nodal storage overhead also increases with density, because each node needs to store the shared 1-hop lists from other reference nodes in the overall network. Furthermore, although RSSI-based methods can keep the local nodes under detection from colluding with each other, they cannot prevent those nodes from colluding with the verifier. If the verifier does not store and announce a replica’s existence in its local area, the detection accuracy of this scheme decreases.
To summarize, previous work shows that the majority of current schemes cannot handle conspiring replicas in mobile environments. Moreover, detection performance is affected by the distance between a compromised node and its replica. Although node mobility helps replica detection, mobile scenarios also pose many challenges, for instance the wide distribution and high-speed movement of nodes in VANETs. When a replica attack happens, replicas may appear anywhere in the network, depending mainly on the intentions of the adversaries. To deal with replica nodes appearing at such different positions, we aim to present a novel detection method.
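The text does not state which propagation model the RSSI-based distance calculation uses. As a rough sketch, the widely used log-distance path loss model converts RSSI to distance as shown below; the reference power, reference distance, and path loss exponent are assumed example values, not parameters from Faisal et al.

```python
def rssi_to_distance(rssi_dbm, rssi_at_ref_dbm=-40.0, ref_distance_m=1.0,
                     path_loss_exponent=2.0):
    """Estimate distance in metres from RSSI with the log-distance path loss model.

    All default parameters are illustrative assumptions; real values depend on
    the radio hardware and the environment.
    """
    exponent = (rssi_at_ref_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return ref_distance_m * 10.0 ** exponent
```

Two frames that carry the same identity but yield clearly different distance estimates at the same receiver would then indicate different physical transmitters.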
System model
In this section, we describe the system model adopted in our proposed replica detection method and then illustrate the related attack model and assumptions.
VANETs system model
The system model is built on four types of entities: trust authority (TA), CCS, RSU, and onboard unit (OBU), which form a typical two-tier network model architecture together, as shown in Figure 2.

VANET’s network architecture.
In this system, communication is organized in two layers. The first layer is the communication between OBUs or between an OBU and an RSU, that is, the V2V and V2R modes. Each vehicle equipped with an OBU acts as a data collector, gathering its own traffic information, location, speed, direction, and so on, which are transmitted through the dedicated short-range communication (DSRC) protocol. The second layer is the communication between RSU and TA, which communicate with each other over a cellular network (2G/3G/4G), WiMAX, or WLAN. In this layer, the TA/CCS has sufficient computing and storage resources to support anomaly detection. After it establishes an identity classification model based on the collected sample data, it can perform attack detection on subsequent collections. There are two types of messages in the two-layer communication model.
Messages communicating between OBU and OBU
Each vehicle periodically broadcasts its own traffic status to neighbors. We name the message as
Here,
Utilizing received
Here,
Messages communicating between OBU and RSU/CCS
Furthermore, in order to enable the gathered sample data to reflect movement characteristics of vehicles, we let each vehicle k collect
Attack models and assumptions
In this article, we assume that attackers launch replica attacks with one of two possible intentions, which lead to two types of attack scenarios:
Case 1: An attacker intends to exploit the legal identity of a captured vehicle to distribute false information or to escape responsibility for a traffic accident. For this purpose, the cloned identity is stealthily deployed on other vehicles far away from the captured one, at least outside its communication range. This behavior is imperceptible to the captured vehicle. In this case, we assume that cloned identities move independently.
Case 2, the captured vehicle and the attacker reach a consensus on network behaviors. They share their identities with their accomplices and participate in the network together to launch other inside attacks such as DoS attack. This kind of replica attack is essentially an act of identity sharing. Since each vehicle only records its neighbors’
In addition, vehicles act as front-end tools for data acquisition, synthesis, and forwarding; they do not perform replica attack detection themselves. Moreover, we assume that the vehicular location and time collected in real time are not tampered with.
Replica attack detection scheme with sequential trajectory segment
In this section, we describe in detail our scheme for detecting identity replicas in VANETs. We first describe the collection of sequential trajectory segments (STSs). Then, using the definition of trajectory sample similarity, we present the method for extracting and labeling trajectory features. Finally, we propose the replica detection scheme based on semi-supervised SVM. Table 1 lists the symbols used in the proposed method.
Definition of notations in the scheme.
OBU: onboard unit.
Collection of STS
Each moving vehicle collects motion data of vehicles in a distributed manner to complete data acquisition and aggregation. The collected data are sent to the back-end traffic control center for analysis. The detailed steps are as follows:
Node
Extracts the spatio-temporal position from a message sequence
The message sequence
Thus, the initial sample data set is composed of trajectory segments at different times and with different identities.
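A minimal sketch of how such a sample set could be represented is given below. The class and function names (`SamplePoint`, `TrajectorySegment`, `split_into_segments`, `group_by_identity`) and the planar `(x, y)` coordinates are illustrative assumptions; the message formats of the scheme itself are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SamplePoint:
    t: float                            # sampling timestamp
    x: float                            # position (assumed planar coordinates)
    y: float
    neighbors: frozenset = frozenset()  # identities heard at this sampling time

@dataclass
class TrajectorySegment:
    identity: str
    points: list = field(default_factory=list)

    @property
    def time_span(self):
        """(start, end) timestamps covered by this segment."""
        return (self.points[0].t, self.points[-1].t)

def split_into_segments(identity, points, l):
    """Cut time-ordered sample points into segments of at most l points each."""
    ordered = sorted(points, key=lambda p: p.t)
    return [TrajectorySegment(identity, ordered[i:i + l])
            for i in range(0, len(ordered), l)]

def group_by_identity(segments):
    """Index segments by identity so same-identity trajectories can be compared."""
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg.identity].append(seg)
    return dict(grouped)
```

Grouping by identity is the step that later makes it possible to compare trajectories that claim the same identity.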
The definition of the trajectory sample similarity
Sample data must be labeled before being classified with SVM. Here we adopt a semi-supervised method to label the sample data. The idea of semi-supervision is to propagate the labels of labeled data to unlabeled data according to their intrinsic relations, which is called label propagation (LP). This means that we can assign labels to unlabeled samples from a small number of labeled samples, provided that the similarities between them are known. To this end, the first task of our replica detection scheme is to define the trajectory sample similarity according to the relationships between samples. The relevant definitions are as follows.
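The label propagation idea can be sketched in a few lines. The version below is a generic, simplified variant (iterative weighted averaging with the labeled seeds clamped), written under the assumption that similarities are given as a symmetric weighted graph; it is not claimed to be the exact LP algorithm used in the scheme, and the function name and the +1/-1 encoding are illustrative.

```python
def propagate_labels(similarity, seed_labels, iterations=50):
    """Spread labels over a similarity graph.

    similarity:  dict node -> dict neighbor -> positive weight (symmetric)
    seed_labels: dict node -> +1 (abnormal) or -1 (normal) for labeled samples
    Returns dict node -> score in [-1, 1]; the sign is the propagated label.
    """
    scores = {node: float(seed_labels.get(node, 0.0)) for node in similarity}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in similarity.items():
            if node in seed_labels:
                updated[node] = float(seed_labels[node])  # keep known labels fixed
                continue
            total = sum(nbrs.values())
            # Unlabeled node: similarity-weighted average of its neighbors' scores.
            updated[node] = (sum(w * scores[m] for m, w in nbrs.items()) / total
                             if total else 0.0)
        scores = updated
    return scores
```

Samples with no labeled sample in their connected component keep a score of 0, which reflects the premise stated above: propagation only works where similarities link unlabeled samples to labeled ones.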
Definition 1
Trajectory sampling point neighbor. For two trajectory sampling points
In equation (4),
Definition 2
Trajectory neighbor.
When replica attack occurs in trajectory neighbors, it is usually a collusive behavior. They have similar movement characteristics. So the similarity of trajectory neighbors is defined as Definition 3.
Definition 3
The similarity to trajectory neighbors. For two trajectory neighbors
Here,
Extraction of trajectory sample features
Sample features to be extracted should help differentiate normal identities from abnormal (replica) ones. In section “Attack models and assumptions,” we assumed two replica attack scenarios. Under both scenarios, there is a common abnormal characteristic: vehicles with the same identity are separated by a certain distance at the same time. Furthermore, in Case 1, vehicles with the same identity move independently; consequently, replicas may appear at any time. When they appear one after another, the trajectory distance between them is not within a reasonable range. In Case 2, vehicles with the same identity are subject to certain mutual constraints, so their trajectories have similar movement characteristics. From the above, we conclude that replica attacks in VANETs have the following abnormal characteristics:
Characteristic 1: When the original identity and the replica identity appear at the same time, the distance between their trajectory samples within the same sampling time zone exceeds 0.
Characteristic 2: When the replica identity and the original identity do not appear at the same time, the drive distance of two adjacent trajectories is beyond the reasonable range.
Characteristic 3: The corresponding relative position relation-element between the original identity and the replica identity is consistent at the same sampling time.
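Characteristics 1 and 2 can be illustrated with a small check on two trajectories that carry the same identity. The trajectory representation (a dict from sampling time to `(x, y)`), the tolerance, and the maximum speed `v_max` are assumptions made for this sketch; Characteristic 3 depends on the relation-element definition and is not covered here.

```python
import math

def same_time_distances(traj_a, traj_b):
    """Distances between two trajectories at the sampling times they share."""
    shared = sorted(set(traj_a) & set(traj_b))
    return [math.dist(traj_a[t], traj_b[t]) for t in shared]

def shows_characteristic_1(traj_a, traj_b, tolerance_m=0.0):
    """Same identity seen at two different places at the same sampling time."""
    return any(d > tolerance_m for d in same_time_distances(traj_a, traj_b))

def shows_characteristic_2(end_a, start_b, v_max):
    """end_a, start_b are (t, x, y): last point of the earlier trajectory and
    first point of the later one. True if the gap cannot be driven at v_max."""
    (ta, xa, ya), (tb, xb, yb) = end_a, start_b
    return math.hypot(xb - xa, yb - ya) > v_max * (tb - ta)
```

In practice a non-zero `tolerance_m` would likely be needed to absorb positioning error, since a threshold of exactly 0 is only meaningful for ideal measurements.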
The abnormal characteristics summarized above are all related to the time gap between trajectories. To extract the features of trajectory samples, we therefore define the time distance between trajectories.
Time distance of trajectory samples
Since each trajectory sample has a corresponding time span, it can be represented as a line segment on a one-dimensional time axis. Thus, the time distance between two trajectory samples is equivalent to the distance between the corresponding time spans, which falls into two cases, overlapped and separated, as shown in Figure 3. In Figure 3,

The distance between two time spans.
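The two cases, overlapped and separated time spans, can be sketched as follows. This sketch assumes that overlapping spans have time distance 0 and that separated spans have a distance equal to the gap between them; the exact form of the formal definition may differ, and the function names are illustrative.

```python
def time_distance(span_a, span_b):
    """Distance between two (start, end) time spans on a one-dimensional axis.

    Assumption for this sketch: 0 when the spans overlap, otherwise the gap
    between the end of the earlier span and the start of the later one.
    """
    (a_start, a_end), (b_start, b_end) = span_a, span_b
    gap = max(a_start, b_start) - min(a_end, b_end)
    return max(gap, 0.0)

def nearest_same_identity(target, candidates):
    """target and candidates are (identity, (start, end)) pairs.

    Returns the candidate with the same identity and the smallest time
    distance to the target, or None if no other such trajectory exists.
    """
    identity, span = target
    same = [c for c in candidates if c[0] == identity and c is not target]
    return min(same, key=lambda c: time_distance(span, c[1]), default=None)
```

The `None` case corresponds to an identity with only one trajectory in the observation period, for which no same-identity comparison is possible.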
Definition 4
Time distance
As shown in Figure 3, if
The abnormal characteristics of replica attack are mainly focused on vehicles with the same identity. Therefore, for each trajectory sample i to be studied, we query the same identity trajectory j with the smallest time distance based on the defined time distance
Relative position distance of trajectory samples
According to the abnormal characteristics 1 and 2, we first define the relative positional offset
In equation (7),
Vehicle drive distance drive_Distij of trajectory samples
Similarly, another feature, named the vehicle drive distance
In equation (8),
Relative position relation-element set (eigenvector)
According to the abnormal characteristics 1 and 3, we can infer that relation-elements of relative position of two trajectory samples, corresponding to the original identity and the replica identity, are consistent at the same sampling time. In view of this, an eigenvector will be defined on the basis of relation-elements of relative position at each same sampling time. Because the number of sampling points is at most l, we define an eigenvector containing l relation-elements of relative position, as shown in equation (9)
In equation (9),
To summarize, trajectory sample features include an eigenvector,
The trajectory sample features.
Replica detection with STS
As mentioned above, replica identities, which include compromised identities and their duplicates, are abnormal identities; the rest are normal identities. Each identity has a series of STSs, and replica identities have abnormal trajectory features. Thus, replica detection can be transformed into the detection of abnormal trajectories. We extract trajectory features based on the abnormal behavior of replica identities and then use semi-supervised SVM to establish a classification model that differentiates normal trajectories from abnormal ones.
The work described so far in section “Replica attack detection scheme with sequential trajectory segment” prepares for using the semi-supervised SVM method to detect replica attacks. We now describe the detailed process, given in Algorithm 2, which consists of a training phase and a detecting phase.
Training phase
In this phase, we first label the initial trajectory samples. Here we adopt a semi-supervised method, the LP algorithm. 27 The main idea of this algorithm is to propagate labels from a few labeled nodes to the unlabeled nodes according to the similarity between nodes on a constructed network graph. We build the graph as follows: each trajectory segment is regarded as a vertex, and there may be an edge between trajectory neighbors. The higher the similarity between trajectory neighbors, defined in section “The definition of the trajectory sample similarity,” the more likely they are to receive the same label. For example, when replicas and the captured vehicle conspire to launch an inside attack (Case 2 in our attack scenarios), they move similarly, and by Definition 3 they are adjacent nodes in the graph. Thus, their similarity is high, and they are likely to be labeled the same. Therefore, we only need to label a very small number of trajectory samples as normal or abnormal. As shown in Algorithm 2,
After labelling the trajectory samples, we extract features including
Finally, we utilize the SVM theory of a dual model with slack variables, as shown in equation (10), to construct a classification model. Here,
Detecting phase
In detecting phase, we collect the subsequent trajectory data set and calculate the features and then utilize the classification model trained during the training phase to distinguish normal trajectories from abnormal ones. The related identity of the abnormal trajectory is the replica identity. The entire replica detection process based on semi-supervised SVM is given in Algorithm 2.
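For reference, the standard soft-margin SVM dual problem (the textbook form of a dual model with slack variables) is written out below. The symbols are assumptions for this illustration: $x_i$ is the feature vector of trajectory sample $i$, $y_i \in \{-1, +1\}$ its label, $\alpha_i$ the Lagrange multipliers, $C$ the penalty on the slack variables, and $K$ the kernel function. The notation of the scheme's own equation may differ.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C,\; i = 1, \dots, n

f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)
```

In the detecting phase, a newly collected trajectory with feature vector $x$ would be classified by the sign of $f(x)$ using the multipliers obtained during training.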
Performance evaluation
In this section, we will evaluate our proposed scheme in two aspects. One is performance analysis in communication cost and node memory overhead. The other is experimental validation of the detection performance in classification accuracy rate (CAR), detection rate (DR), and false detection rate (FDR).
Performance analysis
Communication overhead
In the proposed detection scheme, we take full advantage of the basic cooperative communication in VANETs to collect trajectory samples. A vehicle periodically sends its own traffic status data and receives its neighbors’ information, which does not require any additional communication data. The STS is then formed and forwarded to the control center by the RSU through basic V2R communication. Therefore, the detection scheme does not add transmission overhead to VANET communication.
Nodal storage overhead
Each vehicle node
Table 3 compares existing replica detection methods in mobile scenarios in terms of conspiracy attack detection, communication cost, computation cost, and storage cost. Suppose a mobile ad hoc network comprises N nodes. In TDD and SDD solutions,
The comparison of existing replica detection methods in mobile scenarios.
XED: eXtremely Efficient Detection; SPRT: sequential probability ratio test; TDD: time-domain detection; UTLSE: Unary-Time-Location Storage and Exchange; MTLSD: Multi-Time-Location Storage and Diffusion; STS: sequential trajectory segment; SDD-LC: Space-Domain Detection for Local Check; SDD-LWC: Space-Domain Detection for Local Witness Check.
Simulation parameters.
Experimental evaluation
To further verify the performance of the algorithm, we use Simulation of Urban MObility (SUMO) and Network Simulator version 2 (NS2) to simulate the experimental VANET scenario. SUMO 28 is an open-source microscopic traffic simulation platform that can generate static road networks and dynamic movement scenarios. It creates three configuration script files, which include the travel time of vehicles, the start and end time of the simulation, the number of vehicles, time, speed, location, and so on. The mobility.tcl file records location coordinates and speed at different times and is loaded into the network simulator NS2 as the main configuration file. NS2 can simulate a variety of network topologies and communication protocols. After the simulation, we obtain the movement data received by each RSU from fr files and treat them as the original trajectory data to be analyzed. We then perform feature extraction and anomaly detection and show the corresponding detection results in MATLAB.
The simulation area is 12,000 × 82 m². The communication radius of a vehicle node is assumed to be 250 m and that of an RSU 1000 m. To collect vehicle movement characteristics in time, an RSU is placed along the road every 500 m. In the simulated traffic scenario, stolen replica identities (replicas deployed stealthily) and collusive replica identities (replicas deployed collusively) each account for 50% of the replicas. We observe the movement and communication of vehicles for 388 s and then run the detection algorithm. To improve the efficiency of detection, we ensure that the training sample contains both attack scenarios. The simulation parameters are listed in Table 4.
CAR
For our proposed method, we extract the eigenvectors that form the sample data set from 17,347 trajectories with 1214 different identities, collected over 388 s. We set 30% of the vehicle identities to be replica identities. The sample data with different labeled ratios are trained and tested with different kernel functions. The classification accuracy is shown in Table 5.
Classification accuracy under different Kernel functions and different labeled ratios.
Table 5 shows that the STS detection algorithm achieves good classification results with a small number of labeled samples, because it takes the overall structure of the samples into account. Under different labeled ratios, the Gaussian (radial basis function (RBF)) kernel has the highest classification accuracy of the four kernel functions. Therefore, the RBF kernel is used as the kernel function of the SVM training model. We further compare the classification accuracy of the STS algorithm, an SVM-based algorithm, and an LP-based algorithm under different labeled sample ratios. Figure 4 leads to the following findings. First, the STS algorithm always has a high CAR. The SVM-based algorithm has lower classification accuracy than ours when the ratio of labeled samples is less than 65%. This is because the STS algorithm makes full use of the similarity among trajectory samples to quickly assign labels to unlabeled samples, even when there are not enough labeled samples. As the labeled ratio increases, the classification accuracy of the SVM-based algorithm also improves markedly. Second, the classification accuracy of the LP-based algorithm is only slightly influenced by the ratio of labeled samples and fluctuates around 80%. This is because the sample similarity applied in the LP-based algorithm relates only to collusive replica attacks. As long as an identity of this kind is labeled, the label is likely to propagate to other collusive replica identities. A stolen replica identity, in contrast, is classified correctly only if it is labeled. The bigger the proportion of the stolen replica identity is, the higher the overall classification accuracy is. Consequently, the overall classification accuracy fluctuates up and down.

Comparison of classification accuracy with different labeled ratios.
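For readers unfamiliar with the kernel chosen above, the Gaussian (RBF) kernel can be written in a few lines. The parameter `gamma` and its default value are illustrative assumptions; the text does not state the kernel parameters used in the experiments.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The kernel equals 1 for identical feature vectors and decays toward 0 as they move apart, so trajectory samples with similar features are treated as close in the induced feature space.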
Detection performance
The classification accuracy reflects the proportion of correct predictions over the whole sample set. However, when the categories are unevenly distributed, the classification accuracy can remain high even if many samples of the minority class are misclassified. Therefore, to validate the effectiveness of our algorithm, we further evaluate its performance in terms of DR and FDR. We also observe how different ratios of malicious nodes and different vehicle densities influence the performance.
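The sketch below computes the three metrics from true and predicted labels. It assumes that DR is the fraction of replica identities that are detected and that FDR is the fraction of normal identities wrongly flagged as replicas; these definitions and the function name are assumptions for illustration, since the formulas are not given in the text.

```python
def detection_metrics(y_true, y_pred):
    """Return (CAR, DR, FDR) for labels where 1 = replica and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    car = (tp + tn) / len(pairs) if pairs else 0.0
    dr = tp / (tp + fn) if (tp + fn) else 0.0    # assumed: detected replicas / all replicas
    fdr = fp / (fp + tn) if (fp + tn) else 0.0   # assumed: false alarms / all normal identities
    return car, dr, fdr
```

For example, with 95 normal and 5 replica identities, a classifier that predicts "normal" for everything reaches a CAR of 0.95 while its DR is 0, which is why accuracy alone is not sufficient.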
The effect of the replica ratio on the detection performance
In the VANET composed of 1214 vehicles with different identities, replicas were fabricated at rates of 2%, 4%, 6%, 8%, and 10%, respectively. We compare how the DR and the FDR vary with the replica identity ratio under different trajectory lengths, as shown in Figure 5(a) and (b).

Detection performance under different replica identity ratios: (a) detection rate and (b) false detection rate.
Figure 5 shows that the DR and the FDR of our algorithm vary little with the replica identity ratio. This is because the extraction of the classification features mainly takes into account motion characteristic data, for instance, the spatio-temporal location of the vehicle. We also define the trajectory features according to abnormal characteristics, such as unreasonable distance deviations between identical identities that appear simultaneously or successively. Therefore, the DR is almost impervious to the replica identity ratio in VANETs. During the observation period of 388 s, when the vehicle trajectory length is less than 20, the DR is above 96%, and the FDR is less than 2.3%. In addition, as the trajectory length increases, the DR decreases whereas the FDR increases. For a given observation period, the longer the trajectory length is, the fewer trajectories there are with the same identity, and the more likely it is that an identity has only one trajectory. Because no other trajectory with the same identity can be found, the identity may be missed or falsely detected, which leads to a lower DR and a higher FDR.
Figure 5 also indicates the impact of the trajectory length on the detection performance: the smaller l is, the higher the DR and the lower the FDR. According to the analysis of the storage cost in section “Performance analysis,” the smaller the length l is, the smaller the storage overhead is. Therefore, l should be set to a small value in consideration of both detection performance and storage cost.
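The idea of an unreasonable distance deviation between sightings of the same identity can be sketched as a plausibility check on the implied speed. This is an illustrative simplification, not the feature definition used in the paper. The names `implied_speed` and `is_suspicious`, the `(t, x, y)` sighting format, and the `v_max` threshold of 40 m/s are assumptions.

```python
import math

def implied_speed(p, q):
    """Speed in m/s needed to move between two sightings given as (t, x, y)."""
    dt = abs(q[0] - p[0])
    dist = math.hypot(q[1] - p[1], q[2] - p[2])
    if dt == 0:
        # Same instant, different place: no finite speed explains it.
        return math.inf if dist > 0 else 0.0
    return dist / dt

def is_suspicious(sightings, v_max=40.0):
    """Flag an identity whose consecutive sightings imply a speed above v_max.

    v_max is a hypothetical upper bound on vehicle speed in m/s.
    """
    ordered = sorted(sightings)  # sort by timestamp
    return any(implied_speed(a, b) > v_max
               for a, b in zip(ordered, ordered[1:]))

# One identity seen 300 m apart at the same time (simultaneous appearance).
print(is_suspicious([(5.0, 0.0, 0.0), (5.0, 300.0, 0.0)]))   # True
# One identity moving 10 m in 1 s (plausible motion).
print(is_suspicious([(0.0, 0.0, 0.0), (1.0, 10.0, 0.0)]))    # False
```

Because the check uses only the positions and times reported under one identity, it does not depend on how many other identities in the network are replicas.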
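The relation between trajectory length and the number of trajectories per identity can be made concrete with a simplified count. The sketch assumes one position sample per acquisition cycle and non-overlapping segments of l samples each; the actual segmentation in the scheme may differ. The function name `segments_per_identity` is hypothetical.

```python
def segments_per_identity(observation_s, cycle_s, length):
    """Upper bound on complete segments one identity can contribute.

    Assumes one sample per acquisition cycle and non-overlapping
    segments of `length` samples (a simplification for illustration).
    """
    samples = int(observation_s // cycle_s)
    return samples // length

# Observation period of 388 s, assumed acquisition cycle of 1 s.
for l in (5, 10, 20, 40):
    print(l, segments_per_identity(388, 1, l))
```

Under these assumptions, doubling l roughly halves the number of segments available for comparison, so an identity is more likely to end up with a single trajectory.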
The effect of the vehicle density on the detection performance
Our proposed detection algorithm relies on the received trajectory data. The reception of data packets by each vehicle and by the RSU therefore affects the detection performance of the algorithm. When the vehicle density changes, the packet reception also changes. Hence, in this part, we conduct several experiments to examine how the packet loss rate among vehicles and at RSUs varies with vehicular density. We then verify the influence of vehicular density on the detection performance.
Vehicle traffic data at different traffic densities are simulated, and replica identities are fabricated at a rate of 10%. With the vehicle trajectory length set to l = 10, the packet loss at the vehicle nodes and at the RSU under different traffic densities is shown in Figure 6. As the vehicle density increases, both packet loss rates increase. This is because the probability of message collisions increases with the vehicle density, which raises the packet loss rate. Under similar conditions, the loss rate of packets sent to the RSU is almost twice the loss rate of packets sent among vehicles. Under the same conditions, both loss rates depend mainly on the packet length. Since the MS message sequence sent to the RSU is longer than the instantaneous state message sent among vehicles, the loss rate of packets sent to the RSU is higher.

The packet loss rate variation at different vehicle densities: (a) T = 1 s, (b) T = 5 s, and (c) T = 10 s.
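Both trends, higher loss at higher density and higher loss for longer packets, can be reproduced with a toy collision model. The model assumes that interfering transmissions start as a Poisson process whose rate is proportional to the number of neighbors, and that a packet is lost if any interferer starts during its airtime. This is an illustrative assumption, not the simulation setup used in the experiments. The function name, the packet sizes, the per-neighbor transmission rate, and the 6 Mbit/s bitrate are hypothetical.

```python
import math

def loss_probability(packet_bytes, neighbors,
                     tx_rate_per_neighbor=10.0, bitrate_bps=6e6):
    """Toy collision model: P(loss) = 1 - exp(-lambda * airtime).

    lambda  = neighbors * tx_rate_per_neighbor (interferer starts per second)
    airtime = packet length in bits / bitrate
    All parameter values are illustrative assumptions.
    """
    airtime = packet_bytes * 8 / bitrate_bps
    lam = neighbors * tx_rate_per_neighbor
    return 1.0 - math.exp(-lam * airtime)

STATE_MSG_BYTES = 200    # hypothetical instantaneous state message
MS_SEQUENCE_BYTES = 2000  # hypothetical MS message sequence sent to the RSU

for n in (10, 30, 60, 90):
    v2v = loss_probability(STATE_MSG_BYTES, n)
    v2r = loss_probability(MS_SEQUENCE_BYTES, n)
    print(f"neighbors={n:3d}  vehicle-to-vehicle={v2v:.3f}  vehicle-to-RSU={v2r:.3f}")
```

In this model the longer message occupies the channel for longer, so it is exposed to more potential collisions at every density.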
Figure 7 compares the DRs and FDRs at different acquisition cycles with different vehicle densities. As shown in Figure 7(a), the DR decreases as the vehicular density increases, because the packet loss rate among vehicles and at RSUs increases with the vehicular density. Packet loss at the RSU directly reduces the number of trajectories in the network, so the probability that an identity is related to only one trajectory during the observation period increases. Packet loss among vehicles also affects the determination of trajectory neighbors in the LP phase. If the packet loss rate among vehicles increases, a neighbor trajectory node becomes a non-neighbor trajectory node because the communication neighbor condition can no longer be satisfied. When the trajectory of an unlabeled abnormal identity is determined to be a non-neighbor trajectory of the labeled abnormal node, and that unlabeled abnormal identity has only one trajectory in the network, a missed detection occurs, which decreases the DR. In addition, the DR does not change significantly under different acquisition cycles. As the acquisition cycle increases, on one hand, the packet loss rate decreases, which raises the DR; on the other hand, the number of trajectories decreases, which lowers the DR. Across the tested vehicle densities, the minimum DR remains above 94% when the number of neighbor vehicles reaches 90.

Detection performance at different vehicle densities: (a) detection rate and (b) false detection rate.
Figure 7(b) indicates that the FDR of replica identities increases with the vehicular density. A false detection occurs when the relative position between a replica node and its normal neighbor node is consistent and the neighbor node has only one trajectory. As the vehicular density increases, the motion characteristics of neighbors tend to become consistent. Meanwhile, the packet loss rate increases with density, which raises the probability that an identity is related to only one trajectory. That is, the probability of false detection increases. As with the DR, the FDR does not change significantly under different acquisition cycles. When the number of neighbors reaches 90, the maximum FDR is not more than 2.5%.
Detection time
In this section, we evaluate how the time performance of replica detection is affected by the distance between compromised nodes and their clones. The proposed STS scheme is compared with two related mechanisms from the literature, TDD and SDD.17 Figure 8 shows that distance affects the detection time of our scheme much less than that of the TDD and SDD methods. This is because the proposed scheme collects the STSs of both compromised identities and their duplicates in the CCS simultaneously and then analyzes them with the semi-supervised SVM. In TDD and SDD, the detection time increases with distance because the validation node needs to be selected from local areas. The validation node learns that two nodes v and w encountered u in different ways and then checks whether u is a replica based on three criteria. This depends on local information exchange. If the distance between compromised nodes and their clones increases, detection is delayed and the detection time increases.

Comparison of detection time with different distances.
Conclusion
In VANETs, vehicle nodes are widely distributed and move at high speed; as a result, their identities can easily be stolen or shared through collusion. Replica attackers use identity replicas maliciously to initiate various insider attacks and may even destroy the whole network. The performance of existing replica detection solutions for mobile environments is prone to be affected by the distance between replica nodes and compromised nodes. To address this, we propose a detection scheme based on STS. Relying on the temporal and spatial characteristics of trajectory data, we define the sample similarity, extract the sample features, and then establish a detection model based on semi-supervised SVM. The performance analysis and experimental studies show that the DR is high and the FDR is low, with no additional vehicle-to-RSU or inter-vehicle communication overhead. In addition, the proposed replica attack detection method is not sensitive to the replica identity ratio. In future work, to further reduce the computational cost of centralized analysis, parallel computation can be used to optimize data processing.
Footnotes
Acknowledgements
The authors would also like to thank the anonymous reviewers for their constructive comments and helpful suggestions.
Handling Editor: Syed Hassan Ahmed
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under grant nos 61472001, U1405255, and U1764263, and research and innovation project of Jiangsu Province under grant no. CXLX11_0592.
