Abstract
The escalating sophistication of cyber attacks in intelligent transportation systems necessitates advanced intrusion detection systems capable of processing multi-modal data efficiently while maintaining real-time performance. To address these cybersecurity challenges, this paper presents a novel spatio-temporal hybrid model for robust attack detection. Our framework, designed to overcome the limitations of existing methods in handling complex hybrid threats, integrates three key components: convolutional neural networks (CNN) for spatial feature extraction, bidirectional long short-term memory (BiLSTM) networks for capturing long-range temporal dependencies, and an attention mechanism for adaptive feature weighting. This architecture enables comprehensive spatio-temporal pattern analysis while dynamically prioritizing the most discriminative features. Extensive experimental results demonstrate that our model outperforms existing AI-based detection methods, such as K-Nearest Neighbors (KNN) and Multilayer Perceptron (MLP), achieving higher detection accuracy and enhanced robustness against sophisticated attacks.
Introduction
The growing intelligence and connectivity of automobiles mark a crucial trajectory in the ongoing transformation of the automotive sector. Smart connected vehicles use advanced communication technologies to exchange data seamlessly and rely on a suite of sensors, such as cameras, lidar, and millimeter-wave radar, to attain thorough environmental awareness. By combining communication data with insights gleaned from sensor readings, these vehicles can make well-informed decisions and implement precise control measures.1–3 Today, the automotive electronic system has become the linchpin of contemporary vehicles, with the Internet of Vehicles emerging through the interconnection of these electronic components. Nevertheless, the inherent openness of the network environment exposes automobiles to an array of cyber security risks during networking, including data tampering, virus attacks, and the insertion of malicious software.4–6 According to the 2020 Auto Network Security Report issued by Upstream Security, an overseas security research entity, automotive cyber security incidents increased by 605% from 2016 to January 2020. For example, on January 11, 2024, the Cybernews research team discovered that certain BMW subdomains were susceptible to redirection vulnerabilities, enabling attackers to craft links that in reality directed users to malicious websites; on June 3, 2025, numerous foreign media outlets reported that Stormous, a notorious global ransomware group, claimed to have infiltrated Volkswagen AG and stolen sensitive information, including user account data and vehicle VIN codes. This concerning trend highlights the urgent need for in-depth research into resilient security control strategies tailored for intelligent transportation.
As a quintessential example of a cyber-physical system, an intelligent transportation system depends heavily on unsecured wireless communication channels, which inherently renders it susceptible to cyber-attacks. 7 The lack of tangible security perimeters leaves these systems vulnerable to electromagnetic exploits, facilitating unauthorized network intrusions that jeopardize not only the integrity of transmitted data but also the reliability of vital vehicle safety communications. 8 At present, attacks targeting intelligent transportation information systems fall primarily into three categories: Denial-of-Service (DoS) attacks, replay attacks, and deceptive attacks. DoS attacks target vehicular communication by deliberately flooding network channels with an excessive volume of redundant data packets. This malicious traffic congests the information flow, thereby preventing the vehicle’s onboard systems from receiving vital control instructions sent by the cloud-based infrastructure. 9 Replay attacks inject adversarial signals that mimic legitimate system signals at a chosen instant. In 2010, the Stuxnet virus caused extensive damage to industrial systems globally, using replay attack strategies to transmit harmful commands formulated from intercepted data. 10 In contrast, deceptive attacks undermine vehicular systems by introducing tampered data packets into the communication flow. These alterations mislead the vehicle’s decision-making algorithms, effectively seizing control of its mechanisms and steering operational responses toward hazardous or unintended conditions. 11 Therefore, the detection of hybrid attacks that combine DoS, replay, and deceptive attacks is of paramount importance to ensuring the security of intelligent transportation systems.
To tackle the challenge of detecting hybrid attacks within intelligent transportation systems, significant research effort has been devoted to devising robust countermeasures. Current strategies can be broadly categorized into two main approaches: model-based detection mechanisms and artificial intelligence (AI)-based detection techniques. Table 1 provides a summary of these detection methods. An attack detection method using an improved Kalman filter was proposed to detect injected malicious attacks in intelligent transportation. 11 In Gao et al., 12 to safeguard the information security of the train-ground communication system, an intrusion detection approach leveraging machine learning and state observer technology is introduced to identify and classify diverse types of attacks. A partial differential equation-based observer is devised to identify the false data injection attack and pinpoint the exact location where the attack is introduced within the platoon. 13 In Cheng et al., 14 an adaptive detection and identification approach, utilizing an unknown input observer, has been developed to counteract malicious attacks in intelligent transportation. While the model-based detection approaches in Chowdhury et al., 11 Gao et al., 12 Biroon et al., 13 and Cheng et al. 14 demonstrate effectiveness in identifying false data injection attacks, their performance is inherently limited by the fidelity of the underlying system model and the precision of threshold determination. Unlike these model-dependent techniques, AI-driven detection methods offer a significant advantage: they operate without relying on a mathematical representation of the intelligent transportation system. For example, a Long Short-Term Memory (LSTM)-based approach for SQL injection attack detection is proposed, enabling automatic extraction of meaningful data representations. 15 A graph-based machine learning technique for identifying malicious activity is proposed, by which injected malicious attacks can be detected. 16 In AlEisa et al., 17 to enhance cybersecurity in connected automotive systems, a deep learning-powered intrusion detection system is proposed for real-time monitoring of malicious activities across in-vehicle networks, vehicle-to-vehicle communications, and vehicle-to-infrastructure networks. A new traffic anomaly detection approach based on multi-head attention is proposed, which considers the built-in correlations of network traffic data. 18 Zhang et al. 19 designed a Transformer-enabled transfer learning framework for intrusion detection, specifically tailored to extract and interpret spatiotemporal sequential patterns from vehicular data streams. Nevertheless, the existing AI-driven detection frameworks in Li et al., 15 Gupta et al., 16 AlEisa et al., 17 Li et al., 18 and Zhang et al. 19 neglect to model the interdependent spatio-temporal relationships that characterize modern network attacks in intelligent transportation systems.
Summary of techniques for detecting attacks.
Motivated by the limitations of the existing methods above, this paper introduces a novel hybrid model for robust attack detection that combines convolutional neural networks (CNN), bidirectional long short-term memory (BiLSTM) networks, and an attention mechanism. The proposed architecture uses a CNN for spatial feature extraction, a BiLSTM for capturing contextual temporal dependencies, and an attention mechanism for adaptive feature refinement. The integrated detection framework enables effective representation of complex threat patterns in dynamic, high-dimensional intelligent transportation systems. Extensive experiments demonstrate that our model outperforms existing detection approaches across multiple metrics. The main contributions of this work are summarized as follows:
A novel CNN-BiLSTM-Attention detection framework is proposed that jointly models spatial features (via CNN), bidirectional temporal dynamics (via BiLSTM), and critical feature selection (via attention) for hybrid attack detection in intelligent transportation systems. Unlike existing methods that process spatial and temporal features separately, our unified approach detects injected attacks by capturing the spatio-temporal features of intelligent transportation systems.
The introduced attention mechanism can automatically emphasize the most discriminative spatio-temporal features while suppressing noise and irrelevant variations. This leads to more robust detection against evolving attack strategies than static feature-weighting approaches.
Experimental results show that the proposed detection model delivers marked improvements over existing approaches such as K-Nearest Neighbors (KNN) 21 and Multilayer Perceptron (MLP). 22
The structure of this paper is outlined as follows. Section II provides the description of the problem. In Section III, the CNN-BiLSTM-Attention framework for attack detection is established. The simulation experiments in Section IV illustrate the advantages of the proposed approach. Finally, Section V presents the conclusion and discussion.
Problem description
In this section, a linear physical dynamic model is constructed to characterize the intelligent vehicular network system. Based on this model, we then develop a comprehensive hybrid attack framework that integrates DoS, replay, and deceptive attack strategies.
Physical dynamic model of intelligent transportation system
As shown in Figure 1, the intelligent networked vehicle system can monitor and control the vehicle status by collecting and processing sensor data from connected vehicles. Following the work in Wang et al., 20 a linear physical model of the intelligent vehicular system can be constructed as follows
where

Intelligent vehicular network system.
DoS attack model
DoS attacks are intended to obstruct or postpone the transmission of control commands
where
By injecting numerous malicious data packets, attackers cause channel congestion and subsequent data loss, as shown in Figure 2. This compromises the monitoring system’s state estimation capability, potentially leading to traffic disruptions or safety-critical incidents.

Vehicular speed under different attacks.
Replay attack model
Malicious attackers execute replay attacks through the interception and retransmission of legitimate historical data, effectively bypassing the system’s security monitoring and misleading the control center. Based on the above intelligent vehicular system model, the replay attack model can be constructed as follows
where
Through a coordinated replay attack, the adversary can retransmit the historical data from time
Deceptive attack model
Attackers deceive the detection and control center by injecting tampered or fabricated data at time instant k, aiming to conceal alterations in the state of the intelligent vehicular system. Based on the above intelligent vehicular system model, the deceptive attack model can be established as follows
where
According to equation (4), an attacker has the capability to inject the previously mentioned deceptive attack into the vehicle networking system without setting off its detection mechanism, as shown in Figure 2.
Ultimately, the robust detection of cyber-physical attacks targeting intelligent transportation systems is of paramount significance, as it serves as a cornerstone for system safety and mitigates the risk of systemic failures with catastrophic repercussions.
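To make the three attack types concrete, the following minimal sketch injects each of them into a one-dimensional measurement sequence such as a vehicle speed trace. It is an illustration only and does not reproduce the attack equations of this section: the function names, the hold-last-value behaviour used for DoS, the fixed replay delay, and the constant additive bias used for the deceptive attack are all assumptions introduced here.

```python
from typing import List, Sequence


def dos_attack(y: Sequence[float], start: int, end: int) -> List[float]:
    """Drop packets in [start, end); the receiver holds the last value it received."""
    out = list(y)
    for k in range(start, min(end, len(out))):
        out[k] = out[k - 1] if k > 0 else 0.0
    return out


def replay_attack(y: Sequence[float], start: int, end: int, delay: int) -> List[float]:
    """Replace samples in [start, end) with those recorded `delay` steps earlier."""
    out = list(y)
    for k in range(start, min(end, len(out))):
        if k - delay >= 0:
            out[k] = y[k - delay]
    return out


def deceptive_attack(y: Sequence[float], start: int, end: int, bias: float) -> List[float]:
    """Add a constant false-data bias to samples in [start, end)."""
    out = list(y)
    for k in range(start, min(end, len(out))):
        out[k] = y[k] + bias
    return out


speed = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
print(dos_attack(speed, 2, 4))             # [20.0, 21.0, 21.0, 21.0, 24.0, 25.0]
print(replay_attack(speed, 3, 5, 3))       # [20.0, 21.0, 22.0, 20.0, 21.0, 25.0]
print(deceptive_attack(speed, 4, 6, 5.0))  # [20.0, 21.0, 22.0, 23.0, 29.0, 30.0]
```

A hybrid attack in this simplified setting would apply two or more of these functions to different intervals of the same trace.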
CNN-BiLSTM-attention detection framework against hybrid attacks
This section introduces a spatio-temporal detection framework designed to counteract false data injection attacks (FDIAs) in intelligent transportation systems. The proposed solution leverages a CNN-BiLSTM-Attention hybrid model, as illustrated in Figure 3. The architecture integrates three key components: a deep CNN module for extracting spatial features from continuous vehicular sensor data streams; a BiLSTM network for modeling temporal dependencies in the sequential sensor data; and an attention mechanism that enhances detection accuracy by adaptively focusing on critical data patterns or anomalous behaviors. This multi-module approach captures spatio-temporal correlations while prioritizing the most relevant attack indicators through dynamic feature weighting.

Spatio-temporal detection framework using CNN-BiLSTM-attention.
Construction of CNN model-based spatial feature extraction
To extract spatial features from the intelligent transportation systems dataset, a CNN-based framework is designed. As shown in Figure 4, the implemented CNN structure comprises an input layer, a feature extraction layer (convolutional layer), a downsampling layer (pooling layer), and a classification layer (fully connected layer). The input layer receives the intelligent transportation systems data, which contain both normal and attack conditions. The feature extraction and downsampling layers refine the spatial patterns in the input signals. Finally, the classification layer processes and outputs the learned features.

CNN-based spatial feature extraction framework.
The convolutional layer performs local feature extraction by systematically sliding the convolutional filter across the input feature map. The mathematical formulation of the discrete convolution operation is expressed as follows:
where
The pooling layer primarily serves to downsample feature maps, significantly reducing computational complexity while preserving essential signal characteristics. The mathematical formulation of this operation is expressed as follows:
where
By connecting every activation to every output unit, the fully connected layer allows for thorough feature learning and supports informed decision-making in subsequent tasks. The mathematical expression for this transformation is outlined as follows:
where
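The three CNN operations described in this subsection (convolution, pooling, and the fully connected transformation) can be illustrated with a small one-dimensional example. This is a hedged sketch in plain Python rather than the implementation used in the experiments: the ReLU activation, the cross-correlation form of the convolution, max pooling, and all function names and numeric values are assumptions chosen for illustration.

```python
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    n, m = len(x), len(kernel)
    out = []
    for i in range(n - m + 1):
        s = sum(x[i + j] * kernel[j] for j in range(m)) + bias
        out.append(max(0.0, s))
    return out


def max_pool1d(x: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


signal = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
features = conv1d(signal, kernel=[-1.0, 1.0])   # first differences: [1.0, 2.0, 3.0, 4.0, 5.0]
pooled = max_pool1d(features, size=2)            # [2.0, 4.0]
print(dense(pooled, weights=[[0.5, 0.5]], biases=[0.0]))  # [3.0]
```

The difference kernel is used here only because its output is easy to check by hand; in a trained network the kernel and dense weights are learned.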
Construction of BiLSTM model-based temporal feature extraction
As depicted in Figure 5, a BiLSTM framework is constructed to extract temporal features and capture the complex temporal dynamics inherent in network traffic data. Unlike unidirectional models that process sequences in a single direction, the BiLSTM analyzes the input sequence both forwards and backwards. This bidirectional analysis allows the model to interpret each data point in light of both its past and its future context, which is crucial for identifying sophisticated attacks that may exhibit subtle, long-range dependencies. Through its gating mechanisms (input, forget, and output gates), the model mitigates the vanishing gradient problem, enabling it to learn and retain the long-term dependencies critical for detecting multi-stage cyber threats.

BiLSTM-attention-based temporal feature extraction framework.
The introduced model consists of two separate LSTM modules featuring gated architectures, each equipped with input gates, forget gates, output gates, and cell-state memory units. One LSTM module processes input sequences in forward, chronological order, whereas the other processes them in reverse temporal order. This bidirectional setup allows the BiLSTM architecture to capture both forward and backward contextual relationships within time-series data, thereby improving the extraction of holistic temporal features. The mathematical expression for the BiLSTM structure is given as follows.
where
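The bidirectional processing described above can be sketched with a scalar LSTM (one input feature and one hidden unit) run once in chronological order and once in reverse. This is an illustrative sketch under simplifying assumptions, not the network used in the experiments: the scalar state, the parameter layout `gate -> (w_x, w_h, bias)`, the parameter values, and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, Tuple[float, float, float]]  # gate name -> (w_x, w_h, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pre_activation(p: Params, gate: str, x: float, h: float) -> float:
    w_x, w_h, b = p[gate]
    return w_x * x + w_h * h + b


def lstm_pass(xs: Sequence[float], p: Params) -> List[float]:
    """Run a scalar LSTM over xs and return the hidden state at every step."""
    h, c = 0.0, 0.0
    hs: List[float] = []
    for x in xs:
        i = sigmoid(pre_activation(p, "i", x, h))    # input gate
        f = sigmoid(pre_activation(p, "f", x, h))    # forget gate
        o = sigmoid(pre_activation(p, "o", x, h))    # output gate
        g = math.tanh(pre_activation(p, "g", x, h))  # candidate cell state
        c = f * c + i * g
        h = o * math.tanh(c)
        hs.append(h)
    return hs


def bilstm(xs: Sequence[float], fwd: Params, bwd: Params) -> List[Tuple[float, float]]:
    """Pair the forward and backward hidden states at each time step."""
    forward = lstm_pass(xs, fwd)
    backward = lstm_pass(list(reversed(xs)), bwd)[::-1]
    return list(zip(forward, backward))


params = {"i": (0.5, 0.1, 0.0), "f": (0.3, 0.2, 1.0),
          "o": (0.4, 0.1, 0.0), "g": (1.0, 0.5, 0.0)}
for step, (h_fwd, h_bwd) in enumerate(bilstm([0.1, 0.9, -0.4, 0.3], params, params)):
    print(step, round(h_fwd, 4), round(h_bwd, 4))
```

At each time step the forward state summarizes the samples up to that step and the backward state summarizes the samples from that step onward, so the pair gives the downstream layers context from both directions.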
Attention
To better capture the pertinent information in the encoded sequence and extract the corresponding temporal and spatial features, we incorporate an attention mechanism into the model, as shown in Figure 5. Attention enables the model to focus on the time steps and features that are most informative for detection. The fundamental calculation formula is provided as follows.
where
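As an illustration of attention-based feature weighting, the sketch below scores each hidden state, normalizes the scores with a softmax, and forms a weighted sum. It assumes a simple dot-product scoring against a scoring vector `v`, which may differ from the scoring function used in the paper; `attention_pool`, `v`, and the example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]],
                   v: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for a sequence of hidden states."""
    scores = [sum(h_d * v_d for h_d, v_d in zip(h, v)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


states = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(states, [0.0, 0.0]))  # ([0.5, 0.5], [0.5, 0.5])
weights, context = attention_pool(states, [2.0, 0.0])
print(weights[0] > weights[1])             # True
```

With a zero scoring vector every time step receives the same weight, so the context vector reduces to the mean of the hidden states; a non-zero scoring vector shifts the weight toward the time steps it scores highest.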
Spatio-temporal detection algorithm using CNN-BiLSTM-Attention
Since the detection of false data injection attacks (FDIAs) is formulated as a binary classification problem, the cross-entropy loss function is defined as:
where
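For reference, the standard binary cross-entropy can be computed as in the sketch below, where each label is 0 (normal) or 1 (attack) and each prediction is the estimated attack probability. The notation in the paper's equation may differ; the clipping constant `eps` and the function name are assumptions added here for numerical safety and illustration.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # log(2), about 0.693
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # smaller loss for confident, correct predictions
```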
CNN-BiLSTM-Attention-based attack detection
The CNN-BiLSTM-Attention detection framework against hybrid attacks is designed by leveraging spatial-temporal feature learning. The detailed process is outlined as follows:
Step 1: Construct a CNN-based spatial feature extraction model to capture local patterns from vehicular network data.
Step 2: Develop a BiLSTM-based temporal feature extraction model to analyze sequential dependencies in traffic behavior and detect anomalies such as sudden trajectory deviations or message falsification.
Step 3: Integrate an attention mechanism to dynamically weigh critical time steps, improving detection sensitivity to stealthy attacks.
Step 4: Train the CNN-BiLSTM-Attention model offline using historical vehicular network data.
Step 5: Deploy the trained detection model for real-time attack detection online.
The proposed spatial-temporal detection approach can ensure robust anomaly identification while maintaining low-latency processing, making it suitable for secure and efficient intelligent transportation systems. Based on the above detection steps, the detection algorithm against hybrid attacks is summarized in Algorithm 1.
Results
This section presents experimental simulations to evaluate the detection performance of the developed CNN-BiLSTM-Attention model for identifying cyber threats in intelligent transportation network systems. Comparative analyses with existing approaches (KNN 21 and MLP 22 ) show that the proposed model achieves superior detection performance against hybrid attacks.
Simulation and attack injection
The simulation experiments in this study are conducted on a high-performance computing platform with the following specifications: MATLAB R2024a (https://www.mathworks.com) running on an Intel i9-13900HX processor (2.20 GHz), 16 GB RAM, and an NVIDIA TITAN RTX 4060 GPU. The proposed CNN-BiLSTM-Attention model is implemented with the following key parameters. BiLSTM Network: 128 hidden units per layer, 2 stacked layers, and a dropout rate of 0.2 to prevent overfitting; Attention Mechanism: 64-dimensional attention layer for feature weighting; CNN Architecture: Optimized kernel sizes and pooling layers for spatial feature extraction from vehicular data; Training Configuration: Adam optimizer with a learning rate of 0.01.
Based on an evaluation dataset that combines real-world driving records from the Open Vehicle Dataset of Shenzhen City with the dataset in Song et al., 23 we simulate different attacks, namely DoS, replay, and deceptive attacks. The dataset is divided into a training set and a test set in a ratio of 7:3.
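A 7:3 partition of the samples can be produced as in the following sketch. Whether the samples were shuffled before splitting, and with which seed, is not stated in the text, so the shuffling step, the seed, and the function name are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_7_3(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and split into 70% training and 30% test samples."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = (7 * len(samples)) // 10  # integer arithmetic avoids float rounding
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


train, test = split_7_3(list(range(10)))
print(len(train), len(test))  # 7 3
```

For strongly time-correlated data, a chronological split without shuffling is a common alternative that reduces leakage between the two sets.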
Evaluation indicators
To assess the performance of the detection model, we have selected the following evaluation indicators: Accuracy, Precision, F1-Score, and Alarm Recall Rate. 24 The corresponding mathematical expressions for these metrics are provided below.
where
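The four indicators follow directly from the confusion-matrix counts. The sketch below uses the standard binary definitions, with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives; if the reported values are averaged over several attack classes, the per-class counts would be combined accordingly, which is not shown here. The function name and the example counts are illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to make the arithmetic easy to follow.
print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```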
Ablation experiment analysis
To evaluate the contributions of the CNN, BiLSTM, and Attention mechanisms to the detection performance of the CNN-BiLSTM-Attention model, we conduct the following ablation experiments:
Model 1: CNN-BiLSTM (to assess the impact of the Attention mechanism)
Model 2: CNN-Attention (to examine the effect of BiLSTM)
Model 3: BiLSTM-Attention (to evaluate the influence of CNN)
Model 4: The complete CNN-BiLSTM-Attention model
Based on the above ablation framework, key performance metrics, including Accuracy, Precision, F1-Score, and Alarm Recall Rate, along with the receiver operating characteristic (ROC) curves, can be obtained, as shown in Table 2 and Figure 6.
Ablation results of CNN-BiLSTM-Attention.

ROC of the ablation experiments for CNN-BiLSTM-Attention.
As illustrated in Table 2, the contribution of each module to the overall model performance can be summarized as follows. When Attention is removed, model performance declines moderately, with the Accuracy and F1-Score decreasing from 97.58% to 95.12% and from 97.59% to 94.89%, respectively. These results indicate that Attention plays an important role in focusing the model on informative features. When BiLSTM is removed, model performance drops significantly, with the Accuracy and F1-Score decreasing from 97.58% to 93.15% and from 97.59% to 92.84%, respectively, which shows that BiLSTM plays a crucial role in extracting temporal features. When CNN is removed, model performance drops the most, with the Accuracy and F1-Score decreasing from 97.58% to 92.62% and from 97.59% to 91.93%, respectively, which shows that CNN plays a crucial role in extracting spatial features.
By comparing the ROC curves of each model variant in Figure 6, one can observe that the complete CNN-BiLSTM-Attention model achieves the best performance. Its curve is closest to the upper-left corner, with an AUC value of 0.9768, significantly outperforming the other ablation variants. The CNN-Attention model (without the BiLSTM module, AUC = 0.9368) and the CNN-BiLSTM model (without the Attention module, AUC = 0.9525) show limitations in time-series modeling and feature extraction, respectively, across different threshold intervals. Meanwhile, the BiLSTM-Attention model (without the CNN module, AUC = 0.9299) exhibits the most pronounced performance degradation across all FPR (False Positive Rate) ranges, confirming the critical role of the CNN module in spatial feature extraction. The gap between each ablation model’s curve and the baseline model’s curve visually reflects the contribution of the removed component. Notably, the CNN and BiLSTM modules synergistically enhance performance, while the Attention mechanism primarily improves detection sensitivity in specific regions.
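The AUC values quoted above summarize each ROC curve in a single number. One way to compute AUC without tracing the curve is the rank-based formulation sketched below: the fraction of (attack, normal) sample pairs in which the attack sample receives the higher score, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions, not data from the experiments.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive sample outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; sorting-based implementations are preferable for large test sets.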
Comparison of detection performance under different detection models
To evaluate detection performance against hybrid attacks, this section selects baseline models (KNN and MLP) for comparative analysis. The comparative detection metrics (Accuracy, Precision, F1-Score, and Alarm Recall Rate) and confusion matrices are presented in Table 3 and Figure 7, respectively.
Comparison results of evaluation indicators under different detection models.

The confusion matrices under different detection models: (a) KNN, (b) MLP, and (c) CNN-BiLSTM-Attention.
As illustrated in Table 3, the CNN-BiLSTM-Attention model attains the best performance (Accuracy: 97.58%, Precision: 97.63%, Recall: 97.58%, F1-Score: 97.59%) by integrating three complementary modules: spatial convolution, temporal BiLSTM modeling, and attention-based feature reweighting. By contrast, the KNN and MLP models show significant performance gaps (1.3% and 2.98% ranges, respectively) due to their inability to adaptively process high-dimensional spatio-temporal patterns. As shown in Figure 7, the CNN-BiLSTM-Attention model also achieves the best confusion matrix, with main diagonal elements exceeding 97.5%, compared with 91.2% for KNN and 90.2% for MLP. This quantitative advantage indicates that the CNN-BiLSTM-Attention architecture effectively captures complex characteristics of intelligent transportation data by integrating spatio-temporal feature extraction with an attention mechanism. In contrast, the KNN model suffers from fuzzy classification boundaries due to its reliance on distance-based metrics, while the MLP model is constrained by the inability of fully connected architectures to learn spatio-temporal features.
The experimental results demonstrate that the spatio-temporal detection model proposed in this study achieves optimal feature extraction and attack recognition for hybrid attack detection tasks characterized by pronounced spatio-temporal dynamics.
Comparative analysis of attack robustness under different detection models
To assess the attack resilience of our CNN-BiLSTM-Attention framework in intelligent transportation scenarios, comparative experiments were performed against KNN and MLP baselines. Figure 8 presents the detection accuracy trends under escalating attack magnitudes, while Figure 9 compares the ROC metrics across all models.

The comparative results of detection rate under different detection models.

The comparative results of ROC under different detection models.
As illustrated in Figure 8, all models exhibit an upward trend in detection rates with increasing attack intensity; however, the proposed CNN-BiLSTM-Attention model consistently outperforms both KNN and MLP baselines in detection accuracy. Further analysis of the ROC curves in Figure 9 confirms that our model achieves significantly higher discriminative power (AUC) compared to the conventional approaches. This indicates its superior capability in accurately identifying adversarial data within Internet of Vehicles (IoV) datasets.
Figure 10 illustrates that as the noise standard deviation increases, the overall detection rate exhibits a declining trend. Initially, when the noise standard deviation is low, the detection rate remains high and decreases at a relatively slow pace. However, once the noise standard deviation surpasses a certain threshold, the rate of decline in the detection rate accelerates, before eventually tapering off and stabilizing at a comparatively low level. Conversely, the false alarm rate demonstrates an overall upward trend. At low levels of noise standard deviation, the false alarm rate is extremely low and increases gradually. As the noise standard deviation rises to a certain extent, the rate of false alarms experiences an acceleration, continuing to rise as the noise standard deviation further increases. This observed trend underscores the significant impact of noise on the performance of detection systems, with increasing noise leading to a decrease in detection rate and an increase in false alarms.

The analysis of detection performance in different noise environments.
Conclusions and future works
This study proposes a novel spatio-temporal detection model designed to identify hybrid attacks in intelligent transportation systems. The introduced CNN-BiLSTM-Attention framework effectively integrates convolutional neural networks for spatial feature extraction, bidirectional long short-term memory networks for capturing temporal dependencies, and a channel attention mechanism for adaptive feature weighting. Simulation results demonstrate that the proposed model outperforms conventional detection approaches such as KNN and MLP across multiple performance metrics, including accuracy, precision, F1-Score, and recall. Furthermore, the model exhibits strong robustness in handling diverse attack scenarios.
Looking forward, several promising directions warrant further investigation. First, we intend to explore more advanced feature representation learning techniques to enhance both detection performance and computational efficiency. Second, transfer learning methodologies will be further developed, with emphasis on parameter optimization strategies and cross-domain adaptation mechanisms to improve generalizability across different transportation networks. Third, we plan to extend our validation to more complex and realistic attack scenarios, incorporating larger and more diversified datasets to evaluate model performance under near-real-world conditions. Finally, the integration of explainable AI (XAI) techniques will be considered to improve the interpretability of detection results, which is critical for practical deployment in security-sensitive transportation applications.
Footnotes
Ethical considerations
This work did not involve humans or animals. Ethical approval was not required for this research.
Consent to participate
There is no such case.
Consent for publication
The corresponding author gave consent for the publication of the identifiable details.
Author contributions
Conceptualization, RX. W. and MY.Z.; methodology, MY.Z and X.Y W; data curation, RX. W; writing—original draft preparation, X.Y. W; writing—review and editing, X.Y. W and RX. W; visualization, MY.Z. and RX. W; All authors have read and agreed to the published version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Open Research Fund of Intelligent Electric Power Grid Key Laboratory of Sichuan Province under Grant 2023-IEPGKLSPKFYB05 and by the Hebei Natural Science Foundation under Grant F2025203071.
I acknowledge Open Research Fund of Intelligent Electric Power Grid Key Laboratory of Sichuan Province under (2023-IEPGKLSPKFYB05) and Hebei Natural Science Foundation (F2025203071) for financial support that made this research possible.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data underlying the results presented in this paper are available from the corresponding author upon reasonable request.
