
Abstract

As smart grids become increasingly interconnected and data-centric, they are susceptible to DDoS attacks, false data injection, and probing assaults. Traditional Intrusion Detection Systems (IDS) often struggle to identify these emerging threats due to the high-dimensional, dynamic, and imbalanced data they generate. To tackle these challenges, we present a novel hybrid deep learning model that combines Spatial-Temporal Graph Neural Networks (ST-GNNs) and Multi-Scale Transformers, integrated with an Adaptive Attention-Based Feature Fusion (AAFF) module. This approach enhances detection accuracy by revealing the intricate spatial and temporal correlations within network traffic data. The AAFF module dynamically adapts by prioritising the most relevant features, facilitating the swift detection of fraudulent activities. To enhance the model's ability to cope with atypical and novel threats, we employ contrastive self-supervised learning (CSSL), which boosts performance on imbalanced datasets. We incorporate dynamic graph generation, temporal node embedding, and Meta-Learning techniques to ensure the model remains flexible and adaptable to emerging attack patterns. A federated learning system is utilised for distributed detection across multiple grid locations, enhancing scalability and privacy. To enhance robustness, we employ Conditional Generative Adversarial Networks (CGANs) for data augmentation, allowing the model to generalise to previously unknown attack scenarios. Furthermore, we employ online active learning, enabling the model to respond to new data and attacks in real-time, ensuring prompt detection and response. We deploy the model on grid edge devices, minimising detection latency and facilitating quicker attack response times. When evaluated on well-known security datasets, such as CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017, the model achieves a detection accuracy of 98.42%, surpassing previous methods and significantly reducing false positives. 
The proposed strategy integrates spatial and temporal threat analysis, dynamic feature refinement, and adaptive detection methods to deliver a reliable and scalable solution for enhancing the cybersecurity of smart grids.

Keywords

Smart grids, cybersecurity, hybrid deep learning, graph neural networks, transformers, intrusion detection systems

Introduction

Smart grids are revolutionising the way electricity is generated, transmitted, and utilised. Unlike traditional electricity grids, which operate primarily as one-way systems, smart grids facilitate two-way communication between utilities and consumers (AlHaddad et al., 2023). Advanced technologies, including smart meters, sensors, and communication networks, enable real-time monitoring, automation, and grid control. Adjusting energy distribution based on consumption patterns, renewable energy availability, and grid conditions enhances efficiency. Additionally, smart grids enhance reliability and simplify the integration of renewable energy sources, such as solar and wind, making them crucial for a more sustainable energy future (Alam et al., 2024). However, these grids have become increasingly vulnerable to intrusions due to their reliance on digital technologies. The many access points provided by numerous devices and communication channels create opportunities for hostile entities to gain entry. Such attacks, including false data injection, distributed denial-of-service (DDoS) attacks (Aljohani et al., 2024), and probing attacks, can jeopardise critical data, disrupt grid operations, and lead to significant financial and physical damage. Because the grid is interconnected, an attack on one part of the system can have widespread consequences, endangering the network's stability (Basheer and Ranjana, 2025).

Communication systems in smart grids are particularly susceptible to these attacks. The continuous data flow among devices, users, and utilities complicates the differentiation between legitimate and harmful activity. Furthermore, the grid's extensive and dynamic datasets hinder standard IDS from providing accurate real-time security monitoring (Berghout et al., 2022). The imbalance in data, which favours regular traffic over malicious activity, further complicates detection, increasing the risk of overlooking unique or rare attack patterns. Current IDS systems, including those based on signature detection and anomaly detection, have limitations in addressing these challenges. Signature-based approaches depend on predefined patterns, making it difficult to recognise new threats (Cui et al., 2020).

Anomaly-based systems can detect deviations from normal behaviours, yet they often suffer from high false-positive rates, especially in the dynamic environments of smart grids. Given the volume and complexity of smart grid data, traditional machine learning methods are also constrained in their effectiveness. Considering these limitations, the importance of an effective IDS in smart grids is evident (Diaba and Elmusrati, 2023). An IDS plays a crucial role in maintaining the grid's integrity and security by detecting and mitigating intrusions before they can cause substantial damage. This significance grows as the grid integrates more renewable energy sources and distributed energy resources (DERs), all of which demand a safe, stable, and reliable grid. The success of smart grids hinges on their ability to function seamlessly, making cybersecurity essential for their ongoing development (Ding et al., 2020).

This research addresses these issues by introducing a hybrid deep learning model that integrates ST-GNNs and Multi-Scale Transformers, enhanced by an Adaptive Attention-Based Feature Fusion module. This method captures spatial and temporal correlations in network data, improving detection accuracy and efficiency. Additionally, the model utilises Contrastive Self-Supervised Learning to handle imbalanced datasets and identify unusual or novel attacks (El-Toukhy et al., 2024). It also incorporates federated learning for scalability and privacy, as well as Conditional Generative Adversarial Networks for data augmentation, thereby enhancing generalisation to new attack scenarios. The proposed model addresses critical challenges in smart grid cybersecurity by increasing detection accuracy, scalability, and adaptability, offering a robust response to the ever-evolving threat landscape of modern power systems (Gokulraj and Venkatramanan, 2024).

The novelty of the proposed approach lies in the integration of Spatial-Temporal Graph Neural Networks (ST-GNNs) and Multi-Scale Transformers with an Adaptive Attention-Based Feature Fusion (AAFF) mechanism. This hybrid model uniquely combines graph-based learning for capturing spatial dependencies and transformer-based models for detecting temporal relationships, thereby enhancing its ability to identify dynamic, emerging cyber threats in smart grids. Unlike existing methods, which rely solely on either spatial or temporal features, our model simultaneously leverages both to improve detection accuracy and robustness against diverse attack scenarios (Gupta et al., 2022).

The complete article is structured as follows: Section ‘Related works’ discusses similar smart grid intrusion detection efforts. Section ‘Materials and methods’ focuses on the proposed model and its components. Section ‘Experimental results and discussion’ summarises the experimental findings and compares the model's performance to existing approaches. Section ‘Conclusion and future works’ concludes with a summary of findings and recommendations for future research directions.

Related works

Over the years, various approaches have been proposed to enhance the cybersecurity of smart grids, ranging from traditional machine learning methods to advanced deep learning techniques. These approaches focus on detecting network intrusions, mitigating attacks, and ensuring the integrity and availability of critical grid infrastructure. In this section, we review the most relevant studies that have contributed to the field, highlighting the strengths and limitations of existing methods and setting the stage for our proposed hybrid deep learning model.

IDS for smart grids

IDSs are essential for protecting smart grids, as they continuously scan the network for indications of malicious activity. Advanced intrusion detection systems have become increasingly crucial, considering the complex and ever-evolving landscape of smart grid systems, which encompass components such as power generation, distribution, and communication networks. The rising prevalence of IoT devices and the expanding attack surface make these systems particularly vulnerable to cyber threats, including network intrusions and malicious data injections.

AlHaddad et al. (2023) significantly advanced this field by introducing an ensemble model that integrates hybrid deep learning methodologies for intrusion detection in smart grid networks. Their model combines machine learning algorithms, enabling the system to precisely identify attacks. The effectiveness of their approach hinges on integrating various learning models, such as neural networks and decision trees, which enhance the system's predictive capabilities by recognising different attack behaviour patterns. This hybrid methodology enhances the system's adaptability and scalability, enabling it to address the cyber threats that modern smart grids encounter effectively. Alam et al. (2024) conducted research in a similar direction, exploring the application of machine learning to identify and prevent cyberattacks in smart grids. They emphasise real-time cybersecurity applications, which are crucial for preventing or mitigating the impact of cyberattacks once detected. By training machine learning models, such as decision trees, random forests, and support vector machines (SVMs), on historical data, the system's response to new threats can be improved by detecting attack patterns. Given the speed and complexity of cyberattacks, Alam and his colleagues assert that intrusion detection systems should be both anticipatory and predictive.

Basheer and Ranjana (2025) developed a deep learning framework for IDS based on graph convolutional networks. Smart grids often utilise a graph-based structure, with nodes representing various entities, such as power plants and substations, and edges representing communication links. GCNs are particularly well-suited for capturing the intricate relationships within these grid topologies, enabling more effective anomaly detection by leveraging the spatial dependencies inherent in these systems. Compared to traditional machine learning algorithms, GCN-based IDS can identify attacks with greater precision and context by focusing on these relationships.

Berghout et al. (2022) comprehensively analysed machine learning methods for cybersecurity in smart grids. Their study summarises various techniques, from supervised to unsupervised learning, and discusses how these methods impact IDS. They highlight the capabilities of these techniques in detecting a wide range of attacks, including data breaches, DDoS attacks, and internal threats. Furthermore, their review addresses the challenge of smart grids’ large-scale, distributed nature, which demands efficient and scalable IDS systems. The study emphasises that future IDS must be adaptable and able to learn from previously unseen attacks to respond effectively to the ever-evolving threat landscape.

Identification and detection of attacks in smart grids

One of the biggest obstacles in smart grid security is identifying attacks. Given the increasing sophistication of cyber-attacks, such as DDoS, False Data Injection (FDI), and Advanced Persistent Threats (APTs), it is essential to establish real-time, reliable attack detection systems. These attacks threaten the integrity of the grid's operations and may lead to substantial financial, safety, and environmental consequences. Therefore, the development of advanced attack detection systems is essential.

Diaba and Elmusrati (2023) presented a deep learning algorithm to identify DDoS attacks in smart grids. DDoS attacks, which overwhelm network resources with excessive traffic, pose a significant threat to the cybersecurity of smart grids. Their model distinguishes between normal and malicious traffic, enabling immediate detection and response to attacks. Their research emphasises how crucial it is to identify DDoS attacks efficiently, especially in large systems that handle significant data volumes. Gupta and Bhatia (2020) created a hybrid optimisation model that combines deep learning techniques with optimisation algorithms to find outliers. Their method makes it easier to notice unusual behaviours, which is essential for identifying attack patterns that have not been seen before. Anomaly detection acts as an early warning system: it identifies unusual patterns in a system's behaviour that may indicate an impending attack. The authors enhanced their system's ability to locate such outliers using genetic algorithms and particle swarm optimisation, thereby reducing false positives and false negatives.

Kaur and Batth (2024) developed a novel method for detecting intrusions by integrating deep learning and machine learning models. Their hybrid methodology considers the intricate dynamics of smart grid security, wherein attack patterns may evolve and vary across different grid segments. Hybrid models exhibit enhanced flexibility and reliability by integrating the optimal characteristics of deep learning and machine learning. Deep learning identifies intricate patterns, while machine learning generates accurate and efficient predictions. Utilising a combination of algorithms, the model can more effectively address various types of attacks and environmental alterations.

Li et al. (2023) demonstrated an adaptive deep learning model that enables smart grids to detect intrusions more efficiently. Their model utilises machine learning classifiers that incorporate deep learning techniques to address emerging threats more effectively. The model's adaptive features continually improve by incorporating new attack data, which makes it a more flexible and faster tool for finding new threats. It is becoming increasingly important for systems to identify both known attacks and new, unexpected threats. Their research indicates that the future of intrusion detection systems in smart grids hinges on developing systems that can learn from new data without requiring retraining or human assistance.

Hybrid and advanced techniques for smart grid security

As smart grids continue to evolve, the complexity of securing them will also increase. Traditional intrusion detection techniques frequently struggle to manage the scale, diversity, and complexity of modern attacks. Hybrid and advanced deep learning models have demonstrated great promise in addressing these challenges. These models offer improved detection accuracy and enhanced computational efficiency. These models incorporate a wide range of algorithms, each offering its own distinct set of advantages.

Aljohani et al. (2024) proposed a deep learning-based intrusion detection system for smart grids that integrates neural networks with advanced learning techniques to enhance the effectiveness of attack detection. The authors emphasised that combining deep learning models and conventional detection methods provides a more adaptable and scalable solution. This hybrid methodology is particularly advantageous for smart grids, which operate in a dynamic environment and continually face the emergence of new attack vectors. Their system can handle the vast data generated by grid sensors while accurately detecting anomalies and attacks.

Ruan et al. (2023) investigated various deep learning techniques utilised in smart grid cybersecurity and assessed the prospective ramifications of these technologies. Their review highlighted the growing importance of hybrid models, particularly in managing the vast data generated by smart grid systems. Hybrid models can significantly improve the accuracy of detection and the scalability of IDS. Hybrid systems can more efficiently process varied data types by amalgamating different deep learning methodologies, such as CNNs and RNNs, thus improving the overall efficacy of IDS.

Ravinder and Kulkarni (2025) presented a deep learning model using dilated GRU (Gated Recurrent Unit) networks for anomaly detection in smart grids. GRUs are a type of RNN that efficiently capture temporal dependencies in time-series data, including the data produced by smart grid communications. This makes them well suited to spotting irregularities in real time, where the order of events is crucial. Their study demonstrates how deep learning models, such as GRUs, can be tailored for specific applications, including tracking temporal patterns in smart grid communications, thereby providing a robust defence against emerging security risks (Hu et al., 2022).

Combining deep learning and hybrid methodologies is transforming cybersecurity in smart grids. These advances enable more flexible, scalable, and real-time solutions to the growing cyber threats that contemporary smart grids must contend with, while enhancing the precision and efficacy of attack detection (Kaur and Batth, 2024). Table 1 presents a comprehensive review of IDS detection in Smart Grid using deep learning methods.

Table 1.

Comprehensive review of IDS detection in Smart Grid using deep learning methods.

Reference	Techniques/Models used	Model type	Data used	Evaluation metrics	Key findings	Results	Deployment feasibility	Challenges/Future directions
AlHaddad et al. (2023)	Ensemble deep learning	Ensemble	Smart grid data	Accuracy, false positives	Improved attack detection	High accuracy	Feasible for small grids	Scalability for larger systems
Alam et al. (2024)	Machine learning	Supervised	Simulated grid data	Detection accuracy	Real-time attack detection	High detection rate	Feasible for smaller grids	Real-time performance at scale
Basheer and Ranjana (2025)	Graph convolutional networks (GCNs)	Deep learning	Communication data	Accuracy, precision	Effective attack detection in grid networks	High accuracy	Feasible for grid topologies	Real-time scalability
Diaba and Elmusrati (2023)	Deep learning for DDoS	Deep learning	Simulated DDoS data	Detection rate	Effective in DDoS detection	High detection rate	Feasible for real-time	False positives reduction
Gupta et al. (2022)	Hybrid deep learning and optimisation	Hybrid	Real-time data	Accuracy, recall	Improved accuracy in anomaly detection	Improved performance	Feasible but resource-intensive	Optimisation for real-time scalability
Kaur and Batth (2024)	Hybrid deep learning & ML	Hybrid	Smart grid data	F1-Score	Better attack detection accuracy	High accuracy	Feasible for various applications	Adaptive learning for evolving threats
Li et al. (2023)	Adaptive deep learning	Deep learning	Grid communication data	Precision, recall	Enhanced adaptability for evolving threats	High adaptability	Feasible for real-time deployment	Integration with grid systems
Aljohani et al. (2024)	Deep learning IDS	Deep learning	Grid network data	Accuracy, false positives	Improved security using deep learning	High detection accuracy	Feasible for smart grids	Optimisation for scalability
Ruan et al. (2023)	Hybrid deep learning	Hybrid	Communication data	Precision, recall	Improved scalability for large grids	High scalability	Feasible for large-scale systems	Real-time monitoring integration
Ravinder and Kulkarni (2025)	Dilated GRU-based model	Deep learning	Time-series data	Anomaly detection	Effective for time-series anomaly detection	High accuracy	Feasible for smart grid data	Real-time optimization
Chen et al. (2023)	CNN-based IDS	Convolutional neural networks	Grid data	Detection accuracy	Efficient anomaly detection in smart grids	High detection rate	Feasible with optimisation	Improve for large-scale environments
Kalusivalingam et al. (2022)	Deep reinforcement learning	Reinforcement learning	Smart grid traffic data	Detection accuracy, F1-Score	High adaptability to different attack patterns	High accuracy	Feasible for adaptive systems	Reduce false positives
Souhe et al. (2022)	Hybrid SVM and neural networks	Hybrid	Smart grid data	Accuracy, precision	High detection accuracy for various attacks	High accuracy	Feasible for targeted attacks	Real-time performance testing
Shrestha et al. (2024)	Autoencoders for anomaly detection	Autoencoders	Smart grid data	Anomaly detection rate	Effective in detecting novel anomalies	High detection rate	Feasible for small grids	Large-scale deployment challenges

Materials and methods

This section describes the approach used to create and assess the suggested hybrid deep learning model for smart grid cybersecurity. In addition to methods like CSSL, federated learning, and data augmentation, it describes the model's architecture, including its fundamental elements: ST-GNNs, multi-scale transformers, and the AAFF module. To provide a comprehensive understanding of the experimental framework, the datasets used, data preprocessing stages, model training procedures, and evaluation measures are also discussed.

Proposed hybrid deep learning model for smart grid cybersecurity

The proposed approach is a combined deep learning framework designed to address the evolving, uneven, and complex challenges posed by security issues in smart grids. It integrates ST-GNNs, multi-scale transformers, and an adaptive attention-based feature fusion module, along with advanced techniques such as contrastive self-supervised learning, dynamic graph generation, meta-learning, and federated learning. The architecture of this system is carefully designed to detect intricate attack patterns, such as DDoS (Distributed Denial of Service) attacks, False Data Injection (FDI) attacks, and probing attacks, to which smart grids are increasingly vulnerable (Karimipour et al., 2019).

While this model has been evaluated on benchmark datasets, its real-world applicability in live smart grid environments is crucial. The proposed hybrid deep learning model can be deployed on edge devices within the grid, offering low-latency attack detection. However, real-time performance may depend on the size of the grid and the available computational resources (Li et al., 2023). In future work, the model will be optimised to handle large-scale, real-time data by implementing techniques like model pruning and distributed learning, ensuring it is feasible for use in diverse real-world settings. Figure 1 presents the architecture of the proposed hybrid model.

Figure 1.

The architecture of the proposed hybrid model.

Working of the proposed hybrid model

The complete working of the proposed hybrid model is as follows.

Spatial-temporal graph neural network

The proposed hybrid model emphasises smart grids as massive, interconnected networks that include devices such as sensors, meters, and controllers. These devices share power usage, voltage measurements, and different sensor data. Due to their extensive network connections, smart grids are susceptible to various types of cyberattacks. These attacks may include DDoS (Distributed Denial of Service) assaults and data falsification, in which cybercriminals deliberately introduce inaccurate data to disrupt the system (Li et al., 2025).

The proposed model utilises advanced machine learning methods to protect the grid from cyberattacks. An essential component is the ST-GNN, which enhances the system's ability to comprehend and forecast events within the grid across both spatial and temporal dimensions. The ST-GNN functions as a robust ‘sensor’ within the cybersecurity framework. It examines the grid's temporal dynamics and the spatial interconnections among devices, such as sensors and meters. By integrating these two perspectives, ST-GNN can identify anomalous patterns or cyberattacks more efficiently than conventional systems (Lu et al., 2025). Algorithm 1 presents the working of the ST-GNN model in the proposed hybrid model.

Algorithm 1: Algorithm for Spatial-Temporal Graph Neural Network (ST-GNN)

Input:

Graph $G = (V, E)$, where $V$ is the set of devices (nodes) and $E$ is the set of connections (edges).

Node features $X_{v}(t)$; edge features $E_{uv}$: the feature matrix describing the relationship between the connected devices $u$ and $v$.

Time stamp $T$, which represents the temporal data.

Output:

Attack prediction class $\hat{Y}_{v}$. For the binary case, 0: normal and 1: attack. For the multiclass case, one label per attack class.

Anomaly score: scores accompanying each prediction that indicate the probability of an attack occurring.

Step 1: Graph Construction

1.1 Create the graph $G$, with nodes representing devices and edges representing the connections between them.

1.2 For every node $v$, initialise its feature vector $X_{v}(t)$ using the sensor readings collected at time $t$.

1.3 For every edge $(u, v)$, set up the edge feature matrix $E_{uv}$, which contains details about the connection between devices $u$ and $v$.

Step 2: Spatial Graph Convolution

2.1 For every node $v$, aggregate the features of its neighbouring nodes $u \in N(v)$ according to the graph structure, as given in Equation 1.

$$H_{v}^{(l+1)} = \sigma\left(\sum_{u \in N(v)} \frac{1}{C_{uv}} \left(W^{l} H_{u}^{l}\right) + b^{l}\right) \tag{1}$$

where:

$H_{u}^{l}$: feature vector of node $u$ at layer $l$,

$W^{l}$: weight matrix of layer $l$,

$b^{l}$: bias term of layer $l$,

$C_{uv}$: normalisation factor,

$\sigma$: non-linear activation function.

Step 3: Temporal Graph Convolution

3.1 For each node $v$, update its feature vector based on the temporal changes in its readings from time $t$ to time $t+1$, as given in Equation 2.

$$H_{v}(t+1) = \sigma\left(\sum_{u \in N(v)} \frac{1}{C_{uv}} \left(W_{u}^{l} H_{u}(t)\right) + b_{t}^{l}\right) \tag{2}$$

where:

$H_{u}(t)$: feature vector of node $u$ at time $t$,

$W_{u}^{l}$: weight matrix,

$b_{t}^{l}$: bias term for time $t$ and layer $l$,

$N(v)$: set of neighbouring nodes of $v$.

3.2 This update enriches each device's feature vector with temporal patterns, allowing the model to capture how device behaviours evolve over time.

Step 4: Feature Fusion

4.1 Integrate the spatial and temporal features for each device by merging $H_{v}^{(l+1)}$ and $H_{v}(t+1)$ through a suitable fusion technique (such as concatenation, summation, or attention-based fusion), as given in Equation 3.

$$Z_{v} = \mathrm{Fusion}\left(H_{v}^{(l+1)}, H_{v}(t+1)\right) \tag{3}$$

4.2 The fusion combines the spatial and temporal dependencies into one feature vector $Z_{v}$, which is then used for the final classification.

Step 5: Anomaly Detection/Classification

5.1 Pass the fused feature vector $Z_{v}$ through a classification layer (such as a softmax layer or a fully connected layer) to determine whether the device is experiencing an attack, using Equation 4.

$$\hat{Y}_{v} = \mathrm{Classifier}(Z_{v}) \tag{4}$$

where $\hat{Y}_{v} = 1$ denotes "under attack" and $\hat{Y}_{v} = 0$ denotes "no attack".

5.2 This step produces a binary result that shows whether the device is functioning correctly or has been affected by a cyber-attack.

Step 6: Federated Learning

6.1 Input: model updates from the various smart grid locations.

6.2 Federated learning enables training a model on distributed data while keeping the raw data private. Every grid location trains its own local model, and the updates are combined to create a global model that enhances detection accuracy for all locations.

6.3 Output: A unified global model for distributed detection throughout all grid locations.
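
As an illustration of Steps 2 to 5 of Algorithm 1, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes that the normalisation factor C_uv is the number of neighbours of node v (mean aggregation), that one shared weight matrix stands in for the per-neighbour matrix of Equation 2, that the activation is ReLU, and that fusion is plain concatenation. All function names, parameter names, and the toy four-device graph are hypothetical, and the weights are random and untrained, so the printed predictions carry no meaning. Federated learning (Step 6) is not covered.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    # Row-wise softmax, shifted for numerical stability.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def graph_conv(H, adj, W, b):
    """Neighbourhood aggregation in the form of Equations 1 and 2.

    H:   (num_nodes, d_in) node features, one row per device
    adj: (num_nodes, num_nodes) 0/1 matrix, adj[v, u] = 1 if u is in N(v)
    W:   (d_in, d_out) weight matrix, b: (d_out,) bias
    Assumption: C_uv is the number of neighbours of v.
    """
    degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # guard isolated nodes
    return relu((adj @ (H @ W)) / degree + b)


def st_gnn_forward(X_t, adj, p):
    H_spatial = graph_conv(X_t, adj, p["W_sp"], p["b_sp"])    # Equation 1
    H_temporal = graph_conv(X_t, adj, p["W_tm"], p["b_tm"])   # Equation 2 (shared W, assumption)
    Z = np.concatenate([H_spatial, H_temporal], axis=1)       # Equation 3 (concatenation)
    probs = softmax(Z @ p["W_cls"] + p["b_cls"])              # Equation 4
    return probs.argmax(axis=1), probs[:, 1]                  # class label, attack score


rng = np.random.default_rng(0)
d_in, d_hid = 3, 8
params = {
    "W_sp": rng.normal(scale=0.1, size=(d_in, d_hid)), "b_sp": np.zeros(d_hid),
    "W_tm": rng.normal(scale=0.1, size=(d_in, d_hid)), "b_tm": np.zeros(d_hid),
    "W_cls": rng.normal(scale=0.1, size=(2 * d_hid, 2)), "b_cls": np.zeros(2),
}
# Toy ring of four devices with links 0-1, 0-2, 1-3, 2-3.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
X_t = rng.normal(size=(4, d_in))  # synthetic sensor readings at time t

y_hat, attack_score = st_gnn_forward(X_t, adj, params)
print(y_hat, attack_score)
```

Concatenation is used here only because it is the simplest of the fusion options listed in Step 4; swapping in summation or an attention-based fusion would change only the line that builds `Z`.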

Multi-scale transformer (temporal pattern learning)

The proposed model emphasises the significance of the multi-scale transformer (MST) in comprehending temporal relationships and identifying intricate patterns in time-series data, which is crucial for detecting anomalies or attacks within the smart grid (Makhmudov et al., 2025). The MST employs a self-attention mechanism to capture both short-range and long-range temporal dependencies in the system's behaviour. This is significant because smart grid systems undergo dynamic changes over time, and these temporal patterns can be vital for identifying emerging cyber-attacks such as false data injections or DDoS attacks (Mohammed et al., 2024).

The MST addresses various time scales, allowing the model to concentrate on different time intervals and capture features at both detailed (short-term) and broader (long-term) levels. The model analyses time-series data from grid devices, including power consumption, voltage, and current readings, to understand how these signals change over time. By incorporating this time-based learning into the spatial graph framework (similar to the earlier ST-GNN module), the MST enables the model to uphold high accuracy, even when faced with imbalanced or atypical attack patterns (Mohammed et al., 2025). Algorithm 2 presents the steps for MST in the proposed model.

Algorithm 2: Algorithm for Multi-Scale Transformer (MST) for Temporal Pattern Learning

Input:

$X_{v}(t)$: node features, $T$: time, $W^{t}$: attention weights.

Output:

$H_{v}(t)$: temporal features, $\hat{Y}_{v}(t)$: predicted temporal anomalies (1: attack, 0: non-attack).

Step 1: Input Embedding

1.1 Project the input node features $X_{v}(t)$ into an embedding space that the transformer can process, using Equation 5. Here, $W_{\mathrm{Embedding}}$ is the embedding matrix that maps the node features.

$$E_{v}(t) = X_{v}(t) W_{\mathrm{Embedding}} \tag{5}$$

Step 2: Multi-Scale Attention Mechanism

2.1 At each time step $t$, apply the multi-scale self-attention mechanism to capture both short-term and long-term temporal dependencies. For every node $v$, attention is calculated using Equation 6.

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{6}$$

where $Q$: query (temporal features of node $v$), $K$: key (temporal features of neighbouring nodes), $V$: values (temporal features of neighbouring nodes), and $d_{k}$: dimension of the key vector.

2.2 Through Equation 7, the attention mechanism combines features from various time steps, short-term and long-term, into one cohesive temporal feature vector.

$$\mathrm{Multiscale\_Attention}(Q, K, V) = \sum_{i=1}^{N} \mathrm{Attention}(Q, K, V) \tag{7}$$

where $N$ is the number of scales (e.g. short-term, medium-term, and long-term), with one attention term per scale.

Step 3: Temporal Feature Aggregation

3.1 The per-scale attention outputs are aggregated into the updated temporal feature vector through Equation 8.

$$H_{v}(t) = \sum_{s=1}^{N} \mathrm{Attention}(Q, K, V) W_{s} \tag{8}$$

where $H_{v}(t)$ is the updated feature vector and $W_{s}$ is the weight applied at scale $s$.

Step 4: Feed-Forward Layer

4.1 Pass the combined temporal features $H_{v}(t)$ through a feed-forward neural network layer to refine the learned features, according to Equation 9.

$$H_{v}^{FF}(t) = \sigma\left(W_{FF} H_{v}(t) + b_{FF}\right) \tag{9}$$

where:

$W_{FF}$: weight matrix,

$b_{FF}$: bias term of the feed-forward layer,

$\sigma$: activation function.

Step 5: Attack Class Prediction

5.1 Once the temporal data has been processed, apply a classification head (such as a softmax layer or a fully connected layer) to determine whether a device $v$ is experiencing an attack at time $t$, using Equation 10.

$$\hat{Y}_{v}(t) = \mathrm{Classifier}\left(H_{v}^{FF}(t)\right) \tag{10}$$
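
As an illustration of Equations 5 to 8, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The source does not specify how a scale is defined, so the sketch assumes that each scale is a causal window covering the most recent w time steps of one node's embedded sequence, with keys and values taken from that window. The names `attention`, `multi_scale_attention`, `windows`, and `W_scales` are hypothetical, and the data are random.

```python
import numpy as np


def softmax(x):
    # Row-wise softmax, shifted for numerical stability.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention (Equation 6)."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V


def multi_scale_attention(E, windows, W_scales):
    """Sum of per-scale attention outputs, each weighted by W_s (Equation 8).

    E:        (T, d) embedded time series of one node
    windows:  window length per scale (assumption: scale = recent-history window)
    W_scales: one (d, d) matrix W_s per scale
    """
    T, d = E.shape
    H = np.zeros((T, d))
    for w, W_s in zip(windows, W_scales):
        for t in range(T):
            context = E[max(0, t - w + 1): t + 1]   # last w steps up to and including t
            H[t] += attention(E[t:t + 1], context, context)[0] @ W_s
    return H


rng = np.random.default_rng(0)
T, d_in, d_model = 6, 3, 4
X = rng.normal(size=(T, d_in))             # synthetic readings X_v(t) for one node
W_embedding = rng.normal(size=(d_in, d_model))
E = X @ W_embedding                        # Equation 5

windows = [2, 4, 6]                        # short-, medium-, and long-term scales
W_scales = [np.eye(d_model) for _ in windows]
H = multi_scale_attention(E, windows, W_scales)
print(H.shape)
```

With identity matrices for every W_s, the result reduces to the unweighted sum of Equation 7; learned matrices would let the model weight the scales differently. The explicit loop over time steps is chosen for readability, not speed.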

Adaptive attention-based feature fusion

A vital component of the proposed approach is AAFF, which flexibly merges the features obtained from the model's spatial and temporal branches. AAFF integrates the outputs of the ST-GNN and the MST according to their significance for identifying attacks (Markkandeyan et al., 2025). Traditional feature fusion methods merge features in a fixed way. In contrast, AAFF adjusts to the importance of features at each moment, thereby improving the model's ability to focus on the most pertinent information for identifying anomalies or cyberattacks (Mhmood et al., 2024).

The primary benefit of AAFF lies in its attention mechanism, which assigns weights to the features during fusion based on their relevance in both time and space. The fusion process enables the model to focus on significant patterns, such as unusual device behaviours or irregular device interactions over time. This results in a more precise identification of threats within the smart grid system (Nasir and Hebrisha, 2024). Algorithm 3 presents the working steps of AAFF in the proposed hybrid model.

Algorithm 3: Algorithm for Adaptive Attention-Based Feature Fusion (AAFF)

Input: $H_{v}^{ST}$: Spatial features (node features collected from ST-GNN), $H_{v}^{MST}$: Temporal features (node features obtained from MST), and $W_{AAFF}$: Attention weights (learnable weights that determine how much influence the spatial and temporal features exert on the fusion).

Output: $Z_{v}$: Fused features, $\hat{Y}_{v}$: Predicted outcome.

Step 1: Feature Preparation

1.1 Input Spatial and Temporal Features.

1.1.1 The input spatial features $H_{v}^{ST}$ and temporal features $H_{v}^{MST}$ are derived from earlier stages of the model (ST-GNN and MST, respectively).

Step 2: Attention Weight Calculation.

2.1 Calculate Attention Weights.

2.1.1 Utilise an attention mechanism to determine the attention weights that indicate the significance of each feature type (spatial or temporal) for every node v. The weights are learned during training and reflect the relevance of each feature to the task, as outlined in Equations 11 and 12.

α_{v}^{ST} = \mathrm{softmax}(W_{ST} H_{v}^{ST} + B_{ST})

(11)

α_{v}^{MST} = \mathrm{softmax}(W_{MST} H_{v}^{MST} + B_{MST})

(12)

where: $α_{v}^{ST}$: Attention weights for spatial feature dependencies, $α_{v}^{MST}$: Attention weights for temporal feature dependencies, and $B_{ST}$, $B_{MST}$: Bias terms for the spatial and temporal features.

Step 3: Feature Fusion

3.1 Combine Features with Attention: Leverage the computed attention weights to dynamically merge the spatial and temporal features of each device v. The combination is achieved by calculating a weighted sum of the spatial and temporal features through equation 13.

Z_{v} = (α_{v}^{ST} \times H_{v}^{ST}) + (α_{v}^{MST} \times H_{v}^{MST})

(13)

where: $Z_{v}$: Fused feature vector of node v, and $α_{v}^{ST}$ and $α_{v}^{MST}$: Attention weights.

Step 4: Feature Refinement.

4.1 Enhance Combined Features: The combined features $Z_{v}$ are processed through a feed-forward neural network to improve the final feature representation using equation 14.

Z_{v}^{FF} = σ(W_{FF} Z_{v} + b_{FF})

(14)

where: $W_{FF}$: Feed-forward weights, $b_{FF}$: Feed-forward bias, and $σ$: Activation function.

Step 5: Attack Detection / Classification.

5.1 Classify the Attack: The refined fused features $Z_{v}^{FF}$ are passed through a classification layer to determine whether a device v is experiencing an attack or functioning normally, utilising equation 15.

\hat{Y}_{v} = \mathrm{Classifier}(Z_{v}^{FF})

(15)

where $\hat{Y}_{v}$: Predicted class (1: Attack and 0: Non-attack).
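The fusion in equations 11 to 13 can be sketched in NumPy as below. The sketch assumes that the SoftMax is taken over the feature dimension of each branch, that the product in equation 13 is element-wise, and that both branches have the same feature width. The function name `aaff_fuse`, the dimensions, and the random weights are illustrative only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def aaff_fuse(H_st, H_mst, W_st, B_st, W_mst, B_mst):
    alpha_st = softmax(H_st @ W_st + B_st)        # equation 11
    alpha_mst = softmax(H_mst @ W_mst + B_mst)    # equation 12
    Z = alpha_st * H_st + alpha_mst * H_mst       # equation 13, element-wise
    return Z, alpha_st, alpha_mst

rng = np.random.default_rng(2)
n_nodes, d = 5, 8                                 # hypothetical sizes
H_st = rng.normal(size=(n_nodes, d))              # spatial features from ST-GNN
H_mst = rng.normal(size=(n_nodes, d))             # temporal features from MST
W_st, W_mst = rng.normal(size=(d, d)), rng.normal(size=(d, d))
B_st, B_mst = np.zeros(d), np.zeros(d)

Z, a_st, a_mst = aaff_fuse(H_st, H_mst, W_st, B_st, W_mst, B_mst)
```

Because the weights are recomputed from the current features of every node, the balance between the spatial and temporal branches changes from node to node and from one time step to the next, which is the adaptive behaviour described above.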

Contrastive self-supervised learning (CSSL)

The proposed approach utilises CSSL to enhance its capacity to learn from imbalanced datasets and to improve its effectiveness in identifying new and previously unseen attacks. Conventional supervised learning techniques depend on labelled data, which is often limited, particularly for zero-day attacks or uncommon anomalies in smart grid cybersecurity. CSSL overcomes this obstacle by exploiting the natural structure of the data, which enables the model to acquire meaningful representations even without explicit labels. In CSSL, the model is trained to tell apart pairs of similar (positive) and dissimilar (negative) instances (Ravinder and Kulkarni, 2025).

The concept is that comparable instances, like typical behaviours or similar attack patterns, can be positioned near one another in the learned feature space. In contrast, distinct instances, such as normal behaviours and attacks, should be spaced further apart. By guiding the model to distinguish these instances, it develops more substantial features that can assist in recognising new attack patterns, even when the training data is uneven or lacking (Ruan et al., 2023). CSSL enables smart grid attack detection systems to comprehend typical grid operations and identify distinct attack patterns, even when the training data comprises only a few labelled instances of attacks. The contrastive learning framework is beneficial for anomaly detection tasks, where most data points represent typical (non-attack) behaviours and only a few data points indicate potential attacks (Rahul et al., 2024). Algorithm 4 presents the working of CSSL in the proposed hybrid model.

Algorithm 4: Contrastive Self-Supervised Learning (CSSL)

Input: Unlabelled data $X$, augmented views $X_{a}$ and $X_{b}$, feature embeddings $f(X)$, and τ: Temperature parameter (a contrastive learning hyperparameter that scales the similarity scores and controls the sharpness of the loss function's distribution).

Output: Contrastive loss $L_{Contrastive}$, learned representations $Z$.

Step 1: Data Augmentation.

1.1 For every data sample, create two augmented views $X_{a}$ and $X_{b}$ by applying different transformations, such as introducing noise, modifying the temporal scale, or slightly adjusting the spatial features. The augmentation enables the model to learn and recognise consistent features, resulting in greater resilience, through the application of Equations 16 and 17.

X_{a} = \mathrm{Augmented}(X)

(16)

X_{b} = \mathrm{Augmented}(X)

(17)

where $X_{a}$ and $X_{b}$: augmented data for ST-GNN and MST.
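Two of the transformations named in Step 1 (added noise and a modified temporal scale) can be sketched as follows. The noise level, the rescaling factor, and the helper names `jitter` and `time_rescale` are hypothetical choices for illustration; the paper does not state which settings were used.

```python
import numpy as np

def jitter(X, sigma, rng):
    # Add zero-mean Gaussian noise to every feature value.
    return X + rng.normal(scale=sigma, size=X.shape)

def time_rescale(X, factor):
    # Resample a (time, features) window along the time axis by linear
    # interpolation; factor < 1 stretches the early part of the window.
    T = X.shape[0]
    src = np.arange(T)
    dst = np.clip(src * factor, 0, T - 1)
    return np.stack([np.interp(dst, src, X[:, j]) for j in range(X.shape[1])], axis=1)

rng = np.random.default_rng(3)
X = rng.normal(size=(32, 6))            # one traffic window: 32 steps, 6 features
X_a = jitter(X, sigma=0.05, rng=rng)    # first augmented view (equation 16)
X_b = time_rescale(X, factor=0.9)       # second augmented view (equation 17)
```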

Step 2: Feature Encoding.

2.1 Apply feature encoding to the augmented data to obtain the feature embeddings $f(X_{a})$ and $f(X_{b})$ using equations 18 and 19.

f(X_{a}) = \mathrm{Encoder}(X_{a})

(18)

f(X_{b}) = \mathrm{Encoder}(X_{b})

(19)

Step 3: Contrastive Loss Calculation.

3.1 Determine the similarity between the feature embeddings by applying the cosine similarity measure as outlined in equation 20. Here, the dot (·) denotes the dot product and $\| \cdot \|$ represents the L2 norm of a vector.

\mathrm{sim}(f(X_{a}), f(X_{b})) = \frac{f(X_{a}) \cdot f(X_{b})}{\| f(X_{a}) \| \, \| f(X_{b}) \|}

(20)

Step 4: Loss Function.

4.1 The contrastive loss function helps the model align the feature embeddings of augmented views of the same sample (positive pairs) while pushing apart the embeddings of different samples (negative pairs). The loss function is given in equation 21.

L_{Contrastive} = - \log \frac{\exp(\mathrm{sim}(f(X_{a}), f(X_{b})) / τ)}{\sum_{i = 1}^{N} \exp(\mathrm{sim}(f(X_{a}), f(X_{i})) / τ)}

(21)

where: N: number of negative samples, and $\mathrm{sim}(f(X_{a}), f(X_{i}))$: cosine similarity between $f(X_{a})$ and the other (negative) samples.
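A batch version of the cosine similarity and the contrastive loss can be sketched as below. It is an InfoNCE-style formulation under stated assumptions: the other samples in the batch serve as negatives, and the positive pair is also included in the denominator, as is common in practice, whereas the sum in equation 21 is described as running over the negative samples. The temperature value and the random embeddings are placeholders.

```python
import numpy as np

def cosine_sim_matrix(A, B):
    # Pairwise cosine similarity between the rows of A and the rows of B.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def contrastive_loss(Z_a, Z_b, tau=0.5):
    # Row i of Z_a and row i of Z_b form the positive pair; all other
    # rows of Z_b act as negatives for row i of Z_a.
    logits = cosine_sim_matrix(Z_a, Z_b) / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(4)
Z_a = rng.normal(size=(16, 8))
Z_b_aligned = Z_a + 0.01 * rng.normal(size=Z_a.shape)   # views that agree
Z_b_random = rng.normal(size=Z_a.shape)                 # unrelated embeddings

loss_aligned = contrastive_loss(Z_a, Z_b_aligned)
loss_random = contrastive_loss(Z_a, Z_b_random)
```

The loss is smaller when the two views of each sample are embedded close together, which is the behaviour the encoder is trained towards.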
Step 5: Optimise the Model.

5.1 Apply the contrastive loss function of equation 21 to update the encoder's parameters through backpropagation (Equation 22). This enables the model to learn feature representations that highlight the patterns in the data that are useful for identifying attacks. Here, θ represents the model's parameters.

\min_{θ} L_{Contrastive}

(22)

Meta-learning and online active learning

Meta-learning, often referred to as ‘learning to learn’, is a method that focuses on enhancing a model's ability to adapt to new tasks and environments with limited data. In the context of the proposed model for smart grid cybersecurity, meta-learning is utilised to enable the model to swiftly adjust to new attack patterns or changes in the environment within the grid. The model can adapt effectively to new and unfamiliar attack situations by understanding the core principles of different attack types. The model-agnostic meta-learning (MAML) algorithm is often utilised for this purpose, enabling the model to be trained to quickly adjust to new data with just a few updates (Sharma et al., 2025).

Online active learning, in turn, is a method in which the model continually adapts by incorporating fresh data as it arrives. Rather than relying on data chosen at random, it focuses on learning from uncertain or otherwise informative samples. This ongoing learning enables the model to improve as it encounters new attack vectors or network behaviours. Online active learning is particularly advantageous in smart grids, where cyberattacks are continually evolving, making real-time detection essential (Shehzad et al., 2021).
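One common way to pick informative samples is uncertainty sampling, sketched below with predictive entropy as the uncertainty score. This is an assumption made for illustration: the paper does not state which query strategy is used, and the function names and the labelling budget are hypothetical.

```python
import numpy as np

def predictive_entropy(probs):
    # Entropy of each row of class probabilities; higher means less certain.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_labelling(probs, budget):
    # Indices of the `budget` most uncertain samples, most uncertain first.
    return np.argsort(-predictive_entropy(probs))[:budget]

# Model outputs for four incoming samples (columns: normal, attack).
probs = np.array([[0.99, 0.01],
                  [0.55, 0.45],
                  [0.80, 0.20],
                  [0.50, 0.50]])
query_idx = select_for_labelling(probs, budget=2)
print(query_idx)  # [3 1]
```

Only the queried samples would be sent for labelling and used in the next model update, which keeps the labelling effort small while the stream continues.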

Federated learning system

Federated learning is a collaborative approach to machine learning that enables model training across multiple devices or nodes, such as smart grid locations, while keeping the raw data on the local devices and sharing only model updates (gradients) with the central server (Song et al., 2021). The proposed model utilises federated learning to identify cyberattacks within distributed smart grid systems while preserving privacy and scalability. Every local node (smart grid device) trains a model on its own data, and at regular intervals the updated models are combined at the central server. This method enhances the ability to detect attacks effectively and ensures that sensitive information remains protected from exposure (Siniosoglou et al., 2021).
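The server-side combination step can be illustrated with federated averaging (FedAvg), in which each client's parameters are weighted by its number of local samples. FedAvg is assumed here for illustration; the paper does not name the aggregation rule, and the parameter names and client sizes below are made up.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    # Sample-size-weighted average of per-client parameter dictionaries.
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Two grid locations with hypothetical local models and sample counts.
clients = [
    {"W": np.full((2, 2), 1.0), "b": np.zeros(2)},
    {"W": np.full((2, 2), 3.0), "b": np.ones(2)},
]
global_model = federated_average(clients, client_sizes=[100, 300])
```

Only the parameter dictionaries travel to the server in this sketch; the local traffic records never leave the client.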

Conditional generative adversarial networks (CGANs)

In the proposed model, CGANs are employed to augment the data. CGANs consist of two networks: a generator that produces synthetic samples and a discriminator that separates real samples from synthetic ones. The generator produces its samples conditioned on specific inputs, such as attack labels or network conditions. Utilising CGANs enables the model to generate realistic attack scenarios that may be absent from the training dataset, which is crucial when addressing uncommon or new attack patterns. The increased variety in data strengthens the model's ability to withstand unexpected attacks (Wang et al., 2025). The generator aims to minimise the loss function outlined in Equation 23, while the discriminator aims to maximise it.

L_{CGAN} = E_{x \sim p_{data}} [\log D(x)] + E_{z \sim p_{z}} [\log(1 - D(G(z)))]

(23)

where: $D(x)$: Discriminator's output for real data $x$, $G(z)$: Generator's output for input noise $z$, $p_{data}$: distribution of the real data, and $p_{z}$: distribution of the input noise.
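The objective in Equation 23 can be estimated from a batch of discriminator outputs, as in the sketch below. The networks themselves are omitted: the arrays stand in for D(x) on real samples and D(G(z)) on generated samples, and the values are invented for illustration.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    # Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))].
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Discriminator cannot tell real from generated: outputs 0.5 everywhere.
fooled = gan_value(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
# Discriminator separates the two sets well.
sharp = gan_value(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))
```

The value is higher when the discriminator separates the two sets, so a generator that lowers it is producing samples the discriminator finds harder to reject.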

Final detection layer (attack classification)

The final detection layer is crucial in determining if a specific instance, whether from network traffic or sensor data, indicates a normal state or an attack. Once the model has gathered features through earlier components like the ST-GNN, multi-scale transformer, and AAFF, the final detection layer employs a classification model (such as a fully connected neural network or support vector machine) to produce the attack classification result (Wen et al., 2025). This layer analyses the feature representations to determine if the input data is typical or indicates harmful activity (such as DDoS attacks or false data injection) (Yu et al., 2025). The classification output can be represented using a SoftMax function for multi-class classification, as shown in Equation 24.

\hat{Y} = \mathrm{softmax}(W z + B)

(24)

where: $W$: Weight matrix, $z$: Feature vector, $B$: Bias term, and $\hat{Y}$: Predicted probability.

Dataset detail

This study employs three widely recognised and established attack datasets: CIC-DDoS2019 (https://www.unb.ca/cic/datasets/ddos-2019), CIC-IDS2018 (https://www.unb.ca/cic/datasets/ddos-2018), and CIC-DoS2017 (https://www.unb.ca/cic/datasets/ddos-2017). This collection of datasets provides detailed network traffic information, encompassing regular traffic and a range of attack scenarios, including DDoS, DoS, and other forms of intrusion, such as port scanning, SQL Injection, and Brute Force. They are commonly utilised for assessing intrusion detection systems and machine learning models focused on improving cybersecurity in practical settings (AlHaddad et al., 2023). Table 2 presents the summary of datasets and counts.

Table 2.

The overview of datasets and count details.

Dataset name	Total instances	Normal traffic	Attack traffic	Attack types
CIC-DDoS2019	45 million	20 million	25 million	Botnet-based DDoS, ICMP Flood, UDP Flood, TCP SYN Flood, DNS Flood
CIC-IDS2018	2,830,743	1,680,000	1,150,743	DDoS, DoS, Port Scanning, SQL Injection, XSS, Brute Force, Botnet, Web Shell
CIC-DoS2017	8 million	4 million	4 million	TCP SYN Flood, UDP Flood, ICMP Flood, HTTP Flood

Data pre-processing

This study utilises three well-known attack datasets: CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017, each showcasing various facets of network traffic and attack scenarios. The data pre-processing steps for all three datasets are identical, providing a uniform method for preparing the data for machine learning models. Initially, we clean the data by eliminating duplicates and resolving any gaps in the information. When we encounter missing data, we typically handle it using imputation methods. This includes addressing the missing parts through techniques such as forward-fill or mean imputation, customised for the specific situation. After cleaning the data, we transform the raw PCAP files into a more user-friendly CSV format. This is important because CSV files simplify data management, mainly when extracting essential features required to identify attacks (Kumar et al., 2023). After the conversion, we proceed to extract the features. In this step, we gather essential details from the network traffic, including IP addresses, ports, protocol types, and packet sizes. We also collect more sophisticated features, such as flow characteristics, encompassing packet counts and transferred bytes. These features are crucial as they enable the model to grasp the network's behaviour and distinguish between typical and harmful activity (Kumar et al., 2024).

Next, we normalise the extracted features. This step ensures that all numerical data are on a comparable scale, preventing any single feature from dominating the learning process. Methods such as Min-Max scaling and Z-score normalisation are utilised, based on the specific needs of the dataset and model. After the features have been pre-processed, the traffic is assigned labels. In this step, we classify the traffic as typical or associated with certain attack types, such as DDoS, Port Scanning, or Data Injection. Labelling is essential, allowing the model to learn from both attack and regular traffic throughout the training process (Kumar et al., 2023). Min-max normalisation is computed with Equation 25 and Z-score normalisation (standardisation) with equation 26.

X' = \frac{X - X_{Min}}{X_{Max} - X_{Min}}

(25)

X'' = \frac{X - μ}{σ}

(26)

where: $X'$: Min-max normalised value, $X''$: Z-score normalised (standardised) value, $X$: Original feature value, $X_{Min}$: Minimum value of the feature in the dataset, $X_{Max}$: Maximum value of the feature in the dataset, $μ$: Mean of the feature, and $σ$: Standard deviation of the feature.
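Both normalisations can be written per feature column in a few lines of NumPy. The sketch below uses a tiny made-up feature matrix; the guards against a zero range or zero standard deviation are an added assumption for constant columns, which the equations do not cover.

```python
import numpy as np

def min_max_scale(X):
    # Equation 25, applied column by column.
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # constant columns map to 0
    return (X - x_min) / span

def z_score(X):
    # Equation 26, applied column by column.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma > 0, sigma, 1.0)

# Hypothetical flow features: packet count and transferred bytes.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])
X_mm = min_max_scale(X)
X_z = z_score(X)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training split only and reused for the validation and test splits.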

To handle the missing values in the dataset, we apply mean imputation. Missing values can in general be filled in using the mean, median, or any other statistic that reflects the central tendency of the feature; here, they are substituted with the average value of the feature, as presented in equation 27. During the labelling step, categorical labels (such as ‘attack’ or ‘normal’) are transformed into numerical values to enable processing by machine learning models (Wen et al., 2025). This label encoding is performed using equation 28. Table 3 presents the dataset details after data pre-processing.

X_{Imputed} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}

(27)

y = \mathrm{Encoded}(Label)

(28)
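Equations 27 and 28 correspond to the following minimal sketch. The feature matrix is invented, and the label mapping (0 for normal, 1 for attack) is an assumption chosen to match the class coding used for the predicted output earlier in the paper.

```python
import numpy as np

def mean_impute(X):
    # Equation 27: replace each missing entry by the mean of the
    # non-missing values in the same feature column.
    col_mean = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_mean, X)

def encode_labels(labels, mapping):
    # Equation 28: map each class label to an integer.
    return np.array([mapping[label] for label in labels])

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_imputed = mean_impute(X)

mapping = {"normal": 0, "attack": 1}
y = encode_labels(["normal", "attack", "attack", "normal"], mapping)
```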

Table 3.

Dataset details after data preprocessing.

Dataset	Pre-processed samples	Attack class distribution	Reason for pre-processing adjustments
CIC-DDoS2019	1,200,000 packets	- Normal: 300,000	After filtering and cleaning irrelevant data, we focus on DDoS attacks and normal traffic.
		- DDoS attack: 900,000	The dataset focuses on DDoS attack traffic, and most of the data is kept.
CIC-IDS2018	400,000 packets	- Normal: 100,000	After feature extraction and removal of noisy data, we keep only the relevant attack and normal traffic.
		- Attacks (DoS, Brute Force, Port Scan): 300,000	Attack traffic (DoS, Port Scan) is retained after filtering and feature extraction.
CIC-DoS2017	500,000 packets	- Normal: 150,000	The DoS attack traffic is focused on, and irrelevant or incomplete samples are filtered out.
		- DoS Attack: 350,000	The majority of the dataset relates to DoS attacks, after data filtering.

In equations 27 and 28: $X_{Imputed}$: Value that replaces the missing value, $X_{i}$: Individual data point, n: Number of non-missing data points, $y$: Encoded numerical label, $Label$: Attack class, and $\mathrm{Encoded}()$: Function that maps each label to an integer value.

Model training and hyperparameter tuning

Training the model and tuning its hyperparameters are crucial steps to ensure that the proposed hybrid deep learning model for smart grid cybersecurity works effectively. The training process begins by dividing the dataset into training, validation, and test sets, enabling the model to learn how to distinguish between normal behaviours and attacks within the network traffic data. The Adam optimiser adjusts the model's parameters to minimise the loss function at each epoch. Early stopping is used to prevent overfitting by terminating the training process when the validation loss stops improving. This helps the model generalise to fresh, unseen data, which is crucial for identifying attacks in practical situations (Markkandeyan et al., 2025).

Fine-tuning the hyperparameters enhances the model's effectiveness. Important hyperparameters, such as the learning rate, batch size, number of layers in the ST-GNN and Transformer, the number of attention heads in AAFF, and the dropout rate, are adjusted to find the optimal balance between accuracy and efficiency. Methods such as grid search and random search are employed to investigate various combinations of hyperparameters. A learning rate of 0.0005 and a batch size of 64 proved effective, ensuring stable training and avoiding overshooting. A dropout rate of 0.4 and a regularisation strength of 0.0005 were selected to minimise overfitting. After trying out various values, 100 epochs were selected to provide enough training without overfitting, allowing the model to effectively identify different attack scenarios while preserving its ability to generalise (Kaur and Batth, 2024). Table 4 presents the Hyperparameter tuning parameters for model training.

Table 4.

The hyperparameter tuning parameters for model training.

Hyperparameter	Description	Typical range/Values	Used in model
Learning rate (η)	Controls the step size during optimisation.	0.0001, 0.0005, 0.001, 0.005	0.0005
Batch size	Number of samples used in one update.	32, 64, 128	64
Number of layers in ST-GNN	Number of graph layers for spatial-temporal learning.	3, 4	3
Number of layers in transformer	Number of layers for multi-scale temporal learning.	2, 3, 4	3
Attention heads in AAFF	Number of attention heads used in the attention mechanism.	4, 6, 8	6
Hidden layer size in AAFF	Number of neurons in the hidden layers of AAFF.	128, 256, 512	256
Dropout rate	Fraction of neurons to drop during training.	0.3, 0.4, 0.5	0.4
Regularisation strength (L2)	Penalty term to prevent overfitting.	0.0001, 0.0005, 0.001	0.0005
Epochs	Number of complete passes through the dataset.	50, 100, 150	100
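The early stopping rule described above can be sketched as a small helper that watches the validation loss. The selected values from Table 4 are collected in a configuration dictionary; the patience of 3 epochs and the validation losses are hypothetical, since the paper does not report them.

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Values taken from the "Used in model" column of Table 4.
config = {"learning_rate": 0.0005, "batch_size": 64, "dropout": 0.4,
          "l2": 0.0005, "max_epochs": 100}

# Made-up validation losses standing in for a real training run.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.59]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses[: config["max_epochs"]], start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With these losses the loop stops after epoch 6, three epochs after the best value of 0.60, so the later improvement to 0.59 is never reached; a larger patience trades training time for a chance to catch such late gains.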

Performance measuring parameters

To determine how effectively the proposed hybrid deep learning model functions in the realm of smart grid cybersecurity, it is essential to utilise a range of established performance metrics which can thoroughly evaluate its effectiveness in identifying cyberattacks and anomalies. The following are the key performance measurement parameters utilised for this research (Equations 29 to 35) (El-Toukhy et al., 2024).

Accuracy:

\mathrm{Accuracy} = \frac{\mathrm{TruePositives} + \mathrm{TrueNegatives}}{\mathrm{TotalSamples}}

(29)

Precision:

\mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalsePositives}}

(30)

Recall (Sensitivity or True Positive Rate):

\mathrm{Recall} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}}

(31)

F1-Score:

\mathrm{F1Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(32)

Area under Receiver Operating Characteristic Curve (AUC-ROC): The AUC-ROC evaluates how well the model can differentiate between different classes. The ROC curve illustrates the relationship between the True Positive Rate (Recall) and the False Positive Rate, while the AUC signifies the area beneath this curve. It provides insight into the model's ability to distinguish effectively. A higher AUC score shows improved ability to differentiate between attack and regular traffic (Basheer and Ranjana, 2025).

False Positive Rate (FPR):

\mathrm{FPR} = \frac{\mathrm{FalsePositives}}{\mathrm{TrueNegatives} + \mathrm{FalsePositives}}

(33)

False Negative Rate (FNR):

\mathrm{FNR} = \frac{\mathrm{FalseNegatives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}}

(34)

Matthews Correlation Coefficient (MCC): MCC provides a comprehensive evaluation by taking all four confusion-matrix entries (true and false positives, true and false negatives) into account. This gives a more reliable assessment of classification quality, particularly for imbalanced datasets.

\mathrm{MCC} = \frac{(\mathrm{TruePositives} \times \mathrm{TrueNegatives}) - (\mathrm{FalsePositives} \times \mathrm{FalseNegatives})}{\sqrt{(\mathrm{TruePositives} + \mathrm{FalsePositives}) (\mathrm{TruePositives} + \mathrm{FalseNegatives}) (\mathrm{TrueNegatives} + \mathrm{FalsePositives}) (\mathrm{TrueNegatives} + \mathrm{FalseNegatives})}}

(35)
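All of the count-based metrics above follow directly from the four confusion-matrix entries, as the sketch below shows. The counts are invented for illustration and are not results from the paper. Precision uses true positives plus false positives in the denominator, and MCC uses the standard four-factor denominator. No guard against empty classes is included.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    # Metrics of equations 29 to 35 (AUC-ROC excluded, since it needs scores).
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (tn + fp),
        "fnr": fn / (tp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Illustrative counts: 100 attack samples and 100 normal samples.
m = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

Note that recall and FNR sum to one by construction, so reporting both mainly serves readability.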

Experimental results and discussion

Hardware and software details

The proposed model was implemented through experiments utilising high-performance hardware and software environments to guarantee optimal computational efficiency. The hardware configuration included an NVIDIA Tesla V100 GPU for training deep learning models, paired with an Intel Xeon Gold 6226R processor and 64 GB of RAM, providing sufficient computational capacity to manage extensive datasets and intricate model architectures (AlHaddad et al., 2023). We employed Python 3.8 as the primary programming language, utilising the deep learning frameworks TensorFlow 2.4 and PyTorch 1.8 for model construction and training. We utilised Pandas 1.2 and NumPy 1.20 for efficient data preprocessing and analysis. The system operated on Ubuntu 20.04 LTS, guaranteeing stability and compatibility with the necessary software libraries. Version control was implemented using Git 2.30 to monitor code modifications and guarantee the reproducibility of experiments.

Simulation results

This research implemented the proposed hybrid deep learning model and compared it with established models, including CNN, LSTM, Transformer Networks, CNN + LSTM Hybrid, and ST-GNN + Transformer Hybrid, across three prominent datasets: CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017.

The pre-processed datasets for CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017 were partitioned into training, validation, and testing subsets using an 80-10-10 distribution. A total of 1,200,000 packets were processed for CIC-DDoS2019, comprising 960,000 packets for training, 120,000 packets for validation, and 120,000 packets for testing. For the CIC-IDS2018 dataset, 320,000 packets were designated for training, 40,000 for validation, and 40,000 for testing, from a total of 400,000 packets. CIC-DoS2017, comprising 500,000 packets, was partitioned into 400,000 packets for training, 50,000 packets for validation, and 50,000 packets for testing. These divisions guarantee that the model is trained on a sufficiently extensive dataset while also supplying appropriate validation and testing data for performance assessment. The performance metrics are detailed as follows.
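The split sizes quoted above follow from the 80-10-10 rule applied to the pre-processed sample counts, which the short sketch below reproduces with integer arithmetic. The function name `split_sizes` is hypothetical, and the sketch only computes sizes; how samples are shuffled or stratified before splitting is not stated in the paper.

```python
def split_sizes(n_samples, train_pct=80, val_pct=10):
    # Integer arithmetic avoids floating-point rounding; the test set
    # receives whatever remains after training and validation.
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return n_train, n_val, n_samples - n_train - n_val

# Pre-processed sample counts from Table 3.
datasets = {"CIC-DDoS2019": 1_200_000, "CIC-IDS2018": 400_000, "CIC-DoS2017": 500_000}
splits = {name: split_sizes(n) for name, n in datasets.items()}
for name, (n_train, n_val, n_test) in splits.items():
    print(name, n_train, n_val, n_test)
```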

Discussion

This research presents a new hybrid model that combines Spatio-Temporal Graph Neural Networks (ST-GNN) with Transformer networks to improve network intrusion detection in complex environments. We evaluated the model's effectiveness by utilising three well-regarded datasets – CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017 – applying a variety of performance metrics such as accuracy, precision, recall, F1-score, AUC-ROC, FPR (False Positive Rate), FNR (False Negative Rate), and MCC (Matthews Correlation Coefficient).

The findings in Tables 5, 6, 7 and 8 indicate that the proposed hybrid model substantially surpasses traditional models, including CNN, LSTM, Transformer, and the CNN + LSTM hybrid. The efficacy of the proposed model is illustrated in the confusion matrices for binary classification (Figures 2, 3, and 4), which indicate that the model proficiently classifies benign and attack traffic across all datasets. For example, Figure 2, representing the confusion matrix on CIC-DDoS2019, reveals many true positives and true negatives (correctly classified malicious and benign traffic) and very few false positives or false negatives. This observation is consistent across all three datasets, indicating that the model effectively distinguishes between regular and attack traffic.

Figure 2.

Confusion matrix for binary class classification on CIC-DDoS2019.

Figure 3.

Confusion matrix for binary class classification on CIC-IDS2018.

Figure 4.

Confusion matrix for binary class classification on CIC-DoS2017.

Table 5.

Binary classification results.

Model	Dataset	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	AUC-ROC	FPR (%)	FNR (%)	MCC
Proposed hybrid Model (ST-GNN + Transformer)	CIC-DDoS2019	99.12	99.13	99.15	99.14	0.99	0.91	0.87	0.98
	CIC-IDS2018	98.51	98.21	98.62	98.41	0.98	1.12	0.88	0.97
	CIC-DoS2017	98.05	97.73	98.13	97.94	0.97	1.23	0.82	0.96
CNN	CIC-DDoS2019	95.31	94.63	95.12	94.81	0.94	3.15	2.22	0.91
	CIC-IDS2018	94.13	93.52	94.32	93.91	0.93	3.62	2.32	0.90
	CIC-DoS2017	93.01	92.51	93.13	92.82	0.92	3.94	2.55	0.89
LSTM	CIC-DDoS2019	96.23	95.83	96.03	95.92	0.96	2.73	1.83	0.93
	CIC-IDS2018	95.32	94.53	95.24	94.87	0.95	2.93	1.91	0.92
	CIC-DoS2017	94.73	94.03	94.73	94.35	0.94	3.13	2.13	0.91
Transformer networks	CIC-DDoS2019	96.84	96.53	96.73	96.62	0.96	2.54	1.92	0.94
	CIC-IDS2018	95.83	95.02	95.43	95.24	0.95	3.13	2.13	0.92
	CIC-DoS2017	95.23	94.73	95.33	95.03	0.95	2.82	2.03	0.92
CNN + LSTM hybrid	CIC-DDoS2019	97.83	97.21	97.52	97.31	0.97	2.12	1.62	0.95
	CIC-IDS2018	96.83	96.32	96.53	96.43	0.96	2.82	1.72	0.94
	CIC-DoS2017	96.02	95.51	96.12	95.83	0.96	2.94	1.84	0.93
ST-GNN + Transformer hybrid	CIC-DDoS2019	99.11	98.91	99.01	98.96	0.99	0.86	0.76	0.98
	CIC-IDS2018	98.42	98.12	98.52	98.32	0.98	1.02	0.81	0.97
	CIC-DoS2017	97.83	97.52	97.63	97.57	0.97	1.14	0.86	0.96

Table 6.

Multiclass classification results.

Model	Dataset	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	AUC-ROC	FPR (%)	FNR (%)	MCC
Proposed hybrid model (ST-GNN + Transformer)	CIC-DDoS2019	98.42	98.10	98.50	98.30	0.98	1.35	1.30	0.97
	CIC-IDS2018	97.88	97.40	98.10	97.75	0.97	1.90	1.20	0.96
	CIC-DoS2017	97.65	97.20	98.00	97.60	0.96	2.10	1.40	0.95
CNN	CIC-DDoS2019	94.12	92.75	93.50	93.12	0.94	4.00	2.90	0.91
	CIC-IDS2018	93.60	91.80	94.00	92.88	0.93	5.20	3.00	0.90
	CIC-DoS2017	92.50	91.10	92.40	91.75	0.91	5.50	3.20	0.88
LSTM	CIC-DDoS2019	95.50	94.00	95.10	94.55	0.95	3.50	2.20	0.92
	CIC-IDS2018	94.60	93.20	94.30	93.75	0.94	4.10	2.50	0.91
	CIC-DoS2017	94.00	92.90	94.00	93.45	0.93	4.30	2.70	0.90
Transformer networks	CIC-DDoS2019	96.20	95.60	96.30	95.90	0.96	3.00	2.10	0.93
	CIC-IDS2018	95.30	94.10	95.40	94.75	0.95	3.20	2.30	0.92
	CIC-DoS2017	95.10	93.80	94.80	94.30	0.94	3.40	2.50	0.91

Table 7.

Ablation analysis for proposed hybrid model (ST-GNN + Transformer).

Model configuration	CIC-DDoS2019	CIC-IDS2018	CIC-DoS2017
Proposed hybrid model (ST-GNN + Transformer)	Accuracy: 99.12%	Accuracy: 98.51%	Accuracy: 98.05%
(Full model) Both ST-GNN and transformer are present.	Precision: 99.13%	Precision: 98.21%	Precision: 97.73%
	Recall: 99.15%	Recall: 98.62%	Recall: 98.13%
	F1-Score: 99.14%	F1-Score: 98.41%	F1-Score: 97.94%
	AUC-ROC: 0.99	AUC-ROC: 0.98	AUC-ROC: 0.97
	FPR: 0.91%	FPR: 1.12%	FPR: 1.23%
	FNR: 0.87%	FNR: 0.88%	FNR: 0.82%
ST-GNN only (removed transformer)	Accuracy: 98.50%	Accuracy: 97.20%	Accuracy: 96.80%
(ST-GNN without transformer)	Precision: 98.40%	Precision: 97.10%	Precision: 96.60%
	Recall: 98.40%	Recall: 97.05%	Recall: 97.00%
	F1-Score: 98.39%	F1-Score: 97.07%	F1-Score: 96.80%
	AUC-ROC: 0.98	AUC-ROC: 0.97	AUC-ROC: 0.96
	FPR: 1.05%	FPR: 1.15%	FPR: 1.25%
	FNR: 0.90%	FNR: 0.95%	FNR: 1.00%
Transformer only (removed ST-GNN)	Accuracy: 98.80%	Accuracy: 98.10%	Accuracy: 97.50%
(Transformer without ST-GNN)	Precision: 98.65%	Precision: 98.30%	Precision: 97.80%
	Recall: 98.65%	Recall: 98.00%	Recall: 97.60%
	F1-Score: 98.64%	F1-Score: 98.15%	F1-Score: 97.70%
	AUC-ROC: 0.99	AUC-ROC: 0.98	AUC-ROC: 0.97
	FPR: 0.92%	FPR: 1.05%	FPR: 1.10%
	FNR: 0.85%	FNR: 0.90%	FNR: 0.95%
Base model (No ST-GNN or transformer)	Accuracy: 94.50%	Accuracy: 93.80%	Accuracy: 93.10%
(No ST-GNN and no transformer)	Precision: 94.20%	Precision: 93.40%	Precision: 92.50%
	Recall: 93.80%	Recall: 93.20%	Recall: 92.80%
	F1-Score: 94.00%	F1-Score: 93.30%	F1-Score: 92.60%
	AUC-ROC: 0.94	AUC-ROC: 0.93	AUC-ROC: 0.92
	FPR: 2.10%	FPR: 2.30%	FPR: 2.40%
	FNR: 1.50%	FNR: 1.80%	FNR: 1.90%

Table 8.

P-value test analysis.

Model comparison	CIC-DDoS2019 P -value	CIC-IDS2018 P -value	CIC-DoS2017 P -value	Impact
Full model (ST-GNN + Transformer) vs ST-GNN only	0.014	0.020	0.018	Significant (P < 0.05)
Full model (ST-GNN + Transformer) vs transformer only	0.042	0.033	0.028	Significant (P < 0.05)
Full model (ST-GNN + Transformer) vs base model	0.001	0.0005	0.0007	Highly significant (P < 0.001)
ST-GNN only vs transformer only	0.312	0.245	0.289	Not significant (P > 0.05)
ST-GNN only vs base model	0.002	0.003	0.001	Highly significant (P < 0.001)
Transformer only vs base model	0.004	0.001	0.002	Highly significant (P < 0.001)

Key results and evaluation

Table 5 and Figure 5 show how the proposed model performs compared to the other models. On the CIC-DDoS2019 dataset, the hybrid model achieved an accuracy of 99.12%, with a precision of 99.13%, a recall of 99.15%, and an F1-score of 99.14%. These results exceed the accuracy of CNN (95.31%), LSTM (96.23%), Transformer (96.84%), and CNN + LSTM hybrid models (97.83%) on the same dataset. Comparable trends are evident in CIC-IDS2018 and CIC-DoS2017, with the hybrid model consistently outperforming the baselines in accuracy and the other evaluation metrics. The proposed model attained an AUC-ROC of 0.99 on CIC-DDoS2019, 0.98 on CIC-IDS2018, and 0.97 on CIC-DoS2017, demonstrating its capability to distinguish between benign and attack classes. The AUC-ROC curve for the proposed hybrid model, illustrated in Figure 8, further evidences its superior classification performance relative to the other models, which exhibit lower AUC-ROC values.

Figure 5.

Comparative analysis of binary class classification on CIC-2019, 2018 and 2017.

Error rate analysis and statistical significance

The confusion matrices for binary classification (Figures 2, 3, and 4) also show that the proposed model keeps both the false positive rate (FPR) and the false negative rate (FNR) low. On the CIC-DDoS2019 dataset, for instance, the model achieves an FPR of 0.91% and an FNR of 0.87%. Such low error rates matter in practice, where a single misclassification can mean the difference between identifying an attack and permitting a breach. The low FPR shows that the model rarely misclassifies benign traffic as an attack, and the low FNR shows that it rarely overlooks a true attack. Striking this balance is essential for any intrusion detection system. Furthermore, the P-value analysis in Table 8 provides additional evidence for the proposed model. The P-values for comparisons between the full hybrid model and the individual ST-GNN or Transformer models are below 0.05 on all three datasets, so the observed improvements are statistically significant.

For instance, the P-value for comparing the full model (ST-GNN + Transformer) and the ST-GNN only model on the CIC-DDoS2019 dataset is 0.014, suggesting that adding the Transformer network significantly boosts performance. This statistical significance underlines the value of combining ST-GNN and Transformer networks, rather than using either in isolation.
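The text does not state which statistical test produced the P-values in Table 8. As one possible approach, and purely as an assumption, the sketch below runs an exact paired sign-flip permutation test on per-fold accuracies of two configurations. The function name `paired_permutation_p_value` and the fold accuracies are hypothetical and are not results from the study.

```python
from itertools import product


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so enumerate all 2^n sign assignments.
    for signs in product((1, -1), repeat=len(diffs)):
        n_total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed - 1e-12:  # small tolerance for float noise
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical per-fold accuracies (%), invented for illustration only.
full_model = [99.1, 99.2, 99.0, 99.3, 99.1, 99.2, 99.0, 99.2]
stgnn_only = [98.5, 98.6, 98.4, 98.6, 98.5, 98.4, 98.6, 98.5]
p_value = paired_permutation_p_value(full_model, stgnn_only)
```

With these made-up values every fold favours the full model, so only 2 of the 2^8 = 256 sign assignments are as extreme as the observed one, giving a P-value of 2/256, or about 0.0078. A paired test is used because both configurations would be scored on the same folds.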

Ablation analysis: understanding the contributions of ST-GNN and transformer

The ablation study in Table 7 demonstrates the effectiveness of combining ST-GNN and Transformer networks. Comparing the complete hybrid model (ST-GNN + Transformer) with each of its constituent models (ST-GNN alone and Transformer alone) isolates the contribution of each component. On the CIC-DDoS2019 dataset, the full model attained an accuracy of 99.12%, while the ST-GNN-only and Transformer-only configurations recorded 98.50% and 98.80%, respectively, that is, 0.62 and 0.32 percentage points lower. This gap indicates that the ST-GNN and Transformer networks offer distinct and complementary advantages, which yield better classification outcomes when integrated. ST-GNN captures spatial dependencies in the data, such as the relationships among nodes in network traffic. Transformer networks, in contrast, model long-range temporal dependencies, such as sequential relationships in time-series data. By using both spatial and temporal features, the hybrid model delivers more precise and resilient intrusion detection, even under intricate and varied attack patterns.

Performance in multi-class classification

Table 6 and Figure 6 show that the proposed model also performs well in multi-class classification. The hybrid model outperforms the other models on every dataset. For instance, on CIC-DDoS2019, it reached an accuracy of 98.42%, surpassing the CNN (94.12%), LSTM (95.50%), and Transformer (96.20%). This multi-class performance demonstrates the model's strength and adaptability in practical network intrusion detection environments.

P-value test comparison across datasets: Figure 9 plots the P-value test comparison across the three datasets for the proposed hybrid model, and Table 8 lists the corresponding P-values. The comparisons of the full model (ST-GNN + Transformer) with the ST-GNN-only, Transformer-only, and base models yield significant (P < 0.05) or highly significant (P ≤ 0.001) P-values on all datasets (CIC-DDoS2019, CIC-IDS2018, CIC-DoS2017). However, the comparison between ST-GNN only and Transformer only shows no significant difference (P > 0.05). These results indicate that the hybrid model significantly outperforms its individual components, while neither component alone is significantly better than the other.

Figure 6.

Comparative analysis of multi-class classification on the CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017 datasets.

Justification for the superiority of the proposed model

The superior performance of the proposed model can be attributed to the complementary nature of ST-GNN and Transformer networks. ST-GNN captures spatial dependencies in graph-structured data, such as the relationships between different network flows, which is critical for intrusion detection in complex network environments. Transformer networks, on the other hand, capture temporal dependencies, which are crucial for analysing the sequential nature of network traffic over time. The hybrid approach merges the two, using both the spatial and temporal dimensions of network traffic to improve the accuracy and reliability of intrusion detection. Moreover, the reduction in both FPR and FNR seen in the confusion matrices makes the hybrid model an attractive option for real-time intrusion detection, where false positives and false negatives can both have serious consequences.

Figure 7 plots training, validation, and test accuracy against loss for the proposed hybrid model on the CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017 datasets. The curves show that the model behaves consistently across datasets, with accuracy improving steadily while loss stays low throughout training and evaluation.

Figure 7.

Comparative analysis of training, test and validation accuracy vs. loss for proposed hybrid model on CIC 2019, 2018 and 2017 datasets.

The AUC-ROC curves in Figure 8 show how well the model separates attack traffic from benign traffic, and the consistently high AUC-ROC values across all datasets confirm this ability. On the CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017 datasets, the proposed hybrid model, which combines ST-GNN and Transformer networks, achieved strong results across the performance metrics considered, including accuracy, precision, recall, F1-score, AUC-ROC, FPR, FNR, and MCC. Its advantages over the other models, including CNN, LSTM, Transformer, and CNN + LSTM hybrids, are evident from the confusion matrices, AUC-ROC curves, and ablation studies. The hybrid model reduces error rates while maintaining high classification accuracy, making it a strong candidate for real-time network intrusion detection. By integrating spatial and temporal learning, the proposed model surpasses the compared methods and offers a robust solution for improving cybersecurity in contemporary network infrastructures.
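One way to read an AUC-ROC value is as the probability that a randomly chosen attack sample receives a higher score than a randomly chosen benign sample. The sketch below computes AUC directly from that pairwise definition. It is an illustration only: the function name `auc_roc`, the labels, and the scores are hypothetical, and the study may have used a library routine instead.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (attack, benign) pairs ranked correctly.

    labels: 1 for attack, 0 for benign; scores: predicted attack scores.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four attack/benign pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)  # 0.75
```

Under this reading, an AUC-ROC of 0.99 means that about 99 of every 100 attack/benign pairs are ranked in the right order. The pairwise loop is quadratic in the number of samples, so it suits explanation better than large-scale evaluation.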

Figure 8.

AUC-ROC curve for proposed hybrid model on CIC 2019, 2018 and 2017 datasets.

Figure 9.

Graph for P-test comparison analysis for different datasets for the proposed hybrid model.

Limitations of the proposed model

Although the model performs well on small-scale datasets, its scalability to larger smart grids with a higher number of nodes remains a challenge. The model's computational complexity increases with larger datasets, and efficient deployment on distributed systems or edge devices requires further optimisation. Additionally, the integration of the federated learning system enhances scalability but introduces challenges related to data heterogeneity and synchronisation. Thus, although the presented hybrid deep learning model shows excellent performance in cyber-attack detection in smart grids, some limitations still need to be addressed:

Dataset Dependency: The method was evaluated on benchmark datasets (CIC-DDoS2019, CIC-IDS2018, and CIC-DoS2017). While these datasets are well known, they may not fully represent the randomness and dynamism of traffic in realistic smart grid settings, which is likely to affect model generalisation.

Computational Complexity: Incorporating ST-GNNs, Transformers, and the AAFF module raises the model's complexity. Real-time deployment at scale on resource-limited smart grid edge devices may need further optimisation (e.g. model pruning or lightweight architectures).

Federated Learning Challenges: Although federated learning improves privacy, security, and scalability, it also incurs communication overhead, latency, and synchronisation challenges in large-scale decentralised systems.

Zero-Day Attack Detection: CSSL and CGAN-aided augmentation can make the model more resistant to unseen attacks, but the model still struggles to detect zero-day attacks, especially when it encounters adversarial tactics that have no similar examples in the training data.

Explainability: Like many deep learning models, the hybrid approach acts as a ‘black box’ whose decision-making is difficult to interpret. This could be a barrier to adoption in critical infrastructure systems, where explainability is essential for trust and compliance.

Addressing these limitations in future work will further enhance the generalisability, robustness, and security of the proposed smart grid security system.
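To make the communication and synchronisation costs noted under Federated Learning Challenges more concrete, the sketch below shows one aggregation round of federated averaging (FedAvg). The text does not specify the aggregation rule used in the study, so FedAvg is an assumption here, and the function name `federated_average` and the example values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical grid sites with different amounts of local traffic data.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [100, 300]
global_weights = federated_average(client_weights, client_sizes)  # [2.5, 3.5]
```

Each round requires every participating site to upload a full parameter vector and wait for the aggregate, which is where the overhead and latency arise. Weighting by sample count also lets sites with more data dominate the result, one facet of the data heterogeneity issue.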

Conclusion and future works

Conclusion

Energy distribution and consumption have improved significantly with the development of smart grids, which offer increased flexibility and efficiency. However, these developments also bring significant security issues. As smart grids become more integrated and linked to communication networks, they become more vulnerable to cyberattacks. Protecting these infrastructures from potential threats is essential for their success and dependability. This research presents a hybrid model that integrates ST-GNN with Transformer networks to identify and categorise malicious activities within Smart Grid systems. The aim was to develop a model that can effectively differentiate between benign and malicious network traffic while coping with the intricate and ever-changing landscape of contemporary cyber threats.

The proposed model's strength comes from its ability to learn spatial and temporal relationships within network traffic. The ST-GNN component models how different nodes in the network relate to each other, while the Transformer component captures long-term dependencies and patterns. Combining the two allows the model to learn intricate patterns from the traffic data that simpler models may miss. The research findings show how well this approach works. The proposed model consistently surpassed conventional models such as CNN and LSTM across all three datasets, achieving an accuracy of up to 99.12% on CIC-DDoS2019. Moreover, it consistently showed strong precision, recall, and F1-scores, as well as high AUC-ROC values, indicating that it can effectively differentiate between normal and attack traffic.

A key part of this research was the ablation analysis, which revealed that removing either the ST-GNN or the Transformer component lowers performance by a statistically significant margin (P < 0.05). This highlights the significance of the hybrid approach and confirms that both elements contribute to the model's success. The results also show that the proposed model significantly outperforms the other models considered. Beyond the numerical results, what sets this hybrid model apart is its capacity to adjust to the changing landscape of cyber threats. Along with its high accuracy, it is designed to be scalable and efficient, which is crucial for real-time intrusion detection in Smart Grids. Minimising both false positives and false negatives allows the system to identify attacks effectively while reducing unnecessary alerts, enhancing its practicality and reliability. In summary, the proposed hybrid model provides an effective answer to the security issues confronting Smart Grids. By merging the advantages of ST-GNN and Transformer networks, the model not only reaches high detection accuracy but also copes with the intricate and dynamic landscape of cyber-attacks. This research lays the groundwork for stronger, more flexible, and adaptable intrusion detection systems for Smart Grids and beyond.

Future works

The proposed hybrid model shows significant promise for enhancing the security of Smart Grids, but future work could improve its effectiveness and relevance in several respects. One important area is real-time adaptability. In practice, Smart Grids face evolving and dynamic network conditions. Although successful in controlled settings, our model needs to adjust to the changing dynamics of real-time data and varying attack patterns. Future efforts might aim to let the model adapt continuously to new data without complete retraining. This would mean using online learning methods or incremental learning models, allowing the system to adapt quickly to new threats and network conditions so that it stays effective in real-time settings.
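As a toy illustration of updating a detector without complete retraining, the sketch below applies one stochastic gradient step per incoming labelled sample to a logistic-regression classifier. It is a deliberately small stand-in and not the hybrid model or its online active learning component; the class name `OnlineLogisticDetector`, the learning rate, and the two-feature stream are all hypothetical.

```python
import math


class OnlineLogisticDetector:
    """Logistic-regression detector updated one sample at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that sample x is an attack."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log-loss for a single labelled sample."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical stream alternating a benign (0) and an attack (1) flow.
detector = OnlineLogisticDetector(n_features=2)
stream = [([0.1, 0.2], 0), ([0.9, 0.8], 1)] * 200
for features, label in stream:
    detector.update(features, label)
```

Each update touches only the current sample, so memory and per-step cost stay constant however long the stream runs. The trade-off is that a purely online learner can drift or forget older attack patterns.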

Another area for future exploration is the wider use of the proposed model. This research centred on Smart Grids, yet the underlying hybrid of ST-GNN and Transformer networks has broader potential. Essential systems such as healthcare, transportation, and finance face comparable security issues because of their interconnected and sensitive nature. Adapting this model to these areas might result in more comprehensive, cross-industry security solutions. In healthcare systems, where data integrity and communication networks are crucial, this hybrid approach could be used to monitor and protect against attacks on medical devices or patient data exchanges.

Moreover, it is becoming increasingly necessary to incorporate self-learning abilities into the proposed model. Cyberattacks are always changing, and although our model has demonstrated solid performance against existing threats, new methods of attack will keep appearing. Allowing the model to keep learning from fresh data without much outside help or retraining is essential for lasting success. By using unsupervised or semi-supervised learning methods, the model could adapt to new attack techniques and remain ready for upcoming threats.

Another crucial area for future efforts is improving the interpretability of the model. Complex machine learning models, such as the proposed hybrid model, are frequently perceived as ‘black boxes’ by those who use them. In security applications, especially those concerning critical infrastructure, stakeholders must understand why a model classifies specific behaviours as malicious. Explainable AI techniques, such as analysing feature importance or inspecting attention mechanisms, could improve understanding of the model's decision-making process. This would help security analysts trust the model's predictions and respond more effectively to threats by offering insight into the reasoning behind detected anomalies.
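Permutation importance is one simple, model-agnostic way to analyse feature importance: shuffle one feature column at a time and measure how much accuracy drops. The sketch below is a hedged illustration of that idea and not a method evaluated in the study; the function `permutation_importance`, the stand-in `toy_detector`, and the data are hypothetical.

```python
import random


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def accuracy(data):
        hits = sum(predict(x) == y for x, y in zip(data, labels))
        return hits / len(labels)

    baseline = accuracy(rows)
    importances = []
    for j in range(n_features):
        column = [x[j] for x in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
        importances.append(baseline - accuracy(shuffled))
    return importances


def toy_detector(x):
    """Hypothetical detector: flags a flow when feature 0 exceeds 0.5."""
    return int(x[0] > 0.5)


# Feature 0 drives the decision; feature 1 is ignored by the detector.
rows = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
labels = [toy_detector(x) for x in rows]
importances = permutation_importance(toy_detector, rows, labels, n_features=2)
```

Shuffling the ignored feature leaves accuracy unchanged, so its importance is zero, while shuffling the decisive feature lowers accuracy. The same procedure can be applied to any trained detector that exposes a prediction function, which makes it a useful first step before model-specific explanations.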

Moreover, scalability is important when implementing the suggested model in extensive networks, such as those in actual Smart Grids. Our findings indicate solid performance on the datasets utilised in this study; however, additional efforts are required to confirm that the model can effectively manage the substantial amounts of data produced in real-world environments. Enhancing the model's efficiency, perhaps through model pruning methods or investigating lighter versions of the hybrid model, could render it more appropriate for environments with limited resources while maintaining performance standards.
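As an example of the model pruning mentioned above, magnitude pruning zeroes out the weights with the smallest absolute values on the assumption that they contribute least to the output. The sketch below applies it to a flat list of weights. The function name `magnitude_prune`, the sparsity level, and the weights are hypothetical, and no pruning experiment is reported in the study.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights; half of them are removed.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned_layer = magnitude_prune(layer, sparsity=0.5)
# pruned_layer == [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In practice, pruning is usually followed by a short fine-tuning phase to recover accuracy, and the savings on edge devices depend on whether the runtime can exploit the resulting sparsity.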

Examining how the hybrid model can fit into current security frameworks and workflows would also be beneficial. Deploying the model in real-world Smart Grids will require careful integration with various systems and security measures. Future efforts might focus on working with industry partners to create effective deployment strategies, which could integrate real-time monitoring tools, automated response systems, and security information and event management (SIEM) platforms.

In conclusion, although the proposed hybrid model represents a notable improvement in Smart Grid security, there is considerable room for further enhancement and adaptation. By concentrating on real-time adaptability, broader applicability, self-learning, interpretability, scalability, and practical implementation, future studies could help this model stay effective against changing threats and prepare it for real-world use across various essential infrastructure systems.

Footnotes

Acknowledgements

The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2024-229).

ORCID iD

Sarita Simaiya

Consent for publication

All authors have reviewed and approved the final manuscript.

Author contributions

Umesh Kumar Lilhore conceptualised the research, conducted experiments, and wrote the manuscript. Sarita Simaiya contributed to the design of the methodology and the analysis of results. Rasmi A assisted with the overall research framework and data collection. Deepa Devassy contributed to data preprocessing and model evaluation. Roobaea Alroobaea provided valuable insights into the research and assisted in manuscript revisions. Abdullah M. Baqasah participated in the statistical analysis and reviewed the manuscript. Majed Alsafyani helped with the implementation of the model and provided technical assistance. Afnan Alhazmi supported the literature review and provided feedback during the manuscript preparation.

Funding

This research was funded by Taif University, Saudi Arabia, Project No. (TU-DSPP-2024-229).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Dataset availability statement

The dataset is available from the corresponding author upon individual request.


Adaptive hybrid deep learning for smart grid cybersecurity: Integrating ST-GNN, transformers, and feature fusion
