Abstract
In the SCADA (Supervisory Control and Data Acquisition) network of a smart grid, the network switch is connected to multiple Intelligent Electronic Devices (IEDs) that are based on protective relays. False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) are three types of cyber-attacks on SCADA networks that result in single-line-to-ground (SLG) faults, IED-relay failures, and open circuit-breaker issues. Existing cyber threat intelligence (CTI) approaches for grids cannot visualize the effects of cyber-attacks on the grid. To understand the full effect of the attacks, SCADA networks need a knowledge-graph-based digital-twin approach to cyber-attack visualization, which is missing in existing SCADA systems. This study presents a novel “Digital-twin and Machine Learning-based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach, which uses an algorithm to visualize and predict the effects of cyber-attacks, including FDIA, RTCI, and SRA, on SCADA systems. The process begins with data transformation to generate cyber-attack grid data, which is then analyzed for attack prediction using machine learning models such as Extra-Trees, XGBoost, Random Forest, Bootstrap Aggregating, and Logistic Regression. To further enhance the analysis, a directed-graph (DiGraph) algorithm is applied to create a knowledge-graph-based digital twin, allowing for a deeper understanding of how these cyber-attacks impact SCADA operations. A comparison with existing models indicates that the proposed approach offers a more detailed and clearer digital-twin representation of cyber-attack effects. This enhanced visualization provides deeper insight into attack dynamics and improves predictive accuracy, showing the effectiveness of the proposed method in understanding and mitigating cyber threats.
Keywords
Introduction
The use of digital twins in energy and cybersecurity has emerged as a new research domain in the Industry 4.0 era.1,2 A digital twin is a technology that integrates a real-world process with simulations, multimedia, or computer graphics. One of the key areas of development in Industry 4.0 is the smart grid, which relies heavily on cyber-physical systems, artificial intelligence (AI), the Internet of Things (IoT), and cloud-of-things.3,4 Researchers believe that many grid-related challenges can be addressed through advanced analysis using digital twins, particularly issues related to remote data transfer between power grids and real-time data analysis. 5 Additionally, experts contend that the adoption of digital twin technology in smart grids can be accelerated by establishing a comprehensive development framework. 6 Researchers have provided architectures for digital twin-based applications to understand the complex physical processes and issues of vacuum circuit breakers, advanced prognostics, and energy health management systems in a substation grid. 7 Digital twins have been applied to prevent mechanical failure of a wind turbine system on the grid and to facilitate operational optimization. 8 The digital twin approach has also been applied to real-time solar energy storage monitoring, diagnostics, and fault correction. 9 Beyond energy-related issues, digital twins can also play an important role in addressing cybersecurity issues in real time and updating grid network access policies accordingly. 10 In this study, we used a digital twin to analyze cyber-attacks on a specific network segment of the smart grid. Supervisory control and data acquisition (SCADA) is a digital process used in smart grids for energy distribution. In the SCADA network of the substation power grid, the network-switch device is connected to protective relays and multiple microprocessor-based Intelligent Electronic Devices (IEDs).
Cyber attackers or malware can send malicious control commands as packets over the control protocol to the SCADA network of the smart grid. 11
The term “knowledge graph” was first introduced in the context of a modular instructional system for a course, 12 and later, a project named “Knowledge Graph” was jointly developed by the University of Groningen and the University of Twente.13–15 Various researchers have explored knowledge graph techniques to enhance digital twin processes, such as the Universal Digital Twin, 16 Digital Twin Network, 17 and Digital Twin Replica with Temporal Knowledge Graph, 18 among others. In this study, we employ a knowledge graph-based digital twin approach. The key contributions of this research are outlined below.
1) This study provides a digital-twin-based visualization for understanding the effects of False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) on SCADA networks.
2) This study develops a Cyber Threat Intelligence (CTI) approach that combines knowledge-graph-enabled digital twin technology and machine learning algorithms for cyber-physical systems in a network. The proposed “Digital-twin and Machine Learning based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach uses the directed-graph (DiGraph) method to visualize the knowledge graph-enabled digital twin. Machine learning methods such as Extra-Trees, Extreme Gradient Boosting (XGBoost), Random Forest, Bootstrap Aggregating (bagging), and Logistic Regression are employed for cyber-attack analysis, chosen for their effectiveness in handling complex datasets and providing high accuracy in SCADA network environments.
3) This study demonstrates the data transformation of SCADA-network-based cyber-attacks for visualizing the effects of FDIA, RTCI, and SRA cyber-attacks through the proposed DT-ML-SCADA-CTI model. The data-transformation step occurs in the data-processing stage of the proposed model and extracts the meaningful information and features that the proposed CTI approach processes.
After this introduction, the remainder of the article is divided into four sections. The second section reviews the literature, covering related works and open problems. The third section describes the dataset, the proposed model, and the applied methods. The fourth section presents the results and analysis. Finally, the fifth section concludes the paper.
Literature review
Cyber-attacks on SCADA network
False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) are specifically identified in the SCADA system cyber-attack dataset, which was jointly developed by Oak Ridge National Laboratory and Mississippi State University. This dataset is available on the website of the University of Alabama in Huntsville. 19 In this dataset, the SRA cyber-attack is referred to as “Relay Setting Change”. The dataset has been used extensively in studies on cyber-attacks in smart grids, including those focused on ensemble learning-based cyber-attack classification, 20 machine learning-based intrusion detection, 21 and ensemble learning-based intrusion detection, 22 among others. RTCI cyber-attacks on SCADA systems for power transmission have emerged, and researchers have mentioned single-line-to-ground (SLG) fault replay attacks, 23 Aurora attacks, 24 relay physical attacks, 25 and relay setting change attacks, 26 which indicate SRA cyber-attacks. Some studies of cyber-physical systems, power systems, and industrial control systems have specifically highlighted the SRA attack.27–29 FDIA and RTCI, on the other hand, fall into the code injection attack category. 30 FDIA, RTCI, and SRA cyber-attacks can cause various problems in SCADA, including SLG faults, relay-disabled faults, and open-breaker issues. 19
In light of SCADA network-based cyber-attack studies, Figure 1 shows the steps of a cyber-attack on a power grid SCADA system. In this paper, we deal with FDIA, RTCI, and SRA attacks and demonstrate through a digital twin how these cyber-attacks damage the SCADA system. Since FDIA, RTCI, and SRA attacks involve false relay operations, we prioritize false relay operations in this study. In a study on power systems, researchers noted that FDIA attacks lead to false SCADA relay operation issues. 31 However, DoS or DDoS 32 is also a very important cyber-attack. In CTI-based research, researchers analyzed Software-Defined Networking (SDN) datasets for DoS or DDoS-based intrusion. 33 A recent systematic literature study sheds light on DDoS attack prevention through blockchain. 34
Figure 1. Process of cyber-attacks on the SCADA system in a smart grid.
Digital twin for cyber-attacks in smart grids
The cyber-physical system is supported by an IoT-based digital twin framework, which interfaces with the energy control system to ensure proper operation. 35 Researchers have developed a digital twin model for a physical testbed to analyze cybersecurity issues in SCADA systems. 36 In a recent study, a cyber-physical system approach was proposed, combining 3D modeling-enhanced SCADA operations with virtual reality-based digital twins to monitor mill operations. 37 Additionally, researchers have utilized 3D modeling for intelligent agent-based interactions in distributed or decentralized networks, focusing on relative directional intelligence.38,39 A cyber-attack localization technique was tested with a digital twin simulator-based automation controller, and the experimental results indicated cyber-attacks in various operational contexts. 40 Researchers enhanced a software-defined networking control plane, which stores and distributes smart meter behavior models and operating states, by integrating digital twin technology. 41 Here, researchers integrated blockchain with deep machine learning. A review paper describes how cybersecurity infrastructure serves smart-grid producers, consumers, and the nation. 42 In a recent study, researchers proposed an LSTM (Long Short-Term Memory) deep learning-based cyber-attack model for a renewable energy-powered smart city, with the model being simulated using a digital twin. 43 Additionally, LSTM was employed for cyber-attack analysis in vehicle-to-grid-oriented cyber-physical systems, enhanced by digital twin technology. 44 In one project, researchers designed and validated digital-twin-as-a-service (DTaaS) for reducing capital expenditures (CapEx) and enhancing data security. 45 In one study, researchers noted the need for digital twins to protect microgrids from command injection and DoS attacks. 46 Researchers used supervised machine learning to enhance the security of industrial control systems. 23 However, a recent study explores the potential future benefits of digital twins for grid cybersecurity and resilient power grids. The integration of digital twin (DT) technology with cybersecurity strategies is vital for enhancing the resilience of industrial systems in Industry 5.0. Recent studies propose using DiGraph-enabled DT models and machine learning for detecting cyber-attacks on SCADA networks 47 and leveraging DT to identify faults and intrusions in controller software. 48 Additionally, frameworks like CyberDefender provide intelligent defense mechanisms to protect DT-based systems from cyber threats, highlighting the role of DT in mitigating performance impacts from attacks.49,50
Digital twin for cyber-attacks in smart grids
In a SCADA network, network switch devices connect to protective-relay-based Intelligent Electronic Devices (IEDs), and cyber-attacks disrupt grid power distribution.19–22 False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) can cause SLG faults, relay failures, and breaker issues.19–30 Based on the literature, we believe that to fully assess the impact of these attacks, SCADA networks need a digital twin and machine learning-based Cyber Threat Intelligence (CTI) approach, which is missing in current systems.
To address these issues, we propose a DT-ML-SCADA-CTI approach for predicting and visualizing the effects of FDIA, RTCI, and SRA attacks from network-switch devices to power lines in the SCADA network.
Digital-twin and machine learning-based SCADA cyber threat intelligence
SCADA network scenario and dataset
The “Power System Attack Datasets”, 43 developed by Mississippi State University and Oak Ridge National Laboratory, played a crucial role in this study. Various features from these datasets were utilized in our analysis. This dataset has been referenced in several research papers on cyber-attacks in the energy sector.51–54 The SCADA network scenario for testing the proposed approach using the dataset is shown in Figure 2(a) and (b). In the SCADA-network scenario, four IED devices are connected to the network switch: IED-Relay-1, IED-Relay-2, IED-Relay-3, and IED-Relay-4. These devices are connected to Circuit-Breaker-1, Circuit-Breaker-2, Circuit-Breaker-3, and Circuit-Breaker-4, respectively. Power-line-1 runs from Circuit-Breaker-1 to Circuit-Breaker-2, and Power-line-2 runs from Circuit-Breaker-3 to Circuit-Breaker-4. Phasor data of voltage and current from the power lines 52 are measured by Phasor Measurement Unit (PMU) 53 devices and transferred to a Phasor Data Concentrator (PDC) device. 54 The voltage- and current-based phasor data from the PMU or PDC are handled by the OpenPDC (Open-source Phasor Data Concentrator) application running on a Windows server. 55 OpenPDC is linked to the SCADA control panel. Snort IDS (Intrusion Detection System) software is used for network forensics, 56 and Snort IDS preprocesses common information about cyber-attacks. A Syslog server is a convenient companion tool for Snort IDS, 56 and Syslog provides data logs that can be audited with Snort IDS to identify vulnerabilities. A wide-area network (WAN) is connected to the SCADA control panel, 57 and cyber attackers use the WAN channel to carry out cyber-attacks on the SCADA network.
Figure 2. (a) SCADA-network scenario for developing and experimenting with the proposed digital-twin-based CTI approach. (b) Formation of Snort rule.
Steps of SCADA CTI
Figure 3 shows the flowchart of the proposed “Digital-twin and Machine Learning based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach. The steps of the approach are discussed below.
(4) Storing data in SCADA-server and Syslog-server: The Syslog-server contains log data of various SCADA-network operations, and from those log data, external network intrusions and illegal SCADA operations or events can be found through Snort IDS.56,58 Energy-related information, such as phasor data (voltage, current, frequency) and SCADA operational data, is stored in the SCADA server.
(5) Searching issues of SCADA networks: Any intrusion from the external network is searched for by applying different Snort-IDS rules to the log data of the Syslog-server. In addition, relay-disabled, relay-fault, and IED-relay device issues, single-line-to-ground (SLG) faults, open breakers, and tripping-command issues must be detected in SCADA operations.
(6) Data Transformation: If an intrusion is detected from outside the SCADA network and any issues related to energy operations are identified, two datasets are prepared through data transformation, one for the machine learning process and one for the digital twin process. The machine learning dataset includes phasor data (voltage, current, and frequency) with an event label, where a value of 1 represents cyber-attack events and 0 represents natural events. The digital twin dataset contains the names of the affected components and their corresponding effects.
(7) Machine-learning and Digital-twin process: Cyber-attack prediction is performed by applying machine learning methods such as Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression to the first dataset obtained through data transformation. A knowledge-graph-based digital-twin method is applied to the second dataset, which contains the effects on the SCADA network and the affected components, to visualize the effects of cyber-attacks.
Finally, the outputs from the machine-learning and digital-twin processes are documented or reported for future cybersecurity and vulnerability assessments.
Figure 3. Flowchart of the proposed overall SCADA-CTI approach using digital twin and machine learning.
Data transformation for machine learning and digital-twin processes
Data transformation is a part of data processing that converts data into a common format for a computational process. The experiments in this research use 41 SCADA-network scenarios, of which 28 are cyber-attack events and 13 are natural (normal operational) events.
Table 1. Data transformation for digital-twin process.
Table 2. Data transformation for machine-learning process.
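As a rough illustration of the two outputs of the data-transformation step, the following sketch splits a list of event records into a machine-learning dataset (phasor features plus a 0/1 label) and a digital-twin dataset (source, affected component, effect). The record layout, the field names, and the sample values are assumptions made for this example and are not taken from the dataset itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScadaEvent:
    """One observed SCADA scenario (all field names are hypothetical)."""
    voltage: float          # phasor voltage magnitude
    current: float          # phasor current magnitude
    frequency: float        # system frequency in Hz
    attack: Optional[str]   # "FDIA", "RTCI", "SRA", or None for a natural event
    component: str          # affected component, e.g. "Power-line-1"
    effect: str             # observed effect, e.g. "SLG fault"


def to_ml_rows(events: List[ScadaEvent]) -> List[Tuple[float, float, float, int]]:
    """Phasor features plus a binary label: 1 = cyber-attack, 0 = natural event."""
    return [(e.voltage, e.current, e.frequency, 0 if e.attack is None else 1)
            for e in events]


def to_dt_rows(events: List[ScadaEvent]) -> List[Tuple[str, str, str]]:
    """(source, affected component, effect) rows for the digital-twin process."""
    return [(e.attack or "Natural event", e.component, e.effect) for e in events]


# Illustrative values only; these are not measurements from the dataset.
events = [
    ScadaEvent(131.2, 0.45, 60.0, "FDIA", "Power-line-1", "SLG fault"),
    ScadaEvent(130.9, 0.44, 60.0, None, "Power-line-2", "SLG fault"),
    ScadaEvent(129.8, 0.51, 59.9, "RTCI", "Circuit-Breaker-3", "Open breaker"),
]
ml_rows = to_ml_rows(events)
dt_rows = to_dt_rows(events)
```

Keeping the two outputs separate mirrors the description above: the labeled phasor rows feed the classifiers, while the component and effect rows feed the knowledge graph.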
Machine learning based attack analysis using phasor data
In this process, the dataset (Table 2) is analyzed for cyber-attack prediction using machine learning algorithms: Extra-Trees, XGBoost, Random Forest, Bootstrap Aggregating (bagging), and Logistic Regression. We ran this computation with NumPy, Pandas, Seaborn, Matplotlib, and scikit-learn in Python 3, using the Anaconda platform and Jupyter Notebook. The computation was performed on a home computer with an Intel Core i3-4005U processor (1.7 GHz) and 8 GB of installed RAM. Figure 4 illustrates the machine-learning process. First, we labeled cyber-attack events as one (1) and non-cyber-attack events as zero (0). Then, we divided the dataset into two parts, X and Y, with X containing only numerical phasor data and Y containing only the zero-one event values. After that, we defined X_train, X_test, Y_train, and Y_test. We applied the Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression methods to these train and test sets to obtain confusion-matrix-based prediction outcomes.
Figure 4. Workflow of grid phasor data-oriented machine-learning computation for analyzing cyber-attacks.
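The workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the study's actual notebook: the data are synthetic stand-ins with the same 28/13 class balance, the 30% test fraction and the random seeds are arbitrary choices, and XGBoost is included only if the `xgboost` package happens to be installed.

```python
import numpy as np
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the transformed phasor dataset (NOT the paper's data):
# 28 attack rows (label 1) and 13 natural rows (label 0), three phasor features.
X = np.vstack([rng.normal(1.0, 0.5, size=(28, 3)),
               rng.normal(0.0, 0.5, size=(13, 3))])
Y = np.array([1] * 28 + [0] * 13)

# The split ratio is an assumption; the text does not state the one used.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=0)

models = {
    "Extra-Trees": ExtraTreesClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
try:
    # Optional dependency; XGBClassifier follows the same fit/predict interface.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier()
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    # Rows are true labels and columns are predicted labels, ordered [0, 1].
    results[name] = confusion_matrix(Y_test, model.predict(X_test), labels=[0, 1])
```

Each entry of `results` is a 2×2 matrix whose cell (1, 1) counts true positives, matching the row and column convention used for the confusion matrices in this study.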
The Extra-Trees classifier uses Gini impurity (Equation (1)) and entropy (Equation (2)). 59 In the Extra-Trees method, each tree selects the best feature on which to split the data according to a mathematical criterion. This criterion is the Gini index (Equation (3)). Here, S is the entropy value, and A is the field (event type: cyber-attack or non-cyber-attack) to be predicted.
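To make the two split criteria concrete, the following sketch computes the textbook Gini impurity, 1 − Σ pᵢ², and the entropy, −Σ pᵢ log₂ pᵢ, for a node holding the 28 cyber-attack and 13 natural events used in this study. These are the standard definitions and are assumed here to correspond to Equations (1) and (2); the function names are illustrative.

```python
import math
from typing import Sequence


def gini_impurity(counts: Sequence[int]) -> float:
    """Gini impurity of a node given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a node; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Class counts at the root: 28 cyber-attack events and 13 natural events.
root = [28, 13]
root_gini = gini_impurity(root)
root_entropy = entropy(root)
```

A split is useful when the weighted impurity of the child nodes falls below these root values; a node that contains a single class has an impurity of zero under both measures.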
The XGBoost algorithm, developed by Chen and Guestrin, is one of the popular algorithms in machine learning. 60 The XGBClassifier used in this study is an XGBoost model for classification. Equation (4) represents the objective loss function and the regularization function. 61 Here, F = objective function, g_i = first derivative of the Mean Square Error (MSE), w = score vector on the leaves, h_i = second derivative of the MSE, λ = penalty, T = number of leaves, ρ = complexity of a leaf, and I_j = sample data of leaf node j.
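For orientation, the second-order objective from Chen and Guestrin's XGBoost formulation can be written with the symbols defined above, where ρ plays the role that the original formulation assigns to γ. Whether this matches Equation (4) term by term is an assumption, since the equation is not reproduced in this text.

```latex
F = \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j
    + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^{2} \right] + \rho\, T
```

In this form, the first bracketed term collects the gradient statistics of the samples in leaf j, the λ term penalizes large leaf scores, and ρT penalizes the number of leaves.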
The random forest also uses the Gini index and entropy. The equation of the Random Forest classifier is shown in equation (5). 62
Here, classification is
In Bootstrap-aggregating or bagging process, a data sample
Logistic regression, as used in our study, is not a classifier in itself: it models the probability of an event, and a class label is obtained by thresholding that probability. In this method, we need the sigmoid (σ) function (equation (8)) to map the linear combination z to a probability. Here, e is the base of the natural logarithm. The logistic function transforms the probability (p) (equation (9)). 64 The logistic function can be written as equation (9) according to the sigmoid function. To simplify the logit (equation (9)), equation (10) is used to calculate the value of p. Here, β_0 = constant, β_k = coefficient of the predictor or independent variables (k = 1, 2, 3, …, K), and X_k = predictor or independent variables.
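The relationship between the linear combination z, the sigmoid, and the logit can be checked numerically with a few lines of Python. The sketch uses the standard definitions σ(z) = 1 / (1 + e^(−z)) and logit(p) = ln(p / (1 − p)); the coefficient values and the 0.5 threshold are arbitrary choices for illustration, not fitted values from this study.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the sigmoid: the log-odds of probability p."""
    return math.log(p / (1.0 - p))


def predict_probability(beta0: float, betas: Sequence[float],
                        x: Sequence[float]) -> float:
    """p = sigmoid(beta0 + sum_k beta_k * x_k)."""
    z = beta0 + sum(b * xk for b, xk in zip(betas, x))
    return sigmoid(z)


# Hypothetical coefficients and predictor values, so z = -1.0 + 2.0 + 0.5 = 1.5.
p = predict_probability(beta0=-1.0, betas=[2.0, 1.0], x=[1.0, 0.5])
label = 1 if p >= 0.5 else 0  # thresholding turns the probability into a class
```

Applying `logit` to `p` recovers the linear combination z, which is the sense in which the logit and the sigmoid are inverses of each other.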
We used the confusion matrix with Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression to visualize cyber-attack prediction. Here, True-positive (TP) is ‘(1,1)’, False-negative (FN) is ‘(1,0)’, True-negative (TN) is ‘(0,0)’, and False-positive (FP) is ‘(0,1)’. The prediction label is a column, and the true label is a row. In addition to the confusion matrix, we determined the model score, accuracy, precision, recall, F1, and classification report. The accuracy, precision, recall, and F1 formulas are shown in equations (11), (12), (13), and (14), respectively.
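The four scores follow directly from the confusion-matrix cells. The sketch below uses the standard definitions of accuracy, precision, recall, and F1; the counts passed in at the end are hypothetical and are not results from this study.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 13-sample test set.
metrics = classification_metrics(tp=8, fn=1, tn=3, fp=1)
```

The guards against zero denominators matter for small test sets, where a model may predict no positives at all.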
After applying the mentioned models, we used the “ROC-AUC curve” (ROC: Receiver-operating-characteristic; AUC: Area under the ROC Curve) to understand the True-positive-rate (TPR) and False-positive-rate (FPR) based performance of the algorithms used. Formulas for TPR and FPR are shown in equations (15) and (16), respectively. Figure 5 illustrates the concept of the ROC-AUC curve. To test this entire machine learning process, a sample code of FDIA forecasts was taken, 65 and the errors noticed in the existing code were corrected in this study.
Figure 5. ROC-AUC curve.
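The construction of the ROC curve can be sketched in plain Python: each distinct score is used as a threshold, TPR = TP / (TP + FN) and FPR = FP / (FP + TN) are computed at that threshold, and the area is obtained with the trapezoidal rule. The labels and scores below are made up for illustration; in practice a library routine such as scikit-learn's `roc_curve` would normally be used instead.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels (1 = cyber-attack) and predicted attack probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
```

An area of 1.0 corresponds to a model that ranks every attack above every natural event, while 0.5 corresponds to random ranking.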

Digital-twin for visualizing cyber-attacking grid effects
In this process, we employed the Directed-Graph (DiGraph) method for knowledge-graph-enabled digital twin visualization. The DiGraph-based knowledge graph is used to analyze how cyber-attacks impact SCADA operations. The digital twin-based cyber-attack visualization illustrates the effects of FDIA, RTCI, and SRA attacks on SCADA networks. For the digital twin computation, we utilized Python libraries such as regular expression (RE), NetworkX, Numpy, Pandas, Matplotlib, and the Natural Language Toolkit (NLTK).
Algorithm 1: Pseudocode of DT-ML-SCADA-CTI
NetworkX offers data structures for directed graphs, including graphs with self-loops and, through its multigraph classes, parallel edges. 66 We have used the Trigraph-Orthography or triple concept of Knowledge-Graph with DiGraph. With the help of two studies,67,68 we created the DiGraph formula used in our study, which is shown in equation (17), and a sample DiGraph built according to this formula is shown in Figure 6. Here, d is the A→B direction, A is the source, B is the target, and E is an edge.
Figure 6. Triple DiGraph concept for knowledge-graph-enabled digital-twin visualization.
Algorithm 1 is developed to visualize the effects of cyber-attacks on SCADA networks, with equation (17) used to generate the DiGraph plot. The algorithm illustrates the impact of cyber-attacks on substation network switches, IED-relay devices, circuit breakers, and power transmission lines. Algorithm 1 is divided into seven steps, and in each step, equation (17) is applied to render the DiGraph plot. Steps 1 to 7 demonstrate, respectively, SLG faults, the percentage of SLG faults, IED-relay device disabled, IED-relay device fault, tripping command, command against IED-relay devices, and open-breaker problems of circuit breakers. In the algorithm, according to Figure 2, the effects mentioned are set as the Target (B), and the FDIA, RTCI, and SRA cyber-attacks are set as the Source (A). According to equation (17), each step of the algorithm has a Source (A) and a Target (B) with a Relation (E) and a Direction, which creates a digital twin-based relationship between cyber-attacks and their effects.
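The source, relation, and target structure can be sketched with NetworkX as below. This is an illustrative sketch rather than Algorithm 1 itself: the triples paraphrase the attack effects described in this article, and the relation label "causes" and the node names are assumptions made for the example.

```python
import networkx as nx

# (source A, relation E, target B) triples; wording of nodes is illustrative.
triples = [
    ("FDIA", "causes", "Tripping command"),
    ("FDIA", "causes", "SLG fault"),
    ("SRA", "causes", "SLG fault"),
    ("SRA", "causes", "IED-relay disabled"),
    ("SRA", "causes", "IED-relay fault"),
    ("RTCI", "causes", "Command against IED-relay"),
    ("RTCI", "causes", "Open breaker"),
]

G = nx.DiGraph()
for source, relation, target in triples:
    # The edge direction is A -> B; the relation is stored as an edge attribute.
    G.add_edge(source, target, relation=relation)

# Queries an analyst might run against the knowledge graph.
effects_of_sra = sorted(G.successors("SRA"))
causes_of_slg = sorted(G.predecessors("SLG fault"))
```

With Matplotlib available, a graph like this can be rendered with `nx.draw(G, with_labels=True)`, and the relation attribute can be shown along the edges with `nx.draw_networkx_edge_labels`.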
Result and analysis
In this section, we present an in-depth analysis of the experimental results obtained from the DT-ML-SCADA-CTI approach, which was evaluated across 41 distinct scenarios based on SCADA system operations. These scenarios included 28 cyber-attack events and 13 non-cyber-attack events, allowing us to test the model’s ability to differentiate between malicious and normal system behaviors. The primary research questions addressed by these experiments were: To solve these problems, we applied five machine learning algorithms (Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression) to a dataset transformed from raw SCADA data. These methods were evaluated on several performance metrics: accuracy, precision, recall, and F1-score.
Performance scores of cyber-attack prediction.
We demonstrated cyber-attack prediction using confusion matrices for Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression. The results, shown in Figure 7, highlight a true positive (TP) prediction. In addition to the confusion matrix, we evaluated the machine learning algorithms using the ROC-AUC curve method (equations (15) and (16), and Figure 5). Recent studies have used SVM69–71 and DTC72–74 for cyber-attack prediction on SCADA systems. In the ROC-AUC curve computation, the five machine-learning methods (Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression) were compared with Support Vector Machine (SVM) and Decision Tree Classifier (DTC), as shown in Figure 8. The results revealed that the five methods outperformed SVM and DTC. Figure 9 displays the ROC-AUC curve, comparing SVM and DTC with the five methods: ETC (Extra-Trees Classifier), XGBC (XGBoost Classifier), RF (Random Forest), BC (Bagging Classifier), and LR (Logistic Regression).
Figure 7. Confusion-matrix-based cyber-attack analysis using hybrid machine-learning model.
Figure 8. Performance scores of machine learning process.

Based on the digital twin computation from Algorithm 1, the effects of FDIA, RTCI, and SRA cyber-attacks on the network are illustrated in Figure 10. This visualization demonstrates how these attacks impact SCADA operations. The digital twin-based visualizations help analysts easily identify the disruptions caused by cyber-attacks. Figure 10 shows that FDIA and SRA result in SLG faults on power transmission lines (Line 1 and Line 2). Figure 10 also shows that SLG faults can arise both from non-cyber-attack events and from cyber-attacks such as FDIA and SRA, disrupting energy transmission. However, only the SRA attack led to the IED-relay disabled issue. Figure 10 shows that the SRA cyber-attack affects IED-relay devices, causing IED-relay faults. The FDIA attack leads to a tripping command issue, resulting in SLG faults. The RTCI attack causes a “command against IED-relay” issue, disrupting energy transmission and triggering open-breaker faults in circuit breakers 3 and 4, as shown in step 6 of Algorithm 1.
Figure 9. ROC-AUC curve for comparing the performances of machine-learning methods.
Figure 10. Steps of Algorithm 1 for visualizing digital-twin of cyber-attacking grid effects.

Thus, in this study, we have highlighted the overall effects of cyber-attacks on SCADA networks through knowledge-graph-based digital-twin computation. This experiment can be further improved, which we plan to pursue in future studies.
Comparison between our study and other studies.

Performance evaluation of other models against the proposed study.
Conclusion
Smart grids use SCADA systems to distribute electricity, but cyber-attacks on the network-switch device and IED-relay devices can disrupt power distribution. Three main types of attacks (FDIA, RTCI, and SRA) can cause various faults, including open-breaker and relay failures. A knowledge-graph-based digital twin (DT) visualization method is needed to understand these impacts. This study introduces a novel DT-ML-SCADA-CTI technique to predict and visualize the effects of these attacks. The process involves data transformation followed by machine learning algorithms (Extra-Trees, XGBoost, Random Forest, Bagging, Logistic Regression) for prediction, and a DiGraph-based method to assess the impact on SCADA operations.
This CTI approach needs further development. In particular, the digital-twin process requires machine-learning-based prediction at every step of Algorithms 1 and 2. DoS and DDoS attacks are absent from this study and need to be included in future studies. The SCADA network is related to smart meter networks, where energy prediction and load forecasting are essential.34,76 Connecting federated learning with the proposed model to secure these computations could be explored as a way to improve grid security. Moreover, how the proposed model can be improved by combining grid blockchain and federated learning is a promising direction for future studies. 77 Work can also be done to extend this study to malware injection. Finally, further development is needed for this study to play a significant role in future cyber-attack analysis on SCADA networks.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the Zayed University Research Incentive Fund (RIF) research grant code number: R21109.
