Cyber threat intelligence for smart grids using knowledge graphs, digital twins, and hybrid machine learning in SCADA networks

Abstract

In the SCADA (Supervisory Control and Data Acquisition) network of a smart grid, the network switch is connected to multiple Intelligent Electronic Devices (IEDs) that are based on protective relays. False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) are three types of cyber-attacks on SCADA networks that cause single-line-to-ground (SLG) faults, IED-relay failures, and open circuit-breaker issues. Existing cyber threat intelligence (CTI) approaches for grids cannot visualize the effects of cyber-attacks on the grid. Understanding the full effect of these attacks requires a knowledge-graph-based digital-twin approach to cyber-attack visualization in SCADA networks, which existing SCADA systems lack. This study presents a novel “Digital-twin and Machine Learning-based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach, which utilizes an innovative algorithm to visualize and predict the effects of cyber-attacks, including FDIA, RTCI, and SRA, on SCADA systems. The process begins with data transformation to generate cyber-attack grid data, which is then analyzed for attack prediction using machine learning models such as Extra-Trees, XGBoost, Random Forest, Bootstrap Aggregating, and Logistic Regression. To further enhance the analysis, a directed-graph (DiGraph) algorithm is applied to create a knowledge-graph-based digital twin, allowing for a deeper understanding of how these cyber-attacks impact SCADA operations. The comparison with existing models demonstrates the superiority of the proposed approach, as it offers a more detailed and clearer digital-twin representation of cyber-attack effects. This enhanced visualization provides deeper insights into attack dynamics and significantly improves predictive accuracy, showcasing the effectiveness of the proposed method in understanding and mitigating cyber threats.

Keywords

Smart grid, knowledge graph, digital twin, cyber threat intelligence, cyber physical systems, cyber intelligence, protective relay, SCADA, directed graph, machine learning, power system

Introduction

The use of digital twins in energy and cybersecurity has emerged as a new research domain in the Industry 4.0 era.^1,2 A digital twin is a technology that integrates a real-world process with simulations, multimedia, or computer graphics. One of the key areas of development in Industry 4.0 is the smart grid, which relies heavily on cyber-physical systems, artificial intelligence (AI), the Internet of Things (IoT), and cloud-of-things.^3,4 Researchers believe that many grid-related challenges can be addressed through advanced analysis using digital twins, particularly in overcoming issues related to remote data transfer between power grids and real-time data analysis.⁵ Additionally, experts contend that the adoption of digital twin technology in smart grids can be accelerated by establishing a comprehensive development framework.⁶ Researchers have provided architectures for digital twin-based applications to understand the complex physical processes and issues of vacuum circuit breakers, advanced prognostics, and energy health management systems in a substation grid.⁷ Digital twins have been applied to prevent mechanical failure of a wind turbine system on the grid and to facilitate operational optimization.⁸ The digital twin approach has also been applied to real-time solar energy storage monitoring, diagnostics, and fault correction.⁹ Beyond energy-related issues, digital twins can also play an important role in addressing cybersecurity issues in real time and in updating grid network access policies accordingly.¹⁰ In this study, we used a digital twin to analyze cyber-attacks on a specific network segment of the smart grid. Supervisory control and data acquisition (SCADA) is a digital process used in smart grids for energy distribution. In the SCADA network of the substation power grid, the network switch is connected to protective relays and multiple microprocessor-based Intelligent Electronic Devices (IEDs). Cyber attackers or malware can send malicious control commands as packets over the control protocol to the SCADA network of the smart grid.¹¹

The term “knowledge graph” was first introduced in the context of a modular instructional system for a course,¹² and later, a project named “Knowledge Graph” was jointly developed by the University of Groningen and the University of Twente.^13–15 Various researchers have explored knowledge graph techniques to enhance digital twin processes, such as the Universal Digital Twin,¹⁶ Digital Twin Network,¹⁷ and Digital Twin Replica with Temporal Knowledge Graph,¹⁸ among others. In this study, we employ a knowledge graph-based digital twin approach. The key contributions of this research are outlined below.

1) The novel contribution of this study is to provide a digital-twin-based visualization for understanding the effects of False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) on SCADA networks.

2) This study contributes by developing a Cyber Threat Intelligence (CTI) approach combining knowledge-graph-enabled digital twin technology and machine learning algorithms for cyber-physical systems in a network. The proposed “Digital-twin and Machine Learning based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach uses the Directed-graph (DiGraph) method for visualizing the knowledge graph-enabled digital twin. Machine learning methods such as Extra-Trees, Extreme Gradient Boosting (XGBoost), Random Forest, Bootstrap Aggregating (bagging), and Logistic Regression are employed for cyber-attack analysis, chosen for their effectiveness in handling complex datasets and providing high accuracy in SCADA network environments.

3) Another contribution of this research is to demonstrate data-transformation of SCADA-network-based cyber-attacks for visualizing the effects of FDIA, RTCI, and SRA cyber-attacks through the proposed DT-ML-SCADA-CTI model. The data-transformation step occurs in the data-processing stage of the proposed model to carry meaningful information and features for processing the proposed CTI approach.

After the introduction section, the remainder of the article is divided into four sections. The second section discusses the literature review, in which related works and problems are discussed. The third section discusses the dataset, proposed model, and applied methods. The fourth section presents the results and analysis. Finally, the fifth section concludes the paper with a brief conclusion.

Literature review

Cyber-attacks on SCADA network

False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) are specifically identified in the SCADA system cyber-attack dataset, which was jointly developed by Oak Ridge National Laboratory and Mississippi State University. This dataset is available on the website of the University of Alabama in Huntsville.¹⁹ In this dataset, the SRA cyber-attack is referred to as “Relay Setting Change.” The dataset has been extensively utilized in various studies on cyber-attacks in smart grids, including those focused on ensemble learning-based cyber-attack classification,²⁰ machine learning-based intrusion detection,²¹ and ensemble learning-based intrusion detection,²² among others. RTCI cyber-attacks on power transmission system-based SCADA systems have emerged, and researchers have mentioned single-line-to-ground (SLG) fault replay attacks,²³ Aurora attacks,²⁴ relay physical attacks,²⁵ and relay setting change attacks,²⁶ which indicate SRA cyber-attacks. Some studies of cyber-physical systems, power systems, and industrial control systems have specifically highlighted this SRA attack.^27–29 On the other hand, FDIA and RTCI fall in the code injection attack category.³⁰ Various problems, including SLG faults, relay-disabled faults, and open-breaker issues due to FDIA, RTCI, and SRA cyber-attacks, can be seen in SCADA.¹⁹

In light of SCADA network-based cyber-attack studies, Figure 1 shows the steps of a cyber-attack on a power grid SCADA system. In this research paper, we deal with FDIA, RTCI, and SRA attacks and demonstrate through a digital twin how these cyber-attacks damage the SCADA system. Since FDIA, RTCI, and SRA attacks involve false relay operations, we prioritize false relay operations in this study. In a study on power systems, researchers noted that FDIA attacks lead to false SCADA relay operation issues.³¹ DoS or DDoS³² is also a very important cyber-attack. In CTI-based research, researchers analyzed Software-Defined Networking (SDN) datasets for DoS or DDoS-based intrusion.³³ A recent systematic literature study sheds light on DDoS attack prevention through blockchain.³⁴

Figure 1.

Process of cyber-attacks at SCADA system in smart grid.

Digital twin for cyber-attacks in smart grids

The cyber-physical system is supported by an IoT-based digital twin framework, which interfaces with the energy control system to ensure proper operation.³⁵ Researchers have developed a digital twin model for a physical testbed to analyze cybersecurity issues in SCADA systems.³⁶ In a recent study, a cyber-physical system approach was proposed, combining 3D modeling-enhanced SCADA operations with virtual reality-based digital twins to monitor mill operations.³⁷ Additionally, researchers have utilized 3D modeling for intelligent agent-based interactions in distributed or decentralized networks, focusing on relative directional intelligence.^38,39 A cyber-attack localization technique was tested with a digital twin simulator-based automation controller, where experimental results suggest cyber-attacks in various operational contexts.⁴⁰ A software-defined networking control plane, which stores and distributes the smart meter behavior models and operating states, was enhanced by researchers by integrating digital twin technology.⁴¹ Here, researchers integrated blockchain with deep machine learning. A review paper describes how cybersecurity infrastructure is served to smart-grid producers, consumers, and the nation.⁴² In a recent study, researchers proposed an LSTM (Long Short-Term Memory) deep learning-based cyber-attack model for a renewable energy-powered smart city, with the model being simulated using a digital twin.⁴³ Additionally, LSTM was employed for cyber-attack analysis in vehicle-to-grid-oriented cyber-physical systems, enhanced by digital twin technology.⁴⁴ In a project, researchers designed and validated digital-twin-as-a-service (DTaaS) for reducing capital expenditures (CapEx) and enhancing data security.⁴⁵ Researchers have noted the need for digital twins in a study to protect microgrids from command injection and DoS attacks.⁴⁶ Researchers used supervised machine learning to enhance the security of industrial control systems.²³ In addition, a recent study explores the potential future benefits of digital-twin grid cybersecurity and resilient power grids. The integration of Digital Twin (DT) technology with cybersecurity strategies is vital for enhancing the resilience of industrial systems in Industry 5.0. Recent studies propose using DiGraph-enabled DT models and machine learning for detecting cyber-attacks on SCADA networks⁴⁷ and leveraging DT to identify faults and intrusions in controller software.⁴⁸ Additionally, frameworks like CyberDefender provide intelligent defense mechanisms to protect DT-based systems from cyber threats, highlighting the role of DT in mitigating performance impacts from attacks.^49,50

Digital twin for cyber-attacks in smart grids

In a SCADA network, network switch devices connect to protective-relay-based Intelligent Electronic Devices (IEDs), and cyber-attacks disrupt grid power distribution.^19–22 False-Data Injection Attacks (FDIA), Remote-Tripping Command Injection (RTCI), and System Reconfiguration Attacks (SRA) can cause SLG faults, relay failures, and breaker issues.^19–30 Based on the literature, we believe that to fully assess the impact of these attacks, SCADA networks need a digital twin and machine learning-based Cyber Threat Intelligence (CTI) approach, which is missing in current systems.

To address the mentioned issues, we propose a DT-ML-SCADA-CTI approach for predicting and visualizing the effects of FDIA, RTCI, and SRA attacks from network-switch devices to power lines in the SCADA network.

Digital-twin and machine learning-based SCADA cyber threat intelligence

SCADA network scenario and dataset

The “Power System Attack Datasets”,⁴³ developed by Mississippi State University and Oak Ridge National Laboratory, played a crucial role in this study. Various features from these datasets were utilized in our analysis. These datasets have been referenced in several research papers on cyber-attacks in the energy sector.^51–54 The SCADA network scenario for testing the proposed approach using the dataset is shown in Figure 2(a) and (b). In this scenario, four IED devices (IED-Relay-1, IED-Relay-2, IED-Relay-3, and IED-Relay-4) are connected to the network switch, and they are connected to Circuit-Breaker-1, Circuit-Breaker-2, Circuit-Breaker-3, and Circuit-Breaker-4, respectively. Power-line-1 runs from Circuit-Breaker-1 to Circuit-Breaker-2, and Power-line-2 runs from Circuit-Breaker-3 to Circuit-Breaker-4. Phasor data of voltage and current from the power lines⁵² are measured by Phasor Measurement Unit (PMU)⁵³ devices and transferred to a Phasor Data Concentrator (PDC) device.⁵⁴ The voltage and current phasor data from the PDC or PMU are handled by the OpenPDC (Open-source Phasor Data Concentrator) application running on a Windows server.⁵⁵ OpenPDC is linked to the SCADA control panel. Snort IDS (Intrusion Detection System) software is used for network forensics,⁵⁶ and it preprocesses common information about cyber-attacks. A Syslog server is a convenient companion to Snort IDS,⁵⁶ as it provides data logs that can be audited with Snort IDS to identify vulnerabilities. A wide-area network (WAN) is connected to the SCADA control panel,⁵⁷ and cyber attackers use the WAN channel to carry out cyber-attacks on the SCADA network.

Figure 2.

(a) SCADA-network scenario for developing and testing the proposed digital-twin-based CTI approach. (b) Formation of Snort rule.

Steps of SCADA CTI

Figure 3 shows the flowchart of the proposed “Digital-twin and Machine Learning based SCADA Cyber Threat Intelligence (DT-ML-SCADA-CTI)” approach. The steps of the approach are discussed below.

(4) Storing data in SCADA-server and Syslog-server: Syslog-server contains log data of various SCADA-network operations, and from those log data external network intrusion and illegal SCADA operations or events can be found through Snort IDS.^56,58 On the other hand, energy-related information such as phasor (voltage, current, frequency) and SCADA operational data are stored in the SCADA server.

(5) Searching issues of SCADA networks: Any intrusion from the external network is searched for by applying different Snort-IDS rules to the log data of the Syslog-server. In addition, relay-disabled issues, relay faults, IED-relay device issues, single-line-to-ground (SLG) faults, open breakers, and tripping-command issues must be detected in SCADA operations.

(6) Data Transformation: If an intrusion is detected from outside the SCADA network and any issues related to energy operations are identified, two datasets need to be prepared for machine learning and digital twin processes through data transformation. The machine learning dataset includes phasor data (voltage, current, and frequency), where a value of 1 represents cyber-attack events and 0 represents natural events. Meanwhile, the digital twin dataset contains the names of the affected components and their corresponding effects.

(7) Machine-learning and Digital-twin process: Cyber-attack prediction is performed by applying machine learning methods such as Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression to a dataset obtained through data transformation. A knowledge-graph-based digital-twin method is applied to another dataset obtained from data transformation to visualize the effects of cyber-attacks, which contains the effects of the SCADA network and the affected components.

Figure 3.

Flowchart of proposed overall SCADA-CTI approach using digital twin and machine learning.

Finally, the outputs from machine-learning and digital-twin processes are documented or reported for future cybersecurity and vulnerability assessments.

Data transformation for machine learning and digital-twin processes

Data transformation is a part of data processing that converts data into a common format for a computational process. The experiments in this research use 41 SCADA-network scenarios, of which 28 are cyber-attack events and 13 are natural or normal operational events.

The 41 scenarios were processed for both the machine learning and the digital twin methods. The machine learning dataset includes numerical energy phasor data (voltage, current, and frequency) from the 41 scenarios, while the digital twin dataset contains information on cyber-attack effects, affected components, and devices; preparing this second dataset is a key contribution of this study. The features of both datasets are presented in Tables 1 and 2.

Table 1.

Data transformation for digital-twin process.

Field or column name	Description of the values
Scenario	Scenario 1 to 41
Event type	Normal and cyber-attack
Cyber-attack	No cyber-attack, FDIA, RTCI, SRA
Network device	Substation network switch
Influenced power transmission line	Line 1 and line 2
Connected circuit-breaker of power line	Circuit-breaker 1 to 4
Intelligent electronic device (IED) of circuit-breaker	IED-relay 1 to 4
Affected IED-relay	Affected IED-relay device 1 to 4
Single-line-to-ground (SLG) fault of power line	SLG fault and No SLG fault
SLG fault percentage	Percentage (%) of SLG fault
Tripping command issue of power line	No tripping command issue, tripping command problem
Command injection against IED relay	Affected IED-relay devices
IED-relay disabled issue	Relay disabled of IED-relay 1 to 4
IED-relay fault issue	Relay fault issue of IED-relay 1 to 4
Open breaker issue by command injection	Yes and No
Network intrusion detection system	Snort with syslog server

Table 2.

Data transformation for machine-learning process.

Features of fields or column	Description of the values
Scenario	Scenario 1 to 41
Event type	Normal and cyber-attack
PA (phase angle) 1– PA 3	Voltage PA data
PM (phase magnitude) 1 – PM3	Voltage PM data
PA 4 – PA 6	Current PA data
PM 4 – PM 6	Current PM data
PA 7– PA 9	Voltage PA data
PM 7 – PM 9	Voltage PM data
PA 10 – PA 12	Current PA data
PM 10 – PM 12	Current PM data
F (frequency)	Frequency for IED-relay 1 to 4 devices
DF (delta-frequency)	DF for IED-relay devices
Control panel log	Log data of IED-relay 1 to 4
Relay log	Log data of IED-relay 1 to 4
Snort log1	Snort log data of IED-relay 1 to 4
IED-relay fault issue	Relay fault issue of IED-relay 1 to 4
Open breaker issue by command injection	Yes and No
Operation type	Normal operation and abnormal operation
Network intrusion detection system	Snort with syslog server
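
The split into two datasets can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the record layout, the field names (`event_type`, `attack`, `phasors`, `effects`), the relation label `"causes"`, and the sample values are all hypothetical. Only the labeling rule (1 for cyber-attack events, 0 for natural events) and the idea of one numerical table plus one attack/effect table come from the text.

```python
# Hypothetical scenario records; field names and values are illustrative only.
scenarios = [
    {"scenario": 1, "event_type": "cyber-attack", "attack": "FDIA",
     "phasors": [130.5, -10.2, 60.0],
     "effects": ["SLG fault", "Tripping command issue"]},
    {"scenario": 2, "event_type": "normal", "attack": "No cyber-attack",
     "phasors": [131.0, -9.8, 60.0],
     "effects": []},
]

def to_ml_row(record):
    """Numerical phasor features plus a binary label (1 = cyber-attack, 0 = natural)."""
    label = 1 if record["event_type"] == "cyber-attack" else 0
    return record["phasors"] + [label]

def to_twin_rows(record):
    """One (source, relation, target) row per effect for the digital-twin dataset."""
    return [(record["attack"], "causes", effect) for effect in record["effects"]]

ml_dataset = [to_ml_row(r) for r in scenarios]
twin_dataset = [row for r in scenarios for row in to_twin_rows(r)]
```

Keeping the two outputs separate mirrors the two downstream processes: the numerical rows feed the classifiers, while the triples feed the graph visualization.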

Machine learning based attack analysis using phasor data

In this process, the dataset in Table 2 is analyzed for cyber-attack prediction using machine learning algorithms: Extra-Trees, XGBoost, Random-forest, Bootstrap-aggregating (bagging), and Logistic-regression. We performed this computation using Numpy, Pandas, Seaborn, Matplotlib, and Sklearn with Python 3 on the Anaconda platform in Jupyter Notebook. The computation ran on a home computer with an Intel Core i3-4005U processor (1.7 GHz) and 8 GB of installed RAM (random access memory). Figure 4 illustrates the machine-learning process. First, we labeled cyber-attack events as one (1) and non-cyber-attack events as zero (0). Then, we divided the dataset into two parts, X and Y, with X containing only numerical phasor data and Y containing only the zero-one event values. After that, we defined X_train, X_test, Y_train, and Y_test. We applied the Extra-Trees, XGBoost, Random-forest, Bagging, and Logistic-regression methods to these train and test sets to obtain confusion-matrix-based prediction outcomes.

Figure 4.

Workflow of grid phasor data-oriented machine-learning computation for analyzing cyber-attacks.
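
The X/Y separation and the train/test split described above might look like the following dependency-free sketch. It is an assumption-laden illustration: the study reports using Sklearn but does not state the split ratio, so the 25% test fraction, the seed, the helper name `stratified_split`, and the placeholder feature rows are all hypothetical. Only the class counts (28 cyber-attack and 13 natural events) come from the text.

```python
import random

def stratified_split(X, Y, test_fraction, seed=0):
    """Split rows class by class so both classes appear in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(Y)):
        idx = [i for i, y in enumerate(Y) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(data, ids):
        return [data[i] for i in ids]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(Y, train_idx), pick(Y, test_idx))

# 41 scenarios as in the text: 28 cyber-attack events (1), 13 natural events (0).
Y = [1] * 28 + [0] * 13
X = [[float(i)] for i in range(41)]  # placeholder single-feature rows

X_train, X_test, Y_train, Y_test = stratified_split(X, Y, test_fraction=0.25)
```

Splitting per class is one way to keep the minority class (natural events) represented in a test set this small.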

The Extra-Trees classifier uses Gini impurity (equation (1)) and entropy (equation (2)).⁵⁹ Each tree in the Extra-Trees method selects the feature on which to split the data according to a mathematical criterion, here the information gain of equation (3). In these equations, C is the number of classes, f_j is the fraction of samples belonging to class j (event type: cyber-attack or non-cyber-attack), S is the set of samples whose entropy is computed, A is the attribute used for the split, and S_v is the subset of S in which A takes the value v.

Gini Impurity = \sum_{j = 1}^{C} f_{j} (1 - f_{j})

(1)

Entropy = - \sum_{j = 1}^{C} f_{j} \log (f_{j})

(2)

Gain (S, A) = Entropy (S) - \sum_{v \in Values (A)} \frac{| S_{v} |}{| S |} Entropy (S_{v})

(3)
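
As a worked illustration of the three split criteria, the sketch below evaluates them on a tiny hypothetical label set. The function names and sample data are assumptions, and base-2 logarithms are chosen here only for readability (the equations leave the base unspecified).

```python
import math
from collections import Counter

def class_fractions(labels):
    """Fraction f_j of samples in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    return sum(f * (1 - f) for f in class_fractions(labels))

def entropy(labels):
    return -sum(f * math.log2(f) for f in class_fractions(labels))

def information_gain(labels, attribute_values):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset S_v."""
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

labels = [1, 1, 0, 0]                   # 1 = cyber-attack, 0 = natural event
perfect_split = ["a", "a", "b", "b"]    # attribute separates the classes
useless_split = ["a", "b", "a", "b"]    # attribute carries no information
```

With two balanced classes the Gini impurity is 0.5 and the entropy is 1 bit; an attribute that separates the classes completely recovers the full bit of gain, while an uninformative one yields zero.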

The XGBoost algorithm, developed by Chen and Guestrin, is one of the popular algorithms in machine learning.⁶⁰ The XGBClassifier used in this study is an XGBoost model for classification. Equation (4) represents the objective function, which combines the loss and regularization terms.⁶¹ Here, F = objective function, g_i = first derivative of the Mean Square Error (MSE), h_i = second derivative of the MSE, w = score vector on the leaves, λ = penalty, T = number of leaves, ρ = complexity of a leaf, and I_j = sample data of leaf node j.

F^{t} \approx \sum_{j = 1}^{T} [(\sum_{i \in I_{j}} g_{i}) w_{j} + \frac{1}{2} (\sum_{i \in I_{j}} h_{i} + λ) w_{j}^{2}] + ρ T

(4)
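
To make the objective concrete, the sketch below evaluates the standard second-order form of the XGBoost objective (quadratic in the leaf score w_j) for a single hypothetical leaf. The function names and numbers are assumptions; the closed-form minimizer w* = -G / (H + λ) is the usual result from Chen and Guestrin's derivation, not something stated in this paper.

```python
def structure_objective(leaves, lam, rho):
    """Evaluate the objective for a list of leaves.

    Each leaf is (g_values, h_values, w): per-sample first and second
    derivatives of the loss, and the leaf score w_j.
    """
    total = 0.0
    for g_values, h_values, w in leaves:
        G, H = sum(g_values), sum(h_values)
        total += G * w + 0.5 * (H + lam) * w ** 2
    return total + rho * len(leaves)

def optimal_leaf_weight(g_values, h_values, lam):
    """Score that minimizes one leaf's term: w* = -G / (H + lambda)."""
    return -sum(g_values) / (sum(h_values) + lam)

# Hypothetical gradients for the samples falling in one leaf.
leaf_g, leaf_h = [-2.0, -1.0], [1.0, 1.0]
w_star = optimal_leaf_weight(leaf_g, leaf_h, lam=1.0)
objective = structure_objective([(leaf_g, leaf_h, w_star)], lam=1.0, rho=0.5)
```

Here G = -3 and H = 2, so with λ = 1 the best leaf score is 1.0 and the objective, including the complexity term for one leaf, is -1.0.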

The random forest also uses the Gini index and entropy. The Random-forest classifier is given in equation (5).⁶² Here, ${\hat{C}}_{rf}^{B} (x)$ is the random-forest classification, ${\hat{C}}_{b} (x)$ is the class prediction of the b-th random-forest tree, rf denotes random forest, and b runs from 1 to B.

{\hat{C}}_{rf}^{B} (x) = \text{majority vote} \{ {\hat{C}}_{b} (x) \}_{1}^{B}

(5)
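
The majority vote in equation (5) can be illustrated in a few lines. The votes below are hypothetical, and the helper name is an assumption; a real forest would obtain each vote from a fitted tree.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by most trees for one sample x."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical votes of B = 5 trees for one event (1 = cyber-attack, 0 = natural).
votes = [1, 0, 1, 1, 0]
prediction = majority_vote(votes)
```

Three of the five trees vote for class 1, so the forest labels this event as a cyber-attack.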

In the Bootstrap-aggregating or bagging process, a bootstrap sample $(X_{1}^{*}, Y_{1}^{*}), \ldots, (X_{n}^{*}, Y_{n}^{*})$ is first constructed by drawing n times with replacement from the data.⁶³ In the second step, the bootstrapped estimator ${\hat{g}}^{*} (\cdot)$ is computed using equation (6). In the third step, steps 1 and 2 are repeated M times, giving ${\hat{g}}^{* k} (\cdot)$ for k = 1, …, M, and the bagged estimator ${\hat{g}}_{Bag} (\cdot)$ is obtained from equation (7).

{\hat{g}}^{*} (\cdot) = h_{n} ((X_{1}^{*}, Y_{1}^{*}), \ldots, (X_{n}^{*}, Y_{n}^{*})) (\cdot)

(6)

{\hat{g}}_{Bag} (\cdot) = M^{- 1} \sum_{k = 1}^{M} {\hat{g}}^{* k} (\cdot)

(7)
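
The three bagging steps can be sketched as below. For brevity the base estimator here is a simple sample mean returning a single number rather than a fitted classifier, and the data, seed, and M = 200 are hypothetical choices; the structure (resample, estimate, average) is what the equations describe.

```python
import random

def bagged_estimate(data, base_estimator, M, seed=0):
    """Average a base estimator over M bootstrap samples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(M):
        # Step 1: draw n times with replacement from the data.
        sample = [rng.choice(data) for _ in range(n)]
        # Step 2: compute the bootstrapped estimator on this sample.
        estimates.append(base_estimator(sample))
    # Step 3: the bagged estimator is the mean of the M bootstrap estimators.
    return sum(estimates) / M

def mean(values):
    return sum(values) / len(values)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
bagged = bagged_estimate(data, mean, M=200)
```

Because each bootstrap estimate fluctuates around the sample mean of 3.0, averaging many of them gives a value close to 3.0 with reduced variance.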

Strictly speaking, the logistic regression method used in our study is a model of class probability rather than a classifier in itself. In this method, the sigmoid (σ) function (equation (8)) maps the linear combination z to a probability. Here, e is the base of the natural logarithm. The logit function transforms the probability p into the linear predictor (equation (9)),⁶⁴ and it is the inverse of the sigmoid function. Solving the logit (equation (9)) for p gives equation (10), which is used to calculate the value of p. Here, β₀ = constant, β₁, …, β_k = coefficients of the predictor or independent variables, and X₁, …, X_k = predictor or independent variables.

σ (z) = \frac{1}{1 + e^{- z}}

(8)

logit (p) = \log [\frac{p (x)}{1 - p (x)}] = β_{0} + β_{1} X_{1} + β_{2} X_{2} + \ldots + β_{k} X_{k} + ε

(9)

p = \frac{\exp (β_{0} + β_{1} X_{1} + β_{2} X_{2} + \ldots + β_{k} X_{k} + ε)}{1 + \exp (β_{0} + β_{1} X_{1} + β_{2} X_{2} + \ldots + β_{k} X_{k} + ε)}

(10)
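
The relationship between equations (8), (9), and (10) can be checked numerically. In the sketch below the coefficients and feature values are hypothetical and the error term ε is omitted; the point is only that the sigmoid and the logit undo each other.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def predict_probability(beta0, betas, x):
    """p = sigmoid(beta0 + sum(beta_k * x_k)), i.e. equation (10) without epsilon."""
    z = beta0 + sum(b * xk for b, xk in zip(betas, x))
    return sigmoid(z)

# Hypothetical coefficients and feature values: z = -1 + 2*1 + 0.5*(-2) = 0.
p = predict_probability(beta0=-1.0, betas=[2.0, 0.5], x=[1.0, -2.0])
```

A linear predictor of zero maps to p = 0.5, the usual decision boundary when the probability is thresholded to assign a class.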

We used confusion matrices for Extra-Trees, XGBoost, Random-forest, Bagging, and Logistic regression to visualize cyber-attack prediction. Here, True-positive (TP) is ‘(1,1)’, False-negative (FN) is ‘(1,0)’, True-negative (TN) is ‘(0,0)’, and False-positive (FP) is ‘(0,1)’. The predicted label is the column, and the true label is the row. In addition to the confusion matrix, we determined the model score, accuracy, precision, recall, F1, and classification report. The accuracy, precision, recall, and F1 formulas are shown in equations (11), (12), (13), and (14), respectively.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(11)

Precision = \frac{TP}{TP + FP}

(12)

Recall = \frac{TP}{TP + FN}

(13)

F1 = \frac{2}{\frac{1}{Precision} + \frac{1}{Recall}} = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}

(14)
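
The metric formulas above can be exercised on a small example. The labels and predictions below are hypothetical and are not taken from the study's test set; the helper names are assumptions, and the sketch does not guard against empty denominators.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = cyber-attack and 0 = natural event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = metrics(tp, tn, fp, fn)
```

In this example one attack is missed and one natural event is flagged, giving TP = 4, TN = 2, FP = 1, FN = 1, so accuracy is 0.75 while precision, recall, and F1 are all 0.8.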

After applying the mentioned models, we used the “ROC-AUC curve” (ROC: Receiver-operating-characteristic; AUC: Area under the ROC Curve) to understand the True-positive-rate (TPR) and False-positive-rate (FPR) based performance of the algorithms used. Formulas for TPR and FPR are shown in equations (15) and (16), respectively. Figure 5 illustrates the concept of the ROC-AUC curve. To test this entire machine learning process, a sample code of FDIA forecasts was taken,⁶⁵ and the errors noticed in the existing code were corrected in this study.

TPR = \frac{TP}{TP + FN}

(15)

FPR = \frac{FP}{FP + TN}

(16)

Figure 5.

ROC-AUC curve.
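
As a rough illustration of how a ROC curve and its AUC follow from equations (15) and (16), the sketch below sweeps a decision threshold over hypothetical classifier scores and integrates with the trapezoidal rule. The function names, labels, and scores are assumptions; the study itself reports using Sklearn for this step.

```python
def roc_points(y_true, y_score):
    """Sweep a threshold over the scores and collect (FPR, TPR) points."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = cyber-attack) and predicted attack probabilities.
y_true = [1, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.2]

points = roc_points(y_true, y_score)
area = auc(points)
```

Five of the six attack/natural pairs are ranked correctly by these scores, which is why the area comes out to 5/6.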

Digital-twin for visualizing cyber-attacking grid effects

In this process, we employed the Directed-Graph (DiGraph) method for knowledge-graph-enabled digital twin visualization. The DiGraph-based knowledge graph is used to analyze how cyber-attacks impact SCADA operations. The digital twin-based cyber-attack visualization illustrates the effects of FDIA, RTCI, and SRA attacks on SCADA networks. For the digital twin computation, we utilized Python libraries such as regular expression (RE), NetworkX, Numpy, Pandas, Matplotlib, and the Natural Language Toolkit (NLTK).

Triple_DiGraph = (A_{k}, B_{k}, E_{k}, d : A_{k} \to B_{k})

(17)

Algorithm 1: Pseudocode of DT-ML-SCADA-CTI

NetworkX offers data structures to represent various kinds of directed graphs with self-loops and parallel edges.⁶⁶ We have used the Trigraph-Orthography or triple concept of Knowledge-Graph with DiGraph. With the help of two studies,^67,68 we created the formula of DiGraph used in our study, which is shown in equation (17), and according to this formula, a sample of DiGraph is shown in Figure 6. Here, d is A→B direction, A is the source, B is the target, and E is an edge.

Figure 6.

Triple DiGraph concept for knowledge-graph enabled digital-twin visualization.

Algorithm 1 is developed to visualize the effects of cyber-attacks on SCADA networks, with equation (17) used to generate the DiGraph plot. The algorithm illustrates the impact of cyber-attacks on substation network switches, IED-relay devices, circuit breakers, and power transmission lines. Algorithm 1 is divided into seven steps, and in each step equation (17) is applied to produce the DiGraph plot. Steps 1 to 7 demonstrate, respectively, SLG faults, the percentage of SLG faults, IED-relay device disabled issues, IED-relay device faults, tripping-command issues, commands against IED-relay devices, and open-breaker problems of circuit breakers. In the algorithm, according to Figure 2, these effects are set as the Target (B), and the FDIA, RTCI, and SRA cyber-attacks are set as the Source (A). According to equation (17), each step of the algorithm has a Source (A) and a Target (B) with a Relation (E) and a Direction (d), which creates a digital-twin-based relationship between cyber-attacks and their effects.
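
The source/target/relation structure of equation (17) can be sketched with a plain adjacency map. This is not Algorithm 1 itself: the study builds and plots its graph with NetworkX (where `DiGraph.add_edge(source, target, relation=...)` would play the same role), whereas the sketch below stays dependency-free. The relation label `"causes"` and the helper names are assumptions; the attack/effect pairs are the ones reported in the results section.

```python
from collections import defaultdict

# (source A, relation E, target B) triples; direction d runs from A to B.
triples = [
    ("FDIA", "causes", "SLG fault"),
    ("FDIA", "causes", "Tripping command issue"),
    ("SRA", "causes", "SLG fault"),
    ("SRA", "causes", "IED-relay disabled"),
    ("SRA", "causes", "IED-relay fault"),
    ("RTCI", "causes", "Command against IED-relay"),
    ("RTCI", "causes", "Open breaker"),
]

def build_digraph(triples):
    """Adjacency map: source -> {target: relation}."""
    graph = defaultdict(dict)
    for source, relation, target in triples:
        graph[source][target] = relation
    return graph

def sources_of(graph, target):
    """All attacks with a directed edge into the given effect."""
    return sorted(a for a, targets in graph.items() if target in targets)

graph = build_digraph(triples)
slg_sources = sources_of(graph, "SLG fault")
```

Querying the graph by target answers the analyst's question directly, for example which attacks lead to an SLG fault.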

Result and analysis

In this section, we present an in-depth analysis of the experimental results obtained from the DT-ML-SCADA-CTI approach, which was evaluated across 41 distinct scenarios based on SCADA system operations. These scenarios included 28 cyber-attack events and 13 non-cyber-attack events, allowing us to test the model’s ability to effectively differentiate between malicious and normal system behaviors. To address the primary research questions of these experiments, we applied five machine learning algorithms (Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression) to a dataset transformed from raw SCADA data. These methods were evaluated based on several performance metrics: accuracy, precision, recall, and F1-score.

The results revealed that the Extra-Trees, XGBoost, Random Forest, and Logistic Regression methods all achieved perfect accuracy (1.00), demonstrating their robust ability to predict cyber-attacks. However, the Bagging method performed slightly worse, with an accuracy of 0.90. The comparative analysis of these algorithms is further detailed in Table 3, which shows their respective performance metrics, and in Figure 8, which provides a classification report. The results of the proposed approach contribute to solving the critical problem of cyber-attack detection in industrial control systems, offering a reliable solution for real-time threat prediction. For further context, the recent studies discussed in the literature review describe similar approaches, and comparing them with our findings reinforces the novelty and effectiveness of our method in tackling cybersecurity challenges in SCADA systems.

Table 3.

Performance scores of cyber-attack prediction.

Prediction method	Train score	Accuracy score	Precision score	Recall score	F1 score
Extra-trees	1.00	1.00	1.00	1.00	1.00
XGBoost	1.00	1.00	1.00	1.00	1.00
Random-forest	1.00	1.00	1.00	1.00	1.00
Bagging	0.97	0.90	0.89	1.00	0.93
Logistic regression	1.00	1.00	1.00	1.00	1.00
Average score of hybrid machine learning	0.99	0.98	0.97	1.00	0.98
Percentage of accuracy score of hybrid machine learning model		98 %
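
The averaging behind the last rows of Table 3 can be reproduced with a short calculation. The per-method scores below are copied from the table; the variable names are assumptions. The unrounded means come out as 0.994, 0.98, 0.978, 1.00, and 0.986, which suggests that the reported precision (0.97) and F1 (0.98) averages were truncated to two decimals rather than rounded.

```python
# Scores from Table 3: train, accuracy, precision, recall, F1.
table3 = {
    "Extra-trees":         [1.00, 1.00, 1.00, 1.00, 1.00],
    "XGBoost":             [1.00, 1.00, 1.00, 1.00, 1.00],
    "Random-forest":       [1.00, 1.00, 1.00, 1.00, 1.00],
    "Bagging":             [0.97, 0.90, 0.89, 1.00, 0.93],
    "Logistic regression": [1.00, 1.00, 1.00, 1.00, 1.00],
}
columns = ["train", "accuracy", "precision", "recall", "f1"]

# Column-wise mean over the five methods.
averages = {
    name: sum(row[i] for row in table3.values()) / len(table3)
    for i, name in enumerate(columns)
}
```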

We demonstrated cyber-attack prediction using confusion matrices for Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression. The results, shown in Figure 7, highlight the true positive (TP) predictions. In addition to the confusion matrix, we evaluated the machine learning algorithms using the ROC-AUC curve method (equations (15) and (16), and Figure 5). SVM^69–71 and DTC^72–74 have been used for cyber-attack prediction on SCADA systems in recent studies. In the ROC-AUC curve computation, the five machine-learning methods (Extra-Trees, XGBoost, Random Forest, Bagging, and Logistic Regression) were compared with Support Vector Machine (SVM) and Decision Tree Classifier (DTC), as shown in Figure 8. The results revealed that the five methods outperformed SVM and DTC. Figure 9 displays the ROC-AUC curve, comparing SVM and DTC with the five methods: ETC (Extra-Trees Classifier), XGBC (XGBoost Classifier), RF (Random Forest), BC (Bagging Classifier), and LR (Logistic Regression).

Figure 7.

Confusion-matrix-based cyber-attack analysis using hybrid machine-learning model.

Figure 8.

Performance scores of machine learning process.

Based on the digital twin computation from Algorithm 1, the effects of FDIA, RTCI, and SRA cyber-attacks on the network are illustrated in Figure 10. This visualization demonstrates how these attacks impact SCADA operations. The digital twin-based visualizations help analysts easily identify the disruptions caused by cyber-attacks. Figure 10 shows that FDIA and SRA result in SLG faults on power transmission lines (Line 1 and Line 2). Figure 10 also shows that SLG faults can occur from both non-cyber-attack events and cyber-attacks like FDIA and SRA, disrupting energy transmission. However, only the SRA attack led to the IED-relay disabled issue. Figure 10 shows that the SRA cyber-attack affects IED-relay devices, causing IED-relay faults. The FDIA attack leads to a tripping command issue, resulting in SLG faults. The RTCI attack causes a “command against IED-relay” issue, disrupting energy transmission and triggering open-breaker faults in circuit breakers 3 and 4, as shown in step 6 of Algorithm 1.

Figure 9.

ROC-AUC curve for comparing the performances of machine-learning methods.

Figure 10.

Steps of Algorithm 1 for visualizing digital-twin of cyber-attacking grid effects.

Thus, in this study, we have highlighted the overall effects of cyber-attacks on SCADA networks through knowledge-graph-based digital-twin computation. We plan to improve this experiment further in future studies.

Digital-twin-based computation for cyber-attacks on SCADA networks is a new research area in which several studies^36,40,45,51 have been carried out in recent years. We compared our approach with the studies listed in Table 4. Upon review, we found that one study focused on FDIA,⁴⁰ another on command injection,⁵¹ two studies on DoS/DDoS,^45,51 and one study on malware.³⁶ In contrast, our research addresses FDIA, RTCI, and SRA. Based on the information in Table 4, Figure 11 illustrates the performance evaluation of the existing models compared to our proposed approach, where positive factors (“Yes”) are assigned a value of one and negative factors (“No”) a value of zero so that they can be correlated with accuracy. Our study stands out by incorporating a digital-twin-based simulation to model the flow and impact of cyber-attacks on SCADA networks, an aspect not explored in the other studies.

Table 4.

Comparison between our study and other studies.

Study	Cyber-attacks of SCADA networks for digital-twin process					Prediction of cyber-attacks	Digital-twin of cyber-attacking grid effects	Accuracy
Study	FDIA	RTCI	SRA	DoS	Malware	Prediction of cyber-attacks	Digital-twin of cyber-attacking grid effects	Accuracy
Proposed CTI approach	Yes	Yes	Yes	No	No	Yes	Yes	0.98
Digital-twin reference scheme (2023;⁴⁰)	Yes	No	No	No	No	No	No	0.97
ELEGANT (2021;⁴⁵)	No	No	No	Yes	No	Yes	No	0.98
Stacked-ensemble (2022;⁵¹)	No	Yes	No	Yes	No	Yes	No	0.927
EPICTWIN (2022;³⁶)	No	No	No	No	Yes	No	No	Not computed
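The one/zero encoding of Table 4 can be sketched in a few lines of Python. How Figure 11 aggregates the encoded factors is not stated, so summing them into a single count per study is an assumption made here for illustration; the names `COLUMNS`, `TABLE_4`, `encode`, and `factor_score` are hypothetical.

```python
# Yes/No columns of Table 4, in order (the accuracy column is left out).
COLUMNS = ["FDIA", "RTCI", "SRA", "DoS", "Malware",
           "Prediction of cyber-attacks", "Digital twin of grid effects"]

# Rows transcribed from Table 4.
TABLE_4 = {
    "Proposed CTI approach":         ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"],
    "Digital-twin reference scheme": ["Yes", "No", "No", "No", "No", "No", "No"],
    "ELEGANT":                       ["No", "No", "No", "Yes", "No", "Yes", "No"],
    "Stacked-ensemble":              ["No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "EPICTWIN":                      ["No", "No", "No", "No", "Yes", "No", "No"],
}


def encode(row):
    """Map positive factors ("Yes") to 1 and negative factors ("No") to 0."""
    return [1 if cell == "Yes" else 0 for cell in row]


def factor_score(row):
    """Assumed aggregation: the number of positive factors in a row."""
    return sum(encode(row))


for study, row in TABLE_4.items():
    print(f"{study}: encoded={encode(row)} score={factor_score(row)}")
```

Under this assumed aggregation the proposed approach has five positive factors, and each compared study has between one and three.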

Figure 11.

Performance evaluation of the other models compared with the proposed study.

Conclusion

Smart grids use SCADA systems to distribute electricity, but cyber-attacks on the network-switch device and IED-relay devices can disrupt power distribution. Three main types of attacks (FDIA, RTCI, and SRA) can cause various faults, including SLG faults, open-breaker faults, and IED-relay failures. A knowledge-graph-based digital twin (DT) visualization method is needed to understand these impacts. This study introduces a novel DT-ML-SCADA-CTI technique to predict and visualize the effects of these attacks. The process involves data transformation followed by machine learning algorithms (Extra-Trees, XGBoost, Random Forest, Bagging, Logistic Regression) for prediction, and a DiGraph-based method to assess the impact on SCADA operations.

This CTI approach needs further development. In particular, the digital-twin process requires machine-learning-based prediction at every step of Algorithms 1 and 2. DoS and DDoS attacks are absent from this study and should be included in future work, and the treatment of malware injection can also be improved. SCADA networks are related to smart-meter networks, where energy prediction and load forecasting are essential.^34,76 Connecting federated learning with the proposed model to secure these computations could be tested as a way to improve grid security. Moreover, future studies can examine how combining grid blockchain and federated learning can improve the proposed model.⁷⁷ Further development is therefore needed for this study to play a significant role in cyber-attack analysis on SCADA networks.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the Zayed University Research Incentive Fund (RIF) research grant code number: R21109.

ORCID iDs

Nabeel Al-Qirim

Hussam Al Hamadi

References

1. de Azambuja AJG, Giese, Schützer, et al. Digital twins in industry 4.0 – opportunities and challenges related to cyber security. Procedia CIRP 2024; 121: 25–30.

2. Billey, Wuest. Energy digital twins in smart manufacturing systems: a case study. Robot Comput Integrated Manuf 2024; 88: 102729.

3. Hasan, Akhtaruzzaman, Kabir, et al. Evolution of industry and blockchain era: monitoring price hike and corruption using BIoT for smart government and industry 4.0. IEEE Trans Ind Inf 2022; 18(12): 9153–9161. DOI: 10.1109/TII.2022.3164066.

4. Sadeq, Kabir, Haque, et al. A cloud of things (CoT) approach for monitoring product purchase and price hike. In: Lecture notes in networks and systems. Singapore: Springer, 2020.

5. Jafari, Kavousi-Fard, Chen, et al. A review on digital twin technology in smart grid, transportation system and smart city: challenges and future. IEEE Access 2023; 11: 17471–17484.

6. Sifat MMH, Das, Choudhury. Design, development, and optimization of a conceptual framework of digital twin electric grid using systems engineering approach. Elec Power Syst Res 2024; 226: 109958.

7. Jiang, et al. A novel application architecture of digital twin in smart grid. J Ambient Intell Hum Comput 2022; 13: 3819–3835.

8. Mahmoud, Semeraro, Abdelkareem, et al. Designing and prototyping the architecture of a digital twin for wind turbine. International Journal of Thermofluids 2024; 22: 100622.

9. Chen, Fang. Harnessing digital twin and IoT for real-time monitoring, diagnostics, and error correction in domestic solar energy storage. Energy Rep 2024; 11: 3614–3623.

10. Lopez, Rubio, Alcaraz. Digital twins for intelligent authorization in the B5G-enabled smart grid. IEEE Wireless Commun 2021; 28(2): 48–55.

11. Ortiz, Cardenas, Wool. SCADA world: an exploration of the diversity in power grid networks. Proc ACM Meas Anal Comput Syst 2024; 8(1): 1–32.

12. Schneider. Course modularization applied: the interface system and its implications for sequence control and data analysis. Washington, DC: ERIC, 1973.

13. de Vries. Representation of science texts in knowledge graphs. Groningen, The Netherlands: University of Groningen, 1989.

14. Bakker. Knowledge graphs: representation and structuring of scientific knowledge. Enschede: University of Twente, 1987.

15. van den Berg. Knowledge graphs and logic: one of two kinds. Enschede: University of Twente, 1993.

16. Akroyd, Mosbach, Bhave, et al. Universal digital twin - a dynamic knowledge graph. Data-Centric Engineering 2021; 2: e14.

17. Zhu, Chen, Zhou, et al. A knowledge graph based construction method for Digital Twin Network. In: 2021 IEEE 1st international conference on digital twins and parallel intelligence (DTPI). Beijing, China: IEEE, 2021, pp. 362–365.

18. Zhao, Zhang, Chen, et al. Digital twin-enabled dynamic spatial-temporal knowledge graph for production logistics resource allocation. Comput Ind Eng 2022; 171: 108454.

19. Morris. Power system datasets. Tennessee: Mississippi State University and Oak Ridge National Laboratory, 2014.

20. Naeem, Ullah, Srivastava. Classification of intrusion cyber-attacks in smart power grids using deep ensemble learning with metaheuristic-based optimization. Expert Syst 2024; 42: e13556.

21. Zaman, Upadhyay, Lung. Validation of a machine learning-based IDS design framework using ORNL datasets for power system with SCADA. IEEE Access 2023; 11: 118414–118426.

22. Panthi, Kanti Das. Intelligent intrusion detection scheme for smart power-grid using optimized ensemble learning on selected features. International Journal of Critical Infrastructure Protection 2022; 39: 100567.

23. Pan, Morris, Adhikari. Developing a hybrid intrusion detection system using data mining for power systems. IEEE Trans Smart Grid 2015; 6(6): 3104–3113.

24. Pan, Morris, Adhikari. Classification of disturbances and cyber-attacks in power systems using heterogeneous time-synchronized data. IEEE Trans Ind Inf 2015; 11(3): 650–662.

25. Pan, Morris, Adhikari. A specification-based intrusion detection framework for cyber-physical environment in electric power system. Int J Netw Secur 2015; 17(2): 174–188.

26. Borges Hink, Beaver, Buckner, et al. Machine learning for power system disturbance and cyber-attack discrimination. In: 2014 7th international symposium on resilient control systems (ISRCS). Denver, CO, USA: IEEE, 2014, pp. 1–8.

27. Paridari, O’Mahony, El-Din Mady, et al. A framework for attack-resilient industrial control systems: attack detection and controller reconfiguration. Proc IEEE 2018; 106(1): 113–128.

28. Toctaquiza, Carrión, Jaramillo. An electrical power system reconfiguration model based on optimal transmission switching under scenarios of intentional attacks. Energies 2023; 16(6): 2879.

29. Cómbita, Giraldo, Cárdenas, et al. Response and reconfiguration of cyber-physical control systems: a survey. In: 2015 IEEE 2nd Colombian conference on automatic control (CCAC). Manizales, Colombia: IEEE, 2015, pp. 1–6. DOI: 10.1109/CCAC.2015.7345181.

30. Stasinopoulos, Ntantogian, Xenakis. Commix: detecting and exploiting command injection flaws. In: Dept. Digit. Syst., Univ. Piraeus, Piraeus, Greece. Greenwich, CT: White Paper, 2015.

31. Berting. The Ukraine cyber war: an analysis of the Russian cyber doctrine for comparing the Ukraine National Cyber Security Strategy with those of other western countries. Prague: Charles University Digital Repository, 2023.

32. Alam, Khan, Chowa SBZ, et al. Use of blockchain to prevent distributed denial-of-service (DDoS) attack: a systematic literature review. Berlin: Springer, 2023, pp. 39–47.

33. Kazmi SHA. Threat intelligence with non-IID data in federated learning enabled intrusion detection for SDN: an experimental study. In: 2023 24th international Arab conference on information technology (ACIT), Ajman, United Arab Emirates, December 6–8, 2023, pp. 1–6.

34. Alam, Khan, Chowa SBZ, et al. Use of blockchain to prevent distributed denial-of-service (DDoS) attack: a systematic literature review. In: Chinara, et al. (eds). Advances in Distributed Computing and Machine Learning. Singapore: Springer, 2023, vol 660, pp. 39–47.

35. Saad, Faddel, Youssef, et al. On the implementation of IoT-based digital twin for networked microgrids resiliency against cyber attacks. IEEE Trans Smart Grid 2020; 11(6): 5138–5150.

36. Kandasamy, Venugopalan, Wong, et al. An electric power digital twin for cyber security testing, research and education. Comput Electr Eng 2022; 101: 108061.

37. Martinez-Ruedas, Flores-Arias, Moreno-Garcia, et al. A cyber–physical system based on digital twin and 3D SCADA for real-time monitoring of olive oil mills. Technologies 2024; 12(5): 60.

38. Hasan, Rayhan Kabir, Abdullah, et al. 3D relative directions based evolutionary computation for UAV-to-UAV interaction in swarm intelligence enabled decentralized networks. Alex Eng J 2023; 85: 104–113.

39. Kabir, Alam, Allayear, et al. Relative direction: location path providing method for allied intelligent agent. In: Singh, et al. (eds). Relative direction: location path providing method for allied intelligent agent. Singapore: Springer, 2018, vol 905, pp. 381–391.

40. Khan MMS, Giraldo, Parvania. Real-Time cyber attack localization in distribution systems using digital twin reference model. IEEE Trans Power Deliv 2023; 38(5): 3238–3249.

41. Kumar, Aljuhani, et al. Digital twin-driven SDN for smart grid: a deep learning integrated blockchain for cybersecurity. Sol Energy 2023; 263: 111921.

42. Sifat, Choudhury, Das, et al. Towards electric digital twin grid: technology and framework review. Energy and AI 2023; 11: 100213.

43. Yan, Kunhui. Novel cyber-physical architecture for optimal operation of renewable-based smart city considering false data injection attacks: digital twin technologies for smart city infrastructure management. Sustain Energy Technol Assessments 2024; 65: 103733.

44. Ali, Kaddoum, et al. A smart digital twin enabled security framework for vehicle-to-grid cyber-physical systems. IEEE Trans Inf Forensics Secur 2023; 18: 5258–5271.

45. Sousa, Arieiro, Pereira, et al. ELEGANT: security of critical infrastructures with digital twins. IEEE Access 2021; 9: 107574–107588.

46. Danilczyk, Sun. ANGEL: an intelligent digital twin framework for microgrid security. In: 2019 North American power symposium (NAPS). Wichita, KS, USA: IEEE, 2019, pp. 1–6.

47. Al-Qirim, Bani-Hani, Majdalawieh, et al. DiGraph enabled digital twin and label-encoding machine learning for SCADA network’s cyber attack analysis in industry 5.0. IEEE Open J Commun Soc 2024; 99: 1.

48. Kallesøe, Wisniewski. Cyber-attack and fault detection using a digital twin of the controller software. IFAC-PapersOnLine 2024; 58: 97–102.

49. Mustofa, Rafiquzzaman, Hossain NUI. Analyzing the impact of cyber-attacks on the performance of digital twin-based industrial organizations. Journal of Industrial Information Integration 2024; 41: 100633.

50. Krishnaveni, Chen, Sathiyanarayanan, et al. CyberDefender: an integrated intelligent defense framework for digital-twin-based industrial cyber-physical systems. Berlin: Cluster Computing. Springer Science and Business Media LLC, 2024.

51. Varghese, Ghadim, Balador, et al. Digital twin-based intrusion detection for industrial control systems. In: 2022 IEEE international conference on pervasive computing and communications workshops and other affiliated events (PerCom workshops). Pisa, Italy: IEEE, 2022, pp. 611–617.

52. Oleinikova, Mutule, Putnins. PMU measurements application for transmission line temperature and sag estimation algorithm development. In: 2014 55th international scientific conference on power and electrical engineering of Riga Technical University (RTUCON). Riga, Latvia: IEEE, 2014, pp. 181–185.

53. Vanfretti, Baudette, White. Monitoring and control of renewable energy sources using synchronized phasor measurements. Renewable Energy Integration 2017; 2017: 419–434.

54. Dixit. A review on optimal placement of phasor measurement unit (PMU). System Assurances 2022; 2022: 513–530.

55. Fritz. Simulation of man in the middle attack on smart grid testbed. In: SoutheastCon. Huntsville, AL, USA: IEEE, 2019, pp. 1–6.

56. Valli. SCADA forensics with Snort IDS. In: Proceedings of WORLDCOMP2009, Security and Management 2009. Las Vegas, Nevada, USA: CSREA Press, July 2009, pp. 618–621.

57. Basagiannis. Implementation experiences from smart grid security applications and outlook on future research. Smart Grid Security 2015: 283–306.

58. Sheeraz, Hanif Durad, Tahir, et al. Advancing Snort IPS to achieve line rate traffic processing for effective network security monitoring. IEEE Access 2024; 12: 61848–61859.

59. Sharma, Kumar, Jain. Breast cancer prediction based on neural networks and extra tree classifier using feature ensemble learning. Measurement: Sensors 2022; 24: 100560.

60. Chen, Guestrin. Xgboost: a scalable tree boosting system. Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining 2016; 2016: 785–794.

61. Punuri, Kuanar, Kolhar, et al. Efficient net-XGBoost: an implementation for facial emotion recognition using transfer learning. Mathematics 2023; 11(3): 776.

62. Hastie. Random forests. In: The elements of statistical learning. Springer Series in Statistics, 2009, pp. 587–604.

63. Bühlmann. Bagging, boosting and ensemble methods. Handbook of Computational Statistics 2012; 2011: 985–1022.

64. Jaafar. Logistic regression in analyzing the determinants of university students' mathematics performance. Mathematical Sciences and Informatics Journal 2021; 2(2): 67–75.

65. Afroz. Smart grid false data injection attack prediction. San Francisco: Kaggle.

66. Aric, Swart, Chult. Exploring network structure, dynamics, and function using NetworkX. In: SCIPY 08. United States: Pasadena, 2008.

67. Webstar SBG. The path space of a directed graph. Proc Am Math Soc 2014; 142(1): 213–225.

68. Chaudhri. An introduction to knowledge graphs. Stanford: Stanford AI Lab, 2021.

69. Rajesh, Satyanarayana. Evaluation of machine learning algorithms for detection of malicious traffic in SCADA network. J Electr Eng Technol 2022; 17: 913–928.

70. Alqudhaibi, Albarrak, Aloseel, et al. Predicting cybersecurity threats in critical infrastructure for industry 4.0: a proactive approach based on attacker motivations. Sensors 2023; 23(9): 4539.

71. Ahakonye LAC, Nwakanma, Lee, et al. Efficient classification of enciphered SCADA network traffic in smart factory using decision tree algorithm. IEEE Access 2021; 9: 154892–154901.

72. Ahakonye LAC, Nwakanma, Lee, et al. SCADA intrusion detection scheme exploiting the fusion of modified decision tree and Chi-square feature selection. Internet of Things 2023; 21: 100676.

73. Polat, Türkoğlu, Polat, et al. Multi-Stage learning framework using convolutional neural network and decision tree-based classification for detection of DDoS pandemic attacks in SDN-based SCADA systems. Sensors 2024; 24(3): 1040.

74. Upadhyay, Manero, Zaman, et al. Intrusion detection in SCADA based power grids: recursive feature elimination model with majority vote ensemble algorithm. IEEE Trans Netw Sci Eng 2021; 8(3): 2559–2574.

75. Akhtaruzzaman, Hasan, Kabir, et al. HSIC bottleneck based distributed deep learning model for load forecasting in smart grid with a comprehensive survey. IEEE Access 2020; 8: 222977–223008.

76. Hasan, Ahmed, Islam, et al. Malaysia energy outlook from 1990 to 2050 for sustainability: business-as-usual and Alternative-policy Scenarios based economic projections with AI based experiments. Energy Strategy Rev 2024; 53: 101360.

77. Dhasaratha, Hasan, Islam, et al. Data privacy model using blockchain reinforcement federated learning approach for scalable internet of medical things. CAAI Trans Intell Technol 2024.