Sage Journals: Discover world-class research

Abstract

Software-Defined Networking (SDN) is an approach that manages the network through software by separating its control plane from the underlying forwarding plane. In support of a global digital network, multi-domain SDN architecture emerges as a viable solution. However, the complex and ever-evolving nature of network threats in a multi-domain environment presents a significant security challenge for controllers in detecting abnormalities. Moreover, multi-domain anomaly detection is a daunting problem due to the need to process vast amounts of data from diverse domains. Deep learning models have gained popularity for extracting high-level feature representations from massive datasets. In this work, a novel supervised deep neural network architecture, the LD-BiHGA (Low Dimensional Bi-channel Hybrid GAN Attention) system, is designed to learn class-specific features for accurate anomaly detection. Two asymmetric GANs are employed for learning the normal and abnormal network flows separately. Then, to extract more relevant features, a bi-channel attention mechanism is added. This is the first study to introduce a hybrid architecture that merges bi-channel hybrid GANs with attention models for anomaly detection in a multi-domain SDN environment and that effectively handles real-time unbalanced data. The suggested architecture demonstrates its effectiveness on three benchmark datasets, achieving an average accuracy improvement of 7.225% on balanced datasets and 3.335% on imbalanced datasets compared to previous intrusion detection system (IDS) architectures in the literature.

Keywords

Hybrid GAN, intrusion detection, deep learning, attention model, dimensionality reduction, denoising autoencoder

1 Introduction

Communication networks have seen tremendous advances in recent years due to the advent of numerous mobile technologies. Packet-switching network-based applications such as Google, WhatsApp, Facebook, and Netflix have been instrumental in this. The cost-effective management of these geographically dispersed networks with vendor-defined tools and a scarcity of network administrators has long been a challenge. SDN resolves this concern by reconfiguring the entire network architecture as a software-based packet-switching network. SDN deploys controllers, which control the entire network by issuing commands to network switches [1].

The multi-domain controller is used for dealing with large-scale networks. A hierarchy of control planes is maintained in multi-domain SDN controller architecture [2]. According to this paradigm, each domain has one controller for handling local events, while global events are handled by a higher level in the hierarchy, as indicated in Fig. 1. The multi-domain controller aids in the optimization of each SDN domain controller’s services for network supervision and high-performance application administration.

Fig. 1

Hierarchical multi-domain SDN topology.

Although SDN mitigates traditional network attacks, it remains vulnerable to new attacks that target the centralized controller. Hijacking the controller allows attackers to exploit many network devices. Thus, future SDN research should concentrate on detecting abnormalities to enhance network security and simplify network management.

A multi-domain controller must provide cross-domain anomaly detection as well as secure communication between controllers and switches to ensure domain controller security. Because domains need not share identical features, an invariant feature space must be learned. The proposed work addresses this issue through machine learning methodologies.

Deep learning [3] is well suited to handling the complex data flows within each domain, as it can learn robust feature representations through multiple layers between the raw input and the target [4]. Although many algorithms, such as the Restricted Boltzmann Machine (RBM) [5], Autoencoder (AE) [6], Generative Adversarial Network (GAN) [7], Long Short-Term Memory (LSTM) [8], Recurrent Neural Network (RNN) [9], and Convolutional Neural Network (CNN) [10], are employed in deep learning strategies, each has its own pros and cons. As a result, the proposed study uses a supervised learning-based hybrid deep learning technique to construct an effective model.

The ability of deep learning networks to extract key features from input has improved due to the discovery of an attention mechanism ([11, 12]). The proposed approach employs attention mechanisms in a novel hybrid form for making considerable progress in the anomaly detection method.

The following are the significant contributions of this work:

A novel deep learning multi-domain anomaly detection system is designed with a simple dimensionality reduction technique, hybrid GAN, and an attention mechanism to extract optimal features.

Ensemble dimensionality reduction at the domain level improves network management by reducing SDN traffic and computational complexity at the SDN controller.

A dual attention channel is used for the selection of the most relevant discriminating attack and benign features.

The efficiency of the model is demonstrated on both balanced and unbalanced supervised datasets.

The aim of this research is to achieve precise anomaly detection across multiple domains. The following hypotheses are explored in this study:

The IDS within the multi-domain SDN controller exhibits superior anomaly detection capabilities, especially for anomalies that impact multiple domains simultaneously within the hierarchical multi-domain SDN architecture.

Deep learning methodologies are expected to be more effective in processing extensive data from different controllers within the multi-domain SDN controller.

The hierarchical SDN architecture facilitates prompt response from at least one controller when a service flow sends a request to enhance the performance of multi-controller networking in large-scale networks.

The rest of this article is organized as follows: Works related to the proposed mechanism are reviewed in section 2. The system model, its security requirements, and its working methodology are presented in section 3. Section 4 provides a description of the proposed anomaly detection system. Section 5 presents an empirical study of the methodology. The findings and comparative analysis of the proposed system are provided in section 6. Finally, the conclusion is drawn along with suggested future enhancements.

2 Related works

Many studies rely on traditional machine learning techniques for tasks such as dimensionality reduction and classification. Traditional machine learning techniques have proven inefficient in handling large-scale network flow data. As a result, there is a need for the utilization of deep learning approaches for feature learning and classification. Deep learning, a branch of artificial neural network architectures, is favored for its ability to quickly learn and reveal hidden patterns in input distributions. In the past six years, related research has been explored from two distinct perspectives: one involving the use of deep learning methods for dimensionality reduction, and the other focusing on their application in the field of intrusion detection.

2.1 Deep learning methods for feature learning

Despite the utilization of various deep learning networks for feature dimensionality reduction, recent research has shown a preference for Autoencoders and their variants, including Denoising Autoencoders (DAE), Sparse Autoencoders, and GANs. These choices are based on their compatibility with other deep neural network architectures, their alignment with unsupervised learning paradigms, and their ability to ensure non-linear transformations.

An autoencoder-based wrapper feature selection framework was designed by Sharan Ramjee and Aly El Gamal [13]. The model’s hypothesis proposed that the significance of features was influenced by two characteristics: relevance and redundancy. Irrelevant features were considered insignificant as their removal does not lead to a reduction in classification accuracy. Conversely, redundant features were also considered insignificant since they can be inferred or approximated from other features, regardless of whether their relationship with these features is linear or non-linear, as long as the other features remain present. The framework was used in conjunction with the exclusive ranker model for the removal of features that were not found necessary for the classification mechanism and autoencoders for the elimination of correlated features. Consequently, backward feature selection was employed to improve efficiency. The classifier belongs to a category of supervised deep learning techniques tailored for specific applications. In another work [14], the authors combined unsupervised autoencoders for learning features separately among benign and anomaly flows with a supervised 1D convolution layer to reveal feature dependency among channels. Then, with the help of fully connected layers as the classifier, convolved features were refined for the promotion of other patterns’ interactions.

The following research works primarily focused on stacked autoencoders. The work in [15] extracted the corresponding features while maintaining the essential information using a stacked autoencoder. The authors identified outliers based on their significant reconstruction error and restored them. Additionally, the method supported two criteria, Grubbs and PauTa, to facilitate the detection of outliers among the benign data. It enhanced the detection of both isolated and continuous outliers. A stacked sparse autoencoder, exploited by Binghao Yan and Guodong Han [16], was used for the extraction of high-dimensional features. The optimal deep sparse features obtained were highly discriminative and significantly accelerated the classification process when used with three classifiers: support vector machine (SVM), random forests (RF), and K-nearest neighbor (KNN).

A denoising autoencoder was used in some works ([17–19]) to introduce noise into the neural network and avoid learning the identity function. In the study [17], an ensemble approach was employed to perform dimensionality reduction on network traffic data, aiming to improve the detection of attack data. The process involved using statistical machine-learning techniques to select a substantial number of features. These features were then validated using a denoising autoencoder to ensure their effectiveness and relevance in the task. An intelligent defect diagnosis method [19] was introduced for the extraction of typical features from a huge amount of unlabeled data using an unsupervised denoising autoencoder. Only a small amount of labeled data was required for fine-tuning the deep neural network. This deep neural network architecture achieved improved performance in the classification of faults. In another work, a Robust Software Modeling Tool (RSMT) [18] was used to examine the runtime performance of web apps. By utilizing a stacked denoising autoencoder, RSMT successfully learned a low-dimensional representation of the observed raw web application features. Additionally, this approach enabled the automatic detection of attacks on web applications. End-to-end deep learning techniques were paired with autonomic runtime behavior monitoring and web application description to produce reliable, high-level output from raw feature input.

Based on the literature review of dimensionality reduction, it is evident that high-computation models are frequently employed for reducing feature dimensions. Consequently, there is a need for a simple model with low computational overhead, yet capable of maintaining high performance. In this proposed research, we leverage a basic unsupervised denoising autoencoder. This approach intentionally introduces corruption into the input data and then trains the model to enhance its robustness through this process.

2.2 Deep learning methods for intrusion detection

Machine learning algorithms have been used to build most IDS. Meanwhile, deep learning approaches are also being investigated to achieve high accuracy and efficiency, particularly when handling vast amounts of data. Despite the publication of numerous papers on deep learning algorithms for intrusion detection, they can be summarized into major categories, including CNN, LSTM, RBM, Autoencoder, and Attention.

The authors in [20] preferred CNN for IDS. It was a multilayered discriminative neural network comprising convolution and pooling layers stacked one over the other. In [21], a HYBRID-CNN model was devised to facilitate dual-channel feature extraction in the SDN-based Smart Grid for identifying anomalous flow. It memorized the global features filtered by a deep neural network (DNN) while training on the one-dimensional data. Then, it generalized these features with the help of the CNN network. This model exploited both the DNN’s global learning and the CNN’s local generalization. In another work, the authors suggested a near real-time SDN security solution [22] for safeguarding its controller from Distributed Denial of Service (DDoS) attacks. The authors used CNN to detect DDoS attacks and to mitigate traffic impairment. The game theory (GT)-based mitigation was helpful in restoring SDN’s normal activities, providing a reasonable methodology for dealing with internal and external DDoS attacks.

LSTM is a type of RNN that learns and classifies time series data to forecast long-term dependencies more accurately than vanilla RNNs, as indicated by various studies [23]. In another study [24], the LSTM-FUZZY network was presented for the detection and mitigation of DDoS and Portscan attacks in SDN environments. The authors developed a semi-supervised LSTM for predicting normal network activities by utilizing IP flows. The attacks were then recognized by coupling Bienaymé-Chebyshev’s inequality with fuzzy logic. The authors preserved the network operations by using automated mitigation policies. Anomalous flows were dropped using McNemar’s test with a significance level of 5%. The test was conducted to assess the null hypothesis that the marginal frequencies are equal. In another work [25], a Bidirectional LSTM (BiLSTM) was effectively employed to improve the overall anomaly detection rate. It also significantly reduced the number of false alarms for each attack class.

RBM is a two-layered energy-based model that has its scalar energies adjusted throughout the learning process for the achievement of the desired qualities [26]. In [27], the authors presented a hybrid deep learning framework for the enhancement of the reliability of the SDN. This framework was employed with a multi-objective flow routing mechanism and an upgraded RBM with SVM in SDN. In another work [28], the authors created an Anomaly Network Intrusion Detection System with the help of DRBM (Deep Restricted Boltzmann Machine), for the detection of a new attack pattern. This system was tested with the Information Security Center of Excellence (ISCX) dataset, which was a well-balanced dataset that could help eliminate biases in the RBM network’s training.

Unlike conventional deep learning networks, GAN generates adversarial data by applying non-linear transformations to actual data. A powerful GAN-based framework for detecting anomalies in unknown data was presented in [29]. Using a specially designed loss function and the Wasserstein distance, it focused on multiple intermediary layers for closing the gap between latent and actual space. In [30], researchers combined RF with GAN and introduced GAN-RF to identify optimal solutions for anomaly detection in imbalanced datasets. They used GAN to generate minority class samples, leading to an improvement in overall accuracy. In [31], the authors suggested a unique GAN-based deep learning model that uses BiLSTM as a generator and CNN as a discriminator to produce synthetic electrocardiograms (ECGs) identical to actual ECG data.

In recent years, deep learning-based anomaly detection has been combined with attention mechanisms for improved performance, rather than relying on traditional standalone methodologies, as in [11] and [32]. In [33], the authors designed an architecture that combined BiLSTM with an attention mechanism as well as multiple convolutional layers. The local and data packet features were retrieved using these convolutional and BiLSTM layers. After that, the attention mechanism performed feature learning on the network flow vector on its own, eliminating the need for feature engineering. This architecture addressed the issue of lower accuracy commonly associated with traditional machine learning techniques. The Attention for Network Intrusion Detection model presented in another work [34] was a modified version of the transformer model that leveraged timeslot-derived features to help identify real-time network intrusions. The transformer was originally employed in language translation and has proven to be effective in detecting attacks.

The multi-domain controller not only assesses the classifier’s performance but also indirectly evaluates the quality of the reduced data generated by the domain controllers. The effectiveness of dimensionality reduction becomes evident through its impact on the subsequent predictive tasks’ performance. Attackers in the real world use numerous network domains to carry out their attacks. Multi-domain attack detection with huge traffic, without any deterioration in network performance, is a critical task. Furthermore, it is worth noting that many studies have primarily focused on enhancing classifier performance using balanced datasets, which may not accurately reflect real-world traffic scenarios. Hence, the purpose of this study is to improve results for unbalanced datasets. Additionally as shown in Table 1, some research relies on unsupervised models for feature reduction, using the same model for attack classification based on reconstruction error. On the other hand, others prefer using a separate supervised classifier model for classification. To address these challenges, a hybrid deep learning approach is proposed, which improves feature reduction and classification accuracy. Unlike most studies that avoid making hypotheses about system relationships or constraints, this research incorporates hypotheses to provide explanations. Consequently, recent popular high-performance deep learning models have been used in the proposed multi-domain controller attack detection framework.

Table 1
A summary of existing dimensionality reduction and intrusion detection approaches in the last six years

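| Category | DL method | Reference | Features | Dataset | Learning methodology | Year |
| --- | --- | --- | --- | --- | --- | --- |
| Dimensionality Reduction | Autoencoder | [14] | Statistical | KDDCUP99, UNSW-NB15, CICIDS2017 | Unsupervised, Supervised | 2020 |
| Dimensionality Reduction | Autoencoder | [13] | Statistical | MNIST, Reuters, Wisconsin Breast Cancer, RadioML2016.10b | Unsupervised, Supervised | 2020 |
| Dimensionality Reduction | Stacked Autoencoder | [15] | Statistical | ADIAC, Self-collected | Unsupervised | 2019 |
| Dimensionality Reduction | Stacked Autoencoder | [16] | Statistical | KDD99, NSL-KDD, Kyoto2006 | Unsupervised, Supervised | 2018 |
| Dimensionality Reduction | Denoising Autoencoder | [17] | Statistical | KDD99 | Unsupervised | 2022 |
| Dimensionality Reduction | Denoising Autoencoder | [19] | Statistical | Motor bearing vibration signals | Unsupervised | 2017 |
| Dimensionality Reduction | Denoising Autoencoder | [18] | Statistical | Self-collected | Unsupervised/Semi-Supervised | 2019 |
| Intrusion Detection | CNN | [21] | Statistical | UNSW_NB15, KDDCup 99 | Supervised | 2020 |
| Intrusion Detection | CNN | [22] | Statistical + payload | CICDDoS 2019, Self-collected | Supervised | 2020 |
| Intrusion Detection | LSTM | [25] | Statistical | NSL-KDD | Supervised | 2019 |
| Intrusion Detection | LSTM | [24] | Statistical + payload | CICDDoS 2019, Self-collected | Semi-supervised | 2020 |
| Intrusion Detection | RBM | [27] | Statistical | KDDCup 99, CMU, Self-collected | Unsupervised, Supervised | 2018 |
| Intrusion Detection | RBM | [28] | Statistical | ISCX | Unsupervised, Supervised | 2018 |
| Intrusion Detection | GAN | [29] | Statistical | KDDCUP99 | Unsupervised | 2019 |
| Intrusion Detection | GAN | [30] | Statistical | CICIDS2017 | Unsupervised | 2020 |
| Intrusion Detection | Attention | [33] | Statistical | NSL-KDD | Supervised | 2020 |
| Intrusion Detection | Attention | [34] | Statistical | CICIDS2017 | Supervised | 2020 |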

3 System model and security bottlenecks

In this section, the system overview model and working techniques are presented, along with security system bottlenecks for the proposed LD-BiHGA approach.

3.1 System model

The network is logically viewed as a collection of domains based on geographical area with each domain controlled by a domain controller while the underlying domain controllers are controlled by a multi-domain controller.

Each element in the forwarding plane of every domain is directly programmed by its corresponding logical domain controller. These domain controllers communicate with the forwarding plane elements through open southbound interfaces, such as the IETF’s ForCES (Forwarding and Control Element Separation) and the ONF’s OpenFlow [35]. As shown in Fig. 2, the domain controllers transmit their arbitrated results, in the form of reduced preprocessed data, to the multi-domain controller. The multi-domain controller then validates the data and reports the result back to the domain controllers, which in turn route legitimate data between domains. The proposed system thus encapsulates its functionality in three components: a multi-domain controller, domain controllers, and domain forwarding plane elements such as switches, routers, and access points.

Fig. 2

System process sequence diagram of LD-BiHGA.

3.2 System security bottlenecks

3.2.1 Unpredictability of SDN domain data size

In a hierarchical multi-domain SDN environment, the presence of multiple domains introduces the possibility of varying data sizes, depending on the real-time demands of the network. Additionally, these SDN domains are susceptible to various attacks, underscoring the crucial role of the multi-domain controller as the central defender for the entire network. As the multi-domain controller manages SDN domains, detecting abnormalities in both small and large datasets becomes a critical concern for its effective operation.

3.2.2 Imbalanced SDN domain data

Some domains may contain predominantly legitimate data, while others may possess minimal legitimate data. In other words, variations in the proportion of legitimate data are inevitable, necessitating the domain controller’s ability to adapt to such diversity. Moreover, achieving reliable anomaly detection becomes challenging when attack data makes up a substantially smaller proportion of the traffic.

3.3 Working methodology

The workflow of the proposed system is briefly described in this section. At each geographical domain, the devices at the physical layer send their access requests via the network. The corresponding domain controller extracts the features from the flow-table statistics of the request, as shown in Fig. 3. Each domain controller then performs an ensemble dimensionality reduction process in three steps (data preprocessing, feature selection, and feature extraction), which formats the data for further processing and reduces its dimensionality.

The reduced features of each domain are then provided to the multi-domain controller for anomaly detection. Leveraging the Bi-channel Hybrid GAN Attention (BiHGA) mechanism, the multi-domain controller identifies abnormalities in three distinct phases: dissociation, detection, and reporting. Normal and abnormal data are separated from domain-specific reduced features using the dissociation process. In the detection phase, the BiHGA algorithm comprises three stages for identifying abnormal flows: feature extraction on relevant classes, feature merging, and classification. The reporting phase is used for conveying anomaly reports to the control plane, and when an anomaly is detected, the multi-domain controller sends an alarm to its associated domain controller, instructing it to discard the packet.

Fig. 3

Working methodology of the proposed LD-BiHGA approach.

4 LD-BiHGA System Details

As shown in Fig. 4, the proposed system performs the functionalities of two primary components: domain controller dimensionality reduction and multi-domain controller traffic classification.

Fig. 4

LD-BiHGA system functionality.

4.1 Ensemble dimensionality reduction approach in domain controller

In one of our previous works [17], we highlighted that ensemble-based dimensionality reduction techniques have demonstrated their ability to generate improved features for subsequent classification stages. Consequently, the preprocessing stage in our proposed work employs a similar dimensionality reduction technique through the following steps:

4.1.1 Feature preprocessing

Feature encoding [36] is accomplished using one-hot encoding, and feature normalization [37] is performed via min-max normalization.
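The two preprocessing operations can be illustrated with a minimal sketch in plain Python. The helper names `one_hot` and `min_max`, and the sample protocol and byte-count columns, are hypothetical and chosen only for illustration; the paper does not specify its implementation.

```python
def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return encoded, categories


def min_max(column):
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical flow records: a categorical protocol field and a numeric byte count.
protocols = ["tcp", "udp", "tcp", "icmp"]
byte_counts = [200.0, 50.0, 350.0, 50.0]

encoded, categories = one_hot(protocols)
scaled = min_max(byte_counts)

# Each preprocessed row is the indicator columns followed by the scaled value.
rows = [enc + [s] for enc, s in zip(encoded, scaled)]
```

Here the first record becomes `[0.0, 1.0, 0.0, 0.5]`: the indicator for `tcp` in the sorted category list, followed by its byte count scaled between the column minimum and maximum.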

4.1.2 Feature selection

Highly correlated features are identified and eliminated using Spearman’s cross-correlation technique. This method is particularly recommended for feature selection due to its resistance to outliers [38] and its ability to capture non-linear (monotonic) relationships, which allows it to prioritize low-correlated features. Initially, the current SDN feature values $(f_{1}, f_{2}, \dots, f_{n})$ within each domain are ranked, and the actual feature values are substituted with their corresponding ranks. The ranks map the values to a common range, ensuring that they are normalized and can be compared across different variables. Then, Pearson’s correlation [39] is computed among the ranked feature variables $(r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}})$ to obtain the Spearman’s rank correlation value. Pearson’s correlation measures the linear association between two variables and provides a value between -1 and 1, where 1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates no correlation. The Spearman’s cross-correlation coefficient $\rho_{r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}}}$ is thus the covariance of the ranked feature variables, $cov (r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}})$, normalized by their respective standard deviations $(\sigma_{r_{f_{1}}}, \sigma_{r_{f_{2}}}, \dots, \sigma_{r_{f_{n}}})$. This normalization accounts for the variability in the ranked variables and ensures that the correlation coefficient is not influenced solely by the magnitude of the feature values. Equation 1 defines the formula to calculate Spearman’s cross-correlation coefficient $r_{s}$: $r_{s} = \rho_{r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}}} = \frac{cov (r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}})}{(\sigma_{r_{f_{1}}}, \sigma_{r_{f_{2}}}, \dots, \sigma_{r_{f_{n}}})}$ (1) where $cov (r_{f_{1}}, r_{f_{2}}, \dots, r_{f_{n}})$ denotes the covariance of the ranked feature variables and $(\sigma_{r_{f_{1}}}, \sigma_{r_{f_{2}}}, \dots, \sigma_{r_{f_{n}}})$ represents their corresponding standard deviations. For any single pair of ranked features $r_{f_{i}}$ and $r_{f_{j}}$, this is the usual pairwise form $cov (r_{f_{i}}, r_{f_{j}}) / (\sigma_{r_{f_{i}}} \sigma_{r_{f_{j}}})$.
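The ranking and correlation steps can be sketched for pairs of features as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the sample feature columns, and the elimination threshold of 0.9 are all assumptions, since the paper does not state the threshold it uses.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Covariance divided by the product of the standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's coefficient: Pearson's correlation applied to the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |rho| with every already kept feature <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical flow statistics; byte_count grows monotonically with packet_count.
features = {
    "packet_count": [1, 2, 3, 4, 5],
    "byte_count": [10, 40, 90, 160, 250],
    "duration": [3, 1, 4, 1, 5],
}
selected = drop_correlated(features)
```

Because `byte_count` is a monotonic (though non-linear) function of `packet_count`, their rank correlation is 1 and the second feature is dropped, while `duration` is retained. A constant feature column has zero rank variance and would need to be filtered out before calling `pearson`.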

4.1.3 Feature extraction

Following the removal of extraneous features, the denoising autoencoder is used to extract useful features, as it performs better on noisy data and avoids overfitting concerns. Bottleneck features are extracted using pre-trained networks [40]; relevant features are therefore taken from a bottleneck layer of an appropriate size, which enhances prediction accuracy.

The detection model is designed to handle m-sample data, which is represented as a set X = {x₁, x₂, …, x_m}. Each x_i is an n-dimensional feature vector containing n distinct features, and $\hat{x_{i}}$ is the noisy (corrupted) vector associated with each x_i. The main process begins with the encoding function (2), in which an encoder transforms the noisy input vector into a hidden vector. $h_{i} = r (W {\hat{x}}_{i} + b)$ (2)

In equation (2), a weight matrix W and a bias value b map each noisy vector $\hat{x_{i}}$ to the corresponding hidden vector h_i through the activation function r. The activation function r applied in equation (2) is the Rectified Linear Unit (ReLU) [41], as defined in equation (3). ReLU introduces non-linearity into the model by activating certain neurons while deactivating others. $r (t) = \max (0, t)$ (3) The ReLU function takes an input value t and returns the larger of 0 and t. If t is positive or zero, the ReLU function returns t itself. However, if t is negative, the ReLU function returns 0. The hidden vector h_i obtained from the encoder is then passed through a decoder, which aims to reconstruct the input as an output vector y_i using the following mapping function: $y_{i} = r (W^{'} h_{i} + b^{'})$ (4)

where W′ and b′ represent the weight matrix and bias value, respectively, used to map the hidden vector h_i to an output vector y_i. Again, the ReLU activation function r is applied to introduce non-linearity to the decoding process. The goal of training DAE is to reduce the deviation between the noisy input $\hat{x_{i}}$ and the reconstructed output y_i vectors. This is achieved using the mean squared loss function [42]: $L (\hat{x}, y) = \frac{1}{m} \sum_{i=1}^{m} \lVert \hat{x_{i}} - y_{i} \rVert^{2}$ (5)

The loss function (5) calculates the mean squared difference between the original noisy input vectors $\hat{x_{i}}$ and the corresponding reconstructed output vectors y_i, for all m samples in the dataset. By minimizing this loss, the DAE aims to improve its ability to denoise the input and achieve more accurate reconstructions. This, in turn, helps the model to learn useful representations and features from the data. These steps illustrated in Figure 5 ensured a reduction in feature space without compromising detection accuracy.
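Equations (2) to (5) can be traced in a short forward-pass sketch. It is illustrative only: the layer sizes, the Gaussian corruption, the random initialisation, and all names are assumptions, and no training step (gradient update) is shown.

```python
import random


def relu(t):
    """Equation (3): r(t) = max(0, t)."""
    return max(0.0, t)


def affine_relu(weights, bias, vector):
    """Compute r(W v + b) for a weight matrix given as a list of rows."""
    return [relu(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def corrupt(x, noise_std, rng):
    """Build the noisy vector by adding Gaussian noise (an assumed corruption)."""
    return [v + rng.gauss(0.0, noise_std) for v in x]


def reconstruction_loss(noisy_inputs, outputs):
    """Equation (5): mean over samples of the squared Euclidean distance."""
    total = sum(sum((a - b) ** 2 for a, b in zip(x_hat, y))
                for x_hat, y in zip(noisy_inputs, outputs))
    return total / len(noisy_inputs)


rng = random.Random(0)
n_features, n_hidden = 4, 2  # illustrative sizes; the bottleneck is narrower

# Untrained, randomly initialised parameters (W, b) and (W', b').
W = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
W_prime = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_features)]
b_prime = [0.0] * n_features

X = [[0.1, 0.9, 0.3, 0.5], [0.8, 0.2, 0.7, 0.4]]
X_hat = [corrupt(x, 0.05, rng) for x in X]           # noisy inputs
H = [affine_relu(W, b, x_hat) for x_hat in X_hat]    # equation (2)
Y = [affine_relu(W_prime, b_prime, h) for h in H]    # equation (4)
loss = reconstruction_loss(X_hat, Y)                 # equation (5)
```

The hidden vectors in `H` are the bottleneck features that would be passed on as the reduced representation once the parameters have been trained to minimise `loss`.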

Fig. 5

Workflow model of dimensionality reduction in SDN domain controllers.

Thus, to obtain the reduced feature subset, we employed a predictive denoising autoencoder model with a single bottleneck layer [17]. As a result, our ensemble dimensionality reduction strategy produces a simple and cost-effective model for the subsequent classification stages.

4.2 Anomaly detection in the multi-domain controller

The multi-domain controller receives the reduced feature data from its underlying domain controllers. Since, in real time, hackers can attack the network from various domains, anomalies must be detected in the complete, collected feature set. As the classification accuracy depends on the quality of features, one of the essential tasks of this work is to extract the optimal features. Two extraction strategies, namely a hybrid GAN and an attention network, are used in the BiHGA system for this purpose, as shown in Fig. 6.

Fig. 6

Bi-channel Hybrid GAN Attention (BiHGA) based IDS.

4.2.1 Data preprocessing

Data partition: To start with, the reduced features of each domain are collected and partitioned according to their two data flows. Benign and attack data are collected separately and provided to the BiHGA model, enabling it to learn features individually for each network flow category.
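The partition step amounts to splitting labelled records into two groups, one per channel. A minimal sketch, assuming each record is a pair of a reduced feature vector and a label string; the label value `"benign"` and the sample records are hypothetical.

```python
def partition_by_label(records, benign_label="benign"):
    """Split (features, label) records into benign and attack feature lists."""
    benign, attack = [], []
    for features, label in records:
        if label == benign_label:
            benign.append(features)
        else:
            attack.append(features)
    return benign, attack


# Hypothetical reduced feature vectors collected from several domains.
records = [
    ([0.12, 0.80], "benign"),
    ([0.95, 0.10], "ddos"),
    ([0.15, 0.77], "benign"),
    ([0.90, 0.05], "portscan"),
]
benign_flows, attack_flows = partition_by_label(records)
```

Every non-benign label is grouped into the attack partition here, which matches a binary benign/attack split; a multi-class variant would keep one partition per attack type.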

4.2.2 BiHGA algorithm

Figure 6 illustrates the structure of the BiHGA model, which incorporates an intelligent strategy for feature learning and classification. SDN, in general, encompasses flows for both benign and malicious traffic. Thus, this model contains individual feature extraction for benign and attack flows, feature attention, feature fusion, and attack classification.

Feature extraction: Using a hybrid GAN approach with an LSTM generator and a CNN discriminator, as demonstrated in [31], the features for normal and abnormal flows are extracted separately. Because a GAN can automatically create realistic data in a semi-supervised manner [43], it is capable of generating synthetic data features for both attack and typical SDN flows. It thereby addresses the problems caused by suspicious features that are not always clearly benign or malicious. The following are the steps involved in training the GAN model:

Using the random latent space, the LSTM generator produced data that resembled real data. The purpose of this generator was to learn how real data is distributed while capturing patterns that evolve over time.

Generated or actual data were supplied at random to the CNN discriminator, which acted as a classifier while also extracting in-depth features. It identified whether the given data originated from the generator or from the real dataset; that is, it estimated the distribution of the real data.

The generator loss was determined using both networks, namely the LSTM generator and the CNN discriminator. The losses from both networks were combined and backpropagated to the generator so that it learned the patterns of real data. When the generator performed poorly, the discriminator’s task was simple and its losses were minimal.

The procedure was repeated until the CNN discriminator could no longer distinguish generated data from actual data.

These two networks play a two-player minimax game, as stated in [44], with diminished data X = {x_i, i = 1, 2, …, n} and random latent space Z = {z_i, i = 1, 2, …, n}, and are trained using Equation 6: $\min_{LSTM} \max_{CNN} V (CNN, LSTM) = \mathbb{E}_{x \sim p_{diminished\,data} (X)} [\log (CNN (x))] + \mathbb{E}_{z \sim p_{z} (Z)} [\log (1 - CNN (LSTM (z)))]$ (6)

The first term in Equation 6, $\mathbb{E}_{x \sim p_{diminished\,data} (X)} [\log (CNN (x))]$, involves real data samples x drawn from the real data distribution p_{diminished data} (also known as the data domain X). The conditional discriminator CNN evaluates each real sample x and returns a measure of confidence that x is real. The logarithm is applied to this value to turn it into a log-likelihood measurement.

The second term in Equation 6, $\mathbb{E}_{z \sim p_{z} (Z)} [\log (1 - CNN (LSTM (z)))]$, involves latent variables z drawn from the latent space p_z (also known as the noise domain Z). The standard GAN generator LSTM takes these latent variables z as input and generates synthetic data samples y = LSTM(z). The conditional discriminator CNN then evaluates the synthetic samples y and returns a measure of confidence that the input is real. Since these samples are synthetic, 1 - CNN(LSTM(z)) represents the discriminator’s confidence that the input is fake. The logarithm is again applied to turn this value into a log-likelihood measurement.
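To make the two terms of Equation 6 concrete, the following minimal sketch evaluates the value function on a handful of discriminator outputs. The score lists are hypothetical numbers chosen for illustration, and `gan_value` is an illustrative helper rather than part of the authors' code.

```python
import math


def gan_value(d_real, d_fake):
    """Empirical value: mean log D(x) plus mean log(1 - D(G(z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A confident discriminator: real samples scored near 1, generated ones near 0.
confident = gan_value([0.99, 0.98, 0.97], [0.02, 0.01, 0.03])
# A fooled discriminator: every sample, real or generated, is scored 0.5.
fooled = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(confident, fooled)
```

The value approaches zero when the discriminator separates the two sources well and drops to -2 log 2 when every score is 0.5, which corresponds to the stopping condition in which the discriminator can no longer tell generated data from actual data.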

LSTM generator LSTM, a form of RNN, acts as the generator [45]. RNNs are mostly employed for time-series data since they can follow and recognize patterns that develop over time. However, a plain RNN may suffer from the vanishing gradient problem, which prevents the model from converging to a small loss. To address this issue, LSTM is employed as the generator. LSTM is an effective technique for improving network traffic forecasting [46] and is applied to each flow attribute to forecast the behavior of the network [24]. Bi-LSTM learns time-correlated features from the forward and backward directions simultaneously, which improves prediction accuracy, and is therefore employed in this investigation [47]. Both forward and backward LSTM layers are used in this model. Vertical flow is a one-way stream from the input layer to the hidden layer and then to the output layer, whereas horizontal flow computes the forward LSTM hidden vector $\vec{h_{t}}$ and the backward LSTM hidden vector $\overset{\leftarrow}{h_{t}}$ simultaneously [48]. The final output of Bi-LSTM combines the two hidden states as follows: $\vec{h_{t}} = \text{Bi-LSTM} (x_{t}, \vec{h_{t - 1}})$ (7) $\overset{\leftarrow}{h_{t}} = \text{Bi-LSTM} (x_{t}, \overset{\leftarrow}{h_{t + 1}})$ (8) $y_{t} = W_{\vec{h_{y}}} \vec{h_{t}} + W_{\overset{\leftarrow}{h_{y}}} \overset{\leftarrow}{h_{t}} + b_{y}$ (9)

where Bi-LSTM denotes the LSTM functions, $W_{\vec{h_{y}}}$ and $W_{\overset{\leftarrow}{h_{y}}}$ are the weights of the forward and backward LSTM respectively, and b_y is the bias of the output layer.

CNN discriminator A convolutional neural network was employed as the discriminator because it extracts in-depth features effectively and handles spatial data well. The input was converted from a one-dimensional vector into two-dimensional matrices to obtain deeper features. A sliding convolutional kernel was used to capture the local features of a given network flow. This kernel also limits the scalability problem that arises when neurons are fully connected. The network consisted of convolutional and flattening layers. Convolution-based feature extraction is expressed in Equation 10: $F_{b} = f (\sum (W^{ab} \otimes F_{a - 1}))$ (10) where F_b denotes the convolutional layer output for the b^th feature map, f denotes the activation function, F_{a-1} is the output of layer a-1, ⊗ denotes the convolution operation, and W^{ab} denotes the weights of the a^th layer for the b^th feature map. The extracted features were then used to make the final predictions. The main purpose of the convolutional layers is to recognize spatial patterns and reduce input noise. Then, in the fully connected layer, each neuron was connected directly to every neuron in both the previous and the next layer.
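Equation 10 can be illustrated with a small, dependency-free sketch of a one-dimensional convolution over the previous layer's feature maps. The "valid" sliding window, the LeakyReLU slope of 0.01, and the example kernel are assumptions made for illustration; the paper does not state these details.

```python
def leaky_relu(value, alpha=0.01):
    """Assumed activation f; the slope alpha is an illustrative choice."""
    return value if value > 0 else alpha * value


def conv1d_feature_map(prev_maps, kernels, activation=leaky_relu):
    """Compute one output feature map F_b from the previous layer's maps.

    prev_maps: list of input feature maps F_{a-1}, each a list of floats.
    kernels:   one kernel (list of weights) per input map, playing the role of W^{ab}.
    """
    kernel_size = len(kernels[0])
    out_len = len(prev_maps[0]) - kernel_size + 1  # "valid" positions only
    output = []
    for pos in range(out_len):
        total = 0.0
        # Sum the kernel responses over every input map, as in the sum of Equation 10.
        for fmap, kernel in zip(prev_maps, kernels):
            for offset, weight in enumerate(kernel):
                total += weight * fmap[pos + offset]
        output.append(activation(total))
    return output


flow_features = [[0.1, 0.4, 0.9, 0.3, 0.2]]  # one hypothetical input map
edge_kernel = [[-1.0, 0.0, 1.0]]             # hypothetical kernel of width three
feature_map = conv1d_feature_map(flow_features, edge_kernel)
print(feature_map)
```

Because the same kernel is reused at every position, the number of weights depends on the kernel width rather than on the input length.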

In general, the GAN discriminator can extract features without interference from noise [49]. The features extracted by the CNN discriminator were therefore passed on to the subsequent attention layer for further processing.

Feature attention The convolved features extracted by both the benign and the attack hybrid GAN models were emphasized using an LSTM-based self-attention strategy, which eliminated the vanishing gradient problem. The input F_i (k) contained queries and keys of dimension d_k and values of dimension d_v [50]. The queries were packed into a matrix Q, and the corresponding keys and values into matrices K and V. The weights on the values were calculated with the softmax function, which normalizes them to a probability distribution. This type of attention is known as "Scaled Dot-Product Attention", as stated in Equation 11: $Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V$ (11)

Feature fusion and classification The attention features extracted from the benign and attack channels of BiHGA were fused to obtain the combined vector F_i (k), as shown in Equation 12: $F_{i} (k) = F_{benign} + F_{attack}$ (12)
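Equations 11 and 12 can be sketched in plain Python as follows. This is a minimal illustration under two assumptions: self-attention is modeled by passing the same feature matrix as Q, K, and V, and the "+" in Equation 12 is read literally as an element-wise sum. The feature values are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[col] for w, v in zip(weights, values))
            for col in range(len(values[0]))
        ])
    return attended


def fuse(f_benign, f_attack):
    """Element-wise sum of the two channel outputs (one reading of Equation 12)."""
    return [[b + a for b, a in zip(row_b, row_a)] for row_b, row_a in zip(f_benign, f_attack)]


benign_features = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical convolved features
attack_features = [[0.5, 0.5], [0.5, 0.5]]
f_benign = scaled_dot_product_attention(benign_features, benign_features, benign_features)
f_attack = scaled_dot_product_attention(attack_features, attack_features, attack_features)
fused = fuse(f_benign, f_attack)
print(fused)
```

Each attended row is a convex combination of the value rows, so the attention step reweights the features without changing their scale.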

Following feature fusion, the network flows were classified using the fully connected layer. This layer extracted high-level features related to the output layer’s mapping. It utilized the sigmoid activation function for binary classification, making the final prediction. The BiHGA model’s pseudocode is shown in Algorithm 1.

5 Empirical study

In this study, three datasets, described in Section 5.1, were used to evaluate the effectiveness of the LD-BiHGA model. Each dataset was cross-validated using the k-fold technique with four folds to assess the learning ability of the LD-BiHGA model. The implementation specifics are given in Section 5.2, and performance was assessed using commonly used IDS metrics, which are detailed in Section 5.3.

Algorithm 1 BiHGA Network Intrusion Detection model
Input:
  Data: set of training samples $(x_{i}, y_{i})_{i = 1}^{N}$
  X: data matrix of single-channel samples $x \in \mathbb{R}^{D}$
  Set LSTM: queue to store a historical model of the discriminator
  Set 1D CNN: queue to store a historical model of the generator
for each x ∈ X do
  for each y_i ∈ benign do
    Sample noise p_{z_b} (z_b) via minibatch z_b1, z_b2, . . . , z_bm to get m random samples
    Sample data p_data (x_b) via minibatch x_b1, x_b2, . . . , x_bm to get m random samples
    Calculate the CNN discriminator error:
      $J_{D_{b}} \leftarrow - \frac{1}{2 m} (\sum_{i = 1}^{m} \log (\text{1DCNN} (x_{bi})) + \sum_{i = 1}^{m} \log (1 - \text{1DCNN} (LSTM (z_{bi}))))$
    Update discriminator parameters θ_1DCNN using Adam optimizer
    Calculate the LSTM generator error:
      $J_{G_{b}} \leftarrow - \frac{1}{m} (\sum_{i = 1}^{m} \log (\text{1DCNN} (LSTM (z_{bi}))))$
    Update generator parameters θ_LSTM using Adam optimizer
    Insert the updated errors of the discriminator and generator into LSTM and 1D CNN respectively
    F_benign ← SelfAttentionScheme (1D CNN convolved features)
  end for
  for each y_i ∈ attack do
    Sample noise p_{z_a} (z_a) via minibatch z_a1, z_a2, . . . , z_am to get m random samples
    Sample data p_data (x_a) via minibatch x_a1, x_a2, . . . , x_am to get m random samples
    Calculate the CNN discriminator error:
      $J_{D_{a}} \leftarrow - \frac{1}{2 m} (\sum_{i = 1}^{m} \log (\text{1DCNN} (x_{ai})) + \sum_{i = 1}^{m} \log (1 - \text{1DCNN} (LSTM (z_{ai}))))$
    Update discriminator parameters θ_1DCNN using Adam optimizer
    Calculate the LSTM generator error:
      $J_{G_{a}} \leftarrow - \frac{1}{m} (\sum_{i = 1}^{m} \log (\text{1DCNN} (LSTM (z_{ai}))))$
    Update generator parameters θ_LSTM using Adam optimizer
    Insert the updated errors of the discriminator and generator into LSTM and 1D CNN respectively
    F_attack ← SelfAttentionScheme (1D CNN convolved features)
  end for
end for
F_i (k) ← F_benign + F_attack
model ← DNN (F_i (k))
Output: F_i (k), model
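The two per-minibatch errors in Algorithm 1 can be computed as in the sketch below. Only the loss arithmetic is shown; the discriminator scores are hypothetical inputs, and the network updates with the Adam optimizer are outside the scope of this illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """J_D = -(1 / 2m) * (sum of log D(x_i) + sum of log(1 - D(G(z_i))))."""
    m = len(d_real)
    total = sum(math.log(p) for p in d_real) + sum(math.log(1.0 - p) for p in d_fake)
    return -total / (2 * m)


def generator_loss(d_fake):
    """J_G = -(1 / m) * sum of log D(G(z_i))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator scores for one minibatch of m = 4 samples.
scores_real = [0.90, 0.80, 0.95, 0.85]
scores_fake = [0.20, 0.10, 0.30, 0.25]
j_d = discriminator_loss(scores_real, scores_fake)
j_g = generator_loss(scores_fake)
print(j_d, j_g)
```

Note that the generator error in Algorithm 1 uses log D(G(z)) rather than the log(1 - D(G(z))) term of Equation 6, so the generator is penalized most heavily when the discriminator confidently rejects its samples. When every score equals 0.5, both errors reduce to log 2.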

5.1 Dataset description

Deep learning research in this area has been held back by a lack of sharable datasets. Although testbeds can generate synthetic datasets, these may not accurately reflect real-time internet traffic and therefore may not be credible [51]. Instead, the proposed scheme was tested on freely available public network flow datasets: KDDCUP 99, CICIDS, and InSDN. The major challenge with such diverse network traffic was attack prediction. These datasets were in the .pcap format, which retains all packet features. The files were processed to help the SDN controller simulate the flow table using flow statistics [52]. Table 2 gives a high-level overview of the features in the datasets.

Table 2
Dataset description

Dataset	Attributes	Total	Benign (%)	Attacks (%)
KDDCup99	41	1,048,576	57	43
CICIDS2017	77	2,827,876	80	20
InSDN	80	343,939	20	80

Over the years, the KDDCup’99 dataset (Hettich and Bay), created at MIT Lincoln Labs for the third International Knowledge Discovery and Data Mining Tools Competition, has become a standard dataset for anomaly detection research and serves as a benchmark in most anomaly detection studies. It provides labeled features for both benign and malicious network flows but no raw packet-level data. There are 1,048,576 samples in all, with 41 features and one label column; 595,798 samples are benign and 452,778 are malicious. The 41 features are categorized as basic, content, time-based, and host-based traffic features [53]. The remaining two datasets were chosen to address issues such as the lack of network traffic diversity and features that do not reflect current scenarios.

CICIDS2017 reflects a more recent scenario, with recent attacks and corresponding features [54]. The Canadian Institute for Cybersecurity (CIC) created the dataset in 2017. This CIC Intrusion Detection System (CICIDS) dataset includes real-world threats in two files, MachineLearningCSV and GenerateLabelledFlows, which contain 86 and 79 features, respectively. The MachineLearningCSV data, which span five days and eight traffic monitoring sessions, were used in this investigation. They are distributed as eight separate data files, which together contain 14 attacks. For further analysis, these eight files were concatenated into a single CSV file. The resulting file contained 2,827,876 samples, of which 2,271,320 were benign and 556,556 were attack samples, with 78 features and one label column. One of the 78 features, "FwdHeaderLength", was unnecessary, and deleting it left 77 features. Because the data are massive and highly unbalanced, this dataset posed some manageable challenges for the model.

The earlier datasets are not fully compatible with SDN since they were not collected on an SDN platform. The InSDN dataset [55], released in 2020, comprises recent attacks specific to the SDN environment and was used to verify the accuracy of anomaly detection systems. It also addresses issues encountered with the CICIDS dataset, such as multiple missing values and redundant or irrelevant records. It is divided into three groups according to the targeted machines and the types of traffic generated. The first group contains benign traffic, whereas the second and third groups contain anomalous traffic directed toward the OVS machine and the Metasploitable-2 server. The resulting CSV file comprised 343,939 traffic instances with 80 statistical features, of which 68,424 were benign and 275,515 were attack instances. The fully featured version of the dataset was reported to perform better than its SDN-specific featured version [55]. Hence, the proposed method was evaluated on the fully featured version. Like CICIDS, InSDN is imbalanced, but in the opposite direction: it contains fewer normal instances (20%) than attack samples (80%), whereas CICIDS contains 80% benign and 20% anomalous samples.

While these datasets proved valuable for intrusion detection research, they did not incorporate payload features to capture the actual content of network packets. Instead, their primary focus was on network flow information, facilitating the identification of potential attacks or anomalies.

5.2 Implementation details

Like many current deep learning methodologies, the proposed anomaly detection scheme was implemented in Python using the Keras 2.3 library with TensorFlow as its backend. The simulation ran on a 64-bit machine with an Intel Core i7 processor, 16 GB of RAM, and an Nvidia GeForce GT 710 GPU with 2 GB of memory.

The proposed system was tested on the three datasets described in Table 2. For each dataset, symbolic features were translated into numeric features using one-hot encoding and then rescaled to a given range using min-max normalization. The most relevant features of the network flow were then extracted using an ensemble dimensionality reduction approach [17]. The Spearman cross-correlation technique identified five correlated features in the KDDCUP99 dataset, 30 in the CICIDS dataset, and 36 in the InSDN dataset. Consequently, to remove the irrelevant features, the denoising autoencoder network used 37 × 20 × 37 neurons for the KDDCUP dataset, 47 × 30 × 47 neurons for the CICIDS dataset, and 47 × 30 × 47 neurons for the InSDN dataset.
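The two preprocessing steps, one-hot encoding and min-max normalization, can be sketched without any library as shown below. The column names and values are hypothetical, and the target range [0, 1] is an assumption, since the text only states that features were rescaled to a given range.

```python
def one_hot(values):
    """Map each symbolic value to a 0/1 indicator vector (sorted category order)."""
    categories = sorted(set(values))
    return [[1 if value == cat else 0 for cat in categories] for value in values]


def min_max(values, low=0.0, high=1.0):
    """Rescale a numeric column linearly into [low, high]."""
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:
        return [low for _ in values]  # constant column: nothing to rescale
    return [low + (high - low) * (v - smallest) / span for v in values]


protocol = ["tcp", "udp", "tcp", "icmp"]  # hypothetical symbolic feature
src_bytes = [181, 239, 0, 1032]           # hypothetical numeric feature
encoded = one_hot(protocol)
scaled = min_max(src_bytes)
print(encoded)
print(scaled)
```

One-hot encoding widens the feature matrix by one column per category, which is one reason a dimensionality reduction stage follows.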

Using the BiHGA algorithm, each dataset was split into two channels, normal and abnormal, for further processing. Feature extraction was done using a hybrid GAN with an LSTM generator and a CNN discriminator using the LeakyReLU activation function. The generator comprised six LSTM hidden layers with 128 neurons each, two dense layers of 128 neurons each, and a dense layer of 20 neurons. The discriminator included a single dense layer of 64 neurons, a 1D convolutional layer with 32 filters followed by a second 1D convolutional layer with a single filter, two dense layers of 20 neurons, and a single-neuron dense layer. The features were then weighted by an attention mechanism with 20 neurons in each of the embedding and LSTM layers, a single-neuron dense layer with a tanh activation function, a softmax attention layer, and a dense sigmoid layer that generated the output. Finally, the resulting features were fused for classification by an FCN (Fully Connected Neural Network) with three dense layers: the first two had 12 and 8 neurons with ReLU activation functions, and the third was a single sigmoid output neuron. Table 3 summarises the general characteristics of the LD-BiHGA architecture.

Table 3
Specification details of LD-BiHGA system

Dataset	Input shape
KDDCUP99	(20,1)
CICIDS	(30,1)
InSDN	(30,1)

Module	Layer sequence
Bi-channel Generator	X^Input ⇒ 6 LSTM (128) ⇒ 2 Dense (128) ⇒ Dense (20)
Bi-channel Discriminator	X^Input ⇒ Dense (64) ⇒ Convolution1D (32,3) ⇒ Convolution1D (1,3) ⇒ 2 Dense (20) ⇒ Dense (1)
Attention	X^Input ⇒ Embedding (20) ⇒ LSTM (20) ⇒ Dense (1)
FCN	X^Input ⇒ Dense (12) ⇒ Dense (8) ⇒ Dense (1)

Several hyper-parameters regulated the learning process of the LD-BiHGA architecture; some of them are shown in Table 4. The number of epochs determined how many times the entire training set was presented to the neural network. To aid learning, loss functions quantified how well the model performed on the training data, while regularisation terms were employed to avoid overfitting. The learning rate controlled how quickly the neural network learned from the estimated loss on the training data. Dropout randomly removed a predetermined proportion of neurons from a layer to avoid overfitting, while the batch size was the number of training samples used in one iteration.

Table 4

Hyper-parameter search space for each methodology in the LD-BiHGA system

Hyperparameter	Denoising Autoencoder	Hybrid GAN	Attention	FCN
Number of epochs	10	1000	5	20
Loss function	Mean squared error	Binary cross-entropy	Binary cross-entropy	Binary cross-entropy
L2 Regularization (λ)	10e-5	-	-	-
Learning rate	-	0.0001	-	-
Batch size	100	20	1024	10
Dropout	0.5	0.4	-	-

Each dataset was validated using k-fold cross-validation with four splits, with 30% of the training set used as the validation set.
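As a rough illustration of the splitting scheme, the sketch below generates index sets for plain k-fold cross-validation with k = 4. It is an assumption-laden simplification: each fold here holds out an equal share of the samples, the 30% validation share mentioned above is not modeled, and `k_fold_indices` is a hypothetical helper rather than the routine used in the experiments.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=20, k=4))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Every sample is used for validation exactly once across the four folds, so the reported metrics are not tied to a single lucky or unlucky split.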

5.3 Evaluation metrics

Since this approach aims to save processing time, each incoming network flow was simply classified as benign or malicious, without examining the nature of the attack. When the multi-domain controller predicted a flow as harmful, the corresponding domain controller was alerted and could then stop the malicious flow; otherwise, the packets were routed to the appropriate destination. The primary goal of this evaluation is to show how the LD-BiHGA model improves overall performance by increasing the detection rate and accuracy [56]. For this binary classifier, accuracy is the proportion of correctly classified records, and the F1 score is the harmonic mean of precision and recall. Precision is the ratio of intruders correctly identified by the model to all predicted intruders. Recall, or detection rate, is the ratio of correctly identified intruders to all actual intruders. A confusion matrix is a table used to assess the outcomes of a classification model. The Area Under the Curve (AUC) is a reliable indicator of the overall performance of a binary classifier: it measures the capability of the model to distinguish between the benign and malicious classes.
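The metrics defined above follow directly from the four cells of a binary confusion matrix, as the following sketch shows. The counts are hypothetical, and attacks are treated as the positive class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (detection rate), and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 attacks caught, 20 missed, 10 false alarms, 880 benign passed.
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
print(metrics)
```

With these counts the accuracy is high even though a noticeable share of attacks is missed, which is why recall and F1 are reported alongside accuracy for imbalanced data.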

6 Result analysis

In this section, an evaluation of the detection performance of the LD-BiHGA system was done on three benchmark datasets: KDDCUP 99, CICIDS, and InSDN. It began with an ablation study in a balanced scenario, as detailed in Section 6.1. Subsequently, in Section 6.2, the system’s performance on imbalanced datasets was assessed. Finally, in Section 6.3, the performance of the proposed LD-BiHGA system with state-of-the-art approaches from the literature was compared.

6.1 Ablation study on balanced dataset scenario

An ablation study on the LD-BiHGA system was conducted to assess the efficiency of each module and gain comprehensive insights. The results of this study, conducted using the KDDCUP dataset, have been documented in Sections 6.1.1 through 6.1.3.

6.1.1 Model performance

Several model comparisons were conducted to demonstrate the superiority of the hybrid scheme over other hybrid deep learning strategies. The proposed methodology was evaluated by examining i) how the dimensionality reduction technique contributed to anomaly detection accuracy, ii) how the hybrid GAN approach extracted supplementary details, and iii) how the self-attention procedure identified significant features. Three baseline configurations of the proposed system were examined as milestones:

BiHGA component: the ensemble dimensionality reduction module was removed from the LD-BiHGA system, and the hybrid GAN and attention modules were retained.

LD-BiA (Low Dimensional Bi-channel Attention) component: the hybrid GAN module was removed, and the remaining two modules were retained.

LD-BiHG (Low Dimensional Bi-channel Hybrid GAN) component: the attention module of LD-BiHGA was removed.

The findings of a detailed investigation of LD-BiHGA’s performance on the KDDCUP dataset are shown in Table 5. Comparing LD-BiHGA with its components led to the conclusions that (i) even when the dimensionality reduction module was removed and all features were retained, the hybrid GAN and attention mechanism still picked out the predominant features; (ii) the hybrid GAN was effective, as it captured essential features extensively; and (iii) excluding the self-attention module decreased the recall score, because that module helps detect a significant number of intruders by selecting optimal features. Of the three modules, removing the hybrid GAN had the most pronounced negative impact on performance, underscoring its significant role in this work.

Table 5
Comprehensive performance of LD-BiHGA in ablation study on KDDCUP dataset

Architecture	Accuracy (%)	F1 (%)	Pre (%)	DR (%)	AUC-ROC
i) LD-BiHGA (Proposed)	99.9675	99.9675	100.0000	99.9350	0.999675
ii) BiHGA component	99.7262	99.7254	99.4722	99.9799	0.997261
iii) LD-BiA component	96.3886	96.3379	96.9346	95.9151	0.963890
iv) LD-BiHG component	99.6149	99.6150	99.7021	99.5284	0.996154

Using a consistent baseline parameter configuration, which included the number of neurons in each layer, activation functions, and loss functions, LD-BiHGA was evaluated based on various performance metrics, including F1 score, accuracy, precision, recall, and AUC-ROC (Receiver Operating Characteristic). As demonstrated in Table 5, LD-BiHGA outperformed the other baseline models. This highlights its capacity to effectively aggregate multi-channel input, utilize convolutions, leverage long-short term memory, and harness a denoising autoencoder to achieve accuracy in anomaly detection. LD-BiHGA exhibited its best performance when utilizing convolutions and long-short term memory on the enriched data obtained from the denoising autoencoder. Additionally, a dual-channel approach was employed to pass inputs and extract their specific features.

6.1.2 Computational complexity

The investigation of the ablation study, as detailed in Table 6, provides comprehensive information on various aspects, including the total number of training parameters (measured in millions) and the running time (measured in seconds) for both the proposed LD-BiHGA and other modules within the system. This table sheds light on the computational efficiency and time requirements associated with each module’s performance.

Table 6
Computational complexity of LD-BiHGA in ablation study on KDDCUP dataset

Architecture	Trainable Parameters (millions)	Validation time (seconds)
i) LD-BiHGA (Proposed)	5.4898	2307
ii) BiHGA component	7.9227	6705
iii) LD-BiA component	1.1297	1517
iv) LD-BiHG component	4.3647	2207

The following inferences were drawn from the ablation study: (i) the removal of dimensionality reduction did not have a significant impact on performance; however, it resulted in a substantial increase in computational parameters and validation time due to the use of the entire feature set; (ii) even though the hybrid GAN module consumed a large number of computational parameters and considerable time in LD-BiHGA, it significantly improved the performance of the model; and (iii) the attention module had an influence on the performance of the model by reducing the average number of parameters and execution time.
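The size of these effects can be read off Table 6 with a few lines of arithmetic. The sketch below simply expresses each variant's parameter count and validation time as a percentage change relative to the full LD-BiHGA model; the dictionary layout and the helper name `relative_change` are illustrative choices.

```python
# Values copied from Table 6 (trainable parameters in millions, validation time in seconds).
ablation = {
    "LD-BiHGA": {"params_m": 5.4898, "seconds": 2307},
    "BiHGA": {"params_m": 7.9227, "seconds": 6705},
    "LD-BiA": {"params_m": 1.1297, "seconds": 1517},
    "LD-BiHG": {"params_m": 4.3647, "seconds": 2207},
}


def relative_change(variant, baseline="LD-BiHGA"):
    """Percentage change of a variant against the full model, per column."""
    base = ablation[baseline]
    other = ablation[variant]
    return {key: 100.0 * (other[key] - base[key]) / base[key] for key in base}


for name in ("BiHGA", "LD-BiA", "LD-BiHG"):
    change = relative_change(name)
    print(name, round(change["params_m"], 1), round(change["seconds"], 1))
```

For example, dropping the dimensionality reduction module (the BiHGA variant) raises the parameter count by roughly 44% and nearly triples the validation time.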

6.1.3 Confusion matrix

The confusion matrix was used to assess the performance of the proposed model by enumerating its true and false predictions. Figure 7 shows the binary class confusion matrices for all of the proposed system’s baseline models.

Fig. 7

Confusion matrix of binary classification of KDDCUP in ablation study.

6.2 Imbalanced scenario

The robustness of the LD-BiHGA architecture was analyzed under data imbalance, which is the expected scenario in real-time networks. The CICIDS 2017 and InSDN datasets were chosen for this analysis, as each contains approximately 80% of one type of network flow and 20% of the other. Despite this disproportion, LD-BiHGA remained robust with the help of the GAN, because adversarial networks can generate new training samples and train the model with them.

6.2.1 CICIDS 2017 dataset

The CICIDS dataset contains 80% benign and 20% abnormal network flows. In this study, special attention was given to a new set of trials in which the entire set of normal network flows was combined with samples of the attack flows at varying percentages (5%, 25%, 50%, 75%, and 100%) during the cross-validation stage. Although recall and the corresponding F1-score decreased with 5% and 25% of the attack flows, the GAN helped the model sustain its performance with more than 50% of the attack flows, as depicted in Fig. 8.
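The construction of these trials can be sketched as follows. The toy flow lists, the random sampling strategy, and the helper name `build_trial` are assumptions for illustration; the paper does not specify how the attack subsets were drawn.

```python
import random


def build_trial(benign, attack, attack_fraction, seed=0):
    """Combine all benign flows with a random share of the attack flows."""
    if not 0.0 < attack_fraction <= 1.0:
        raise ValueError("attack_fraction must be in (0, 1]")
    keep = max(1, round(len(attack) * attack_fraction))
    sampled = random.Random(seed).sample(attack, keep)
    return benign + sampled


# Toy stand-in for the 80% benign / 20% attack proportion of CICIDS.
benign_flows = [("benign", i) for i in range(80)]
attack_flows = [("attack", i) for i in range(20)]
trial_sizes = {
    fraction: len(build_trial(benign_flows, attack_flows, fraction))
    for fraction in (0.05, 0.25, 0.50, 0.75, 1.00)
}
print(trial_sizes)
```

At the 5% setting only one attack flow accompanies the 80 benign flows in this toy example, which illustrates how scarce the minority class becomes in the hardest trial.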

Fig. 8

Performance of LD-BiHGA on CICIDS 2017 dataset on subsiding of attacks.

Since CICIDS is an imbalanced dataset, there is a need to examine some other impacts to provide a detailed view of the performance of the model. The confusion matrix on the CICIDS dataset illustrates the variations in the number of attacks and is displayed in Fig. 9.

Fig. 9

Confusion matrix of LD-BiHGA system on CICIDS dataset.

The performance of the LD-BiHGA model was further investigated using ROC and precision-recall curves, both of which summarise the performance of a binary classification model graphically. The ROC curve portrays the trade-off between the true positive rate and the false positive rate, and the precision-recall curve portrays the trade-off between precision and recall at different thresholds. Because ROC curves are most appropriate for balanced datasets, a precision-recall curve was also included, as it is better suited to the imbalanced CICIDS dataset, as shown in Fig. 10.

Fig. 10

Attack declination performance of LD-BiHGA on CICIDS2017 dataset.

6.2.2 InSDN dataset

In contrast to the CICIDS dataset, this dataset [55] consists of 80% anomalous flows and 20% benign network flows. The InSDN dataset exhibited better performance in its fully-featured version than in its SDN-specific version; hence, the proposed work was evaluated using the fully-featured version. The precision, recall, accuracy, and F1-score of LD-BiHGA were compared with standard and state-of-the-art deep learning models such as RNN, LSTM, GRU (Gated Recurrent Unit) [57], OC-SVM (One-Class SVM), and LSTM-AE-OC-SVM [58], as depicted in Fig. 11. Although a few individual metrics of other techniques were slightly higher, the proposed system performed better across the performance metrics overall. The binary classification performance of LD-BiHGA, compared in Table 7 with the models reported in [59], demonstrates its efficiency.

Fig. 11

Comparative performance of LD-BiHGA on InSDN dataset.

Table 7

Comparative analysis of binary classification performance of LD-BiHGA on InSDN dataset

Model	Precision (%)		Recall (%)		F1-Score (%)		AUC-ROC
	Normal	Attack	Normal	Attack	Normal	Attack
LD-BiHGA (Proposed)	97.25	100	99.25	99.25	98	99	0.991
CNN Standard [59]	76.69	98.86	97.47	88.11	85.84	93.18	0.928
LSTM [59]	84.53	98.31	96.02	92.95	89.91	95.55	0.945
CNN (L2 Reg.) [59]	84.24	98.56	96.62	92.75	90	95.56	0.947
CNN-LSTM [59]	93.18	97.6	94.04	97.24	93.61	97.42	0.956

6.3 Competitor analysis

To conclude this assessment, the LD-BiHGA architecture was compared against several competitors from the current state-of-the-art literature. Table 8 details the evaluation of the proposed LD-BiHGA against existing architectures in terms of various performance metrics; the detection model used by each competitor is listed in column 3. LD-BiHGA surpassed the competitors on both the KDDCUP and InSDN datasets. On the CICIDS dataset, the proposed system has better accuracy than most competitors, and other parameters are on par with them. Overall, Table 8 indicates that LD-BiHGA achieved on average 7.225% better accuracy on the balanced dataset and 3.335% better accuracy on the imbalanced datasets than its competitors.

Table 8
Performance comparison of LD-BiHGA with several competitors from the current literature on balanced and imbalanced datasets

Dataset	Algorithm	Description	Accuracy	F-measure	Detection rate	AUC-ROC
KDDCUP99 (1999)	LD-BiHGA (Proposed)	LSTM + 1D CNN + Attention	99.97	99.97	99.93	0.999
	MINDFUL [14]	Autoencoder + 1D CNN	92.49	95.13	-	-
	HYBRID CNN [21]	CNN + DNN + Attention	-	96.74	98.21	-
	DNN 4 layers [60]	DNN + Text representation methods	93.00	95.50	91.50	0.956
CICIDS (2017)	LD-BiHGA (Proposed)	LSTM + 1D CNN + Attention	96.55	91.10	90.65	0.943
	MINDFUL [14]	Autoencoder + 1D CNN	97.90	94.93	-	-
	ANID [34]	Attention + Self-Attention	-	95.28	94.40	-
	AIDA [61] [14]	Autoencoder + MLP	94.50	85.80	-	-
	DNN 4 layers [60]	DNN + Text representation methods	93.60	90.10	97.60	0.991
InSDN (2020)	LD-BiHGA (Proposed)	LSTM + 1D CNN + Attention	99.24	99.49	99.55	0.991
	RNN [57]	RNN	98.09	98.77	99.66	0.963
	LSTM [57]	LSTM	98.87	99.27	99.70	0.979
	GRU [57]	GRU	98.21	98.84	99.76	0.964
	OC-SVM [58]	SVM	87.5	91	93	-
	LSTM-AE-OC-SVM [58]	LSTM+AE+SVM	90.5	93	93	0.906
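The two average improvements quoted in Section 6.3 appear to be consistent with the mean per-competitor accuracy gap in Table 8. The sketch below checks this reading; it is an interpretation rather than the authors' stated procedure, and it assumes that competitors without a reported accuracy (HYBRID CNN and ANID) are excluded.

```python
# Accuracy values copied from Table 8.
proposed = {"KDDCUP99": 99.97, "CICIDS": 96.55, "InSDN": 99.24}
competitors = {
    "KDDCUP99": [92.49, 93.00],                  # MINDFUL, DNN 4 layers
    "CICIDS": [97.90, 94.50, 93.60],             # MINDFUL, AIDA, DNN 4 layers
    "InSDN": [98.09, 98.87, 98.21, 87.5, 90.5],  # RNN, LSTM, GRU, OC-SVM, LSTM-AE-OC-SVM
}


def mean_accuracy_gap(datasets):
    """Average of (proposed - competitor) accuracy over all listed competitors."""
    gaps = [proposed[name] - acc for name in datasets for acc in competitors[name]]
    return sum(gaps) / len(gaps)


balanced = mean_accuracy_gap(["KDDCUP99"])
imbalanced = mean_accuracy_gap(["CICIDS", "InSDN"])
print(round(balanced, 3), round(imbalanced, 3))
```

Under this reading the script yields approximately 7.225 for the balanced dataset and 3.335 for the imbalanced datasets, matching the quoted figures. The imbalanced average is driven largely by the two SVM-based competitors on InSDN, while one CICIDS competitor (MINDFUL) contributes a negative gap.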

7 Conclusion and future direction

This work introduces LD-BiHGA, a network intrusion detection system designed to extract distinct features from benign and abnormal network flows separately and then fuse them using a fully connected network. LD-BiHGA aids the SDN controller in detecting network intrusions. While this research primarily relies on supervised learning, it also leverages unsupervised and semi-supervised learning to execute three key modules: ensemble dimensionality reduction, bi-channel feature extraction, and feature attention, resulting in impressive performance. Ultimately, these modules are combined with the fully connected neural network module to enhance its accuracy.

The proposed learning methodology was evaluated on three benchmark datasets (KDDCUP99, CICIDS 2017, and InSDN), each containing diverse network flows collected under various scenarios and over time. The experimental results show that the proposed architecture is effective at detecting anomalies. With the help of its adversarial component (GAN), LD-BiHGA also improved on the average accuracy of previous IDS architectures under imbalanced flow conditions.

Future work should address the main limitations of this study: network structure optimization, automatic hyper-parameter tuning, and multi-class attack classification. Bio-inspired optimization algorithms could automate hyper-parameter tuning, and multi-channel classification is needed to detect the various categories of network intrusions. Another avenue for future research is fine-tuning the deep learning model on the SDN-specific feature subset of the dataset rather than on the fully featured dataset.

Declarations

Ethical Approval

This article does not contain any information from studies or experimentation with the involvement of human or animal subjects.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors’ contributions

Saranya Prabu: Conceptualization, Investigation, Writing – original draft, review, and editing. Jayashree Padmanabhan: Conceptualization, Investigation, Writing – review and editing.

Funding

This work is financially supported by Anna Centenary Research Fellowships (ACRF), Anna University, Chennai under grant number CFR/ACRF/19234391164/AR1.

Availability of Data and Code

The InSDN dataset is publicly available at https://aseados.ucd.ie/datasets/SDN/. The code associated with this research is available in the GitHub repository at https://github.com/Saranya-prabu/BiHGA.

References

1. Danda Rawat and Swetha Reddy, Software defined networking architecture, security and energy efficiency: A survey, IEEE Communications Surveys & Tutorials 19(1) (2017), 325–346.

2. Franciscus Wibowo X.A., Mark Gregory, Ahmed and Karina Gomez, Multi-domain software defined networking: Research status and challenges, Journal of Network and Computer Applications 87 (2017), 32–45.

3. Lakshmanna, Kaluri, Gundluru, Zamil Alzamil, Rajput D.S., Ahmad Khan, Anul Haq and Alhussen, A review on deep learning techniques for iot data, Electronics 11(10) (2022), 1604.

4. Liu and Lang, Machine learning and deep learning methods for intrusion detection systems: A survey, Applied Sciences 9(20), 2019.

5. Vera, Vega L.R. and Piantanida, Information flow in deep restricted boltzmann machines: An analysis of mutual information between inputs and outputs, Neurocomputing 507 (2022), 235–246.

6. Yang, Xu, Luo and Chen, Autoencoder-based representation learning and its application in intelligent fault diagnosis: A review, Measurement 189 (2022), 110460.

7. Brophy, Wang, She and Ward, Generative adversarial networks in time series: A systematic literature review, ACM Computing Surveys 55(10) (2023), 1–31.

8. Huang, Wei, Wang, Yang, Xu, Wu and Huang, Well performance prediction based on long short-term memory (lstm) neural network, Journal of Petroleum Science and Engineering 208 (2022), 109686.

9. Kumar Tyagi, Abraham, Recurrent Neural Networks: Concepts and Applications. CRC Press, 2022.

10. Ma’arif, Rahmaniar, Fathurrahman H.I.K., Kusuma Frisky A.Z. et al., Understanding of convolutional neural network (cnn): A review, International Journal of Robotics & Control Systems 2(4), 2022.

11. …, Zhang and Ding, Understanding and improving deep learning-based rolling bearing fault diagnosis with attention mechanism, Signal Processing 161 (2019), 136–154.

12. Wang, Yu, Li, Shen and Yao, Sr-hgn: Semantic- and relation-aware heterogeneous graph neural network, Expert Systems with Applications 224 (2023), 119982.

13. Ramjee, Gamal A.E., Efficient wrapper feature selection using autoencoder and model based elimination, 2020.

14. Andresini, Appice, Mauro N.D., Loglisci and Malerba, Multi-channel deep feature learning for intrusion detection, IEEE Access 8 (2020), 53346–53359.

15. Wan, Guo, Zhang, Guo and Liu, Outlier detection for monitoring data using stacked autoencoder, IEEE Access PP (2019), 1–1, 11.

16. Yan and Han, Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system, IEEE Access 6 (2018), 41238–41248.

17. Prabu, Padmanabhan, Bala, Effective ensemble dimensionality reduction approach using denoising autoencoder for intrusion detection system. In Intelligent Sustainable Systems, (2022), pp. 273–285. Springer.

18. Pan, Sun, Teng, White, Schmidt, Staples and Krause, Detecting web attacks with end-to-end deep learning, Journal of Internet Services and Applications 10 (2019), 12.

19. Xia, Li, Liu, Xu and Silva C.W., Intelligent fault diagnosis approach with unsupervised feature learning by stacked denoising autoencoder, IET Science, Measurement & Technology 11(6) (2017), 687–695.

20. ElSayed M.S., Le-Khac N-A., Albahar M.A. and Jurcut, A novel hybrid model for intrusion detection systems in sdns based on cnn and a new regularization technique, Journal of Network and Computer Applications 191 (2021), 103160.

21. Ding, Li, Wang, Wen, Guan and Zhang, Hybrid-cnn: An efficient scheme for abnormal flow detection in the sdn-based smart grid, Sec. and Commun. Netw. 2020 (January 2020).

22. Marcos de Assis V.O., Luiz Carvalho, Joel Rodrigues J.P.C., Lloret, Mario Proenca Jr, Near real-time security system applied to sdn environments in iot networks using convolutional neural network, Computers and Electrical Engineering 86 (2020), 106738.

23. Azzouni, Pujolle, Neutm: A neural network-based framework for traffic matrix prediction in sdn. In NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium, (2018), pp. 1–5.

24. Matheus Novaes, Luiz Carvalho, Lloret and Proenca M.L., Long short-term memory and fuzzy logic for anomaly detection and mitigation in software-defined network environment, IEEE Access 8 (2020), 83765–83781.

25. Imrana, Xiang, Ali and Abdul-Rauf, A bidirectional lstm deep learning approach for intrusion detection, Expert Systems with Applications 185 (2021), 115524.

26. Dawoud, Shahristani and Raun, Deep learning and software-defined networks: Towards secure iot architecture, Internet of Things 3-4 (2018), 82–89.

27. Garg, Kaur, Kumar and Rodrigues J.J.P.C., Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in sdn: A social multimedia perspective, IEEE Transactions on Multimedia 21(3) (2019), 566–578.

28. Aldwairi, Perera and Novotny M.A., An evaluation of the performance of restricted boltzmann machines as a model for anomaly network intrusion detection, Computer Networks 144 (2018), 111–119.

29. Chen, Jiang, Efficient gan-based method for cyber-intrusion detection, 2019.

30. Lee J.H. and Park K.H., Gan-based imbalanced data intrusion detection system, Personal and Ubiquitous Computing 25(1) (2021), 121–128.

31. Zhu, Ye, Fu, Liu and Shen, Electrocardiogram generation with a bidirectional lstm-cnn generative adversarial network, Scientific Reports 9(1) (2019), 1–11.

32. Brown, Tuor, Hutchinson, Nichols, Recurrent neural network attention mechanisms for interpretable system log anomaly detection. In Proceedings of the First Workshop on Machine Learning for Computing Systems, (2018), pp. 1–8.

33. …, Sun, Zhu, Wang and Li, Bat: Deep learning methods on network intrusion detection using nsl-kdd dataset, IEEE Access 8 (2020), 29575–29585.

34. Tan, Iacovazzi, Man N.-M., A neural attention model for real-time network intrusion detection. In 2019 IEEE 44th Conference on Local Computer Networks (LCN), (2019), pp. 291–299.

35. Nishtha, Sood, Software defined network architectures. In 2014 International Conference on Parallel, Distributed and Grid Computing, (2014), pp. 451–456.

36. Udilá, Encoding methods for categorical data: A comparative analysis for linear models, decision trees, and support vector machines. In Encoding methods for categorical data, 2023.

37. Singh and Singh, Feature wise normalization: An effective way of normalizing data, Pattern Recognition 122 (2022), 108307.

38. Jayashree, Laila, Santhosh Kumar, Udayavannan, Social network mining for predicting users’ credibility with optimal feature selection. In Jennifer S. Raj, Ram Palanisamy, Isidoros Perikos, and Yong Shi, editors, Intelligent Sustainable Systems, (2022), pp. 361–373, Singapore. Springer Singapore.

39. …, Zhang, Wu and Zhan, Pearson correlation coefficient-based performance enhancement of broad learning system for stock price prediction, IEEE Transactions on Circuits and Systems II: Express Briefs 69(5) (2022), 2413–2417.

40. Padmanabhan, Jose Premkumar J.M., Advanced deep neural networks for pattern recognition: An experimental study. In International Conference on Soft Computing and Pattern Recognition, (2016), pp. 166–175. Springer.

41. Bai, Relu-function and derived function review. In SHS Web of Conferences, volume 144, page 02006. EDP Sciences, 2022.

42. Ren, Zhang, Yu, Liu, Balanced mse for imbalanced visual regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), pp. 7926–7935.

43. AlEroud, Karabatis, Sdn-gan: Generative adversarial deep nns for synthesizing cyber attacks on software defined networks. In Christophe Debruyne, Herve Panetto, Wided Guedria, Peter Bollen, Ioana Ciuciu, George Karabatis, and Robert Meersman, editors, On the Move to Meaningful Internet Systems: OTM 2019 Workshops, pp. 211–220, Cham, 2020. Springer International Publishing.

44. …, Chen, Shi, Jin, Goh, Ng S-K., Madgan: Multivariate anomaly detection for time series data with generative adversarial networks, 2019.

45. Hasan, Adnan, Giannetsos, Malik, Orchestrating sdn control plane towards enhanced iot security. In 2020 6th IEEE Conference on Network Softwarization (NetSoft), (2020), pp. 457–464.

46. Bhatia, Dave, Bhayani, Tanwar and Nayyar, Sdn-based real-time urban traffic analysis in vanet environment, Computer Communications 149 (2020), 162–175.

47. Zhen, Niu, Wang, Shi, Ji and Xu, Photovoltaic power forecasting based on ga improved bi-lstm in microgrid without meteorological information, Energy 231 (2021), 120908.

48. Kulshrestha, Krishnaswamy and Sharma, Bayesian bilstm approach for tourism demand forecasting, Annals of Tourism Research 83 (2020), 102925.

49. Mao, Su, Tan P.S., Kang Chow, Wang Y.-H., Is discriminator a good feature extractor?, 2020.

50. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez A.N., Kaiser, Polosukhin, Attention is all you need, 2017.

51. Hwang R.-H., Peng M.-C., Huang C.-W., Lin P.-C. and Nguyen V.-L., An unsupervised deep learning model for early network traffic anomaly detection, IEEE Access 8 (2020), 30387–30399.

52. Zhu, Tang, Shen, Du and Guizani, Privacy-preserving ddos attack detection using cross-domain traffic in software defined networks, IEEE Journal on Selected Areas in Communications 36(3) (2018), 628–643.

53. Choudhary and Kesswani, Analysis of kddcup99, nsl-kdd and unsw-nb15 datasets using deep learning in iot, Procedia Computer Science 167 (2020), 1561–1573. International Conference on Computational Intelligence and Data Science.

54. Panigrahi and Borah, A detailed analysis of cicids2017 dataset for designing intrusion detection systems, International Journal of Engineering and Technology 7(3.24) (2018), 479–482.

55. Elsayed, Le-Khac N.-A. and Jurcut, Insdn: A novel sdn intrusion dataset, IEEE Access 09, 2020.

56. Farahnakian, Heikkonen, A deep autoencoder based approach for intrusion detection system. In 2018 20th International Conference on Advanced Communication Technology (ICACT), (2018), pp. 178–183.

57. Alshraa A.S., Farhat and Seitz, Deep learning algorithms for detecting denial of service attacks in software-defined networks, Procedia Computer Science 191 (2021), 254–263. The 18th International Conference on Mobile Systems and Pervasive Computing (MobiSPC), The 16th International Conference on Future Networks and Communications (FNC), The 11th International Conference on Sustainable Energy Information Technology.

58. Elsayed M.S., Le-Khac N.-A., Dev, Jurcut A.D., Network anomaly detection using lstm based autoencoder. In Proceedings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Q2SWinet ’20, pp. 37–45, New York, NY, USA, 2020. Association for Computing Machinery.

59. Abdallah, Le Khac N.A., Jahromi, Jurcut A.D., A hybrid cnn-lstm based approach for anomaly detection systems in sdns. In The 16th International Conference on Availability, Reliability and Security, (2021), pp. 1–7.

60. Vinayakumar, Alazab, Soman K.P., Poornachandran, Al-Nemrat and Venkatraman, Deep learning approach for intelligent intrusion detection system, IEEE Access 7 (2019), 41525–41550.

61. Andresini, Appice, Mauro N.D., Loglisci, Malerba, Exploiting the autoencoder residual error for intrusion detection. In 2019 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), (2019), pp. 281–290.

Category	DL method	Reference	Features	Dataset	Learning methodology	Year
Dimensionality Reduction	Autoencoder	[14]	Statistical	KDDCUP99, UNSW-NB15, CICIDS2017	Unsupervised, Supervised	2020
		[13]	Statistical	MNIST, Reuters, Wisconsin Breast Cancer, RadioML2016.10b	Unsupervised, Supervised	2020
	Stacked Autoencoder	[15]	Statistical	ADIAC, Self-collected	Unsupervised	2019
		[16]	Statistical	KDD99, NSL-KDD, Kyoto2006	Unsupervised, Supervised	2018
	Denoising Autoencoder	[17]	Statistical	KDD99	Unsupervised	2022
		[19]	Statistical	Motor bearing vibration signals	Unsupervised	2017
		[18]	Statistical	Self-collected	Unsupervised/Semi-Supervised	2019
Intrusion Detection	CNN	[21]	Statistical	UNSW_NB15, KDDCup 99	Supervised	2020
		[22]	Statistical + payload	CICDDoS 2019, Self-collected	Supervised	2020
	LSTM	[25]	Statistical	NSL-KDD	Supervised	2019
		[24]	Statistical + payload	CICDDoS 2019, Self-collected	Semi-supervised	2020
	RBM	[27]	Statistical	KDDCup 99, CMU, Self-collected	Unsupervised, Supervised	2018
		[28]	Statistical	ISCX	Unsupervised, Supervised	2018
	GAN	[29]	Statistical	KDDCUP99	Unsupervised	2019
		[30]	Statistical	CICIDS2017	Unsupervised	2020
	Attention	[33]	Statistical	NSL-KDD	Supervised	2020
		[34]	Statistical	CICIDS2017	Supervised	2020

Dataset	Input shape
KDDCUP99	(20,1)
CICIDS	(30,1)
InSDN	(30,1)

Module	Layer sequence
Bi-channel Generator	X ^Input ⇒ 6 LSTM (128) ⇒ 2 Dense (128) ⇒ Dense (20)
Bi-channel Discriminator	X ^Input ⇒ Dense (64) ⇒ Convolution1D (32,3) ⇒ Convolution1D (1,3) ⇒ 2 Dense (20) ⇒ Dense (1)
Attention	X ^Input ⇒ Embedding (20) ⇒ LSTM (20) ⇒ Dense (1)
FCN	X ^Input ⇒ Dense (12) ⇒ Dense (8) ⇒ Dense (1)
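
To make the layer listings more concrete, the following sketch counts trainable parameters for a stack of fully connected layers such as the FCN stack listed in the table (Dense 12, Dense 8, Dense 1). It assumes each Dense layer has a weight matrix plus a bias vector. The input width of 20 is a hypothetical value chosen only for illustration, since the table does not state the width of the feature vector fed to the FCN, and `dense_stack_params` is an illustrative helper rather than part of the published code.

```python
def dense_stack_params(input_dim, layer_units):
    """Trainable parameters of consecutive Dense layers (weights + biases)."""
    total = 0
    fan_in = input_dim
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total


FCN_UNITS = [12, 8, 1]    # Dense (12) => Dense (8) => Dense (1), from the table
ASSUMED_INPUT_DIM = 20    # hypothetical input width, not given in the source

fcn_params = dense_stack_params(ASSUMED_INPUT_DIM, FCN_UNITS)
print(f"FCN parameters for input width {ASSUMED_INPUT_DIM}: {fcn_params}")
```

The recurrent and convolutional layers in the generator and discriminator follow different counting rules, so this helper covers only the fully connected portions of the model.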

Architecture	Accuracy (%)	F1 (%)	Precision (%)	Detection rate (%)	AUC-ROC
i) LD-BiHGA (Proposed)	99.9675	99.9675	100.0	99.9350	0.999675
ii) BiHGA component	99.7262	99.7254	99.4722	99.9799	0.997261
iii) LD-BiA component	96.3886	96.3379	96.9346	95.9151	0.963890
iv) LD-BiHG component	99.6149	99.6150	99.7021	99.5284	0.996154
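
The F1 column can be cross-checked against the precision and detection-rate columns, since the F-measure is the harmonic mean of the two. The sketch below is only a consistency check, not the authors' evaluation code. The names `f_measure` and `rows` are illustrative, and the precision of row i is read as 100 %. With these inputs the computed values agree with the tabulated F1 for rows i and ii to four decimal places.

```python
def f_measure(precision, detection_rate):
    """Harmonic mean of precision and detection rate (recall), in the input units."""
    if precision + detection_rate == 0:
        return 0.0
    return 2 * precision * detection_rate / (precision + detection_rate)


# (precision %, detection rate %) pairs copied from the ablation table;
# the row i precision entry is read here as 100 %.
rows = {
    "i) LD-BiHGA": (100.0, 99.9350),
    "ii) BiHGA": (99.4722, 99.9799),
}

for name, (pre, dr) in rows.items():
    print(f"{name}: F1 = {f_measure(pre, dr):.4f} %")
```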

Bi-channel hybrid GAN attention based anomaly detection system for multi-domain SDN environment

Table 2
Dataset description

Dataset	Attributes	Total	Benign (%)	Attacks (%)
KDDCup99	41	1,048,576	57	43
CICIDS2017	77	2,827,876	80	20
InSDN	80	343,939	20	80
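
The class shares in Table 2 translate into approximate flow counts and a majority-to-minority ratio, which is one simple way to quantify how imbalanced each dataset is. The sketch below uses only the totals and percentages from Table 2. The counts are approximate because the percentages are rounded, and `approx_counts` and `imbalance_ratio` are illustrative helpers rather than metrics defined in the paper.

```python
# (total records, benign %, attack %) copied from Table 2
DATASETS = {
    "KDDCup99": (1_048_576, 57, 43),
    "CICIDS2017": (2_827_876, 80, 20),
    "InSDN": (343_939, 20, 80),
}


def approx_counts(total, benign_pct):
    """Approximate (benign, attack) record counts from a rounded percentage."""
    benign = round(total * benign_pct / 100)
    return benign, total - benign


def imbalance_ratio(benign_pct, attack_pct):
    """Share of the majority class divided by the share of the minority class."""
    return max(benign_pct, attack_pct) / min(benign_pct, attack_pct)


for name, (total, benign_pct, attack_pct) in DATASETS.items():
    benign, attack = approx_counts(total, benign_pct)
    ratio = imbalance_ratio(benign_pct, attack_pct)
    print(f"{name}: ~{benign:,} benign, ~{attack:,} attack, ratio {ratio:.2f}:1")
```

On these figures KDDCup99 is close to balanced, while CICIDS2017 and InSDN are skewed 4:1 in opposite directions (benign-heavy and attack-heavy respectively).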

Table 6
Computational complexity of LD-BiHGA in ablation study on KDDCUP dataset

Architecture	Trainable Parameters (millions)	Validation time (seconds)
i) LD-BiHGA (Proposed)	5.4898	2307
ii) BiHGA component	7.9227	6705
iii) LD-BiA component	1.1297	1517
iv) LD-BiHG component	4.3647	2207