Abstract
Network security situation (NSS) awareness technology has developed rapidly in recent years. It refers to collecting, processing, and analyzing real-time information in the network to understand and evaluate the current network security status. It can not only find network security threats, but also reflect the NSS in system security metrics and provide users with targeted security protection measures. Based on data mining methods, this paper analyzed and modeled perceived threats and security events with data mining algorithms, and improved information security measurement methods based on association analysis. This paper proposed network security information analysis and NSS awareness based on data mining, and analyzed the experimental results of NSS awareness information security measurement. The experimental results showed that when the Timer was 8, the accuracy of the NSS awareness information security measurement method based on data mining reached 92.89%. The data mining model had the highest accuracy, 93.14%, in situation understanding and evaluation on the KDDCup-99 dataset. In situation prediction, the highest accuracy of the model was 92.71%, obtained when the Timer was 6, which showed that the model can accurately predict the NSS. In general, the NSS prediction mining model can better understand, evaluate and predict the situation on the KDDCup-99 dataset.
Introduction
With the rapid development of information technology, traditional NSS awareness models can no longer meet the needs of current enterprise security, and higher requirements are placed on the ability to identify security risks. Traditional NSS models are often based on predetermined assumptions or rules and lack data-driven analytical methods, which makes it difficult to identify new security risks and vulnerabilities and to provide accurate security situational awareness. Therefore, more advanced NSS models are needed to enhance network security defense capabilities. Data mining models can effectively find more security risks [1, 2]. NSS awareness is expected to be used more widely for a long time to come, and it is one of the directions that need continued exploration in building security situational awareness. There are also some challenges. First, the traditional situation awareness assessment mode can only judge the number and degree of security risks in the current security state, which is not comprehensive, accurate or effective enough, and it cannot raise security risk management to a new level in a timely manner. This is one of the key issues to be considered in the construction of future security situational awareness. Second, as information security systems continue to improve and new technologies continue to emerge and are widely used in various enterprise security fields, they provide rich and effective ways for security situational awareness to achieve security risk management. This is also an indispensable part of the future development of NSS awareness systems.
Researchers have studied NSS from several angles. Zhang Hongbin proposed an NSS awareness algorithm based on random game theory in the cloud computing environment, in which the value of NSS was based on the utility of both sides of the game [3]. At present, most methods store data in different ways, which leads to low efficiency of data query and analysis. To solve this problem, Tao Xiaoling designed a layered, multi-domain NSS awareness data storage scheme [4]. Kou Guang analyzed the current NSS assessment methods and found that they could not correctly reflect large-scale, collaborative, multi-stage network attacks. Therefore, he made an in-depth analysis of the correlation between attack intent and network configuration information [5]. To better assess network security risks, Yi Bo proposed a network security risk assessment model based on fuzzy theory, particle swarm optimization and a radial basis function neural network [6]. However, these studies lack technical argumentation in their exploration of NSS.
Some scholars have also studied data mining technology. In view of the complexity and universality of current network security early warning data, Zhou Ying proposed a new method of network security data mining using cloud computing technology to solve the problem of failing to effectively understand the NSS [7]. Zhao Wenwen first briefly described network security evaluation and data mining technology, then analyzed the advantages and disadvantages of using data mining technology for network security evaluation, and looked ahead to future network protection work [8]. However, these scholars did not discuss information security measurement methods for NSS awareness based on data mining, and only discussed its significance.
By studying and analyzing the experimental results of NSS awareness information security measurement, this paper drew the following conclusions. The NSS awareness information security measurement method based on data mining can evaluate the impact of different attack methods on network security, so as to objectively and fairly evaluate the security of the system. The method is also comprehensive: it considers various security data and analyzes and evaluates multiple indicators, which allows it to effectively identify threats and vulnerabilities in the network.
Network security situation awareness method based on data mining
Realization of awareness of NSS
Network security situational awareness is a process of using data mining technology to comprehensively analyze the occurrence, development, scope of influence, probability of occurrence and other relevant information of network security events [9, 10]. Situation awareness provides users with situation analysis reports through real-time monitoring, situation monitoring and situation assessment of changes in various elements, together with data collection and statistical analysis methods that are interconnected with existing monitoring equipment [11, 12]. Figure 1 shows the NSS prediction process. Cloud computing technology can be used to mine and process network data to discover potential threats, vulnerabilities and events. Through the analysis and statistics of various threats, the characteristics of network security events are summarized and sorted, and machine learning models are used for classification and prediction. The dataset can be divided into a training set and a test set according to a certain proportion, and the training set is used to train the machine learning model. Suitable algorithms and models (such as support vector machines, decision trees and neural networks) can be selected for training and optimization, so that the model can better learn the characteristics and rules of security events. Finally, quantitative sorting is carried out to achieve a comprehensive awareness of the security situation [13, 14]. Compared with traditional information security measurement methods, situation awareness has significant advantages: it makes security measurement more accurate, comprehensive and intuitive, and helps grasp the development trend of security events.
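The split-then-train step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the helper names (`train_test_split`, `fit_centroids`, `predict`, `accuracy`), the 30% test ratio and the toy records are hypothetical, and a nearest-centroid classifier stands in for the support vector machines, decision trees or neural networks mentioned in the text.

```python
import random

def train_test_split(records, test_ratio=0.3, seed=0):
    """Shuffle the records and split them into (train, test) by test_ratio."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """Compute the mean feature vector of every class label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

def accuracy(centroids, test):
    """Fraction of test records whose predicted label equals the true label."""
    hits = sum(predict(centroids, features) == label for features, label in test)
    return hits / len(test)

# Toy, hand-made records (features, label); they are NOT taken from KDDCup-99.
records = [([0.1 * i, 1.0], "normal") for i in range(10)]
records += [([5.0 + 0.1 * i, 9.0], "attack") for i in range(10)]

train, test = train_test_split(records, test_ratio=0.3, seed=42)
model = fit_centroids(train)
test_accuracy = accuracy(model, test)
```

The split ratio is a free parameter here; the paper only states that the data are divided "according to a certain proportion".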

Network security situation prediction.
Data sources: network information analysis requires various data, including log data, traffic data and device data, which can be obtained from network devices, security tools, operating systems, etc. Network information analysis mainly covers the source, content, type and result presentation of the analyzed data. At present, it is mainly carried out from the following aspects: network information model analysis, data feature analysis, feature point analysis, etc. Figure 2 shows the analysis of network security information. Network information model analysis mainly analyzes information security data through a data model to detect and control security events [15, 16]. Data feature analysis mainly uses data models to generate indicators, define data features and establish mathematical models, so as to extract features from data and analyze the relationships between them, and thereby determine the meaning of indicators and the form in which relationships are expressed. The goals of network information security analysis mainly include: judging whether the website has vulnerabilities by analyzing user behavior; determining, through security resource analysis, which website security functions and security requirements need attention; determining whether malicious behaviors exist through abnormal behavior analysis; locating information security vulnerabilities and hidden dangers through the data visualization function; finding hidden dangers of website information security according to the user behavior model; judging whether the website security resources are sufficient according to the security resource analysis; and analyzing whether there are security vulnerabilities according to security capabilities [17, 18].

Network security information analysis.
A network attack is a planned attack whose goal is reached by completing a series of related operations. Attackers may use various means to steal sensitive information in the system, such as user accounts, passwords and confidential enterprise data. They may also disrupt system stability by exploiting vulnerabilities or other security weaknesses, for example by tampering with data or paralyzing the system. A complete attack cannot be reflected by a single alarm event: the alarms triggered by the entire attack are not isolated, but logically connected. In the current network environment, the amount of data is increasing and the means of attack are becoming more and more complex, so obtaining the behavior and threat information of network attacks from large amounts of data is particularly important. According to its purpose, data mining can be divided into two types. The first is descriptive data mining, which is mainly used to mine the characteristics of the objects to be mined. Descriptive data mining, also known as data exploration, is a process of discovering hidden patterns and structures in data through visualization and statistical analysis. The second is predictive data mining, which analyzes associations between data and makes predictions based on the relationships between data. On this basis, this paper adopted association rules from data mining technology and conducted an in-depth study of them. Mining a large number of multi-stage attack sequences can effectively improve the efficiency of data mining.
Association analysis uses the strong interactions between nodes in the network to discover how nodes interact and to generate new associations. It is applied widely, for example in the following areas. Marketing: association analysis can predict or recommend related products by identifying purchase patterns and preferences in sales data. Social networks: association analysis can discover connections between users or recommend content or information that may be of interest [19, 20]. When applied to network security status, the method is also called association analysis [21, 22]. In association analysis, it is first necessary to analyze the association degree of each dataset on the node. According to the degree of association, nodes can be divided into the following categories: data input, data output, and node association intersection. The relevant definitions of association rules are as follows:
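Let set $O$ contain $m$ different elements (items), and let $C \subset O$ and $U \subset O$ be two itemsets with $C \cap U = \emptyset$. An association rule is an implication of the form

$$C \Rightarrow U,$$

where $C$ is the premise of the association rule, and $U$ is the result of the association rule.
Generally, association rules can be described in four aspects:
(1) Support
Suppose the transaction database $F$ contains $m$ transactions $Y$. The transactions that contain both itemset $C$ and itemset $U$ are counted, and the percentage of this count relative to $m$ is called the support, recorded as

$$\mathrm{support}(C \Rightarrow U) = \frac{|\{Y \in F : C \cup U \subseteq Y\}|}{m} \times 100\%$$

(2) Confidence
Suppose the transaction database $F$ contains $m$ transactions. Confidence is the percentage of the number of transactions that contain both itemset $C$ and itemset $U$ relative to the number of transactions that contain itemset $C$, recorded as

$$\mathrm{confidence}(C \Rightarrow U) = \frac{|\{Y \in F : C \cup U \subseteq Y\}|}{|\{Y \in F : C \subseteq Y\}|} \times 100\%$$

(3) Expected confidence
The expected confidence is the percentage of the number of transactions that contain itemset $U$ relative to $m$, recorded as

$$\mathrm{expected\ confidence}(C \Rightarrow U) = \frac{|\{Y \in F : U \subseteq Y\}|}{m} \times 100\%$$

(4) Lift
The lift is the ratio of the confidence to the expected confidence, recorded as

$$\mathrm{lift}(C \Rightarrow U) = \frac{\mathrm{confidence}(C \Rightarrow U)}{\mathrm{expected\ confidence}(C \Rightarrow U)}$$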
Support reflects how often the association rule occurs in the transaction database. The higher the support, the higher the consistency of the association rule.
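As a worked illustration of the four measures (support, confidence, expected confidence and lift), the following sketch computes them for one rule over a tiny transaction database. The function name `rule_metrics` and the alert names are hypothetical and chosen only for illustration; the values are returned as fractions rather than percentages.

```python
def rule_metrics(transactions, premise, result):
    """Return (support, confidence, expected_confidence, lift) for premise => result.

    `transactions` is a list of sets; `premise` plays the role of C and
    `result` the role of U in the definitions above.
    """
    premise, result = set(premise), set(result)
    m = len(transactions)
    n_c = sum(premise <= t for t in transactions)              # contain C
    n_u = sum(result <= t for t in transactions)               # contain U
    n_cu = sum((premise | result) <= t for t in transactions)  # contain C and U
    support = n_cu / m
    confidence = n_cu / n_c if n_c else 0.0
    expected_confidence = n_u / m
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return support, confidence, expected_confidence, lift

# Toy alert "transactions" (hypothetical, not taken from KDDCup-99).
alerts = [
    {"port_scan", "brute_force", "data_exfil"},
    {"port_scan", "brute_force"},
    {"port_scan"},
    {"brute_force", "data_exfil"},
    {"port_scan", "brute_force", "data_exfil"},
]

support, confidence, expected_confidence, lift = rule_metrics(
    alerts, {"port_scan", "brute_force"}, {"data_exfil"}
)
```

In this toy database, 2 of the 5 transactions contain all three alerts, 3 contain the premise and 3 contain the result, so the support is 2/5, the confidence is 2/3, the expected confidence is 3/5, and the lift is slightly above 1.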
In general, support and confidence are the two most important indicators for measuring association rules. Support is the frequency with which an itemset appears in all transactions. In association analysis, it is the key indicator for deciding whether an itemset is a frequent itemset. Support is generally expressed as a percentage or probability value, and itemsets with higher support generally have stronger correlation. The mining of association rules can be divided into two stages. The first stage finds all frequent itemsets in the transaction database. The second stage uses the frequent itemsets to generate association rules, and can be defined as follows: for each frequent itemset $C$ and each non-empty proper subset $A \subset C$, form the rule

$$A \Rightarrow (C - A).$$

If the confidence of the rule is not less than the minimum confidence threshold, the rule is kept as a strong association rule.
Among them, mining frequent itemsets is the precondition and key link of mining association rules.
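The two stages can be sketched as a small level-wise (Apriori-style) search. This is an illustrative sketch, not the paper's implementation: the function names, the thresholds `min_support` and `min_confidence`, and the toy alerts are all assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search for itemsets with support >= min_support."""
    m = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([item]) for item in items]
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n / m for c, n in counts.items() if n / m >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == len(a) + 1})
    return frequent

def strong_rules(frequent, min_confidence):
    """Stage 2: split each frequent itemset into premise => result rules."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for premise in map(frozenset, combinations(sorted(itemset), k)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                confidence = support / frequent[premise]
                if confidence >= min_confidence:
                    rules.append((set(premise), set(itemset - premise), confidence))
    return rules

# Toy alert "transactions" (hypothetical, not taken from KDDCup-99).
alerts = [
    {"port_scan", "brute_force", "data_exfil"},
    {"port_scan", "brute_force"},
    {"port_scan"},
    {"brute_force", "data_exfil"},
    {"port_scan", "brute_force", "data_exfil"},
]

frequent = frequent_itemsets(alerts, min_support=0.4)
rules = strong_rules(frequent, min_confidence=0.9)
```

With these toy thresholds, every alert combination is frequent, but only the rules whose premise always co-occurs with the result (for example `data_exfil` implying `brute_force`) pass the confidence filter.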
Experimental environment and data set
The experimental environment is a 64-bit Windows 7 Professional system with an Intel i5-6500 CPU, 8 GB of memory, an NVIDIA GT 730 graphics card, and a 2 TB hard disk.
This paper chooses the KDDCup-99 dataset as a typical wired network environment. The KDDCup-99 dataset collects 9 weeks of network status from a laboratory. It is continuous data obtained from a simulated LAN and contains a certain time series, which is consistent with the network model described in this article. KDDCup-99 is a dataset based on a wired network, so it has important reference value for research on wired network situational awareness. In this experiment, one tenth of the roughly five million samples is selected as the training set, comprising 494021 records, and 311029 records constitute the test set.
Network situation understanding and assessment
(1) The influence of Timer value on the understanding and evaluation of network situation in KDDCup-99 dataset
A timer can reduce the complexity of a program: for example, it can be used to control some actions of a single-chip computer, which reduces the number of instructions and the complexity of the program. Introducing the Timer paves the way for the experiments. It is the control variable of the experiment, which makes it convenient to observe how the accuracy of the data mining model in situation understanding and evaluation on the KDDCup-99 dataset changes.
Because the traditional network security state awareness method does not fully consider the temporal relationship between the original network data, this paper introduces the time factor Timer. The Timer-based approach is a network security state awareness method based on time series, which emphasizes the time factor and adds a time dimension to traditional network security state awareness methods. The Timer value can be used to adjust and optimize the input channels of the neural network and thereby determine the network structure. It can also be used to adjust the degree of fusion, so as to control the number of samples loaded into the network at one time. The data obtained through Timer fusion are network data with a time series relationship. When these data are loaded into the network model, the model can learn from both the original data and the fused data, so that the network can better learn more abstract characteristics, thus improving the accuracy of situation awareness [23, 24].
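The paper does not give the exact fusion procedure, so the following is only one plausible reading, sketched under explicit assumptions: `timer` consecutive records are stacked into one sample (one input channel per time step), and each fused sample takes the label of its most recent record. The function name `fuse_by_timer` and the toy data are hypothetical.

```python
def fuse_by_timer(records, labels, timer):
    """Stack `timer` consecutive records into one time-ordered sample.

    Assumption: the fused sample is labeled with the situation value of
    its most recent record; the paper does not specify this choice.
    """
    if timer < 1 or timer > len(records):
        raise ValueError("timer must be between 1 and len(records)")
    samples, targets = [], []
    for end in range(timer, len(records) + 1):
        samples.append(records[end - timer:end])  # shape: timer x n_features
        targets.append(labels[end - 1])
    return samples, targets

# Toy time-ordered records with two features each (not KDDCup-99 data).
records = [[float(t), float(t) * 2] for t in range(10)]
labels = [0, 0, 0, 2, 2, 2, 2, 0, 0, 0]

samples, targets = fuse_by_timer(records, labels, timer=4)
```

Under this reading, a larger Timer value gives each sample more temporal context but also a larger input, which is in line with the longer evaluation time reported for larger Timer values.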
This experiment tests the effect of the Timer value on the accuracy of situation understanding and evaluation, and determines the best Timer value experimentally so that the model achieves its best effect. Grid search or cross validation can be used to obtain the optimal Timer value. Grid search means exhaustively trying different Timer values within a certain range and comparing model performance to determine the optimal parameter. On this basis, the proposed data mining model is used for NSS awareness information security measurement and compared with traditional measurement methods. The results are shown in Fig. 3. Figure 3(a) is the evaluation time and Fig. 3(b) is the accuracy.
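The grid search over Timer values can be sketched as a simple exhaustive loop. The names `grid_search_timer` and `toy_evaluate` are hypothetical, and the scoring function is a placeholder that merely rises and then falls; its numbers are not results from the paper.

```python
def grid_search_timer(candidates, evaluate):
    """Evaluate every candidate Timer value and return (best_timer, scores)."""
    scores = {timer: evaluate(timer) for timer in candidates}
    best_timer = max(scores, key=scores.get)
    return best_timer, scores

def toy_evaluate(timer):
    """Placeholder score; a real run would train and test the model here."""
    return 1.0 - abs(timer - 5) * 0.1

best_timer, scores = grid_search_timer(range(1, 9), toy_evaluate)
```

In a real experiment, `evaluate` would train the model with the given Timer value and return its accuracy on held-out data, optionally averaged over cross-validation folds.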

Comparison between Timer value and model evaluation time and accuracy.
The conclusion drawn from the above eight experiments is that, as the Timer value increases, the evaluation time of the model increases, and the accuracy of the model generally increases first and then declines. The initial increase occurs because, as Timer increases, the feature map fed to the network contains more network data, the neural network knows more about the network, and the effect is better. When Timer is 8, this method reaches its highest accuracy, 92.89%. As Timer continues to grow, the input contains so much network information that it leads to overfitting in model training, and the accuracy declines.
(2) Impact of Different Algorithms on Situation Understanding and Assessment under KDDCup-99 Dataset
The experiment aims to verify that the data mining model proposed in this paper has obvious advantages over other situation understanding and evaluation methods. It compares the random forest (RF) algorithm, the Adaboost algorithm, an NSS understanding and evaluation algorithm based on an improved long short-term memory (LSTM) network, and the data mining model proposed in this paper. These algorithms are mainly used for abnormal behavior detection and prediction in network security state awareness tasks. The random forest algorithm is a classifier based on decision trees. It improves classification accuracy by building multiple decision trees on the original data and integrating their decisions. The comparative experiment on situation understanding and evaluation is conducted on the KDDCup-99 dataset, with the accuracy of situation understanding and evaluation as the evaluation index. The experimental results are shown in Fig. 4.

Situation understanding and evaluation of different algorithms in KDDCup-99 dataset.
The above four methods are used to understand and evaluate the situation on the KDDCup-99 dataset, and the data mining model has the highest accuracy, 93.14%. This shows that the data mining model proposed in this paper achieves good accuracy in NSS understanding and evaluation.
(1) Impact of Timer Value in KDDCup-99 Data Set on Situation Prediction
To test the influence of the Timer value on the accuracy of situation prediction, this paper also conducts experiments to obtain the optimal Timer value. The Timer value of the data mining method is adjusted in a comparative experiment, in which the prediction time of the model is recorded and the accuracy rate is taken as the evaluation index. The results are shown in Fig. 5: Fig. 5(a) is the prediction time and Fig. 5(b) is the accuracy rate.

Comparison of prediction time and accuracy between Timer value and model.
It can be seen from the above figures that, as the Timer value increases, the prediction time of the model increases, and the accuracy of the model generally increases first and then decreases. The prediction time is measured by recording the time the model needs to process the data. The initial rise in accuracy occurs because, as the Timer value increases, the network feature map contains more network data, so the neural network can obtain more information from the network and the model evaluates better. When Timer is 6, the accuracy of the model is the highest, 92.71%.
(2) Comparison of Situation Prediction Effects of Different Algorithms in KDDCup-99 Environment
Compared with other situation prediction methods, the NSS prediction mining model proposed in this paper has greater advantages. On the KDDCup-99 dataset, a situation prediction comparison test was conducted using the random forest algorithm, the Adaboost algorithm, the improved LSTM algorithm and the data mining model, with the accuracy of network situation prediction as the evaluation index. The results are shown in Fig. 6.

Situation prediction of several different algorithms on KDDCup-99 dataset.
The above four methods are used to predict the situation of KDDCup-99 dataset. The NSS prediction mining model proposed in this paper has good prediction accuracy, with an accuracy rate of 92.89%.
KDDCup-99 records fall into five classes (normal traffic and four attack categories: DoS, Probe, R2L and U2R), which limits the network status value to 0–4. On this basis, the attacks and the damage they cause to the network can be quantified with specific quantitative indicators, as shown in Table 1.
Comparison of quantitative indicators of KDDCup-99
Through the situation indicator comparison table, the network situation and evaluation data and their actual values are quantified and represented, and the results are shown in Fig. 7.

Comparison of situation understanding and evaluation generation curves on KDDCup-99.
Based on the situation indicator comparison table, the NSS prediction data and its actual value are quantified and represented, and the results are shown in Fig. 8.

Comparison of situation prediction generation curves on KDDCup-99.
In Figs. 7 and 8, the black line is the real labeled value of the data, while the yellow line is the value produced by the model for situation understanding, evaluation and prediction. When plotting the situation, the actual values are drawn first and the model output second. In this way, the area where the model output covers the real value represents the accuracy of the model's situation understanding, evaluation and prediction.
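The overlap between the two curves corresponds to the fraction of points where the model output equals the labeled situation value. A minimal sketch of that computation, overall and per situation value, is given below. The function names and the two toy sequences are hypothetical and are not data from the paper.

```python
from collections import defaultdict

def overlap_accuracy(actual, predicted):
    """Fraction of positions where the predicted value equals the actual value."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def per_value_accuracy(actual, predicted):
    """Accuracy computed separately for every actual situation value."""
    hits, totals = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        hits[a] += int(a == p)
    return {value: hits[value] / totals[value] for value in totals}

# Toy situation values in the range 0-4 (illustrative only).
actual = [0, 0, 2, 2, 2, 2, 4, 4, 1, 3]
predicted = [0, 0, 2, 2, 2, 2, 0, 2, 1, 3]

overall = overlap_accuracy(actual, predicted)
by_value = per_value_accuracy(actual, predicted)
```

Breaking the accuracy down by situation value makes it visible when a rare class is predicted poorly even though the overall overlap is high.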
In Figs. 7 and 8, there are many overlapping black and yellow areas and many horizontal lines with a tag value of 2, which indicates that a large number of DoS attacks are correctly estimated and predicted. The situation understanding, evaluation and prediction results for a tag value of 4 are not satisfactory, because the accuracy of understanding, evaluating and predicting R2L attacks is very low: the number of such attacks is too small to train a good model for them. In general, the situation understanding, evaluation and prediction results of the data mining model on the dataset are good. This shows that the NSS prediction mining model proposed in this paper can better understand, evaluate and predict the situation on the KDDCup-99 dataset. NSS awareness identifies significant events in the network through the collection, processing, and analysis of network data streams, and generates information about network status and vulnerabilities to support network security decisions. NSS technology has broad application prospects in the field of network security.
This paper proposed an NSS awareness information security measurement method based on data mining, which can effectively evaluate the impact of different attacks on network security and objectively and fairly evaluate the security of the system. The experiments compared the performance of multiple network intrusion detection algorithms, verifying the effectiveness and superiority of the proposed method. The method analyzes the causes and possible consequences of attacks from multiple perspectives through data analysis and mining. According to different attack behavior characteristics, it provides users with valuable security measurement decisions and corresponding measures. It can be used to quantify a large number of attacks that cannot be measured qualitatively, and can therefore provide a relatively reliable evaluation method for similar security risks and system security events. In the evaluation process, first, from the perspective of NSS, the threat data obtained from the analysis are classified and processed using threat mining technology, and risk is measured according to the classification results. Second, according to the information characteristics of network attacks, the behaviors that need attention are classified, and the security of each type of threat is measured. Third, from the perspective of attack motivation, attacks are classified into general attack behavior and non-general attack stages (mainly carried out under relatively complex attack behavior). Different evaluation methods can be used according to different situations. Finally, according to the comprehensive evaluation results, users are provided with comprehensive, effective and targeted preventive measures and network security policy recommendations.
Network threats emerge quickly and are complex, so NSS information analysis and measurement based on data mining needs strong adaptability. However, in the network threat analysis based on mining algorithms, the experimental part of this paper did not analyze the differences between network threats under different attack methods and situations. Future work will learn from real-time data according to different attack methods and carry out real-time adaptive optimization.
