Abstract
Although Internet of Things (IoT) technologies and services are being developed rapidly worldwide, concerns about potential security threats such as privacy violation, information leakage, and hacking are increasing as a greater variety of sensors is connected to the Internet. Research is needed on introducing risk management and existing security management standards (e.g., ISO27001) to ensure the stability and reliability of IoT services. K-ISMS is a representative certification system that evaluates the security management level of enterprises in Korea, and it can be applied as a standardized process to enhance the security management of IoT services. However, concerns about the declining quality of the K-ISMS certification assessment are growing because Internet security incidents occur frequently in K-ISMS certified enterprises. Therefore, further research is required to improve the accuracy and objectivity of the certification assessment. Since existing studies mainly focus on simple statistical analysis of the K-ISMS assessment results, analysis of the causes of certification assessment faults based on past data is insufficient. As a method of managing the quality of certification inspection, this paper analyzes the association among the fault items of the K-ISMS certification assessment results using association rule mining, which identifies association rules among items in a database.
1. Introduction
According to a survey by Gartner, the number of things connected to the Internet is expected to grow to 26 billion, and the market size is forecast to reach USD 1 trillion by 2020 [1]. IoT is widely applied in areas closely related to daily life, such as smart home appliances, smart cars, and healthcare. Since multiple networks can be controlled through even a single sensor, hacking of one area can be fatal because it can trigger security threats in a chain reaction. As IoT services make broad use of sensor information to provide a wide range of information, the risk of information leakage will increase. Malfunction or suspension of IoT devices will also pose a very serious threat even to social infrastructure; the economic damage is predicted to reach KRW 17.7 trillion by 2020 [1]. Therefore, the technical and administrative vulnerabilities of each element of IoT, such as sensors, wired and wireless networks, and platforms, need to be considered from the design stage, and the application of existing standards needs to be reviewed and studied as the key tool for continuously inspecting these elements at the operating stage as well.
The Korean government operates the information security management system (“K-ISMS”) certification system to assess whether an organization has established and manages an appropriate information security environment. K-ISMS, which is similar to “ISO27001”, the international standard for information security management systems, is designed to improve the information security management level of enterprises and protect them from various security threats [2]. K-ISMS evaluates whether an enterprise has set up a comprehensive information security management system, including administrative, technical, and physical protective measures to protect the safety of its information assets, using the 104 certification criteria; it then issues the certification if the enterprise meets the requirements. K-ISMS was introduced in Korea in 2002, and 332 enterprises have acquired the certification. Since acquisition of the certification became mandatory in 2013 for enterprises over a certain size, demand for and interest in K-ISMS certification have been gradually increasing [3].
With Internet security incidents (e.g., hacking) occurring frequently in K-ISMS certified enterprises, however, concerns about the declining quality of the K-ISMS certification assessment are growing. Since the information security management systems of enterprises differ and fault cases vary, specialization and extensive experience in certification assessment are required. Moreover, there are limits to maintaining objective and accurate assessment quality, since the 104 criteria must be evaluated within a short period of time. To address these problems, various studies have been conducted, including a case study of faults in K-ISMS [4], an analysis of the economic effect of K-ISMS certification [5], and analyses of the security management process for various IT services [6, 7]. However, no study has yet applied data mining to the relationships between faults.
Thus, this study aims to provide a guide for selecting the items to assess with priority during the limited assessment period by analyzing frequently occurring fault patterns and their associations in the K-ISMS fault data. For this purpose, we apply a data mining technique to analyze the association among the fault items of the K-ISMS certification criteria. The paper is organized as follows. Section 2 introduces association rule mining, one of the best known and most widely used unsupervised techniques. In Sections 3 and 4, experiments are performed on K-ISMS fault data. The conclusion is given in Section 5.
2. Theoretical Background
Data mining is a knowledge-finding process that extracts previously unknown, useful information by analyzing a large quantity of accumulated data. Among the research areas concerned with identifying hidden patterns in data, association rule finding has been studied most widely, with applications in market forecasting, medicine, and IT engineering [8, 9]. Association rule analysis refers to a technique that finds useful patterns among data items, expressed in a “condition-result” form. The association rules extractable from a given data set are compared in order to evaluate their importance. The measures commonly used to assess the strength of an association rule are support, confidence, and interest [10].
The problem of finding association rules
Association rule finding is the process of identifying the association rules that meet predefined threshold values of support and confidence. It broadly consists of two steps [15, 16]. The first is “frequent itemset finding,” which searches for items that occur together in transactions and keeps only the itemsets that satisfy the support threshold (“minimum support”). The second is “association rule generation,” which creates association rules from the frequent itemsets found and keeps only the rules that satisfy the confidence threshold (“minimum confidence”).
2.1. Frequent Itemset Finding
To find the frequent itemsets, every combination of items that can be formed from the given items is created as a candidate itemset, and the transaction data is searched for each candidate to check whether it satisfies the minimum support.
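Let the set of items in the transaction database be $I = \{i_1, i_2, \ldots, i_m\}$ and the transaction database be $D = \{T_1, T_2, \ldots, T_n\}$, where each transaction $T \in D$ is a set of items such that $T \subseteq I$.
If transaction $T$ contains every item of an itemset $X \subseteq I$, that is, $X \subseteq T$, then $T$ is said to contain $X$. The support of $X$ is the fraction of transactions in $D$ that contain $X$:
$$\mathrm{supp}(X) = \frac{|\{T \in D : X \subseteq T\}|}{|D|}$$
When the user defines minimum support (minsup) and $\mathrm{supp}(X) \geq \mathrm{minsup}$, $X$ is called a frequent itemset.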
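The frequent itemset search described in this subsection can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the transactions are made-up toy data (the control item codes are borrowed from this paper, the fault patterns are not real), and the names `support` and `frequent_itemsets` are hypothetical helpers, not part of any library.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the control items found faulty
# at one assessed enterprise. The fault patterns are invented.
transactions = [
    {"CL-1-1", "OS-2-2", "AC-3-3"},
    {"CL-1-1", "OS-2-2"},
    {"CL-1-1", "AC-3-3"},
    {"AC-2-3"},
    {"CL-1-1", "OS-2-2", "AC-2-3"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, minsup):
    """Brute force: test every candidate itemset against minsup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= minsup:
                result[frozenset(candidate)] = s
    return result


# Assumed threshold for the toy data only.
frequent = frequent_itemsets(transactions, minsup=0.5)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

Because every combination is enumerated, the number of candidates doubles with each additional item, which is why this form is only practical for very small item sets.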
2.2. Association Rule Generation
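Rule generation is the process of creating association rules from the frequent itemsets found during the “frequent itemset finding” process. Suppose $X$ and $Y$ are subsets of the item set $I$ that do not share any element:
$$X \Rightarrow Y, \quad X \subset I,\ Y \subset I,\ X \cap Y = \emptyset$$
Support and confidence are used as statistical criteria to verify the validity of an association rule. “Support” is the ratio of the transactions that contain both $X$ and $Y$ among all transactions and is expressed as
$$\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y)$$
“Confidence” is a criterion indicating the strength of the rule. If the rule $X \Rightarrow Y$ has confidence $c$, then a fraction $c$ of the transactions that contain $X$ also contain $Y$:
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$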
Finding association rules among data items involves finding the itemsets whose support and confidence are higher than the user-defined minimum support and minimum confidence. For example, assume that the following association rule candidates are identified from the {bread, egg, milk} itemset [18]. If the minimum confidence is 70%, the second and third association rules will be selected. In this way, all possible combinations of the itemsets are created, and some of them are selected as association rules depending on whether the minimum confidence is satisfied.
(Rule 1) bread ⇒ egg, milk confidence = 0.3/0.5 = 60%.
(Rule 2) bread, egg ⇒ milk confidence = 0.3/0.3 = 100%.
(Rule 3) bread, milk ⇒ egg confidence = 0.3/0.4 = 75%.
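The selection in this example can be reproduced with a short Python sketch. The support values are the ones implied by the three confidence fractions above; the function name `confidence` and the dictionary layout are illustrative assumptions.

```python
# Supports implied by the worked example (Rules 1 to 3).
supp = {
    frozenset({"bread"}): 0.5,
    frozenset({"bread", "egg"}): 0.3,
    frozenset({"bread", "milk"}): 0.4,
    frozenset({"bread", "egg", "milk"}): 0.3,
}


def confidence(antecedent, consequent, supp):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return supp[xy] / supp[x]


candidates = [
    ({"bread"}, {"egg", "milk"}),   # Rule 1
    ({"bread", "egg"}, {"milk"}),   # Rule 2
    ({"bread", "milk"}, {"egg"}),   # Rule 3
]

minconf = 0.7  # minimum confidence used in the example
selected = [(x, y) for x, y in candidates if confidence(x, y, supp) >= minconf]
```

With these inputs, only Rules 2 and 3 remain in `selected`, matching the outcome stated in the text.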
Strong association rules can be selected using the support and confidence criteria. However, support and confidence have weaknesses. Support suffers from the “rare item problem” [20]: infrequent items that do not meet the minimum support are ignored, which is problematic if rare items are important [21]. On the other hand, if the minimum support is set low, the search space becomes larger. In addition, high support and confidence do not necessarily indicate a strong association, since they can arise by chance. Therefore, statistical correlation measures such as lift and conviction are needed to address these problems and find strong association rules [21]. Interest (or lift) is one such statistic that attempts to correct this weakness [22].
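Confidence tends to rate rules highly where the consequent $Y$ is frequent in the data set, regardless of whether it is actually associated with the antecedent $X$.
The interest measure [13] is defined over $[0, \infty)$ as
$$\mathrm{interest}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)} = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}$$
If the interest is equal to 1, $X$ and $Y$ are independent. If the interest is greater than 1, $X$ and $Y$ are positively correlated. If the interest is less than 1, $X$ and $Y$ are negatively correlated.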
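As a hedged illustration of why interest (lift) complements confidence, the sketch below evaluates two made-up rules that share the same support and confidence but differ in how frequent the consequent is. All numbers and the helper names `lift` and `interpret` are hypothetical.

```python
def lift(supp_xy, supp_x, supp_y):
    """interest(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return supp_xy / (supp_x * supp_y)


def interpret(value, tol=1e-9):
    """Map a lift value to the three cases: independent, positive, negative."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positive" if value > 1.0 else "negative"


# Hypothetical rule A: supp(X u Y) = 0.3, supp(X) = 0.5, so confidence = 0.6,
# but the consequent is very common (supp(Y) = 0.8).
lift_a = lift(0.3, 0.5, 0.8)

# Hypothetical rule B: same support and confidence, rarer consequent (supp(Y) = 0.4).
lift_b = lift(0.3, 0.5, 0.4)

print(interpret(lift_a), interpret(lift_b))
```

Rule A has a lift below 1 even though its confidence is 60%, because the consequent would occur in 80% of transactions anyway; rule B, with the same confidence, has a lift above 1.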
3. Association Rule Analysis of the K-ISMS Fault
3.1. Analysis Data
In this paper, we analyzed the fault data of 76 enterprises that received certification assessment in 2013 (using only the statistical results) and applied the representative “Apriori algorithm” [17–19] for association rule mining. The average fault rate of those enterprises was found to be 15%. (The terms for the K-ISMS control items used in this paper are explained in the appendix.)
Among the 104 K-ISMS certification assessment items, the frequent itemsets whose support was higher than the minimum support were created. A total of 825 rules were found using the brute-force method. When the number of fault items included in these rules was analyzed, 58 one-itemsets, 494 two-itemsets, 312 three-itemsets, and 20 four-itemsets were created. Figure 1 illustrates the K-ISMS fault items that occur frequently, listed in the order CL-1-1 (asset identification), AC-3-3 (access control), OS-2-2 (security system operation), AC-2-3 (access right review), and CC-1-1 (encryption policy establishment).

Figure 1: Frequent fault items in the K-ISMS data set.
As the number of items increases, the computational workload of generating frequent itemset candidates increases exponentially. To solve this problem, the “pruning” method [24] is used to cut away the unnecessary part of the search. “Pruning” removes the combinations that do not satisfy the threshold criterion in each phase. The most common pruning method is MSP (minimum support pruning): if the support of an itemset combination is smaller than the threshold value, no further items are added to it. After support-based pruning and removal of duplicated association rules, 307 association rule candidates remained.
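The level-wise search with minimum support pruning can be sketched as follows. This is a simplified illustration of the general Apriori idea, not the implementation used in this study; the toy transactions are invented and the function name `apriori` is a local helper rather than a library call.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise frequent itemset search with minimum support pruning."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = set().union(*transactions)
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Subset check: a candidate with an infrequent (k-1)-subset
        # cannot be frequent, so it is dropped without counting.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Minimum support pruning: keep only candidates reaching minsup.
        level = {c for c in candidates if support(c) >= minsup}
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


# Hypothetical toy data (invented fault patterns, real item codes).
toy = [
    {"CL-1-1", "OS-2-2", "AC-3-3"},
    {"CL-1-1", "OS-2-2"},
    {"CL-1-1", "AC-3-3"},
    {"AC-2-3"},
    {"CL-1-1", "OS-2-2", "AC-2-3"},
]
found = apriori(toy, minsup=0.3)
```

In this toy run, the three-item candidate {CL-1-1, OS-2-2, AC-3-3} is discarded without scanning the data, because its subset {OS-2-2, AC-3-3} is already known to be infrequent.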
3.2. Association Rule Analysis
In the stage of support analysis, a 10% support of a certain rule $X \Rightarrow Y$ means that faults $X$ and $Y$ occur together in 10% of all transactions.

Figure 2: (a) Distribution of support; (b) distribution of confidence.
Support measures how frequently the two faults occur together among all transactions, whereas confidence indicates how likely fault “Y” is given fault “X”: when the confidence of a certain rule $X \Rightarrow Y$ is 30%, the probability that fault “Y” occurs if “X” has occurred is 30%.
Figure 3 visualizes the correlation among the fault items found by the measures of association analysis (support, confidence, and lift).

Figure 3: (a) Graph showing how the rules are composed of individual items and which rules share items. (b) Scatter plot of the rules with three measures (support, confidence, and lift).
Table 1 shows the top 5 association rules sorted by support. Rule 44 can be interpreted as follows. The support is 34.2%, which is the ratio of cases in which faults are found in the CL-1-1 and OS-2-2 control items at the same time. There is a 59% probability that a fault occurs in the OS-2-2 control item if a fault occurs in the CL-1-1 control item. In addition, since the lift is over 1, the two control items are positively correlated, and the association rule is strong. On the other hand, Rule 443 is an association rule with low correlation because its lift is under 1, even though its support and confidence are 30.2% and 52.2%, respectively.
Table 1: Top 5 association rules sorted by support.
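To make the figures for Rule 44 more concrete, the percentages can be converted into approximate enterprise counts using the sample size of 76 given in Section 3.1. This is a back-of-the-envelope sketch: the published percentages are rounded, so the counts below are estimates, and the variable names are illustrative.

```python
n_enterprises = 76   # sample size reported in Section 3.1
supp_xy = 0.342      # support of CL-1-1 => OS-2-2 (Rule 44)
conf_xy = 0.59       # confidence of the same rule

# supp(X u Y) * N: enterprises with faults in both control items.
both = round(supp_xy * n_enterprises)

# supp(X) = supp(X u Y) / conf(X => Y): share of enterprises with a CL-1-1 fault.
supp_x = supp_xy / conf_xy
with_x = round(supp_x * n_enterprises)

print(both, with_x)
```

Under these assumptions, roughly 26 of the 76 enterprises had faults in both CL-1-1 and OS-2-2, out of roughly 44 enterprises with a CL-1-1 fault.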
Table 2 shows the top 5 association rules sorted by the measure of confidence.
Table 2: Top 5 association rules sorted by confidence.
Rule 40 can be interpreted as follows. The ratio of cases in which a fault occurs in control items CC-2-1, DR-2-1, and AC-2-3 at the same time (the “support”) is low (10.5%). Note, however, that the ratio of a fault occurring in control item AC-2-3 when a fault occurs in control items AC2-2 and OS-2-2 (the “confidence”) is 100%. In addition, since the lift of this rule is over 1, the association rule appears to be strong.
4. Results and Discussion
To analyze the relations among K-ISMS faults, we identified the association rules that satisfy predefined support and confidence thresholds. The first step is “frequent itemset finding,” which keeps only the itemsets that satisfy the support threshold. The second step is “association rule generation,” which keeps, among the association rules created from the frequent itemsets found, only those that satisfy the confidence threshold. Table 3 shows the summary of strong association rules within the range of the minimum support and minimum confidence.
Table 3: Summary of the association rules found by the interestingness measures (support, confidence, and lift).
However, not all strong association rules are useful. Because the support-confidence framework can also produce rules whose items are not positively correlated, we additionally required lift > 1 and examined three threshold ranges: minimum support > 30% and minimum confidence > 50%; minimum support > 20% and minimum confidence > 30%; and minimum support > 10% and minimum confidence > 80%.
Table 4: Associations among K-ISMS faults (interest > 1).
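The rule selection by threshold ranges and lift can be expressed as a simple filter. The sketch below uses the three (minimum support, minimum confidence) ranges stated in the text; the `Rule` class, the rule names, and the measure values are hypothetical and do not come from the K-ISMS data.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    support: float
    confidence: float
    lift: float


# (minimum support, minimum confidence) ranges as stated in the text.
THRESHOLDS = [(0.30, 0.50), (0.20, 0.30), (0.10, 0.80)]


def is_selected(rule, thresholds=THRESHOLDS):
    """Keep a rule if lift > 1 and it exceeds at least one threshold range."""
    if rule.lift <= 1:
        return False
    return any(rule.support > s and rule.confidence > c for s, c in thresholds)


# Hypothetical rules for illustration only.
rules = [
    Rule("A => B", 0.34, 0.59, 1.10),
    Rule("C => D", 0.30, 0.52, 0.95),      # rejected: lift is not above 1
    Rule("E, F => G", 0.105, 1.00, 1.80),  # low support, very high confidence
    Rule("H => I", 0.12, 0.40, 1.30),      # rejected: no range satisfied
]
selected = [r.name for r in rules if is_selected(r)]
```

Treating the ranges as alternatives lets a rule with low support but very high confidence pass alongside rules with high support and moderate confidence.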
From the results of the analysis, we can expect the association of fault occurrence among control items to be high, as the rule {CL-1-1 (Information asset identification) ⇒ OS-2-2 (Security system operation)} has a support of over 30% and a confidence of over 50%. “Information asset identification” is a control item that requires all information assets of the organization to be classified and identified. A fault occurs if those information assets are not identified periodically or if some assets are omitted. In other words, if a fault is found in “information asset identification,” there is a high possibility of a fault in “security system operation.” A “security system operation” fault refers to the case in which the security system operating procedure is not followed, or the rule management log of the security system (e.g., firewall) is unavailable or lost.
There were two rules with a support of over 20% and a confidence of over 70%. The first rule was {PH-1-4 (Access control of physical area) ⇒ AC-3-3 (User password management)}, and the second rule was {AC-4-6 (access control of internet connection) ⇒ CL-1-1 (Information asset identification)}. In the first rule, the “Access control of physical area” control item refers to the requirement that only authorized persons be allowed to access the major systems inside the security area and that the access log be reviewed periodically. A fault occurs if access control for outsiders is insufficient or if mobile devices (e.g., USB) can be brought into the area. In addition, a fault in the “user password management” control item occurs when the passwords of major systems are not changed periodically or the password use requirements are not met. Therefore, if an enterprise does not perform proper “access control” on major facilities and systems, the “user password management” of its major systems (e.g., servers, networks) may also be insufficient.
5. Conclusions
In this study, we applied association rule mining with the “Apriori” algorithm to analyze the correlation among K-ISMS faults and found 43 association rules. The results suggest a high correlation among faults; for example, if an organization identifies and manages its information assets carelessly, its security system operation is also likely to be affected.
Therefore, these association rules may serve as useful information for decision-making on an organization's security activities and can be utilized as a guide to the assessment method during K-ISMS certification assessment. If a fault occurs in any of the K-ISMS certification criteria, the items related to it through the association rules can be checked intensively. The rules can also guide the analysis of the level of Plan-Do-Check-Act activity (the organization's security management phases) from the perspective of the correlation among faults.
However, which rules are found to be useful can differ with the size of the data set, because the adoption of an association rule depends on the occurrence frequency in the analyzed data. Therefore, further studies of K-ISMS fault analysis are needed, such as association analysis according to the scope of an organization's certification (number of employees and systems). Based on the association rules obtained in this study, decision tree analysis to forecast fault status and fault factor analysis using the structural equation model will be carried out in subsequent studies.
Because the paradigm of the IT environment is changing from the conventional PC and mobile devices to IoT, a new approach is needed in terms of the range of protection targets, the characteristics of protection targets, and the protection subjects in order to strengthen the security level of IoT continuously. In other words, the protection target should be expanded from the existing PC and mobile devices to all objects, such as home appliances, automobiles, and medical supplies. It is also necessary to break away from the conventional method of protection based on separate security systems, software implementations, and interfaces, and to establish security policies, procedures, and standards for efficiently controlling and managing administrative, physical, and technical security.
Moreover, an information security management system suitable for the IoT environment needs to be applied to maintain the security level of IoT services continuously and to prevent the spread of risk from intrusion incidents. This includes identifying the key assets to be protected and the threats to them, as well as assessing the current security level in order to establish policies for coping with threats.
Appendix
See Table 5.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1004) supervised by the NIPA (National IT Industry Promotion Agency).
