Abstract
The Semantic Web is increasingly used in cybersecurity systems. This matters especially now, when Internet of Things (IoT) systems are developing very quickly and their security must be maintained. With the Semantic Web, cybersecurity knowledge can be stored and processed using an ontology. We describe a system for analyzing the level of cybersecurity among Polish citizens, in particular IoT users. An ontology-based knowledge representation related to the security level was created for the described system. The ontology contains the information necessary to determine the security level in different locations and to conduct deeper analysis. It has been prepared for the needs of the IoT system for storing data and knowledge. The described Semantic Web application is part of a larger project that makes it possible to determine the cybersecurity of IoT devices and the cyber threats they face.
Introduction
Along with the development of the Internet of Things (IoT), the field of cybersecurity for IoT devices is also developing very rapidly. Devices connected to a common network can see each other, and unauthorized persons can access these devices (Lu & Da Xu, 2018). This may lead to a security breach or leakage of sensitive data. IoT can bring many benefits, such as accessibility, integrity, availability, scalability, confidentiality, and interoperability. Such systems are used in a growing range of applications. However, we must pay attention to the security of IoT devices.
The Semantic Web (SW) is increasingly used in various fields of science, including cybersecurity systems (Adam, 2020; Gyrard et al., 2014; Kirrane et al., 2018; Merah & Kenaza, 2021; Onwubiko, 2018; Syed et al., 2016; Takahashi & Kadobayashi, 2015). Moreover, the SW is also widely used in IoT systems. Applications reported in the literature include, for example, health care, agriculture and smart homes (Chatzimichail et al., 2021; Costa Lima et al., 2023; Dawood & Sah, 2021; Loukil et al., 2018; Rhayem et al., 2020). Because cybersecurity is a very important issue, in particular for IoT systems, we present the possibility of using the SW in this area. This is a new approach in which the SW has been designed and created to support the process of determining the level of cybersecurity. The main goal is to determine the level of cybersecurity in a given location based on aggregated data collected during network scanning. During scanning, we receive information about the IoT devices connected to the network and about some of their properties, such as the technologies or features used. This is possible thanks to the collection of data from many devices and scans carried out by users. The assumption is that users will scan a network in order to ensure their own safety in that network. These data will then be transmitted and analyzed using the SW. The involvement of many millions of users is part of Mobile Crowdsourcing (MCS). One of the applications of MCS is crowdsensing, which consists in collecting data from many users’ mobile devices and analyzing the surrounding real world based on these data (Ray et al., 2023; Wang et al., 2017). Moreover, human–machine interaction becomes a major challenge in crowded environments (Klin et al., 2023). Systems that allow intelligent processing of data in such a crowded environment are therefore important.
The SW makes it possible to create complex schemas, which can be used to model relationships between objects in the real world. These relationships can be of different types: hierarchical, causal, functional, temporal, and others. However, the flexibility of this tool can make it difficult to search knowledge organized in this way (Murtaza & Ahmed, 2020). To address the resulting efficiency problems, similarity-based grouping mechanisms (such as K-NN or K-means) are sometimes used. A related method for reducing the complexity of searching for data in the SW is data granulation, where a representative is designated for each data group, called a granule (Bryniarska, 2017, 2019b, 2021).
In this paper, we present an overview of the use of ontology-based architectures for IoT systems. Then we describe cybersecurity systems that use ontologies to store and process knowledge. We also describe a cybersecurity system which, among other things, makes it possible to determine the level of security of scanned devices. Part of this system is the SW, which was created for its needs. The created ontology is presented in detail, together with metrics computed for this SW. Finally, future application possibilities and the use of this ontology for particular tasks are described.
SW for IoT
In IoT systems, data are collected from the sensors of devices connected to the network. IoT systems allow us to develop new services and applications that connect smart objects and integrate network technologies, devices, sensors, software and distinct infrastructures. The result is the so-called Semantic Web of Things (SWoT), where data obtained directly from sensors are stored in the SW and later processed by systems (Botonakis et al., 2020; Ruta & Scioscia, 2020; Salama et al., 2020).
Given the close relationship between SW and IoT technologies, a comprehensive literature review of this topic is needed. The review by Andročec et al. (2018) attempts to fill this gap. Examples of the versatile use of SW and IoT include implementations in intelligent dialogue systems (Sermet & Demir, 2021) and health care (Costa Lima et al., 2023; Malik & Malik, 2020).
SW and IoT are commonly used in cloud solutions. Overviews of this topic can be found in Taher et al. (2021) (a review of cloud-driven SW solutions), Chatzimichail et al. (2021) and Rhayem et al. (2020) (the subchapter on semantics for IoT security). Another review (Adam, 2020) describes models for detecting, selecting and connecting devices in the IoT network. In addition to review articles, several papers describe specific solutions that use SW and IoT in clouds. One example is an application in agriculture (nut cultivation in Turkey) (Aydin & Aydin, 2020), in which a wireless sensor network (WSN) collects information from sensors and saves it in the form of a SW. This information is then processed using SPARQL (a query language and protocol for RDF data). The use of the SW to describe IoT devices is presented in Chatzimichail et al. (2021), which covers the SW for storing data on Web of Things and SWoT devices as well as the use of the SW in cybersecurity. A common problem in real-world systems where data are obtained from devices on the network is missing or lost data. The problem of recovering lost data is addressed in Afzaal and Shoaib (2021). Ruta and Scioscia (2020) analyze the current approach to device description (SWoT), point to the ineffectiveness of decentralized device descriptions, and propose a new approach based on information-centric networking.
Another topic present in the literature is the use of SW and IoT for security. In Kontopoulos et al. (2018), an ontology for managing the climate crisis and climate change was created. It proposes a data structure for storing and organizing the flood of data coming from people and sensors during crisis management. OntoMetrics was used to evaluate the SW, while queries were made using SPARQL. Chatzimichail et al. (2019) describe DESMOS, an IoT infrastructure for security in public places. GraphDB stores an ontology that contains data such as GPS position and locator signal strength. Applications include a lost child, fainting due to illness, and natural disasters. For automated vehicles, ontologies are created in which information about the vehicle’s surroundings can be stored (Bagschik et al., 2018) or through which information can be exchanged in networks to which vehicles are connected (Bista et al., 2018). The secure exchange of information between devices was addressed by Gyrard et al. (2014). That article presents a number of ontologies in different domains (Web, MANET, 2G/GSM, 3G/UMTS, 4G/LTE, Wi-Fi, Intrusion Detection System (IDS)) and notes that security ontologies are rarely published and are not standardized. Choi and Choi (2019) describe an ontology-based reasoning system for devices in an electrical network, which concerns the security of such an IoT system. A secure IoT architecture, in which data from sensors are saved in the SW, was proposed by Alam et al. (2011). The exchange of information between devices in smart homes is described in Tao et al. (2018). Devices from different manufacturers that operate in different standards require a unified standard, so an ontology was developed that stores data in a multi-layered manner.
The issue of privacy and security in such networks has also been addressed. Ekelhart et al. (2009) describe the AURUM tool for company risk management, in which an OWL-based knowledge base was used to store the data. Välja et al. (2020) describe the use of the SW in security systems and present the problem of automatic, ontology-based analysis of the threats to which a company may be exposed. As the literature shows, there are many applications of the SW to IoT data, which indicates that the SW has great potential for data collected from IoT sensors.
SW in Cybersecurity
The SW can also be used to record, process and store cybersecurity data for the IoT (Mozzaquatro et al., 2015, 2018). In this case, the SW stores data from the security domain. Because of their heterogeneous connectivity, IoT systems pose many challenges and possible threats, and applying the SW to IoT cybersecurity systems can bring many benefits. The SW is based on an ontology that defines classes and the relationships between them, which enables reasoning based on the ontology.
Several ontology-based solutions address the cybersecurity problem in IoT systems (Merah & Kenaza, 2021). Takahashi and Kadobayashi (2015) present a practical example in which a reference ontology was created using the SW to describe information on operations carried out in the field of cybersecurity. This solution uses the CVE database maintained by MITRE. First, the ontology describes the users of the system and their roles related to protection against cyber threats. The ontology was described using description logic (DL). An application with an interface for extracting information from the SW was also created. The solution was developed in cooperation with Security Operations Centre (SOC) units working in the USA, Japan and South Korea. A similar system is CoCoa, described in Onwubiko (2018), which refers to the NIST cybersecurity framework and the Cyber SOC. In this ontology, the entire cybersecurity management process is modelled along with the people responsible for it. The ontology is used to analyze whether a cybersecurity incident, that is an attack, has occurred. Ficco (2013) and Ficco et al. (2016) also consider the detection of cybersecurity incidents (an IDS system) in cloud computing. An interesting aspect is the use of correlations between certain conditions and symptoms to determine whether a device is under attack.
Another complete system related to IoT cybersecurity is described in Mozzaquatro et al. (2018). It proposes an ontology-based cybersecurity framework that a company can use to describe procedures related to maintaining security. The main purpose of this framework is to detect intruders (IDS) trying to get into the resources of a given company. The paper presents the created ontology, the methodology for checking the ontology, and examples of reasoning based on it. Syed et al. (2016) present a single unified ontology for cybersecurity, the Unified Cybersecurity Ontology (UCO), which uses RDF files and DL. The most important classes that are described in this ontology are: Means (attack type), Consequences (attack consequences), Attack (attack, attack threat), Attacker (attacker), AttackPattern (description of typical attacks), Exploit (known exploits), Exploit Target (exploit target). The CVE database was used to describe vulnerabilities to cyber threats. Examples of SPARQL queries are given.
Syed (2020) presents the Cybersecurity Vulnerability Ontology, on the basis of which the Cyber Intelligence Alert (CIA) system was created. This system alerts the user when a cyberattack occurs. The solution uses the CVE database to describe vulnerabilities to threats. The inference rules are expressed in the SWRL language. An extensive evaluation and verification of the solution was carried out.
An interesting idea was also presented by Pastuszuk et al. (2021), where a dynamic analysis of an IT system is carried out on the basis of the created ontology. Using this ontology and the CVE database, the network administrator checks in real time whether a given network is secure at a given moment. The system reports whether a device that can be considered dangerous is present at a given IP address.
Välja et al. (2020), in turn, address automatic threat modelling based on an ontology. For example, instead of manually determining the cyber threats to which a given company may be exposed, the threats are identified automatically based on the data stored in the ontology.
The use of the SW in cybersecurity systems, especially for IoT systems, is well justified. The main advantage of the SW is that knowledge is recorded using an ontology, in which it is arranged into classes and individuals that relate to each other.
The Cybersecurity System
The cybersecurity system for which the SW ontology is created is a large system for analyzing data on the security of Polish citizens. The system consists of a mobile application that can be installed on a mobile phone and an analysis module to which anonymized data are transferred. These data are subject to further analysis and are recorded in the SW. The system under development is much larger and contains many different elements, but in this paper we describe only the elements related to building the ontology and its further use.
The use of the SW in the cybersecurity system offers many benefits, such as improved knowledge representation, a flexible data structure and the possibility of inferring new knowledge from existing data. Even if the system appears to be closed, SW technologies can help integrate data from multiple sources or support future scalability. The system may eventually interact with open systems or data sources. The data and knowledge models remain flexible, reusable, and well suited to potential expansions or integrations with other systems.
The schema of the proposed IoT cybersecurity system is presented in Figure 1. The system includes a mobile application that scans the environment: all IoT devices connected to the network in which the phone running the application is located are scanned. The mobile application will have many users, each of whom will be able to scan the network in which their mobile device is located. The user then receives a scan report with relevant recommendations. This is the MCS application running on the MCS system.
The next step is sending the data to the profiling subsystem (PS). The PS uses cybersecurity expertise to determine measures of device security in various aspects. A database is created that contains individual scans and the devices found during these scans. Additionally, each device has a specific metric with safety factors, which cybersecurity experts set based on their knowledge. Another important piece of data obtained in the MCS system by the mobile application is the location of the scan, which enables data to be grouped by location. The PS is also the module that defines the recommendations for the user.
Then the data are anonymized and sent to the data analysis subsystem (DAS). During anonymization, the location of the scanned device is also generalized. The DAS contains a Semantic Web Module (SWM), which holds an ontology-based knowledge representation. The SWM is described in detail in the following sections of this paper.
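The paper does not state how the location is generalized. One simple possibility, shown here only as an illustrative sketch, is to snap each coordinate to a fixed grid so that all scans within one cell share the same coarse location. The function name `quantize_location` and the default cell size of 0.01 degrees are assumptions made for this example, not parameters of the described system.

```python
import math
from typing import Tuple


def quantize_location(lat: float, lon: float, cell_deg: float = 0.01) -> Tuple[float, float]:
    """Snap a coordinate to the south-west corner of its grid cell.

    cell_deg is a hypothetical grid size in degrees; a larger value
    gives a coarser (more private) location.
    """
    q_lat = math.floor(lat / cell_deg) * cell_deg
    q_lon = math.floor(lon / cell_deg) * cell_deg
    # Round away floating-point noise so that equal cells compare equal.
    return (round(q_lat, 6), round(q_lon, 6))
```

Two scans taken a few hundred metres apart would then map to the same cell and could be grouped without revealing the exact position of either.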
The end result of the system’s operation is the display of the results of the analysis of anonymized and grouped data in the Web Application. The described MCS system makes it possible to obtain a large amount of data that can be grouped, so big data analyses can be performed on the scan results. The DAS also contains other data analysis modules, which are outside the scope of this study and operate independently of the SWM.
Figure 2 presents a schema of the SWM. First, the data are sent to the module in the form of a JSON file. Based on the JSON file, a SW describing this knowledge is created. Because the structure of the input data is known, the structure of the SW has been prepared in advance. The SW schema, with examples, is described in the ontology section below. At run time, only the data, that is individuals (instances of given classes) and the relations between them, are added to the appropriate classes in the SW. In this way, we obtain an ontology describing IoT devices and indicators of their security level.
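The step from a JSON file to individuals in the SW can be sketched as follows. This is a minimal illustration that uses only the Python standard library and emits N-Triples lines. The class names Scan, MapLocation and ScannedDevice are taken from the schema described in this paper, whereas the namespace, the JSON keys (`scan_id`, `location_cell`, `devices`) and the property names `hasLocation` and `foundIn` are hypothetical, since the actual input format and property names are not given here.

```python
from typing import NamedTuple

# Hypothetical namespace; the real ontology IRI is not given in the paper.
NS = "http://example.org/iotsec#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(NamedTuple):
    """Marks a triple object as a plain string literal rather than an IRI."""
    value: str


def scan_to_triples(scan: dict) -> list:
    """Turn one scan record into (subject, predicate, object) triples."""
    scan_iri = NS + "scan_" + str(scan["scan_id"])
    loc_iri = NS + "loc_" + str(scan["location_cell"])
    triples = [
        (scan_iri, RDF_TYPE, NS + "Scan"),
        (loc_iri, RDF_TYPE, NS + "MapLocation"),
        (scan_iri, NS + "hasLocation", loc_iri),  # hypothetical property
    ]
    for i, dev in enumerate(scan["devices"]):
        dev_iri = f"{scan_iri}_dev_{i}"
        triples.append((dev_iri, RDF_TYPE, NS + "ScannedDevice"))
        triples.append((dev_iri, NS + "foundIn", scan_iri))  # hypothetical property
        for key in ("type", "model", "manufacturer", "firmware"):
            # Missing or null values are skipped, not stored as empty literals.
            if dev.get(key) is not None:
                triples.append((dev_iri, NS + key, Literal(str(dev[key]))))
    return triples


def _escape(text: str) -> str:
    """Escape the characters that would break an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples: list) -> str:
    """Serialize the triples as one N-Triples statement per line."""
    lines = []
    for s, p, o in triples:
        obj = f'"{_escape(o.value)}"' if isinstance(o, Literal) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

Because the schema is fixed in advance, the conversion only creates individuals and links them to existing classes. In practice an RDF or OWL library would normally replace the hand-written serialization.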

Figure 1. A schema of the Internet of Things (IoT) cybersecurity system.

Figure 2. A schema of the Semantic Web Module.
SW prepared in this way is passed on to information granulation. Based on selected cluster analysis algorithms such as
During the entire operation of the SWM, all data are saved to and retrieved from the ontology in the SW. All calculated parameters are then transferred from the SW to the database and on to the results presentation module in the web application.
Although the current system operates as a closed environment, the use of SW technologies is justified by the need for a flexible, extensible and formally structured knowledge model that supports advanced reasoning about IoT cybersecurity. The ontology ensures consistent representation of device characteristics, security metrics and relations between devices, enabling automatic inference, detection of complex threat synergies and scalable analysis of heterogeneous data collected from multiple scans. Unlike fixed database schemas, the SW model can be easily expanded with new concepts or metrics as the system evolves, while maintaining full compatibility with standard tools for querying, validation and knowledge processing. This guarantees that the architecture remains future-proof, allowing integration with additional data sources or external cybersecurity resources when the system is extended beyond its current scope.
The project will apply in practice the theoretical study presented in the papers Bryniarska (2019a, 2020, 2022). These papers present a theoretical apparatus for SW, searching for knowledge in this web, also when this knowledge is uncertain or inaccurate.
In the literature, practical solutions typically describe the SW using the OWL 2 or RDF language, commonly serialized as XML files. To write data from a database to the Semantic Web, a relational database can be mapped to OWL files (Xu et al., 2006) and vice versa. Often, data from a non-relational database such as Firebase or GraphDB are mapped to JSON format and then to RDF files. The same is the case here. Python or Java is most often used to perform operations on the ontology or reasoning in the SW. In this case, data are read from the JSON file and saved to the ontology file according to the previously prepared schema of this SW (Figure 3).

Figure 3. A schema of the Semantic Web.
The primary component of the web is the ScannedDevice, which is identified during a Scan conducted at a specific location defined by the MapLocation. During this Scan, information about various devices (ScannedDevice) connected to the network being analyzed can be obtained. Subsequently, multiple parameters and properties of the scanned devices are determined, such as type, model, manufacturer, firmware, as well as the type and version of the device’s operating system.
To enable more in-depth analysis of data collected from multiple scans, a process called data granulation is performed (Bryniarska, 2017, 2019a, 2021). This process facilitates the aggregation of the collected data, thereby enhancing its interpretability and enabling more detailed insights. For this purpose, the Granule class was created. Individual devices are assigned to specific granules based on an analysis of their properties. Subsequently, a GranuleRepresentative is established for each granule, representing the granule itself and allowing it to be processed instead of the full set of collected data. This approach introduces a novel concept of data granulation within the context of the SW.
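The idea of granules and their representatives can be illustrated with a deliberately simple sketch. Here devices are grouped by exact agreement on a few properties, and the representative is the mean of the known metric values in each granule. This is a stand-in chosen for brevity, not the granulation method of Bryniarska (2017, 2019a, 2021). The field names (`type`, `manufacturer`, `risk`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def granulate(devices, key_fields=("type", "manufacturer")):
    """Assign each device to a granule keyed by the chosen properties."""
    granules = defaultdict(list)
    for dev in devices:
        granules[tuple(dev.get(f) for f in key_fields)].append(dev)
    return dict(granules)


def representative(granule, metric_fields):
    """Build a granule representative from the mean of each known metric."""
    rep = {}
    for field in metric_fields:
        # Devices with a missing metric do not contribute to the mean.
        values = [d[field] for d in granule if d.get(field) is not None]
        rep[field] = mean(values) if values else None
    return rep
```

Later processing can then operate on one representative per granule instead of on every scanned device. A representative built this way could also serve as a source of default values for devices in the same granule whose metrics are missing.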
The Protégé programme is very often used to visualize OWL or RDF files; it displays the created ontology as trees or graphs (Musen, 2015). Protégé was used to design and create the SW schema for the SWM.
The SW schema includes the following classes:
In addition to the associations of the listed classes with the class that represents scanned devices
OWL and RDF support a set of axioms for stating assertions. Assertions are axioms about individuals and are often also called facts. Assertions are implemented by object properties. In the presented SW, the object properties are:
Data properties are also defined in SW. The instances of the
As part of data properties, we have created two properties
To evaluate the finished SW, we used the free OntoMetrics tool from the University of Rostock (Lantow, 2016). The base metrics of the SW are presented in Table 1. Base metrics are simple counts that show the quantity of ontology elements.
Table 1. Metrics Generated for Semantic Web (SW) by the OntoMetrics Tool.
Schema metrics are provided in Table 2. They address the design of the ontology: metrics in this category indicate the richness, width, depth, and inheritance of an ontology schema design. Graph metrics, also called structural metrics, are presented in Table 3; they characterize the structure of the ontology.
Table 2. Schema Metrics for Semantic Web (SW) by the OntoMetrics Tool.
Table 3. Graph Metrics for Semantic Web (SW) by the OntoMetrics Tool.
The presented metrics indicate that the SW is well-designed and thoroughly prepared for effective data management and analysis. The high value of relationship richness reflects a strong emphasis on relationships between entities, which is crucial for precise and advanced reasoning. The complex structure of properties and axioms ensures a high level of data description detail, allowing for accurate knowledge representation and the handling of complex queries. Although some indicators, such as inheritance richness, might suggest potential for further development of the class hierarchy, the current structure already performs its role excellently. The high axiom-to-class ratio, along with well-defined properties and constraints, demonstrates that the SW is thoughtfully designed and optimized for its intended purpose. The metric results suggest that the SW is ready for effective use in data analysis and processing, maintaining a balanced approach between detail and performance.
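For readers unfamiliar with these schema metrics, the sketch below shows how three of them are commonly defined. The formulas follow the OntoQA-style definitions that, to our understanding, OntoMetrics adopts; they should be checked against the tool’s documentation before the values are compared with Table 2. The function names and the counts in the usage note are illustrative and are not the values obtained for the presented SW.

```python
def relationship_richness(n_non_inheritance: int, n_subclass: int) -> float:
    """Share of non-inheritance relationships among all relationships."""
    total = n_non_inheritance + n_subclass
    return n_non_inheritance / total if total else 0.0


def inheritance_richness(n_subclass: int, n_classes: int) -> float:
    """Average number of subclass relationships per class."""
    return n_subclass / n_classes if n_classes else 0.0


def axiom_class_ratio(n_axioms: int, n_classes: int) -> float:
    """Average number of axioms per class."""
    return n_axioms / n_classes if n_classes else 0.0
```

With purely illustrative counts of 9 non-inheritance relationships, 1 subclass relationship, 10 classes and 50 axioms, relationship richness would be high and inheritance richness low, which is the pattern discussed above: most of the knowledge sits in relations between classes, not in a deep class hierarchy.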
The individuals assigned to the classes are not described in this paper because the SWM is launched at certain time intervals. Each time, the individuals for a given time range are added to the SW schema, and the module calculates the appropriate security levels for the new data. Thus individuals are not part of the SW schema. The processing of individuals will be the subject of further research.
This section presents plans to use the designed SW to visualize the level of cybersecurity of IoT devices in order to increase the efficiency and effectiveness of activities in this area.
The actual data from the scanned devices lacks the values of many metrics needed for security analysis. For this reason, the first planned action on this raw data will be to supplement the missing data on the basis of devices of a similar nature. For this purpose, the
After the missing metric values of IoT devices have been filled in, aggregate information about cybersecurity in a given location and time can be determined. Due to the privacy policy, data that come from the mobile devices of individual users are subject to anonymization (personal data) and quantization (location data) before they reach the DAS subsystem and the SW module. Therefore, the SW module does not have information about the exact location of devices. However, it can tell which devices are in the immediate vicinity of each other based on whether they were found during the same scan. Such information is valuable because of the likelihood of threat synergy. Threat synergy occurs when two devices that are each moderately vulnerable to hacking can be used together by a cybercriminal to perform a successful attack. An example of such synergy is a device on which a password must be entered and another device with a camera in the immediate vicinity: the threat is then greater than the sum of the threats for the same devices when they are not adjacent to each other. The synergy of threats can be estimated based on the types and models of devices as well as on the software (firmware, operating system) running on them. Generally, the more we know about the devices, the more accurately we can estimate the synergies between them. Data processed in this way will be grouped by location and will be available to persons responsible for counteracting cybercrime.
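A minimal sketch of this idea is given below, under the assumption that each device already has a numeric threat score and that synergy is modelled as an additive bonus for particular pairs of device types found in the same scan. The bonus table, the field names (`id`, `type`, `threat`) and the additive form are hypothetical; the paper does not specify how synergy is computed.

```python
from itertools import combinations

# Hypothetical synergy bonuses for unordered pairs of device types.
SYNERGY = {frozenset({"keypad", "camera"}): 0.25}


def pair_threat(a: dict, b: dict) -> float:
    """Combined threat of two co-located devices: sum plus a synergy bonus."""
    bonus = SYNERGY.get(frozenset({a["type"], b["type"]}), 0.0)
    return a["threat"] + b["threat"] + bonus


def scan_synergies(devices: list) -> list:
    """Score every pair of devices found in one scan (treated as adjacent)."""
    return [(a["id"], b["id"], pair_threat(a, b)) for a, b in combinations(devices, 2)]
```

For the password-entry and camera example above, the pair would score higher than the sum of the two individual threats, while pairs without a known interaction would score exactly that sum.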
Another way of using data prepared in this way in the structure of the SW is the assessment of cybersecurity for specific devices. Some devices have published information about their vulnerabilities. However, new models of IoT devices with unknown vulnerabilities are created every day, and although information about them is supplemented cyclically, many months can pass from the moment a device is released on the market until a vulnerability is published. To estimate the probability of vulnerability of an unknown device, data classification algorithms can be used, where the training data are the metrics of devices for which the security level has been previously determined. The classification algorithm can be based on one of many supervised learning methods, for example Decision Trees, k-Nearest Neighbours, Naive Bayes, Support Vector Machines, Logistic Regression, Neural Networks, Random Forest, etc. The classified data can then be presented using a table or tree that distinguishes the types and models of devices and gives the estimated level of vulnerability to cyberattacks for each of them.
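As an illustration of one of the listed methods, the following sketch implements a plain k-Nearest Neighbours classifier using only the Python standard library. It assumes that each device is described by a numeric metric vector and that the security level is a categorical label. The function name, the labels and the two-dimensional vectors in the usage note are made up for the example and do not come from the system’s data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours.

    train: list of (metric_vector, label) pairs for devices whose
           security level is already known.
    query: metric vector of the device with unknown vulnerability.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Given training pairs such as `((0.1, 0.2), "low")` and `((0.9, 0.8), "high")`, a new device with metrics close to the first group would be assigned the label "low". In a real deployment the metrics would first need to be scaled to comparable ranges, and the choice of k would be validated on held-out data.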
Conclusion
With the growth of IoT networks, cybersecurity takes on a new dimension and becomes a key issue. In this paper we described a cybersecurity system that collects data from IoT devices. To obtain the data, network scans are carried out by many users on many mobile devices. The SW was created for this MCS system, and the knowledge from the system is captured in an ontology-based representation. The proposed ontology is part of a research project currently in progress. The presented ontology contains the data necessary to determine the level of security of Polish citizens, collected during network scans in the mobile application of the system. Later, the system can be extended worldwide.
The proposed ontology will be used further in the analysis system to determine the granules of knowledge. On this basis, it will be possible to supplement missing knowledge. A map of the security level will be prepared for the cybersecurity system, which will use the created ontology and will take into account the synergies between threats in different regions. Building an ontology is the first step towards a system that will be able to assess the level of security, analyze the synergies occurring in different geographic regions and help supplement knowledge where it is missing.
The presented solution is an innovative method of using the SW and ontology-based knowledge representation for an MCS system collecting IoT data. The use of an ontology introduces new possibilities in data processing. One advantage of using the ontology is the expressiveness of OWL, which makes it possible to specify logical classes. The ontology introduces a richer representation of knowledge and allows us to use the relationships between classes in calculations.
In conclusion, the adoption of SW technologies in a closed system, as presented here, offers significant advantages that go beyond data management needs. The use of standardized data models and ontologies enables continuous data integration, advanced querying, and improved data consistency, even when the system operates in isolation. These technologies provide a structured approach to information representation, ensuring that the system can efficiently handle complex searches and infer meaningful insights from existing data. This not only improves the system’s analytical capabilities but also reduces data redundancy and maintains data integrity.
Moreover, the flexibility and scalability offered by SW technologies play a significant role in future-proofing the system. Even though the current application is closed, the standardized frameworks of SW allow for potential integration with open systems or new data sources in the future. This adaptability ensures that the system can evolve with changing needs without significant reconfiguration. In summary, the decision to employ SW technologies in this environment not only optimizes present functionality but also sets a foundation for long-term growth and integration possibilities.
Acknowledgments
The project is co-financed by the National Center for Research and Development Poland under the CyberSecIdent programme ‘Cybersecurity and e-Identity’. Agreement: CYBERSECIDENT/489912/IV/NCBR/2021.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
