Abstract
In recent years, there has been an increased focus on early detection, prevention, and prediction of diseases. This, together with advances in sensor technology and the internet of things, has led to accelerated efforts in the development of personal health monitoring systems. This study analyzes the state of the art in the use of semantic web technologies in sensor-based personal health monitoring systems. Using a systematic approach, a total of 48 systems are selected as representative of the current state of the art. We critically analyze the extent to which the selected systems address seven key challenges: interoperability, situation detection, situation prediction, decision support, context awareness, explainability, and uncertainty handling. We discuss the role and limitations of semantic web technologies in managing each challenge. We then conduct a quality assessment of the selected systems based on the data and devices used, system and components development, rigor of evaluation, and accessibility of research outputs. Finally, we propose a reference architecture to provide guidance for the design and development of new systems. This study provides a comprehensive mapping of the field, identifies inadequacies in the state of the art, and provides recommendations for future research.
Introduction
Non-communicable diseases are on the rise globally, resulting not only in decreased quality of life but also in increased healthcare costs (Murphy et al., 2020). For this reason, there have been accelerated efforts to develop personal health monitoring systems for early detection, prediction, and prevention of diseases. The emerging paradigm of precision health goes beyond treating existing diseases, instead focusing on preventing disease before it strikes. Eschewing the one-size-fits-all approach in favor of assessing individual circumstances, precision health encourages people to actively monitor and work toward improving their health so as to lower the risk of disease (Gambhir et al., 2018). Personal health monitoring is part of this vision, allowing people not only to better understand their health but also to receive recommendations for any necessary interventions. Significant advances in the internet of things (IoT) over the last decade have led to the rapid rise of wearable sensors, which are increasingly being used for health monitoring outside traditional clinical settings. Wearable sensors can collect and measure physiological data such as vital signs, which can be combined with health records and questionnaires to determine lifestyle habits and medical history. Beyond physiological data, ambient sensors can monitor environmental factors such as air quality and weather, which have a significant impact on health. Additionally, the widespread adoption of artificial intelligence (AI) in the health domain has led to personal health monitoring systems becoming increasingly AI-driven. Such systems apply techniques including knowledge representation and reasoning as well as machine learning (ML) to analyze health data and provide actionable insights.
There are several crucial issues affecting sensor-based personal health monitoring systems, which can be distilled into seven key challenges. The first of these is
Semantic web technologies (SWTs) have been used in numerous health applications ranging from health data integration (Hammad et al., 2020; Peng et al., 2020) to clinical decision support (Cui et al., 2025; Jing et al., 2023). While they have shown promise in alleviating the seven key challenges, the degree to which they address each challenge differs. Therefore, the goal of this study is to systematically map the state of the art in the use of SWTs in sensor-based personal health monitoring systems. We analyze the effectiveness of the systems in addressing the seven key challenges, identify the role and limitations of SWTs, and assess the overall quality of the systems. Accordingly, a systematic mapping study was selected as the most appropriate approach. Systematic mapping studies have become increasingly popular in software engineering (Khan et al., 2019). While systematic reviews aim at synthesizing evidence for specific research questions, mapping studies go further and provide a high-level view of the research landscape (Khan et al., 2019). By structuring a research area through classification and categorization, mapping studies seek to discover emerging research trends and identify potential gaps for further lines of inquiry (Budgen et al., 2008; Petersen et al., 2015).
The contributions of this study are as follows: We present a We We We undertake a Following an analysis of the current architectures, components, functionalities, and development tools, we propose a We highlight inadequacies in existing systems and outstanding issues in the field, thereby identifying potential
The remainder of this paper is structured as follows. Section 2 provides an overview of personal health monitoring using sensors and highlights how SWTs can enhance sensor-based health monitoring systems. Section 3 discusses related reviews and surveys, motivating the novelty and importance of this study. Section 4 details the methodology used to conduct the study, including the search strategy and the inclusion and exclusion criteria, culminating in a summary of the selected systems. Section 5 discusses the seven key challenges that such systems must address and critically analyzes the capacity of the systems to deal with these challenges, while Section 6 analyzes the quality of each system. The architectures of the selected systems are discussed in Section 7, and a reference architecture is proposed. Section 8 summarizes the main findings of the study, discusses its limitations, and makes recommendations for future research directions. Finally, Section 9 concludes the study.
Background
Sensor-Based Personal Health Monitoring
Sensors used for health monitoring are typically worn, implanted, or placed in close proximity to the human body. When several such sensors are used at the same time, they form a wireless body sensor network, also known as a body area network (BAN; Gravina & Fortino, 2021). This is part of the IoT paradigm, in which sensor-based “things” connect and exchange data over a shared network such as the Internet. Two categories of physiological data can be collected from health monitoring sensors: vital signs and biological signals (biosignals). The primary vital signs are heart rate, blood pressure, respiratory rate, temperature, and blood oxygen saturation (Dias & Cunha, 2018). Biosignals are space- or time-based records produced from electrical, chemical, or mechanical activity within the body during a biological event such as a beating heart (Escabí, 2012). They include records of electrical activity in the body, such as electrocardiograms (ECGs) for the heart, electromyograms (EMGs) for the skeletal muscles, and electroencephalograms (EEGs) for the brain, as well as data from photoplethysmography (PPG), an optical sensing technology consisting of an LED and a photodetector to detect blood volume changes (Ferlini et al., 2022). In addition to physiological data, physical activity data, such as daily step count, can also be captured by sensors. These data provide important contextual information about an individual’s lifestyle, which can enhance health monitoring.
Health monitoring sensors are usually either wearable or implantable. Wearable sensors are worn on the body or are otherwise integrated with clothes and shoes. Such sensors include electrodes for measuring electrical signals, thermal sensors for measuring temperature, and PPG sensors. Smart watches and bands are the most commonly used wearable sensors, but earables (devices placed in the ear) have recently emerged as a promising alternative (Choudhury, 2021). In contrast, implantable sensors operate from within the human body. Although they are much less commonly used than wearable sensors, they are particularly useful for monitoring chronic illness and for post-surgery monitoring to minimize complications and avoid readmission (Andreu-Perez et al., 2015). Health monitoring sensors also include portable devices that can measure physiological and activity data but cannot be practically worn or used for prolonged periods of time. Examples of these include blood pressure monitors and pulse oximeters, as well as smartphones, which contain sensors such as accelerometers, which measure acceleration, and gyroscopes, which measure orientation and angular velocity (Straczkiewicz et al., 2021).
Besides wearable and implantable sensors, there exist contact-free sensors that can monitor health-related factors. For example, the commercially available Emfit QS device uses ballistocardiography, a measure of ballistic forces generated by the heart, to measure heart rate variability during sleep when placed underneath one’s mattress. Device-free human sensing is also gaining traction in many domains, including health (Xiao et al., 2022). Examples include the use of radio frequency signal reflections and WiFi channel state information to passively monitor vital signs (A. Kumar et al., 2022) and detect falls (Tian et al., 2018) without requiring physical devices. Finally, ambient sensors are increasingly being incorporated in health monitoring to monitor the state of the external environment, such as temperature, humidity, and air quality, as these factors have a significant impact on human health (Cusack et al., 2024; Dias & Cunha, 2018).
Sensors, while essential for personal health monitoring, also contribute significantly to the identified challenges. The heterogeneity of sensor devices, observation data, and measurement procedures can hinder interoperability in personal health monitoring systems (Compton et al., 2012). The dynamicity and complexity of sensor data mean that expert knowledge is required to interpret and analyze them. This affects both situation analysis and decision support. Furthermore, sensors can contribute to uncertainty. Data are uncertain when the degree of confidence about what is stated by the data is less than 100% (Khaleghi et al., 2013). This can arise when there is missing data or when all the relevant attributes cannot be measured by the available sensors (Gravina et al., 2017). Some of these challenges can be addressed by the incorporation of SWTs.
Semantic Web Technologies (SWTs)
The semantic web provides foundational mechanisms for knowledge representation and reasoning, thereby playing an important role in the design of AI-driven health monitoring systems. Three overlapping SWTs have emerged as the most prominent over the years: ontologies, knowledge graphs, and linked data (Hitzler, 2021). We will begin with an overview of key semantic web languages and standards, after which we will discuss each of the three technologies in turn, highlighting how they contribute to health monitoring.
Languages and Standards
The development of SWTs is facilitated using different languages and standards. Resource Description Framework (RDF), a standard for the description and exchange of interconnected data in the form of subject–predicate–object triples, can be considered one of the core building blocks of the semantic web. As RDF is an abstract data model, it can be serialized in different formats, including N-Triples, Terse RDF Triple Language (Turtle), eXtensible Markup Language (XML), and JavaScript Object Notation for Linked Data (JSON-LD; Schreiber & Raimond, 2014). Several extensions to RDF have been proposed. These include RDF Schema (RDFS), which provides a vocabulary to enrich RDF data; RDF-star, which allows an RDF triple to be embedded as the subject or object of another triple, without necessarily asserting the embedded triple, thereby enabling richer metadata annotation; and the Notation3 specification, which extends the representational abilities of RDF by supporting declarative programming and allowing access to online knowledge. Other important standards in the semantic web community are: Web Ontology Language (OWL), a language for constructing ontologies; Semantic Web Rule Language (SWRL), a language for expressing rules and logic; Shapes Constraint Language (SHACL), a language for describing RDF graphs, which also includes a rules language; and SPARQL Protocol and RDF Query Language (SPARQL), a language for retrieving and manipulating RDF data. SPARQL-star extends SPARQL to allow querying and updating of RDF-star data, while SPARQL Inferencing Notation (SPIN) is a rules language based on SPARQL.
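To make the triple model concrete, the following minimal Python sketch represents a few subject–predicate–object statements as plain tuples and answers a simple pattern query, loosely in the spirit of a SPARQL basic graph pattern. It does not use an RDF library; the `ex:` identifiers, the sample values, and the `match` and `to_ntriples_like` helpers are illustrative assumptions rather than part of any standard or of the systems reviewed here.

```python
# Illustrative only: RDF-style triples as plain tuples (no RDF library used).
# All "ex:" names and values are hypothetical.
triples = {
    ("ex:patient1", "rdf:type", "ex:Patient"),
    ("ex:patient1", "ex:hasHeartRate", "72"),
    ("ex:patient2", "rdf:type", "ex:Patient"),
    ("ex:patient2", "ex:hasHeartRate", "110"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples_like(triple):
    """Render one triple on a single line ending in ' .', mimicking the
    one-statement-per-line layout of N-Triples (prefixed names kept for brevity)."""
    return " ".join(triple) + " ."


# Roughly analogous to: SELECT ?s ?o WHERE { ?s ex:hasHeartRate ?o }
heart_rates = match(triples, p="ex:hasHeartRate")
```

Calling `to_ntriples_like` on each element of `heart_rates` would give one statement per line, which is the basic idea behind line-oriented serializations; real serializations additionally require full IRIs or declared prefixes and typed literals.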
Ontologies
Arguably, the key technology underpinning the semantic web is ontologies, which have been widely used for reasoning and representation in sensor-based systems (Ye et al., 2011). Their ability to represent knowledge formally and unambiguously not only enhances interoperability but is also useful in capturing the domain knowledge necessary for situation analysis and subsequent decision support. Several ontologies have been developed to support the description of sensors and their observations, which is critical in any sensor-based system. Two particularly prominent sensor ontologies are the semantic sensor network (SSN) ontology (Compton et al., 2012) and the Smart Appliances REFerence (SAREF) ontology (García-Castro et al., 2023). They are standardized ontologies developed by the World Wide Web Consortium (W3C) and the European Telecommunications Standards Institute (ETSI), respectively, with the aim of enabling semantic interoperability. However, while SSN was developed for sensors and sensor-based systems in general, SAREF focuses on smart appliances and IoT devices. The latest version of SSN is based on the Sensor, Observation, Sample, and Actuator (SOSA) ontology (Janowicz et al., 2019), which provides it with a lightweight, user-friendly, and extendable core. SAREF has mappings to SSN, from which it borrows modeling patterns for several classes (García-Castro et al., 2023; Moreira et al., 2020).
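As a hedged illustration of how a sensor observation might be described with SOSA terms, the sketch below builds a small Turtle document as a string. The `sosa:` class and properties are taken from the W3C SOSA vocabulary; every `ex:` identifier, the reading, the timestamp, and the `observation_to_turtle` helper are made up for this example and are not drawn from any of the reviewed systems.

```python
# Sketch: one heart-rate reading described with SOSA terms, emitted as Turtle text.
# Only the sosa: terms come from the W3C vocabulary; all ex: names are hypothetical.
def observation_to_turtle(obs_id, sensor, prop, person, value, time):
    lines = [
        "@prefix sosa: <http://www.w3.org/ns/sosa/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"ex:{obs_id} a sosa:Observation ;",
        f"    sosa:madeBySensor ex:{sensor} ;",            # which device produced it
        f"    sosa:observedProperty ex:{prop} ;",          # what was measured
        f"    sosa:hasFeatureOfInterest ex:{person} ;",    # whom it describes
        f'    sosa:hasSimpleResult "{value}"^^xsd:integer ;',
        f'    sosa:resultTime "{time}"^^xsd:dateTime .',
    ]
    return "\n".join(lines)


turtle = observation_to_turtle(
    "obs1", "wristband1", "heartRate", "patient1", 72, "2024-02-09T08:30:00Z"
)
```

Keeping the sensor, the observed property, and the feature of interest as separate resources is what allows readings from heterogeneous devices to be described in a uniform way.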
As domain-agnostic ontologies, both SSN and SAREF require augmentation to meet application-specific requirements (Poveda-Villalon et al., 2018). SAREF provides a suite of ontologies that extend the core ontology for different domains, including two that are relevant for personal health monitoring: SAREF4EHAW for eHealth and ageing well, and SAREF4WEAR for wearable devices. SAREF4EHAW provides support for modeling concepts such as health system actors (including patients and caregivers) and health devices (including wearables), with the wearable concept linked to the SAREF4WEAR ontology. An additional extension, SAREF4Health, was developed to address the limitations of SSN and SAREF in representing real-time ECG time series data exchanged between mobile devices and cloud gateways (Moreira et al., 2020). In contrast to SSN, SAREF is targeted at industry developers rather than ontology experts (Moreira et al., 2020), making it more readily adoptable by those without extensive ontology development experience. Furthermore, its extensions for the health domain provide a solid foundation for building semantic personal health monitoring systems. Additional representational support can be obtained by integrating resources such as standardized clinical terminologies and medical knowledge bases.
Knowledge Graphs
A knowledge graph can generally be understood as a knowledge base of real-world data represented in a graph-based data model. Ontologies are a vital building block in many knowledge graphs, and are used to define their data schema (such as properties, restrictions, and relationships) as well as enable semantic reasoning and entailment (Hogan et al., 2022). Knowledge graphs have seen increasingly widespread use in the health domain. Their graph structure enables the conceptualization, representation, and integration of data (Hogan et al., 2022). This is advantageous in health monitoring systems, where the integration of various sources of health data is critical. An example of this is the Precision Medicine Knowledge Graph, which integrates diverse biomedical data from multiple sources with the goal of enabling precision medicine analyses (Chandak et al., 2023). Previous research has also explored the automatic construction of knowledge graphs from electronic health records (Chen et al., 2019; Rotmensch et al., 2017), which can then be used for clinical decision support. This is related to personal health knowledge graphs, which are used to represent and reason over individual health data, including data from sensors (Gyrard & Boudaoud, 2022) and electronic health records (Jiang et al., 2024; Tao et al., 2020). Additionally, knowledge graphs have been proposed for drug discovery (Zeng et al., 2022) and as a tool for explainability in AI-driven health monitoring systems (Lecue, 2020; Rajabi & Kafaie, 2022). Knowledge graphs have also proven useful in sensor-based systems, for example, by providing graph-based visualizations of the data generated by IoT devices, which can then be queried in real time (Le-Phuoc et al., 2016).
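The role of an ontology as a schema that enables entailment can be sketched with a toy example. The code below applies a single RDFS-style rule (an instance of a class is also an instance of its superclasses) until no new facts appear. It is a plain-Python illustration, not a reasoner; the class names and the `infer_types` helper are hypothetical.

```python
# Toy forward-chaining over a tiny graph. Implements one rule only:
# if (x rdf:type C) and (C rdfs:subClassOf D) then (x rdf:type D).
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def infer_types(triples):
    """Return the input triples plus all type facts entailed by the rule above."""
    inferred = set(triples)
    while True:
        new = {
            (x, TYPE, d)
            for (x, p1, c) in inferred if p1 == TYPE
            for (c2, p2, d) in inferred if p2 == SUBCLASS and c2 == c
        } - inferred
        if not new:          # fixpoint reached: nothing further can be derived
            return inferred
        inferred |= new


# Hypothetical schema (two subclass axioms) and data (one typed instance).
graph = {
    ("ex:WearableSensor", SUBCLASS, "ex:Sensor"),
    ("ex:Sensor", SUBCLASS, "ex:Device"),
    ("ex:watch1", TYPE, "ex:WearableSensor"),
}
closure = infer_types(graph)
```

Here the graph never states that `ex:watch1` is a device; that fact follows from the schema, which is the kind of implicit knowledge a query over the inferred graph can then retrieve.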
Linked Data
Both knowledge graphs and ontologies can be published using a linked data approach (Hitzler, 2021), whereby uniform resource identifiers are used to identify distinct resources (Bizer et al., 2011). When the emphasis is on free use, modification, and sharing, it is referred to as Linked Open Data (Hitzler, 2021). Linked data have been proposed for augmenting and representing sensor data in order to improve their accessibility and interoperability (L. Yu & Liu, 2015). In the health domain, they have been explored in applications ranging from drug discovery (Gray et al., 2014) to the representation of electronic health records (Pathak et al., 2013). Linked data can contribute to interoperability by ensuring heterogeneous health data are stored in a consistent format and structure. However, their use in health monitoring is not well explored in the literature.
Related Reviews
Several reviews related to sensors, SWTs, and the health domain have been published. These reviews can generally be categorized into three overlapping groups, which are illustrated as a Venn diagram in Figure 1. The reviews in Group 1 focus on the use of SWTs in the health domain; those in Group 2 review the use of sensors and IoT in the health domain; those in Group 3 review the use of SWTs with sensor and IoT data; and finally, Group 4 consists of other related reviews that do not fit neatly into any of the first three groups. The related reviews are discussed in detail in the remainder of this section and summarized in Table 1.

Figure 1. Venn diagram illustrating the three focus areas of this study as well as the different groups of related reviews.
Table 1. Summary of Related Reviews and Their Focus Areas.
Group 1: SWTs in the Health Domain
This group of reviews explores the use of SWTs in healthcare. Zenuni et al. (2015) review ontologies and semantic data repositories used in different aspects of the health domain, including hospital systems and health datasets. A similar review is conducted by Haque et al. (2022), who explore themes such as e-healthcare, disease diagnosis, and information management. Peng et al. (2020) and Hammad et al. (2020) focus on semantic approaches for health data integration and management, including data from wearable devices. Dimitrieski et al. (2016) review ontologies and ontology alignment approaches in healthcare, while Jing et al. (2023) focus on ontologies for rule management in clinical decision support systems. More recently, Amar et al. (2024) examine semantic interoperability issues in electronic health records based on the Fast Healthcare Interoperability Resources (FHIR) standard, highlighting how RDF and OWL can improve interoperability. The review by Miranda et al. (2024) explores how SWTs can be used to enhance the interoperability and management of electronic health records in healthcare systems. Similarly, Wu et al. (2025) analyze recent work at the intersection of SWTs and electronic health records, with a particular focus on how such technologies can improve data quality. Finally, Cui et al. (2025) provide a comprehensive review of healthcare knowledge graphs, discussing their construction and use in a wide range of health applications, including clinical decision support and pharmaceutical research. Although the reviews in this group provide a good overview of the ways in which SWTs have been used in the health domain, five of them do not mention sensors at all, while the remaining five do not include sensor data as a major focus.
Group 2: Sensors and IoT in the Health Domain
This group considers the use of sensors and IoT in the health domain. Islam et al. (2015) and Yin et al. (2016) conduct general surveys on IoT for healthcare, covering a broad range of considerations on the topic, including networks, communication standards and protocols, and cybersecurity. The review by Qi et al. (2017) focuses on the use of IoT in personalized healthcare systems, including sensor devices and data processing techniques. Philip et al. (2021) explore advances in the field such as cloud computing, while Albahri et al. (2018) focus on health monitoring systems for telemedicine applications, highlighting techniques that support the connection of hospital services to remote patients. There have also been reviews specifically focusing on the state of the art in wearable sensors for health monitoring, such as those by Babu et al. (2024), Cusack et al. (2024), Dias and Cunha (2018), and Majumder et al. (2017). J. Kim et al. (2019) home in on biosensors that detect biofluids, such as sweat and tears, while Baig et al. (2017) highlight the potential of remote monitoring systems for clinical adoption. Punj and Kumar (2019), Banaee et al. (2013), and Andreu-Perez et al. (2015) explore advances in wearable sensor data collection, mining, and processing, and Dang et al. (2023) focus on statistical analysis and ML as modeling tools. Bollineni et al. (2025) adopt a forward-looking perspective in their review, highlighting both emerging technologies and future prospects for health-IoT architectures. While these reviews provide useful analyses on the role of sensors and IoT in health monitoring, 10 of them do not mention SWTs, while the remaining six do so briefly without an in-depth analysis of their role in health monitoring.
Group 3: SWTs for Sensors and IoT
This group reviews the intersection between SWTs and sensors without being limited to a particular domain. Honti and Abonyi (2019) and Rhayem et al. (2020) explore the use of ontologies in IoT-based systems in different domains. Bajaj et al. (2017) adopt a similar focus on ontologies, reviewing both general sensor ontologies as well as domain-specific ones for IoT. Compton et al. (2009) present a review of the semantic specification of sensors using ontologies, analyzing the range and expressive power of sensor ontologies. The review by Harlamova et al. (2017) explores the challenges in the use of SWTs in IoT, while Ye et al. (2015) review the application of SWTs in pervasive and sensor-driven systems. Although these reviews highlight the use of SWTs with sensors and IoT, they are not specific to the health domain.
Group 4: Other Reviews Related to AI and Technology in the Health Domain
A small number of reviews take a broader lens and consider different aspects of AI and technology in the health domain. This includes the concept of Healthcare 4.0, a term referring to the increasing digitization of the healthcare industry. The reviews by Tortorella et al. (2020) and Jayaraman et al. (2020) broadly cover Healthcare 4.0, and highlight health monitoring systems that use IoT and sensors. However, only the review by Jayaraman et al. (2020) mentions ontologies and other knowledge representation techniques. More recent reviews, such as those by Rahman et al. (2025) and Rashid and Nemati (2024), have begun addressing the transition toward Healthcare 5.0, which emphasizes explainability and human-centricity in health systems. While both of these reviews mention the role of sensors, IoT, and AI in healthcare, neither mentions the semantic web. Another review in this group is by P. Kumar et al. (2023), who discuss AI in healthcare. Although they mention IoT and knowledge graphs, neither of these is the focus of the review. Lastly, the review by Behera et al. (2019) focuses on techniques used to create healthcare systems modeled on human cognitive processes such as perception and thought. They highlight cognitive IoT as a future research direction through wearable sensors, while also mentioning SWTs for knowledge representation. However, neither the SWTs nor sensors are discussed in detail.
Summary
Table 1 summarizes the related reviews. The current study differs from existing work by focusing on the use of sensors and SWTs for personal health monitoring, with both sensor data and SWTs being primary points of focus. Additionally, the majority of the related reviews and surveys do not take a systems perspective, whereas this study highlights how the different system components are integrated and discusses the development methodologies and tools, evaluation approaches, and architectures of the included systems.
Methodology
Objectives and Reporting Strategy
To achieve our goal of mapping the state of the art in the use of SWTs in sensor-based personal health monitoring systems, this study has the following objectives:
To systematically select systems that represent the state of the art in the use of SWTs in sensor-based personal health monitoring systems.
To determine the extent to which the seven key challenges are addressed by the selected systems.
To assess the role and limitations of SWTs in addressing these challenges.
To conduct a comprehensive quality assessment of selected systems based on data and devices used, system and component development, evaluation rigor, and accessibility of research outputs.
To propose a reference architecture that provides guidance for the design and development of new systems.
To highlight inadequacies in existing systems and provide recommendations for future research.
The study was conducted and is reported using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) framework (Moher et al., 2009). To further ensure the quality of the study, we adhered to the following quality assessment criteria as described by Kitchenham et al. (2010):
“The inclusion criteria are explicitly defined in the paper”: The inclusion and exclusion criteria are specified in Section 4.3.
“The authors have either searched four or more digital libraries and included additional search strategies or identified and referenced all journals addressing the topic of interest”: Six digital libraries were searched, and additional records were identified by using the preliminary search results and related reviews to search for similar studies. More details on the search strategy are given in Section 4.2.
“The authors have explicitly defined quality criteria and extracted them from each primary study”: The systems are analyzed in Section 5 based on the seven identified challenges, and the criteria are outlined in Table 9. Additionally, the quality of each system is assessed and discussed in Section 6 based on criteria outlined in Table 14.
“Information is presented about each paper so that the data summaries can clearly be traced to relevant papers”: A summary of all the included systems is shown in Table 5, with all systems fully cited. A GitHub repository has been created for this study, which includes copies and links of the selected papers and other supplemental material.
Search Strategy
Six digital libraries were searched: ACM Digital Library, IEEE Xplore, PubMed, ScienceDirect, Scopus, and Web of Science. An initial search was conducted between 9 and 12 February 2024, and a second search was conducted between 14 and 15 August 2025 to ensure inclusion of the most recent literature. Abstracts, titles, and/or keywords were searched using terms related to the topic of the study, at the intersection of five areas: SWTs, sensors, the health domain, monitoring, and systems. The search strings used are shown in Table 2. Boolean operators were used for a more specific search, although the ScienceDirect library had a limit on the number of Boolean operators that could be used per search. This library also did not allow the use of wildcard characters. To ensure a state-of-the-art study, all results were filtered to only include literature published in or after 2012 during the first search, and between 2024 and 2025 during the second search, thus covering the period between 2012 and 2025. Additionally, where possible, the results were filtered to only include conference papers and journal articles published in English. This filtered out other types of literature such as surveys and reviews, books and book chapters, research abstracts, posters and conference proceedings, as well as articles written in languages other than English. The first search yielded 960 records, while the second search yielded a further 217 records, resulting in a total of 1,177 records from the digital library search.
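The way a search string can be assembled from the five topic areas may be sketched as follows: synonyms within an area are joined with OR, and the areas are joined with AND. The terms below are hypothetical placeholders chosen for illustration; the strings actually used in this study are those reported in Table 2, and the `build_query` and `count_boolean_operators` helpers are assumptions of this sketch.

```python
# Hypothetical term groups, one per topic area; not the strings used in the study.
term_groups = {
    "swt": ["ontolog*", "knowledge graph", "linked data"],
    "sensors": ["sensor*", "wearable*"],
    "health": ["health*"],
    "monitoring": ["monitor*"],
    "systems": ["system*"],
}


def build_query(groups):
    """OR the terms inside each group, then AND the groups together."""
    clauses = []
    for terms in groups.values():
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


def count_boolean_operators(query):
    """Count AND/OR operators, useful when a library caps their number."""
    return query.count(" AND ") + query.count(" OR ")


query = build_query(term_groups)
n_operators = count_boolean_operators(query)
```

For a library that caps the number of Boolean operators or disallows wildcards, a query built this way would need to be split into several shorter queries or have its wildcard terms expanded by hand.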
Table 2. Search Strings Used in Digital Library Search.
The selected results from the digital library search, together with the related review articles discussed in Section 3, were then used to identify further potentially relevant studies through a related paper search. This was done using two online tools, Connected Papers and Semantic Scholar. The first related paper search in February 2024 identified 62 records, while the second one in August 2025 identified 22 records, resulting in 84 additional records. Thus, the entire search process yielded a total of 1,261 records, which were then assessed for eligibility. Rayyan (Ouzzani et al., 2016), an online tool for the management of systematic reviews, was used to facilitate the screening and assessment process. The search results from each phase of the search process are summarized in Table 3.
Table 3. Summary of Search Results From Both the First (February 2024) and the Second (August 2025) Searches.
This study includes only peer-reviewed journal articles and conference papers written in English. Further, we only include systems that incorporate one of the three SWTs (i.e., an ontology, knowledge graph, or the explicit use of linked data) as an integral system component, with a clear description of the technical implementation. Systems lacking sufficient implementation details were excluded, as this would compromise the quality assessment. Additionally, because a system consists of several integrated components, studies reporting the development of only one component (e.g., an ontology) were excluded. Of particular interest to this study are sensors that measure physiological data (i.e., biosignals and vital signs) and/or physical activity data (e.g., daily step count). Applications of sensors outside health monitoring, such as activity recognition, fitness, or nutrition, were excluded. Furthermore, systems that do not have an analysis, inference, or reasoning component were also excluded. These inclusion and exclusion criteria are summarized in Table 4.
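The criteria above can be read as a conjunction of yes/no checks applied to each candidate record. The sketch below expresses that reading; the `Record` field names are hypothetical paraphrases of the criteria in the text, and the code is an illustration of the logic rather than the procedure or tooling used in the study, where screening was carried out manually.

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    """One candidate publication; field names are illustrative paraphrases."""
    peer_reviewed_article_or_paper: bool
    written_in_english: bool
    swt_is_integral_component: bool          # ontology, knowledge graph, or linked data
    has_implementation_details: bool
    reports_integrated_system: bool          # not a single component such as an ontology
    uses_physiological_or_activity_sensors: bool
    targets_health_monitoring: bool          # not only activity recognition, fitness, nutrition
    has_analysis_or_reasoning_component: bool


def failed_criteria(record):
    """Names of the criteria that the record does not satisfy."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def is_eligible(record):
    """A record is included only if every criterion holds."""
    return not failed_criteria(record)


# Hypothetical records for demonstration.
complete = Record(True, True, True, True, True, True, True, True)
ontology_only = Record(True, True, True, True, False, True, True, True)
```

Returning the list of failed criteria, rather than only a Boolean, mirrors the practice of recording a reason for each exclusion.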
Table 4. Inclusion and Exclusion Criteria.
Table 5. Summary of Systems Selected for this Study.
From the 1,261 identified records, 366 duplicates were removed, resulting in 895 unique records. Next, preliminary screening was done by reviewing the title and abstract of each record. At this stage, records were excluded for reasons such as not being focused on the health domain or not involving health monitoring. We also found that a number of records had bypassed some of the filters that were applied in the initial identification stage, such as publication year and language. We excluded 664 records based on the title and abstract screening. The remaining 231 papers were read in full to determine if they still met the inclusion criteria. One reason for exclusion at this stage was if the system had been extended in later work, and the extension was one of the systems being assessed. In such cases, the extension was included in the study while the previous work was excluded. Additionally, a small number of publications were excluded for reasons such as the full text being inaccessible without additional payment or the article being retracted. Ultimately, 48 systems were selected for inclusion in this study. Figure 2 shows a PRISMA flow diagram illustrating the identification, screening, eligibility, and inclusion process. The diagram also provides details on the specific reasons for exclusion and the corresponding number of publications excluded at each stage.
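The record counts reported across the identification and screening stages can be checked with simple arithmetic. The short sketch below recomputes each stage from the figures given in the text; the variable names are ours, and the number of full-text exclusions is derived here by subtraction rather than quoted from the text.

```python
# Counts as reported in the text.
library_first, library_second = 960, 217     # digital library searches
related_first, related_second = 62, 22       # related paper searches
duplicates_removed = 366
excluded_title_abstract = 664
included_systems = 48

library_total = library_first + library_second                  # 1,177
identified = library_total + related_first + related_second     # 1,261
unique_records = identified - duplicates_removed                 # 895
read_in_full = unique_records - excluded_title_abstract          # 231

# Derived by subtraction (not stated explicitly in the text).
excluded_full_text = read_in_full - included_systems
```

Each intermediate value matches the figure reported for that stage, so the counts are internally consistent.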

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram outlining the selection process, combining the results from both the first and second searches.
A summary of the 48 selected systems is shown in Table 5, while Figure 3 shows the distribution of the systems according to the publication year. The year of publication ranges from 2012 to 2025, with 2021 being the most common. In terms of the application area, 29 focus on a particular disease or diseases, while the remaining 19 provide a solution for general health monitoring. Additionally, 17 of the systems mention elderly people as a target user (Bampi et al., 2025; Chiang & Liang, 2015; Elhadj et al., 2021; Esposito et al., 2018; Garcia-Moreno et al., 2023; Garcia-Valverde et al., 2014; Henaien et al., 2020; Hooda & Rani, 2020; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Kilintzis et al., 2019; Spoladore et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Villarreal et al., 2014; Zhou et al., 2022). Personal health monitoring systems can be classified into three development stages: research prototypes (validated on synthetic or existing datasets but not tested with real users), user-validated systems (tested through user studies or in real-world settings), and deployed systems (production-level and currently operational). Of the 48 systems, a majority are research prototypes (36 systems), with 12 user-validated systems and no deployed system.

Bar graph showing the distribution of the systems by year of publication.
Regarding the types of SWTs used in the systems, nearly all make use of ontologies. The three exceptions are G. Yu et al. (2022) and Zhou et al. (2022), who use only knowledge graphs, and Xu et al. (2017), who use linked data and knowledge graphs. Six systems use multiple SWTs: Kilintzis et al. (2019) and Reda et al. (2022) combine linked data with ontologies; Stavropoulos et al. (2021) and Zafeiropoulos et al. (2024) combine knowledge graphs with ontologies; Xu et al. (2017) combine linked data with knowledge graphs; and Ammar et al. (2021) incorporate all three technologies. Table 5 also provides an overview of other complementary technologies and techniques used, as well as the architecture type of each system. These aspects are discussed in more detail in Sections 5 and 7, respectively.
This section examines the role of SWTs in addressing the seven key challenges identified in Section 1, as well as the contribution of other complementary technologies and techniques that are incorporated into the systems. Additionally, a critical evaluation is provided assessing the extent to which each system succeeds in addressing these key challenges. Although there is a broader range of challenges facing sensor-based health monitoring systems, we have necessarily had to delimit the scope of this article. By focusing our analysis on these seven salient challenges, we aim to provide an in-depth assessment of how effectively they have been addressed in the current state of the field. However, we briefly discuss some other considerations, including privacy and security as well as usability, at the end of the section.
Interoperability
Interoperability can be defined as the ability of different components or systems not only to exchange information but also to make use of it (Benson & Grieve, 2021). There are three types of interoperability identified in the health domain: technical, semantic, and process interoperability (Benson & Grieve, 2021; Gibbons et al., 2007). Technical interoperability refers to the way data or information moves from one system or component to another. Related to this is syntactic interoperability, which provides a structure and syntax for the transmitted data (Hosseini & Dixon, 2016). Semantic interoperability refers to the ability of the recipient to understand and make use of the received data (Benson & Grieve, 2021), whereas process interoperability concerns the seamless coordination of workflows in care delivery (Kuziemsky & Peyton, 2016). A subset of this is clinical interoperability, through which patients can be seamlessly transferred between different care teams (Benson & Grieve, 2021). This review focuses on technical, syntactic, and semantic interoperability. We also discuss the role of interoperability in enabling data fusion, a crucial functionality in health monitoring systems.
Technical Interoperability
Differing data transmission technologies can contribute to a lack of technical interoperability in health monitoring systems, particularly those that use a range of different sensors. Data transmission protocols used in sensors include Bluetooth, Bluetooth Low Energy, ANT+, and Zigbee, with the first three being the most common among wearable devices today (Gravina & Fortino, 2021). Interoperability among these different protocols can be achieved using gateway devices, which receive data from different sensors and transmit it to cloud services (Rahmani et al., 2018). This is done by Ali et al. (2018), who use a router as a gateway to receive sensor data and transmit it to the internet. Eleven of the systems (Ali et al., 2020; Alti et al., 2022; Ammar et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Hussain & Park, 2021; Khozouie et al., 2018; Lopes de Souza et al., 2023; Peral et al., 2018; Villarreal et al., 2014; Zhang et al., 2014) use a mobile phone as a gateway device or base station, typically receiving sensor data via Bluetooth or Bluetooth Low Energy and transmitting it to the cloud via WiFi or mobile data.
Syntactic Interoperability
While technical interoperability is associated with hardware components and infrastructure, syntactic interoperability is usually associated with data formats (Veer & Wiles, 2008). There are several standards that are widely used to promote syntactic interoperability among systems. Among them is the ISO/IEEE 11073 standard, which provides a common format for communication involving medical devices and patient health data, with an emphasis on vital signs. This is used by El-Sappagh et al. (2019) for formatting data for transmission to the base unit or gateway. Other important standards for health data are provided by Health Level 7 (HL7). One of these is FHIR, which describes data formats, resources, and an application programming interface through which health information can be exchanged (Benson & Grieve, 2021). Two of the 48 systems make use of FHIR. El-Sappagh et al. (2019) convert sensor data from the ISO/IEEE 11073 standard to FHIR formats, while also receiving data in FHIR format from hospital information systems. In this way, both sensor data and data from hospital systems are in the same format. Similarly, Kilintzis et al. (2019) propose a semantic model based on FHIR, using its data types and defining classes as FHIR categories. FHIR resources can be defined using different data formats, including XML, RDF serialized in Turtle, and JSON. Both systems use the JSON format.
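To make the role of a shared data format concrete, the following minimal sketch builds a single heart-rate reading as a FHIR-style Observation in JSON. It is not taken from either of the two systems. The function name, patient identifier, and timestamp are hypothetical, and the field layout follows our understanding of the FHIR Observation resource without being validated against a FHIR server.

```python
import json
from datetime import datetime, timezone


def heart_rate_observation(patient_id: str, bpm: float, when: datetime) -> dict:
    """Build a FHIR-style Observation for one heart-rate reading (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        # LOINC code for heart rate, used here as the shared, unambiguous identifier.
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",
                "display": "Heart rate",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when.isoformat(),
        # UCUM unit code so that the receiver can interpret the number.
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


# Hypothetical patient identifier and timestamp.
reading = heart_rate_observation(
    "example-001", 72, datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)
)
payload = json.dumps(reading, indent=2)
```

Because a sensor gateway and a hospital information system would both emit this structure, a downstream component can parse either source with the same code, which is the benefit described above.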
Semantic Interoperability
The next type of interoperability is semantic interoperability, which is concerned with the meaning of the exchanged information. Semantic interoperability can be achieved through the use of unambiguous codes and identifiers, which can be provided by existing standard classifications and terminologies (Benson & Grieve, 2021). Ontologies are, of course, a well-established way to embed semantic interoperability in a system (Sheth et al., 2008). Within the medical domain, many existing medical terminologies are available as ontologies, including SNOMED CT, the International Classification of Diseases (ICD), and the International Classification for Nursing Practice (ICNP). Among the systems, SNOMED CT is the most commonly used (Bampi et al., 2025; Chatterjee et al., 2021; El-Sappagh et al., 2019; Kilintzis et al., 2019; Kordestani et al., 2021; Lopes de Souza et al., 2023; Reda et al., 2022; Rhayem et al., 2021; Titi et al., 2019; Zhou et al., 2022). ICNP is used by Elhadj et al. (2021) and Henaien et al. (2020), while ICD is used by Spoladore et al. (2021) and G. Yu et al. (2022) (ICD-11, the latest version) as well as Titi et al. (2019) (ICD-10). The Unified Medical Language System (UMLS; Bodenreider, 2004) is a large thesaurus that integrates multiple terminologies of medical knowledge. It is used by Peral et al. (2018) and Zhou et al. (2022). Another thesaurus is Medical Subject Headings (MeSH), which is used for indexing, cataloguing, and searching health information, and is integrated in the system proposed by Reda et al. (2022). Garcia-Moreno et al. (2023) and Spoladore et al. (2021) incorporate the International Classification of Functioning, Disability and Health (ICF).
Terminologies for specific diseases and conditions also exist. For example, Ali et al. (2021) and El-Sappagh et al. (2019) reuse ontologies specific to diabetes. Similarly, De Brouwer et al. (2022) use the third edition of the International Classification of Headache Disorders (ICHD-3), while Hristoskova et al. (2014) and Zafeiropoulos et al. (2024) reuse the Heart Failure Ontology and Parkinson and Movement Disorder Ontology, respectively. The Vital Sign Ontology is extended by El-Sappagh et al. (2019) and Ivaşcu and Negru (2021). Xu et al. (2017) posit that it is difficult to build scalable ontology-based systems suitable for large amounts of healthcare data and instead opt for a linked data approach to add semantic information to the data. Their proposed system uses linked open data medical knowledge graphs, namely Diseasome, DBpedia, and DrugBank. Using these resources, they create a knowledge graph showing the relationships between symptoms and diseases. Domain-independent concepts can also be referenced from SWTs. For instance, Peral et al. (2018) and Reda et al. (2022) both use WordNet, a lexical database of English that links words to one another through semantic relations.
SWTs also provide a means to represent sensors and the data they capture. Sensors can be represented with varying degrees of expressiveness. Concepts that can be captured about sensors include unique identifier, manufacturer, location of deployment, dimensions, operating conditions, type of data captured, and hierarchy with regard to related sensors (Compton et al., 2009). Similarly, various sensor data concepts can be represented, such as the property being observed, units of measurement, and measurement timestamps. Thirty-eight systems represent sensor and sensor data concepts in ontologies. The reuse of existing sensor ontologies, particularly established ones such as SAREF, can contribute to a higher degree of expressiveness for sensor and sensor data concepts. This is because these validated ontologies provide rich modeling of such concepts, facilitating more effective querying of and reasoning on sensor data, which is essential for situation analysis. Comprehensive sensor ontologies also support sensor management, allowing sensors to be catalogued based on their attributes as captured in ontologies (Compton et al., 2009). Despite these benefits, only 16 systems reuse existing sensor or device ontologies, namely SSN/SOSA (Bampi et al., 2025; Chatterjee et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Garcia-Moreno et al., 2023; Ivaşcu & Negru, 2021; Martella et al., 2025; Rhayem et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019), SAREF and its extensions (De Brouwer et al., 2022; Hadjadj & Halimi, 2021; Lopes de Souza et al., 2023; Zafeiropoulos et al., 2024), the Amigo device ontology (Hristoskova et al., 2014), and the Moving Objects ontology (Rhayem et al., 2021). This could be attributed to the existing ontologies providing more complexity than the systems require, although this can be mitigated by selectively importing only the relevant classes.
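As an illustration of how a sensor observation can be expressed with a reused sensor ontology, the sketch below serializes one heart-rate observation as Turtle using SOSA terms. The `ex:` namespace, the identifiers, and the helper function are hypothetical. The sketch uses plain string formatting rather than an RDF library, and it is not drawn from any of the reviewed systems.

```python
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/health#"  # hypothetical namespace for illustration


def observation_turtle(obs_id, sensor_id, prop_id, person_id, value, time_iso):
    """Return a Turtle document describing one observation with SOSA terms."""
    lines = [
        f"@prefix sosa: <{SOSA}> .",
        f"@prefix xsd: <{XSD}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{obs_id} a sosa:Observation ;",
        f"    sosa:madeBySensor ex:{sensor_id} ;",          # which device produced it
        f"    sosa:observedProperty ex:{prop_id} ;",        # what was measured
        f"    sosa:hasFeatureOfInterest ex:{person_id} ;",  # whom it is about
        f'    sosa:hasSimpleResult "{value}"^^xsd:decimal ;',
        f'    sosa:resultTime "{time_iso}"^^xsd:dateTime .',
    ]
    return "\n".join(lines)


ttl = observation_turtle(
    "obs1", "hrSensor1", "heartRate", "patient1", 72.0, "2024-01-15T08:30:00Z"
)
```

Reusing the SOSA property names means that a query written for one system's observations would, in principle, also match observations produced by another system that follows the same ontology.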
Foundational ontologies can contribute to semantic interoperability by providing unambiguous and domain-independent concept definitions (Amaral et al., 2021). Three of the selected systems directly incorporate a foundational ontology. El-Sappagh et al. (2019) use the Basic Formal Ontology, while De Brouwer et al. (2022) and Stavropoulos et al. (2021) use the DOLCE + Description and Situation (DnS) Ultra Lite (DUL) ontology. Other systems indirectly integrate foundational ontologies by reusing other ontologies that have already incorporated them. For example, the SSN ontology uses DUL as its upper ontology (Compton et al., 2012), and the SAREF ontology also has an indirect reference to DUL through its mappings to the SSN ontology (Daniele et al., 2015). Consequently, any system that reuses the SSN or SAREF ontologies inherits an indirect connection to DUL.
Supporting Data Fusion
For real-time health monitoring, streaming sensor data must be retrieved and dynamically fused with other heterogeneous, multimodal, and distributed sources of data. This data fusion is pivotal for downstream situation detection, situation prediction, and decision support. Interoperability serves as an enabler for effective data fusion. Technical interoperability provides the protocols and hardware necessary to collect data from diverse sources; syntactic interoperability ensures data formats and structures are compatible across systems; and semantic interoperability establishes shared meaning for data elements, allowing accurate interpretation. Additionally, data fusion supports process interoperability by creating a unified and comprehensive patient view from diverse data sources, which enables different healthcare providers to coordinate their workflows effectively (Kuziemsky & Peyton, 2016).
As seen in Table 6, the selected systems support a wide range of heterogeneous and multimodal data. All 48 systems collect physiological data from body sensors, while 15 systems additionally incorporate weather data from ambient sensors. Health and medical records are the most frequently incorporated data source other than sensor data, with 21 systems supporting the incorporation of such data from external hospital systems. These records provide additional information that is useful for health monitoring, such as an individual’s disease history, laboratory test results, medications taken, allergies, and previous hospital admissions. The systems proposed by Ali et al. (2018, 2020), El-Sappagh et al. (2019), and Rhayem et al. (2021) have the most comprehensive records, capturing laboratory tests, prior disease diagnoses, and lifestyle information such as exercise, nutrition, alcohol consumption, and smoking status. Some systems use medical records to extract diagnosis status (Ali et al., 2021), while others use them to extract an individual’s risk factors for disease (Ali et al., 2020). These records can also be used to overcome limitations of sensor data, such as missing values, as was done by Ali et al. (2021). Besides health and medical records, data from social networks can also be used to complement sensor data. This is done in two systems: Ali et al. (2021), who use social networking data to monitor individuals’ mental health through sentiment analysis; and Ammar et al. (2021), who use social networks, blogs, and news articles as sources of public knowledge.
Sensor Data and Other Types of Data Used in the Systems.
Note. BG = blood glucose; BP = blood pressure; BT = body temperature; CO2 = carbon dioxide; CO = carbon monoxide; ECG = electrocardiogram; EEG = electroencephalogram; EIT = electrical impedance tomography; EMG = electromyogram; GPS = global positioning system; HR = heart rate; O2 = oxygen; RR = respiratory rate; SpO2 = blood oxygen saturation; TVOC = total volatile organic compounds.
A detailed analysis of the data sources, including sensor devices and existing datasets, is provided in Section 6.1.
SWTs do not inherently provide support for technical interoperability, since they operate at a higher, more abstract level to formally represent and derive meaning from the data. Therefore, in order to achieve technical interoperability, health monitoring systems must leverage data transmission protocols and devices. However, SWTs are critical in the achievement of semantic interoperability among the selected systems. Twenty of the systems make use of terminologies such as SNOMED CT and ICD through ontologies, and access knowledge graphs such as DrugBank and Diseasome, which are published as linked data. This allows the systems to access and reason with standardized health domain knowledge.
SWTs can also contribute to syntactic interoperability. For instance, El-Sappagh et al. (2019) and Kilintzis et al. (2019) map FHIR resources to an ontology, allowing for interoperability between their proposed system and hospital information systems that use FHIR. These are the only two systems that use SWTs to achieve interoperability with established health standards. This can be attributed to the historical gap in user-friendly tools for this purpose. For example, although FHIR has an RDF representation serialized in Turtle format, early adopters noted several issues precluding its ease of use, such as literal values and FHIR references being nested under blank nodes, and unnecessarily long predicate names (Sharma et al., 2022). However, more recent versions of FHIR RDF have largely addressed these issues.
Finally, with regard to the representation of heterogeneous and multimodal data, 42 systems use ontologies for this purpose. The exceptions are three systems which represent the data in knowledge graphs (Ammar et al., 2021; Xu et al., 2017; G. Yu et al., 2022), and three systems which only use SWTs (i.e., domain ontologies, external knowledge graphs, and linked data repositories) as a source of domain knowledge (Ali et al., 2021; Hussain & Park, 2021; Zhou et al., 2022).
Situation Detection
A situation can be understood as a higher-level interpretation of sensor data that is relevant and of interest in an application domain (Ye et al., 2011). Personal health monitoring systems should be capable of situation analysis, which entails both the detection and the prediction of health situations. We discuss situation detection below, and situation prediction in Section 5.3.
Forms of Situation Detection
In health monitoring systems, situation detection can take a variety of forms. One of these is the categorization of individual sensor observations based on whether they are within or outside a given range as determined by domain knowledge. For example, in the system proposed by Akhtar et al. (2022), when vital signs such as temperature and heart rate are outside the normal range, the situation is classified as an emergency. Likewise, Elhadj et al. (2021) classify expected observations as normal, while observations outside the normal ranges are classified as abnormal. They also include a third classification, wrong, for faulty observations from malfunctioning sensors. Similar threshold-based situation categories are used in 19 of the systems (Alti et al., 2022; Ammar et al., 2021; Bampi et al., 2025; De Brouwer et al., 2022; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Hristoskova et al., 2014; Ivaşcu & Negru, 2021; Khozouie et al., 2018; Kilintzis et al., 2019; Lopes de Souza et al., 2023; Martella et al., 2025; Peral et al., 2018; Rhayem et al., 2021; Titi et al., 2019; Villarreal et al., 2014; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014). Thresholds have also been used to classify physical activity based on level of intensity (Chatterjee et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Garcia-Valverde et al., 2014; Ivaşcu & Negru, 2021). A better approach than using individual sensor observations is to consider different observations and personal attributes to classify individuals. This is done by Ali et al. (2018), who classify the patient health condition as either healthy, moderate, or serious based on multiple sensor outputs and properties such as sex, weight, and height. Similarly, Chiang and Liang (2015) classify situations as either healthy, moderate, or severe based on age, blood pressure, blood glucose, heart rate, and cholesterol.
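The threshold-based categorization described above can be sketched as follows. The three labels mirror the normal, abnormal, and wrong categories mentioned for Elhadj et al. (2021), but the numeric ranges are illustrative placeholders rather than clinical guidance. Using a plausibility range to flag faulty readings is our assumption, since the source does not state how that category is determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    normal_low: float
    normal_high: float
    plausible_low: float   # below this, assume a sensor fault (assumption)
    plausible_high: float  # above this, assume a sensor fault (assumption)


# Illustrative values only; a real system would take these from domain knowledge.
RANGES = {
    "heart_rate_bpm": Range(60, 100, 20, 250),
    "body_temp_c": Range(36.1, 37.5, 30.0, 43.0),
}


def classify(kind: str, value: float) -> str:
    """Label one observation as 'normal', 'abnormal', or 'wrong'."""
    r = RANGES[kind]
    if not (r.plausible_low <= value <= r.plausible_high):
        return "wrong"      # physiologically implausible reading
    if r.normal_low <= value <= r.normal_high:
        return "normal"
    return "abnormal"       # plausible but outside the normal range
```

With these placeholder ranges, `classify("heart_rate_bpm", 130)` yields `"abnormal"`, whereas a reading of 0 is labeled `"wrong"`. Each observation is judged in isolation, which is the limitation that the multi-attribute approaches of Ali et al. (2018) and Chiang and Liang (2015) address.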
Another form of situation detection in health monitoring is the detection of medical conditions and diseases. Some conditions, such as hypertension and hyperglycemia, can be diagnosed based on individual sensor observation thresholds. This is done by J. Kim et al. (2014), who detect prehypertension and stage 1 and stage 2 hypertension based on defined blood pressure thresholds. Similarly, hyperglycemia is detected by Rhayem et al. (2021) based on blood glucose levels. Other diseases require the analysis of signs and symptoms based on a combination of different sensor observations and other sources of data. For example, Ivascu et al. (2015) detect mental disorders (Parkinson’s, Alzheimer’s, psychosis, and depression) using signs and symptoms related to behavior, motor skills, cognitive skills, facial appearance, mood, sleep, weight, and speech. Other systems are able to detect types of headaches (De Brouwer et al., 2022), heart disease (Ali et al., 2020), diabetes (Ali et al., 2021, 2018; Hooda & Rani, 2020), frailty (Garcia-Moreno et al., 2023), stroke (Hussain & Park, 2021), and skin and kidney diseases (Kordestani et al., 2021). Beyond the detection of diseases, Zafeiropoulos et al. (2024) detect medication adherence in Parkinson’s disease patients by recognizing missed doses based on two features: tremors and bradykinesia.
Techniques for Situation Detection
Thirty-nine of the systems implement some form of rule-based reasoning for situation detection. Of these, 28 systems specify using semantic web rules in particular (Akhtar et al., 2022; Ali et al., 2018; Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Henaien et al., 2020; Hooda & Rani, 2020; Hristoskova et al., 2014; Kilintzis et al., 2019; J. Kim et al., 2014; Lopes de Souza et al., 2023; Mezghani et al., 2015; Minutolo et al., 2016; Reda et al., 2022; Rhayem et al., 2021; Spoladore et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014). A discussion of the specific rule languages used in the systems can be found in Section 6.2.2. Rules provide a way to implement expert knowledge in an if–then form, whereby if certain conditions are met, then a consequent conclusion is made or action taken. Despite their widespread use, rules have several limitations. Firstly, crisp rules are unable to handle uncertainty and ambiguity in sensor observations and the determination of health situations. To mitigate this, five systems incorporate fuzzy logic (Ali et al., 2018; Chiang & Liang, 2015; Esposito et al., 2018; Fenza et al., 2012; Minutolo et al., 2016), while one incorporates defeasible logic (Akhtar et al., 2022) within the rules. These techniques are discussed in greater detail in Section 5.7, which focuses on techniques for handling uncertainty in health monitoring. Secondly, rules are typically based on existing expert knowledge, and therefore cannot incorporate new knowledge that experts may be unaware of. Additionally, manually updating rules is time-consuming, making them difficult to scale. 
This challenge can be overcome using learned rules based on ML algorithms, which can acquire new, high-quality knowledge automatically (Hitzler et al., 2020) and contribute to dynamic and adaptable rule-based systems. The systems proposed by Hussain and Park (2021), Henaien et al. (2020), and Peral et al. (2018) extract rules from decision trees. However, caution should be exercised when using ML-derived rules, as they may still need verification and validation from domain experts. As an alternative to rule-based reasoning, Xu et al. (2017) implement case-based reasoning, arguing that it is easier to capture human experiences using cases rather than rules. By searching for historical cases that are similar to the current case, their proposed system is able to obtain treatment plans that have been successful in the past.
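The retrieval step of case-based reasoning can be sketched as a nearest-neighbor search over past cases. The case base, feature names, scaling constants, and plan labels below are all hypothetical. The scaled Euclidean distance is our choice for illustration, since the similarity measure used by Xu et al. (2017) is not described here.

```python
from math import sqrt

# Hypothetical case base: (patient features, treatment plan that was successful).
CASE_BASE = [
    ({"systolic": 150, "glucose": 6.0, "age": 64}, "plan A"),
    ({"systolic": 118, "glucose": 9.5, "age": 51}, "plan B"),
    ({"systolic": 125, "glucose": 5.2, "age": 33}, "plan C"),
]

# Rough per-feature scales so that no single feature dominates (assumed values).
SCALE = {"systolic": 40.0, "glucose": 5.0, "age": 50.0}


def distance(a: dict, b: dict) -> float:
    """Scaled Euclidean distance between two feature dictionaries."""
    return sqrt(sum(((a[k] - b[k]) / SCALE[k]) ** 2 for k in SCALE))


def retrieve(query: dict, k: int = 1) -> list:
    """Return the k past cases most similar to the query, nearest first."""
    ranked = sorted(CASE_BASE, key=lambda case: distance(query, case[0]))
    return ranked[:k]


current = {"systolic": 148, "glucose": 6.3, "age": 70}
best_features, best_plan = retrieve(current)[0]
```

In a full case-based reasoning cycle, the retrieved plan would then be adapted to the current patient, reviewed, and stored as a new case. Only retrieval is shown here.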
In addition to the development of rules as discussed above, ML is also used in a number of systems for the classification of diseases based not only on sensor data but also other data sources. Ali et al. (2021) use a bidirectional long short-term memory (BiLSTM) model to detect diabetes and blood pressure, to classify sentiments from social networking data for mental health monitoring, and to classify drug side effects. Their proposed system uses domain ontologies to extract important features that can enhance the ML classification. Zhou et al. (2022) also use a BiLSTM model for disease prediction, while Garcia-Moreno et al. (2023) use k-nearest neighbors to classify elderly individuals based on frailty and dependence. Other ML algorithms used include a multi-layer perceptron for heart disease detection (Ali et al., 2020) and a random forest for stroke detection (Hussain & Park, 2021). ML is also used for physical activity classification, for example, using the k-nearest neighbors (Garcia-Valverde et al., 2014; Mavropoulos et al., 2021), decision trees (Mavropoulos et al., 2021), and random forest (Ivaşcu & Negru, 2021; Mavropoulos et al., 2021) algorithms. Finally, ML can also be used to classify situation severity for reporting purposes. This is done by Zafeiropoulos et al. (2024), who use a graph neural network to distinguish between medium and high alerts. A full review of ML techniques for situation analysis in the health domain is outside the scope of this study. Readers are referred to the reviews by Ravì et al. (2017) and Li et al. (2021).
The Role of SWTs
SWTs can support situation detection in two main ways: firstly, they formally represent important concepts and the relationships between them, that is, sensor data, domain knowledge, contextual information, and even the situations themselves; and secondly, they support reasoning through which new knowledge can be derived from existing knowledge (Ye et al., 2011). Although several situation-focused ontologies have been developed, including the Situation Theory Ontology (Kokar et al., 2009) and the Scenes and Situations ontology (Almeida et al., 2018), none of the selected systems reuse any such ontologies. Despite this, SWTs remain vital for situation detection among the selected systems. Rule-based reasoning is the most common mechanism for situation detection among systems. These rules rely heavily on concepts that are formally defined in ontologies, and they are more often than not expressed in standard semantic web languages such as SWRL.
Situation Prediction
All 48 selected systems detect current situations. In contrast, only 12 of the systems go beyond this to predict some future outcome (Ali et al., 2020; Alti et al., 2022; Chiang & Liang, 2015; De Brouwer et al., 2022; Fenza et al., 2012; Hristoskova et al., 2014; Mcheick et al., 2016; Peral et al., 2018; Reda et al., 2022; Rhayem et al., 2021; H. Q. Yu et al., 2017; Zhou et al., 2022).
Forms of Situation Prediction
All 12 of these systems explore the concept of risk as a situation prediction feature, since determining an individual’s risk profile for a certain condition can be used to predict future adverse health situations. This includes the risks of heart disease (Ali et al., 2020; Hristoskova et al., 2014), arthritis recurrence (Chiang & Liang, 2015), stroke (Mcheick et al., 2016), and fetal loss in gestational diabetes patients (Rhayem et al., 2021). Zhou et al. (2022) use a multi-label classification approach to simultaneously assess the risk of multiple chronic diseases, including hypertension, diabetes, and arthritis. De Brouwer et al. (2022) detect triggers as a means to anticipate potential headache attacks in the future, which can be considered a form of risk prediction. To support the identification of potential risks, future physiological readings can also be predicted using historical sensor observations, as is done by Peral et al. (2018). Their proposed system predicts blood glucose levels over three-day and five-day windows. These predictions of sensor measurements can then be analyzed to determine future health risks. Hristoskova et al. (2014) similarly adopt a temporal analysis, determining the risk of congestive heart failure over a four-year time horizon.
Another useful aspect of situation prediction is the determination of the prognosis, that is, expected progression, of a detected disease, although this is poorly explored in the systems. Hussain and Park (2021) mention the intention to extend their system in future work to include automated stroke prognosis; however, this is not implemented in the current version of the system. In contrast, H. Q. Yu et al. (2017) include a disease progression class in their proposed ontology, representing past diagnoses or potential health risks and their associated times. However, the system does not include any methods to predict the progression of detected conditions.
Techniques for Situation Prediction
Similar to situation detection, the most common technique used in situation prediction is rule-based reasoning, which is used in nine of the 12 systems. Of these, seven specify semantic web rules in particular (Ali et al., 2020; Alti et al., 2022; Chiang & Liang, 2015; Hristoskova et al., 2014; Reda et al., 2022; Rhayem et al., 2021; H. Q. Yu et al., 2017). As discussed in Section 5.2, both Chiang and Liang (2015) and Fenza et al. (2012) use fuzzy rules to enhance crisp rules. ML is used for situation prediction in four of the systems (Ali et al., 2020; Peral et al., 2018; Zafeiropoulos et al., 2024; Zhou et al., 2022). Ali et al. (2020) use an ensemble deep learning classifier, which consists of a five-layer feed-forward network that incorporates a boosting algorithm, to predict future heart attacks, while Peral et al. (2018) use support vector machine and logistic regression models to forecast future blood glucose measurements. Although De Brouwer et al. (2022) use ML to detect headache triggers, the actual prediction of headaches is knowledge-based using SPARQL queries on stored data.
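The difference between a crisp rule and a fuzzy rule can be illustrated with a small sketch. The membership breakpoints (130 to 150 mmHg, 55 to 75 years) and the function names are hypothetical and carry no clinical meaning. The use of the minimum as the fuzzy conjunction is a common convention and not necessarily the operator used in the cited systems.

```python
def ramp_up(x: float, a: float, b: float) -> float:
    """Membership that rises linearly from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def crisp_high_bp(systolic: float) -> bool:
    """Crisp rule: a hard cut-off at 140 mmHg (illustrative threshold)."""
    return systolic >= 140


def fuzzy_high_bp(systolic: float) -> float:
    """Degree to which the systolic pressure counts as 'high'."""
    return ramp_up(systolic, 130, 150)


def fuzzy_elderly(age: float) -> float:
    """Degree to which the person counts as 'elderly'."""
    return ramp_up(age, 55, 75)


def risk_degree(systolic: float, age: float) -> float:
    """IF pressure is high AND person is elderly THEN risk; AND taken as minimum."""
    return min(fuzzy_high_bp(systolic), fuzzy_elderly(age))
```

A reading of 139 mmHg fails the crisp rule outright, although it is only 1 mmHg below the cut-off. The fuzzy version instead assigns it a partial degree of membership, so that borderline readings still contribute to the estimated risk.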
Besides rules and ML, Bayesian networks (BNs) are used in two of the systems. These are probabilistic models in the form of directed acyclic graphs that can represent causal relationships among variables in a domain. Mcheick et al. (2016) use a BN to calculate the risk of stroke occurring in the next seven days, based on risk factors such as age, presence of diabetes, high blood pressure, and symptom duration. This approach is also taken by Kordestani et al. (2021) to determine the probability of the occurrence of kidney disease. The use of fuzzy rules and BNs is discussed in greater detail in Section 5.6 on explainability and Section 5.7 on uncertainty handling.
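A minimal sketch of inference in a BN is given below. The network structure (diabetes and high blood pressure as independent parents of stroke) and all probabilities are invented for illustration. They are not the model or the values of Mcheick et al. (2016) or Kordestani et al. (2021). Inference is done by brute-force enumeration of the joint distribution, which is feasible only for very small networks.

```python
from itertools import product

# Hypothetical prior probabilities and conditional probability table.
P_DIABETES = 0.10
P_HIGH_BP = 0.30
P_STROKE = {  # P(stroke = True | diabetes, high_bp)
    (False, False): 0.01,
    (False, True): 0.05,
    (True, False): 0.04,
    (True, True): 0.12,
}


def joint(diabetes: bool, high_bp: bool, stroke: bool) -> float:
    """Joint probability of one full assignment, factorized along the graph."""
    p = P_DIABETES if diabetes else 1 - P_DIABETES
    p *= P_HIGH_BP if high_bp else 1 - P_HIGH_BP
    p_stroke = P_STROKE[(diabetes, high_bp)]
    return p * (p_stroke if stroke else 1 - p_stroke)


def probability(query: dict, evidence: dict) -> float:
    """P(query | evidence), summing the joint over all consistent assignments."""
    names = ("diabetes", "high_bp", "stroke")
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(**world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


risk_given_high_bp = probability({"stroke": True}, {"high_bp": True})
baseline_risk = probability({"stroke": True}, {})
```

With these invented numbers, observing high blood pressure raises the stroke probability from the baseline, while the unobserved diabetes status is marginalized out. This is how a BN can produce a risk estimate when some risk factors are unknown.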
The Role of SWTs
By providing a structured representation of risk factors, temporal concepts, health outcomes, and situations, and supporting reasoning over these concepts, SWTs support more effective situation prediction. Similarly to situation detection, rules remain the most used reasoning approach for situation prediction among the systems, with seven of them using semantic web rules specifically. However, as discussed in Section 5.2, semantic-based approaches to situation analysis face drawbacks such as scaling difficulties and the inability to handle uncertainty inherently. These limitations can be mitigated by combining them with complementary techniques such as ML, fuzzy logic, and BNs. In fact, several extensions of semantic web standards and languages have been proposed that incorporate these uncertainty handling techniques, and these are discussed in Section 5.7.6. However, none of these are used in the reviewed systems.
Decision Support
Decision support is the natural next step after situation analysis. Based on the detected and predicted situations, targeted support can be offered to mitigate adverse situations and promote favorable health outcomes. Forty-five systems implement some form of decision support.
Forms of Decision Support
Messages and notifications are used in the majority of the systems to warn of potentially dangerous situations, issue reminders, and prompt mitigating action. These notifications can be sent to monitored individuals, caregivers, clinicians, and emergency services, depending on the severity of the situation. The most common form of decision support is alerting users to adverse situations. This is done in nearly all of the selected systems, with the exception of 11 which do not mention this crucial functionality (Ali et al., 2020, 2021; Alti et al., 2022; Chatterjee et al., 2021; El-Sappagh et al., 2019; Fenza et al., 2012; Kilintzis et al., 2019; J. Kim et al., 2014; Minutolo et al., 2016; Reda et al., 2022; H. Q. Yu et al., 2017). A well-documented issue with alerts in the health domain is alert fatigue, a phenomenon in which users become desensitized to alerts due to their frequency (Cash, 2009). Esposito et al. (2018) mitigate this by differentiating between critical and non-critical abnormal situations, with the latter being sent out in a daily summary report email rather than an instantaneous notification for each case. Additionally, 10 systems send reminder messages of some kind to users, for instance about medications, exercise, and other interventions (Ammar et al., 2021; Chiang & Liang, 2015; Hadjadj & Halimi, 2021; Henaien et al., 2020; Kordestani et al., 2021; Mavropoulos et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; Vadillo et al., 2013; G. Yu et al., 2022). For example, the system proposed by Kordestani et al. (2021) can remind clinicians to order additional laboratory tests when previously taken tests become out of date.
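The strategy of separating critical from non-critical situations can be sketched as a simple router that sends critical alerts immediately and queues the rest for a daily summary. This is a hypothetical illustration of the general idea, not the implementation of Esposito et al. (2018); the class name, method names, and message strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRouter:
    """Routes critical alerts immediately and batches the rest (hypothetical design)."""
    sent_now: list = field(default_factory=list)      # alerts delivered instantly
    daily_digest: list = field(default_factory=list)  # alerts held for the summary

    def handle(self, message, critical):
        """Deliver a critical alert at once; queue a non-critical one."""
        if critical:
            self.sent_now.append(message)
        else:
            self.daily_digest.append(message)

    def flush_digest(self):
        """Build the daily summary text and empty the queue."""
        summary = "; ".join(self.daily_digest)
        self.daily_digest.clear()
        return summary
```

In a deployed system, `flush_digest` would be called once a day by a scheduler and its output sent by email, so that users receive one message instead of one notification per non-critical event.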
In addition to alerts and reminders, health monitoring systems can also trigger actions in response to adverse situations. For example, the system proposed by Alti et al. (2022) triggers the injection of insulin in response to a blood glucose level above a certain threshold, while the system proposed by Hadjadj and Halimi (2021) can trigger the opening of a vehicle door. Such systems must be integrated with an actuation device capable of carrying out the action. The system proposed by Titi et al. (2019) includes several actuators, such as a smoke alarm. Other systems are integrated with actuators capable of opening doors and moving beds (Bampi et al., 2025), turning on lights and heaters (Chiang & Liang, 2015), making emergency calls (Rhayem et al., 2021), or turning off water or gas if detected (Vadillo et al., 2013).
Another form of decision support is the generation of suggestions or recommendations for the mitigation of adverse situations, which is implemented in 29 systems. The recommendations include lifestyle modifications, such as diet and activity (Ali et al., 2020, 2018; Alti et al., 2022; Ammar et al., 2021; Chatterjee et al., 2021; El-Sappagh et al., 2019; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Hussain & Park, 2021; J. Kim et al., 2014; Lopes de Souza et al., 2023; Rhayem et al., 2021; Spoladore et al., 2021; Villarreal et al., 2014; G. Yu et al., 2022), medication (Alti et al., 2022; Elhadj et al., 2021; El-Sappagh et al., 2019; Hristoskova et al., 2014; Kordestani et al., 2021; Lopes de Souza et al., 2023; Peral et al., 2018; Rhayem et al., 2021; Titi et al., 2019; Xu et al., 2017), suitable environmental conditions (Chiang & Liang, 2015), hospital visits (Zhou et al., 2022), and other unspecified treatments and mitigations (Akhtar et al., 2022; Ali et al., 2021; De Brouwer et al., 2022; Martella et al., 2025; Mavropoulos et al., 2021; Zafeiropoulos et al., 2024). Two important considerations when choosing appropriate medications are their side effects and how they interact with other medications. Ali et al. (2021) use drug review websites to collect data on side effects, while Elhadj et al. (2021) keep track of medication interactions as well as patient allergies. Similarly, the system proposed by Ammar et al. (2021) includes an app where monitored individuals can report medication side effects. This information assists clinicians in prescribing appropriate medications for each patient. Related to recommendations is the ability for the monitored individual to seek out relevant and trusted medical information. For example, the system proposed by Rhayem et al. (2021) includes a notification module that allows patients to contact a clinician and receive recommendations and treatments from them.
Techniques for Decision Support
Much like situation detection and situation prediction, rule-based reasoning is the most commonly used technique for decision support in the selected systems. Rules are used by 37 of the 45 systems that implement decision support, of which 27 systems use semantic web rules in particular (Akhtar et al., 2022; Ali et al., 2020, 2018; Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Henaien et al., 2020; Hooda & Rani, 2020; Hristoskova et al., 2014; Kilintzis et al., 2019; J. Kim et al., 2014; Lopes de Souza et al., 2023; Mezghani et al., 2015; Rhayem et al., 2021; Spoladore et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014). They are typically implemented based on detected and predicted situations, such that adverse situations trigger alerts or recommendations. Zafeiropoulos et al. (2024) combine ML and rules, using a graph convolution network to classify alerts into one of two levels (medium or high), after which the alerts are sent based on rules.
Similar to their approach for situation detection, Xu et al. (2017) adopt case-based reasoning rather than rules for decision support. They compare treatments and clinical paths in similar patients, and, if differences are found, the system sends an alert to flag the treatment plan for adjustment. Among the agent-based systems, the functionality related to decision support is often delegated to agents, such as the notification and alert agents in the systems proposed by Ivascu et al. (2015) and Ivaşcu and Negru (2021). Additionally, interactive agents and chatbots play an important role in decision support. Ammar et al. (2021) propose digital assistants that provide personalized alerts and suggestions and summarize text. Similarly, Mavropoulos et al. (2021) propose a smart virtual agent that clinicians can interact with via voice commands, while G. Yu et al. (2022) implement an AI chatbot to answer user questions.
Quality of Decision Support
An important factor in the quality of decision support is user agency. Rather than simply providing recommendations, decision support tools should allow decision-makers the agency to engage in decision-making by helping them to: (1) identify and narrow down
Furthermore, the soundness of recommended interventions can be ensured by incorporating established and clinically validated medical guidelines or vetted information from medical or allied health professional bodies. This can contribute to the acceptance of health monitoring systems by medical professionals and regulatory bodies. Despite this, only 15 of the selected systems mention the use of such guidelines or vetted medical information. They include Lopes de Souza et al. (2023), who incorporate risk level classifications from the American Heart Association, and De Brouwer et al. (2022), who use the International Classification of Headache Disorders’ criteria to issue relevant alerts. Garcia-Moreno et al. (2023) and Spoladore et al. (2021) use the International Classification of Functioning, Disability and Health for health status classification; while Ali et al. (2020) and Hristoskova et al. (2014) use the Framingham Risk Score to determine congestive heart failure risk. El-Sappagh et al. (2019) extract knowledge from bodies such as the American and Canadian diabetes associations, while H. Q. Yu et al. (2017) use guidelines from the United Kingdom’s National Health Service. Finally, the system proposed by Ammar et al. (2021) includes an option for clinicians to refer patients to trusted medical knowledge sources such as the Centers for Disease Control and Prevention.
The Role of SWTs
The main value that SWTs add to decision support in the selected systems is their ability to represent important concepts, particularly situations and domain knowledge, which can then be reasoned over. This is consistent with our findings on the role of SWTs in situation detection and situation prediction. It also aligns with previous research findings on the use of SWTs for decision support (Blomqvist, 2014; Jing et al., 2023). Rules are the most common reasoning tool among the selected systems, with semantic web rules used for decision support in 27 of them. However, rule-based reasoning for decision support faces the same limitations in scalability and lack of inherent uncertainty handling as discussed in the previous sections.
Context Awareness
An important aspect of health monitoring is the ability to take context into consideration, which is critical for situation analysis since contextual information enhances sensor data and supports its interpretation. This, in turn, impacts decision support. Consider a case where an individual’s heart rate is suddenly elevated. If the individual is engaged in exercise, the increased heart rate is expected. However, if the individual is at rest, this could be a cause for alarm and may necessitate intervention. Therefore, health monitoring systems must be able to adapt based on the context of the individual being monitored. The four most common aspects of context are location, time, identity (of a person or agent), and activity (or events; Stevenson et al., 2009; Ye et al., 2007).
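The heart rate example can be expressed as a small context-dependent rule. The sketch below is illustrative only: the function name, the activity labels, and both thresholds are assumptions chosen for the example and are not clinical guidance or values drawn from any of the reviewed systems.

```python
def assess_heart_rate(bpm, activity, resting_limit=100, exercise_limit=180):
    """Return 'expected' or 'alert' for a heart rate reading given the current activity.

    The limits are hypothetical defaults; a real system would personalize them.
    """
    # The applicable threshold depends on context, not on the reading alone.
    limit = exercise_limit if activity == "exercise" else resting_limit
    return "alert" if bpm > limit else "expected"
```

The same reading of 140 beats per minute yields `expected` when the activity is `"exercise"` and `alert` when it is `"rest"`, which is precisely the adaptation that context awareness is meant to provide.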
Location
Ye et al. (2007) highlight three types of locations that can be represented: symbolic locations, coordinate locations, and regions. The systems proposed by Akhtar et al. (2022), Bampi et al. (2025), Chiang and Liang (2015), and Vadillo et al. (2013) keep track of the different rooms in a house where an individual may be, while those proposed by Khozouie et al. (2018), Titi et al. (2019), and Zeshan et al. (2023) indicate more generally the place where the monitored individual is (e.g., “home” or “hospital”). These are symbolic locations. One purpose of such locations is to allow the systems to suggest relevant services based on the type of space currently occupied, as is the case in the system proposed by Chiang and Liang (2015). In the system proposed by Hristoskova et al. (2014), the clinician’s location (i.e., the room they occupy in a hospital) is used to determine which device to send notifications to, optimizing for the closest device. Likewise, Zeshan et al. (2023) determine the closeness between the monitored individual and their caregivers in order to select which caregiver or clinician to notify in case of abnormal sensor observations. This is similar to the system proposed by Alti et al. (2022), which supports a GPS sensor that captures the current coordinates of the monitored individual. In this system, location is used to select devices closest to the user from which to deploy health services so as to increase efficiency and minimize inter-device communication costs. Coordinate locations also serve the purpose of alerting caregivers and emergency services of the exact location of a person in the event of a medical emergency, as is suggested by Rhayem et al. (2021). The system proposed by Hadjadj and Halimi (2021) integrates health monitoring in the public transport system, and therefore includes location sensors in public transportation vehicles.
The final type of location is regions, which are geometrical two- or three-dimensional representations of locations (Ye et al., 2007). This type of location is used in the system proposed by J. Kim et al. (2014) in order to advise users of region-specific situations, such as adverse or dangerous weather. Similarly, Ammar et al. (2021) use ZIP codes, which represent a small geographic region, to extract neighborhood-specific social determinants of health such as walkability. El-Sappagh et al. (2019) use the spatial region class from the Basic Formal Ontology to represent the patient’s current location, as well as the placement of the sensors. Despite the importance of location as an aspect of context, only 21 of the 48 systems (Akhtar et al., 2022; Alti et al., 2022; Ammar et al., 2021; Bampi et al., 2025; Chiang & Liang, 2015; De Brouwer et al., 2022; Elhadj et al., 2021; El-Sappagh et al., 2019; Garcia-Moreno et al., 2023; Hadjadj & Halimi, 2021; Henaien et al., 2020; Hristoskova et al., 2014; Khozouie et al., 2018; J. Kim et al., 2014; Martella et al., 2025; Reda et al., 2022; Rhayem et al., 2021; Titi et al., 2019; Vadillo et al., 2013; H. Q. Yu et al., 2017; Zeshan et al., 2023) include it.
Time
In contrast, all of the systems include the concept of time, with the exception of five (Ali et al., 2020, 2018; Fenza et al., 2012; Hooda & Rani, 2020; G. Yu et al., 2022). Observation time is the most common way time is incorporated in the systems, with 25 systems capturing the exact timestamp for each sensor observation (Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; De Brouwer et al., 2022; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Garcia-Moreno et al., 2023; Hadjadj & Halimi, 2021; Hussain & Park, 2021; Ivascu et al., 2015; Khozouie et al., 2018; Kilintzis et al., 2019; Lopes de Souza et al., 2023; Martella et al., 2025; Mavropoulos et al., 2021; Minutolo et al., 2016; Peral et al., 2018; Reda et al., 2022; Rhayem et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Zafeiropoulos et al., 2024; Zeshan et al., 2023). Besides observation time, the time at which certain events occur can be captured, for example, calls to emergency services (Alti et al., 2022). This allows the systems to display or analyze trends over time. Additionally, Alti et al. (2022) capture the time intervals in which reports should be sent. Rather than a timestamp, Ali et al. (2021) record the general time of day during which daily activities occur, that is, morning, afternoon, or evening. Similarly, Peral et al. (2018) use mealtimes as a point of reference, which is particularly important when taking blood glucose measurements. They distinguish between pre-breakfast, pre-lunch, and pre-dinner readings.
Duration and frequency are other important aspects of time. Duration can be captured for disease (Zafeiropoulos et al., 2024), physical activity (Chatterjee et al., 2021; Spoladore et al., 2021), sleep (Chatterjee et al., 2021; Stavropoulos et al., 2021; Zhou et al., 2022), symptoms (Mcheick et al., 2016), and treatment (Titi et al., 2019; Xu et al., 2017). De Brouwer et al. (2022) capture headache duration as well as the duration of events that influence headaches, such as stress and sleep. Symptom duration can influence the risk for certain illnesses, while specifying treatment duration ensures medication reminders are sent only during the prescribed period. When combined with thresholds, duration can be useful in identifying different situations. For example, Stavropoulos et al. (2021) determine that an individual has a lack of movement if they have fewer than 500 steps and their heart rate has been <100 beats per minute for longer than 800 min. Frequency is used by Chiang and Liang (2015), Spoladore et al. (2021), and H. Q. Yu et al. (2017) as a metric for physical activity. Other examples of frequency in the systems are frequency of sensor observations (Mezghani et al., 2015) and frequency of disease occurrence (Villarreal et al., 2014).
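The combination of thresholds and duration described above can be sketched as follows, using the lack-of-movement thresholds reported for Stavropoulos et al. (2021). The sketch assumes one heart rate sample per minute and interprets "longer than 800 min" as a run of consecutive minutes; both are assumptions made for illustration, and the function names are hypothetical.

```python
def longest_run_below(values, limit):
    """Length of the longest run of consecutive samples strictly below the limit."""
    longest = current = 0
    for v in values:
        current = current + 1 if v < limit else 0
        longest = max(longest, current)
    return longest


def lack_of_movement(steps, heart_rate_per_min,
                     max_steps=500, hr_limit=100, min_minutes=800):
    """True if steps are below max_steps and heart rate stayed below hr_limit
    for more than min_minutes consecutive one-minute samples (assumed sampling)."""
    return (steps < max_steps
            and longest_run_below(heart_rate_per_min, hr_limit) > min_minutes)
```

Expressing the duration as a run length makes the rule sensitive to interruptions: a single sample at or above the limit resets the count, so brief bursts of activity prevent the situation from being flagged.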
Notably, valuable features can be extracted from changes in time series sensor data. For instance, Hussain and Park (2021) and Ivaşcu and Negru (2021) use the time-domain features of the ECG to calculate heart rate and heart rate variability. Additionally, the multi-agent system proposed by Akhtar et al. (2022) incorporates temporal logic, which allows for the formalization of temporal ordering operators such as “next,” “always,” “until,” and “while” without referencing actual times (Alagar & Periyasamy, 2011). Another interesting time-related aspect is trajectory, which combines both spatial and temporal properties to represent the mobility of a sensor. This is incorporated in the system proposed by Rhayem et al. (2021) to define a source and destination of a sensor within a particular duration of time.
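As an illustration of time-domain feature extraction, the sketch below derives heart rate and two widely used heart rate variability measures, SDNN and RMSSD, from a list of RR intervals (the time between successive heartbeats, in milliseconds). It assumes the RR intervals have already been extracted from the ECG; the choice of SDNN and RMSSD is ours for the example and is not necessarily the set of features used by Hussain and Park (2021) or Ivaşcu and Negru (2021).

```python
import math
import statistics


def heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000.0 / statistics.mean(rr_ms)


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (ms)."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For the RR sequence `[800, 810, 790, 800]`, the mean interval is 800 ms, which corresponds to a heart rate of 75 beats per minute.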
Identity
Identity, which pertains to the actors in a system, is another important aspect of context (Ye et al., 2007). This includes the definition of individuals and their properties, such as name, address, gender, and age. For health monitoring, this can include additional information such as weight, height, and blood group. This is the most ubiquitous aspect of context in the systems, with every system including personal information about the monitored individuals. Besides personal properties, identity also encompasses different user roles within the system. Nearly all of the systems support different users besides the individual being monitored, typically including clinicians and, in some cases, caregivers and family members, with the exception of 11 systems (Chiang & Liang, 2015; Fenza et al., 2012; Garcia-Valverde et al., 2014; Henaien et al., 2020; Khozouie et al., 2018; J. Kim et al., 2014; Martella et al., 2025; Minutolo et al., 2016; Vadillo et al., 2013; H. Q. Yu et al., 2017; Zhang et al., 2014). Identity also includes agents, which are used in the agent-based systems (Akhtar et al., 2022; Alti et al., 2022; Ammar et al., 2021; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Martella et al., 2025; Mavropoulos et al., 2021; Vadillo et al., 2013). Agents, that is, computer systems capable of acting autonomously to achieve some goal(s) (Wooldridge, 2013), have been applied extensively in the health domain (Isern & Moreno, 2016) as well as in sensor-based systems (Savaglio et al., 2020). The agent-based approach offers several advantages. For example, agents can be used as personal assistants to support humans in performing tasks and services (Montagna et al., 2020), such as the interactive virtual agent in the system proposed by Mavropoulos et al. (2021). Agent-based architectures and their advantages are discussed in greater detail in Section 7.
Activity
The fourth essential aspect of context is activity. This can refer to physical activity or the different activities of daily living, such as eating and sleeping, both of which are important considerations for situation analysis. Activity can be derived from sensors such as accelerometers, or can be deduced from location or time (e.g., a person in a bedroom in the middle of the night can be assumed to be sleeping). Physical activity is closely tied to health, and there are many physical activity guidelines issued by governments and global health organizations, including the World Health Organization (Bull et al., 2020). Due to this link between physical activity and health, 31 of the systems include physical activity as contextual information. Such systems monitor physical activity using smartphones, smart watches, or inertial measurement units, which combine accelerometers, gyroscopes, and in some cases, magnetometers (Ali et al., 2020, 2021; De Brouwer et al., 2022; El-Sappagh et al., 2019; Esposito et al., 2018; Garcia-Moreno et al., 2023; Garcia-Valverde et al., 2014; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Khozouie et al., 2018; Lopes de Souza et al., 2023; Mavropoulos et al., 2021; Minutolo et al., 2016; Zafeiropoulos et al., 2024). Chiang and Liang (2015) monitor body movement using motion sensors placed around the home. This serves two purposes. Firstly, the individual’s movement within the home can be monitored, which makes it possible to determine their location at any given time. Secondly, they are able to interact with the system using body movements, such as hand-waving to activate the system. Ali et al. (2018) similarly use motion sensors to keep track of body movement. They use range of motion as a metric, which is particularly important for elderly patients who may lose their ability to perform daily activities as their range of motion decreases. The systems proposed by Ivascu et al. (2015) and Zafeiropoulos et al. (2024) perform gait analysis, capturing features such as freezing of gait, postural instability, and rigidity.
Self-reported information can also be used to determine physical activity, but this may not be accurate. To mitigate this, Chatterjee et al. (2021) use a combination of sensor and questionnaire data. Sensors are used to monitor the number of steps and duration of activity, while questionnaires are used to determine the type of activity, for example, running or weightlifting. Beyond tracking physical activity, activity recognition is also important in health monitoring. It can help in the detection of adverse events such as falls, as is done in the systems proposed by Chiang and Liang (2015), Vadillo et al. (2013), and Zafeiropoulos et al. (2024). Additionally, seven systems are able to recognize daily activities such as sitting, walking, and sleeping (De Brouwer et al., 2022; Garcia-Moreno et al., 2023; Garcia-Valverde et al., 2014; Ivaşcu & Negru, 2021; Mavropoulos et al., 2021; Rhayem et al., 2021; Zafeiropoulos et al., 2024).
Other Types of Context
Besides location, time, identity, and activity, other types of contextual information are incorporated in the systems. Alti et al. (2022) include hardware and network information as part of context, such as available communication protocols, CPU speed, battery power, and memory size. This information is used to ensure the efficient deployment of health services. Similarly, Zeshan et al. (2023) use battery level and device response time to determine which caregiver or clinician’s device to send notifications to. Hristoskova et al. (2014) incorporate media devices and their properties in their interpretation of context. For example, the screen size of devices such as mobile phones and tablets is used to determine how to display the health monitoring results. For small screens, the results are summarized. An important factor in health monitoring is the state of a person’s environment. Fifteen of the systems use weather data such as temperature and humidity from ambient sensors to provide additional context (Akhtar et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; Elhadj et al., 2021; Fenza et al., 2012; Garcia-Moreno et al., 2023; Henaien et al., 2020; Khozouie et al., 2018; J. Kim et al., 2014; Kordestani et al., 2021; Martella et al., 2025; Rhayem et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Zhou et al., 2022). Weather data sources such as forecasts and indices are used by J. Kim et al. (2014) to supplement sensor data, while the other systems (Akhtar et al., 2022; Khozouie et al., 2018; Martella et al., 2025; Vadillo et al., 2013; Zhou et al., 2022) include sensors to monitor air quality by checking the levels of different gases and/or inhalable particulate matter in the air. Contextual information can also include details about an individual’s diet, medication, and emotional state. These details are collected in the system proposed by De Brouwer et al. (2022) through self-reporting via a mobile app. Finally, Martella et al. (2025) and Zafeiropoulos et al. (2024) capture information about the user’s fatigue level.
The Role of SWTs
Ontologies appear to be particularly useful for the representation of contextual information, and the majority of the selected systems reuse existing ontologies to do so. For instance, OWL-Time, an ontology that describes temporal properties of real-world objects such as sensors, is reused by a number of systems (De Brouwer et al., 2022; Mavropoulos et al., 2021; Rhayem et al., 2021; Titi et al., 2019; H. Q. Yu et al., 2017). Friend of a Friend (FOAF), an ontology that describes people profiles, is also widely reused among the systems (Ammar et al., 2021; Elhadj et al., 2021; Garcia-Moreno et al., 2023; Henaien et al., 2020; Mavropoulos et al., 2021; Reda et al., 2022; Spoladore et al., 2021; Titi et al., 2019; H. Q. Yu et al., 2017). Additionally, sensor ontologies, while not focused solely on contextual information, also include some aspects of context. For example, SAREF and SSN/SOSA ontologies include timestamps for sensor observations. Knowledge graphs can also be used to represent contextual information, as is done by Ammar et al. (2021) and G. Yu et al. (2022), who use a knowledge graph to capture patient profile data from health records. Linked data are also useful for context awareness. For instance, Ammar et al. (2021) use linked open data to access global knowledge from the web, including data relating to social determinants of health. Additionally, Martella et al. (2025) mention the possibility of using linked open data as a source of external information, although this is not implemented in the system.
Table 7 summarizes the contextual information included in the systems and indicates which types of contextual information are captured using SWTs.
Summary of Contextual Information Captured in the Systems and Represented Using Semantic Web Technologies.
Note. SWT = semantic web technologies; L = location; T = time; I = identity; A = activity; O = other.
Explainability
A critical consideration in health monitoring systems, particularly those that incorporate AI, is explainability. For the purposes of this study, we adopt the perspective that explainability is essentially equivalent to interpretability (Miller, 2019), which in turn can be defined as the degree to which a system’s operations can be understood by a human (Biran & Cotton, 2017). Explainability contributes significantly to the overall trustworthiness and adoption of systems in the health domain. Two complementary approaches to explainability in AI systems are prioritizing human understanding of generated outputs, and providing explicit explanations for those outputs (Miller, 2019). The former can be achieved by using techniques that are inherently intuitive and easily comprehensible (intrinsic explainability), or by explaining the workings of a model after it has been trained (post hoc explainability; Molnar, 2023). We examine both intrinsic and post hoc explainability below, followed by a consideration of explicit explanations and their quality.
Intrinsic Explainability
It can be argued that SWTs are inherently intuitive and can contribute to the development of explainable systems; we discuss this in Section 5.6.4. However, the use of these technologies does not guarantee the explainability of the system as a whole. Other technologies and techniques implemented within the systems, and the ways in which they are combined, can also play a role in either enhancing or hindering the overall explainability of the system. For example, rules are inherently easy to understand (Hagras, 2018), and nearly all the selected systems implement some form of rule-based reasoning using semantic web rule languages. However, in some cases, rules are implemented for one aspect of the system while less interpretable techniques are used for other components. An instance of this is the system proposed by Ali et al. (2020), where a deep learning model is used for disease prediction, while rule-based reasoning is applied for recommendation generation. This results in the decision support functionality being highly explainable, while the situation prediction component remains less so.
BNs can also be considered highly interpretable, as they can perform predictive and diagnostic reasoning (Derks & de Waal, 2020) in a way that can be visually interpreted due to their graphical structure (Kyrimi et al., 2021). Predictive reasoning is done by Mcheick et al. (2016), who use a BN to determine whether a person has a high risk of stroke based on risk factors. The reason for whether the risk is high or not can be traced back to the presence of risk factors. On the other hand, Kordestani et al. (2021) use a BN for diagnostic reasoning, allowing for the understanding of a kidney disease diagnosis based on its causes. Fuzzy logic represents another class of interpretable techniques, since it allows variables and their classifications to be presented in a way that is intuitive (Hagras, 2018). Fuzzy approaches are used in five of the selected systems; we discuss them in more detail in Section 5.7.1.
While ML is often criticized for its tendency to produce black box models, certain ML models are intrinsically interpretable, such as logistic or linear regression models and decision trees (Molnar, 2023; Petch et al., 2022). However, even such models can be rendered uninterpretable as their complexity and scale increase, as is the case with large decision trees (Petch et al., 2022). In cases where less interpretable ML models are used, they can be combined with SWTs to enhance explainability. This is closely related to neuro-symbolic AI, in which the strengths of neural networks and symbolic AI are combined to achieve the best of both worlds (Sarker et al., 2021). Zafeiropoulos et al. (2024) and Zhou et al. (2022) explore this hybrid approach, with both using a knowledge graph to provide training data for deep learning models. Additionally, Ali et al. (2021) use an ontology for feature extraction, providing some transparency into the selection of features for their BiLSTM model.
Post hoc Explainability
The use of inherently interpretable models should be prioritized in high-stakes domains (Rudin, 2019). Nonetheless, explaining the workings of a model after it has been trained is a well-established approach to explainability (Molnar, 2023). Two well-known post hoc explainability techniques are Local Interpretable Model-agnostic Explanations (LIME; Ribeiro et al., 2016) and SHapley Additive exPlanations (SHAP; Lundberg & Lee, 2017). LIME approximates a black box model with an interpretable surrogate model that is then used to explain the original model’s predictions. SHAP, by contrast, uses Shapley values from coalitional game theory to explain individual predictions by quantifying how much each feature contributes to the outcome. Beyond these methods, plotting techniques can also be used to visualize feature effects. They include partial dependence plots, which show the marginal effects of one or two features on predictions; accumulated local effects plots, which provide more accurate feature effects by accounting for feature correlations; and individual conditional expectation plots, which show how an individual prediction changes when a feature changes (Molnar, 2023). However, such post hoc explainability methods are not mentioned in any of the selected systems.
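To illustrate the game-theoretic idea underlying SHAP, the following Python sketch computes exact Shapley values for a single prediction of a small model. It is a didactic simplification rather than the SHAP library's algorithm: features absent from a coalition are replaced by a fixed baseline value (an assumption of this sketch), and the exhaustive enumeration of coalitions is feasible only for a handful of features. The function and parameter names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley value of each feature for model(instance).

    Features outside a coalition take their baseline value (sketch assumption).
    """
    n = len(instance)
    features = list(range(n))

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return model(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_model(x):
    """Hypothetical linear risk score used only to exercise the function."""
    return 2 * x[0] + 3 * x[1] - x[2]


contributions = shapley_values(toy_risk_model, [1, 2, 3], [0, 0, 0])
```

For a linear model with a zero baseline, each feature's Shapley value equals its coefficient times its value, and the contributions sum to the difference between the prediction and the baseline prediction. This additivity is what allows a prediction to be decomposed feature by feature.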
Explanation Quality
Once the reasoning behind the system’s outputs has been determined, the next step is to communicate this to the users of the system through explicit explanations. An explanation can be defined as the answer to a why-question (Miller, 2019).
The target audience is also an important consideration in the communication of explanations, since different audiences may value distinct aspects based on their role and perspective (Barredo Arrieta et al., 2020). Health monitoring systems should therefore tailor explanations to the needs of various audiences, such as monitored individuals, clinicians, and caregivers. For certain audiences, such as regulatory stakeholders, it may be more appropriate to include explanations as part of the overall system documentation. Only eight of the selected systems report or indicate that some form of explanation is made available to users (Akhtar et al., 2022; Chiang & Liang, 2015; Elhadj et al., 2021; Martella et al., 2025; Mavropoulos et al., 2021; Rhayem et al., 2021; Villarreal et al., 2014; G. Yu et al., 2022). This is typically through a user interface, messaging platform, or, as in the system proposed by Akhtar et al. (2022), as part of system logs. Chatbots and virtual agents, as implemented by Mavropoulos et al. (2021) and G. Yu et al. (2022), can also provide explanations, since they are designed to answer queries from users.
The Role of SWTs
Domain knowledge can enhance explainability (Tocchetti & Brambilla, 2022), and SWTs excel at structuring such shared knowledge formally (Confalonieri & Guizzardi, 2025). Ontologies can contribute to the development of explainable systems from three perspectives: by providing sound and explicit knowledge reference models; by supporting common-sense reasoning through the representation of context-aware semantic information; and by facilitating flexible knowledge abstraction and refinement (Confalonieri & Guizzardi, 2025). Since most of the systems use ontologies to represent expert knowledge and contextual information, this can be seen as a first step toward achieving explainability. Moreover, many ontology reasoners, such as HermiT (Shearer et al., 2008) and Pellet (Sirin et al., 2007), provide justifications for their inferences. These justifications can be generated into human-readable explanations, which in turn can be presented to system users (Van Woensel et al., 2024). Although 11 systems (Ali et al., 2018; Chatterjee et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Hooda & Rani, 2020; Hristoskova et al., 2014; Khozouie et al., 2018; Titi et al., 2019; Vadillo et al., 2013; Zafeiropoulos et al., 2024) mention using ontology reasoners, none of them indicate that the reasoner justifications are presented to users as explanations. Beyond reasoners, explanations themselves can be represented using ontologies. For instance, the Explanation Ontology (Chari et al., 2024) is a general-purpose ontology that connects explanations to underlying data and knowledge. Similarly, the Evidence and Conclusion Ontology (Nadendla et al., 2022) captures evidence to support annotations and assertions in the biomedical domain, which can be used to generate explanations. Yet, none of the reviewed systems formalize explanations in this way.
Knowledge graphs can also contribute to better understanding of system outputs in several ways, including providing a graph-based visualization of concepts, entity and relation extraction from unstructured data, enrichment of datasets, and inference and reasoning (Rajabi & Kafaie, 2022). Finally, although the connection between linked data and explainability is not direct, the open accessibility and interconnection of knowledge shared using a linked data approach can nonetheless contribute to explainability.
Uncertainty Handling
Given the uncertainty inherent in health decision-making as well as the high likelihood of ambiguity, noise, and missing values in sensor observations, health monitoring systems are greatly enhanced by being able to handle uncertainty. Despite this, only 22 of the systems addressed some aspect of this (Akhtar et al., 2022; Ali et al., 2018, 2020, 2021; Chiang & Liang, 2015; Esposito et al., 2018; Fenza et al., 2012; Garcia-Moreno et al., 2023; Garcia-Valverde et al., 2014; Hooda & Rani, 2020; Hristoskova et al., 2014; Hussain & Park, 2021; Kilintzis et al., 2019; Kordestani et al., 2021; Martella et al., 2025; Mcheick et al., 2016; Mezghani et al., 2015; Minutolo et al., 2016; Reda et al., 2022; Rhayem et al., 2021; Titi et al., 2019; Zhou et al., 2022). The approaches used to handle uncertainty are summarized in Table 8 and discussed in detail in the remainder of this subsection.
Approaches to Handle Uncertainty and the Systems That Use Them.
Important Aspects Related to the Seven Key Challenges.
Fuzzy logic is a widely used technique for representing ambiguity and vagueness in sensor data (Khaleghi et al., 2013). It is used by five of the systems, making it the most commonly implemented uncertainty handling approach among the systems, besides the preprocessing of sensor data. In fuzzy logic, the truth of a statement is not binary (i.e., either true or false), but can rather be represented in a range from false to true. Therefore, rather than having crisp thresholds for different categories, fuzzy logic allows for values with different degrees of membership for the different categories. The process of converting crisp inputs into fuzzy sets is called fuzzification. For example, heart rate is represented in beats per minute, which can be classified into crisp categories. Generally, a heart rate greater than 100 beats per minute can be categorized as “fast,” a heart rate between 60 and 100 beats per minute can be categorized as “normal,” and a heart rate below 60 beats per minute can be categorized as “slow” (Bennett, 2013). However, with fuzzy logic, any given heart rate value has a certain degree of membership to any of the categories. For instance, a heart rate of 80 beats per minute may have a high degree of membership to the “normal” category (e.g., 75%), a lower degree of membership to the “fast” category (e.g., 20%), and an even lower degree of membership to the “slow” category (e.g., 5%). Fuzzy logic provides a better approach to dealing with boundary conditions. For example, when the heart rate is either 100 or 101, it can be reflected as mostly normal and, to a lesser degree, fast. Both Ali et al. (2018) and Chiang and Liang (2015) fuzzify sensor data such as blood pressure and heart rate, as well as attributes such as age and weight. Similarly, Esposito et al. (2018) fuzzify the intensity of physical activity, which provides important context for heart rate thresholds. Fenza et al. (2012) incorporate fuzzy logic with rules to determine the degree of membership to different situation categories based on different combinations of vital signs, while Minutolo et al. (2016) use hybrid rules that incorporate both crisp and fuzzy variables. Fuzzy logic provides a simple but effective mechanism for representing imprecision and vagueness in sensor observations and allows this to be taken into account for more effective situation detection.
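As a rough illustration of fuzzification, the sketch below maps a heart rate reading to degrees of membership in the "slow," "normal," and "fast" categories using piecewise-linear membership functions. The breakpoints (45, 65, 95, and 115 beats per minute) are assumptions chosen only so that the categories overlap around the crisp thresholds of 60 and 100; they are not taken from any of the reviewed systems.

```python
def rising(x, a, b):
    """Membership that is 0 up to a, increases linearly, and is 1 from b onward."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def falling(x, c, d):
    """Membership that is 1 up to c, decreases linearly, and is 0 from d onward."""
    return 1.0 - rising(x, c, d)

def fuzzify_heart_rate(bpm):
    """Return degrees of membership for a heart rate (assumed breakpoints)."""
    return {
        "slow": falling(bpm, 45, 65),
        "normal": min(rising(bpm, 45, 65), falling(bpm, 95, 115)),
        "fast": rising(bpm, 95, 115),
    }

def crisp_category(bpm):
    """Crisp classification with hard thresholds, shown for comparison."""
    if bpm > 100:
        return "fast"
    if bpm < 60:
        return "slow"
    return "normal"

at_boundary = fuzzify_heart_rate(100)
just_above = fuzzify_heart_rate(101)
```

With these assumed breakpoints, readings of 100 and 101 beats per minute receive nearly identical memberships (mostly "normal" and partly "fast"), whereas the crisp classification switches abruptly from "normal" to "fast" between the two values.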
Bayesian Networks (BNs)
BNs are well known for modeling uncertainty and have been widely used in the health domain (Kyrimi et al., 2021). Kordestani et al. (2021) use a BN for probabilistic diagnosis of acute kidney injury. The BN models immediate (short-term) and background (long-term) causes of acute kidney injury, as well as its symptoms. They used experts to determine the conditional probabilities of the presence of acute kidney injury given these variables. Similarly, Mcheick et al. (2016) represent risk factors for stroke using a BN.
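The kind of probabilistic inference a BN supports can be sketched with a three-node chain (risk factor, condition, symptom) evaluated by brute-force enumeration. The structure and all probability values below are hypothetical and serve only to illustrate the mechanism; they are not the expert-elicited probabilities used by Kordestani et al. (2021) or Mcheick et al. (2016).

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative values only).
P_RISK = 0.3                         # P(risk factor present)
P_COND = {True: 0.2, False: 0.02}    # P(condition present | risk factor)
P_SYMP = {True: 0.9, False: 0.1}     # P(symptom present | condition)

def bernoulli(p_true, value):
    """Probability of a binary outcome given P(outcome is True)."""
    return p_true if value else 1.0 - p_true

def joint(risk, cond, symp):
    """Joint probability factorized along the chain risk -> cond -> symp."""
    return (bernoulli(P_RISK, risk)
            * bernoulli(P_COND[risk], cond)
            * bernoulli(P_SYMP[cond], symp))

def posterior_condition(evidence):
    """P(condition is True | evidence), by enumerating all assignments."""
    numerator = denominator = 0.0
    for risk, cond, symp in product([True, False], repeat=3):
        assignment = {"risk": risk, "cond": cond, "symp": symp}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(risk, cond, symp)
        denominator += p
        if cond:
            numerator += p
    return numerator / denominator

prior = posterior_condition({})
with_symptom = posterior_condition({"symp": True})
with_both = posterior_condition({"risk": True, "symp": True})
```

Each additional piece of evidence raises the probability of the condition in this toy network, which mirrors how a BN combines background causes and observed symptoms into a single probabilistic estimate.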
Nonmonotonic Reasoning
In monotonic reasoning, an earlier conclusion can be rejected only if the evidence for that conclusion is also rejected. In contrast, nonmonotonic reasoning allows an earlier conclusion to be rejected based on new evidence, even when the earlier evidence was valid (Nute, 2003). This ability to revise conclusions in the face of new evidence is useful in handling uncertainty. Defeasible logic is an example of nonmonotonic reasoning in which there are three kinds of rules: strict rules, which can never have exceptions; defeasible rules, which are typically true but can have exceptions; and undercutting defeaters, which are weak possibilities (Nute, 2003). Akhtar et al. (2022) use defeasible logic to handle inconsistencies in sensor data as well as patient information. Another type of nonmonotonic reasoning is answer set programming (ASP), which is used by Kordestani et al. (2021) to automatically customize treatments for each patient. They combine ASP with probability to reason with uncertain knowledge regarding treatment. Using probabilistic ASP rules, their proposed system obtains all possible treatment options for a medical episode and the associated probability of the episode occurring. If the probability of the episode occurring decreases with a particular treatment, then the treatment’s award value is increased. The treatment with the highest award value is ultimately selected by the system.
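The nonmonotonic behavior described here can be illustrated with a heavily simplified sketch in which a defeasible rule produces a conclusion that is withdrawn once new evidence triggers a defeater. The rule names, facts, and conclusions are hypothetical, and the evaluation procedure omits much of defeasible logic proper (for example, superiority relations and its proof theory); it is not the approach implemented by Akhtar et al. (2022).

```python
# Each rule: (name, kind, premises, conclusion). All contents are hypothetical.
RULES = [
    ("r1", "strict", {"sensor_disconnected"}, "reading_invalid"),
    ("r2", "defeasible", {"heart_rate_above_100"}, "raise_tachycardia_alert"),
    ("r3", "defeater", {"heart_rate_above_100", "exercising"}, "raise_tachycardia_alert"),
]

def conclusions(facts):
    """Derive conclusions from a set of facts under the simplified semantics."""
    derived = set()
    # Strict rules always fire when their premises hold.
    for _, kind, premises, conclusion in RULES:
        if kind == "strict" and premises <= facts:
            derived.add(conclusion)
    # Defeaters block a conclusion without asserting its opposite.
    blocked = {c for _, kind, premises, c in RULES
               if kind == "defeater" and premises <= facts}
    # Defeasible rules fire unless their conclusion has been blocked.
    for _, kind, premises, conclusion in RULES:
        if kind == "defeasible" and premises <= facts and conclusion not in blocked:
            derived.add(conclusion)
    return derived

before = conclusions({"heart_rate_above_100"})
after = conclusions({"heart_rate_above_100", "exercising"})
```

Adding the fact `exercising` removes a conclusion that was previously derived, even though the earlier evidence remains valid, which is exactly what a monotonic reasoner could not do.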
Probabilistic Risk Classification
Hristoskova et al. (2014) account for uncertainty in situation prediction by classifying patients into risk stages based on the four-year probability of congestive heart failure. They use probabilistic rules defined using SWRL to analyze risk factors and classify individuals based on the Framingham Risk Score.
Preprocessing Sensor Data
Uncertainty can stem from various factors in sensor data, including ambiguous or imprecise readings, noise, or missing values caused by sensor malfunctions or network failures (Gravina et al., 2017; Khaleghi et al., 2013). Sixteen systems have addressed the issue of missing or invalid values in sensor data. Ali et al. (2020, 2021) replace them with mean and median values from existing data, while Hooda and Rani (2020) replace them using the preceding value. Rhayem et al. (2021) take the approach of removing any missing or unusual values, for example, those outside the device measurement ranges. Similarly, Kilintzis et al. (2019), Martella et al. (2025), Titi et al. (2019), and Reda et al. (2022) use rules to check whether sensor data falls within the expected minimum and maximum bounds. In their proposed system, Hussain and Park (2021) use the Pan-Tompkins algorithm to detect the QRS complex in the ECG. This identifies beats without a QRS complex, which may be premature, missing, or ectopic, and are subsequently eliminated. While Garcia-Moreno et al. (2023) mention missing value imputation as part of their ML pipeline, they do not specify a technique for this. To deal with noisy data, three systems use filters to improve signal quality. Ali et al. (2020, 2021) use a Kalman filter to remove noise, while Garcia-Valverde et al. (2014) use a moving average filter for the same purpose.
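The preprocessing strategies described above (range checks, replacement with the mean, median, or preceding value, and smoothing with a moving average filter) can be sketched in a few lines. The sample readings, the plausibility bounds of 30 to 220 beats per minute, and the window size are assumptions made for illustration and do not reproduce the configuration of any reviewed system.

```python
from statistics import mean, median

def mask_out_of_range(values, low, high):
    """Mark readings that are missing or outside [low, high] as None."""
    return [v if v is not None and low <= v <= high else None for v in values]

def impute_statistic(values, statistic=mean):
    """Replace missing readings with a statistic of the observed readings."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]

def impute_previous(values):
    """Replace missing readings with the preceding valid reading.

    A missing reading at the very start stays None, as nothing precedes it.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def moving_average(values, window=3):
    """Smooth a complete series with a trailing moving average."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical heart rate samples (bpm) with one gap and one implausible spike.
readings = [72, 74, None, 300, 76, 78]
valid = mask_out_of_range(readings, 30, 220)
filled_mean = impute_statistic(valid, mean)
filled_median = impute_statistic(valid, median)
filled_previous = impute_previous(valid)
smoothed = moving_average(filled_previous, window=3)
```

The choice between these strategies involves a trade-off: discarding or masking values avoids introducing artificial data, whereas imputation keeps the series complete for downstream rules or models at the cost of adding estimated values.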
The Role of SWTs
As discussed in the previous challenges, SWTs provide limited inherent support for uncertainty handling, but this can be mitigated through the techniques discussed in this section. Several extensions for semantic web standards have been proposed in the literature that make use of fuzzy logic and Bayesian inference, such as BayesOWL (Ding et al., 2006) and Bayes-SWRL (Liu et al., 2013), probabilistic extensions for OWL and SWRL, respectively, as well as fuzzyDL (Bobillo & Straccia, 2016), a fuzzy ontology reasoner. However, none of the selected systems reports using any such extensions, opting instead to define custom solutions. Another way that SWTs can support uncertainty handling is through using ontology reasoners for inconsistency detection. Missing values or otherwise invalid data can be detected through ontology reasoners based on specified and inferred axioms; the use of these tools is discussed in greater detail in Section 6. Additionally, rule-based reasoning can be combined with ontologies to detect invalid data, an approach used by five of the selected systems (Kilintzis et al., 2019; Martella et al., 2025; Reda et al., 2022; Rhayem et al., 2021; Titi et al., 2019). Finally, SWTs can indirectly support uncertainty handling by representing data quality properties that affect uncertainty. For instance, Garcia-Moreno et al. (2023) model quality properties of sensor data using an ontology, including correctness and completeness.
Other Challenges
While we consider the seven challenges discussed above to be particularly salient in sensor-based personal health monitoring systems, we acknowledge that there are other factors that such systems must take into account. This subsection briefly discusses a few of them. As an in-depth analysis of these additional challenges is outside the scope of this study, we also include references to relevant articles that interested readers can consult.
Security and Privacy
As health-related information is highly sensitive, security and privacy are important to consider. Particular aspects of this include security of data storage, network and transmission security, user authentication and access control, consent management, and the use of privacy-preserving techniques such as federated learning. For insights on security and privacy, we direct interested readers to the following articles: Rasool et al. (2022) review security and privacy in the context of the Internet of Medical Things; Thapa and Camtepe (2021) explore security and privacy challenges and techniques for health data in general; and finally, Kirrane et al. (2018) provide an overview of security and privacy issues that relate to SWTs.
Usability
Usability is another factor that health monitoring systems should consider, and can broadly be defined as the ease of use of a system. It is a multi-faceted concept with several contributing factors, including understandability (which is closely related to explainability), attractiveness, and overall user satisfaction (Saeed et al., 2020). Interested readers can refer to the following articles for more information: Saeed et al. (2020) explore pertinent usability issues in health monitoring systems and identify possible solutions. With regard to the evaluation of usability, Maramba et al. (2019) identify current methods used in usability testing in health monitoring applications, while Cho et al. (2018) present a usability evaluation framework for mobile health applications. Finally, considering the usability of the sensors themselves, the reviews by Cusack et al. (2024), Dias and Cunha (2018), and Andreu-Perez et al. (2015) highlight the types and characteristics of wearable sensors available for health monitoring.
Scalability
Scalability generally refers to the ability of a system to handle increased workload (Weinstock & Goodenough, 2006). In the context of sensor-based health monitoring, this workload could arise from an increased number of sensors, other data sources, users, and services provided by the system. For further reading on scalability in IoT applications, readers can consult the review by Fortino et al. (2021). The reviews by Albahri et al. (2018) and Philip et al. (2021) discuss scalability in the context of IoT and healthcare.
Ethics and Regulatory Compliance
The high-stakes nature of the health domain necessitates careful consideration of ethical issues. Although there are many benefits of technology-enabled personal health monitoring, there are also potential harms that it exposes. Many of these ethical issues overlap with the challenges already discussed, such as situation detection and situation prediction (how accurate and reliable are the detected and predicted situations?), decision support (how appropriate are the suggested recommendations and how much autonomy do system users have?), explainability (to what extent can system outputs be understood?), and security and privacy (how secure is user data and how is consent managed?). An additional ethical concern is the cascade of care, a phenomenon in which incidental findings from screenings or monitoring result in further clinical care. This may cause undue anxiety and psychological harm to the monitored individual, while also potentially leading to costly or invasive diagnostic procedures. Some of these ethical considerations can be enforced through regulation. Readers seeking further exploration on this challenge may consult the following articles: Morley et al. (2020) comprehensively map the ethics of AI in healthcare; Kwan et al. (2017) highlight the ethical issues associated with health monitoring applications; Nittari et al. (2020) review ethical and legal challenges in telemedicine more broadly; Hassanaly and Dufour (2021) explore the regulation of mobile health applications in the United States, the European Union, and France; and Thapa and Camtepe (2021) explore legality and regulatory compliance from the perspective of security and privacy.
Challenges Assessment
In this subsection, we conduct an assessment of each system based on the seven key challenges. Table 9 summarizes the different aspects related to the challenges. While we consider these aspects to be highly important for achieving effective sensor-based personal health monitoring, it is possible that some of them may exceed the requirements for specific health conditions or applications, and may therefore not be essential in those particular cases. To assess the degree to which each system tackles the seven challenges, we use a rating scheme based on the identified aspects. A score of 1 point is assigned to each system for every aspect that is addressed. All challenges are equally weighted. We use a four-point rating scale as follows: ✗: None of the aspects are addressed by the system.
Table 10 shows the number of systems with a particular rating for each challenge, while Figure 4 presents a radar chart summarizing the ratings of all 48 systems across the seven challenges. The chart provides a high-level, bird’s-eye view of how well current systems address each challenge, making it easy to identify areas that are generally well covered (e.g., context awareness) as well as those that remain comparatively under-addressed (e.g., uncertainty handling). For more fine-grained analysis, separate radar charts for each individual system are also provided, 32 and the individual system ratings are shown in Table 18 in the Appendix.

Combined radar chart providing a high-level visualization of how well the current state of the art collectively addresses each challenge.
Counts of Number of Systems With Each Rating Across the Seven Challenges.
This section has provided an in-depth analysis of seven key challenges that must be addressed in health monitoring systems, as well as a brief discussion of additional challenges that are important in such systems. The role played by SWTs in overcoming the seven key challenges has been critically examined. Additionally, non-semantic techniques that are incorporated in the systems have also been discussed. A full list of reused semantic resources, systems that reuse them, and the challenges they address can be found in Table 17 in the Appendix. These resources include ontologies, knowledge graphs, and linked data, as well as vocabularies, taxonomies, and classifications. To summarize the section, we discuss the results of the assessment, highlighting the challenges that are most neglected among the systems.
Based on the challenges assessment described in Section 5.9, it is evident that more work is needed to address situation prediction, uncertainty handling, explainability, and to a lesser extent, interoperability and decision support. Situation prediction stands out as the most neglected challenge, with 36 of the 48 systems failing to address it altogether. This is closely followed by uncertainty handling, which 26 systems do not address at all. Although all of the selected systems have some degree of explainability through their use of inherently interpretable SWTs, 40 of them have a low rating for this challenge, with the remaining eight having a medium rating. Notably, none of the selected systems achieve a high rating for situation prediction, uncertainty handling, or explainability. Interoperability is not very well addressed among the systems, with half of them (24) scoring a low rating for this challenge, although two systems achieve a high score. Similarly, only three systems achieve a high score for decision support. There is a nearly even split between low and medium ratings, with 22 and 20 systems, respectively, while three systems do not provide decision support at all. Context awareness is the best addressed challenge among the systems, with 22 achieving a high score, 23 achieving a medium score, and the remaining three attaining a low score. Situation detection is also fairly well addressed, with nine high scores, 24 medium scores, and 15 low scores.
System Quality
In this section, we examine the quality of the selected systems as reported in the respective research articles. We consider four main criteria: the data sources and devices used to collect the data; the development methodologies and tools used; the evaluation approaches and rigor; and finally, the accessibility of research outputs. These factors all contribute to the credibility, reliability, and reproducibility of the reported systems. Additionally, this assessment can also be used to inform benchmarking for future research and development of such systems. We begin with a discussion of the methods of data collection, sources of data, and sensors reported in the systems. We then review the methodologies and tools used for the development of the different components of the system, including programming languages, libraries, frameworks, and other software. Next, we examine the evaluation approaches used to evaluate the system components and the systems as a whole. The last criterion we discuss is the accessibility of the resources and outputs of each system, including ontologies, data, code, and even user interfaces. We conclude by outlining the different aspects related to each criterion and then scoring the selected systems based on these aspects.
Data and Devices
The data collection methodology varies among the systems. Fourteen systems used existing datasets from publicly available repositories such as PhysioNet 33 and the University of California, Irvine (UCI) ML repository. 34 These systems and the datasets they use are summarized in Table 11.
Existing Health Datasets Used.
Ten systems (Ali et al., 2018; De Brouwer et al., 2022; Esposito et al., 2018; Garcia-Moreno et al., 2023; Hristoskova et al., 2014; Hussain & Park, 2021; Kilintzis et al., 2019; Stavropoulos et al., 2021; Vadillo et al., 2013; Villarreal et al., 2014) used data collected from participants rather than existing data. For example, Ali et al. (2018) collected data from 44 diabetes patients, while Esposito et al. (2018) collected data from 10 healthy volunteers. Another approach was to simulate or manually generate the data. This was done by Alti et al. (2022), who simulated temperature and camera data; Bampi et al. (2025), who simulated time-varying sensor data; Chatterjee et al. (2021), who simulated the sensor, interview, and questionnaire data of four dummy participants; Martella et al. (2025), who used a script to simulate sensor-generated data streams; and Zafeiropoulos et al. (2024), who simulated the sensor observations and health records of three virtual patients. Mcheick et al. (2016) similarly generated 513 data records. Additionally, Stavropoulos et al. (2021) simulated records in order to test the scalability of their proposed system. Eighteen systems (Akhtar et al., 2022; Chiang & Liang, 2015; Elhadj et al., 2021; El-Sappagh et al., 2019; Fenza et al., 2012; Ivascu et al., 2015; Khozouie et al., 2018; Lopes de Souza et al., 2023; Mezghani et al., 2015; Minutolo et al., 2016; Reda et al., 2022; Spoladore et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Xu et al., 2017; H. Q. Yu et al., 2017; Zhang et al., 2014; Zhou et al., 2022) indicated the types of data and sensors supported by the systems, but did not mention the source of the data. It is unclear whether these systems were validated using actual sensor data, beyond a theoretical validation of the system functionality. Only 19 of the systems gave specific details of the devices used for data collection. Table 12 indicates these systems and the types and descriptions of devices mentioned. While most of these are commercially available devices, the systems proposed by J. Kim et al. (2014) and Lopes de Souza et al. (2023) used custom-made prototypes.
Devices Mentioned for Data Collection in the Selected Systems.
Development Methodologies
The use of a development methodology can streamline the process of developing SWTs. In particular, the literature on ontology development methodologies is quite rich, with a large number of established methodologies proposed (Fernández-López & Gómez-Pérez, 2002; Iqbal et al., 2013). There have also been several proposed approaches toward developing knowledge graphs (Cimiano & Paulheim, 2017) and ensuring the quality of linked data (Debattista et al., 2016; Kontokostas et al., 2014). Despite this, only five systems specified an existing methodology for SWTs, with all five methodologies being ontology-focused. Hadjadj and Halimi (2021) used the NeOn framework (Suárez-Figueroa et al., 2015), a scenario-based methodology for building ontologies, while Zafeiropoulos et al. (2024) used the Human-Centered Ontology engineering MEthodology (HCOME; Kotis & Vouros, 2006). Titi et al. (2019) used an existing case-based ontology engineering methodology (El-Sappagh et al., 2014). Similarly, Kilintzis et al. (2019) followed four of the ontology construction steps recommended by Spear (2006). Although not a development methodology, Peral et al. (2018) used the SemanTic Refinement of Ontology MAppings (STROMA; Arnold & Rahm, 2014) approach for aligning corresponding concepts between different ontologies.
Additionally, two systems reported the use of existing methodologies for overall system development. Garcia-Moreno et al. (2023) used the design science research methodology (Hevner & Chatterjee, 2010), while Vadillo et al. (2013) used CommonKADS (Knowledge Acquisition and Development Systems; Kingston, 1998), a knowledge-based system design methodology, and its extension for multi-agent systems, MAS-CommonKADS (Iglesias et al., 1998). Overall, only 16 of the systems reported or described the use of a systematic methodology, whether existing or novel, in the development of the SWTs, other system components, or the system as a whole.
Development Tools
Various programming languages, frameworks, and libraries were used to develop the systems. When it comes to rule languages, SWRL is the most commonly used, with 24 systems either explicitly mentioning it or demonstrating SWRL syntax in code snippets (Akhtar et al., 2022; Ali et al., 2020, 2018; Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Hadjadj & Halimi, 2021; Henaien et al., 2020; Hooda & Rani, 2020; Hristoskova et al., 2014; Lopes de Souza et al., 2023; Mezghani et al., 2015; Minutolo et al., 2016; Reda et al., 2022; Rhayem et al., 2021; Spoladore et al., 2021; Titi et al., 2019; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014). Apache Jena includes a general purpose rule-based reasoner which is used by three systems (Chiang & Liang, 2015; Garcia-Valverde et al., 2014; J. Kim et al., 2014). Other semantic web-based rule languages include SHACL, used by Stavropoulos et al. (2021) and SPIN, used by Kilintzis et al. (2019). Although Mavropoulos et al. (2021) mention using OWL to create rules, this approach is inherently limited for situation analysis and decision support because OWL lacks the expressivity for if–then rules. The authors indicate that they will implement more complex reasoning rules in future work.
Links to System Outputs as Shared by Researchers.
Programming languages can also be used to configure rules, as is done by Khozouie et al. (2018) using Java. Similarly, Fenza et al. (2012) use MATLAB and Fuzzy Control Language to define their fuzzy rules, while Kordestani et al. (2021) use Drools, 35 a business rule management system. Therefore, among the 41 systems that implement rules, 25 systems use semantic web-based rule languages, while three use non-semantic web languages. Eight systems do not mention or indicate a specific formal language in which rules are defined (Ali et al., 2021; Ammar et al., 2021; Hussain & Park, 2021; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Martella et al., 2025; Peral et al., 2018; G. Yu et al., 2022). For queries, 26 of the systems use SPARQL, with five also using Apache Jena Fuseki, a SPARQL server, to publish their SPARQL endpoints.
Among the systems that incorporate ontologies, Protégé 36 is most commonly cited as the ontology development platform of choice, used in 27 of the systems (Akhtar et al., 2022; Ali et al., 2018, 2020, 2021; Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; De Brouwer et al., 2022; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Hadjadj & Halimi, 2021; Henaien et al., 2020; Hooda & Rani, 2020; Hussain & Park, 2021; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Khozouie et al., 2018; J. Kim et al., 2014; Martella et al., 2025; Spoladore et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014). Protégé is an ontology editor that supports the latest OWL and RDF specifications. Another commonly used platform is Apache Jena, 37 a Java framework for building semantic web and linked data applications, used in 18 of the systems (Ali et al., 2018; Alti et al., 2022; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; De Brouwer et al., 2022; Elhadj et al., 2021; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Hooda & Rani, 2020; Ivascu et al., 2015; Ivaşcu & Negru, 2021; J. Kim et al., 2014; Rhayem et al., 2021; Titi et al., 2019; Vadillo et al., 2013; H. Q. Yu et al., 2017; Zeshan et al., 2023). Twenty-seven systems report using OWL (Ali et al., 2018, 2020, 2021; Ammar et al., 2021; Chatterjee et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Fenza et al., 2012; Hadjadj & Halimi, 2021; Hooda & Rani, 2020; Hristoskova et al., 2014; Khozouie et al., 2018; Kilintzis et al., 2019; J. Kim et al., 2014; Mavropoulos et al., 2021; Minutolo et al., 2016; Reda et al., 2022; Rhayem et al., 2021; Spoladore et al., 2021; Stavropoulos et al., 2021; Titi et al., 2019; Vadillo et al., 2013; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014), while 16 report using RDF (Ammar et al., 2021; Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; De Brouwer et al., 2022; Hadjadj & Halimi, 2021; Hooda & Rani, 2020; Hussain & Park, 2021; Kilintzis et al., 2019; Lopes de Souza et al., 2023; Mezghani et al., 2015; Reda et al., 2022; Titi et al., 2019; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024; Zhang et al., 2014). Ammar et al. (2021) use Solid (Social Linked Data), 38 a platform for developing decentralized linked data applications with the goal of enhanced privacy and data ownership. They also mention the JavaScript-based LDflex 39 and the Python-based RDFlib 40 for linked data manipulation and querying, the latter of which is also used by Zafeiropoulos et al. (2024). These platforms and libraries are all free and open source.
For storage, Mavropoulos et al. (2021) and Stavropoulos et al. (2021) use GraphDB, while Spoladore et al. (2021) use Stardog. 41 Both of these are enterprise semantic databases. Non-semantic database management systems are also used by 11 of the systems, with SQL-based systems being more popular than NoSQL alternatives. Four systems use MySQL 42 (Alti et al., 2022; Chiang & Liang, 2015; Titi et al., 2019; Villarreal et al., 2014), three use SQLite 43 (El-Sappagh et al., 2019; Esposito et al., 2018; Lopes de Souza et al., 2023), and one uses PostgreSQL 44 (Martella et al., 2025), while three systems use the NoSQL database system MongoDB 45 (De Brouwer et al., 2022; Elhadj et al., 2021; Martella et al., 2025). A summary of all the development tools used in the selected systems is included in Table 19 in the Appendix.
A variety of evaluation approaches are used by the selected systems. The most common approach is case-based evaluation. Twenty-one systems were evaluated through use case scenarios, which generally describe the sequence of events when a user interacts with the system (Ammar et al., 2021; Bampi et al., 2025; Chiang & Liang, 2015; De Brouwer et al., 2022; Elhadj et al., 2021; El-Sappagh et al., 2019; Garcia-Valverde et al., 2014; Hadjadj & Halimi, 2021; Ivascu et al., 2015; Khozouie et al., 2018; Kordestani et al., 2021; Mcheick et al., 2016; Mezghani et al., 2015; Spoladore et al., 2021; Stavropoulos et al., 2021; Vadillo et al., 2013; G. Yu et al., 2022; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014; Zhou et al., 2022). Ten systems were evaluated using case studies, which are similar to use case scenarios but are more extensive and detailed (Akhtar et al., 2022; Alti et al., 2022; Esposito et al., 2018; Fenza et al., 2012; Henaien et al., 2020; Martella et al., 2025; Minutolo et al., 2016; Peral et al., 2018; Villarreal et al., 2014; Xu et al., 2017). Twelve systems were evaluated by running user studies with real users (Ali et al., 2018; De Brouwer et al., 2022; Esposito et al., 2018; Garcia-Moreno et al., 2023; Hristoskova et al., 2014; Hussain & Park, 2021; Kilintzis et al., 2019; J. Kim et al., 2014; Mavropoulos et al., 2021; Stavropoulos et al., 2021; Villarreal et al., 2014; G. Yu et al., 2022). Among these systems, three used Likert scales to measure user feedback (J. Kim et al., 2014; Mavropoulos et al., 2021; Stavropoulos et al., 2021). Additionally, 16 systems either compared their systems with existing ones, showing how their results performed against the state of the art using a set of criteria, or had their systems evaluated by experts (Ali et al., 2018, 2020, 2021; Ammar et al., 2021; Elhadj et al., 2021; El-Sappagh et al., 2019; Fenza et al., 2012; Hadjadj & Halimi, 2021; Hussain & Park, 2021; J. Kim et al., 2014; Kordestani et al., 2021; Mavropoulos et al., 2021; Rhayem et al., 2021; Villarreal et al., 2014; Xu et al., 2017; G. Yu et al., 2022).
Twenty-one systems were evaluated based on non-functional requirements. For example, Esposito et al. (2018) and Vadillo et al. (2013) used the architecture-level modifiability analysis (ALMA) method to evaluate the potential costs associated with modifying their systems, such as by adding more sensors. Similarly, Alti et al. (2022) evaluated their system based on execution time, optimality, application’s lifetime, and number of discovered services. Domain-specific quality metrics were also considered. G. Yu et al. (2022) evaluated their system using the chronic care model (CCM), an established framework for chronic care management that includes criteria such as system design, self-management support, and decision support.
Additionally, five systems used simulation as a means to investigate the system functionality. For example, Akhtar et al. (2022) used Netlogo, a multi-agent modeling platform, to simulate the use of their system. Chiang and Liang (2015) used a fuzzy logic simulation tool to validate their fuzzy inference module. Ivaşcu and Negru (2021) simulated the system functionality by using each subject in the dataset as the target user, while Martella et al. (2025) used a testbed environment to simulate user scenarios and test how well the system responds to varying workloads. Finally, Reda et al. (2022) used a web portal with sample data for testing purposes. Expert validation was also used to evaluate the systems, with the aim of ensuring maximum similarity between the system output and expert opinion. This approach was taken by Ali et al. (2020), El-Sappagh et al. (2019), Hadjadj and Halimi (2021), Hristoskova et al. (2014), and Khozouie et al. (2018). Additionally, 10 systems used query-based validation, where the system is validated by checking the answers to SPARQL queries (Ali et al., 2018; Chatterjee et al., 2021; De Brouwer et al., 2022; El-Sappagh et al., 2019; Hadjadj & Halimi, 2021; Kilintzis et al., 2019; J. Kim et al., 2014; Titi et al., 2019; H. Q. Yu et al., 2017; Zafeiropoulos et al., 2024).
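To make the idea of query-based validation concrete, the following is a minimal sketch in plain Python. It does not use SPARQL or any triple store: a single triple-pattern matcher stands in for a SPARQL `SELECT` query, and the graph contents, property names, and the helper names `match_pattern` and `validate` are all hypothetical. The point is only the validation step itself, in which the answers returned by a query are compared with the answers that domain knowledge says should be returned.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def match_pattern(graph: Iterable[Triple], pattern: Triple) -> List[Dict[str, str]]:
    """Match one triple pattern against a graph.

    Terms starting with '?' are variables; all other terms must match exactly.
    This is a stand-in for a single-pattern SPARQL SELECT, not a SPARQL engine.
    """
    bindings = []
    for triple in graph:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            bindings.append(binding)
    return bindings

def validate(graph: Iterable[Triple], pattern: Triple,
             variable: str, expected: Set[str]) -> bool:
    """Return True if the query answers equal the expected answers."""
    answers = {b[variable] for b in match_pattern(graph, pattern)}
    return answers == expected

# Hypothetical knowledge graph: three observations with inferred situations.
GRAPH: List[Triple] = [
    ("obs1", "hasSituation", "Tachycardia"),
    ("obs2", "hasSituation", "Normal"),
    ("obs3", "hasSituation", "Tachycardia"),
]

# Competency question: "Which observations indicate tachycardia?"
QUESTION: Triple = ("?obs", "hasSituation", "Tachycardia")
```

Calling `validate(GRAPH, QUESTION, "?obs", {"obs1", "obs3"})` checks that the query returns exactly the two observations an expert would expect. In a real system the same comparison would be made against the result set of a SPARQL query.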
In addition to the overall system, the system components were also evaluated. Inconsistencies in ontologies can be detected using ontology reasoners, which check whether there are contradictions in class hierarchies or class instances (Ye et al., 2011). Reasoners were used in 11 of the systems to evaluate the structural consistency of ontologies. Ten systems used Pellet (Ali et al., 2018; Elhadj et al., 2021; El-Sappagh et al., 2019; Esposito et al., 2018; Hooda & Rani, 2020; Hristoskova et al., 2014; Khozouie et al., 2018; Titi et al., 2019; Vadillo et al., 2013; Zafeiropoulos et al., 2024), one system used HermiT (Chatterjee et al., 2021), and one system reported using both Pellet and HermiT (El-Sappagh et al., 2019). Additionally, three systems used ontology evaluation frameworks, namely OntOlogy Pitfall Scanner! (Poveda-Villalón et al., 2014), which was used by El-Sappagh et al. (2019) and Zafeiropoulos et al. (2024), and OQuaRE (Duque-Ramos et al., 2011), which was used by Rhayem et al. (2021). Two systems also evaluated the effect of different components within the same system through ablation studies. Ali et al. (2021) tested the performance of their BiLSTM model for classifying healthcare data while using an ontology and without using an ontology. The results showed an increase in the accuracy of the model when combined with an ontology. Similarly, Ali et al. (2020) compared the performance of their proposed ensemble deep learning model with and without feature selection. Finally, 13 systems, mainly those that implement ML, used well-known metrics such as accuracy, precision, recall, F-score, and mean square error (Ali et al., 2018, 2020, 2021; Alti et al., 2022; Garcia-Moreno et al., 2023; Garcia-Valverde et al., 2014; Hussain & Park, 2021; Ivaşcu & Negru, 2021; J. Kim et al., 2014; Mavropoulos et al., 2021; Rhayem et al., 2021; G. Yu et al., 2022; Zafeiropoulos et al., 2024; Zeshan et al., 2023).
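For readers less familiar with the metrics named above, the sketch below shows how accuracy, precision, recall, F-score (here the balanced F1 variant), and mean square error are commonly computed. It is an illustrative, dependency-free implementation and not the code used by any of the reviewed systems; the function names and the binary-label convention (1 as the positive class) are assumptions.

```python
from typing import Dict, Sequence

def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences, used for regression-style outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Reporting several of these metrics together matters in health monitoring because accuracy alone can be misleading when adverse events are rare: a classifier that never raises an alert may still score a high accuracy while having zero recall.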
Accessibility of Research Outputs
The sharing of research outputs, such as code, ontologies, knowledge graphs, and data, is a critical aspect of ensuring research is reproducible and verifiable. These resources can also be built upon by other researchers, contributing to their reuse for more efficient system development. This is severely neglected among the selected systems, with only six articles including links to their research outputs. Among them are Chatterjee et al. (2021), who include their OWL ontology, simulated data, propositional variables, rule base, and queries as multimedia appendices. Using platforms such as GitHub rather than static files has the advantage of version control, allowing researchers to manage future updates and revisions. Three systems take this approach. The system proposed by Bampi et al. (2025) is available on GitHub, including the backend, frontend, ontology, rules, and queries. Similarly, Zafeiropoulos et al. (2024) make their proposed ontology, queries, rules, code, and even research papers available via a GitHub repository. De Brouwer et al. (2022) include a link to a GitHub repository associated with the Data Analytics for Health and Connected Care ontology, which their proposed ontology extends and which was developed by their research group. However, the new mBrain ontology reported in the article is not made available. In contrast, the ontology proposed by El-Sappagh et al. (2019) has been published on Bioportal, a popular repository of biomedical ontologies. However, no other research outputs, such as rules and queries, are made available.
An important consideration when sharing system resources is ensuring their long-term accessibility. For instance, although Kilintzis et al. (2019) and Reda et al. (2022) provide links to their systems, the web pages were unavailable at the time of writing this article. Kilintzis et al. (2019) stated that the link made their ontology available either for reuse or review purposes, while the link from Reda et al. (2022) was to a portal with a video tutorial and sample datasets for testing purposes. As these resources are now inaccessible, we are unable to ascertain if they were ever operational and for how long they may have been active. Additionally, while GitHub repositories allow for easy accessibility and future updates, they can also be deleted or made private. A good alternative is Zenodo, a general-purpose research repository designed for long-term preservation by ensuring that published records can be updated but not unpublished. Further, Zenodo provides persistent digital object identifiers for each upload, ensuring easy referencing.
Alti et al. (2022) and Garcia-Moreno et al. (2023) include a note that the data associated with their studies are available upon request. However, this is a suboptimal approach as it is impossible to guarantee the authors’ ability or willingness to consistently respond to such requests over time, potentially leading to prolonged delays or even a lack of response. Publishers can mitigate this through well-defined data availability policies. Finally, we note that researchers may be restricted from sharing participant data due to privacy concerns. A potential solution would be to seek participants’ consent to share their data in anonymized and non-identifiable form. Table 13 indicates the links shared by researchers.
Important Aspects Related to the Quality Evaluation Criteria.
Mirroring our assessment of how well the systems tackle the key challenges in Section 5.9, we have also evaluated the quality of the systems as reported in the corresponding research articles. We base our evaluation on the aspects of the quality criteria which are summarized in Table 14, and use the same four-point rating scale (✗, Low, Medium, and High) determined by the percentage of aspects that each system has met. The quality ratings for each system are shown in Table 20 in the Appendix.
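The mapping from the share of aspects met to a four-point rating can be sketched as follows. The article states only that the rating is determined by the percentage of aspects met; the cut-offs used here (one third and two thirds) and the names `rate` and `tally` are assumptions chosen for illustration and are not the authors' actual thresholds.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable

# Hypothetical cut-offs: the source does not state where the boundaries lie.
LOW_MAX = Fraction(1, 3)
MEDIUM_MAX = Fraction(2, 3)

def rate(aspects_met: int, aspects_total: int) -> str:
    """Map the share of aspects met to the scale (✗, Low, Medium, High)."""
    if aspects_total <= 0:
        raise ValueError("aspects_total must be positive")
    if not 0 <= aspects_met <= aspects_total:
        raise ValueError("aspects_met must lie between 0 and aspects_total")
    # Fractions avoid floating-point surprises exactly at the boundaries.
    share = Fraction(aspects_met, aspects_total)
    if share == 0:
        return "✗"
    if share <= LOW_MAX:
        return "Low"
    if share <= MEDIUM_MAX:
        return "Medium"
    return "High"

def tally(ratings: Iterable[str]) -> Counter:
    """Count how many systems received each rating for one criterion."""
    return Counter(ratings)
```

Under these assumed cut-offs, a system meeting two of four aspects of a criterion would be rated Medium, and applying `tally` to the ratings of all systems for one criterion gives the kind of per-criterion counts that a summary table reports.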
Table 15 shows the number of systems with a particular rating for each criterion, while the combined radar chart in Figure 5 provides a visualization of the overall quality of the systems. Separate radar charts for the individual systems are also available.

Combined radar chart providing a high-level visualization of the overall system quality of the current state of the art.
Counts of Number of Systems With Each Rating Across the Four Main Quality Criteria.
This section has critically examined the quality of the selected systems, with a focus on four criteria: the data sources and devices used to collect the data; the development methodologies and tools used; the evaluation approaches and rigor; and the accessibility of research outputs. A summary list of the development tools, evaluation approaches, and evaluation metrics used by the systems can be found in Table 19 in the Appendix. Though this analysis extends beyond SWTs, we also consider several factors that are specific to SWTs, such as methodologies, languages, frameworks, and semantic databases. To summarize the section, we discuss the results of the quality assessment, highlighting the aspects that are poorly addressed among the systems.
The accessibility of research outputs is by far the most overlooked quality criterion. Only seven studies included links to research outputs, with only five still accessible at the time of writing this article. With regard to the rigor of evaluation, 20 systems achieved a low rating, with another 20 achieving a medium rating. This can be primarily attributed to the fact that numerous researchers only reported the evaluation of one system component, failing to evaluate or account for the impact of other components. The potential real-world functionality of the system was another poorly addressed aspect of evaluation. Most systems were not tested with actual users, with only 12 systems being user validated. Case studies and scenarios were the most common approach, used in 29 of the systems. Additionally, most researchers (32 systems) overlooked the importance of evaluating their systems against external benchmarks, such as drawing comparisons with similar existing systems or seeking evaluations from domain experts.
The description of data collection methods or existing datasets was generally well done among the systems, with 30 of them giving adequate details regarding the number and demographics of participants or dataset records and properly citing reused datasets. However, only 19 of the systems gave details of the specific devices used to collect the sensor data. A likely explanation for this is that many of the proposed systems are conceptual proposals rather than functional implementations, and therefore, they were not tested on real sensor data collected from actual devices. Finally, with regard to system development, 36 papers comprehensively reported on the tools used to develop the different components of their proposed systems. However, only 16 of the papers reported the use of an existing development methodology, or else adequately described the systematic steps taken to develop each system component.
System Architectures
The architecture of a system can be defined as an abstraction of the system in the form of a set of software structures needed to reason about it (Bass et al., 2021). An important concept when discussing system architectures is the architectural style, which defines constraints on the form and structure of an architecture (Garlan & Perry, 1995). This is closely related to the architectural pattern, which is a reusable, well-established architectural solution to a recurring design problem (Bass et al., 2021). As summarized in Table 5, the systems implement a range of architectural styles and patterns. This section will discuss the architectures of the systems, including how they help address the seven key challenges discussed in Section 5.
Architectural Styles and Patterns
The selected systems implement established architectural styles and patterns, including layered, modular, service-oriented, and agent-based architectures (Buschmann et al., 1996; Richards, 2015). We discuss each in turn below.
Layered Architecture
The most common type of architecture among the systems is the layered architecture, implemented in 28 of the systems (Akhtar et al., 2022; Ali et al., 2018, 2020, 2021; Alti et al., 2022; Ammar et al., 2021; Elhadj et al., 2021; Esposito et al., 2018; Fenza et al., 2012; Garcia-Moreno et al., 2023; Hadjadj & Halimi, 2021; Henaien et al., 2020; Kilintzis et al., 2019; J. Kim et al., 2014; Kordestani et al., 2021; Lopes de Souza et al., 2023; Martella et al., 2025; Mavropoulos et al., 2021; Mcheick et al., 2016; Mezghani et al., 2015; Reda et al., 2022; Spoladore et al., 2021; Titi et al., 2019; Vadillo et al., 2013; Villarreal et al., 2014; Xu et al., 2017; H. Q. Yu et al., 2017; Zhang et al., 2014). In this pattern, each layer consists of a group of subtasks, with each group being at a particular level of abstraction (Buschmann et al., 1996). This offers several advantages. It is simple to understand, and the separation of concerns among the different layers makes it easy to test and maintain the systems developed using this architecture (Richards, 2015). Among the systems, there are variations in the number of layers and their functionality. However, the first layer is typically dedicated to data collection from wearable or ambient sensors as well as other data sources. It may be named the data collection layer, as in the systems by Ali et al. (2020, 2021), the sensing layer, as in the systems by Elhadj et al. (2021) and Esposito et al. (2018), or the user layer, as in the system by Alti et al. (2022). Other typical layers include a data storage layer, in which data are securely stored; a networking layer, which manages data communication and transmission in the system; an inference and data analysis layer, in which the raw data are processed and analyzed to derive important insights; and finally, a presentation layer in the form of a user interface where individuals and, in some cases, their clinicians and caregivers can receive visualizations and alerts. Other specialized layers may also be included, such as the security layer in the system by Ali et al. (2018), or the agents modeling and reasoning layer as proposed by Akhtar et al. (2022).
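The layering described above can be illustrated with a small sketch in which each layer talks only to the layer directly beneath it. This is a simplified illustration, not the design of any reviewed system: the class names, the single heart-rate rule, and the threshold of 100 beats per minute are hypothetical and carry no clinical meaning. The networking layer is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Observation:
    sensor: str
    value: float

class DataCollectionLayer:
    """Lowest layer: hands over pending sensor readings exactly once."""
    def __init__(self, readings: List[Tuple[str, float]]):
        self._pending = [Observation(s, v) for s, v in readings]

    def read(self) -> List[Observation]:
        batch, self._pending = self._pending, []
        return batch

class StorageLayer:
    """Persists everything collected so far (in memory for this sketch)."""
    def __init__(self, lower: DataCollectionLayer):
        self._lower = lower
        self._store: List[Observation] = []

    def all(self) -> List[Observation]:
        self._store.extend(self._lower.read())
        return list(self._store)

class InferenceLayer:
    """Derives situations from stored data using one illustrative rule."""
    def __init__(self, lower: StorageLayer, hr_limit: float = 100.0):
        self._lower = lower
        self._hr_limit = hr_limit  # hypothetical threshold, not clinical advice

    def situations(self) -> List[str]:
        return [f"high heart rate: {o.value:g}"
                for o in self._lower.all()
                if o.sensor == "heart_rate" and o.value > self._hr_limit]

class PresentationLayer:
    """Top layer: turns detected situations into user-facing alerts."""
    def __init__(self, lower: InferenceLayer):
        self._lower = lower

    def render(self) -> List[str]:
        return [f"ALERT: {s}" for s in self._lower.situations()]

def build_stack(readings: List[Tuple[str, float]]) -> PresentationLayer:
    """Wire the layers bottom-up so each depends only on the one below."""
    return PresentationLayer(InferenceLayer(StorageLayer(DataCollectionLayer(readings))))
```

Because each layer depends only on the interface of the one below, the inference layer could be replaced (for example, by an ontology-based reasoner) without changes to the presentation or storage layers, which is the separation of concerns that makes the pattern easy to test and maintain.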
Modular Architecture
Similar to the layered architecture is the modular architecture, in which the system is subdivided into modules, blocks, or subsystems. This is the second most common architectural pattern among the systems, with some kind of modular pattern implemented in 21 of the systems (Bampi et al., 2025; Chatterjee et al., 2021; Chiang & Liang, 2015; De Brouwer et al., 2022; El-Sappagh et al., 2019; Hooda & Rani, 2020; Hussain & Park, 2021; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Khozouie et al., 2018; Lopes de Souza et al., 2023; Martella et al., 2025; Mavropoulos et al., 2021; Minutolo et al., 2016; Rhayem et al., 2021; Stavropoulos et al., 2021; G. Yu et al., 2022; Zafeiropoulos et al., 2024; Zeshan et al., 2023; Zhang et al., 2014; Zhou et al., 2022). Modular and layered architectural patterns can be used concurrently, as is done in four systems (Lopes de Souza et al., 2023; Martella et al., 2025; Mavropoulos et al., 2021; Zhang et al., 2014). For example, in the system proposed by Zhang et al. (2014), the client management module has a middleware with a layered architecture. Additionally, because layered architectures tend to be monolithic, making them less agile and difficult to scale and deploy (Richards, 2015), modularity of layered architectures is advised, in which each layer consists of a modular set of components with a single function or purpose (Meyer & Webb, 2005). This is implemented by Mavropoulos et al. (2021), whose architecture has three levels (layers), with each containing specific modules. For example, the sensors management level contains a data analysis module, while the communication understanding level contains a natural language processing module. Similarly, Lopes de Souza et al. (2023) implement a semantic module within their layered architecture.
Service-Oriented Architecture
Another well-known architectural pattern is the service-oriented architecture, a distributed pattern in which system components provide and consume services (Bass et al., 2021). This is used in eight of the systems (Alti et al., 2022; Fenza et al., 2012; Garcia-Moreno et al., 2023; Hristoskova et al., 2014; Kilintzis et al., 2019; Martella et al., 2025; Mezghani et al., 2015; Xu et al., 2017). In service-oriented architectures, the different aspects of the challenges can be addressed using specialized services. For example, Hristoskova et al. (2014) implement services such as a notification service to generate alerts (decision support) and a user location service to localize specific users (context awareness). While the service-oriented architectural pattern is powerful and offers a high level of abstraction, it is often overly complex and difficult to understand (Richards, 2015). A way of mitigating these issues is to implement services in a layered architecture, as is done in four of these systems (Alti et al., 2022; Fenza et al., 2012; Mezghani et al., 2015; Xu et al., 2017). Additionally, agents can be used to effectively manage services, as is the case in the systems proposed by Alti et al. (2022) and Fenza et al. (2012).
Agent-Based Architecture
Among the systems, nine implement an agent-based architecture. Eight of these use a multi-agent architecture (Akhtar et al., 2022; Alti et al., 2022; Ammar et al., 2021; Fenza et al., 2012; Ivascu et al., 2015; Ivaşcu & Negru, 2021; Martella et al., 2025; Vadillo et al., 2013), while one implements a single-agent architecture (Mavropoulos et al., 2021). Multi-agent systems are characterized by the existence of more than one agent acting autonomously within the system. Typically, each agent manages a particular aspect of the system, which enables decentralization, efficiency, and scalability. For example, Alti et al. (2022) implement situation detection using a situation reasoning agent and a disease classifying agent, while Ivaşcu and Negru (2021) and Ivascu et al. (2015) have notification and alert agents that enhance decision support. Similarly, the system proposed by Vadillo et al. (2013) has a sensor validation agent to verify sensor observations, thereby managing uncertainty in sensor data, a location agent to manage user locations, thereby contributing to context awareness, and a medication agent to oversee the administering of medication, which contributes to decision support. Among the multi-agent systems that incorporate a service-oriented architecture, agents are instrumental in managing the complexity of the services. Both Alti et al. (2022) and Fenza et al. (2012) use agents to handle service discovery and selection. Agents can also enhance decision support by interacting directly with users of the system. This is demonstrated by Mavropoulos et al. (2021), who use a smart virtual agent capable of dialogue to communicate with clinicians and support their decision-making.
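The division of responsibilities in a multi-agent architecture can be sketched with two cooperating agents that communicate over a simple publish/subscribe bus. The sketch is loosely inspired by the roles described above (a sensor validation agent and a notification agent) but is not taken from any of the reviewed systems; the `MessageBus` class, the topic names, and the plausibility and alert thresholds are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class MessageBus:
    """Minimal in-process publish/subscribe channel shared by the agents."""
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[float], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[float], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: float) -> None:
        for handler in list(self._subscribers[topic]):
            handler(payload)

class SensorValidationAgent:
    """Handles uncertainty in sensor data by dropping implausible readings."""
    def __init__(self, bus: MessageBus, low: float = 30.0, high: float = 220.0):
        self._bus, self._low, self._high = bus, low, high  # assumed plausible range
        self.rejected: List[float] = []
        bus.subscribe("raw_heart_rate", self.on_reading)

    def on_reading(self, bpm: float) -> None:
        if self._low <= bpm <= self._high:
            self._bus.publish("valid_heart_rate", bpm)
        else:
            self.rejected.append(bpm)

class NotificationAgent:
    """Supports decisions by raising alerts on validated readings only."""
    def __init__(self, bus: MessageBus, alert_above: float = 100.0):
        self._alert_above = alert_above  # illustrative threshold, not clinical advice
        self.alerts: List[str] = []
        bus.subscribe("valid_heart_rate", self.on_valid)

    def on_valid(self, bpm: float) -> None:
        if bpm > self._alert_above:
            self.alerts.append(f"heart rate {bpm:g} bpm exceeds {self._alert_above:g}")
```

Each agent knows only the topics it subscribes to and publishes on, so further agents (for example, one for location or medication) could be added without modifying the existing ones, which reflects the decentralization and scalability attributed to multi-agent designs.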
Proposed Reference Architecture
Based on an analysis of the systems as well as an overview of general sensor-based systems, a reference architecture for personal health monitoring systems is presented in Figure 6.

Reference architecture for sensor-based personal health monitoring systems.
The architecture consists of three layers as described below:
The architecture also includes a cross-cutting
Summary of the Layers and Modules Within the Proposed Reference Architecture.
Summary of Findings
Figure 7 shows a map outlining the current state of the field. In the remainder of this section, the inadequacies and limitations in current systems will be highlighted, paving the way for opportunities for future research.

Map showing the current state of the field.
Our findings show that three of the seven key challenges are particularly poorly addressed among the systems: situation prediction, explainability, and uncertainty handling. Most of the systems included in this study do not adequately address the challenge of situation prediction, with only 12 systems being capable of predicting health risks or giving insight into how detected conditions may progress with time. In order to achieve the vision of precision health, it is important for health monitoring systems to go beyond detecting current health states and move toward the anticipation and mitigation of adverse health states. With regard to explainability, all the reviewed systems use SWTs, which are inherently interpretable, and three systems provide chatbots or digital assistants that can respond to user queries. However, only nine systems present explicit explanations for system outputs. Additionally, none of the systems implement the criteria for good explanations as outlined by Molnar (2023), nor do any systems mention tailoring the explanations presented to suit different audiences.
Uncertainty handling is similarly poorly implemented or not addressed at all in the majority of the systems. While 16 of the systems consider the impact of sensor limitations such as noise and missing values, only 11 address the inherent uncertainty present in situation analysis and decision support in the health domain. This hinders their ability to perform reliably when faced with ambiguous data or vague or limited knowledge, thus reducing their trustworthiness and dependability. Both situation prediction and uncertainty handling can be enhanced by a combination of techniques such as ML and BNs, as suggested by Behera et al. (2019). Few of the systems take such an approach, with the majority relying solely on rule-based reasoning.
To a lesser extent, interoperability and decision support are also not fully addressed in the selected systems. Considering interoperability, we found that only 15 of the systems take advantage of established sensor ontologies such as SAREF. Neglecting to use such ontologies limits the standardization and expressiveness of the descriptions of sensors and, importantly, sensor data. This results in less effective querying of and reasoning on sensor data, which in turn negatively impacts situation analysis. Further, while 20 systems incorporate established medical terminologies such as SNOMED CT and ICD, all but two systems fail to consider existing health data standards. This significantly limits their ability to effectively integrate electronic health records and other standardized clinical data sources.
There is also significant room for improvement in addressing the challenge of decision support. While 37 of the included systems incorporate alerts to warn of hazardous situations, many do not offer recommendations or reminders for medication or lifestyle factors such as diet and exercise. Similarly, only 15 of the systems report using established medical guidelines, clinical workflows, or information from vetted health professional bodies, which can help to provide a sound justification for any recommendations made, thereby enhancing the trustworthiness of the systems. However, the most overlooked aspect of decision support remains the human-centered aspect. Systems should support users’ agency to cognitively engage in decision-making by presenting them with various options and their potential outcomes, and allowing them to be the final decision-makers. While 29 of the selected systems do suggest recommendations, none offer more than one potential option or present their potential outcomes.
Summary of the Quality Assessment
With regard to the quality assessment, we found that 18 of the systems did not report the data collection methods or sources. Additionally, only 19 of the systems reported the specific devices used for data collection. It can be assumed that such systems may not have been properly validated using realistic data, which casts doubt on the claims made regarding the system functionality and performance. To mitigate this, researchers should clearly indicate which data were used to validate their systems, including how the data were collected, from whom they were collected, and the devices that were used. Concerning the development of the systems, only 16 systems used an existing methodology or else systematically outlined the development steps followed for any of the system components. However, nearly all the systems provided details of the languages, platforms, tools, and other software used in the development process.
When it comes to the evaluation of the systems and components, nearly all the systems reported on the methods, metrics, and results of the evaluation process. However, only 16 included some kind of external evaluation, whether through a systematic comparison with other similar systems or seeking evaluation from domain experts. Furthermore, as was found in the review by Haque et al. (2022), most of the selected systems are yet to be evaluated in real-world settings. While this is to be expected in an emerging area, it is imperative that more systems be evaluated in real-world settings going forward, so that practical challenges and user feedback can be identified early on and considered in future system proposals. This feedback loop is essential for undertaking further research into personal health monitoring systems that fully harness the potential of SWTs. Finally, accessibility of resources was poor among the systems, with only six systems providing access to relevant system files. Wherever possible, researchers should include the research outputs, such as ontologies, data, and code, as publicly accessible supplemental material in order to enhance reproducibility and verifiability.
Future Research Directions
This study highlights the fact that many personal health monitoring systems do not fully leverage reusable resources and instead opt to build resources from scratch. We also find that an overwhelming majority of the systems are not built with reusability in mind, as evidenced by the limited availability of research outputs from various researchers. Although different health conditions may require specific features and functionalities, SWTs have the potential to be extendable, allowing for the addition of knowledge as it evolves and making them suitable for reuse across a wide range of health monitoring applications. We therefore invite researchers to not only reuse existing resources but also to build generalizable semantic and non-semantic system components and make them publicly available. This would play an important role in accelerating the development of personal health monitoring systems by avoiding redundant efforts.
Another takeaway that has clearly emerged from this study is the advantage of combining SWTs with other AI techniques, such as ML and BNs. Integrating these approaches can significantly improve the tackling of the seven key challenges identified in the study. ML can also be leveraged to support the development of SWTs (d’Amato, 2020; Kotis et al., 2021). We encourage researchers to explore recent software libraries such as DeepOnto (He et al., 2024), which support ontology engineering tasks using deep learning tools and pre-trained language models. Indeed, the use of large language models and generative AI is gaining traction in the semantic web community for tasks ranging from the generation of competency questions, OWL files, and documentation in ontology engineering (Garijo et al., n.d.), to link prediction (Shu et al., 2024) and graph population (Meyer et al., 2024) in knowledge graph engineering. Future research should explore how generative AI can accelerate the development of semantic web-based personal health monitoring systems. We also note that most systems do not report the use of recently proposed semantic web standards, with only one system using SHACL, one using SPIN, and none using RDF-star or Notation3. We encourage researchers to explore these and other state-of-the-art standards.
Additionally, many of the systems do not take into account factors such as diet, exercise, and other determinants of health. The next generation of personal health monitoring systems must be more holistic, focusing not only on disease but also on overall wellness. This includes the monitoring of emotional and mental states, which have been shown to be linked to physical health (Uskul & Horn, 2015). Such information can be represented using SWTs, including ontologies (Alani et al., 2018) and knowledge graphs (Gyrard & Boudaoud, 2022). The inadequately addressed challenges, together with the need for more holistic health monitoring, present interesting and important directions for future research in the field.
Limitations of this Study
While we believe that this article offers a comprehensive overview of the use of SWTs in personal health monitoring systems, it is a very broad area and thus we have necessarily had to delimit the scope of the article. We focus on the depth of coverage on the seven key challenges, the quality analysis of the selected systems, and the proposed reference architecture. While a few additional challenges are discussed, they are not included in the in-depth analysis, and other potential challenges may not be mentioned in this article. Furthermore, although we provide an overview of sensors used for health monitoring and highlight some hardware-based interoperability challenges, the practical aspects of the seamless integration of physical sensors and the real-time processing of sensor data are not discussed in depth.
Additionally, we recognize that the challenge assessment framework may not be equally applicable to all health monitoring systems, as some aspects may not be relevant to a particular system’s design goals. Rather than providing a definitive score on system excellence, it should be viewed as a framework for considering different aspects of the seven challenges and identifying existing systems that have addressed them effectively. We encourage researchers and developers to adapt the framework to align with their specific system requirements. Finally, as our inclusion criteria specified peer-reviewed journal articles and conference papers published in English, we acknowledge possible publication and language bias. Despite these limitations, we believe that the study provides valuable insights and offers a foundation for future research in this rapidly evolving field.
Conclusion
This systematic mapping study has analyzed the landscape of sensor-based personal health monitoring systems that incorporate SWTs. After a careful consideration of the pertinent issues in this application area, we identified seven key challenges that such systems must address. In a systematic process, we selected 48 systems as representative of the state of the art in the field, and critically analyzed them based on their capacity to address the seven challenges. We also evaluated the quality of the research undertaken to develop them. Moreover, we discussed the architectures of the selected systems and proposed a reference architecture to streamline the development of such systems. Lastly, we discussed the key findings of the study and highlighted opportunities for future research. It is our hope that this study will serve as a comprehensive overview of the field and spur further high-quality research in effective personal health monitoring systems.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported by the Hasso Plattner Institute for Digital Engineering through the HPI Research School at the University of Cape Town. It was also supported in part by the National Research Foundation of South Africa (grant number 151217).
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Notes
Reused Semantic Resources, the Systems that Reuse Them and the Challenges They Address. Each Resource Includes Either a Hyperlink or a Reference.
| Reused Resource | Description | Systems That Reuse it | Challenge(s) That it Addresses |
|---|---|---|---|
| Amigo Device Ontology (Vallée et al., 2005) | Provides support for the description of devices, user context, quality of service parameters, and communication protocols. | Hristoskova et al. (2014) | Interoperability, context awareness |
| Association Ontology | Provides basic concepts and properties for describing specific association statements to something. | H. Q. Yu et al. (2017) | Interoperability |
| Basic Formal Ontology | A foundational ontology designed for use in supporting information retrieval, analysis, and integration in scientific and other domains. | El-Sappagh et al. (2019) | Interoperability |
| Basic Geo (WGS84 – lat/long) | A basic RDF vocabulary that provides a namespace for representing latitude, longitude, and other information about spatially located things. | Martella et al. (2025); Reda et al. (2022) | Interoperability, context awareness |
| BioMedBridges Diabetes Ontology | Represents expert knowledge about stages and phenotypes of type 2 diabetes. | Ali et al. (2021) | Interoperability, situation detection, situation prediction, explainability |
| Chronic Obstructive Pulmonary Disease Ontology (COPDology; Ajami & Mcheick, 2018) | Contains concepts related to the disease, environment, equipment, patient data, and treatment. | Mavropoulos et al. (2021) | Interoperability, situation detection, situation prediction, decision support, context awareness, explainability |
| Context-Driven Adaptation of Mobile Services (CoDAMoS) Ontology (Preuveneers et al., 2004) | An adaptable and extensible context ontology for creating context-aware computing infrastructures by representing users, environments, hardware and software platforms, and services. | Martella et al. (2025) | Interoperability, context awareness |
| Core Ontology Model for Crowd-Sensing (COMCS) Ontology (Wang et al., 2015) | A core ontology modelling upper-level concepts for worker selection in different crowd-sensing tasks. | Martella et al. (2025) | Interoperability, context awareness |
| Cyc Ontology | A large knowledge base of common sense and background knowledge. | Peral et al. (2018) | Interoperability |
| Data Analytics for Health and Connected Care (DAHCC) Ontology | Captures care, patient, daily life activity recognition, and lifestyle domain knowledge. | De Brouwer et al. (2022); Zafeiropoulos et al. (2024) | Interoperability, situation detection, situation prediction, context awareness, explainability |
| DBPedia | A knowledge base of extracted structured information from Wikipedia in the form of an open knowledge graph served as linked data. | Reda et al. (2022); Xu et al. (2017) | Interoperability |
| Diabetes Mellitus Diagnosis Ontology (El-Sappagh & Ali, 2016) | A diabetes knowledge base that supports automatic reasoning for solving problems related to diabetes diagnosis. | El-Sappagh et al. (2019) | Interoperability, situation detection, situation prediction, decision support, explainability |
| Diabetes Mellitus Treatment Ontology | Provides knowledge about type 2 diabetes and its patients, including complications, symptoms, tests, interactions, and treatment plans. | Ali et al. (2021); El-Sappagh et al. (2019) | Interoperability, situation detection, situation prediction, decision support, explainability |
| Diseasome (Goh et al., 2007) | A.k.a. the human disease network, a graph that captures all genetic disorders and disease genes and the links between them. | Xu et al. (2017) | Interoperability, situation detection, situation prediction, explainability |
| DogOnt Ontology | Offers a uniform, extensible model for smart IoT environments, including the location, capabilities, and configurations of devices, and the composition of the smart environment. | Bampi et al. (2025) | Interoperability, context awareness |
| DOLCE + DnS Ultra Lite (DUL) Ontology | A foundational ontology that provides a set of upper-level concepts for easier interoperability among middle and lower level ontologies. | De Brouwer et al. (2022); Stavropoulos et al. (2021) | Interoperability |
| DrugBank | A knowledge resource for drug, drug–target and related pharmaceutical information. | Xu et al. (2017) | Interoperability, decision support, explainability |
| Drug Target Ontology | Provides formalized and standardized classifications and annotations of druggable protein targets. | Ali et al. (2021) | Interoperability, decision support, explainability |
| Event Ontology | Deals with the notion of reified events, which may have a location, a time, active agents, factors, and products. | H. Q. Yu et al. (2017) | Interoperability, context awareness |
| Fast Healthcare Interoperability Resources (FHIR) OWL Ontology | OWL ontology for FHIR represented in RDF. | El-Sappagh et al. (2019) | Interoperability |
| FHIR & SSN-Based T1 Diabetes Ontology (FASTO; El-Sappagh et al., 2019) | Ontology integrating FHIR standard and SSN ontology with clinical practice guidelines for real-time management of insulin for diabetes patients. | Ali et al. (2021) | Interoperability, situation detection, situation prediction, decision support, explainability |
| Food Ontology (FoodOn) | A lightweight ontology for describing recipes, ingredients, menus, and diets. | Spoladore et al. (2021) | Interoperability, decision support, explainability |
| Friend of a Friend (FOAF) Ontology | An ontology describing persons, their activities, and their relations to other people and objects. | Ammar et al. (2021); Elhadj et al. (2021); Garcia-Moreno et al. (2023); Henaien et al. (2020); Mavropoulos et al. (2021); Reda et al. (2022); Spoladore et al. (2021); Titi et al. (2019); H. Q. Yu et al. (2017) | Interoperability, context awareness |
| General User Model Ontology (GUMO; Heckmann et al., 2005) | A user model ontology for use in user-adaptive or ubiquitous computing systems. | Mavropoulos et al. (2021) | Interoperability, context awareness |
| GeoNames Ontology | Provides elements of description for geographical features. | Rhayem et al. (2021) | Interoperability, context awareness |
| HealthIoT Ontology (Rhayem et al., 2017) | Provides semantic representation of both the medical connected objects (i.e., sensors) and their data. | Ivaşcu and Negru (2021) | Interoperability, context awareness |
| Heart Failure Ontology | Defines heart-failure-relevant information, including the causes and risk factors, signs and symptoms, diagnostic tests and results, and treatment. | Hristoskova et al. (2014) | Interoperability, situation detection, situation prediction, decision support, explainability |
| Human Disease Ontology | Represents common and rare disease concepts. | Ali et al. (2021) | Interoperability |
| Informed Consent Ontology | Represents processes and information pertaining to obtaining informed consent in medical investigations. | H. Q. Yu et al. (2017) | Interoperability, context awareness |
| International Classification for Nursing Practice (ICNP) | Provides a standardized terminology that can be used to record the observations and interventions of nurses. | Elhadj et al. (2021); Henaien et al. (2020) | Interoperability |
| International Classification of Diseases (ICD) | A tool for recording, reporting, and grouping conditions and factors that influence health. | Spoladore et al. (2021); Titi et al. (2019); G. Yu et al. (2022) | Interoperability, situation detection, explainability |
| International Classification of Functioning, Disability and Health (ICF) | A framework for describing and organizing information on functioning and disability. | Garcia-Moreno et al. (2023); Spoladore et al. (2021) | Interoperability, situation detection, explainability |
| International Classification of Headache Disorders (ICHD-3) | A detailed hierarchical classification of all headache-related disorders. | De Brouwer et al. (2022) | Interoperability, situation detection, explainability |
| IoT-lite Ontology | A lightweight ontology to represent IoT resources, entities, and services. | Garcia-Moreno et al. (2023); Rhayem et al. (2021) | Interoperability, context awareness |
| IoT-Stream Ontology (Elsaleh et al., 2020) | A lightweight ontology to semantically annotate stream data from IoT devices. | Garcia-Moreno et al. (2023) | Interoperability, context awareness |
| Logical Observation Identifiers Names and Codes (LOINC) | An international standard for identifying health measurements, observations, and documents. | El-Sappagh et al. (2019) | Interoperability |
| Linked Data Notifications (LDN) | A protocol that describes how servers (receivers) can have messages pushed to them by applications (senders), as well as how other applications (consumers) may retrieve those messages. | Ammar et al. (2021) | Interoperability |
| Linked Data Platform (LDP) | Defines a set of rules for HTTP operations on web resources, some based on RDF, to provide an architecture for read-write Linked Data on the web. | Ammar et al. (2021) | Interoperability |
| Medical Subject Headings (MeSH) | A vocabulary for indexing, cataloguing, and searching biomedical and health-related information. | Reda et al. (2022) | Interoperability |
| MIMU-Wear Ontology (Villalonga et al., 2017) | Describes wearable sensor platforms consisting of mainstream magnetic and inertial measurement units. | Ivaşcu and Negru (2021) | Interoperability, context awareness |
| Mobile Crowd Sensing (MCS) Ontology (Gasmi et al., 2016) | A semantic model designed to structure collected data from crowd-sensing systems. | Martella et al. (2025) | Interoperability, context awareness |
| Mobile Object Ontology (Wannous et al., 2013) | Models temporal, spatial, and domain-related information about mobile objects. | Rhayem et al. (2021) | Interoperability, context awareness |
| Medical Web Lifestyle Aggregator Ontology | Represents user data from the web. | H. Q. Yu et al. (2017) | Interoperability, context awareness |
| Ontology for Nutritional Studies | A comprehensive resource for the description of concepts in the broader human nutrition domain. | Ali et al. (2021) | Interoperability, decision support, explainability |
| Ontology of units of Measure (OM) | An ontology that models concepts and relations important to scientific research, with a strong focus on units, quantities, measurements, and dimensions. | Martella et al. (2025) | Interoperability, context awareness |
| OpenThesaurus | A multilingual web-based thesaurus. | Peral et al. (2018) | Interoperability |
| OwlSpeak Ontology (Heinroth et al., 2010) | Represents static and dynamic concepts related to spoken dialogue. | Mavropoulos et al. (2021) | Interoperability, context awareness |
| OWL-S Ontology | Describes semantic web services for automatic service discovery, invocation, composition, and interoperation. | Fenza et al. (2012); Hristoskova et al. (2014); Martella et al. (2025) | Interoperability, situation detection, situation prediction, decision support, context awareness |
| OWL-Time Ontology | Describes the temporal properties of resources. | De Brouwer et al. (2022); Martella et al. (2025); Mavropoulos et al. (2021); Rhayem et al. (2021); Titi et al. (2019); H. Q. Yu et al. (2017) | Interoperability, context awareness |
| Parkinson and Movement Disorder Ontology | Captures neurological findings, treatment plans, and instruments used to evaluate various traits of Parkinson’s disorder. | Zafeiropoulos et al. (2024) | Interoperability, situation detection, situation prediction, decision support, context awareness, explainability |
| Physical Activity Concept Ontology (H. Kim et al., 2019) | Captures various concepts that are required to describe one’s physical activity. | Ivaşcu and Negru (2021) | Interoperability, situation detection, situation prediction, decision support, context awareness, explainability |
| Places Ontology | A lightweight ontology for describing places of geographic interest. | H. Q. Yu et al. (2017) | Interoperability, context awareness |
| Quantities and Units (QU) Ontology | An ontology to model quantities and units of measurement. | Garcia-Moreno et al. (2023) | Interoperability, context awareness |
| RxNorm | Provides normalized names for clinical drugs and links to other drug vocabularies. | El-Sappagh et al. (2019) | Interoperability |
| SAREF for eHealth and Ageing Well (SAREF4EHAW) Ontology | An extension of the SAREF ontology for applications related to eHealth and ageing well. | De Brouwer et al. (2022); Lopes de Souza et al. (2023) | Interoperability, context awareness |
| SAREF for Wearables (SAREF4WEAR) Ontology | An extension of the SAREF ontology for applications related to wearables. | Hadjadj and Halimi (2021) | Interoperability, context awareness |
| Semantic Sensor Network (SSN)/Sensor, Observation, Sample, and Actuator (SOSA) Ontology | Describes sensors and their capabilities, measurement processes, observations, and deployments. | Bampi et al. (2025); Chatterjee et al. (2021); De Brouwer et al. (2022); Elhadj et al. (2021); El-Sappagh et al. (2019); Garcia-Moreno et al. (2023); Henaien et al. (2020); Ivaşcu and Negru (2021); Martella et al. (2025); Rhayem et al. (2021); Stavropoulos et al. (2021); Titi et al. (2019); Zafeiropoulos et al. (2024) | Interoperability, context awareness |
| Smart Applications REFerence (SAREF) Ontology | An ontology to enable semantic interoperability for smart appliances. | Zafeiropoulos et al. (2024) | Interoperability, context awareness |
| Smart Body Area Network (SmartBAN) Ontology | Describes data related to sensors in BANs. | El-Sappagh et al. (2019) | Interoperability, context awareness |
| Stream Annotation Ontology (Kolozali et al., 2014) | An ontology for real-time semantic annotation of streaming IoT data to support dynamic integration into the Web. | Garcia-Moreno et al. (2023) | Interoperability, context awareness |
| SWRL Temporal Ontology | Defines a set of built-ins that can be used to represent temporal information in SWRL rules. | El-Sappagh et al. (2019) | Interoperability, context awareness |
| Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) | A collection of medical terms providing codes, terms, synonyms, and definitions used in clinical documentation and reporting. | Bampi et al. (2025); Chatterjee et al. (2021); El-Sappagh et al. (2019); Kilintzis et al. (2019); Kordestani et al. (2021); Lopes de Souza et al. (2023); Reda et al. (2022); Rhayem et al. (2021); Titi et al. (2019); Zhou et al. (2022) | Interoperability |
| Translational Medicine Ontology | An ontology to integrate chemical, genomic, and proteomic data with disease, treatment, and electronic health records. | H. Q. Yu et al. (2017) | Interoperability, context awareness |
| Unified Medical Language System (UMLS) | Provides a mapping structure among different health and biomedical terminologies, classifications, coding standards, and vocabularies. | Peral et al. (2018); Zhou et al. (2022) | Interoperability |
| Vital Sign Ontology | Provides a controlled, structured vocabulary for describing vital sign measurement data, the processes of measuring vital signs, and the relevant anatomical entities. | El-Sappagh et al. (2019); Ivaşcu and Negru (2021) | Interoperability |
| Web Access Control Ontology | Defines and enforces authorization conditions on web resources, allowing resource owners to control who can access their data and how. | Ammar et al. (2021) | Interoperability |
| Web Annotation Ontology | Structures annotations (i.e., information about a resource or associations between resources) to enable sharing and reuse across platforms. | Ammar et al. (2021) | Interoperability |
| WordNet | A lexical database of the English language that links words through semantic relations. | Peral et al. (2018); Reda et al. (2022) | Interoperability |
Summary of the Extent to Which Each System Addresses the Seven Key Challenges.
| # | System | Year | Interoperability (out of 6) | Situation Detection (out of 3) | Situation Prediction (out of 3) | Decision Support (out of 6) | Context Awareness (out of 7) | Explainability (out of 6) | Uncertainty Handling (out of 6) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Akhtar et al. (2022) | 2022 | Low (1) | Medium (2) | ✗ (0) | Medium (3) | High (5) | Medium (3) | Low (1) |
| 2 | Ali et al. (2021) | 2021 | Medium (3) | High (3) | ✗ (0) | Low (2) | Medium (4) | Low (2) | Low (1) |
| 3 | Ali et al. (2020) | 2020 | Low (2) | Medium (2) | Low (1) | Medium (3) | Medium (3) | Low (1) | Medium (2) |
| 4 | Ali et al. (2018) | 2018 | Low (1) | High (3) | ✗ (0) | Medium (3) | Medium (3) | Low (2) | Medium (2) |
| 5 | Alti et al. (2022) | 2021 | Low (1) | Medium (2) | Low (1) | Low (1) | High (5) | Low (2) | ✗ (0) |
| 6 | Ammar et al. (2021) | 2021 | Medium (3) | Low (1) | ✗ (0) | High (5) | High (5) | Low (2) | ✗ (0) |
| 7 | Bampi et al. (2025) | 2025 | Medium (4) | Medium (2) | ✗ (0) | Low (2) | High (5) | Low (2) | ✗ (0) |
| 8 | Chatterjee et al. (2021) | 2021 | Medium (4) | Medium (2) | ✗ (0) | Low (2) | High (5) | Medium (3) | ✗ (0) |
| 9 | Chiang and Liang (2015) | 2015 | Low (1) | Medium (2) | Low (1) | Medium (4) | High (5) | Medium (3) | Low (1) |
| 10 | De Brouwer et al. (2022) | 2022 | Medium (4) | High (3) | Low (1) | Medium (3) | High (6) | Low (2) | ✗ (0) |
| 11 | El-Sappagh et al. (2019) | 2019 | High (5) | Low (1) | ✗ (0) | Medium (3) | High (5) | Low (2) | ✗ (0) |
| 12 | Elhadj et al. (2021) | 2021 | High (5) | Medium (2) | ✗ (0) | Low (2) | High (5) | Medium (4) | ✗ (0) |
| 13 | Esposito et al. (2018) | 2018 | Low (1) | High (3) | ✗ (0) | Low (2) | Medium (4) | Low (2) | Medium (2) |
| 14 | Fenza et al. (2012) | 2012 | Low (2) | High (3) | Low (1) | ✗ (0) | Low (2) | Low (1) | Medium (2) |
| 15 | Garcia-Moreno et al. (2023) | 2023 | Medium (4) | Medium (2) | ✗ (0) | Medium (3) | High (6) | Low (1) | Low (1) |
| 16 | Garcia-Valverde et al. (2014) | 2014 | ✗ (0) | Low (1) | ✗ (0) | Low (2) | Medium (3) | Low (2) | Low (1) |
| 17 | Hadjadj and Halimi (2021) | 2021 | Medium (3) | Low (1) | ✗ (0) | Medium (3) | High (5) | Low (2) | ✗ (0) |
| 18 | Henaien et al. (2020) | 2020 | Medium (4) | Low (1) | ✗ (0) | Medium (3) | Medium (4) | Low (2) | ✗ (0) |
| 19 | Hooda and Rani (2020) | 2020 | Low (1) | Medium (2) | ✗ (0) | Low (2) | Low (2) | Low (2) | Low (1) |
| 20 | Hristoskova et al. (2014) | 2014 | Medium (3) | Medium (2) | Low (1) | Medium (3) | High (6) | Low (2) | Low (1) |
| 21 | Hussain and Park (2021) | 2021 | Low (2) | Medium (2) | ✗ (0) | Low (2) | Medium (3) | Low (2) | Low (1) |
| 22 | Ivaşcu and Negru (2021) | 2021 | Low (1) | Low (1) | ✗ (0) | Low (1) | Medium (4) | Low (2) | ✗ (0) |
| 23 | Ivascu et al. (2015) | 2015 | Low (1) | Medium (2) | ✗ (0) | Low (1) | High (5) | Low (2) | ✗ (0) |
| 24 | Khozouie et al. (2018) | 2018 | Low (2) | Low (1) | ✗ (0) | Low (1) | High (5) | Low (2) | ✗ (0) |
| 25 | Kilintzis et al. (2019) | 2019 | Medium (3) | Low (1) | ✗ (0) | Low (1) | Medium (4) | Low (1) | Low (1) |
| 26 | J. Kim et al. (2014) | 2014 | Low (1) | High (3) | ✗ (0) | Low (1) | Medium (3) | Low (1) | ✗ (0) |
| 27 | Kordestani et al. (2021) | 2021 | Medium (3) | Medium (2) | ✗ (0) | Medium (4) | Medium (4) | Low (2) | Medium (2) |
| 28 | Lopes de Souza et al. (2023) | 2023 | Medium (3) | High (3) | ✗ (0) | Medium (4) | Medium (4) | Low (2) | ✗ (0) |
| 29 | Martella et al. (2025) | 2025 | Medium (3) | Low (1) | ✗ (0) | Low (2) | High (5) | Low (2) | Low (1) |
| 30 | Mavropoulos et al. (2021) | 2021 | Low (2) | Medium (2) | ✗ (0) | Medium (4) | High (5) | Medium (4) | ✗ (0) |
| 31 | Mcheick et al. (2016) | 2016 | ✗ (0) | Low (1) | Low (1) | Low (1) | Medium (3) | Low (2) | Low (1) |
| 32 | Mezghani et al. (2015) | 2015 | Low (1) | Low (1) | ✗ (0) | Low (1) | Medium (3) | Low (1) | Low (1) |
| 33 | Minutolo et al. (2016) | 2016 | Low (1) | High (3) | ✗ (0) | ✗ (0) | Medium (3) | Low (1) | Low (1) |
| 34 | Peral et al. (2018) | 2018 | Medium (4) | Low (1) | Medium (2) | Medium (3) | Medium (3) | Low (1) | ✗ (0) |
| 35 | Reda et al. (2022) | 2022 | Low (1) | Medium (2) | Low (1) | ✗ (0) | High (5) | Low (1) | Low (1) |
| 36 | Rhayem et al. (2021) | 2021 | Medium (4) | Medium (2) | Low (1) | Medium (3) | High (7) | Medium (3) | Low (1) |
| 37 | Spoladore et al. (2021) | 2021 | Low (2) | Medium (2) | ✗ (0) | Medium (4) | Medium (4) | Low (2) | ✗ (0) |
| 38 | Stavropoulos et al. (2021) | 2021 | Low (1) | Medium (2) | ✗ (0) | Low (2) | Medium (4) | Low (2) | ✗ (0) |
| 39 | Titi et al. (2019) | 2019 | Medium (4) | Medium (2) | ✗ (0) | High (5) | High (6) | Low (2) | Low (1) |
| 40 | Vadillo et al. (2013) | 2013 | Low (2) | Low (1) | ✗ (0) | Medium (4) | High (5) | Low (2) | ✗ (0) |
| 41 | Villarreal et al. (2014) | 2014 | Low (1) | Medium (2) | ✗ (0) | Medium (3) | Medium (4) | Medium (4) | ✗ (0) |
| 42 | Xu et al. (2017) | 2017 | Low (2) | Medium (2) | ✗ (0) | Low (2) | Medium (3) | Low (2) | ✗ (0) |
| 43 | G. Yu et al. (2022) | 2022 | Low (2) | Low (1) | ✗ (0) | High (5) | Medium (4) | Medium (4) | ✗ (0) |
| 44 | H. Q. Yu et al. (2017) | 2017 | Medium (3) | Medium (2) | Low (1) | Low (1) | Medium (4) | Low (2) | ✗ (0) |
| 45 | Zafeiropoulos et al. (2024) | 2024 | Medium (3) | High (3) | ✗ (0) | Medium (3) | High (5) | Low (2) | ✗ (0) |
| 46 | Zeshan et al. (2023) | 2023 | ✗ (0) | Medium (2) | ✗ (0) | Low (1) | High (5) | Low (2) | ✗ (0) |
| 47 | Zhang et al. (2014) | 2014 | Low (2) | Medium (2) | ✗ (0) | Low (1) | Low (2) | Low (2) | ✗ (0) |
| 48 | Zhou et al. (2022) | 2022 | Medium (4) | Low (1) | Low (1) | Medium (3) | Medium (4) | Low (1) | Low (1) |
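Each rating cell in the table above pairs a qualitative level with a numeric score, with ✗ (0) marking a challenge that a system does not address. For readers who wish to reuse these ratings in their own analyses, the following is a minimal sketch of how the cells could be parsed. The names `MAX_SCORE`, `parse_cell`, and `parse_row` are illustrative assumptions and are not part of the study; the maximum scores are taken from the column headers.

```python
import re

# Maximum attainable score per challenge, taken from the column headers.
# The dictionary keys are illustrative names, not identifiers from the study.
MAX_SCORE = {
    "interoperability": 6,
    "situation_detection": 3,
    "situation_prediction": 3,
    "decision_support": 6,
    "context_awareness": 7,
    "explainability": 6,
    "uncertainty_handling": 6,
}

# Matches cells such as "Medium (3)", "High(5)", or "✗ (0)".
CELL_PATTERN = re.compile(
    r"^\s*(?P<level>Low|Medium|High|✗)\s*\((?P<score>\d+)\)\s*$"
)


def parse_cell(cell: str) -> tuple[str, int]:
    """Split a rating cell into its qualitative level and numeric score."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"Unrecognized rating cell: {cell!r}")
    # Report the ✗ marker as the level "None" (challenge not addressed).
    level = "None" if match["level"] == "✗" else match["level"]
    return level, int(match["score"])


def parse_row(cells: list[str]) -> dict[str, tuple[str, int]]:
    """Map the seven rating cells of one table row to challenge names."""
    if len(cells) != len(MAX_SCORE):
        raise ValueError("Expected one cell per challenge")
    row = {}
    for challenge, cell in zip(MAX_SCORE, cells):
        level, score = parse_cell(cell)
        if score > MAX_SCORE[challenge]:
            raise ValueError(f"{challenge}: score {score} exceeds maximum")
        row[challenge] = (level, score)
    return row


# Row 36 of the table (Rhayem et al., 2021).
rhayem = parse_row([
    "Medium (4)", "Medium (2)", "Low (1)", "Medium (3)",
    "High (7)", "Medium (3)", "Low (1)",
])
addressed = [name for name, (_, score) in rhayem.items() if score > 0]
print(len(addressed))  # 7: every challenge receives a non-zero score
```

The sketch keeps the qualitative level exactly as printed in the table rather than deriving it from the score, since the score bands behind each level are defined in the main text and may differ between challenges.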
Summary of Development Tools, Evaluation Approaches, and Evaluation Metrics Used by the Systems.
| # | System | Semantic Web-Based Languages, Standards, Reasoners, and Frameworks | Other Languages, Libraries, and Frameworks | Database Systems, Data Repositories, and Data Analysis Tools | Evaluation Approaches | Evaluation Metrics |
|---|---|---|---|---|---|---|
| 1 | Akhtar et al. (2022) | Protégé, SWRL | NetLogo | None mentioned | Case study, system simulation | None mentioned |
| 2 | Ali et al. (2021) | OWL, Protégé | Java, WEKA | Amazon S3, Apache Pig, Apache Hadoop | ML model performance, comparison with SOTA, ablation study | Accuracy, precision, recall, RMSE, MAE |
| 3 | Ali et al. (2020) | OWL, Protégé, SWRL | Java, WEKA | None mentioned | ML model performance, comparison with SOTA, ablation study | Accuracy, precision, recall, F-score, RMSE, MAE |
| 4 | Ali et al. (2018) | Apache Jena, OWL, Pellet, Protégé, SPARQL, SWRL | Java | None mentioned | User evaluation with real users, comparison with SOTA, expert evaluation of system, ontology validation, system queries | Accuracy, precision, recall, F-score |
| 5 | Alti et al. (2022) | Apache Jena, Protégé, SWRL | JADE, Java, Java Expert System Shell | MySQL | Case study, comparison with SOTA, NFR validation | Execution time, optimality, application lifetime, no. of discovered services |
| 6 | Ammar et al. (2021) | LDflex, OWL, RDF, RDFlib, Solid, SPARQL | JavaScript, Python | None mentioned | Use case scenarios, comparison with SOTA | Utilization of public health library, incorporation of observations of daily living, incorporation of social determinants of health |
| 7 | Bampi et al. (2025) | Apache Jena, Protégé, RDF, SPARQL, SWRL | None mentioned | None mentioned | Use case scenario, system simulation, NFR validation | System latency, resource usage (memory & CPU), comparison with SOTA |
| 8 | Chatterjee et al. (2021) | Apache Jena, Apache Jena Fuseki, HermiT, OWL, Protégé, RDF, SPARQL, SWRL | None mentioned | None mentioned | Ontology validation, system queries | Ontology reasoning time |
| 9 | Chiang and Liang (2015) | Apache Jena, Protégé, RDF | C#, Java, MATLAB | MySQL | Use case scenarios, system simulation | None mentioned |
| 10 | De Brouwer et al. (2022) | Apache Jena, Protégé, RDF, SPARQL | None mentioned | MongoDB | System queries, use case scenarios with real users | None mentioned |
| 11 | El-Sappagh et al. (2019) | HermiT, OWL, Pellet, Protégé, SPARQL, SWRL | SQL | SQLite | Ontology validation, comparison with SOTA, comparison with expert opinion, use case scenarios, NFR validation, system queries | Correctness, completeness, extensibility, conciseness, organizational fitness |
| 12 | Elhadj et al. (2021) | Apache Jena, Apache Jena Fuseki, OWL, Pellet, Protégé, SPARQL, SWRL | Java | MongoDB | Use case scenarios | None mentioned |
| 13 | Esposito et al. (2018) | OWL, Pellet, Protégé, SWRL | Java | SQLite | Case study with real users, ALMA method, ontology validation, NFR validation | Modifiability |
| 14 | Fenza et al. (2012) | OWL, SPARQL | JADE, MATLAB, Fuzzy control language | None mentioned | Case study, comparison with logic-based matching | Precision, recall |
| 15 | Garcia-Moreno et al. (2023) | None mentioned | None mentioned | None mentioned | ML model performance, NFR validation, user evaluation with real users | Accuracy, F-score, sensitivity, specificity, reusability, extensibility, adaptability |
| 16 | Garcia-Valverde et al. (2014) | Apache Jena | None mentioned | None mentioned | Use case scenario | Accuracy, precision, F-score |
| 17 | Hadjadj and Halimi (2021) | Apache Jena, OWL, Protégé, RDF, SPARQL, SWRL | None mentioned | None mentioned | Use case scenario, comparison with expert opinion, system queries | Similarity between system and expert opinion |
| 18 | Henaien et al. (2020) | Protégé, SWRL | WEKA | None mentioned | None mentioned | None mentioned |
| 19 | Hooda and Rani (2020) | Apache Jena, OWL, Pellet, Protégé, RDF, SPARQL, SWRL | None mentioned | None mentioned | Ontology validation | None mentioned |
| 20 | Hristoskova et al. (2014) | OWL, Pellet, SWRL | None mentioned | None mentioned | Expert evaluation, user evaluation with real users, ontology validation, NFR validation | Performance, scalability |
| 21 | Hussain and Park (2021) | Protégé, RDF | None mentioned | Apache ActiveMQ, Apache Hadoop, Apache Spark, Elasticsearch, MariaDB | Comparison with SOTA, user evaluation with real users | AUC, accuracy, precision, recall, negative predictive value |
| 22 | Ivaşcu and Negru (2021) | Apache Jena, Apache Jena Fuseki, Protégé, SPARQL | JADE, WEKA | None mentioned | Comparison with SOTA, system simulation | Accuracy, precision, recall, F-score |
| 23 | Ivascu et al. (2015) | Apache Jena, Protégé | None mentioned | None mentioned | Use case scenarios | None mentioned |
| 24 | Khozouie et al. (2018) | OWL, Pellet, Protégé | Java | None mentioned | Use case scenario, ontology validation, expert evaluation | None mentioned |
| 25 | Kilintzis et al. (2019) | Apache Jena Fuseki, OWL, RDF, SPARQL, SPIN | None mentioned | Virtuoso | User evaluation with real users, NFR validation, system queries | Robustness, data integrity |
| 26 | J. Kim et al. (2014) | Apache Jena, OWL, Protégé | SQL | None mentioned | Ontology validation, user evaluation, system queries with real users | Precision, recall, F-score, Likert scale |
| 27 | Kordestani et al. (2021) | None mentioned | Drools | None mentioned | Comparison between BN and ML diagnosis, use case scenarios | F-score |
| 28 | Lopes de Souza et al. (2023) | RDF, SWRL, SPARQL | GraphQL, Java, JavaScript, Node.js, React | SQLite | Ontology evaluation | None mentioned |
| 29 | Martella et al. (2025) | Protégé, SPARQL | Java, Python | Apache Spark, MongoDB, PostgreSQL | Case study with simulated users, NFR validation, system simulation | Response time, throughput, delay threshold |
| 30 | Mavropoulos et al. (2021) | OWL, SPARQL | None mentioned | GraphDB | Comparison with SOTA, user evaluation with real users | Accuracy, precision, recall, F-score, Likert scale |
| 31 | Mcheick et al. (2016) | None mentioned | Netica | None mentioned | Use case scenarios, NFR validation | Adaptability |
| 32 | Mezghani et al. (2015) | Apache Jena Fuseki, RDF, SPARQL, SWRL | SQL | None mentioned | Use case scenario | None mentioned |
| 33 | Minutolo et al. (2016) | OWL, SPARQL, SWRL | None mentioned | None mentioned | Case study | None mentioned |
| 34 | Peral et al. (2018) | None mentioned | None mentioned | None mentioned | Case study | Similarity between actual and predicted values |
| 35 | Reda et al. (2022) | OWL, RDF, RML, SPARQL, SWRL | None mentioned | None mentioned | System simulation | None mentioned |
| 36 | Rhayem et al. (2021) | Apache Jena, OWL, SPARQL, SWRL | Drools | None mentioned | Comparison with SOTA, ontology validation | F-score, precision, recall, response time, ontology coverage |
| 37 | Spoladore et al. (2021) | OWL, Protégé, SPARQL, SWRL | None mentioned | Stardog | Use case scenarios | None mentioned |
| 38 | Stavropoulos et al. (2021) | OWL, SHACL, SPARQL | Python, Django | GraphDB | Use case scenarios with real users, NFR validation, focus group with clinicians | Scalability, Likert scale |
| 39 | Titi et al. (2019) | Apache Jena, Pellet, Protégé, RDF, SPARQL, SWRL | Java, Java Server Faces, PrimeFaces | MySQL | System queries | None mentioned |
| 40 | Vadillo et al. (2013) | Apache Jena, OWL, Pellet, Protégé | JADE | None mentioned | Use case scenarios, ontology validation | Processing time |
| 41 | Villarreal et al. (2014) | None mentioned | Java | MySQL | Case study, ALMA method, user evaluation with real users, NFR validation | Response time, usability, recommendation suitability |
| 42 | Xu et al. (2017) | SPARQL | None mentioned | None mentioned | Case study, comparison with SOTA | None mentioned |
| 43 | G. Yu et al. (2022) | None mentioned | Python, spaCy, NetworkX | None mentioned | ML model performance, use case scenarios with real users | AUC, chronic care model criteria |
| 44 | H. Q. Yu et al. (2017) | Apache Jena, OWL, RDF, SPARQL, SWRL | None mentioned | Virtuoso | System queries | None mentioned |
| 45 | Zafeiropoulos et al. (2024) | OWL, Pellet, Protégé, RDF, RDFlib, SPARQL, SWRL, Owlready2 | Python, PyTorch | None mentioned | Use case scenario, ontology evaluation, system queries, ML model performance | MAE, MSE, R-squared |
| 46 | Zeshan et al. (2023) | Apache Jena, OWL, Protégé, SWRL | None mentioned | None mentioned | Use case scenario | Precision, recall, response time |
| 47 | Zhang et al. (2014) | OWL, Protégé, RDF, SPARQL, SWRL | None mentioned | None mentioned | Use case scenario | None mentioned |
| 48 | Zhou et al. (2022) | None mentioned | None mentioned | None mentioned | Use case scenario | None mentioned |
D. Quality Assessment
Summary of the Quality Evaluation of Each System.
| # | System | Year | Data & Devices (out of 3) | System & Components Development (out of 3) | Rigor of Evaluation (out of 7) | Accessibility of Research Outputs (out of 2) |
|---|---|---|---|---|---|---|
| 1 | Akhtar et al. (2022) | 2022 | ✗ (0) | Medium (2) | Low (1) | ✗ (0) |
| 2 | Ali et al. (2021) | 2021 | Medium (2) | Medium (2) | Medium (4) | ✗ (0) |
| 3 | Ali et al. (2020) | 2020 | Medium (2) | Medium (2) | Medium (4) | ✗ (0) |
| 4 | Ali et al. (2018) | 2018 | Low (1) | Medium (2) | High (6) | ✗ (0) |
| 5 | Alti et al. (2022) | 2021 | Low (1) | Medium (2) | Low (2) | ✗ (0) |
| 6 | Ammar et al. (2021) | 2021 | Low (1) | Medium (2) | Medium (3) | ✗ (0) |
| 7 | Bampi et al. (2025) | 2025 | High (3) | Medium (2) | Low (2) | High (2) |
| 8 | Chatterjee et al. (2021) | 2021 | Medium (2) | High (3) | Medium (3) | High (2) |
| 9 | Chiang and Liang (2015) | 2015 | Low (1) | High (3) | Low (1) | ✗ (0) |
| 10 | De Brouwer et al. (2022) | 2022 | High (3) | Medium (2) | Low (2) | Medium (1) |
| 11 | El-Sappagh et al. (2019) | 2019 | ✗ (0) | High (3) | Medium (4) | Medium (1) |
| 12 | Elhadj et al. (2021) | 2021 | ✗ (0) | High (3) | Medium (3) | ✗ (0) |
| 13 | Esposito et al. (2018) | 2018 | High (3) | Medium (2) | High (5) | ✗ (0) |
| 14 | Fenza et al. (2012) | 2012 | ✗ (0) | Medium (2) | Low (2) | ✗ (0) |
| 15 | Garcia-Moreno et al. (2023) | 2023 | High (3) | Low (1) | Medium (4) | ✗ (0) |
| 16 | Garcia-Valverde et al. (2014) | 2014 | Medium (2) | Low (1) | Low (2) | ✗ (0) |
| 17 | Hadjadj and Halimi (2021) | 2021 | Medium (2) | High (3) | Medium (3) | ✗ (0) |
| 18 | Henaien et al. (2020) | 2020 | Low (1) | Medium (2) | ✗ (0) | ✗ (0) |
| 19 | Hooda and Rani (2020) | 2020 | Low (1) | Medium (2) | Low (1) | ✗ (0) |
| 20 | Hristoskova et al. (2014) | 2014 | High (3) | Medium (2) | Medium (3) | ✗ (0) |
| 21 | Hussain and Park (2021) | 2021 | High (3) | Medium (2) | Medium (4) | ✗ (0) |
| 22 | Ivaşcu and Negru (2021) | 2021 | High (3) | High (3) | Medium (3) | ✗ (0) |
| 23 | Ivascu et al. (2015) | 2015 | ✗ (0) | Medium (2) | ✗ (0) | ✗ (0) |
| 24 | Khozouie et al. (2018) | 2018 | ✗ (0) | Medium (2) | Low (2) | ✗ (0) |
| 25 | Kilintzis et al. (2019) | 2019 | Medium (2) | High (3) | Medium (4) | ✗ (0) |
| 26 | J. Kim et al. (2014) | 2014 | Medium (2) | Medium (2) | Medium (4) | ✗ (0) |
| 27 | Kordestani et al. (2021) | 2021 | Medium (2) | Low (1) | Low (2) | ✗ (0) |
| 28 | Lopes de Souza et al. (2023) | 2023 | Low (1) | Medium (2) | Low (1) | ✗ (0) |
| 29 | Martella et al. (2025) | 2025 | High (3) | High (3) | Medium (3) | ✗ (0) |
| 30 | Mavropoulos et al. (2021) | 2021 | High (3) | Low (1) | High (6) | ✗ (0) |
| 31 | Mcheick et al. (2016) | 2016 | Low (1) | Low (1) | Low (2) | ✗ (0) |
| 32 | Mezghani et al. (2015) | 2015 | ✗ (0) | Low (1) | Low (2) | ✗ (0) |
| 33 | Minutolo et al. (2016) | 2016 | ✗ (0) | Medium (2) | Medium (3) | ✗ (0) |
| 34 | Peral et al. (2018) | 2018 | Medium (2) | Low (1) | Low (2) | ✗ (0) |
| 35 | Reda et al. (2022) | 2022 | Low (1) | Medium (2) | Low (1) | ✗ (0) |
| 36 | Rhayem et al. (2021) | 2021 | Medium (2) | Medium (2) | High (5) | ✗ (0) |
| 37 | Spoladore et al. (2021) | 2021 | Low (1) | Medium (2) | Low (2) | ✗ (0) |
| 38 | Stavropoulos et al. (2021) | 2021 | High (3) | Medium (2) | Medium (3) | ✗ (0) |
| 39 | Titi et al. (2019) | 2019 | ✗ (0) | High (3) | Low (1) | ✗ (0) |
| 40 | Vadillo et al. (2013) | 2013 | Low (1) | High (3) | Medium (3) | ✗ (0) |
| 41 | Villarreal et al. (2014) | 2014 | Medium (2) | Low (1) | High (6) | ✗ (0) |
| 42 | Xu et al. (2017) | 2017 | ✗ (0) | Low (1) | Medium (3) | ✗ (0) |
| 43 | G. Yu et al. (2022) | 2022 | Medium (2) | Low (1) | High (5) | ✗ (0) |
| 44 | H. Q. Yu et al. (2017) | 2017 | Low (1) | High (3) | Low (1) | ✗ (0) |
| 45 | Zafeiropoulos et al. (2024) | 2024 | Medium (2) | High (3) | Medium (3) | High (2) |
| 46 | Zeshan et al. (2023) | 2023 | Low (1) | Medium (2) | Medium (3) | ✗ (0) |
| 47 | Zhang et al. (2014) | 2014 | Low (1) | Medium (2) | Low (2) | ✗ (0) |
| 48 | Zhou et al. (2022) | 2022 | ✗ (0) | ✗ (0) | Low (1) | ✗ (0) |
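The table reports four criterion scores per system with different maxima (3, 3, 7, and 2). The study itself does not define an aggregate score, so the following is only an illustrative sketch of how a reader might compare systems across criteria. The names `QualityRow`, `total_score`, and `normalized_score` are hypothetical, and the equal weighting of the four criteria in `normalized_score` is an assumption, not part of the assessment method. The sample rows are copied from the table above.

```python
from typing import NamedTuple

# Maximum score per criterion, taken from the table's column headers:
# Data & Devices, System & Components Development,
# Rigor of Evaluation, Accessibility of Research Outputs.
MAX_SCORES = (3, 3, 7, 2)


class QualityRow(NamedTuple):
    """One row of the quality assessment table (hypothetical container)."""
    system: str
    data_devices: int
    development: int
    evaluation: int
    accessibility: int


def criterion_scores(row: QualityRow) -> tuple:
    """Return the four scores, checking each against its maximum."""
    scores = row[1:]
    for score, maximum in zip(scores, MAX_SCORES):
        if not 0 <= score <= maximum:
            raise ValueError(f"{row.system}: score {score} outside 0..{maximum}")
    return scores


def total_score(row: QualityRow) -> int:
    """Raw sum of the four scores (out of 15). Assumption: not defined in the study."""
    return sum(criterion_scores(row))


def normalized_score(row: QualityRow) -> float:
    """Mean of per-criterion fractions, so each criterion weighs equally (assumption)."""
    scores = criterion_scores(row)
    return sum(s / m for s, m in zip(scores, MAX_SCORES)) / len(MAX_SCORES)


# A few rows copied from the table above.
ROWS = [
    QualityRow("Akhtar et al. (2022)", 0, 2, 1, 0),
    QualityRow("Ali et al. (2018)", 1, 2, 6, 0),
    QualityRow("Chatterjee et al. (2021)", 2, 3, 3, 2),
    QualityRow("Zhou et al. (2022)", 0, 0, 1, 0),
]

if __name__ == "__main__":
    for row in sorted(ROWS, key=normalized_score, reverse=True):
        print(f"{row.system}: total {total_score(row)}/15, "
              f"normalized {normalized_score(row):.2f}")
```

Normalizing by each criterion's maximum keeps the seven-point evaluation criterion from dominating a raw sum; other weightings are equally defensible and would change the ordering.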
