Abstract
Lately, the availability of massive amounts of data necessitates the adoption of modern representation techniques, such as knowledge graphs (KGs). KGs are typically constructed via automated procedures that utilize heterogeneous data sources. This, however, hinders the quality of the resulting large KGs, as they may contain contradictions, that is, sets of assertions that conflict with axioms typically set by human experts. In turn, classical description logics reasoners cannot be applied, as no useful inference results can be generated in the face of inconsistencies. Classical reasoners can still be used to retrieve the inconsistency explanations, but as the KG size grows the time required for this task increases significantly. To address the problem of reasoning with large and inconsistent KGs, we propose an open-source system that detects and fixes inconsistencies by splitting the KG into modules and then processing them in parallel to speed up the process. An empirical evaluation on two datasets illustrates the potential for effective inconsistency detection and fixing of large KGs.
Introduction
Knowledge graphs (KGs) are widely adopted globally, among leading organizations from diverse sectors such as finance, healthcare, and information and communications technology (Abu-Salih, 2021; Ji et al., 2022). KGs integrate data from various heterogeneous sources and provide a plethora of capabilities with respect to knowledge acquisition, fusion, and maintenance (Abu-Salih, 2021). A key aspect of KGs is the ability to apply formal reasoning tasks for querying, inferencing, and explainability (Lecue, 2020). At the same time, the construction of KGs often relies on automated or semi-automated processes for extracting knowledge and on combining data from diverse sources; these are prone to producing errors and, in turn, inconsistencies (Hofer et al., 2024; Huang et al., 2005; Paulheim, 2016). Unfortunately, the existence of KG inconsistency in practical applications prohibits the utilization of classical reasoners (Pensel & Turhan, 2018). Even a single contradiction within the KG obstructs the entailment of other consistent conclusions; from an inconsistent KG, anything can be entailed (Lembo et al., 2015). To overcome this obstacle, the KG can either be fixed, that is, particular assertions can be altered so that inconsistencies are corrected, or nonstandard reasoning techniques that tolerate the presence of inconsistencies can be incorporated (Ma et al., 2007). The detection and fixing of KG inconsistency is not always an easy task, especially when considering large KGs.
To begin, the detection of inconsistencies and the retrieval of their explanations is a complex procedure. Inconsistency explanations, also known as justifications, are the minimal sets of axioms and assertions that logically contradict each other (Horridge et al., 2009). Note that computing the explanations is a computationally demanding task, as it is affected by the exponential explosion in the case of KGs with large numbers of assertions. Thus, invoking an off-the-shelf description logics (DL) reasoner does not always guarantee that all the explanations will be retrieved within a reasonable time frame (Tran et al., 2020). Assuming that all KG inconsistencies have been detected, the next step would be to update the erroneous assertions with appropriate fixes. Instead of updating the wrong assertions, another approach would be to remove an assertion from each explanation and thereby resolve the corresponding inconsistency. This, however, would result in a loss of information. Instead, we seek to replace some entities found in the assertions of an explanation with different ones, in an attempt to recognize the erroneous ones and correct them. Existing work mainly performs this step with the help of human experts, who revise the inconsistency explanations and manually select appropriate assertion updates that can also be generated by appropriate algorithms (Arioua & Bonifati, 2018; Fan et al., 2019). Considering the size of real-world KGs, though, the number of alternative updates increases significantly, often rendering the manual selection process infeasible.
We develop a framework for detecting and fixing inconsistencies in large KGs. We present a method for the detection of inconsistency that is performed in a horizontally scalable manner. We split the KG into extended modules of individuals, that is, subgraphs that contain the assertions that refer (directly or indirectly) to a particular individual. Such modules are usually significantly smaller than the initial KG, so applying a classical DL reasoner to each module in parallel to obtain the explanations of inconsistency becomes a realistic venture. Additionally, after retrieving the inconsistency explanations, and inspired by the work of Arioua and Bonifati (2018), we calculate update-based fixes for each explanation and automatically select those that can lead to consistency by following different fix-selection strategies. To illustrate the applicability of our approach, we evaluate our system on two KG datasets, which contain inconsistencies of different kinds. Experimental results suggest that our approach can fix large and inconsistent KGs, so that reasoning can subsequently be performed.
Our contributions are the following:
We propose a unified framework for detecting and fixing inconsistencies in KGs.
We expand the language expressivity of a state-of-the-art parallel inconsistency detection approach from a fragment of OWL2 (Tran et al., 2020) to the full OWL2, by splitting the initial KG into extended modules of individuals.
We re-formulate the task of KG fixing by adopting standards of the Semantic Web and propose new fix-selection strategies for increased efficiency.
The implementation of our framework is openly available via an online repository.
This work is an extended version of a conference paper published in the SETN-2024 conference. In particular, we have enriched the background and related work section with references to paraconsistent logics, and a more thorough analysis of additional related works than what was present in the conference paper version. Also, we have included a theoretical discussion regarding the correctness of our method in detecting inconsistencies in more expressive languages. Furthermore, in this extended version, the algorithm and implementation details are described in more depth. Finally, we have included experiments on an additional real-world dataset.
The remainder of this article is structured as follows. In Section 2, we present necessary background notions and discuss the state-of-the-art in inconsistency detection and fixing. In Section 3.1, we explain the inconsistency detection part and provide details regarding the correctness of the full OWL2 expressivity support. Section 3.2 introduces KGFixer, which is our proposed method for update-based fixing of inconsistent KGs. Section 4 describes our implementation. In Section 5, we present the results of our experimental evaluation, and, finally, in Section 6, we conclude with future research directions.
Background and Related Work
A KG (Hogan et al., 2022) can be described, in DL terms, by a terminological component (TBox), containing the axioms that capture the schema, and an assertional component (ABox), containing the assertions about individuals.
Moving forward, an interpretation of a KG is a mapping of sets of individuals to each class, and of sets of pairs of individuals to relations. An interpretation is a model of a KG if it satisfies all of its axioms and assertions; a KG is consistent if it admits at least one model, and inconsistent otherwise.
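For reference, these notions admit the standard model-theoretic formalization used in DL; the block below is a sketch in our own notation, included only to make the informal description above precise.

```latex
% Standard DL semantics (notation ours, for reference only).
% An interpretation fixes a domain and maps each class to a set of domain
% elements, each relation to a set of pairs, and each individual to an element.
\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}}), \qquad
C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}, \qquad
r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}, \qquad
a^{\mathcal{I}} \in \Delta^{\mathcal{I}}

% I is a model of K = (T, A) iff it satisfies every axiom in T and every
% assertion in A; K is consistent iff such a model exists.
\mathcal{I} \models \mathcal{K} = (\mathcal{T}, \mathcal{A})
\iff
\mathcal{I} \models \alpha \ \text{for every} \ \alpha \in \mathcal{T} \cup \mathcal{A}
```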
To perform reasoning on inconsistent KGs, a different class of formal logics could be incorporated, termed paraconsistent logics. In this category, the work of Salhi and Sioutis (2023) introduces the concept of paraconsistent scenarios and provides implementations that are able to solve such problems. Paraconsistent approaches consider more than two truth values; for example, apart from “true” and “false,” there are also the “overdetermined” and “underdetermined” ones (Nescolarde-Selva et al., 2015). A thorough comparison of paraconsistent DLs can be found in Kamide (2013). Nevertheless, due to the increased complexity of paraconsistent reasoning compared to classical reasoning, we do not further explore such approaches, as our goal is to provide a scalable method that can tackle large inconsistent KGs.
Existing Work on Inconsistency Detection
Various approaches for detecting KG inconsistencies have been introduced in the literature. To begin, approximate methods can prove quite effective in highlighting erroneous ABox assertions; nevertheless, they cannot always detect every inconsistency that is present in the original KG. de Groot et al. (2021) propose a method that divides the KG into subgraphs. Each subject of a triple can be deemed a root node of a subgraph, which is then expanded by breadth-first search, following also the predicate and object expansions. Then, classical reasoners can be used to retrieve the inconsistency explanations on these smaller subgraphs. The expansion continues up to a maximum subgraph depth that is set by the user and effectively tunes the balance between scalability and completeness. The work of Paulheim and Stuckenschmidt (2016) operates on a series of ABoxes, for which particular feature vectors that attempt to indicate inconsistency are calculated. Based on these feature vectors, off-the-shelf classifiers are employed to predict whether an ABox can be considered inconsistent or not. In a similar manner, Paulheim and Bizer (2014) analyze the statistical properties of types in the subject and object positions of the KG triples and, based on a confidence score, categorize some of them as erroneous.
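To make the subgraph-expansion idea concrete, the following is a minimal Java sketch of a breadth-first expansion around a root node; the Triple record, the adjacency handling, and the maxDepth parameter are our own illustrative assumptions and do not reproduce the actual implementation of de Groot et al. (2021).

```java
import java.util.*;

// Illustrative only: breadth-first expansion of a subgraph around a root node,
// in the spirit of de Groot et al. (2021); names and depth handling are assumptions.
public class SubgraphExpansion {

    record Triple(String subject, String predicate, String object) {}

    static Set<Triple> expand(List<Triple> kg, String root, int maxDepth) {
        Set<Triple> subgraph = new HashSet<>();
        Set<String> visited = new HashSet<>(Set.of(root));
        Queue<String> frontier = new ArrayDeque<>(List.of(root));
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (Triple t : kg) {
                    if (t.subject().equals(node) || t.object().equals(node)) {
                        subgraph.add(t);
                        for (String neighbour : List.of(t.subject(), t.object())) {
                            if (visited.add(neighbour)) next.add(neighbour);
                        }
                    }
                }
            }
            frontier = next;
        }
        return subgraph;  // checked for consistency with a classical reasoner afterwards
    }
}
```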
In Senaratne (2023), a method based on anomaly detection is presented. The so-called Corroborative Path algorithm is introduced, by which alternative paths between entities in a triple are considered and a binary feature matrix is constructed. This, along with other types of features, is taken into account for highlighting anomalous (i.e., erroneous) triples using a one-class support vector machine. Following a different philosophy, Chen et al. (2021) rely on neural networks that operate upon embeddings of the TBox axioms and the ABox assertions. The calculated embeddings are such that the distance between the original data, combined with additional attention coefficients, constitutes an indicator of probable errors. Overall, the completeness that approximate methods sacrifice makes them inappropriate for our purposes, that is, to fix a KG so that all meaningful reasoning results can be obtained from it.
Turning our attention to sound and complete inconsistency detection methods, as, for example, in Horridge et al. (2009), we can see that their scalability issues make them inapplicable for our purposes. To overcome this shortcoming, we can either reduce the expressivity of the supported language or employ modularization techniques that effectively achieve scalability. More specifically, Meilicke et al. (2017) focus on the
Syntax of the Supported Fragment by Tran et al. (2020).
Finally, the approach of Zhu et al. (2022) incorporates the
Considering the analysis of the existing scalable methods for inconsistency detection, we adopt an approach that is similar to Tran et al. (2020), but instead of modules of individuals, we consider extended modules of individuals to also capture more expressive KGs, such as those expressed in OWL2 DL. In our approach, the initial KG is divided into modules of individuals, which are nevertheless extended in a guided manner, according to whether “suspicious” axioms or assertions for inducing inconsistency are included in the original KG. If
Having detected the inconsistencies that are present in a KG, the next step is to attempt to correct them, so that classical reasoners can perform inferencing without issues. KG correction can be performed in many ways, each having different merits and shortcomings. The inconsistencies of a real-world KG can be induced either solely by conflicting TBox axioms or by ABox assertions that do not follow the rules defined in the TBox. Regarding the case of an unreliable TBox, quite a few approaches have been introduced in the literature, but they require significant effort from human experts in order to resolve the errors. For example, in the method introduced by Heyvaert et al. (2019), the elements of the KG are ranked according to the number of inconsistencies that they take part in; the work of Nikitina et al. (2012) ranks them according to how many axioms are involved during their evaluation; and in Peñaloza (2019), the author discerns them based on the consequences of the possible update actions that are available. Interactive query-based approaches that guide human experts are also available, as, for example, in Shchekotykhin et al. (2012), Schekotihin et al. (2018), and Rodler (2022). In our work, we consider that the TBox is reliable and focus on the existence of wrong ABox assertions, since this kind of inconsistency fixing is more easily managed by automated algorithms and, because the ABox is typically much larger than the TBox, it is more difficult for humans to manage (Paulheim, 2016).
The fixing of the ABox can be performed either by deletion-based or update-based repairs. In the deletion-based category, the aim is to select the ABox assertions that induce the inconsistencies and simply remove them from the KG (Baader et al., 2022; Bonatti et al., 2011; Melo & Paulheim, 2020). However, this naturally results in a loss of information, which can be deemed a shortcoming of the respective methods. We thus focus on update-based repairs. In contrast to the previous category, these approaches do not remove assertions, but minimally alter parts of them with such values that the inconsistencies are resolved and as much of the original information as possible is retained (Arnaout et al., 2022; Pellissier Tanon et al., 2019). For example, the approach of Geerts et al. (2020) relies on the well-known chase algorithm (Benedikt et al., 2017) to apply update actions “on-the-fly” during the reasoning process. The fixes are applied incrementally and the exhaustive evaluation of all alternative update actions is avoided. The work of Arioua and Bonifati (2018) has introduced an interesting approach accompanied by guarantees, but for a slightly different domain where the TBox consists of certain types of database-oriented constraints, which do not account for the whole OWL2 DL expressivity. Also, Fan et al. (2019) introduce a graph-database-compatible method that is also based on the chase algorithm.
A differentiating factor among the existing KG fixing techniques is the method for selecting the most appropriate fix. Toward this aspect, a primary objective is the automated selection of alternative fixes based on particular reliability levels or minimality criteria (Ahmetaj et al., 2022; Baader et al., 2022). Moreover, various machine learning (ML) models have emerged as effective tools for detecting and correcting errors in KGs, often utilizing patterns or external knowledge sources (Chen et al., 2020; Paulheim & Bizer, 2014). While analytical approaches provide precise formal frameworks for generating fixes, ML-based methods rely on the KG itself, though ensuring consistency remains a challenge. The combination of these approaches shows potential for efficiently addressing inconsistencies, with some research exploring reasoning strategies to tolerate inconsistencies when resolution is not feasible (Bienvenu, 2020). In our work, we provide an open-source implementation of the update-based approach for user-guided KG fixing (Arioua & Bonifati, 2018), re-formulating the task to make it applicable to a variety of RDF KGs with an OWL2 DL TBox. In addition, we combine fixing with parallel inconsistency detection and propose two new fix-selection strategies, leading to improved KG-fixing performance.
Restoring KG Consistency
In this work, our focus is on a framework for detecting and fixing inconsistencies in large real-world KGs. As already mentioned in our analysis in the previous section, the attempt to cover more expressive languages, such as OWL2 DL, leads classical reasoning approaches to exponential explosion as the size of the KG increases. In what follows, we describe a method for restoring KG consistency, which relies on a parallel detection and fixing procedure operated upon extended modules of individuals. This splits the KG into smaller subgraphs on which reasoning can be performed both faster and in parallel. The module extension is such that the same inconsistency explanations are detected as if reasoning were performed on the initial KG. Having detected the inconsistencies, the next step is to apply fixes so that a reasoner can perform all relevant tasks without issues.
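To make the notion of a module of an individual concrete, the following OWL-API sketch collects the TBox together with the ABox assertions whose signature contains a given individual; it is a simplification under our own assumptions, and the guided extension step that produces the extended modules (described in the next subsection) is omitted.

```java
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.model.parameters.Imports;
import java.util.*;
import java.util.stream.Collectors;

// Simplified sketch: the plain module of an individual = TBox + all ABox axioms
// whose signature contains that individual. The guided extension to further
// individuals (extended modules) is not shown here.
public class ModuleExtractor {

    public static OWLOntology plainModule(OWLOntologyManager manager,
                                          OWLOntology kg,
                                          OWLNamedIndividual individual)
            throws OWLOntologyCreationException {
        Set<OWLAxiom> module = new HashSet<>(kg.getTBoxAxioms(Imports.INCLUDED));
        module.addAll(kg.getABoxAxioms(Imports.INCLUDED).stream()
                .filter(ax -> ax.getIndividualsInSignature().contains(individual))
                .collect(Collectors.toSet()));
        return manager.createOntology(module);
    }
}
```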
Parallel Inconsistency Detection
Modern DL reasoners (e.g., HermiT [Glimm et al. 2014], Pellet [Sirin et al. 2007], etc.) provide the means to calculate the inconsistency explanations
To further explain the limitations of the plain modules of individuals against our approach for the extended modules, we employ the KG example of Figure 1. In this case, the TBox is composed of two axioms: the first declares class equivalence between

An Example of Inconsistency not Captured by Tran et al. (2020).
According to the language considered in Tran et al. (2020), the existential quantifier
To address this issue, we have developed a method that extends the modules of individuals, nevertheless not exhaustively by following all existing links to other entities in the original KG, but instead guided by particular axiom types that are capable of inducing inconsistencies that are left undetected by other approaches. In our method, which is given in Algorithm 1, a module of individual
We can construct the set of “suspicious”
Additionally, assertions binding together
are also indicators of when to combine modules of individuals into a single extended module (Algorithm 1, line 8). We include any such linked individual
Note that, in contrast to the
We now discuss the soundness and completeness properties of our approach. Regarding soundness, recall that an inconsistency explanation is a minimal subset of the KG’s axioms and assertions. Assume that Algorithm 1 could detect an explanation
For completeness, we need to show that every inconsistency explanation
Now, in the more complex case of chain relationships,
As we have already mentioned, the extended modules can be overlapping, for example, we can have both
KGs often contain erroneous triples that lead to inconsistencies against their semantic schema. Fixing such inconsistencies involves actions that edit the data of the KG, such as the removal and/or insertion of triples. Update-based repairing approaches focus on updating the erroneous triples, by modifying the object or subject, in order to minimize information loss. Identifying such sets of actions that lead to consistency (termed as fixes) is a challenging task, in particular, considering that the same triple can be involved in more than one inconsistency and that a single inconsistency typically involves several triples. In addition, the task becomes even more challenging when considering that some triples in a KG can be the result of entailment, which makes their removal even more complex (Arioua & Bonifati, 2018; Arnaout et al., 2022; Melo & Paulheim, 2020).
Aligned with the work of Arioua and Bonifati (2018), our method performs update-based fixing of inconsistent KGs. As already stressed, we consider a reliable TBox, and fixing is achieved by updating conflicting ABox assertions. In particular, a knowledge base schema is supported that consists of specific types of dependencies, namely contradiction-detecting dependencies (CDDs) expressing class disjointness, and tuple-generating dependencies (TGDs) expressing existential rules. This semantic schema might be less expressive than OWL2 DL, as it does not support complex ontological constructs, disjoint union, and data-type reasoning. In our implementation of the method, however, we reformulate the task by adopting Semantic Web standards and enable the method to support respective OWL2 DL schemas, namely the particular OWL2 DL subset defined later in Section 4. The notion of update-based repair (Wijsen, 2005) for inconsistency fixing refers to cases where a fact can be updated in order to fix an inconsistency, as opposed to deletion-based repair, where facts are removed from the KG. This way, update-based repair preserves as much information as possible. In this direction, Arioua and Bonifati (2018) introduce the notion of a position fix, which can update an entity in a specific position (i.e., subject or object in a triple) of a fact, and propose a method that, given an inconsistent KG and some immutable positions in it, generates all the alternative position fixes that can lead to consistency. Such alternative fixes are then presented to a user, for example, a domain expert, in the form of multiple-choice questions, and are resolved sequentially, gradually leading to consistency.
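The following sketch illustrates, in OWL-API terms, the core loop of generating and filtering alternative position fixes; it rests on our own simplifying assumptions: only the object position of an object-property assertion is updated, candidate replacements are drawn from the individuals already in the module, and a candidate is kept as sound when a HermiT consistency check succeeds after the tentative swap. It is an illustration of the idea, not the full method of Arioua and Bonifati (2018).

```java
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import java.util.*;

// Illustrative soundness filtering: try alternative objects for a conflicting
// object-property assertion and keep only those swaps after which HermiT reports
// the (module) ontology to be consistent. Candidate generation here (all
// individuals in the ontology) is a deliberate simplification.
public class PositionFixFilter {

    public static List<OWLObjectPropertyAssertionAxiom> soundObjectFixes(
            OWLOntologyManager manager,
            OWLOntology module,
            OWLObjectPropertyAssertionAxiom conflicting) {

        OWLDataFactory df = manager.getOWLDataFactory();
        List<OWLObjectPropertyAssertionAxiom> sound = new ArrayList<>();

        for (OWLNamedIndividual candidate : module.getIndividualsInSignature()) {
            if (candidate.equals(conflicting.getObject())) continue;
            OWLObjectPropertyAssertionAxiom updated = df.getOWLObjectPropertyAssertionAxiom(
                    conflicting.getProperty(), conflicting.getSubject(), candidate);

            manager.removeAxiom(module, conflicting);   // apply the tentative position fix
            manager.addAxiom(module, updated);
            OWLReasoner reasoner = new ReasonerFactory().createReasoner(module);
            if (reasoner.isConsistent()) sound.add(updated);
            reasoner.dispose();
            manager.removeAxiom(module, updated);       // roll back before trying the next one
            manager.addAxiom(module, conflicting);
        }
        return sound;  // e.g., presented to a user as the options of a multiple-choice question
    }
}
```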
Regarding the generation of alternative update actions, the original method of Arioua and Bonifati (2018) starts with all joint positions on the axioms involved in the body of a CDD that is fired. For example, if a CDD is fired by the ABox axioms
Such an approach relies on the notion of
Implementation Details
Since there are no publicly available implementations of the methods that inspired us (Arioua & Bonifati, 2018; Tran et al., 2020), we developed our codebase from scratch. We employed the OWL-API library, which supports OWL2 DL, and developed a service in Java that, given a KG (i.e., in terms of DL, a TBox and an ABox), (a) extracts the modules of the individuals and (b) invokes the HermiT (Glimm et al., 2014) reasoner to analyze the subgraphs, checks for inconsistency, and returns the explanations where applicable. We chose HermiT, as it is shown to perform better than other reasoners in Horrocks et al. (2012), but any OWL-API-compatible reasoner can be used without modifying the approach. The reasoning tasks for each module are delegated to different CPU cores so as to parallelize the time-consuming process of inconsistency detection in large KGs.
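A condensed sketch of this detection loop is given below, reusing the hypothetical ModuleExtractor from the earlier sketch; the thread-pool sizing, the per-task managers, and the way results are gathered are our own simplifications, and explanation retrieval is omitted.

```java
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import java.io.File;
import java.util.*;
import java.util.concurrent.*;

// Simplified parallel detection: one consistency check per module of individual,
// delegated to a fixed thread pool. Extended-module construction and explanation
// retrieval are omitted for brevity; the source ontology is only read by the tasks.
public class ParallelDetection {

    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology kg = manager.loadOntologyFromOntologyDocument(new File(args[0]));

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        Map<OWLNamedIndividual, Future<Boolean>> checks = new HashMap<>();

        for (OWLNamedIndividual ind : kg.getIndividualsInSignature()) {
            checks.put(ind, pool.submit(() -> {
                // Each task gets its own manager, module ontology, and reasoner instance.
                OWLOntologyManager local = OWLManager.createOWLOntologyManager();
                OWLOntology module = ModuleExtractor.plainModule(local, kg, ind);
                OWLReasoner reasoner = new ReasonerFactory().createReasoner(module);
                boolean consistent = reasoner.isConsistent();
                reasoner.dispose();
                return consistent;
            }));
        }

        for (Map.Entry<OWLNamedIndividual, Future<Boolean>> e : checks.entrySet()) {
            if (!e.getValue().get()) {
                System.out.println("Inconsistent module for individual: " + e.getKey());
            }
        }
        pool.shutdown();
    }
}
```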
Next, toward fixing the KG, we introduce our method as an open-source implementation of the MCD-fix strategy (Arioua & Bonifati, 2018). In addition, we re-formulate the problem of fixing a KG by adopting Semantic Web standards used in several KGs, namely RDF and OWL2 DL. Our method is implemented in Java and utilizes a standard reasoner, namely HermiT, for consistency checking. In particular, given a KG consisting of an ABox, in the form of RDF triples, and a TBox, in the form of an OWL2 DL schema, our method updates the ABox assertions of the KG, repairing certain inconsistencies detected by the reasoner. These include inconsistencies due to entity confusion (e.g.,
Moving into the Semantic Web formalization, we consider a TBox that consists of OWL2 DL Class Axioms. As OWL2 DL is more expressive than TGDs and CDDs alone, our method is restricted to a particular subset of OWL2 DL, defined by:
The use of simple Data Ranges and Classes, but not Data Range and Class Expressions consisting of constructs. The use of any OWL2 Axiom type except Assertions of the types Assertions of the type
The TBox is considered reliable and only the ABox is updated. The ABox can consist of any OWL2 DL ABox type axiom, but:
The main steps of our KG fixing method are presented in Algorithm 1. After an initial
All alternative actions to a multiple-choice question should have the potential to lead to a consistent KG. That is, they should retain the KG
The Action generation and Soundness filtering steps can be computationally demanding. The Action generation step is affected by the number of alternative entities to consider, and the Soundness filtering step by the number of alternative actions and the time required for the
Empirical Analysis
In this section, we apply our method for detecting and fixing inconsistencies in two datasets from the existing literature. First, we showcase the inconsistency detection capabilities of our approach that split the KG into extended modules of individuals for TBoxes of increased language expressivity. Next, we proceed with applying the KG fixing strategies that we put forward and examine the results.
Experimental Setup
For our experiments, we employ the KG from the work of Bienvenu et al. (2016), which constitutes a variant of the Lehigh University Benchmark (LUBM; Guo et al., 2005) with additional disjointness axioms to induce inconsistencies of different types. Initially, the ABox is consistent. To facilitate reasoning tasks, controlled inconsistency is introduced in the ABox by adding additional assertions:
Concept assertions: Each individual has a probability
Role assertions: The probability of contradicting each role assertion for an individual is
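As a rough illustration of how such a contradicting concept assertion could be injected, the sketch below asserts, with some probability, membership in a class that the TBox declares disjoint with one of the individual's existing types; the probability parameter p and the class-selection policy are placeholders and do not reproduce the exact procedure of Bienvenu et al. (2016).

```java
import org.semanticweb.owlapi.model.*;
import java.util.Random;

// Illustration only (not the procedure of Bienvenu et al., 2016): with probability p,
// assert that an individual belongs to a class that the TBox declares disjoint with
// one of its existing types, thereby creating a controlled contradiction.
public class InconsistencyInjector {

    static void maybeContradict(OWLOntologyManager manager, OWLOntology kg,
                                OWLNamedIndividual ind, double p, Random rnd) {
        if (rnd.nextDouble() >= p) return;  // p is a placeholder parameter
        OWLDataFactory df = manager.getOWLDataFactory();
        for (OWLClassAssertionAxiom typeAxiom : kg.getClassAssertionAxioms(ind)) {
            OWLClassExpression type = typeAxiom.getClassExpression();
            if (type.isAnonymous()) continue;
            // Any class declared disjoint with the current type is a valid "contradictor".
            for (OWLDisjointClassesAxiom disjoint : kg.getDisjointClassesAxioms(type.asOWLClass())) {
                for (OWLClassExpression other : disjoint.getClassExpressionsMinus(type)) {
                    if (!other.isAnonymous()) {
                        manager.addAxiom(kg, df.getOWLClassAssertionAxiom(other, ind));
                        return;  // inject at most one contradiction per selected individual
                    }
                }
            }
        }
    }
}
```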
Furthermore, the LUBM KGs used in our experiments have been slightly modified to include types of inconsistencies that are left undetected by the approach of Tran et al. (2020). In particular, similarly to the example of Figure 1, we added an
To further illustrate our method’s applicability, we also incorporate an additional dataset, the Food Inspection KG, as introduced in Bienvenu and Bourgaux (2022). This KG contains data resulting from restaurant inspections in the areas of New York and Chicago, and it has a smaller TBox than the LUBM (24 axioms, as opposed to the 1085 of the LUBM TBox). Also, the inconsistencies that are present in the original KG mainly contain
The evaluation is performed on subgraphs of the aforementioned KGs of gradually increasing size to calculate (a) the time required for the detection and (b) the total number of detected inconsistencies for the first batch of experiments, and to examine the timely correction of inconsistencies for each fixing strategy for the second batch. The experiments were executed on a computer with Linux OS, equipped with an Intel Xeon E5-2630 2.60 GHz (24-cores) CPU and 256 GB of RAM.
Results on Inconsistency Detection
The results from the original LUBM dataset are shown in Table 2. We can see that even for ABox sizes of 10K and 20K assertions, the “monolithic” approach required more than 24 hours to produce any results. On the other hand, the two parallel approaches completed their execution significantly faster, without significant deviations in execution time. The number of modules of individuals extracted for parallel evaluation was equal to the number of individuals reported for each subgraph in the corresponding tables.
Results of the Inconsistency Detection in the LUBM Dataset (Bienvenu et al., 2016).
Note. Reasoning timeout is set to 24 hours. The TBox size is fixed to 1085. The first column presents the number of ABox assertions, the number of individuals, and the number of inconsistencies that are included in each dataset.
Since in the real world, we expect to encounter more complex inconsistencies, we also performed tests on a slightly altered LUBM TBox that contained the extra axioms we described earlier in the previous subsection. The results are shown in Table 3. The monolithic approach of applying the HermiT reasoner to the whole KG became intractable quite quickly, as it managed to produce results only for ABox sizes of 2.5K and 5K triples. On the contrary, the module splitting techniques were able to run for up to 20K triples producing results quite fast.
Results of the Inconsistency Detection in the LUBM Dataset With a Slightly Modified TBox.
Note. Reasoning timeout is set to 24 hours. The TBox size is fixed to 1085. The first column presents the number of ABox assertions, the number of individuals, and the number of inconsistencies that are included in each dataset.
Importantly, we can see that the proposed method for splitting the KG into extended modules was capable of detecting all the inconsistencies, in contrast to the approach of Tran et al. (2020), which overlooked particular inconsistency types. For ABox sizes of 2,500 and 5,000 triples, the proposed method detected all the inconsistencies that could be found without splitting the KG, while Tran et al. (2020) were not able to detect any of them. For ABox sizes of 10K and 20K triples, our method detected 2 and 21 additional inconsistencies, respectively, which were left undetected by Tran et al. (2020). The size of the modules extracted by these two approaches differed, resulting in a slight increase in the average module size and its standard deviation for the extended module splitting method. Note that the module size includes TBox axioms, which in these experiments were 1085. Overall, the execution time did not differ significantly between the two parallel implementations, but we anticipate that the extended modules approach will be slightly slower for larger KGs, as a result of the higher number of triples in the modules that are examined.
The evaluation of the Food Inspection dataset led to similar results (Table 4). The monolithic approach became ineffective for 20K or more assertions, and the plain modules of individuals were not sufficient to capture any of the inconsistencies that exist in this dataset. On the contrary, our method was able to detect all inconsistencies and did so in a reasonable amount of time. Again, we can see that the size of the extended modules was larger due to the extensions to additional individuals, as dictated by the existence of

Module Sizes for Different Knowledge Graph (KG) Subsets.
Results of the Inconsistency Detection in the Food Inspection Dataset (Bienvenu & Bourgaux, 2022).
Note. Reasoning timeout is set to 24 hours. The TBox size is fixed to 24. The first column presents the number of ABox assertions, the number of individuals, and the number of inconsistencies that are included in each dataset.
The execution times of the parallel approaches presented in Tables 2 to 4 can be further improved, provided that the number of parallel processors is increased; in our case, the machine used for the experimental evaluation had only 24 cores available, while, for example, in the LUBM 20K we have 12,514 modules of individuals that could be processed in parallel. The slightly faster execution time of our method against that of Tran et al. (2020) in cases of smaller datasets is a result of an optimization step that examines whether an extended module has already been processed and, if so, skips it. The extended modules with base individuals
In the next part of our experimental evaluation, we applied our method for fixing the KGs, using the baseline strategy MCD-fix and the two optimization strategies Trivial-fix and Rank-fix. In the absence of a user to decide on the fix selection, one of the generated sound actions was selected randomly, as in the experiments of Arioua and Bonifati (2018). The same holds for the Rank-fix method, in case of ties between alternative actions.
We also performed experiments utilizing the extended modules of individuals method we propose for inconsistency detection, as well as the method of Tran et al. (2020), as described earlier in Section 3.1. The experiments presented in the work of Arioua and Bonifati (2018) considered datasets with up to 1,000 assertions. We experiment with subsets of the original LUBM dataset, consisting of 2,500, 5,000, 7,500, 10,000, and 12,500 assertions. For the Food Inspection dataset, we employed subgraphs of size 2,500, 5,000, 10,000, and 20,000 assertions. We introduce a time threshold for fixing the KG of 1 minute and 5 minutes for the LUBM and the Food Inspection datasets, respectively.
The main aim of this first series of experiments was to investigate the applicability of MCD-fix and reveal the benefits of introducing the proposed strategies Trivial-fix and Rank-fix, by invoking the reasoner on the whole KG, without incorporating any of the parallel inconsistency detection methods. The benefit of these strategies is shown in Figure 3. For the LUBM dataset (Figure 3(a)), all the experiments with the MCD-fix baseline exceeded the 1-minute time threshold, even for the smallest subset of the dataset. On the other hand, all experiments with the Rank-fix and the Trivial-fix strategies managed to repair the KG with up to 7,500 assertions within the time limit and only exceeded it for larger subsets. For the Food Inspection dataset (Figure 3(b)), the behavior of the Trivial-fix and Rank-fix strategies was similar, but we can see that the MCD-fix strategy managed to complete the experiments in time in 100% of the runs for the 2,500-assertion KG and in 80% of the runs for the 5,000-assertion version. The above suggests that the MCD-fix baseline strategy is not the best option, whereas the Trivial-fix and Rank-fix alternatives were shown to be more effective, especially in the case of the LUBM dataset.

Results of the Monolithic Approach for Fixing, Without Splitting the Knowledge graph (KG) into Modules. (a) LUBM Dataset: The Ratio of Timely Completed Experiments (i.e., not Exceeding the 1-Minute Limit) per Strategy Across Dataset Subsets of Different Sizes. All Completed Experiments Lead to an Updated KG that is Consistent. MCD-fix Values are Zero, as this Method did not Manage to Complete before the Timeout. and (b) Food Inspection Dataset: The Ratio of Timely Completed Experiments (i.e., not Exceeding the 5-Minute Limit) per Strategy Across Dataset Subsets of Different Sizes. All Completed Experiments Lead to an Updated KG that is Consistent.
Moving forward with the evaluation of the parallel methods, the results when splitting the LUBM KG into plain modules of individuals are presented in Figure 4. The MCD-fix baseline clearly benefits from splitting into modules. With the plain modules of individuals (Figure 4(a)), MCD-fix managed to complete the repairing process under the 1-minute limit for all experiments with fewer than 10,000 assertions and for 40% of the experiments with 10,000 assertions. In addition, it managed to repair the KG, reaching global consistency (Figure 4(b)), in a ratio of experiments ranging from 60% in the smallest subgraphs to 20% in the subgraph of 10,000 assertions. For the Trivial-fix strategy, we can see in Figure 4(a) that the ratios of timely completed experiments were quite similar to those without module splitting, apart from the fact that it could also complete the procedure in 20% of the runs for 10,000 assertions. It is worth noting that for this fixing strategy, all timely attempts led to global consistency (Figure 4(b)). For Rank-fix, on the other hand, the effect of splitting into modules is more complex. First, a negative effect is observed, as the overhead of module splitting results in some overtime experiments for the subset of about 7,500 assertions, whereas no overtime experiments were observed without module splitting. In addition, several timely completed experiments with Rank-fix failed to lead to global consistency. The failure of some parallel experiments to reach global consistency occurs because we apply the fixing method on each module locally, without globally checking the KG at each fixing step, as described in the original fixing method. In a real-world scenario, where the update actions are selected in a meaningful way, for instance, by users who cross-check each chosen action against some database or other external knowledge, introducing fixes that create new conflicts would be rather unexpected. In these experiments, however, where the update actions for each module are chosen randomly and independently, there is no guarantee that the combination of the independently fixed modules achieves global consistency. In such cases, the algorithm needs to be re-run.

Splitting the LUBM Knowledge Graph (KG) Into Plain Modules of Individuals for Fixing the Inconsistencies. (a) The Ratio of Timely Completed Experiments (i.e., not Exceeding the 1-Minute Limit) per Strategy Across Dataset Subsets of Different Sizes. In This Parallel Configuration Timely Completion does not Guarantee Consistency of the KG. and (b) The Ratio of Experiments that were (i) Timely Completed (i.e., not Exceeding the 1-Minute Limit) and (ii) Led to the Consistency of the KG. The Ratios are Presented per Strategy Across Dataset Subsets of Different Sizes.
For larger subsets, on the other hand, a benefit is observed, as splitting into modules allowed some experiments with about 10,000 assertions or more to be completed in time (Figure 4(b)), which was not possible without module splitting. In particular, the Rank-fix strategy managed to lead to global consistency for 20% of the experiments with about 10,000 assertions. In addition, when splitting into modules, Rank-fix in some cases managed to timely complete more experiments even compared to Trivial-fix. This observation is attributed to Rank-fix choosing more targeted fixes that involve fewer update actions. We omit the assessment of the method of Tran et al. (2020) on the Food Inspection dataset, since no inconsistencies could be detected, and thus none could be fixed.
Finally, we present the results of our proposed approach for parallel KG inconsistency detection and fixing based on the extended modules of individuals. Figure 5 presents the results on the LUBM dataset, while Figure 6 presents those for the Food Inspection dataset. We can see that with the extended modules in the LUBM (Figure 5(a)), MCD-fix benefited less from splitting the KG, as it completed only some of the experiments of up to 7,500 assertions. This happens because (a) more time is required for extending the module, and (b) the reasoning process on the (slightly larger) extended modules is inherently more complex. Also, only 40% of the experiments with about 5,000 assertions led to global consistency (Figure 5(b)). Nevertheless, recall that without KG splitting none of the MCD-fix experiments managed to complete under the time limit of 1 minute (Figure 3(a)). Regarding Trivial-fix and Rank-fix, we can see that in the case of the extended modules, the performance gradually drops as the size of the dataset grows. In Figure 5(b), we can see that Rank-fix was more effective in terms of achieving consistency than the two other strategies for 5,000 and 7,500 assertions. For 10,000 and 12,500 assertions, however, all strategies reached the timeout threshold.

Splitting the LUBM Knowledge Graph (KG) Into Extended Modules of Individuals for Fixing the Inconsistencies. (a) Ratio of Timely Completed Experiments (i.e., not Exceeding the 1-Minute Limit) per Strategy Across Dataset Subsets of Different Sizes. In This Parallel Configuration Timely Completion does not Guarantee Consistency of the KG. and (b) The Ratio of Experiments that were (i) Timely Completed (i.e., not Exceeding the 1-Minute Limit), and (ii) Led to the Consistency of the KG. The Ratios are Presented per Strategy Across Dataset Subsets of Different Sizes.

Splitting the Food Inspection Knowledge Graph (KG) Into Extended Modules of Individuals During Repairing. (a) The Ratio of Timely Completed Experiments (i.e., not Exceeding the 5-Minute Limit) per Strategy Across Dataset Subsets of Different Sizes. In This Parallel Configuration Timely Completion does not Guarantee Consistency of the KG. and (b) The Ratio of Experiments that were (i) Timely Completed (i.e., not Exceeding the 5-Minute Limit) and (ii) Led to the Consistency of the KG. The Ratios are Presented per Strategy Across Dataset Subsets of Different Sizes.
With regard to the Food Inspection dataset, we can see in Figure 6 that Trivial-fix managed to complete all experiments in a timely manner and achieved KG consistency in the cases of 5,000 and even of 20,000 assertions (Figure 6(b)). Once again, the MCD-fix strategy was the least effective, while Rank-fix had a slight benefit with respect to timely completion for 5,000 assertions and to final KG consistency in the case of 2,500 assertions.
In our experimental evaluation, we compared a monolithic approach for detecting inconsistencies against two module-based approaches that can process the KG in parallel and in a more time-effective manner. The monolithic approach becomes intractable for datasets consisting of more than 5K triples, but the parallel methods that split the KG into smaller modules completed their execution significantly faster. Our approach incorporating extended modules of individuals outperforms the state-of-the-art approach by detecting all inconsistencies, while maintaining reasonable execution times. For fixing inconsistencies, we evaluated three strategies: the MCD-fix baseline and the two strategies we put forward, Trivial-fix and Rank-fix. MCD-fix was found inefficient without parallelization, as it did not manage to fix the KGs within the provided time limits (1 minute for LUBM, 5 minutes for Food Inspection). Trivial-fix and Rank-fix performed significantly better, managing to fix KGs of up to 7,500 assertions. The modular approaches improved MCD-fix, achieving the fixing of the KGs in more cases. The Rank-fix strategy may struggle due to the module-splitting overheads, but it is able to apply more targeted fixes than the other strategies for larger datasets. While our module-splitting approach improves scalability, additional computational resources are necessary for handling even larger KGs.
Conclusions and Future Work
We presented an open-source system for detecting and fixing inconsistencies in KGs. To perform the detection in parallel and to account for KGs described in more expressive languages, we put forward the notion of extended modules of individuals. According to this, plain modules of individuals are merged into combined extended modules if particular kinds of axioms and assertions linked to the base individual may induce inconsistencies, for example, via chain relations. Subsequently, each extended module is processed in parallel for detecting and fixing any inconsistency, thus scaling horizontally to KGs of large sizes. Regarding fixing, we reformulated the corresponding task of update-based repairs by adopting Semantic Web standards and by introducing the necessary modifications that make it applicable to a variety of RDF KGs with an OWL2 DL TBox. We evaluated our approach on two well-known cases of inconsistent KGs, and the results show that it is able to restore consistency even within narrow time frames under strict timeout configurations. Our evaluation results suggest that combining the fixing strategies we put forward with splitting large KGs into modules is a promising direction for repairing even larger KGs.
In the absence of user input, the random selection among sound fixes can lead to different fixed KGs after each execution. In this direction, we aim to investigate the development and use of ML models for ranking the alternative position fixes based on their plausibility, as a means to minimize the need for input from domain experts. This can be formulated as a link-prediction/fact-validation task where each update action is one of many potential links between entities or facts, considering features based on relation paths in the KG (Lao & Cohen, 2010; Paulheim & Gangemi, 2015), or embedding-based methods that develop and exploit distributed representations of entities and relations (Bordes et al., 2013; Demir et al., 2021). In addition, the incorporation of large language models for generating or selecting among alternative possible fixes is another promising direction (Frey et al., 2024). Moreover, loading web-scale KGs into the computer’s memory might be impossible in most cases, due to their extreme size. To this end, work is in progress toward an implementation that interoperates with a separate triplestore (e.g., Virtuoso) using SPARQL queries for the retrieval of the extended modules of individuals that are to be processed in-memory and in parallel.
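As a rough indication of how such a retrieval could look, the sketch below uses Apache Jena (an assumption on our part) to issue a CONSTRUCT query against a hypothetical SPARQL endpoint and fetch only the triples mentioning a given individual; it is not the in-progress implementation itself.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;

// Hypothetical sketch: fetch all triples mentioning a given individual from a
// remote triplestore (e.g., Virtuoso) so that only the module, not the full KG,
// is loaded into memory. Endpoint URL and query shape are assumptions.
public class ModuleFetcher {

    public static Model fetchModule(String endpoint, String individualIri) {
        String query =
                "CONSTRUCT { ?s ?p ?o } WHERE { " +
                "  ?s ?p ?o . " +
                "  FILTER (?s = <" + individualIri + "> || ?o = <" + individualIri + ">) " +
                "}";
        try (QueryExecution exec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            return exec.execConstruct();  // the returned model is the retrieved module
        }
    }
}
```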
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the ENEXA project, funded by the European Union’s Horizon 2020 research and innovation programme, under grant agreement No. 101070305.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
