
Abstract

In the context of the Semantic Web regarded as a Web of Data, research efforts have been devoted to improving the quality of the ontologies that are used as vocabularies to enable complex services based on automated reasoning. From various surveys it emerges that many domains would require better ontologies that include non-negligible constraints for properly conveying the intended semantics. In this respect, disjointness axioms are representative of this general problem: these axioms are essential for making the negative knowledge about the domain of interest explicit, yet they are often overlooked during the modeling process (thus affecting the efficacy of the reasoning services). To tackle this problem, automated methods for discovering these axioms can be used as a tool for supporting knowledge engineers in modeling new ontologies or evolving existing ones. The current solutions, either based on statistical correlations or relying on external corpora, often do not fully exploit the terminology. Stemming from this consideration, we have been investigating alternative methods to elicit disjointness axioms from existing ontologies based on the induction of terminological cluster trees, which are logic trees in which each node stands for a cluster of individuals which emerges as a sub-concept. The growth of such trees relies on a divide-and-conquer procedure that assigns, for the cluster representing the root node, one of the concept descriptions generated via a refinement operator and selected according to a heuristic based on the minimization of the risk of overlap between the candidate sub-clusters (quantified in terms of the distance between two prototypical individuals). Preliminary works have shown some shortcomings that are tackled in this paper.
To tackle the task of disjointness axioms discovery we have extended the terminological cluster tree induction framework with various contributions: 1) the adoption of different distance measures for clustering the individuals of a knowledge base; 2) the adoption of different heuristics for selecting the most promising concept descriptions; 3) a modified version of the refinement operator to prevent the introduction of inconsistency during the elicitation of the new axioms. A wide empirical evaluation showed the feasibility of the proposed extensions and the improvement with respect to alternative approaches.

Keywords

Disjointness learning conceptual clustering trees inconsistency prevention distance measure

1. Introduction

In the perspective of the Semantic Web (SW) as a Web of Data, a plethora of datasets are constantly published and connected to others in the form of Linked Data, along a standard data model and based on schemata formalized as Web ontologies [16]. In this scenario, many important services have been devised and deployed for exploiting and enriching these knowledge bases in a variety of tasks, such as classification, query answering, population and enrichment, reconciliation (instance matching), and consistency checking.

The effectiveness of the inference services that implement these tasks as well as further services that can be built upon them (e.g. statistical inference) is strictly dependent on the quality of the ontologies, namely on how precisely (and exhaustively) their axioms convey the intended semantics of the underlying domains. As ontologies are represented through standard languages ultimately based on Description Logics (DLs) [2], an open-world semantics is generally adopted as suitable for such a Web-scale scenario, as opposed to the complete knowledge assumption backing the semantics of other contexts (e.g. relational databases and logic programs). Checking disjointness is one of the key reasoning tasks for DL knowledge bases together with satisfiability, subsumption, and equivalence tests (they can be reduced to one another). When reasoning under open-world semantics on tasks involving individuals, e.g. instance checking and retrieval queries, it often happens that the truth of assertions cannot be ascertained, owing to the inherent incompleteness of the knowledge bases (for technical details see [2], Ch. 2). This issue also affects the other services mentioned above that are built upon them. For example, in tasks aimed at enriching knowledge bases, detecting the possible introduction of conflicting assertions (e.g. cases of inconsistency) is very important, as this might trigger further repair-actions to reconcile the possible causes. Hence disjointness axioms are essential to detect such cases.

Fig. 1.

A simple concept hierarchy modeling the agents in a corporate domain.

To clarify this point, let us consider the case of a simple corporate knowledge base fragment (whose hierarchy is depicted in Fig. 1) with two sibling subconcepts, namely Person and Robot. Suppose also that the fragment includes an assertion Robot(BotSmith), stating that the individual BotSmith is a robot (working in one factory of the company, etc.), but later the noisy assertion Person(BotSmith) may be added by mistake (e.g. because it is lexically similar to Person(BobSmith)) or it may be inferred from the assumption of other (inaccurate) facts, e.g. an assertion welded(BotSmith,Piece123) and a flawed fact like domain(welded,Worker) given that subClassOf(Worker,Person). Without an explicit axiom stating that a robot is not a person, this inconsistency with respect to the intended interpretation of the knowledge base cannot be detected. Note that such an axiom would extend its effect to the whole hierarchies of subconcepts of the two concepts.
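The scenario can be mimicked with a toy check. The sketch below is only an illustration under strong simplifying assumptions: memberships are plain Python sets, only told subclass links are followed, the common superconcept name `Agent` and the helpers `types_of` and `clashes` are hypothetical, and the type `Worker` is written directly as if it had already been derived from the flawed domain axiom. A real DL reasoner is far more general.

```python
# Toy illustration (not a DL reasoner): told subclass links and one disjointness axiom.
SUBCLASS_OF = {"Worker": "Person", "Person": "Agent", "Robot": "Agent"}
DISJOINT = {frozenset({"Person", "Robot"})}  # "a robot is not a person"


def types_of(asserted):
    """Close a set of concept names under the told subclass links."""
    closed = set()
    for name in asserted:
        while name is not None and name not in closed:
            closed.add(name)
            name = SUBCLASS_OF.get(name)
    return closed


def clashes(asserted, disjoint=DISJOINT):
    """Return the disjoint pairs that both occur among the derived types."""
    closed = types_of(asserted)
    return [pair for pair in disjoint if pair <= closed]


# Robot(BotSmith) is asserted; Worker(BotSmith) stands for the type derived
# from welded(BotSmith, Piece123) and the flawed domain(welded, Worker).
botsmith = {"Robot", "Worker"}
print(clashes(botsmith))                  # the Person/Robot pair is reported
print(clashes(botsmith, disjoint=set()))  # without the axiom nothing is reported
```

Because the check runs on the closure under subclass links, the single axiom on Person and Robot also catches the clash that originates from the subconcept Worker.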

However, in the design of various popular ontologies currently in use, the introduction of disjointness axioms as required by concept modeling methodologies (see [2], Ch. 10) has been neglected. As a result, they provide only a rather approximate representation of the domains, failing to capture all of the underlying constraints or making a complete knowledge assumption that distorts the intended semantics by admitting unintuitive cases. This lack of modeling accuracy was testified by a survey [30] in which only 97 out of a total of 1275 considered ontologies were found to include disjointness axioms. A possible reason for such an issue may be the inexperience with the language constructs, often leading users to overlook disjointness during the definition of domain axiomatizations [29]. As a result, while a growing number of datasets have been published joining the Linked Data cloud over the years, this issue is still ignored: at the time of writing this article, only 7 out of 9960 knowledge bases (about 0.07%) include this kind of axiom.¹

¹ This can be checked at LodStats: http://stats.lod2.eu.

Another cause for this issue may lie in the context-dependent nature of the notion of disjointness, which is consequently perceived in different ways [27]. For instance, considering the previous case, it may be assumed that the same individual cannot be both a Worker and a Freelance within the same context (but, of course, it may be possible to ascribe him/her to either class in separate contexts). This means that a clear understanding of the domain to be modeled is crucial for its careful axiomatization in an ontology. However, the manual introduction of disjointness axioms may become a discouraging and error-prone activity with large ontologies. Nevertheless, the task can be (partially) accomplished ex post with the support of statistical models emerging from the data as the result of machine learning techniques.

Noticeably, various works have shown how to exploit association rule mining for statistical schema induction [27,28]. The proposed methods often depend on the availability of heterogeneous external resources (corpora) for their elicitation, since they rely on lexical features. Conversely, the interplay between extensional and intensional knowledge (i.e. assertions regarding the individuals and terminological axioms) was only marginally taken into account. Most of the current approaches move from the assumption that two (or more) concepts may be mutually disjoint when the sets of their known instances, which should be representative for their extensions [2], do not overlap [14,29]. Moving from these considerations, a data-driven approach could be devised with the goal of finding partitions of similar individuals of the knowledge base according to a criterion that maximizes the homogeneity of the individuals in each partition, i.e. emerging concepts, while minimizing their mutual overlap. Evidently, this boils down to a clustering problem [1], which is a classic topic in machine learning. It has also been taken into account in the context of ontology learning as a preliminary step for concept induction [19] or for automated solutions to concept drift or novelty detection problems [10]. Such problems have been tackled through extensions of basic clustering methods, such as (fuzzy) k-medoids [9,11], adapted to work on expressive knowledge bases. An emerging approach to disjointness axiom discovery has proposed the employment of terminological cluster trees (TCTs) [23,24].
Solving conceptual clustering problems [26], namely finding natural partitions of the individuals to induce intensional definitions of the corresponding classes expressed in the standard representation languages, this framework aims at deriving potential disjointness axioms, that may even involve complex concept descriptions, by leveraging the background knowledge of the underlying schema (the axioms in the knowledge base). Unlike other methods, conceptual clustering produces partitions defined intensionally, with concept descriptions to decide the membership, rather than, extensionally, as a simple list of their individuals.

Fig. 2.

A fragment of a TCT.

TCTs (see an example in Fig. 2) are quite similar to terminological decision trees [12,22]: they are grown through a divide-and-conquer strategy and can be thought to form hierarchies of concepts that are built exploiting refinement operators for DLs [19,21]. The latter are essentially binary decision trees, induced by supervised methods exploiting information-based heuristics; they are meant for the classification of the individuals through the logical tests in their inner nodes and classes associated with the leaves. The former (with a similar structure except for the leaves) are trees induced through unsupervised learning methods aimed at eliciting types from the partitions of similar individuals. The concepts to be installed at inner nodes are defined by progressively refining the one at the parent node (or its complement for right-children): possible specializations of such a concept are computed by a refinement operator, and the best one is selected based on the quality of the (bi)partitions of individuals that would be routed to either child. This quality is measured in terms of their membership w.r.t. the candidate refinement. Suitable specific metrics for the underlying representation are required to derive a notion of cluster prototype [10]. Noticeably, the method is able to detect dense data regions in the underlying instance space, hence the number of clusters – which has a strong impact on the quality of the clustering structure – need not be supplied as a parameter. As in many partitional clustering methods [1], the cohesion of the clusters (e.g. measured as the similarity of the members w.r.t. the prototype) determines the stop condition: further partitioning of coherent clusters should be prevented to preserve the quality of the structure. Once the TCT is grown, the concepts in the tree are extracted to form candidate disjointness axioms.
They are intended to be validated by a domain expert/ontology engineer and may be involved in a subsequent debugging process for eliciting future cases of inconsistency (i.e. newly available individuals found to belong to disjoint concepts as shown in the previous example). Despite the potential benefits deriving from the induction of TCTs, there are some issues that were not investigated in preliminary works [24], and whose solution can further improve the framework.

Firstly, we tackle the issue of possible inconsistency cases that may be introduced by some axioms (out of a potentially large number of candidates) elicited by the data-driven method. Due to the context-dependent nature of disjointness, it may be hard to determine if a case of inconsistency indicates a truly erroneous axiom or a special case for the domain represented by the ontology. As an example, let us consider the concepts USPresident and Actor, and two individuals RONALD_REAGAN and DONALD_TRUMP. Let us suppose that a candidate axiom proposes the disjointness of these concepts. The actual inconsistency of the two cases depends on the intended meaning of Actor: if it is meant to denote anyone that participated in a movie, then the case of DONALD_TRUMP (who is credited with appearances as himself in some movies) may be considered as conflicting, whereas RONALD_REAGAN may be assumed as an exception to a general rule. To prevent the late evaluation of such cases after the induction of entire TCTs, it is important to improve the process of concept generation for the tree nodes. This has been redesigned, anticipating the verification of overlapping concepts. Secondly, the best candidate partition of a given cluster-node (and the corresponding concept descriptions) was selected adopting the cluster medoid as the prototype to measure the cohesion (separation) of the resulting child-clusters [24]. This setting did not consider the fact that there may be outliers or noisy individuals in one sub-cluster that may be really close to the sibling cluster (e.g. child nodes with a common parent in the TCT of Fig. 2). As a consequence, a perfect homogeneity of each sub-cluster cannot be ensured.

Summarizing, we further extend our framework for disjointness axiom discovery based on TCTs with:

a new version of the refinement operator that is able to prevent cases of inconsistency introduced by concepts installed in the tree nodes;

the adoption of different versions of the distance measures between the individuals;

a different heuristic aiming at maximizing the distance between the closest elements of two clusters instead of the distance between their medoids.

A new and more comprehensive empirical evaluation was designed and carried out aiming at assessing the effectiveness of the method based on the TCTs also in comparison with other related methods.

The paper is organized as follows: in Section 2, the disjointness axiom discovery problem is formalized in terms of a clustering problem of individuals of an ontological knowledge base. In Section 3, the details of the enhanced methods for inducing TCTs used for solving the targeted problem are illustrated. The comparative experimental evaluation of the new implementation on a testbed of common ontologies is presented in Section 4. In Section 5 related works are discussed. Section 6 concludes this work with further research directions.

2. Disjointness discovery as a conceptual clustering problem

In this section, we formalize the problem of discovering concept disjointness axioms from ontological knowledge bases in terms of a clustering task. We will borrow notation and terminology from Description Logics, as they are the theoretical foundation of the standard representation languages for the SW. Hence, we will use the terms concept (description) and role as synonyms of class and property, respectively. DL constructors will be used for defining concept descriptions. Logical entailment, subsumption and equivalence for complex axioms will be denoted with the usual symbols ⊧, ⊑, and ≡, respectively. A knowledge base (KB) $K = ⟨ T, A ⟩$ is made up of the TBox $T$, a set of terminological axioms regarding concepts and roles, and the ABox $A$, a set of facts, i.e. concept/role assertions, regarding the individuals. $Ind (A)$ denotes the set of individuals (resource names) occurring in $A$. Before formalizing the problem of discovering concept disjointness axioms, for the sake of completeness, we recall some basics of the clustering methods.

Clustering is an unsupervised learning task aiming at grouping a collection of objects into subsets, named clusters, such that those within each cluster are more closely related/similar to one another than the objects assigned to different clusters [1]. In cluster analysis, the quality of the clusters is assessed using indices that take into account measures of cohesion, i.e. total reciprocal similarity, among the objects within a cluster, and of separation, i.e. total reciprocal dissimilarity, among different clusters. In a general setting, an object is usually described in terms of features from a selected set $F$ ; a measure of similarity between objects is expressed in terms of a metric; for example, in the case of datasets of objects described by tuples of numeric feature values, the Euclidean distance, Cosine similarity or more complex metrics for vector spaces are typically adopted. A more complex clustering goal is pursued moving from flat to hierarchical structures. Another choice among the various clustering models is related to the form of membership of the objects with respect to the clusters. In the simplest (crisp) case, e.g. k-means, cluster membership is exclusive: each object is assigned to one cluster. Extensions, such as fuzzy c-means or EM [1], admit overlapping clusters with objects exhibiting a graded membership (responsibility) w.r.t. the clusters.

A further interesting class of methods is represented by conceptual clustering approaches [26], which also generate an intensional description (defining the membership property) for each cluster (e.g. a conjunction of propositional atoms). Beyond vectorial or, equivalently, propositional representations, richer and more expressive logic languages may be adopted, such as the mentioned DLs. Differently from other methods, conceptual clustering algorithms for such representations may exploit available (schema-level) background knowledge for building descriptions for each cluster, i.e. axioms defining new concepts. Expressive representations require the support of suitable metrics for upgrading clustering methods.

A necessary condition for the disjointness of two (or more) concepts to hold is that their extensions do not overlap. Then the task of discovering disjointness axioms may be regarded as an unsupervised conceptual clustering problem aimed at finding separate partitions of individuals in the KB (such that each cluster consists of similar individuals, according to a given criterion) and producing intensional descriptions for them.

The problem is defined as follows:

Definition 1 (disjointness axiom discovery problem).

Given

a knowledge base $K = ⟨ T, A ⟩$

a set of individuals $I \subseteq Ind (A)$

Find

a partition Π of I in a set of pairwise disjoint clusters $Π = \{C_{1}, \dots, C_{| Π |}\}$

for each $i = 1, \dots, | Π |$ , a concept description $D_{i}$ that describes $C_{i}$ , so that:

$\forall a \in C_{i} : K ⊧ D_{i} (a)$ and

$\forall b \in C_{j}$ , $i \neq j : K ⊧ \neg D_{i} (b)$ .

Hence

$\forall D_{i}, D_{j}, i \neq j : K ⊧ D_{j} ⊑ \neg D_{i}$

Note that, differently from other settings, the number of clusters (say $k = | Π |$ ) is not a required parameter.
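As a minimal sketch, the conditions of Def. 1 can be checked on a candidate solution once the entailed memberships are available. The function `is_valid_solution` and its arguments are hypothetical: `pos_ext[i]` and `neg_ext[i]` are assumed to be precomputed sets of individuals for which the KB entails $D_{i}$ and $\neg D_{i}$, respectively (in practice they would come from a reasoner).

```python
from itertools import combinations


def is_valid_solution(individuals, clusters, pos_ext, neg_ext):
    """Check a candidate partition and its descriptions against Def. 1.

    clusters[i] : set of individuals forming cluster C_i
    pos_ext[i]  : individuals a such that K |= D_i(a)      (assumed given)
    neg_ext[i]  : individuals b such that K |= not D_i(b)  (assumed given)
    """
    # The clusters must cover I and be pairwise disjoint.
    if set().union(*clusters) != set(individuals):
        return False
    if any(c1 & c2 for c1, c2 in combinations(clusters, 2)):
        return False
    for i, cluster in enumerate(clusters):
        # Every member of C_i must be an entailed instance of D_i ...
        if not cluster <= pos_ext[i]:
            return False
        # ... and every member of the other clusters an entailed non-instance.
        others = set().union(*(c for j, c in enumerate(clusters) if j != i))
        if not others <= neg_ext[i]:
            return False
    return True


I = {"ann", "bob", "botSmith"}
clusters = [{"ann", "bob"}, {"botSmith"}]
pos_ext = [{"ann", "bob"}, {"botSmith"}]
neg_ext = [{"botSmith"}, {"ann", "bob"}]
print(is_valid_solution(I, clusters, pos_ext, neg_ext))
```

Note that an individual whose membership is simply unknown appears in neither `pos_ext[i]` nor `neg_ext[i]`, so under the open-world semantics it makes the second condition fail rather than succeed by default.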

Example 1.
In the context of the corporate domain introduced in Section 1, let us consider the agents in a company KB (humans and machines). A concept description should be assigned to each cluster of a partition produced by a suitable method. As a result, a disjointness axiom may be discovered involving e.g. the clusters corresponding to the concepts Worker and Robot, namely $Worker ⊑ \neg Robot$ , provided that the sets of their instances do not overlap.

Note that the formalization in Def. 1 is language-independent and it is aimed at a simple flat partitioning structure, yet it can be generalized to target hierarchical structures. In the next section, a solution to such a more complex form of clustering is presented.

3. Induction of terminological cluster trees for disjointness axiom discovery

The proposed approach is grounded on a two-step process. In the first step, given a knowledge base, clusters and the related concepts that describe them are discovered and organized in a tree structure. In the second step, the induced structure is exploited for learning a set of candidate disjointness axioms.

The model is formally defined as follows:

Definition 2 (terminological cluster tree).

Given a knowledge base $K$ , a terminological cluster tree (TCT) is a binary logical tree [8] where:

each node, which stands for a cluster C of individuals, contains a concept description D (defined over the signature of $K$ )

each edge departing from an internal node corresponds to a partition of C in (two) sub-clusters.²

² Noticeable difference with concept hierarchies: for each node in the TCT, its cluster, composed by instances of the concept in the parent node (ideally ⊤ for the root), is bi-partitioned according to the membership w.r.t. the concept in the current node.

A tree-node is represented by a quadruple $⟨ D, C, T_{left}, T_{right} ⟩$ with the left and right subtrees connected by either departing edge.
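The quadruple maps naturally onto a small recursive record. The sketch below is one possible encoding, not the authors' implementation: the class name `TCTNode`, the rendering of concepts as strings and the individual names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class TCTNode:
    concept: str                       # D: concept description (a string here)
    cluster: FrozenSet[str]            # C: individuals grouped in this node
    left: Optional["TCTNode"] = None   # T_left: positive instances w.r.t. D
    right: Optional["TCTNode"] = None  # T_right: negative instances w.r.t. D

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: TCTNode) -> List[TCTNode]:
    """Collect the leaf nodes from left to right."""
    if node.is_leaf():
        return [node]
    found = []
    for child in (node.left, node.right):
        if child is not None:
            found.extend(leaves(child))
    return found


# Hypothetical fragment: three persons, one of which has no workplace.
root = TCTNode(
    "Person ⊓ ∃worksIn.⊤",
    frozenset({"ann", "bob", "carl"}),
    left=TCTNode("Person ⊓ ∃worksIn.⊤", frozenset({"ann", "bob"})),
    right=TCTNode("¬(Person ⊓ ∃worksIn.⊤)", frozenset({"carl"})),
)
print([sorted(leaf.cluster) for leaf in leaves(root)])
```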

Fig. 3.

A fragment of TCT whose nodes are also decorated with the size of the respective cluster of individuals $C_{i}$ . Intuitively, the concept descriptions $D_{i}$ in the various nodes could be roughly mapped ( $C N_{i} \equiv D_{i}$ ) to the following names, respectively: $C N_{2} : Employee$ , $C N_{3} : Robot$ , $C N_{4} : Worker$ , $C N_{5} : Freelance$ .

Figure 3 illustrates an example of TCT resulting from the fragment of the corporate KB introduced in previous examples. In this tree, each concept installed into inner nodes is used to split a cluster of individuals $C_{i}$ in two sub-clusters. For instance, the cluster $C_{2}$ is split into 2 sub-clusters according to the membership of individuals in $C_{2}$ w.r.t. the concept description $Person ⊓ \exists worksIn . ⊤$ (roughly corresponding to $Employee ⊔ Worker$ or $\neg Freelance$) obtained as described in the sequel. Note that, similarly to the typical clustering methods [1], an optimal TCT is one that induces a hierarchical partition of the individuals that maximizes both the cohesion within each cluster and the mutual separation between pairs of clusters. Obviously, finding such a tree requires searching a very large space of all possible TCTs based on the various cluster structures resulting from the set of individuals (and all possible concept descriptions used to split them). This turns out to be computationally infeasible, so a heuristic approach has been devised.

The construction of a TCT combines elements of logical decision trees induction [4,12] (recursive partitioning and refinement operators for specializing concept descriptions) and of instance-based learning (a distance measure over the instance space). The details of the algorithms for (a) growing a TCT and (b) deriving intensional definitions of candidate disjoint concept descriptions are reported in the sequel.

3.1. Growing terminological cluster trees

A TCT is induced by a recursive strategy (see Algo. 1), which follows the schema proposed for growing terminological decision trees (TDTs) [12,22] solving the instance classification problem. The ultimate goal is to find a partition of pure clusters in terms of cohesion. The main routine induceTCT is to be invoked passing I, ⊤ and an empty $CS$ . In this recursive function, the base case tests the stopCondition predicate checking whether either the cluster I is too small to be partitioned or its (measure of) cohesion exceeds a given threshold ν (further details about the heuristics and the stop condition are reported in Section 3.1.3). In this case the algorithm updates a set of concepts $CS$ that is used by the refinement operator to prevent producing specializations that overlap with concept descriptions previously installed in other tree nodes.

Algorithm 1

Main routine for growing TCTs and stop condition test

In the inductive step, which occurs when the stop condition does not hold, the current (parent) concept description C has to be specialized using a refinement operator (ρ) that spans over a search space of concepts subsumed by C. A set of candidate specializations $S \subseteq ρ (C)$ is obtained via $SPECIALIZE (C, I, CS)$ such that, for each of them, at least a positive and a negative instance can be found. Then selectBestConcept evaluates each candidate specialization in S in terms of a measure of separation based on the distance (see Eqs (1) and (2) discussed in the following) between the pair of sub-clusters P, made up of positive instances w.r.t. the current candidate concept, and N, made up of negative instances w.r.t. the current candidate concept. The membership tests are based on instance checking [2]. Hence, selectBestConcept returns the best concept description $E^{*} \in S$, that is the one maximizing the mentioned heuristic grounded on the notion of separation. Then $E^{*}$ is installed in the current node and the individuals in I are partitioned by split to be routed along the left or the right branch departing from the current node, i.e. positive and negative instances w.r.t. $E^{*}$. This divide-and-conquer strategy is applied recursively by the algorithm until no further branching is advisable, as the sets of instances routed to the leaf-nodes meet the stop condition. As mentioned before, the number of clusters is not required as an input; it depends on the number of branches grown: the algorithm is able to determine it according to the data distribution.
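The recursion described above can be sketched as follows. This is a reconstruction from the prose only, not a transcription of Algo. 1: the function `induce_tct`, the `ops` dictionary of pluggable routines and the whole toy domain (integers as individuals, threshold tests as concepts, a diameter-based stop condition) are assumptions introduced to make the example runnable, and the toy `specialize` ignores the control set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    concept: object
    cluster: frozenset
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def induce_tct(individuals, concept, control_set, ops):
    """Grow a tree recursively; ops bundles stop, specialize, split and score."""
    node = Node(concept, frozenset(individuals))
    candidates = []
    if not ops["stop"](individuals):
        candidates = ops["specialize"](concept, individuals, control_set)
    if not candidates:                      # base case: produce a leaf
        control_set.append(concept)         # remember the leaf concept
        return node
    # Pick the candidate whose positive/negative split is best separated.
    best = max(candidates, key=lambda e: ops["score"](*ops["split"](e, individuals)))
    positives, negatives = ops["split"](best, individuals)
    node.concept = best
    node.left = induce_tct(positives, best, control_set, ops)
    node.right = induce_tct(negatives, ("not", best), control_set, ops)
    return node


# --- toy domain (hypothetical): integers and threshold concepts ---------------
def holds(c, x):
    if c == "top":
        return True
    if c[0] == "lt":
        return x < c[1]
    if c[0] == "not":
        return not holds(c[1], x)
    return holds(c[1], x) and holds(c[2], x)    # c[0] == "and"


def split(c, xs):
    return [x for x in xs if holds(c, x)], [x for x in xs if not holds(c, x)]


def specialize(c, xs, control_set):
    vals = sorted(set(xs))
    cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    return [("and", c, ("lt", t)) for t in cuts]


OPS = {
    "stop": lambda xs: len(xs) < 2 or max(xs) - min(xs) <= 2,
    "specialize": specialize,
    "split": split,
    "score": lambda P, N: min(abs(p - n) for p in P for n in N),
}


def leaf_clusters(node):
    if node.left is None:
        return [sorted(node.cluster)]
    return leaf_clusters(node.left) + leaf_clusters(node.right)


control = []
tree = induce_tct([1, 2, 3, 10, 11, 12, 30, 31], "top", control, OPS)
print(leaf_clusters(tree))
```

In this toy run the number of leaves is not given in advance: it follows from where the stop condition fires, which mirrors the remark that the number of clusters depends on the data distribution.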

3.1.1. Downward refinement operators

The proposed approach relies on a downward refinement operator ρ [12,21] that must be able to generate satisfiable concepts – in terms of the models of the KB – performing a specialization process. It can be defined by cases in terms of ancillary sub-functions. Given a concept description C (or its complement) to be specialized, the operator ρ computes specializations of C in one of the following forms:

($ρ_{1}$) by adding a concept atom (or its complement) as a conjunct: $C^{'} = C ⊓ (\neg) A$;

($ρ_{2}$) by adding a general existential restriction (or its complement) as a conjunct: $C^{'} = C ⊓ (\neg) \exists R . ⊤$;

($ρ_{3}$) by adding a general universal restriction (or its complement) as a conjunct: $C^{'} = C ⊓ (\neg) \forall R . ⊤$;

($ρ_{4}$) by replacing a sub-description $C_{i}$ in the scope of an existential restriction in C with one of its refinements: $\exists R . C_{i}^{'} \in ρ (\exists R . C_{i}) \land C_{i}^{'} \in ρ (C_{i})$;

($ρ_{5}$) by replacing a sub-description $C_{i}$ in the scope of a universal restriction with one of its refinements: $\forall R . C_{i}^{'} \in ρ (\forall R . C_{i}) \land C_{i}^{'} \in ρ (C_{i})$.

Note that the cases of $ρ_{4}$ and $ρ_{5}$ are recursive.

Example 2.
Given the TCT in Fig. 3, starting from Person, the following refinements can be obtained:
$Person ⊓ \exists worksIn . ⊤$ , installed in the node 2, is generated using $ρ_{2}$ ;

$Person ⊓ \exists worksIn . Factory$ in node 4 is generated using $ρ_{4}$ .

The refinement operator ρ performs a sort of random sampling over a DL concept space. It is to be remarked that it does not satisfy all of the properties required for ideality, i.e. finiteness (for any concept the set of specializations is finite), completeness (for all concepts C and D, such that $D ⊏ C$, a concept E, such that $E \equiv D$, can be computed chaining a number of applications of ρ) and properness (for all concepts C and D, if $D \in ρ (C)$, $D ⊏ C$) [21]. Specifically, concerning the finiteness property, the refinement operator does not satisfy it because no bounds are imposed on the number of specializations to be generated through the random process. However, this problem can be easily solved by controlling algorithmically the number of specializations, i.e. by imposing a finite beam dimension n and/or a bound on the depth of the recursive calls. Also the completeness of the refinement operator is not guaranteed because the constructors employed are evidently limited to those available for the $ALC$ DL. Besides, the random process needed for generating each specialization may consider some concept descriptions several times while others (potentially useful for the learning problem) may be overlooked. Lastly, even the properness is not ensured, due to the equivalence axioms defined in the TBox. For instance, given a concept description $(C_{1} ⊓ C_{2})$ to be refined, the operator (via $ρ_{1}$) may add a new concept name B such that $C_{1} \equiv B$. Also this property can be enforced by further checks on refinements to be output.

Instead of aiming at the definition of a theoretical operator endowed with these properties, which would be intended for use with a generic generate-and-test algorithm, we decided to devise a more complex yet effective data-driven specialization procedure, capable of leveraging the state of the tree under construction. Algo. 2 illustrates the resulting procedure specialize, which embeds the refinement operator ρ. Besides the concept description C to be refined, it requires a set of individuals I and a set of concept descriptions $CS$ that are employed to drive the traversal of the search space. Note that its behavior is also controlled by the beam dimension n. The specializations are generated through the following steps:

generate the specialization $D = C ⊓ E$ adding a conjunct E (selected by addConjunct);

apply subroutine simplify to reduce redundancy and syntactic length³ of this D [20].

³ The length of a concept description C, $len (C)$, is defined inductively:

* $len (A) = len (⊤) = len (⊥) = 1$

* $len (\neg D) = len (D) + 1$

* $len (D ⊓ E) = len (D ⊔ E) = len (D) + len (E) + 1$

* $len (\exists R . D) = len (\forall R . D) = len (D) + 1$.

These steps are repeated to ensure that

the resulting specialization D is satisfiable w.r.t. $K$ and both positive and negative instances of D are available in I;

it does not overlap with the concepts $D^{'} \in CS$ of the control set (where the concept extensions are approximated using the retrieval $r_{K}$ inference service [2]).

Algorithm 2
The specialization routines employed for inducing TCTs

The specializations D are produced by adding a new (complex) description via the auxiliary function addConjunct. The recursive procedure tests the value of a random variable $X \sim U (0, 1)$ to decide which refinement case ($ρ_{1}$ – $ρ_{5}$) must be applied. In the base case, a random concept name A is picked from those in the signature of $K$ to output a concept in one of the possible forms: $C ⊓ A$ (see $ρ_{1}$) or $C ⊓ \exists (\forall) R . (D ⊓ A)$ (see $ρ_{4}$ – $ρ_{5}$). The recursive calls produce sub-descriptions for the complement or the existential and universal restriction (w.r.t. a randomly picked role) operators. After addConjunct has produced a sub-description E, simplify is applied to possibly generate a shorter concept equivalent to $C ⊓ E$. This is aimed at improving the overall interpretability of the resulting TCTs. Essentially, this function checks if $E ⊑ C$ is entailed by the KB, returning the concept description E as an output when this condition holds.
Example 3.
Let us suppose that addConjunct refines the concept $\exists worksIn . ⊤$ by adding the conjunct $\exists worksIn . Factory$ . In this case, $\exists worksIn . Factory$ would be returned via simplify rather than the redundant description $\exists worksIn . ⊤ ⊓ \exists worksIn . Factory$ .
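The effect of simplify on this example can be sketched in a few lines. The encoding of concepts as nested tuples, the names `length` and `simplify`, and the lookup-based subsumption oracle are assumptions for illustration (in practice the test on $E ⊑ C$ would be delegated to a reasoner). The length function takes $len (\exists R . D) = len (\forall R . D) = len (D) + 1$.

```python
def length(c):
    """Syntactic length of a concept encoded as a string or a nested tuple."""
    if isinstance(c, str):                 # concept name, "top" or "bottom"
        return 1
    tag = c[0]
    if tag == "not":                       # ("not", D)
        return length(c[1]) + 1
    if tag in ("and", "or"):               # ("and", D, E) / ("or", D, E)
        return length(c[1]) + length(c[2]) + 1
    if tag in ("some", "all"):             # ("some", R, D) / ("all", R, D)
        return length(c[2]) + 1
    raise ValueError(f"unknown constructor: {tag!r}")


def simplify(c, e, entails_subsumption):
    """Return e alone when K |= e ⊑ c, otherwise the conjunction c ⊓ e."""
    return e if entails_subsumption(e, c) else ("and", c, e)


# Example 3: refining ∃worksIn.⊤ with the conjunct ∃worksIn.Factory.
c = ("some", "worksIn", "top")
e = ("some", "worksIn", "Factory")
known = {(e, c)}                           # stand-in for reasoner entailments


def oracle(sub, sup):
    return (sub, sup) in known


result = simplify(c, e, oracle)
print(result, length(result), length(("and", c, e)))
```

Under these assumptions the simplified description has length 2, against length 5 for the redundant conjunction.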

As previously mentioned, the specializations returned by ρ are required to be satisfiable. This does not ensure that instances of such concepts are actually represented in the training sets. It is important to avoid the generation of satisfiable concept descriptions for which the training individuals exhibit a neutral membership (neither positive nor negative examples): installing one of these concepts in a node would end up sorting all the individuals in the current node to a single branch, thus undermining the divisive value of the node tests and increasing the complexity of the trees (in terms of number of nodes) with no evident advantage. To avoid these refinements, the algorithm verifies if a non-null number of positive and negative instances can be found for D, i.e. both the intersection between I and the instances of D ($r_{K} (D)$) and the intersection between I and the instances of the complement ($r_{K} (\neg D)$) are non-empty. This constraint may be too strict to be satisfied, due to the sparseness of the assertional knowledge (in the ABox) for some of the involved concepts. These situations, causing delays (or even infinite loops), can be easily prevented by adopting some timeout condition to stop the generation of a new refinement (producing a leaf-node, instead). A further aspect to be considered is the possible overlap between the concept description D and those installed in other clusters (they are supposed to be contained in the control set $CS$, passed as an argument to the procedure). It has been observed that, owing to this overlap, the clustering procedure may install concept descriptions that would introduce inconsistency once the axioms derived from the TCT were added to the KB [24]. Therefore, the refinement operator has been extended to avoid the generation of such concepts: the procedure will return a specialization only if it has no common instances with those contained in $CS$ (check operated via overlap).
Example 4.
Let us suppose that the function Overlap checks if the control set contains the concept $Person ⊓ \forall worksIn . ⊥$ (used to describe the individuals in $C_{5}$ ). The procedure implementing the refinement operator should avoid the generation of the concept $Person ⊓ Freelance$ .
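The two admissibility checks described above (a non-empty set of positive and negative instances in the current cluster, and no common instances with the concepts in the control set) can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `is_admissible` and its parameters are hypothetical, and the retrieval sets, which would be computed by a reasoner, are assumed to be given as plain Python sets.

```python
from typing import Iterable, Set


def is_admissible(
    cluster: Set[str],
    instances_d: Set[str],
    instances_not_d: Set[str],
    control_set: Iterable[Set[str]],
) -> bool:
    """Decide whether a candidate specialization D may be returned.

    All retrieval sets are hypothetical, precomputed inputs.
    """
    positives = cluster & instances_d      # I ∩ r_K(D)
    negatives = cluster & instances_not_d  # I ∩ r_K(¬D)
    # Reject D if it cannot split the cluster into two non-empty parts.
    if not positives or not negatives:
        return False
    # Reject D if it shares instances with any concept in the control set.
    return all(instances_d.isdisjoint(other) for other in control_set)


cluster = {"a", "b", "c"}
print(is_admissible(cluster, {"a"}, {"b", "c"}, [{"d"}]))       # splits, no overlap
print(is_admissible(cluster, {"a"}, set(), [{"d"}]))            # no negative instances
print(is_admissible(cluster, {"a"}, {"b", "c"}, [{"a", "e"}]))  # overlaps the control set
```

The timeout mentioned in the text is deliberately left out of the sketch, since it concerns the generation loop rather than the test on a single candidate.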
3.1.2. Heuristics

The algorithms for growing TCTs and TDTs share a common structure but differ in the criterion for selecting the test concepts to be installed in the nodes: while the latter adopt a scoring function based on the classic notion of information gain, the separation measure to be maximized in the procedure for the TCTs relies on a distance defined over the individuals occurring in the KB. Specifically, the heuristic for selecting the best refinement of the parent concept is defined as follows: $\begin{matrix} (1) & E^{*} = \underset{E \in ρ (C)}{\operatorname{argmax}} \; d (p (P_{E}), p (N_{E})) \end{matrix}$ where $P_{E}$ and $N_{E}$ are the sub-clusters output by split, $d (\cdot, \cdot)$ is a distance measure between individuals in a KB and $p (\cdot)$ is a function that maps a cluster of individuals to its prototype, such as the medoid of the cluster. However, it was observed that maximizing the distance between the medoids does not guarantee that overlap between the sub-clusters $P_{E}$ and $N_{E}$ is avoided [24]. Indeed, we observed cases of boundary individuals in one cluster which were on average closer to those in the other cluster (including its medoid) than to the others belonging to the same cluster. To tackle such cases a more sophisticated heuristic can be adopted: $\begin{matrix} (2) & E^{*} = \underset{E \in ρ (C)}{\operatorname{argmax}} \; \min_{b \in P_{E}, c \in N_{E}} d (b, c) \end{matrix}$
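The difference between heuristics (1) and (2) can be seen on a toy case. The sketch below is illustrative only: individuals are replaced by integers, the distance is the absolute difference, and the names `medoid`, `score_medoid`, `score_min_pair` and `best_refinement` are hypothetical.

```python
from itertools import product


def medoid(cluster, d):
    """Element of the cluster with minimal total distance to the others."""
    return min(cluster, key=lambda x: sum(d(x, y) for y in cluster))


def score_medoid(pos, neg, d):
    """Heuristic (1): distance between the prototypes (medoids)."""
    return d(medoid(pos, d), medoid(neg, d))


def score_min_pair(pos, neg, d):
    """Heuristic (2): distance between the two closest individuals."""
    return min(d(b, c) for b, c in product(pos, neg))


def best_refinement(candidates, score, d):
    """candidates maps each refinement E to its pair (P_E, N_E)."""
    return max(candidates, key=lambda e: score(*candidates[e], d))


d = lambda x, y: abs(x - y)  # toy distance, an assumption of this sketch
candidates = {
    "E1": ([0, 1, 9], [10, 20, 30]),  # far medoids, but 9 and 10 nearly touch
    "E2": ([0, 1, 2], [15, 16, 17]),  # closer medoids, well separated clusters
}
print(best_refinement(candidates, score_medoid, d))    # E1
print(best_refinement(candidates, score_min_pair, d))  # E2
```

In this toy case heuristic (1) prefers E1 because its medoids are 19 apart, whereas heuristic (2) prefers E2 because the closest pair of E1 is only 1 apart, which mirrors the boundary-individual problem described above.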

The score for each candidate E is determined by quantifying the risk of overlap between two clusters according to the distance between the closest individuals belonging to $P_{E}$ and $N_{E}$ . The heuristic resorts to a variation of a language-independent dissimilarity measure proposed in previous works [6,10]. Given the knowledge base $K$ , the idea is to compare the behavior of the individuals w.r.t. a set of concepts $C = \{F_{1}, F_{2}, \dots, F_{m}\}$ that is dubbed context or committee of features. For each $F_{i} \in C$ , a projection function $π_{i} : Ind (A) \to [0, 1]$ is defined as a simple mapping: $\begin{matrix} (3) & \forall a \in Ind (A) \quad π_{i} (a) = \begin{cases} 1 & \text{if } K ⊧ F_{i} (a) \\ 0 & \text{if } K ⊧ \neg F_{i} (a) \\ 0.5 & \text{otherwise} \end{cases} \end{matrix}$ where the default value (0.5) represents a case of maximal uncertainty about the membership. Alternatively, an estimated likelihood of being an instance of $F_{i}$ for a generic individual a could be considered. Especially with densely populated ontologies (such as those forming the Web of Data) the probability $Pr [K ⊧ F_{i} (a)]$ may be estimated by $| r_{K} (F_{i}) | / | Ind (A) |$ , where $r_{K} (\cdot)$ denotes the retrieval of a concept w.r.t. $K$ , i.e. the set of individuals of $Ind (A)$ whose membership to $F_{i}$ is entailed by $K$ [2]. Finally, a family of distance measures $\{d_{n}^{C}\}_{n \in \mathbb{N}}$ can then be defined as follows: $d_{n}^{C} : Ind (A) \times Ind (A) \to [0, 1]$ with $\begin{matrix} (4) & d_{n}^{C} (a, b) = {\left[\sum_{i = 1}^{m} w_{i} {[1 - π_{i} (a) π_{i} (b)]}^{n}\right]}^{1 / n} \end{matrix}$

Non-uniform values for the vector of weights $\vec{w}$ can be considered to reflect the specific importance of each feature. For example, the weights may be set according to an entropic measure [6,10] based on the average information brought by each concept: $\begin{matrix} (5) & \forall i \in \{1, \dots, m\} \quad w_{i} = - \sum_{k \in \{- 1, 0, + 1\}} μ_{i} (k) \log μ_{i} (k) \end{matrix}$ where, given a generic $a \in Ind (A)$ , the following estimates can be used: $μ_{i} (+ 1) \approx Pr [K ⊧ F_{i} (a)]$ , $μ_{i} (- 1) \approx Pr [K ⊧ \neg F_{i} (a)]$ and $μ_{i} (0) = 1 - μ_{i} (+ 1) - μ_{i} (- 1)$ .
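As an illustration of Eq. (5), the weight of a feature can be computed from instance counts. The sketch below rests on assumptions not stated in the text: the probabilities are estimated by relative frequencies of retrieved instances, the natural logarithm is used, and terms with zero probability are skipped (the usual convention 0 log 0 = 0). The name `entropic_weight` is hypothetical.

```python
import math


def entropic_weight(n_pos: int, n_neg: int, n_total: int) -> float:
    """Entropy of the membership distribution of one feature concept F_i.

    n_pos   : number of individuals entailed to be instances of F_i
    n_neg   : number of individuals entailed to be instances of ¬F_i
    n_total : number of individuals in Ind(A)
    """
    mu = [
        n_pos / n_total,                      # mu_i(+1)
        n_neg / n_total,                      # mu_i(-1)
        (n_total - n_pos - n_neg) / n_total,  # mu_i(0), uncertain membership
    ]
    # Zero-probability terms are skipped (assumed convention 0 log 0 = 0).
    return -sum(p * math.log(p) for p in mu if p > 0)


print(entropic_weight(10, 0, 10))  # feature true for every individual
print(entropic_weight(5, 5, 10))   # feature splitting the individuals evenly
```

A feature that holds for every individual carries no information and receives weight 0, while a feature that splits the individuals evenly receives a larger weight.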

An alternative distance measure proposed in other works [7] is the following: $\begin{matrix} (6) & d_{n}^{C} (a, b) = {\left[\sum_{i = 1}^{m} w_{i} {[π_{i} (a) - π_{i} (b)]}^{n}\right]}^{1 / n} \end{matrix}$ Note that the two distance measures reported above exhibit different behaviors. The former reaches its maximum when, given two individuals a and b and a feature concept $F_{i} \in C$ , $π_{i} (a) = 0$ and $π_{i} (b) = 0$ or $π_{i} (a) = 0.5$ and $π_{i} (b) = 0.5$ (assuming $w_{i} = 1$ ), i.e. in the cases of negative and uncertain membership. The latter reaches its maximum value when the individuals a and b have opposite definite memberships for $F_{i}$ (i.e. $π_{i} (a) = 0$ and $π_{i} (b) = 1$ , or vice versa).
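The projection function (3) and the two measures (4) and (6) can be compared numerically with a small sketch. It makes several assumptions: membership is encoded as `True`, `False` or `None` (unknown), the formulas are applied as printed without any normalization, and an absolute value is taken in (6) so that odd values of n are also handled. All function names are hypothetical.

```python
from typing import Optional, Sequence


def projection(member: Optional[bool]) -> float:
    """Eq. (3): 1 if entailed member, 0 if entailed non-member, else 0.5."""
    if member is None:
        return 0.5
    return 1.0 if member else 0.0


def distance_eq4(pa: Sequence[float], pb: Sequence[float],
                 w: Sequence[float], n: int = 2) -> float:
    """Eq. (4), applied to precomputed projection vectors."""
    return sum(wi * (1 - x * y) ** n for wi, x, y in zip(w, pa, pb)) ** (1 / n)


def distance_eq6(pa: Sequence[float], pb: Sequence[float],
                 w: Sequence[float], n: int = 2) -> float:
    """Eq. (6); the absolute value is an assumption of this sketch."""
    return sum(wi * abs(x - y) ** n for wi, x, y in zip(w, pa, pb)) ** (1 / n)


w = [1.0]  # a single feature with unit weight
for ma, mb in [(True, True), (False, False), (None, None), (False, True)]:
    pa, pb = [projection(ma)], [projection(mb)]
    print(ma, mb, distance_eq4(pa, pb, w), distance_eq6(pa, pb, w))
```

Running the loop for one feature shows how the two measures treat identical memberships differently: (6) returns 0 whenever the two projections coincide, whereas (4) as printed returns 0 only when both individuals are definite members.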

3.1.3. Stop conditions

The growth of a TCT can be stopped if one of the following conditions is satisfied (see Algo. 1):

the set of individuals is too small to be partitioned: this is checked by testing whether $| I | ⩽ δ$ ;

the concept C to be specialized is different from ⊤: in this case the algorithm finds the positive and negative instances of C and exploits a threshold $ν \in [0, 1]$ for the value of $d (\cdot, \cdot)$ . If the value is below the threshold, the branch growth is stopped.

To avoid trivial clusters, the Boolean function stopCondition is forced to return false when $C = ⊤$ . Otherwise, the growth of the TCT would stop after the first call, i.e. when ⊤ and I are passed as input, and the specialization process would never occur.
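A possible reading of these stop conditions is sketched below. It is not the procedure of Algo. 1: the signature of `stop_condition`, the order of the checks, and the choice to stop when one side of the split is empty are all assumptions made for illustration, and individuals are replaced by numbers in [0, 1].

```python
def stop_condition(is_top, individuals, positives, negatives,
                   d, prototype, delta, nu):
    """Return True when the growth of the current branch should stop."""
    # Never stop at the root concept, otherwise no specialization would occur.
    if is_top:
        return False
    # The cluster is too small to be partitioned: |I| <= delta.
    if len(individuals) <= delta:
        return True
    # Assumption of this sketch: an empty side cannot be compared, so stop.
    if not positives or not negatives:
        return True
    # Stop when the prototypes of the two sides are closer than the threshold.
    return d(prototype(positives), prototype(negatives)) < nu


d = lambda x, y: abs(x - y)                     # toy distance
median = lambda c: sorted(c)[len(c) // 2]       # toy prototype function
pos, neg = [0.0, 0.1, 0.2], [0.8, 0.9, 1.0]

print(stop_condition(True, pos + neg, pos, neg, d, median, delta=2, nu=0.9))
print(stop_condition(False, pos + neg, pos, neg, d, median, delta=2, nu=0.5))
print(stop_condition(False, pos + neg, pos, neg, d, median, delta=2, nu=0.9))
```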

3.2. Extraction of disjointness axioms from TCTs

The procedure for discovering/extracting disjointness axioms requires a TCT as its input. Its details are reported in Algo. 3.

Algorithm 3

Derivation of disjointness axioms from TCTs

The function DeriveCandidateAxioms can be employed to traverse the TCT passed as an argument and collect the concept descriptions that are installed in the parents of the leaf nodes. In this phase, it generates a set of concept descriptions $CCD$ . Then, the function considers all pairs of elements C and D in $CCD$ and checks whether the number of instances of the concept $D ⊓ C$ does not exceed the threshold θ (a parameter to be set in the configuration).

The set of collected concept descriptions $CCD$ is obtained by traversing the TCT. The collect function is invoked for gathering concept descriptions for which disjointness axioms may hold by exploring (recursively) the various paths along the (sub)tree from the root to the leaves.

Note that the hierarchical nature of the approach may allow for a further generalization of this function, controlling with a further parameter the maximum depth of the inner nodes to be visited during the traversal. This would likely produce fewer and more general axioms with respect to the specification of the function reported in Algo. 3.
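The pairwise check on the collected concept descriptions can be sketched as follows. The sketch is not Algo. 3: the tree traversal is omitted, each concept description is assumed to come with a precomputed set of instances, and the name `derive_candidate_axioms` and its dictionary input are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def derive_candidate_axioms(ccd: Dict[str, Set[str]],
                            theta: int) -> List[Tuple[str, str]]:
    """Return the pairs (C, D) whose common instances do not exceed theta.

    ccd maps each collected concept description to its instances, which
    in practice would be retrieved through a reasoner.
    """
    axioms = []
    for (c, inst_c), (d, inst_d) in combinations(ccd.items(), 2):
        # |r_K(C ⊓ D)| is approximated here by the size of the intersection.
        if len(inst_c & inst_d) <= theta:
            axioms.append((c, d))  # read as: C ⊑ ¬D
    return axioms


ccd = {
    "Person": {"ann", "bob", "carl"},
    "Robot": {"r2"},
    "Person ⊓ ∃worksIn.⊤": {"ann", "bob"},
}
print(derive_candidate_axioms(ccd, theta=0))
```

With θ = 0 the pair formed by Person and its own specialization is discarded because the two share instances, while both pairs involving Robot are kept as candidate disjointness axioms.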

Example 5.

Given the TCT in Fig. 3, the following set of concepts can be built using the routine collect: $\begin{array}{ll} CCD = \{ & Person, \; \neg Person ⊓ Robot, \\ & Person ⊓ \exists worksIn . ⊤, \\ & Person ⊓ \forall worksIn . ⊥, \\ & Person ⊓ \exists worksIn . Factory \} \end{array}$

The following candidate axioms can be derived: $\begin{array}{ll} A = \{ & Person ⊑ \neg (\neg Person ⊓ Robot), \\ & Person ⊓ \exists worksIn . ⊤ ⊑ \neg (\neg Person ⊓ Robot), \\ & Person ⊓ \exists worksIn . Factory ⊑ \neg (\neg Person ⊓ Robot), \\ & Person ⊓ \exists worksIn . Factory ⊑ \neg (Person ⊓ \forall worksIn . ⊥) \} \end{array}$

The algorithm to elicit candidate axioms from TCTs provides an approximation of the missing axiom described in Example 1. Additionally, it was able to generate axioms involving other concepts, e.g. $Person ⊓ \exists worksIn . Factory ⊑ \neg (Person ⊓ \forall worksIn . ⊥)$ , corresponding to the axiom $Worker ⊑ \neg Freelance$ .

4. Experiments

In this section, the design and the outcomes of a comparative empirical evaluation of the proposed model and related methods are reported. The experiments were aimed at assessing the performance of the revised version⁴ of the method based on the TCTs, in comparison with other state-of-the-art statistical methods for discovering disjointness axioms (to be further discussed in Section 5). We first illustrate the methodology with the experimental design and setup, then we report and discuss the outcomes of the various sessions.

⁴ Code and testbed of ontologies are publicly available at: https://github.com/Giuseppe-Rizzo/TCTnew.

4.1. Methodology

4.1.1. Ontologies

In the experiments, we considered a variety of freely available Web ontologies describing various domains, namely: BioPax, New Testament Names (NTN), Financial, GeoSkills, Monetary, DBPedia 3.9, Mutagenesis and Vicodi. The principal characteristics of the selected KBs are summarized in Table 1. BioPax is a translation into BioPax format of the glycolysis pathway in the EcoCyc database. NTN describes the characters and places mentioned in the New Testament. Financial was created for modeling the domain of banking. GeoSkills comes from an effort aimed at encoding competencies, topics, and educational levels of the mathematics curriculum standards throughout Europe. Monetary is an ontology that was intended for modeling information about currencies. DBPedia denotes a fragment extracted from the DBPedia 3.9 ontology by a crawling procedure that traversed the RDF graph to retrieve instances and the related schema information. The other ontologies are Mutagenesis, a porting of a well-known benchmark for relational learning methods, and Vicodi, a part of a larger ontology that formalizes knowledge concerning historical events.

Table 1
Ontologies employed in the experiments

Ontology	DL Language	#Concepts	#Roles	#Individuals	#Disj.Axioms
BioPax	$ALCIF (D)$	74	70	323	85
NTN	$SHIF (D)$	47	27	676	40
Financial	$ALCIF (D)$	60	16	1000	113
GeoSkills	$ALCHOIN (D)$	596	23	2567	378
Monetary	$ALCHIF (D)$	323	247	2466	236
DBPedia	$ALCHI (D)$	251	132	16606	11
Mutagenesis	$AL (D)$	86	5	14145	0
Vicodi	$ALHI (D)$	196	10	16942	0

4.1.2. Tasks and design of the experiments

The experiments had two main goals:

assessing the ability of the proposed approach to (re-)discover target axioms originally defined as the result of a knowledge engineering process;

assessing the number and quality of the new axioms discovered preserving the KB consistency in comparison with related methods.

In order to automate the testing of the axioms produced, we decided to bypass the intervention of domain experts, who are hardly available and whose involvement may compromise the repeatability of the experiments. To cope with the lack of target disjointness axioms in the considered KBs, which would offer a natural gold standard (ground truth) for the tests, we considered modified versions of the ontologies reported in Table 1, which were produced through the artificial introduction of new disjointness axioms involving sibling concepts in the subsumption hierarchy, according to the SDA, while preserving the consistency of the ontologies. For each ontology, a fraction f of the disjointness axioms was randomly removed. To obtain performance indices unbiased by the specific selection of axioms, this procedure was repeated 10 times per ontology for increasing values of f: 20%, 50%, 70%.
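The removal protocol can be sketched in a few lines. This is an illustration rather than the authors' test harness: the function name `removal_trials`, the rounding of the number of removed axioms, and the fixed seed are assumptions, and axioms are represented as pairs of concept names.

```python
import random
from typing import List, Set, Tuple

Axiom = Tuple[str, str]


def removal_trials(axioms: List[Axiom], fraction: float,
                   repetitions: int = 10,
                   seed: int = 0) -> List[Tuple[List[Axiom], Set[Axiom]]]:
    """Build `repetitions` variants of the axiom set, each missing a random
    fraction of the disjointness axioms (the targets to be rediscovered)."""
    rng = random.Random(seed)            # seeded only for repeatability
    k = round(fraction * len(axioms))    # rounding rule is an assumption
    trials = []
    for _ in range(repetitions):
        removed = set(rng.sample(axioms, k))
        kept = [ax for ax in axioms if ax not in removed]
        trials.append((kept, removed))
    return trials


axioms = [(f"C{i}", f"D{i}") for i in range(10)]
for f in (0.2, 0.5, 0.7):
    trials = removal_trials(axioms, f)
    print(f, len(trials), len(trials[0][1]))
```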

The effectiveness of the methods was evaluated in terms of

the average number of original axioms rediscovered (the larger the better), which can be considered as a sort of recall measure;

the average number of inconsistencies in the knowledge base (the fewer the better) and the average number of axioms elicited, which is an indicator of the precision of the tested methods.
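The recall-like index, reported later as a mean with a standard deviation, can be computed as in the sketch below. The names are hypothetical, disjointness axioms are modeled as unordered pairs of concept names, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev
from typing import FrozenSet, Iterable, Set, Tuple

Axiom = FrozenSet[str]  # an unordered pair {C, D}, since disjointness is symmetric


def rediscovery_rate(removed: Set[Axiom], discovered: Set[Axiom]) -> float:
    """Fraction of the removed axioms found again among the discovered ones."""
    return len(removed & discovered) / len(removed)


def summarize(rates: Iterable[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation over the repetitions."""
    rates = list(rates)
    return mean(rates), pstdev(rates)


removed = {frozenset({"A", "B"}), frozenset({"C", "D"})}
discovered = {frozenset({"B", "A"}), frozenset({"E", "F"})}
print(rediscovery_rate(removed, discovered))
print(summarize([0.5, 1.0]))
```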

4.1.3. Set-up of the implemented algorithms

In the evaluation we tested various configurations of the overall discovery process based on the TCT learning algorithm that can be summarized as follows:

v.1: first release of the TCT learning algorithm [24] combined with the heuristics (1) and (2) and the distance measures (4) and (6) (both entropic – denoted by e – and uniform weights – denoted by u – have been considered);

v.2: new version of TCT learning algorithm exploiting the heuristics (1) and (2) and the distance measures (4) and (6) (with and without entropic weights)

v.3: same as v.2 but with the consistency check described in Algo. 2.

The distance measure $d_{2}^{C}$ was selected from the family with a context of features $C$ made up of the atomic concepts in the signature of each KB. The beam width for controlling the number of specializations was set to 100. The timeout for generating a refinement, useful in cases when positive and negative instances were hard to find in the cluster, was set to 300 ms. In all releases but the first, the axiom discovery procedure required the value of the threshold θ, which was set to 10.

As previously mentioned, we tested the various configurations of the procedure based on TCTs against other approaches proposed in the related literature (see Section 5), in particular those based on Pearson’s correlation coefficient (PCC) and negative association rules (NAR) [27]. As for the latter, rules were mined using Apriori, with the required parameter values set as follows: minimum support rate 10, minimum confidence rate 50%, and maximum rule length 3 (also in consideration of the sparseness of the instance distributions w.r.t. the concepts in the specific ontologies).

4.2. Experimental results: Presentation and analysis

For the sake of readability Tables 2–7 report some outcomes of the empirical evaluation while the complete results are listed in Appendix A.

4.2.1. Rediscovering disjointness axioms

Table 2
Average rates (and standard deviations) of original axioms re-discovered. Configurations v.1 and v.2 – scoring function (1)

Ontology	distance / weights	f	TCT – standard mode
			TCT 0.9	TCT 0.8	TCT 0.7
BioPax	(4) / u	20%	$0.82 \pm 0.08$	$0.82 \pm 0.08$	$0.82 \pm 0.08$
50%	$0.82 \pm 0.09$	$0.82 \pm 0.09$	$0.82 \pm 0.09$
70%	$0.81 \pm 0.10$	$0.81 \pm 0.10$	$0.81 \pm 0.10$
(4) / e	20%	$0.90 \pm 0.12$	$0.76 \pm 0.13$	$0.74 \pm 0.13$
50%	$0.85 \pm 0.13$	$0.74 \pm 0.13$	$0.74 \pm 0.13$
70%	$0.85 \pm 0.13$	$0.74 \pm 0.12$	$0.74 \pm 0.14$
(6) / u	20%	$0.63 \pm 0.23$	$0.67 \pm 0.23$	$0.69 \pm 0.29$
50%	$0.69 \pm 0.23$	$0.69 \pm 0.22$	$0.69 \pm 0.22$
70%	$0.69 \pm 0.23$	$0.69 \pm 0.22$	$0.69 \pm 0.22$
(6) / e	20%	$0.70 \pm 0.12$	$0.73 \pm 0.11$	$0.73 \pm 0.12$
50%	$0.70 \pm 0.12$	$0.73 \pm 0.11$	$0.73 \pm 0.12$
70%	$0.70 \pm 0.12$	$0.73 \pm 0.11$	$0.73 \pm 0.12$
Monetary	(4) / u	20%	$0.99 \pm 0.08$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
50%	$0.94 \pm 0.13$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
70%	$0.94 \pm 0.13$	$0.91 \pm 0.14$	$0.91 \pm 0.13$
(4) / e	20%	$0.99 \pm 0.08$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
50%	$0.94 \pm 0.13$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
70%	$0.94 \pm 0.13$	$0.91 \pm 0.14$	$0.91 \pm 0.13$
(6) / u	20%	$0.89 \pm 0.14$	$0.76 \pm 0.14$	$0.76 \pm 0.13$
50%	$0.92 \pm 0.16$	$0.90 \pm 0.16$	$0.92 \pm 0.16$
70%	$0.94 \pm 0.13$	$0.94 \pm 0.13$	$0.94 \pm 0.12$
(6) / e	20%	$0.97 \pm 0.15$	$0.97 \pm 0.15$	$0.97 \pm 0.15$
50%	$0.93 \pm 0.11$	$0.93 \pm 0.11$	$1.00 \pm 0.00$
70%	$0.94 \pm 0.13$	$0.91 \pm 0.14$	$0.91 \pm 0.13$
Mutagen.	(4) / u	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
(4) / e	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
(6) / u	20%	$0.76 \pm 0.14$	$0.77 \pm 0.14$	$0.77 \pm 0.13$
50%	$0.82 \pm 0.11$	$0.82 \pm 0.11$	$0.81 \pm 0.11$
70%	$0.84 \pm 0.09$	$0.84 \pm 0.08$	$0.83 \pm 0.10$
(6) / e	20%	$0.95 \pm 0.05$	$0.95 \pm 0.05$	$0.95 \pm 0.05$
50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
Vicodi	(4) / u	20%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
50%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
70%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
(4) / e	20%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
50%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
70%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
(6) / u	20%	$0.92 \pm 0.05$	$0.89 \pm 0.08$	$0.89 \pm 0.08$
50%	$0.95 \pm 0.04$	$0.93 \pm 0.02$	$0.92 \pm 0.04$
70%	$0.95 \pm 0.04$	$0.93 \pm 0.02$	$0.92 \pm 0.04$
(6) / e	20%	$0.92 \pm 0.05$	$0.89 \pm 0.08$	$0.89 \pm 0.08$
50%	$0.92 \pm 0.05$	$0.89 \pm 0.08$	$0.89 \pm 0.08$
70%	$0.90 \pm 0.05$	$0.87 \pm 0.03$	$0.87 \pm 0.03$

Throughout the experiments, we noted that the algorithm based on the TCTs was able to re-discover most of the disjointness axioms that had been previously removed to test this ability. The limited number of cases where the algorithm did not manage to re-discover the axioms depended on the choice of the threshold ν: the lower its value, the fewer recursive calls are required for completing the induction of a TCT. Early termination, along with the inherent incompleteness of the refinement operator, may be one of the reasons for not getting the exact concepts involved in the original axioms. Further aspects that may have affected the outcomes are the choice of the distance measure and the heuristic adopted to select the concepts to be installed in the nodes. As regards the former, we noted that, adopting function (6), the rate of rediscovered axioms was lower than the one obtained with function (4). The resulting trees showed a less complex structure (fewer nodes) using (6) instead of (4). Besides, higher values were generally returned by the first measure. This makes homogeneous clusters of individuals (that determine the stopping condition for the tree growth) harder to find. In this perspective, the choice of the vector of weights also had some influence: while the simpler choice of uniform weights tended to flatten the distance measures, especially in the cases with large contexts of features, entropic weights resulted in a sort of preliminary feature selection that tended to discard many unrelated concepts of the context.

In this sense, the experiments with BioPax reported in Tables 2 and 3 are particularly illustrative of the effectiveness of the weighting model for the resulting measure: the average rate of discovered axioms noticeably improved when distance (4) was employed, spanning from 0.63 up to 0.85.

Similar improvements were observed in the experiments with Mutagenesis and Vicodi. In the experiments with the other ontologies, i.e. NTN, Financial, GeoSkills, Monetary and DBPedia (where a rate greater than 0.9 was often observed regardless of the specific configuration of the TCT learning algorithm), improvements were observed, although to a lesser extent (see Appendix A).

Table 3

Average rates (and standard deviations) of removed axioms re-discovered using TCTs v.3 – scoring functions (1) and (2)

Ontology	distance / weights	f	TCT – standard mode

			TCT 0.9	TCT 0.8	TCT 0.7
BioPax	(4) / u	20%	$0.85 \pm 0.03$	$0.82 \pm 0.07$	$0.82 \pm 0.08$
		50%	$0.86 \pm 0.13$	$0.82 \pm 0.13$	$0.83 \pm 0.10$
		70%	$0.87 \pm 0.12$	$0.87 \pm 0.12$	$0.87 \pm 0.13$
	(4) / e	20%	$0.90 \pm 0.12$	$0.84 \pm 0.13$	$0.81 \pm 0.13$
		50%	$0.92 \pm 0.14$	$0.90 \pm 0.11$	$0.90 \pm 0.10$
		70%	$0.93 \pm 0.16$	$0.91 \pm 0.11$	$0.90 \pm 0.11$
	(6) / u	20%	$0.66 \pm 0.20$	$0.68 \pm 0.22$	$0.67 \pm 0.30$
		50%	$0.68 \pm 0.21$	$0.68 \pm 0.21$	$0.68 \pm 0.21$
		70%	$0.73 \pm 0.22$	$0.70 \pm 0.21$	$0.71 \pm 0.21$
	(6) / e	20%	$0.76 \pm 0.12$	$0.75 \pm 0.08$	$0.74 \pm 0.10$
		50%	$0.78 \pm 0.12$	$0.76 \pm 0.14$	$0.72 \pm 0.12$
		70%	$0.78 \pm 0.09$	$0.73 \pm 0.11$	$0.73 \pm 0.12$
Mutag.	(4) / u	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
	(4) / e	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
	(6) / u	20%	$0.84 \pm 0.07$	$0.82 \pm 0.14$	$0.77 \pm 0.13$
		50%	$0.82 \pm 0.11$	$0.82 \pm 0.11$	$0.81 \pm 0.11$
		70%	$0.84 \pm 0.09$	$0.84 \pm 0.08$	$0.83 \pm 0.10$
	(6) / e	20%	$0.95 \pm 0.05$	$0.95 \pm 0.05$	$0.95 \pm 0.05$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
Vicodi	(4) / u	20%	$0.93 \pm 0.05$	$0.92 \pm 0.06$	$0.92 \pm 0.06$
		50%	$0.94 \pm 0.01$	$0.89 \pm 0.03$	$0.90 \pm 0.03$
		70%	$0.94 \pm 0.01$	$0.89 \pm 0.03$	$0.90 \pm 0.03$
	(4) / e	20%	$0.95 \pm 0.02$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
		50%	$0.98 \pm 0.04$	$0.98 \pm 0.04$	$0.98 \pm 0.03$
		70%	$0.98 \pm 0.03$	$0.97 \pm 0.05$	$0.97 \pm 0.03$
	(6) / u	20%	$0.96 \pm 0.00$	$0.93 \pm 0.10$	$0.92 \pm 0.11$
		50%	$0.96 \pm 0.13$	$0.94 \pm 0.13$	$0.94 \pm 0.12$
		70%	$0.96 \pm 0.12$	$0.94 \pm 0.14$	$0.93 \pm 0.13$
	(6) / e	20%	$0.97 \pm 0.14$	$0.96 \pm 0.14$	$0.96 \pm 0.14$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
DBPedia	(4) / u	20%	$0.90 \pm 0.08$	$0.90 \pm 0.08$	$0.90 \pm 0.08$
		50%	$0.96 \pm 0.08$	$0.96 \pm 0.07$	$0.96 \pm 0.09$
		70%	$0.96 \pm 0.08$	$0.96 \pm 0.07$	$0.96 \pm 0.09$
	(4) / e	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
	(6) / u	20%	$0.96 \pm 0.03$	$0.96 \pm 0.03$	$0.96 \pm 0.03$
		50%	$0.96 \pm 0.04$	$0.96 \pm 0.03$	$0.95 \pm 0.06$
		70%	$0.96 \pm 0.04$	$0.96 \pm 0.03$	$0.95 \pm 0.06$
	(6) / e	20%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		50%	$1.00 \pm 0.00$	$1.00 \pm 0.00$	$1.00 \pm 0.00$
		70%	$0.99 \pm 0.03$	$0.98 \pm 0.03$	$0.99 \pm 0.03$

In the evaluation, we also tested the effectiveness of the alternative heuristic (2). In this case very small or no changes were observed in the rate of rediscovered axioms, although different tree structures were produced. Furthermore, in the experiments with TCT v.3, we observed an evident decrease in performance (see the results with BioPax in Table 3). In such cases, the constraint on consistency made the algorithm based on TCTs more conservative than the other versions. Indeed, introducing this condition as a constraint for generating specializations led to many concept descriptions being discarded.

A further aspect to consider is the availability of individuals that are instances of the concepts involved in disjointness axioms. In the elicitation of the target axioms, the larger the number of individuals in a given cluster, the more likely it is to find sub-clusters whose distance is maximized. Specifically, in the experiments we noted that it was hard to rediscover axioms involving concepts with fewer than 10–15 available instances: in such cases, the limited number of individuals in the clusters did not allow the distance between the candidate sub-clusters to be maximized. Consequently, the scores computed via both heuristics (1) and (2) were quite low and the concepts were ignored. For instance, in the experiments on Vicodi a trivial disjointness axiom between the concepts Actor and Artefact could not be discovered. These few cases may be treated with specific configurations of the thresholds.

Table 4

Experimental comparison of the various approaches: average numbers of cases of inconsistency (#inc.) and total numbers of discovered axioms (#ax’s) for PCC and NAR

Ontology	f	PCC		NAR

		#inc.	#ax’s	#inc.	#ax’s
BioPax	50%	257	280	352	2990
NTN	50%	32	957	376	3766
Financial	50%	124	1112	542	5366
GeoSkills	50%	456	13384	456	13299
Monetary	50%	543	13384	423	13456
Mutagenesis	50%	20	2264	45	14832
Vicodi	50%	475	15518	472	18721
DBPedia	50%	1243	30470	1243	30365

4.2.2. Axiom discovery and consistency

As regards the overall number of discovered axioms (see Tables 5–7), it can generally be observed that it decreased with larger fractions of removed axioms, since the resulting trees showed a less complex structure. Moreover, we noted that, in the case of the smallest ontologies (in terms of number of individuals), a proper tuning of the threshold ν had a non-negligible impact on the effectiveness. Conversely, the differences among the results were small in the experiments with the largest ontologies, such as GeoSkills, Monetary, Mutagenesis, Vicodi and DBPedia. This suggests that, for these ontologies, the variations in the numbers of axioms discovered were likely due to the random sampling performed by the refinement operators.

Concerning the distance measures used in the experiments, we noted that their choice was an important factor for the number of elicited disjointness axioms. Plugging the distance measure (4) into the TCT-based algorithm resulted in the induction of taller trees with a larger number of nodes, owing to the selection of concepts that required many splits of the sub-clusters sorted to the (negative) right branches of the trees. Moreover, in some cases, TCTs with clusters containing few individuals were produced. This was due to the distributions of the instances w.r.t. the various concepts: for example, concepts with few instances are frequent in Financial, while GeoSkills is more densely populated (and the number of empty clusters was more limited). Throughout the evaluation with TCT v.2 and TCT v.3, cases of totally empty clusters were also rarely observed: this result depended on the timeout adopted to avoid time-consuming refinement operations due to the difficulty of finding candidate specializations featuring both positive and negative instances.

As regards the choice of the heuristic for selecting the most promising candidates, while this aspect did not affect the ability to rediscover the target axioms, it influenced the ability to induce new axioms while avoiding the introduction of inconsistency: the scoring function (2) allowed the selection of concepts that determined sub-clusters whose distance was larger than that of the sub-clusters produced by adopting function (1). As expected, this meant that the new heuristic was able to reduce the problematic cases of individuals of a sub-cluster that were close to those belonging to the sibling sub-cluster (to which the heuristic (1) is less sensitive), improving the homogeneity of the resulting sub-groups (and also reducing the number of induced axioms). In particular, we noted that no inconsistency cases were introduced in the experiments with Eq. 1: for most candidate axioms involving concepts, say C and D, it seldom occurred that the number of individuals that were instances of $C ⊓ D$ exceeded the given threshold (10). This may limit the need for the reasoner for inconsistency checking during the phase of disjointness axiom elicitation. A similar number of axioms was also obtained by checking for inconsistency during the specialization phase. However, anticipating these checks in the generation of the refinements, rather than having them discarded through the heuristic (2), made the approach more stable in terms of the number of axioms produced with respect to the fraction f of axioms removed.

Table 5
Experimental comparison of the various approaches: average numbers of cases of inconsistency (#inc.) and total numbers of discovered axioms (#ax’s) using TCT v.1 and v.2 – scoring function (1)

Ontology	distance / weights	f	TCT 0.9		TCT 0.8		TCT 0.7	
			#inc.	#ax’s	#inc.	#ax’s	#inc.	#ax’s
BioPax	(4) / u	20%	542	4235	576	4237	589	4237
50%	345	3773	357	3817	364	3876
70%	345	3773	357	3817	364	3876
(4) / e	20%	235	3859	357	4235	365	4256
50%	125	3576	357	4176	432	4115
70%	125	3432	235	3875	417	4154
(6) / u	20%	432	2567	446	2756	578	2757
50%	236	2578	237	2758	238	2876
70%	128	2587	128	2587	128	2578
(4) / e	20%	235	2346	357	2357	365	2458
50%	125	3576	357	4176	432	4115
70%	125	3432	235	3675	417	3875
NTN	(4) / u	20%	432	3347	432	3347	432	3347
50%	415	3256	415	3256	415	3256
70%	415	3256	415	3256	415	3256
(4) / e	20%	312	3128	343	3126	354	3124
50%	234	3023	234	3034	235	3034
70%	156	2987	176	2679	123	2675
(6) / u	20%	432	4579	478	4789	478	4783
50%	356	4321	356	4321	356	4321
70%	356	4321	356	4321	356	4321
(6) / e	20%	431	3083	431	3083	431	3083
50%	345	2987	345	2987	345	2987
70%	323	2996	324	2993	323	2996
Monetary	(4) / u	20%	673	13765	673	13765	677	13767
50%	432	13567	432	13567	432	13567
70%	247	13127	231	13127	3127	13127
(4) / e	20%	535	13456	573	13453	623	13460
50%	315	13236	432	13236	532	13236
70%	247	13127	231	13127	312	13127
(6) / u	20%	756	12437	755	12438	847	12589
50%	643	11357	647	11362	647	11362
70%	536	10432	536	10432	536	10432
(6) / e	20%	756	12437	876	12442	876	12321
50%	643	11386	647	11373	647	11384
70%	540	10457	540	10458	540	10458
Vicodi	(6) / u	20%	431	18231	485	18432	502	18432
50%	142	18231	345	18432	467	18431
70%	141	18231	345	18432	312	18432
(6) / e	20%	34	14753	43	14847	43	14978
50%	23	14753	31	14753	32	14978
70%	23	14753	32	14753	32	14978
(4) / u	20%	431	17176	485	17176	502	17176
50%	142	17176	142	17176	142	17176
70%	142	17176	345	17176	467	17176
(6) / e	20%	431	17176	485	17176	502	17176
50%	142	17176	142	17176	142	17176
70%	142	17176	345	171761	467	17176

Table 6

Experimental comparison of the various approaches: average numbers of cases of inconsistency (#inc.) and total numbers of discovered axioms (#ax’s) using TCT v.1 and v.2 – scoring function (2)

Ontology	distance / weights	f	TCT 0.9		TCT 0.8		TCT 0.7

			#inc.	#ax’s	#inc.	#ax’s	#inc.	#ax’s
BioPax	(4) / u	20%	0	2123	0	2124	0	2145
		50%		2123		2124		2145
		70%		2123		2123		2147
	(4) / e	20%	0	2145	0	2145	0	2145
		50%		2346		2346		2346
		70%		2346		2346		2346
	(6) / u	20%	0	2126	0	2126	0	2126
		50%		2098		2098		2098
		70%		1985		1985		1986
	(6) / e	20%	0	2145	0	2145	0	2145
		50%		2346		2346		2346
		70%		2346		2346		2346
NTN	(4) / u	20%	0	4123	0	4123	0	4123
		50%		4113		4123		4123
		70%		4113		4114		4114
	(4) / e	20%	0	3083	0	3083	0	3083
		50%		2987		2987		2987
		70%		2996		2993		2996
	(6) / u	20%	0	4123	0	4123	0	4123
		50%		4113		4123		4123
		70%		4113		4114		4114
	(6) / e	20%	0	3083	0	3083	0	3083
		50%		2987		2987		2987
		70%		2996		2993		2996
Monetary	(4) / u	20%	0	10243	0	10256	0	10256
		50%		10242		10257		10257
		70%		10243		10258		10258
	(4) / e	20%	0	10116	0	10116	0	10116
		50%		10116		10117		10115
		70%		10115		10116		10116
	(6) / u	20%	0	10257	0	10245	0	10244
		50%		10257		10245		10244
		70%		10257		10242		10257
	(6) / e	20%	0	10116	0	10116	0	10116
		50%		10116		10116		10116
		70%		10116		10116		10116
Vicodi	(6) / u	20%	0	16432	0	16432	0	16432
		50%		16239		16239		16239
		70%		16345		16345		16345
	(4) / e	20%	0	16456	0	16576	0	16579
		50%		16453		16453		16453
		70%		16453		16453		16453
	(6) / u	20%	0	16432	0	16432	0	16432
		50%		16239		16239		16239
		70%		16345		16345		16345
	(6) / e	20%	0	16456	0	16576	0	16579
		50%		16453		16453		16453
		70%		16453		16453		16453

Table 7

Experimental comparison of the various approaches: average numbers of cases of inconsistency (#inc.) and total numbers of discovered axioms (#ax’s) using TCT v.3 – scoring function (2)

Ontology	distance / weights	f	TCT 0.9		TCT 0.8		TCT 0.7

			#inc.	#ax’s	#inc.	#ax’s	#inc.	#ax’s
BioPax	(4) / u	50%	0	2123	0	2124	0	2145
	(4) / e			2346		2346		2346
	(6) / u			2095		2100		2095
	(6) / e			2344		2344		2345
NTN	(4) / u	50%	0	4113	0	4123	0	4123
	(4) / e			2987		2987		2987
	(6) / u			4113		4123		4123
	(6) / e			2987		2987		2987
Mutagenesis	(4) / u	50%	0	12456	0	12326	0	12326
	(4) / e			12217		12216		12220
	(6) / u			12456		12326		12326
	(6) / e			12217		12217		12217
Vicodi	(6) / u	50%	0	16239	0	16239	0	16239
	(4) / e			16453		16453		16453
	(6) / u			16239		16239		16239
	(6) / e			16453		16453		16453

Comparative experiments. It is worthwhile to note that, throughout the experiments, the number of axioms induced through TCTs was larger than the number of axioms induced through the methods based on PCC and NAR (see Table 4). This was due to the fact that, via one of the refinement operators, the TCT-based method performs a search in a larger space than the one considered by the other methods, as they focus on the mere combination of concept names selected from the KB signature.

The outcomes reported in Tables 4, 5, 6 and 7 show that, in absolute terms, more axioms were generally discovered using the proposed method (for all three choices of the threshold ν selected for the experiments) than with the two other methods, and yet the number of inconsistencies introduced (in case of direct addition of the axioms to the KB) was quite limited in proportion to the overall number of axioms produced: for example, with Monetary and Vicodi this rate was on average less than 3.5% with almost 20,000 discovered axioms. Inspecting sampled TCTs to gain a deeper insight into the outcomes, we noted that, for ontologies with a small number of concepts, such as BioPax and NTN, the refinement operator tended to introduce the same concept in several branches. As a consequence, the large number of axioms discovered was likely due to the replication of some sub-trees. This was also one of the main causes of most of the inconsistency cases. The result improved by using heuristic (2), which turned out to be more robust than (1) (see Tables 6 and 7) at determining more suitable concepts describing non-overlapping sub-clusters. However, such a heuristic lessened the number of cases without completely preventing the issue: using more complex languages as a tradeoff would require equally complex and computationally expensive refinement operators. To get rid of such cases, anticipating the overlap test during the specialization is crucial. With this solution, it was possible to drive the induction of TCTs towards other concept descriptions, thus limiting the aforementioned replication problem.

As regards the performance of PCC and NAR, we noted that they had a more stable behavior with respect to the fractions of removed axioms f because, as previously mentioned, they could discover axioms involving exclusively named concepts of the KB signature whose instances are more likely to be available. Moreover, a weak correlation between two concepts is unlikely to depend on the presence of a disjointness axiom involving them. This led them also not to introduce further inconsistencies.

4.2.3. Examples of discovered disjointness axioms

For a more complete evaluation covering also the qualitative viewpoint, we report some examples of the axioms that could be discovered through the various methods. As previously mentioned, one of the advantages deriving from the employment of the TCTs is related to the kind of axioms that can be elicited. Purely statistical methods focus on the KB signature and merely make pairwise comparisons in order to discover the concepts that are weakly correlated. Conversely, the TCT-based algorithm performs a search that allows eliciting axioms involving alternative versions of the targeted concepts, i.e. concept descriptions that are candidates to be equivalent to those considered in the target.
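The pairwise comparison performed by a purely statistical method can be sketched as follows. This is only an illustration under stated assumptions, not the PCC implementation used in the experiments: concept extensions are given as plain sets, membership is encoded as 0/1 vectors, and the cutoff `max_corr` as well as the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(vx * vy)


def candidate_disjoint_pairs(extensions, individuals, max_corr=-0.4):
    """Return pairs of concept names whose membership vectors correlate at or below max_corr.

    `max_corr` is an assumed cutoff chosen for this sketch only.
    """
    pairs = []
    for c, d in combinations(sorted(extensions), 2):
        xs = [1 if a in extensions[c] else 0 for a in individuals]
        ys = [1 if a in extensions[d] else 0 for a in individuals]
        if pearson(xs, ys) <= max_corr:
            pairs.append((c, d))
    return pairs


# Toy knowledge base (made-up individuals).
individuals = ["a1", "a2", "a3", "a4", "a5", "a6"]
extensions = {
    "Man": {"a1", "a2", "a3"},
    "Woman": {"a4", "a5", "a6"},
    "Person": set(individuals),
    "Student": {"a1", "a4"},
}
pairs = candidate_disjoint_pairs(extensions, individuals)
```

Only named concepts take part in the comparison, which reflects the restriction to the KB signature discussed above.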

For instance, in the case of NTN, PCC and NAR could discover simple axioms like $ReligiousOrganization ⊑ \neg Woman$ and $ReligiousOrganization ⊑ \neg Man$, while the new method allowed eliciting the target axioms $GroupofPeople ⊑ \neg Person$ and $Man ⊑ \neg Woman$.

An interesting (not previously existing) disjointness axiom elicited using the TCTs involved more complex concepts like $(\exists spouseOf . (\exists visitedPlace . (\forall parentOf . ⊤)))$ and $(\exists nativePlaceOf . \neg Serial)$.

In the experiments with Financial, PCC and NAR produced the following target axioms: $SouthMoravia ⊑ \neg WestBohemia$, where SouthMoravia and WestBohemia represent two different geographical areas, and the axiom $Man ⊑ \neg Woman$. In the case of TCTs, an axiom similar to the target $Man ⊑ \neg Woman$ was found, i.e. $\exists hasSexValue . MaleSex ⊑ \neg (\exists hasSexValue . FemaleSex)$.

In the experiments with GeoSkills, all methods were able to detect the original disjointness between the concepts Vertex and Volume. Finally, in the case of DBPedia, one of the original axioms that were also elicited by PCC and NAR involved the concepts $Mountain$ and $Movie$. Instead, the new method elicited a disjointness axiom between a potentially redundant concept description, i.e. $\neg Film ⊓ \neg Person ⊓ NaturalPlace ⊓ Mountain$ (where $NaturalPlace ⊒ Mountain$), and $Movie$ (where $Movie \equiv Film$).

4.2.4. Efficiency of the refinement operators

Fig. 4. Efficiency (ms) of the refinement operator.

The task of specialization generation is one of the major bottlenecks of the overall TCT induction method. Thus we carried out various tests for assessing the efficiency of the algorithm that is ultimately based on the refinement operator ρ.

The tests have been repeated on increasing sizes of the beam of candidates: 100, 300, 400, 500, 600, 1000. Also, the entire sets of individuals in each KB were considered to test the stop condition in the procedures. Additionally, we repeated the experiments considering both the original ontologies and the derived versions obtained applying the SDA.

Figure 4 illustrates the outcomes (execution time) using ρ under the SDA; similar trends were observed in the experiments on the original KBs. We noted that, using ρ, in most of the cases the time grew linearly w.r.t. the number of specializations, e.g. see the case of the experiments with Monetary. This depended on the syntactic length of the generated concept descriptions and on the threshold on the number of individuals used in the stop condition. In particular, the generation of the concepts was biased towards the introduction of new concept names as conjuncts rather than existential and universal restrictions. This means that there was a limited number of recursive calls of the refinement operator, which yielded shorter concepts.

As a consequence the stop condition was satisfied earlier w.r.t. the case of concepts involving existential and universal restrictions. Indeed, instances of concept descriptions obtained as a conjunction of concept names are generally easier to find than for concepts involving universal and existential restrictions, due to the sparseness of assertional knowledge concerning roles observed in the KBs.

A final remark concerns the line of experiments in which the SDA was not applied: they showed that the two operators behave similarly to the cases in which the SDA was applied. Even if ρ showed a linear increase of the time as larger beams were considered, the lack of disjointness axioms in the original KBs makes it hard to find individuals with a definite membership for each specialization, thus delaying the satisfaction of the stop condition. However, it should be clarified that this problem depends on the specific reasoner adopted to make the inferences required by the algorithms.

5. Related work

The problem of discovering disjointness axioms to enrich and improve the quality of ontological knowledge bases has been receiving growing attention. In early works the mentioned strong disjointness assumption [5], which states that the children of a common parent in the subsumption hierarchy should be considered as disjoint, has been exploited in a pinpointing algorithm for semantic clarification (i.e. the process of automatically enriching ontologies with appropriate disjointness statements) [25].
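The strong disjointness assumption can be made concrete with a short Python sketch. It assumes the subsumption hierarchy is given as a dictionary mapping each parent concept to its direct children; the function name `sda_axioms` and the toy hierarchy are hypothetical.

```python
from itertools import combinations


def sda_axioms(children_of):
    """Return sibling pairs (C, D), each read as 'C is disjoint with D'.

    `children_of` maps a parent concept name to its direct sub-concepts.
    """
    axioms = []
    for parent in sorted(children_of):
        # Every pair of children of the same parent is assumed disjoint.
        for c, d in combinations(sorted(children_of[parent]), 2):
            axioms.append((c, d))
    return axioms


# Toy hierarchy (illustrative concept names only).
hierarchy = {
    "Person": ["Man", "Woman"],
    "Place": ["City", "Mountain", "River"],
}
axioms = sda_axioms(hierarchy)
```

A parent with k children contributes k(k-1)/2 axioms, so the number of generated axioms grows quadratically with the branching factor.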

Focusing on text and subsequently on RDF datasets, an unsupervised method for mining axioms, including disjointness axioms, has been proposed [15,29]. The main limitation of this approach is the limited use of any form of background knowledge which, on the contrary, can decisively help to increase the number of discovered axioms while preventing unnecessary or wrong axioms.

Moreover, both relational learning methods and techniques based on formal concept analysis have been proposed for this purpose [3,18], but no specific assessment of the quality of the induced axioms is made. This limit has also been pointed out by Völker and Niepert [28], where an approach based on association rule mining has been introduced. Additional approaches relying on association rules have been proposed by Fleischhacker and Völker [14] and Völker et al. [27].

The focus of these works was studying the correlation between classes comparatively. Specifically, association rules, negative association rules and correlation coefficients have been considered. Also in these cases, background knowledge and reasoning were exploited to a limited extent.

Lehmann and Bühmann [18] proposed a tool for repairing various types of ontology modeling errors: it uses supervised methods from the DL-Learner framework [17] to enrich ontologies with axioms elicited from existing instances.

Our solution is based on an unsupervised approach, deriving from previous works on concept learning and inductive classification [12]. Specifically, we propose a hierarchical conceptual clustering method that, in addition, is able to provide intensional cluster descriptions, and that exploits a novel form of a family of semi-distances over the individuals in an ontological knowledge base [23] which can take into account the available background knowledge. The method is grounded on the notion of medoids as cluster prototypes, since clustering algorithms adopting medoids have been introduced to overcome known limits such as the lack of algebraic structure of the representation of the instance space [1]. The hierarchical approach proposed in this paper is related to classic clustering algorithms such as Cobweb [13] with some differences: 1) Cobweb directly produces n-ary cluster hierarchies instead of the binary ones in the TCTs. This allows for eliciting more intermediate concepts with the latter model; 2) the intensional definitions assigned to the clusters adopt a less expressive propositional representation language w.r.t. DLs; 3) a probabilistic cluster membership is modeled (as in the fuzzy clustering approaches) rather than a definite one, which is required to derive disjointness axioms. Related approaches to partitive clustering applied to datasets encoded in DL languages have been proposed, such as the hierarchical bisecting k-medoids [11] or the partition around medoids, combined with evolutionary programming [10]. They are able to form clusters of individuals occurring in Web ontologies by exploiting metrics that are similar to those adopted in this work. However, these methods generally do not return any intensional cluster description. The derivation of concepts, i.e. intensional cluster definitions (conceptual clustering [26]), requires the adoption of additional and suitable concept learning algorithms.
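The notion of medoid used as a cluster prototype can be sketched in a few lines of Python. The sketch assumes a generic pairwise dissimilarity function `dist`; the absolute difference over numbers used below is only a stand-in for the semi-distances over individuals adopted in this work, and the function name is hypothetical.

```python
def medoid(cluster, dist):
    """Return the member of `cluster` with the smallest total distance to all members.

    Unlike a centroid, the medoid is always an actual element of the cluster,
    so no algebraic structure (e.g. averaging) is required on the instance space.
    """
    members = list(cluster)
    if not members:
        raise ValueError("cannot compute the medoid of an empty cluster")
    return min(members, key=lambda a: sum(dist(a, b) for b in members))


# Stand-in dissimilarity on numbers (an assumption made for this sketch).
def abs_diff(a, b):
    return abs(a - b)


prototype = medoid([0, 1, 2, 3, 10], abs_diff)
```

With this quadratic-time formulation only pairwise distances are needed, which is what makes it applicable to individuals in a knowledge base.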

Specifically, the method proposed in this paper relies on logic tree models [4] which essentially adopt a divide-and-conquer strategy to derive a hierarchical structure. The learning method can work both in supervised and unsupervised mode, depending on the availability of information about the instance classification to be exploited for separating sub-groups of instances. Terminological decision trees were derived [12,22] in the former case to classify individuals w.r.t. an unknown target concept (assigning a class label at each leaf node), while for the latter case, first-order logic clustering trees [8] were proposed to induce concepts for the clusters expressed in the context of clausal logic theories. The C0.5 system, which is integrated in the Tilde framework [4], is able to induce concepts as conjunctions of literals (clause bodies) installed at inner nodes. Almost all these existing methods are grounded on the exploitation of a heuristic based on the information gain, employed in the supervised case. Differently, our approach tends to maximize the separation between cluster medoids according to a semi-distance measure.
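The idea of selecting the concept that maximizes the separation between the medoids of the two resulting sub-clusters can be sketched as follows. This is a simplified illustration and not the implemented algorithm: membership is treated as two-valued (an individual either belongs to the extension or not), candidate concepts are given as named sets rather than generated by a refinement operator, and all names are hypothetical.

```python
def medoid(cluster, dist):
    """Member of `cluster` minimizing the total distance to all members."""
    members = list(cluster)
    return min(members, key=lambda a: sum(dist(a, b) for b in members))


def best_split(cluster, candidates, dist):
    """Pick the candidate whose two sub-clusters have the farthest medoids.

    `candidates` maps a concept name to its extension (a set of individuals).
    Returns (name, score, inside, outside), or None if no candidate splits
    the cluster into two non-empty parts.
    """
    best = None
    for name, extension in candidates.items():
        inside = [a for a in cluster if a in extension]
        outside = [a for a in cluster if a not in extension]
        if not inside or not outside:
            continue  # a degenerate split cannot yield two sub-clusters
        score = dist(medoid(inside, dist), medoid(outside, dist))
        if best is None or score > best[1]:
            best = (name, score, inside, outside)
    return best


# Toy data: numbers stand in for individuals, absolute difference for the semi-distance.
cluster = [0, 1, 2, 10, 11, 12]
candidates = {
    "Low": {0, 1, 2},
    "Even": {0, 2, 10, 12},
    "All": set(cluster),
}
choice = best_split(cluster, candidates, lambda a, b: abs(a - b))
```

In the toy data the candidate "Low" separates the two natural groups, while "Even" mixes them and "All" leaves one side empty.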

6. Conclusions

In this work, we have cast the task of discovering disjointness axioms as a clustering problem that was solved by exploiting terminological cluster trees, an extension of terminological decision trees [12] (proposed to solve supervised learning problems). Moving from our previous work [24], we extended the framework for inducing terminological cluster trees along various directions with the aim of improving both its effectiveness and efficiency. Specifically, we aimed at improving the quality of the resulting axioms using: 1) different distance measures; 2) a different heuristic for selecting the concepts to be installed into tree nodes; 3) a modified version of the refinement operator to generate concepts enabling the elicitation of axioms that do not introduce inconsistency in the KB.

In the empirical evaluation, various experiments have been performed with the goal of assessing the effectiveness of the new methodology. Compared to related unsupervised approaches, ours proved to be able to discover disjointness axioms involving complex concept descriptions by exploiting the underlying ontology as a source of background knowledge, unlike the other methods based on the statistical correlation between instances. The evaluation also showed that cases of inconsistency introduced in the KBs by the elicited axioms can be drastically lessened by resorting to a different heuristic: candidate concepts are selected according to the farthest distance from the elements of a sub-cluster to the medoid of the sibling resulting from the split of the parent cluster. Additionally, the aforementioned cases can be totally avoided by checking the overlap between the concept installed into the current node and those installed into the tree up to that moment. We noted that the time required by the refinement operator increases linearly w.r.t. the number of specializations.

Various extensions may be envisaged for this work. The TCT induction algorithm can be further improved by introducing a post-pruning step for better tackling the problem of empty clusters. New metrics for evaluating the performance of such methods could also be proposed. Finally, it may be interesting to integrate the methodology within ontology engineering frameworks based on machine learning, such as DL-Learner [17], as a service for enriching the terminology of lightweight ontologies.

Footnotes

Detailed results of the experiments with TCTs

We report in the following the outcomes of further experiments with the TCTs produced by the various versions of the algorithm.

References

1. C.C. Aggarwal and C.K. Reddy, Data Clustering: Algorithms and Applications, 1st edn, Chapman & Hall/CRC, 2013. doi:10.1201/b15410.

2. Baader, Calvanese, McGuinness, Nardi and Patel-Schneider (eds), The Description Logic Handbook, 2nd edn, Cambridge University Press, 2007. doi:10.1017/CBO9780511711787.

3. Baader, Ganter, Sertkaya and Sattler, Completing description logic knowledge bases using formal concept analysis, in: IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Veloso, ed., AAAI Press, 2007, pp. 230–235. doi:10.1.1.494.3278.

4. Blockeel and De Raedt, Top-down induction of first-order logical decision trees, Artificial Intelligence 101(1–2) (1998), 285–297. doi:10.1016/S0004-3702(98)00034-4.

5. Cornet and Abu-Hanna, Usability of expressive description logics – a case study in UMLS, in: Proceedings of the Annual Symposium of the American Medical Informatics Association, AMIA 2002, Kohane, ed., AMIA, 2002, pp. 180–184. doi:10.1.1.530.6628.

6. d’Amato, Fanizzi and Esposito, Query answering and ontology population: An inductive approach, in: The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC 2008, Proceedings, Bechhofer et al., eds, LNCS, Vol. 5021, Springer, 2008, pp. 288–302. doi:10.1007/978-3-540-68234-9.

7. d’Amato, Fanizzi and Esposito, Analogical reasoning in description logics, in: Uncertainty Reasoning for the Semantic Web I, P.C.G. Costa et al., eds, LNAI, Vol. 5327, Springer Berlin Heidelberg, 2008, pp. 330–347. doi:10.1007/978-3-540-89765-1_19.

8. De Raedt and Blockeel, Using logical decision trees for clustering, in: Inductive Logic Programming, 7th International Workshop, ILP-97, Proceedings, Lavrač and Džeroski, eds, LNAI, Vol. 1297, Springer, 1997, pp. 133–140. doi:10.1007/3540635149_41.

9. Esposito, d’Amato and Fanizzi, Fuzzy clustering for semantic knowledge bases, Fundam. Inform. 99(2) (2010), 187–205. doi:10.3233/FI-2010-245.

10. Fanizzi, d’Amato and Esposito, Evolutionary conceptual clustering based on induced pseudo-metrics, Int. J. Semantic Web Inf. Syst. 4(3) (2008), 44–67. doi:10.4018/jswis.2008070103.

11. Fanizzi, d’Amato and Esposito, Conceptual clustering and its application to concept drift and novelty detection, in: The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC 2008, Proceedings, Bechhofer et al., eds, LNCS, Vol. 5021, Springer, 2008, pp. 318–332. doi:10.1007/978-3-540-68234-9_25.

12. Fanizzi, d’Amato and Esposito, Induction of concepts in web ontologies through terminological decision trees, in: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Proceedings, Part I, J.L. Balcázar et al., eds, LNAI, Vol. 6321, Springer, 2010, pp. 442–457. doi:10.1007/978-3-642-15880-3_34.

13. D.H. Fisher, Knowledge acquisition via incremental conceptual clustering, Machine Learning 2(2) (1987), 139–172. doi:10.1023/A:1022852608280.

14. Fleischhacker and Völker, Inductive learning of disjointness axioms, in: On the Move to Meaningful Internet Systems: OTM 2011 – Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2011, Proceedings, Part II, Meersman et al., eds, LNCS, Vol. 7045, Springer, 2011, pp. 680–697. doi:10.1007/978-3-642-25106-1_20.

15. Haase and Völker, Ontology learning and reasoning: Dealing with uncertainty and inconsistency, in: Uncertainty Reasoning for the Semantic Web I, da Costa et al., eds, LNAI, Vol. 5327, Springer, 2008, pp. 366–384. doi:10.1007/978-3-540-89765-1_21.

16. Heath and Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. doi:10.2200/S00334ED1V01Y201102WBE00.

17. Hellmann, Lehmann and Auer, Learning of OWL class descriptions on very large knowledge bases, Int. J. Semant. Web Inf. 5(2) (2009), 25–48. doi:10.4018/jswis.2009040102.

18. Lehmann and Bühmann, ORE – a tool for repairing and enriching knowledge bases, in: The Semantic Web – ISWC 2010 – 9th Int. Sem. Web Conf., Revised Selected Papers, Part II, Patel-Schneider et al., eds, LNCS, Vol. 6497, Springer, 2010, pp. 177–193. doi:10.1007/978-3-642-17749-1_12.

19. Lehmann, Fanizzi, Bühmann and d’Amato, Concept Learning, Lehmann and Voelker, eds, AKA/IOS Press, 2014, pp. 71–91. doi:10.3233/978-1-61499-349-0-i.

20. Lehmann and Hitzler, Concept learning in description logics using refinement operators, Machine Learning 78(1) (2009), 203. doi:10.1007/s10994-009-5146-2.

21. Lehmann and Hitzler, A refinement operator based learning algorithm for the ALC description logic, in: Inductive Logic Programming Int. Conf., ILP 2007, Revised Selected Paper, Blockeel et al., eds, LNAI, Vol. 4894, Springer, 2007, pp. 147–160. doi:10.1007/978-3-540-78469-2_17.

22. Rizzo, d’Amato, Fanizzi and Esposito, Tree-based models for inductive classification on the web of data, J. Web Sem. 45 (2017), 1–22. doi:10.1016/j.websem.2017.05.001.

23. Rizzo, d’Amato, Fanizzi and Esposito, Induction of terminological cluster trees, in: Proceedings of the 12th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2016), Bobillo et al., eds, CEUR Workshop Proceedings, Vol. 1665, CEUR-WS.org, 2016, pp. 49–60, url: http://ceur-ws.org/Vol-1665/paper5.pdf.

24. Rizzo, d’Amato, Fanizzi and Esposito, Terminological cluster trees for disjointness axiom discovery, in: The Semantic Web – 14th International Conf., ESWC 2017, Proceedings, Part I, Blomqvist et al., eds, LNCS, Vol. 10249, 2017, pp. 184–201. doi:10.1007/978-3-319-58068-5_12.

25. Schlobach, Debugging and semantic clarification by pinpointing, in: The Semantic Web: Research and Applications, 2nd Europ. Sem. Web Conf., ESWC 2005, Proceedings, Gómez-Pérez and Euzenat, eds, LNCS, Vol. 3532, Springer, 2005, pp. 226–240. doi:10.1007/11431053_16.

26. R.E. Stepp and R.S. Michalski, Conceptual clustering of structured objects: A goal-oriented approach, Artif. Intell. 28(1) (1986), 43–69. doi:10.1016/0004-3702(86)90030-5.

27. Völker, Fleischhacker and Stuckenschmidt, Automatic acquisition of class disjointness, Journal of Web Semantics 35(P2) (2015), 124–139. doi:10.1016/j.websem.2015.07.001.

28. Völker and Niepert, Statistical schema induction, in: The Semantic Web: Research and Applications – 8th Extended Semantic Web Conference, ESWC 2011, Proceedings, Part I, Antoniou et al., eds, LNCS, Vol. 6643, Springer, 2011, pp. 124–138. doi:10.1007/978-3-642-21034-1_9.

29. Völker, Vrandečić, Sure and Hotho, Learning disjointness, in: The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Proceedings, Franconi et al., eds, LNCS, Vol. 4519, Springer, 2007. doi:10.1007/978-3-540-72667-8_14.

30. T.D. Wang, Parsia and Hendler, A survey of the web ontology landscape, in: The Semantic Web – ISWC 2006, 5th Int. Semantic Web Conference Proceedings, Cruz et al., eds, LNCS, Vol. 4273, Springer, 2006. doi:10.1007/11926078_49.


² Noticeable difference with concept hierarchies: for each node in the TCT, its cluster, composed by instances of the concept in the parent node (ideally ⊤ for the root), is bi-partitioned according to the membership w.r.t. the concept in the current node.

⁴ Code and testbed of ontologies are publicly available at: https://github.com/Giuseppe-Rizzo/TCTnew.