
Abstract

Ontology embeddings map classes, roles, and individuals in ontologies into $R^{n}$ , and within $R^{n}$ similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic ${E L}^{+ +}$ , several optimization-based embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations; they do not distinguish between statements that are unprovable and provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for ${E L}^{+ +}$ ontologies, incorporating several modifications that aim to make use of the ontology deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and different types of negatives and formulated evaluation methods for knowledge base completion. We demonstrate that our embedding methods improve over the baseline ontology embedding in the task of knowledge base or ontology completion.

Keywords

ontology embedding knowledge base completion description logic

1. Introduction

Several methods have been developed to embed Description Logic theories or ontologies in vector spaces (Chen et al., 2020a, 2021; Jackermeier et al., 2024; Kulmanov et al., 2019; Mondal et al., 2021; Özcep et al., 2023; Peng et al., 2022; Xiong et al., 2022). These embedding methods preserve some aspects of the semantics in the vector space, and may enable the computation of semantic similarity, inferring axioms that are entailed, and predicting axioms that are not entailed but may be added to the theory. For the lightweight Description Logic ${E L}^{+ +}$ , several geometric embedding methods have been developed (Jackermeier et al., 2024; Kulmanov et al., 2019; Mondal et al., 2021; Özcep et al., 2023; Xiong et al., 2022). They can be proven to “faithfully” approximate a model in the sense that, if a certain optimization objective is reached (usually, a loss function reduced to $0$ ), the embedding method has constructed a model of the ${E L}^{+ +}$ theory. Geometric model construction enables the execution of various tasks. These tasks include knowledge base completion and subsumption prediction via either testing the truth of a statement under consideration in a single (approximate) model or aggregating truth values over multiple models.

Advances in geometric embedding methods have usually focused on the expressiveness of the embedding methods; originally, hyperballs (Kulmanov et al., 2019) were used to represent the interpretation of concept symbols, yet hyperballs are not closed under intersection. Therefore, axis-aligned boxes were introduced (Jackermeier et al., 2024; Peng et al., 2022; Xiong et al., 2022). Furthermore, ${E L}^{+ +}$ allows for axioms pertaining to roles, and several methods have extended the way in which roles are modeled (Jackermeier et al., 2024; Kulmanov et al., 2019; Xiong et al., 2022). However, there are several aspects of geometric embeddings that have not yet been investigated. In particular, for ${E L}^{+ +}$ , there are sound and complete reasoners with efficient implementations that scale to very large knowledge bases (Kazakov et al., 2013); it may therefore be possible to utilize a deductive reasoner together with the embedding process to improve generation of embeddings that represent geometric models.

We evaluate geometric embedding methods and incorporate deductive inference into the training process. We use the ELEmbeddings (Kulmanov et al., 2019), ELBE (Peng et al., 2022), and ${Box}^{2} E L$ (Jackermeier et al., 2024) models for our experiments; however, our results also apply to other geometric embedding methods for ${E L}^{+ +}$ .

Our main contributions are as follows:

– We propose loss functions that incorporate negative samples in all normal forms and account for deductive closure during training.
– We introduce a fast approximate algorithm for computing the deductive closure of an ${E L}^{+ +}$ theory and use it to improve negative sampling during model training.
– We formulate evaluation methods for knowledge base completion that account for the deductive closure during evaluation.

This is an extended version of our previous work (Mashkova et al., 2024). Here, we include a more comprehensive treatment of computing the deductive closure and using the deductive closure with ${E L}^{+ +}$ embedding methods (e.g., for model evaluation). We include additional experiments on three benchmark datasets: besides protein function prediction and protein–protein interaction (PPI), we study the subsumption prediction task on Food Ontology (Dooley et al., 2018) and GALEN ontology (Rector et al., 1996). Furthermore, we consider two additional geometric ontology embedding models, ELBE (Peng et al., 2022) and ${Box}^{2} E L$ (Jackermeier et al., 2024), apart from ELEmbeddings (Kulmanov et al., 2019) for which we also extend our methods. We make our code and data available at https://github.com/bio-ontology-research-group/DELE.
2. Preliminaries

2.1. Description Logic ${E L}^{+ +}$

Let $Σ = (C, R, I)$ be a signature with set $C$ of concept names, $R$ of role names, and $I$ of individual names. Given $r \in R$ , and $a, b \in I$ , ${E L}^{+ +}$ concept descriptions are constructed with the grammar $⊥ ∣ ⊤ ∣ C ⊓ D ∣ \exists r . C ∣ \{a\}$ where $C, D$ are ${E L}^{+ +}$ concept descriptions and $r$ is a role name. ABox axioms are of the form $C (a)$ and $r (a, b)$ , TBox axioms are of the form $C ⊑ D$ , and RBox axioms are of the form $r_{1} \circ r_{2} \circ \dots \circ r_{n} ⊑ r$ . ${E L}^{+ +}$ generalized concept inclusions (GCIs) and role inclusions (RIs) can be normalized to follow one of the forms listed in Table 1 (Baader et al., 2005). If an ontology contains a non-empty ABox, then, first, concept assertion axioms of the form $C (a)$ are processed: for each complex concept description $C$ , a new concept name $A$ is introduced and axiom $C \equiv A$ is added to the TBox (Peñaloza & Turhan, 2008). Normalization rules for TBox axioms (Baader et al., 2005) can be found in Table 2. They preserve the same models for ontologies (Mendez, 2011) and the normalized ontology is a conservative extension of the original ontology (Suntisrivaraporn, 2009). Following previous works (Jackermeier et al., 2024; Kulmanov et al., 2019; Peng et al., 2022) we develop our methodology specifically for normalized ${E L}^{+ +}$ ontologies. The advantage of normalizing ${E L}^{+ +}$ ontologies is that their deductive closure is finite whereas the deductive closure of non-normalized ${E L}^{+ +}$ ontologies is not. Therefore, in our work, when we refer to the language ${E L}^{+ +}$ , we always intend the normalized ${E L}^{+ +}$ , that is, where all valid axiom types are those listed in Table 1.

Table 1.
Normalized Forms of ${E L}^{+ +}$ Generalized Concept Inclusions (GCIs) and Role Inclusions (RIs).

Acronym	Axiom Type
GCI0	$A ⊑ B$
GCI1	$A ⊓ B ⊑ E$
GCI2	$A ⊑ \exists r . B$
GCI3	$\exists r . A ⊑ B$
GCI0-BOT	$A ⊑ ⊥$
GCI1-BOT	$A ⊓ B ⊑ ⊥$
GCI3-BOT	$\exists r . A ⊑ ⊥$
RI0	$r ⊑ s$
RI1	$r_{1} \circ r_{2} ⊑ s$

Here, $r, r_{1}, r_{2}, s$ are role names, $A, B, E$ are concept names.

Table 2.

Normalization Rules for TBox for ${E L}^{+ +}$ Based on Baader et al. (2005).

Input	Output
$r_{1} \circ \dots \circ r_{k} ⊑ s$	$r_{1} \circ \dots \circ r_{k - 1} ⊑ u, u \circ r_{k} ⊑ s$
$C ⊓ \hat{D} ⊑ E$	$\hat{D} ⊑ A, C ⊓ A ⊑ E$
$\exists r . \hat{C} ⊑ D$	$\hat{C} ⊑ A, \exists r . A ⊑ D$
$⊥ ⊑ D$	$\emptyset$
$\hat{C} ⊑ \hat{D}$	$\hat{C} ⊑ A, A ⊑ \hat{D}$
$B ⊑ \exists r . \hat{C}$	$B ⊑ \exists r . A, A ⊑ \hat{C}$
$B ⊑ C ⊓ D$	$B ⊑ C, B ⊑ D$

Here, $r, r_{1}, \dots, r_{k}, s$ are role names, $C, D, B$ are arbitrary concept descriptions, $\hat{C}, \hat{D}$ are complex concept descriptions (i.e., not concept names), $A$ is a fresh concept name and $u$ is a fresh role name.
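As an illustration of the first rule in Table 2, the following minimal sketch splits a long role chain into RI1 axioms. The tuple representation of axioms and the `u1, u2, ...` naming scheme for fresh role names are assumptions made for this example only and are not taken from any particular normalizer.

```python
from itertools import count


def normalize_role_chain(chain, target, fresh=None):
    """Split r1 o ... o rk ⊑ s into axioms with at most two roles on the left.

    Each axiom is returned as (tuple_of_left_hand_roles, right_hand_role).
    """
    fresh = fresh if fresh is not None else count(1)
    chain = list(chain)
    axioms = []
    while len(chain) > 2:
        u = f"u{next(fresh)}"  # hypothetical name for a fresh role
        axioms.append(((u, chain[-1]), target))  # u o rk ⊑ s
        chain, target = chain[:-1], u  # continue with r1 o ... o r(k-1) ⊑ u
    axioms.append((tuple(chain), target))
    return axioms


print(normalize_role_chain(["r1", "r2", "r3"], "s"))
```

Applied to $r_{1} \circ r_{2} \circ r_{3} ⊑ s$ , the sketch yields $u_{1} \circ r_{3} ⊑ s$ and $r_{1} \circ r_{2} ⊑ u_{1}$ , both of type RI1.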

To define the semantics of an ${E L}^{+ +}$ theory, we use an interpretation domain $Δ^{I}$ and an interpretation function $\cdot^{I}$ (Baader et al., 2005). For every concept name $A \in C$ , $A^{I} \subseteq Δ^{I}$ ; for every individual $a \in I$ , $a^{I} \in Δ^{I}$ ; and for every role $r \in R$ , $r^{I} \subseteq Δ^{I} \times Δ^{I}$ . Furthermore, the semantics for other ${E L}^{+ +}$ constructs are the following (omitting concrete domains and role inclusions):

\begin{aligned} ⊥^{I} & = \emptyset, \\ ⊤^{I} & = Δ^{I}, \\ (C ⊓ D)^{I} & = C^{I} \cap D^{I}, \\ (\exists r . C)^{I} & = \{x \in Δ^{I} ∣ \exists y \in Δ^{I} : ((x, y) \in r^{I} \land y \in C^{I})\}, \\ \{a\}^{I} & = \{a^{I}\} \end{aligned}

An interpretation $I$ is a model for an axiom $C ⊑ D$ if and only if $C^{I} \subseteq D^{I}$ ; for an axiom $C (a)$ if and only if $a^{I} \in C^{I}$ ; and for an axiom $r (a, b)$ if and only if $(a^{I}, b^{I}) \in r^{I}$ (Baader, 2003). The relation of semantic entailment, $⊨$ , is defined as a relation between a theory $T$ and an axiom $ϕ$ : $T ⊨ ϕ$ iff every model of $T$ is also a model of $ϕ$ ( $Mod (T) \subseteq Mod (\{ϕ\})$ ) (Tarski, 1936).
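The semantics above can be checked mechanically on a finite interpretation. The sketch below uses Python sets for concept extensions and sets of pairs for role extensions; the toy domain and the extensions of `A`, `B`, `E` and `r` are hypothetical values chosen for illustration.

```python
def exists(role_ext, concept_ext):
    """(∃r.C)^I: elements with at least one r-successor in C^I."""
    return {x for (x, y) in role_ext if y in concept_ext}


def satisfies_gci(lhs_ext, rhs_ext):
    """I is a model of C ⊑ D iff C^I is a subset of D^I."""
    return lhs_ext <= rhs_ext


# Hypothetical toy interpretation over the domain {1, 2, 3}.
domain = {1, 2, 3}
A, B, E = {1}, {2}, {1, 3}
r = {(1, 2), (3, 3)}
top, bottom = domain, set()

print(satisfies_gci(A, exists(r, B)))    # A ⊑ ∃r.B
print(satisfies_gci(exists(r, top), A))  # ∃r.⊤ ⊑ A
print(satisfies_gci(A & B, bottom))      # A ⊓ B ⊑ ⊥
```

In this interpretation $A ⊑ \exists r . B$ and $A ⊓ B ⊑ ⊥$ hold, whereas $\exists r . ⊤ ⊑ A$ does not, because the element 3 has an $r$ -successor but is not in $A^{I}$ .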

2.2. Knowledge Base Completion

The task of knowledge base completion is the addition (or prediction) of axioms which hold yet are not represented in the knowledge base. We call the task “ontology completion” when exclusively TBox axioms are predicted. The task of knowledge base completion may encompass both deductive (Jiang et al., 2012; Sato et al., 2018) and inductive (Bouraoui et al., 2017; d’Amato et al., 2012) inference processes and give rise to two subtly different tasks: adding only “novel” axioms to a knowledge base that are not in the deductive closure of the knowledge base, and adding axioms that are in the deductive closure as well as some “novel” axioms that are not deductively inferred; both tasks are related but differ in how they are evaluated.

Inductive inference, analogously to knowledge graph completion (Chen et al., 2020b), predicts axioms based on patterns and regularities within the knowledge base. Knowledge base completion, or ontology completion, can be further distinguished based on the information that is used to predict “novel” axioms. We distinguish between two approaches to knowledge base completion: (1) knowledge base completion which relies solely on (formalized) information within the knowledge base to predict new axioms, and (2) knowledge base completion which incorporates side information, such as text, to enhance the prediction of new axioms. Here, we mainly consider the first case.

2.3. Deductive Closure

The deductive closure of a theory $T$ refers to the smallest set containing all statements which can be inferred by deductive reasoning over $T$ ; for a given deductive relation $⊢$ , we call $T^{⊢} = \{ϕ ∣ T ⊢ ϕ\}$ the deductive closure of $T$ . In knowledge bases, the deductive closure is usually not identical to the asserted axioms in the knowledge base; it is also usually infinite. Representing the deductive closure is challenging since it is infinite, but, in ${E L}^{+ +}$ , any knowledge base can be normalized to one of the seven normal forms; therefore, we can compute the deductive closure with respect to these normal forms, and this set will be finite (as long as the sets of concept and role names are finite). For example, all entailed axioms of type $A ⊑ B$ will be a subset of the set of all possible axioms of GCI0 type, which has cardinality $| C |^{2}$ where $| C |$ is the cardinality of the set of all concept names. Similarly, the cardinality of the GCI0-BOT deductive closure is bounded by $| C |$ , that of the GCI1 deductive closure by $| C |^{3}$ , and that of the GCI1-BOT deductive closure by $| C |^{2}$ since one of the concepts is fixed. The cardinality of the GCI2 and GCI3 deductive closures depends on the total number of roles $| R |$ and is bounded by $| C |^{2} \cdot | R |$ ; finally, the number of entailed axioms of GCI3-BOT type does not exceed $| C | \cdot | R |$ .
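The bounds above are simple to tabulate for a given signature. The following sketch restates them as a function; the function name and the example sizes are illustrative assumptions.

```python
def closure_upper_bounds(num_concepts, num_roles):
    """Upper bounds on the number of entailed axioms per normal form."""
    c, r = num_concepts, num_roles
    return {
        "GCI0": c**2,       # A ⊑ B
        "GCI1": c**3,       # A ⊓ B ⊑ E
        "GCI2": c**2 * r,   # A ⊑ ∃r.B
        "GCI3": c**2 * r,   # ∃r.A ⊑ B
        "GCI0-BOT": c,      # A ⊑ ⊥
        "GCI1-BOT": c**2,   # A ⊓ B ⊑ ⊥
        "GCI3-BOT": c * r,  # ∃r.A ⊑ ⊥
    }


# Hypothetical signature with 10 concept names and 3 role names.
for form, bound in closure_upper_bounds(10, 3).items():
    print(form, bound)
```

The cubic bound for GCI1 dominates, which suggests why storing the closure for this normal form is the most demanding part for large ontologies.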

3. Related Work

3.1. Graph-Based Ontology Embeddings

Graph-based ontology embeddings rely on a construction (projection) of graphs from ontology axioms mapping ontology classes, individuals and roles to nodes and labeled edges (Zhapa-Camacho & Hoehndorf, 2023). Embeddings for nodes and edge labels are optimized following two strategies: by generating random walks and using a sequence learning method such as Word2Vec (Mikolov et al., 2013) or by using Knowledge Graph Embedding (KGE) methods (Wang et al., 2017). These types of methods have been shown to be effective on knowledge base and ontology completion (Chen et al., 2021) and have been applied to domain-specific tasks such as PPI prediction (Chen et al., 2021) or gene–disease association prediction (Althagafi et al., 2024; Chen et al., 2020a). Graph-based methods rely on adjacency information of the ontology structure but cannot easily handle logical operators and do not approximate ontology models. Therefore, graph-based methods are not “faithful,” that is, they do not approximate models, do not allow determining whether statements are “true” in these models, and therefore cannot be used to perform semantic entailment.

3.2. Geometric-Based Ontology Embeddings

Multiple methods have been developed for the geometric construction of models for the ${E L}^{+ +}$ language. ELEmbeddings (Kulmanov et al., 2019) constructs an interpretation of concept names as sets of points lying within an open $n$ -dimensional ball and generates an interpretation of role names as the set of pairs of points that are separated by a vector in $R^{n}$ , that is, by the embedding of the role name. EmEL++ (Mondal et al., 2021) extends ELEmbeddings with more expressive constructs such as role chains and role inclusions. Another extension of ELEmbeddings, EmELvar (Mohapatra et al., 2021), introduces role embeddings, which enables handling many-to-many roles and provides a perspective for extending the method to more expressive Description Logics. ELBE (Peng et al., 2022) and BoxEL (Xiong et al., 2022) use $n$ -dimensional axis-aligned boxes to represent concepts, which has an advantage over balls because the intersection of two axis-aligned boxes is a box whereas the intersection of two $n$ -balls is not an $n$ -ball. BoxEL additionally preserves the ABox, facilitating a more accurate representation of the knowledge base’s logical structure by ensuring, for example, that an entity has the minimal volume. ${Box}^{2} E L$ (Jackermeier et al., 2024) represents ontology roles more expressively with two boxes encoding the semantics of the domain and codomain of roles. ${Box}^{2} E L$ enables the expression of one-to-many roles as opposed to other methods. The box-based method TransBox (Yang et al., 2025) aims to effectively capture all ${E L}^{+ +}$ logical operations, allowing for precise representation of any arbitrarily complex concept description. The fact that any normalized ${E L}^{+ +}$ theory has a finite deductive closure allows the definition of a canonical model where all and only entailed axioms are true. Lacerda et al. (2024a, 2024b) developed FaithEL, a strongly TBox-faithful method based on convex sets in $n$ -dimensional real-valued space which interprets concepts and roles as subsets of unitary hypercubes. FaithEL constructs a “canonical” model of an ontology and is able to predict new assertions consistent with a TBox. The axis-aligned cone-shaped geometric model introduced by Özcep et al. (2023) deals with $A L C$ ontologies and allows for full negation of concepts and existential quantification by construction of convex sets in $R^{n}$ . This work has not yet been implemented or evaluated in an application.

3.3. Knowledge Base Completion Task

Several recent advancements in knowledge base completion rely on side information as included in large language models (LLMs). Ji et al. (2023) explore how pretrained language models can be utilized for incorporating one ontology into another, with the main focus on inconsistency handling and ontology coherence. HalTon (Cao et al., 2023) addresses the task of event ontology completion via simultaneous event clustering, hierarchy expansion and type naming, utilizing BERT (Devlin et al., 2019) for instance encoding. Li et al. (2024) formulate the knowledge base completion task as a Natural Language Inference (NLI) problem and examine how this approach may be combined with concept embeddings for identifying missing knowledge in ontologies. As for other approaches, Mežnar et al. (2022) propose a method that converts an ontology into a graph to recommend missing edges using structure-only link analysis methods, and Shiraishi and Kaneiwa (2024) construct matrix-based ontology embeddings which capture the global and local information for subsumption prediction. All these methods use side information from LLMs and would not be applicable, for example, in the case where a knowledge base is private or consists of only identifiers; we do not consider methods based on pre-trained LLMs here as baselines.

4. Negative Sampling and Objective Functions

Currently available geometric ontology embedding models which construct a model of an ontology by optimizing some objective function usually sample negative examples during the training phase (Jackermeier et al., 2024; Kulmanov et al., 2019; Mohapatra et al., 2021; Mondal et al., 2021; Peng et al., 2022; Yang et al., 2025). This operation prevents overgeneralization of learned embeddings and trivial satisfiability in case a model collapses (Kulmanov et al., 2019; Yang et al., 2025) by incorporating additional constraints within a model. Ontology embedding methods select negatives by replacing one of the concepts with a randomly chosen one (either from the set of all concept names, or a subset thereof). ELEmbeddings, ELBE and ${Box}^{2} E L$ use a single loss for “negatives,” that is, axioms that are not included in the knowledge base; the loss is used only for axioms of the form $A ⊑ \exists r . B$ (GCI2) which are randomly sampled; negatives are not sampled for other normal forms. Correspondingly, the embedding methods were primarily evaluated on predicting GCI2 axioms ( ${Box}^{2} E L$ was also evaluated on subsumption prediction); this evaluation procedure might have introduced biases towards axioms of type GCI2, and influenced the ability of geometric models to predict axioms of other types. Specifically, the lack of negative examples of other axiom types may lead to geometric models in which many axioms are true even if they are not entailed, leading to a decreased ability to find axioms that can be added to a theory in the task of knowledge base completion. Consequently, we also sample negatives for other normal forms and add “negative” losses (i.e., losses for the sampled “negatives”) for all other normal forms.

4.1. ELEmbeddings Negative Losses

For negative loss construction in ELEmbeddings, we employ notations from the ELEmbeddings method where $r_{η} (a), r_{η} (b), r_{η} (e)$ and $f_{η} (a), f_{η} (b), f_{η} (e)$ denote the radius and the ball center associated with classes $a, b, e$ , respectively; and $f_{η} (r_{0})$ denotes the embedding vector associated with role $r$ . $γ$ stands for a margin parameter, and $ε$ is a small positive number. There is a geometric part as well as a regularization part for each new negative loss forcing class centers to lie on a unit $ℓ_{2}$ -sphere.

For ELEmbeddings, as reflected in Eq. (1), we use the original GCI1-BOT loss for disjoint classes; although non-containment of the ball corresponding to $A$ within the ball corresponding to $B$ is not equivalent to their disjointness, the loss aims to minimize the classes’ overlap for better optimization:

\begin{aligned} {loss}_{A ⋢ B} (a, b) = & max (0, r_{η} (a) + r_{η} (b) - ‖ f_{η} (a) - f_{η} (b) ‖ + γ) \\ & + | ‖ f_{η} (a) ‖ - 1 | + | ‖ f_{η} (b) ‖ - 1 | \end{aligned}

(1)

The same logic applies for the negative loss in Eq. (2) where we minimize overlap between the translated ball corresponding to class $A$ and the ball representing $B$ :

\begin{aligned} {loss}_{\exists r . A ⋢ B} (r_{0}, a, b) = & max (0, r_{η} (a) + r_{η} (b) - ‖ f_{η} (a) - f_{η} (r_{0}) - f_{η} (b) ‖ + γ) \\ & + | ‖ f_{η} (a) ‖ - 1 | + | ‖ f_{η} (b) ‖ - 1 | \end{aligned}

(2)

Negative loss (3) is constructed similarly to the $A ⊓ B ⊑ E$ loss: the first part penalizes non-overlap of the classes $A$ and $B$ (we do not consider the disjointness case since, for every class $X$ , we have $⊥ ⊑ X$ ); furthermore, for negative sampling of axioms of this type, we vary only the $E$ part of GCI1 axioms from the ontology, so the intersection of $A$ and $B$ is non-empty by assumption:

\begin{aligned} {loss}_{A ⊓ B ⋢ E} (a, b, e) = & max (0, - r_{η} (a) - r_{η} (b) + ‖ f_{η} (a) - f_{η} (b) ‖ - γ) \\ & + max (0, r_{η} (a) - ‖ f_{η} (a) - f_{η} (e) ‖ + γ) \\ & + max (0, r_{η} (b) - ‖ f_{η} (b) - f_{η} (e) ‖ + γ) \\ & + | ‖ f_{η} (a) ‖ - 1 | + | ‖ f_{η} (b) ‖ - 1 | + | ‖ f_{η} (e) ‖ - 1 | \end{aligned}

(3)

The second and the third part force the center corresponding to $E$ not to lie in the intersection of balls associated with $A$ and $B$ . Here we do not consider constraints on the radius of the ball for the $E$ class and focus only on the relative positions of the $A, B$ and $E$ class centers and the overlapping of $n$ -balls representing $A$ and $B$ . Since the first part of the loss encourages classes to have a non-empty intersection, we use it as a negative loss for GCI1-BOT axioms:

\begin{aligned} {loss}_{A ⊓ B ⋢ ⊥} (a, b) = & max (0, - r_{η} (a) - r_{η} (b) + ‖ f_{η} (a) - f_{η} (b) ‖ - γ) \\ & + | ‖ f_{η} (a) ‖ - 1 | + | ‖ f_{η} (b) ‖ - 1 | \end{aligned}

(4)

In the original method, losses for axioms of type GCI0-BOT and GCI3-BOT force radii of unsatisfiable classes to become $0$ . For the corresponding negative losses (see Eqs. (5) and (6)) we use the interpretation for satisfiable classes as balls with nonzero radius, that is, with a radius which is equal to or greater than some small positive number $ε$ :

\begin{aligned} {loss}_{A ⋢ ⊥} (a) & = max (0, ε - r_{η} (a)) \end{aligned}

(5)

\begin{aligned} {loss}_{\exists r . A ⋢ ⊥} (r_{0}, a) & = max (0, ε - r_{η} (a)) \end{aligned}

(6)
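To make the geometric terms concrete, Eqs. (1), (4) and (5) can be sketched in plain Python as follows. This is an illustrative re-implementation over lists of floats rather than the training code used in the experiments; the function names and the example values are assumptions.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def dist(u, v):
    """Euclidean distance between two ball centers."""
    return norm([a - b for a, b in zip(u, v)])


def reg(*centers):
    """Regularization pulling each center towards the unit sphere."""
    return sum(abs(norm(c) - 1.0) for c in centers)


def loss_not_subclass(fa, ra, fb, rb, gamma):
    """Eq. (1): penalizes overlap of the balls for A and B."""
    return max(0.0, ra + rb - dist(fa, fb) + gamma) + reg(fa, fb)


def loss_not_disjoint(fa, ra, fb, rb, gamma):
    """Eq. (4): penalizes non-overlap of the balls for A and B."""
    return max(0.0, -ra - rb + dist(fa, fb) - gamma) + reg(fa, fb)


def loss_satisfiable(ra, eps):
    """Eqs. (5) and (6): keeps the radius of A at least eps."""
    return max(0.0, eps - ra)


# Two disjoint balls of radius 0.5 with centers on the unit circle.
fa, fb = [1.0, 0.0], [-1.0, 0.0]
print(loss_not_subclass(fa, 0.5, fb, 0.5, gamma=0.0))
print(loss_not_disjoint(fa, 0.5, fb, 0.5, gamma=0.0))
```

For the two disjoint balls in the example, the loss of Eq. (1) vanishes while the loss of Eq. (4) is positive, as expected: the first treats separation as desirable and the second penalizes it.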

4.2. ELBE Negative Losses

ELBE is a model that relies on boxes instead of balls. Here, similarly, $ε$ is a small positive number; $e_{c} (a), e_{c} (b)$ and $e_{o} (a), e_{o} (b)$ denote the box center and the box offset associated with classes $a, b$ , respectively; $e_{c} (r_{0})$ denotes the embedding vector associated with role $r$ ; $e_{c} (new), e_{o} (new)$ correspond to the center and the offset of the box which is the result of intersection of boxes associated with concepts $a$ and $b$ ; and $margin$ stands for a margin parameter.

Following the same method of negative loss construction for ELEmbeddings, we use GCI1-BOT loss as a negative loss for $A ⊑ B$ axioms (see Eq. (7)):

\begin{aligned} {loss}_{A ⋢ B} (a, b) = ‖ max (zeros, - | e_{c} (a) - e_{c} (b) | + e_{o} (a) + e_{o} (b) + margin) ‖ \end{aligned}

(7)

Since axis-aligned hyperrectangles are closed under intersection, we also use GCI1-BOT for the intersection of boxes representing $A$ and $B$ concepts and the $E$ box (see Eq. (8)):

\begin{aligned} {loss}_{A ⊓ B ⋢ E} (a, b, e) = ‖ max (zeros, - | e_{c} (new) - e_{c} (e) | + e_{o} (new) + e_{o} (e) + margin) ‖ \end{aligned}

(8)

This property also allows us to interpret each negative sample for $A ⊓ B ⊑ ⊥$ axioms as a box intersection with nonzero offset (see Eq. (9)):

\begin{aligned} {loss}_{A ⊓ B ⋢ ⊥} (a, b) = max (0, ε - ‖ e_{o} (new) ‖) \end{aligned}

(9)

Other negative losses have the form similar to the ones constructed for ELEmbeddings:

\begin{aligned} {loss}_{\exists r . A ⋢ B} (r_{0}, a, b) & = ‖ max (zeros, - | e_{c} (a) - e_{c} (r_{0}) - e_{c} (b) | + e_{o} (a) + e_{o} (b) + margin) ‖ \end{aligned}

(10)

\begin{aligned} {loss}_{A ⋢ ⊥} (a) & = max (0, ε - ‖ e_{o} (a) ‖) \end{aligned}

(11)

\begin{aligned} {loss}_{\exists r . A ⋢ ⊥} (r_{0}, a) & = max (0, ε - ‖ e_{o} (a) ‖) \end{aligned}

(12)
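The box-based losses can be illustrated in the same way. The sketch below implements Eqs. (7) and (9) over lists of floats. How the intersection box is computed is an assumption of this example: the lower corner is the element-wise maximum of the lower corners, the upper corner is the element-wise minimum of the upper corners, and negative offsets are clamped to zero; the original implementation may differ.

```python
import math


def l2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def intersect(ca, oa, cb, ob):
    """Center and offset of the intersection of two axis-aligned boxes.

    Assumption: offsets are clamped at 0 in dimensions where the boxes are disjoint.
    """
    lower = [max(c1 - o1, c2 - o2) for c1, o1, c2, o2 in zip(ca, oa, cb, ob)]
    upper = [min(c1 + o1, c2 + o2) for c1, o1, c2, o2 in zip(ca, oa, cb, ob)]
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    offset = [max(0.0, (up - lo) / 2) for lo, up in zip(lower, upper)]
    return center, offset


def loss_not_subclass(ca, oa, cb, ob, margin):
    """Eq. (7): penalizes element-wise overlap of the boxes for A and B."""
    return l2([max(0.0, -abs(c1 - c2) + o1 + o2 + margin)
               for c1, o1, c2, o2 in zip(ca, oa, cb, ob)])


def loss_not_disjoint(ca, oa, cb, ob, eps):
    """Eq. (9): asks the intersection box to have offset norm at least eps."""
    _, o_new = intersect(ca, oa, cb, ob)
    return max(0.0, eps - l2(o_new))


# Box A is centered at the origin; box B overlaps A, box F is far away.
a = ([0.0, 0.0], [1.0, 1.0])
b = ([1.0, 0.0], [1.0, 1.0])
f = ([5.0, 5.0], [1.0, 1.0])
print(loss_not_subclass(*a, *b, margin=0.0), loss_not_subclass(*a, *f, margin=0.0))
print(loss_not_disjoint(*a, *b, eps=0.01), loss_not_disjoint(*a, *f, eps=0.01))
```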

4.3. ${Box}^{2} E L$ Negative Losses

${Box}^{2} E L$ is also based on boxes but uses a different role modeling compared to ELBE. Additionally making use of the notations from ${Box}^{2} E L$ (Jackermeier et al., 2024), $ε$ is a small positive number, $Box (A), Box (B)$ , $Box (E)$ are boxes associated with classes $a, b, e$ , respectively, $γ$ denotes a margin parameter, $δ$ is a parameter from the GCI2 negative loss, $Head (r)$ represents the head box of the interpretation of role $r$ , and $Bump (A)$ corresponds to a bump vector associated with concept $A$ .

Equations (13) and (14) are constructed in a similar fashion as for ELBE based on the GCI1-BOT loss which penalizes the element-wise distance $d$ between axis-aligned boxes:

\begin{aligned} {loss}_{A ⋢ B} (a, b) & = ‖ max (0, - (d (Box (A), Box (B)) + γ)) ‖ \end{aligned}

(13)

\begin{aligned} {loss}_{A ⊓ B ⋢ E} (a, b, e) & = ‖ max (0, - (d (Box (A) \cap Box (B), Box (E)) + γ)) ‖ \end{aligned}

(14)

Negative losses (15)–(17) encourage boxes to be non-empty:

\begin{aligned} {loss}_{A ⋢ ⊥} (a) & = max (0, ε - ‖ o (A) ‖) \end{aligned}

(15)

\begin{aligned} {loss}_{A ⊓ B ⋢ ⊥} (a, b) & = max (0, ε - ‖ o (Box (A) \cap Box (B)) ‖) \end{aligned}

(16)

\begin{aligned} {loss}_{\exists r . A ⋢ ⊥} (r_{0}, a) & = max (0, ε - ‖ o (A) ‖) \end{aligned}

(17)

The GCI3 negative loss reflects the structure of the original GCI3 loss:

\begin{aligned} {loss}_{\exists r . A ⋢ B} (r_{0}, a, b) = (δ - μ (Head (r) - Bump (A), Box (B)))^{2} \end{aligned}

(18)

5. Negative Sampling and Entailments

In the case of knowledge base completion where the deductive closure contains potentially many non-trivial entailed axioms, the random sampling approach for negatives may lead to suboptimal learning since some of the axioms treated as negatives may be entailed (and should therefore be true in any model, in particular the one constructed by the geometric embedding method). As an example, let us consider the simple ontology consisting of two axioms: $A ⊓ B ⊑ E$ and $F ⊑ B$ . For the $A ⊓ B ⊑ E$ axiom, random negative sampling will sample $A ⊓ B ⊑ E^{'}$ where $E^{'}$ is one of $A, B, E, F$ . Since the knowledge base makes the axioms $A ⊓ B ⊑ A$ , $A ⊓ B ⊑ B$ , and $A ⊓ B ⊑ E$ true, in 75% of cases we will sample a negative for this axiom that is actually true in each model.
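The 75% figure can be reproduced with a few lines of code. The sketch below saturates the set of named superclasses of the conjunction $A ⊓ B$ using only GCI0 and GCI1 axioms; it is a toy procedure for this example (it ignores existential restrictions and $⊥$ ) and not a general ${E L}^{+ +}$ reasoner.

```python
def gci1_consequences(lhs, subsumptions, gci1_axioms):
    """Named concepts X such that the conjunction of `lhs` is subsumed by X."""
    derived = set(lhs)
    changed = True
    while changed:
        changed = False
        for sub, sup in subsumptions:  # GCI0: sub ⊑ sup
            if sub in derived and sup not in derived:
                derived.add(sup)
                changed = True
        for (x, y), sup in gci1_axioms:  # GCI1: x ⊓ y ⊑ sup
            if x in derived and y in derived and sup not in derived:
                derived.add(sup)
                changed = True
    return derived


concepts = ["A", "B", "E", "F"]
subsumptions = [("F", "B")]
gci1_axioms = [(("A", "B"), "E")]

entailed = gci1_consequences(("A", "B"), subsumptions, gci1_axioms)
false_negative_rate = sum(c in entailed for c in concepts) / len(concepts)
print(sorted(entailed), false_negative_rate)
```

Only $F$ yields a genuine negative $A ⊓ B ⊑ F$ ; the other three candidates are entailed.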

We suggest filtering selected negatives during training based on the deductive closure of the knowledge base: for each randomly generated axiom to be used as a negative, we check whether it is present in the deductive closure and, if it is, we delete it. ${E L}^{+ +}$ reasoners such as ELK (Kazakov et al., 2013) compute subsumption hierarchies, that is, all axioms of the form $A ⊑ B$ in the deductive closure, but not entailed axioms for the other normal forms. We use the inferences computed by ELK (of the form $A ⊑ B$ where $A$ and $B$ are concept names) to design an algorithm that computes (a part of) the deductive closure with respect to the ${E L}^{+ +}$ normal forms. The algorithm implements a sound but incomplete set of inference rules which can quickly generate a partial deductive closure with respect to all normal forms. Algorithm 1 contains inference rules for deriving entailed axioms of type GCI1, GCI2, GCI3, GCI1-BOT and GCI3-BOT from axioms explicitly represented within a knowledge base; GCI0 and GCI0-BOT axioms are precomputed by ELK. Algorithm 2 provides a set of additional rules depending on arbitrary classes and roles represented within a knowledge base after inferred axioms from Algorithm 1 are computed. The purpose of Algorithm 2 is to enrich the approximate deductive closure with axioms involving arbitrary roles and concepts or with axioms of a new GCI type which may be missed by applying rules from Algorithm 1, since Algorithm 1 computes entailed axioms based on ontology axioms, concept hierarchy and role inclusions. For example, consider an ontology consisting of two axioms: $A ⊑ E$ and $B ⊑ E$ . Since no GCI1 axioms are present in the ontology, no GCI1 axioms will be derived by Algorithm 1. Algorithm 2 enables inference of a GCI1 axiom $A ⊓ B ⊑ E$ . Another example is an ontology comprised of axioms $A ⊑ B$ and $B ⊓ E ⊑ F$ . The axiom $A ⊓ ⊥ ⊑ B$ , which is entailed, cannot be inferred by applying rules from Algorithm 1: the concept $A$ is not a subclass of neither $B$ or $E$ . Although we can use ELK or similar reasoners to query for arbitrary entailed axioms, the algorithms we propose have an advantage over this method since they do not require the addition of a new concept to an ontology and recomputing the concept hierarchy. We show a detailed example of the algorithm in Appendix E based on the simple ontology example introduced in Section 6.4.
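The filtering step itself amounts to rejection sampling against the precomputed closure. The following sketch illustrates it for GCI0 axioms, assuming the (approximate) deductive closure is available as a set of pairs; the function name, the `max_tries` guard and the toy closure are assumptions of this example.

```python
import random


def sample_gci0_negatives(axiom, concepts, closure, k, rng, max_tries=1000):
    """Corrupt the right-hand side of A ⊑ B, rejecting candidates in the closure."""
    a, _ = axiom
    negatives = []
    tries = 0
    while len(negatives) < k and tries < max_tries:
        tries += 1
        candidate = (a, rng.choice(concepts))
        if candidate not in closure:  # keep only axioms that are not entailed
            negatives.append(candidate)
    return negatives


# Toy ontology: A ⊑ B and B ⊑ E; the closure adds A ⊑ E and reflexive axioms.
concepts = ["A", "B", "E", "F"]
closure = {("A", "A"), ("B", "B"), ("E", "E"), ("F", "F"),
           ("A", "B"), ("B", "E"), ("A", "E")}

rng = random.Random(0)
negatives = sample_gci0_negatives(("A", "B"), concepts, closure, k=5, rng=rng)
print(negatives)
```

Without the filter, three out of four corruptions of $A ⊑ B$ in this toy ontology would be entailed; with it, only $A ⊑ F$ is used as a negative.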

In the task of knowledge base completion with many non-trivial entailed axioms, the deductive closure can also be used to modify the evaluation metrics, or define novel evaluation metrics that distinguish between entailed and non-entailed axioms. So far, ontology embedding methods that have been applied to the task of knowledge base completion have used evaluation measures that are taken from the task of knowledge graph completion; in particular, they only evaluate knowledge base completion using axioms that are “novel” and not entailed. However, any entailed axiom will be true in all models of the knowledge base, and therefore also in the geometric model that is constructed by the embedding method.

We suggest filtering entailed axioms from training or test sets when the aim is to predict “novel” (i.e., non-entailed) knowledge. The geometric embedding methods aim to generate models, and all entailed axioms are true in all models. It is therefore expected that methods explicitly constructing models preferentially make entailed axioms true and rank them higher than non-entailed axioms. If the evaluation is based solely on non-entailed axioms, it will consider all similar inferred axioms false, and to avoid this, we may filter such axioms from the ranking list. The more axioms are filtered, the more entailed axioms are predicted by a model.
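Filtering entailed axioms from the ranking list can be sketched as follows. The example assumes that candidates are scored by their loss value, so that a lower score means a more plausible axiom; the function name and the scores are hypothetical.

```python
def filtered_rank(target, scores, entailed):
    """Rank of `target` by ascending score, ignoring entailed competitors."""
    target_score = scores[target]
    better = sum(
        1
        for axiom, score in scores.items()
        if axiom != target and axiom not in entailed and score < target_score
    )
    return better + 1


# Hypothetical scores for candidate axioms A ⊑ X.
scores = {("A", "B"): 0.1, ("A", "E"): 0.2, ("A", "G"): 0.3, ("A", "F"): 0.5}
target = ("A", "G")

print(filtered_rank(target, scores, entailed=set()))
print(filtered_rank(target, scores, entailed={("A", "B")}))
```

In the example, the test axiom moves from rank 3 to rank 2 once the entailed axiom $A ⊑ B$ is removed from the list of competitors.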

6. Experiments

6.1. Datasets

6.1.1. Gene Ontology & STRING Data

Following previous works (Jackermeier et al., 2024; Kulmanov et al., 2019; Peng et al., 2022) we use common benchmarks for knowledge base completion, in particular a task that predicts PPIs based on the functions of proteins. We also use the same data for the task of protein function prediction. For these tasks we use two datasets, each of which consists of the Gene Ontology (GO) (Consortium, 2015) with all its axioms, together with PPIs and protein function axioms extracted from the STRING database (Mering, 2003); each dataset focuses on only yeast proteins. GO is formalized using OWL 2 EL (Golbreich & Horrocks, 2007).

For the PPI yeast network we use the built-in dataset PPIYeastDataset available in the mOWL (Zhapa-Camacho et al., 2022) Python library (release 0.2.1) where axioms of interest are split randomly into train, validation and test datasets in ratio 90:5:5 keeping pairs of symmetric PPI axioms within the same dataset, and other axioms are placed into the training part; validation and test sets are made up of TBox axioms of type ${P_{1}} ⊑ \exists \mathit{interacts\_with} . {P_{2}}$ where $P_{1}, P_{2}$ are protein names. The GO version released on 2021-10-20 and the STRING database version 11.5 were used. Alongside the yeast $\mathit{interacts\_with}$ dataset we collected the yeast $\mathit{has\_function}$ dataset organized in the same manner with validation and test parts containing TBox axioms of type ${P} ⊑ \exists \mathit{has\_function} . {G O}$ . Based on the information in the STRING database, in PPI yeast, the $\mathit{interacts\_with}$ role is symmetric and the dataset is closed against symmetric interactions. We normalize each train ontology using the updated implementation of the jcel (Mendez, 2012) reasoner¹ where we take into consideration newly generated concept and role names. Although role inclusion axioms may be utilized within the ${Box}^{2} E L$ framework, we ignore them since neither ELEmbeddings nor ELBE incorporates these types of axioms. The table in Appendix A shows the number of GCIs of each type in the datasets and the number of concepts and roles after normalization. For more precise evaluation of novel knowledge prediction we remove entailed axioms from the test set for the function prediction task based on the precomputed deductive closure of the train ontology (see Section 5).

6.1.2. Food Ontology & GALEN Ontology

Food Ontology (Dooley et al., 2018) contains structured information about foods formalized in the $SRIQ$ DL expressivity (Chen et al., 2021), involving terms from UBERON (Mungall et al., 2012), NCBITaxon (Federhen, 2012), Plant Ontology (Jaiswal et al., 2005), and others. The GALEN ontology (Rector et al., 1996) represents biomedical concepts related to anatomy, diseases, and others (Rector & Rogers, 2004). For the Food Ontology, the data for subsumption prediction was extracted from the case studies used to evaluate OWL2Vec* (Chen et al., 2021)²; the train part of the ontology was restricted to the $EL$ fragment and normalized using the jcel (Mendez, 2012) reasoner. In the case of the GALEN ontology, subsumption axioms were randomly split in a 90:5:5 ratio among train, validation and test sets. Since the normalization procedure splits each complex axiom into a set of shorter axioms, including subsumptions between atomic concepts from the signature, it may add axioms that appear in the validation or test part of the ontology to the train part. To avoid this, we filtered out such axioms from the original validation and test datasets after the train ontology for subsumption prediction was normalized. Additionally, as described in Section 6.1.1, we remove entailed axioms from the test dataset. Statistics about the number of axioms of each GCI type, roles and classes can be found in Appendix B for the Food Ontology and in Appendix C for the GALEN ontology.

6.2. Evaluation Scores and Metrics

For GO & STRING data, we predict GCI2 axioms of type $P_{1} ⊑ \exists \mathit{interacts\_with} . P_{2}$ or $P ⊑ \exists \mathit{has\_function} . \mathit{GO}$ depending on the dataset. On the Food Ontology and the GALEN ontology, we predict GCI0 axioms of type $A ⊑ B$ , where $A$ and $B$ are arbitrary classes from the signature. For each axiom type, we use the corresponding loss expression to score axioms. This is justified by the fact that the objective functions measure the degree of truth of each axiom within the constructed models.

The predictive performance is measured by the Hits@n metrics for $n = 10, 100$ , macro and micro mean rank, and the area under the ROC curve (AUC ROC). For rank-based metrics, we calculate the score of $A ⊑ \exists r . B$ or $A ⊑ B$ for every class $A$ from the test set and for every $B$ from the set $C$ of all classes (or subclasses of a certain type, such as proteins or functions, for domain-specific cases) and determine the rank of the test axiom $A ⊑ \exists r . B$ or $A ⊑ B$ . For macro mean rank and AUC ROC, we consider all axioms from the test set; for micro metrics, we compute the corresponding class-specific metrics and average them over all classes in the signature:

\begin{aligned} \mathit{micro\_MR}_{A ⊑ \exists r . B} & = \mathrm{Mean} \left( \mathit{MR}_{A} (\{ A ⊑ \exists r . B, B \in C \}) \right) \end{aligned}

(19)

\begin{aligned} \mathit{micro\_MR}_{A ⊑ B} & = \mathrm{Mean} \left( \mathit{MR}_{A} (\{ A ⊑ B, B \in C \}) \right) \end{aligned}

(20)

\begin{aligned} \mathit{micro\_AUC\_ROC}_{A ⊑ \exists r . B} & = \mathrm{Mean} \left( \mathit{AUC\_ROC}_{A} (\{ A ⊑ \exists r . B, B \in C \}) \right) \end{aligned}

(21)

\begin{aligned} \mathit{micro\_AUC\_ROC}_{A ⊑ B} & = \mathrm{Mean} \left( \mathit{AUC\_ROC}_{A} (\{ A ⊑ B, B \in C \}) \right) \end{aligned}

(22)
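The difference between the two averaging schemes can be sketched in a few lines of Python. This is an illustrative reading of equations (19) and (20), not the evaluation code used in the experiments; the input format (a list of `(class_A, rank)` pairs, one per test axiom) is an assumption, and the micro average here runs over the classes that occur on the left-hand side of at least one test axiom.

```python
from collections import defaultdict
from statistics import mean

def macro_mean_rank(ranks):
    """Mean rank over all test axioms; `ranks` holds (class_A, rank) pairs."""
    return mean(r for _, r in ranks)

def micro_mean_rank(ranks):
    """Mean of the per-class mean ranks MR_A, as in equations (19)-(20)."""
    per_class = defaultdict(list)
    for cls, r in ranks:
        per_class[cls].append(r)
    return mean(mean(rs) for rs in per_class.values())
```

With ranks `[("A", 1), ("A", 3), ("B", 10)]`, the macro value averages the three ranks directly, while the micro value first reduces class A to its mean rank 2 and then averages 2 and 10, so classes with many test axioms do not dominate.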

Additionally, we remove axioms represented in the train set or deductive closures (see Section 5) to obtain corresponding filtered metrics (FHits@n, FMR, FAUC). In related work focusing on knowledge graph completion or knowledge base completion tasks (Bordes et al., 2013; Kulmanov et al., 2019; Peng et al., 2022; Wang et al., 2014), filtered metrics are computed by removing axioms present in the train set from the list of all ranked axioms. This filtering is applied to eliminate statements learned by a model during the training phase, which are therefore likely to receive a lower (better) rank, and to evaluate the predictive performance of a model in a fairer setting.
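A minimal sketch of filtered ranking follows. It assumes that candidates are scored by a loss value where lower means more plausible, which matches the use of loss expressions as scores in this section; the function names and the dictionary-based input are hypothetical.

```python
def rank_of_target(scores, target, filtered_out=frozenset()):
    """1-based rank of `target` among candidate classes B.

    `scores` maps each candidate B to a loss-based score (lower is better).
    Candidates in `filtered_out` (e.g., axioms in the train set or in the
    deductive closure) are ignored when counting better-scored candidates.
    """
    target_score = scores[target]
    better = sum(
        1
        for b, s in scores.items()
        if b != target and b not in filtered_out and s < target_score
    )
    return better + 1

def hits_at_n(ranks, n):
    """Fraction of test axioms ranked within the top n."""
    return sum(r <= n for r in ranks) / len(ranks)
```

For scores `{"B1": 0.1, "B2": 0.2, "B3": 0.3, "B4": 0.9}` and target `"B3"`, the non-filtered rank is 3; if `"B1"` is already known from training or the deductive closure and is filtered out, the rank becomes 2.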

6.3. Training Procedure

All models are optimized with respect to the sum of individual GCI losses (here we define the loss in the most general case, using all positive and all negative losses):

\begin{aligned} L = {} & l_{A ⊑ B} + l_{A ⊓ B ⊑ E} + l_{A ⊑ \exists r . B} + l_{\exists r . A ⊑ B} + l_{A ⊑ ⊥} + l_{A ⊓ B ⊑ ⊥} + l_{\exists r . A ⊑ ⊥} \\ & + l_{A ⋢ B} + l_{A ⊓ B ⋢ E} + l_{A ⋢ \exists r . B} + l_{\exists r . A ⋢ B} + l_{A ⋢ ⊥} + l_{A ⊓ B ⋢ ⊥} + l_{\exists r . A ⋢ ⊥} \end{aligned}

(23)

All model architectures are built using the mOWL (Zhapa-Camacho et al., 2022) library on top of mOWL’s base models. All models were trained using the same fixed random seed.

All models are trained for 2,000 epochs on the STRING & GO datasets and 800 epochs on the Food Ontology and GALEN datasets with a batch size of 32,768. Training and optimization are performed using PyTorch with the Adam optimizer (Kingma & Ba, 2015) and the ReduceLROnPlateau scheduler with patience parameter $10$ . We apply early stopping if the validation loss does not improve for $20$ epochs. For ELEmbeddings, hyperparameters are tuned using grid search over the following set: margin $γ \in \{- 0.1, - 0.01, 0, 0.01, 0.1\}$ , embedding dimension $\{50, 100, 200, 400\}$ , learning rate $\{0.01, 0.001, 0.0001\}$ ; since none of our datasets contains unsatisfiable classes, we do not tune the parameter $ε$ appearing in GCI0-BOT and GCI3-BOT negative losses. For ELBE, grid search is performed over 60 randomly chosen subsets of the following hyperparameters: embedding dimension $\{25, 50, 100, 200\}$ , margin $\{- 0.1, - 0.01, 0, 0.01, 0.1\}$ , $ε \in \{0.1, 0.01, 0.001\}$ (for experiments with all negative losses involved), learning rate $\{0.01, 0.001, 0.0001\}$ . The same strategy is applied to ${Box}^{2} E L$ models for embedding dimension $\{25, 50, 100, 200\}$ , margin $γ \in \{- 0.1, - 0.01, 0, 0.01, 0.1\}$ , $δ \in \{1, 2, 4\}$ , $ε \in \{0.1, 0.01, 0.001\}$ (similarly, for experiments with all negative losses involved), regularization factor $λ \in \{0, 0.05, 0.1, 0.2\}$ , and learning rate $\{0.01, 0.001, 0.0001\}$ . For experiments with negatives filtration during training we use the same set of hyperparameters for the random and filtered modes of negative sampling. See Appendix D for details on the optimal hyperparameters used.
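One way to read the search over 60 randomly chosen subsets is as drawing 60 distinct configurations from the full grid. The sketch below illustrates that reading for the ELBE search space; this interpretation, the dictionary keys and the seed are assumptions and are not taken from the experimental code.

```python
import itertools
import random

# ELBE search space as listed above (key names are illustrative).
GRID = {
    "dim": [25, 50, 100, 200],
    "margin": [-0.1, -0.01, 0, 0.01, 0.1],
    "epsilon": [0.1, 0.01, 0.001],
    "lr": [0.01, 0.001, 0.0001],
}

def sample_configs(grid, k=60, seed=0):
    """Draw k distinct hyperparameter configurations from the full grid."""
    names = list(grid)
    all_combos = list(itertools.product(*(grid[n] for n in names)))
    chosen = random.Random(seed).sample(all_combos, k)
    return [dict(zip(names, combo)) for combo in chosen]
```

The full grid here has 4 × 5 × 3 × 3 = 180 combinations, so 60 samples cover one third of it.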

6.4. Results

We evaluate whether adding negative losses for all normal forms will allow for the construction of a better model and improve the performance in the task of knowledge base completion. We test the effect of the expanded negative sampling and negative losses first on a small ontology that can be embedded and visualized in 2D space, and then on a larger application. We formulate and add negative losses for all normal forms given by equations (1)–(17).

First, we investigate a simple example corresponding to the task of protein function prediction using the ELEmbeddings model. Let us consider an ontology consisting of two axioms stating that there are two disjoint functions $\mathit{GO}_{1}$ and $\mathit{GO}_{2}$ , and that proteins having these functions are also disjoint: $\mathit{GO}_{1} ⊓ \mathit{GO}_{2} ⊑ ⊥$ , $\exists \mathit{has\_function} . \mathit{GO}_{1} ⊓ \exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ ⊥$ . After normalization, the last axiom is replaced by the following three axioms: $A ⊓ B ⊑ ⊥$ , $\exists \mathit{has\_function} . \mathit{GO}_{1} ⊑ B$ , $\exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A$ , where $A, B$ are new concept names. To visualize the results, we embed these axioms in 2D space. Figure 1(a) shows the embedding generated with the original ELEmbeddings model. Since the knowledge base contains no axioms of type GCI2, the model learns without any negative examples and performs poorly compared to the model with negative losses incorporated for all normal forms, shown in Figure 1(b).

Figure 1.

ELEmbeddings example. Dashed circles represent classes translated by the role vector corresponding to the $\mathit{has\_function}$ role. The normalized theory $\{\mathit{GO}_{1} ⊓ \mathit{GO}_{2} ⊑ ⊥, A ⊓ B ⊑ ⊥, \exists \mathit{has\_function} . \mathit{GO}_{1} ⊑ B, \exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A\}$ is better preserved when negative losses are incorporated for all normal forms (panel b) rather than only for the GCI2 normal form (panel a). (a) GCI2 negative loss; (b) All negative losses.

Since we are interested in predicting not only axioms of type $A ⊑ \exists r . B$ , for which negative sampling is used in the original ELEmbeddings, ELBE and ${Box}^{2} E L$ , we also examine the effect of using all negative losses during training on the Food Ontology and the GALEN ontology for subsumption prediction (see Tables 3 and 4, respectively). We find that the ELEmbeddings model does not improve on the Food Ontology subsumption prediction task, but ELBE with additional losses improves over the original model; ${Box}^{2} E L$ with additional losses surpasses its version with just the GCI2 negative loss in Hits@n metrics. On the GALEN ontology, Hits@n metrics improve for all three models when all negative losses are applied (except the Hits@100 metric for the ELEmbeddings model), indicating that in this particular case negative losses encourage models to predict more new axioms. Mean rank results are similarly better for the ELEmbeddings and ${Box}^{2} E L$ models.

Table 3.

Subsumption Prediction Experiments on Food Ontology.

Model	H@10		H@100		macro_MR			micro_MR			macro_AUC		micro_AUC
	NF	F	NF	F	NF	F	NF-F	NF	F	NF-F	NF	F	NF	F
ELEm	0.12	0.12	0.21	0.21	4659	4656	3	4662	4659	3	0.84	0.84	0.84	0.84
ELEm+l	0.10	0.11	0.19	0.19	5015	5013	2	5020	5017	3	0.83	0.83	0.83	0.83
ELEm+l+n	0.10	0.11	0.19	0.19	5022	5019	3	5027	5024	3	0.83	0.83	0.83	0.83
ELBE	0.01	0.01	0.09	0.09	6695	6692	3	6688	6686	2	0.77	0.77	0.77	0.77
ELBE+l	0.04	0.04	0.14	0.14	5428	5426	2	5412	5409	3	0.81	0.81	0.82	0.82
ELBE+l+n	0.04	0.04	0.14	0.14	5427	5424	3	5410	5408	2	0.81	0.81	0.82	0.82
${Box}^{2} E L$	0.01	0.01	0.10	0.10	3900	3898	2	3877	3874	3	0.87	0.87	0.87	0.87
${Box}^{2} E L$ +l	0.04	0.04	0.13	0.13	7550	7547	3	7555	7553	2	0.74	0.74	0.74	0.74
${Box}^{2} E L$ +l+n	0.05	0.05	0.14	0.14	6865	6862	3	6869	6866	3	0.76	0.76	0.77	0.77

“l” corresponds to all negative losses; “l+n” means a model was trained using all negative losses and negatives filtering. For each model we report non-filtered metrics (NF) and filtered metrics (F) with respect to the deductive closure of the train and the test set combined. For macro_MR and micro_MR we additionally report the difference between non-filtered and filtered metrics (NF-F) to check how much entailed knowledge is predicted on average. Values in bold indicate best metrics.

Table 4.

Subsumption Prediction Experiments on GALEN Ontology.

	H@10		H@100		macro_MR			micro_MR			macro_AUC		micro_AUC
Model	NF	F	NF	F	NF	F	NF-F	NF	F	NF-F	NF	F	NF	F
ELEm	0.15	0.16	0.35	0.35	9106	9105	1	9106	9105	1	0.82	0.82	0.82	0.82
ELEm+l	0.21	0.22	0.34	0.34	8977	8976	1	8977	8976	1	0.82	0.82	0.82	0.82
ELEm+l+n	0.21	0.22	0.33	0.34	9005	9003	2	9005	9003	2	0.82	0.82	0.82	0.82
ELBE	0.08	0.08	0.22	0.22	11236	11234	2	11236	11234	2	0.77	0.78	0.77	0.77
ELBE+l	0.13	0.13	0.34	0.34	11884	11882	2	11884	11882	2	0.76	0.76	0.76	0.76
ELBE+l+n	0.12	0.12	0.34	0.34	11720	11717	3	11720	11717	3	0.77	0.77	0.76	0.76
${Box}^{2} E L$	0.14	0.14	0.31	0.31	11724	11721	3	11724	11721	3	0.77	0.77	0.76	0.76
${Box}^{2} E L$ +l	0.16	0.16	0.37	0.37	11371	11369	2	11371	11369	2	0.77	0.77	0.77	0.77
${Box}^{2} E L$ +l+n	0.16	0.16	0.37	0.37	11378	11376	2	11378	11376	2	0.77	0.77	0.77	0.77

Additionally, we evaluate the performance on a standard benchmark set for PPI prediction (see Table 5). For this task, the test axioms are of type GCI2. We observe that ELEmbeddings and ELBE with negative losses integrated for all normal forms outperform their initial configurations in terms of Hits@n metrics; the additional losses also allow ${Box}^{2} E L$ to lower the ranks of test axioms. Generally, for the task of PPI prediction, additional negative sampling improves performance.

Table 5.

Protein–Protein Interaction Prediction Experiments on Yeast Proteins.

Model	H@10	H@100	macro_MR	micro_MR	macro_AUC	micro_AUC
ELEm	0.05	0.31	599.21	701.57	0.90	0.90
ELEm+l	0.06	0.35	532.93	681.02	0.91	0.90
ELEm+l+n	0.06	0.37	519.62	671.19	0.91	0.91
ELBE	0.07	0.37	829.86	1123.47	0.91	0.89
ELBE+l	0.08	0.40	984.92	1259.54	0.84	0.82
ELBE+l+n	0.08	0.40	984.18	1281.20	0.84	0.82
${Box}^{2} E L$	0.05	0.57	215.07	287.16	0.96	0.96
${Box}^{2} E L$ +l	0.05	0.57	200.85	250.17	0.97	0.96
${Box}^{2} E L$ +l+n	0.05	0.58	197.73	250.47	0.97	0.96

“l” corresponds to all negative losses; “l+n” means a model was trained using all negative losses and negatives filtering. Non-filtered metrics are reported. Values in bold indicate best metrics.

To summarize the above observations, we note that in some cases additional negative losses may decrease the ability of models to predict new axioms and encourage models to predict entailed knowledge first (as, e.g., in the protein function prediction case), thus leading to the construction of a more accurate model of a theory. Since there is a tradeoff between the prediction of novel and entailed knowledge, models with additional negative losses may perform worse on novel knowledge prediction.

Using the example introduced above and the ELEmbeddings embedding model, we demonstrate that negatives filtration may be beneficial for constructing a model of a theory. Apart from the axioms mentioned earlier, that is, $\mathit{GO}_{1} ⊓ \mathit{GO}_{2} ⊑ ⊥$ , $A ⊓ B ⊑ ⊥$ , $\exists \mathit{has\_function} . \mathit{GO}_{1} ⊑ B$ and $\exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A$ , we add 10 more axioms about 5 proteins $P_{1}, \dots, P_{5}$ having function $\mathit{GO}_{1}$ (i.e., $P_{i} ⊑ \exists \mathit{has\_function} . \mathit{GO}_{1}, i = 1, \dots, 5$ ), and 5 proteins $Q_{1}, \dots, Q_{5}$ having function $\mathit{GO}_{2}$ (i.e., $Q_{i} ⊑ \exists \mathit{has\_function} . \mathit{GO}_{2}, i = 1, \dots, 5$ ). Figure 2 shows the constructed models with and without negatives filtering. We observe that the model with filtered negatives provides a faithful representation of the GCI3 axiom $\exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A$ and of the axioms introducing proteins having function $\mathit{GO}_{2}$ , as opposed to its counterpart with random negatives: according to the geometric interpretation, for a GCI3 axiom $\exists r . A ⊑ B$ to be faithfully represented, the $n$ -ball interpreting the concept $A$ translated by $- r_{0}$ (role $r$ vector) should lie inside the $n$ -ball interpreting the concept $B$ . We see that this holds in Figure 2(b) but not in Figure 2(a).
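The geometric condition for GCI3 axioms stated above can be checked numerically. The following sketch assumes that each concept is an $n$-ball given by a center and a radius and that the translation subtracts the role vector from the center of $A$; the function name and argument layout are hypothetical and do not mirror the mOWL API.

```python
import math

def gci3_holds(center_a, radius_a, role_vec, center_b, radius_b):
    """True if ball(A), translated by -role_vec, lies inside ball(B)."""
    # Translate the center of A by the negated role vector.
    shifted = [c - r for c, r in zip(center_a, role_vec)]
    # Ball containment: center distance plus inner radius
    # must not exceed the outer radius.
    return math.dist(shifted, center_b) + radius_a <= radius_b
```

For example, a ball of radius 0.5 centered at (1, 0) and translated by the role vector (1, 0) ends up at the origin and fits inside a unit ball centered there, whereas the same ball starting at (3, 0) does not.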

Figure 2.

ELEmbeddings example. Dashed circles represent classes translated by the role vector corresponding to the $\mathit{has\_function}$ role. ‘Red’ classes represent proteins $Q_{1}, \dots, Q_{5}$ , ‘green’ classes represent proteins $P_{1}, \dots, P_{5}$ . Axioms $Q_{i} ⊑ \exists \mathit{has\_function} . \mathit{GO}_{2}, i = 1, \dots, 5$ are better preserved when negatives are filtered based on the precomputed deductive closure (panel b) than when random negatives are sampled (panel a). The same applies to the axiom $\exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A$ . (a) With random negatives; (b) With filtered negatives.

Tables 3–5 show results in the tasks of PPI and subsumption prediction. We find that excluding axioms in the deductive closure from negative selection slightly improves results or leaves them similar. One possible reason is that a randomly chosen axiom is very unlikely to be entailed, since very few axioms are entailed compared to all possible axioms to choose from.

Because the chance of selecting an entailed axiom as a negative depends on the knowledge base to which the embedding method is applied, we perform additional experiments on the Food Ontology with the ELEmbeddings model where we bias the selection of negatives, varying the share of negatives drawn from the entailed axioms between 100% and 0%. We find that reducing the number of entailed axioms among the negatives improves performance, and that the effect grows with the share of negatives that would otherwise be chosen from the entailed ones (see Figure 3).
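The biased selection of negatives can be sketched as follows. This is an illustrative helper, not the code used for Figure 3; the function name, the sampling with replacement and the rounding of the entailed share are assumptions.

```python
import random

def sample_negatives(entailed, non_entailed, k, entailed_fraction, seed=0):
    """Draw k negatives, a fixed share of which comes from entailed axioms.

    `entailed` and `non_entailed` are lists of candidate axioms;
    `entailed_fraction` is a value between 0.0 and 1.0.
    """
    rng = random.Random(seed)
    n_entailed = round(k * entailed_fraction)
    # Sample with replacement from each pool, then mix the two parts.
    picked = rng.choices(entailed, k=n_entailed)
    picked += rng.choices(non_entailed, k=k - n_entailed)
    rng.shuffle(picked)
    return picked
```

Setting `entailed_fraction` to 0.0 corresponds to fully filtered negatives, and 1.0 to the extreme case where every negative is in fact an entailed axiom.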

Figure 3.

Metrics reported for biased fraction of random negatives combined with entailed axioms from the precomputed deductive closure. (a) H@1, H@10, H@100 and ROC AUC; (b) macro_MR.

We compute filtered metrics for the protein function and subsumption prediction tasks. Both of them account for the prediction of entailed axioms since if, for example, $A ⊑ B$ is being predicted, models may first predict axioms of type $A ⊑ B^{'}$ where $B^{'}$ is any superclass of $B$ ; the same is true for function prediction axioms $P ⊑ \exists \mathit{has\_function} . \mathit{GO}$ and all superclasses $\mathit{GO}^{'}$ of the $\mathit{GO}$ class. Note that the PPI prediction task is not tailored for evaluation using deductive closures of the train or test set: for each protein $P$ its subclasses include only $⊥$ and its superclasses include only $⊤$ . As a result, the only inferred axioms will be of type $⊥ ⊑ \exists \mathit{interacts\_with} . P$ , $P_{1} ⊑ \exists \mathit{interacts\_with} . P_{2}$ or $P ⊑ \exists \mathit{interacts\_with} . ⊤$ , and filtered metrics may be computed only with respect to the train part of the ontology. For this reason we do not report filtered metrics for the PPI prediction task (Table 5).
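To make the role of superclasses concrete, the sketch below enumerates the function axioms that are entailed once a protein is known to have a function, by walking up an asserted class hierarchy. It is a deliberate simplification: the deductive closure in this work is computed with an automated reasoner and a set of inference rules, whereas this example only follows asserted atomic subsumptions. All names are hypothetical.

```python
def superclasses(asserted, cls):
    """All classes reachable from `cls` via asserted A ⊑ B edges.

    `asserted` maps a class name to a list of its direct superclasses.
    """
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in asserted.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def entailed_function_axioms(protein, function, asserted):
    """Pairs (P, GO') such that P ⊑ ∃has_function.GO' follows from
    P ⊑ ∃has_function.GO and the hierarchy above GO."""
    targets = {function} | superclasses(asserted, function)
    return {(protein, go) for go in targets}
```

With a hierarchy `GO_c ⊑ GO_b ⊑ GO_a`, a protein annotated with `GO_c` yields three entailed axioms, and a filtered metric would not penalize a model for ranking `GO_b` or `GO_a` ahead of a novel test axiom.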

For function prediction and subsumption prediction, we employ filtration of metrics based on the deductive closure of the train set and of the test set. Tables 3, 4 and 6 contain results for subsumption prediction on Food Ontology, subsumption prediction on GALEN ontology and function prediction on GO, respectively.

Table 6.

Protein Function Prediction Experiments on Yeast proteins.

	H@10		H@100		macro_MR			micro_MR			macro_AUC		micro_AUC
Model	NF	F	NF	F	NF	F	NF-F	NF	F	NF-F	NF	F	NF	F
ELEm	0.01	0.01	0.03	0.03	21198	21150	48	21165	21118	47	0.62	0.62	0.63	0.63
ELEm+l	0.00	0.00	0.03	0.03	9603	9575	28	9449	9423	26	0.83	0.83	0.84	0.84
ELEm+l+n	0.00	0.00	0.03	0.03	9488	9460	28	9334	9307	27	0.83	0.83	0.84	0.84
ELBE	0.03	0.03	0.24	0.24	4229	4209	20	4156	4137	19	0.92	0.92	0.93	0.93
ELBE+l	0.00	0.00	0.01	0.01	12920	12865	55	12797	12745	52	0.77	0.77	0.78	0.78
ELBE+l+n	0.00	0.00	0.01	0.01	12900	12845	55	12772	12719	53	0.77	0.77	0.78	0.78
${Box}^{2} E L$	0.28	0.31	0.55	0.55	1988	1979	9	1988	1980	8	0.96	0.96	0.97	0.97
${Box}^{2} E L$ +l	0.24	0.27	0.54	0.55	2129	2120	9	2099	2091	8	0.96	0.96	0.97	0.97
${Box}^{2} E L$ +l+n	0.24	0.27	0.54	0.55	2161	2152	9	2147	2139	8	0.96	0.96	0.96	0.96

Our findings suggest that the baseline ELEmbeddings predicts primarily entailed axioms of GCI2 type, yet for GCI0 on the Food Ontology the model predicts “novel” knowledge first, whereas the model modifications with additional negative losses and negatives filtration derive entailed knowledge first. For the GALEN ontology, however, the situation is similar to the protein function prediction case, that is, novel knowledge is predicted first by the modifications with additional negative losses and negatives filtration. This may indicate the construction of a model in which many classes overlap or “collapse” when all negative losses and negatives filtering are used, since the GALEN ontology does not contain disjointness axioms and consequently no classes will be separated by the model. The same holds for the ELBE and ${Box}^{2} E L$ models. Losses for all normal forms and negatives filtering during training help ELBE and ${Box}^{2} E L$ to construct model-generated embeddings which first predict logically inferred knowledge and then non-entailed axioms of type GCI2 or GCI0 (on Food Ontology), respectively. The results indicate that models with all types of valid negatives in most cases explicitly construct models.

7. Discussion

We evaluated properties of ELEmbeddings, ELBE and ${Box}^{2} E L$ , ontology embedding methods that aim to generate a model of an ${E L}^{+ +}$ theory; the properties we evaluate hold similarly for other ontology embedding methods that construct models of ${E L}^{+ +}$ theories. While we demonstrate several improvements over the original model, we can also draw some general conclusions about ontology embedding methods and their evaluation. Knowledge base completion is the task of predicting axioms that should be added to a knowledge base; this task is adapted from knowledge graph completion where triples are added to a knowledge graph. Both tasks are evaluated by removing some statements (axioms or triples) from the knowledge base and evaluating whether these axioms or triples can be recovered by the embedding method. This evaluation approach is adequate for knowledge graphs which do not give rise to many entailments. However, knowledge bases give rise to potentially many non-trivial entailments that need to be considered in the evaluation. In particular, embedding methods that aim to generate a model of a knowledge base will first generate entailed axioms (because entailed axioms are true in all models); these methods perform knowledge base completion as a generalization of generating the model where either other statements may be true, or they may be approximately true in the generated structure. This has two consequences: the evaluation procedure needs to account for this; and the model needs to be sufficiently rich to allow useful predictions.

We have introduced a method to compute the deductive closure of ${E L}^{+ +}$ knowledge bases; this method relies on an automated reasoner and is sound. We use all the axioms in the deductive closure as positive axioms to be predicted when evaluating knowledge base completion, to account for methods that treat knowledge base completion as a generalization of constructing a model and testing for truth in this model. We find that some models (e.g., modified box-based models using valid negatives of all types) can predict entailed axioms well, some (e.g., the original ${Box}^{2} E L$ model) preferentially predict “novel”, non-entailed axioms; these methods solve subtly different problems (either generalizing construction of a model, or specifically predicting novel non-entailed axioms). We also modify the evaluation procedure to account for the inclusion of entailed axioms as positives; however, the evaluation measures are still based on ranking individual axioms and do not account for semantic similarity. For example, if during testing, the correct axiom to predict is $A ⊑ \exists r . B$ but the predicted axiom is $A ⊑ \exists r . E$ , the prediction may be considered to be “more correct” if $B ⊑ E$ was in the knowledge base than if $B ⊓ E ⊑ ⊥$ was in the knowledge base. Novel evaluation metrics need to be designed to account for this phenomenon, similarly to ontology-based evaluation measures used in life sciences (Radivojac & Clark, 2013). It is also important to expand the set of benchmark sets for knowledge base completion.

The deductive closure is useful not only in evaluation but also when selecting negatives. In formal knowledge bases, there are at least two ways in which negatives for axioms can be chosen: they are either non-entailed axioms, or they are axioms whose negation is entailed. However, in no case should entailed axioms be considered as negatives; we demonstrate that filtering entailed axioms from selected negatives during training improves the performance of the embedding method consistently in knowledge base completion (and, obviously, more so when entailed axioms are considered as positives during evaluation).
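Filtering entailed axioms out of the negatives can be sketched as a small change to the usual corruption step for GCI2 axioms. The triple representation `(A, r, B)`, the function name and the choice to corrupt only the right-hand class are assumptions made for illustration; the closure is taken to be a precomputed set of entailed triples.

```python
import random

def sample_filtered_negative(positive, classes, closure, rng):
    """Corrupt the filler B of a GCI2 axiom (A, r, B) and never return
    a triple that belongs to the deductive closure.

    Returns None when every candidate is entailed.
    """
    a, r, _ = positive
    allowed = [b for b in classes if (a, r, b) not in closure]
    return (a, r, rng.choice(allowed)) if allowed else None
```

Because the asserted axioms are themselves part of the deductive closure, this check also rules out returning the positive axiom as its own negative.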

While we only report our experiments with ELEmbeddings, ELBE, and ${Box}^{2} E L$ , our findings, in particular about the evaluation and use of deductive closure, are applicable to other geometric ontology embedding methods. As ontology embedding methods are increasingly applied in knowledge-enhanced learning and other tasks that utilize some form of approximate computation of entailments, our results can also serve to improve the applications of ontology embeddings.

Footnotes

Acknowledgments

This work has been supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/4355-01-01, URF/1/4675-01-01, URF/1/4697-01-01, URF/1/5041-01-01, and REI/1/5334-01-01. This work was supported by the SDAIA–KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA–KAUST AI), by funding from King Abdullah University of Science and Technology (KAUST) – KAUST Center of Excellence for Smart Health (KCSH) under award number 5932, and by funding from King Abdullah University of Science and Technology (KAUST) – KAUST Center of Excellence for Generative AI under award number 5940. We acknowledge support from the KAUST Supercomputing Laboratory.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This work has been supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/4675-01-01, URF/1/4697-01-01, URF/1/5041-01-01, REI/1/5235-01-01, and REI/1/5334-01-01. This work was supported by funding from King Abdullah University of Science and Technology (KAUST) -- KAUST Center of Excellence for Smart Health (KCSH), under award number 5932, and by funding from King Abdullah University of Science and Technology (KAUST) -- Center of Excellence for Generative AI, under award number 5940.

Notes

ORCID iDs

Olga Mashkova

Fernando Zhapa-Camacho

Appendix A GO & STRING Data Statistics, Train Part

Dataset	GCI0	GCI1	GCI2	GCI3	GCI0_BOT	GCI1_BOT	GCI3_BOT	Classes	Roles	Test axioms
Yeast iw	81,068	11,825	269,567	11,823	0	31	0	61,846	16	12,040
Yeast hf	81,068	11,825	290,433	11,823	0	31	0	61,850	16	1,530

Appendix B Food Ontology Statistics, Train Part

GCI0	GCI1	GCI2	GCI3	GCI0_BOT	GCI1_BOT	GCI3_BOT	Classes	Roles	Test axioms
21,795	1,267	10,719	897	0	495	0	24,969	43	5,752

Appendix C GALEN Ontology Statistics, Train Part

GCI0	GCI1	GCI2	GCI3	GCI0_BOT	GCI1_BOT	GCI3_BOT	Classes	Roles	Test axioms
27,339	15,613	29,618	15,615	0	0	0	49,223	888	667

Appendix D Hyperparameters

Dataset	Model	dim	lr	$γ$	$ϵ$	$δ$	$λ$
Yeast iw	ELEm	100	0.0001	-0.10
	ELEm+l	50	0.0001	0.00
	ELBE	200	0.0001	0.00
	ELBE+l	200	0.0100	0.00	0.001
	${Box}^{2} E L$	200	0.0010	0.01		1	0.05
	${Box}^{2} E L$ +l	200	0.0010	0.01	0.010	2	0.05
Yeast hf	ELEm	200	0.0001	0.01
	ELEm+l	50	0.0001	-0.10
	ELBE	200	0.0001	0.10
	ELBE+l	200	0.0001	0.10	0.010
	${Box}^{2} E L$	200	0.0100	0.10		4	0.20
	${Box}^{2} E L$ +l	200	0.0100	0.10	0.010	4	0.05
FoodOn	ELEm	400	0.0010	-0.10
	ELEm+l	400	0.0010	-0.10
	ELBE	200	0.0100	0.10
	ELBE+l	200	0.0100	-0.01	0.001
	${Box}^{2} E L$	100	0.0100	0.10		1	0.20
	${Box}^{2} E L$ +l	200	0.0010	0.10	0.010	4	0.10
GALEN	ELEm	400	0.0010	-0.10
	ELEm+l	400	0.0010	-0.01
	ELBE	100	0.0010	0.10
	ELBE+l	200	0.0010	0.01	0.010
	${Box}^{2} E L$	200	0.0010	0.00		4	0.05
	${Box}^{2} E L$ +l	200	0.0100	0.00	0.100	1	0.05

Appendix E Deductive Closure Computation Example

Let us add two more axioms to the simple ontology example from Section 6.4 about proteins $P$ and $Q$ having functions $\mathit{GO}_{1}$ and $\mathit{GO}_{2}$ , respectively. ELK will infer the following class hierarchy:

For GCI2 axioms $P ⊑ \exists \mathit{has\_function} . \mathit{GO}_{1}$ and $Q ⊑ \exists \mathit{has\_function} . \mathit{GO}_{2}$ the algorithm will output.

For GCI3 axioms $\exists \mathit{has\_function} . \mathit{GO}_{1} ⊑ B$ and $\exists \mathit{has\_function} . \mathit{GO}_{2} ⊑ A$ the algorithm will infer

In this small protein function prediction example there are two disjointness axioms: $A ⊓ B ⊑ ⊥$ and $\mathit{GO}_{1} ⊓ \mathit{GO}_{2} ⊑ ⊥$ . Taking into consideration the concept hierarchy and inference rules from part 2 the algorithm will infer the following GCI1 and GCI1_BOT axioms:

Appendix F Deductive Closure Computation Soundness

Let us show that each inference rule is sound, that is, that it derives only true statements:

$\frac{A ⊓ B ⊑ E A^{'} ⊑ A B^{'} ⊑ B E ⊑ E^{'}}{A^{'} ⊓ B^{'} ⊑ E^{'}}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊓ B ⊑ E$ then $A^{I} \cap B^{I} \subseteq E^{I}$ . Also, $A^{' I} \subseteq A^{I}$ , $B^{' I} \subseteq B^{I}$ and $E^{I} \subseteq E^{' I}$ . From this we derive $A^{' I} \cap B^{' I} \subseteq E^{' I}$ , that is, $I ⊨ A^{'} ⊓ B^{'} ⊑ E^{'}$ .

$\frac{A ⊑ \exists r . B A^{'} ⊑ A B ⊑ B^{'} r ⊑ r^{'}}{A^{'} ⊑ \exists r^{'} . B^{'}}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊑ \exists r . B$ then $A^{I} \subseteq {a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{I} \land b \in B^{I}}$ . Additionally, $A^{' I} \subseteq A^{I}$ , $B^{I} \subseteq B^{' I}$ and for arbitrary $a, b \in Δ^{I}$ $(a, b) \in r^{I} \Rightarrow (a, b) \in r^{' I}$ . Then $A^{' I} \subseteq {a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{' I} \land b \in B^{' I}}$ , that is, $I ⊨ A^{'} ⊑ \exists r^{'} . B^{'}$ .

$\frac{A ⊑ \exists r . B B ⊑ \exists r^{'} . E r \circ r^{'} ⊑ s}{A ⊑ \exists s . E}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊑ \exists r . B$ and $I ⊨ B ⊑ \exists r^{'} . E$ then $A^{I} \subseteq {a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{I} \land b \in B^{I}}$ and $B^{I} \subseteq {b \in Δ^{I} | \exists c \in Δ^{I} : (b, c) \in r^{' I} \land c \in E^{I}}$ . For arbitrary $a, b, c \in Δ^{I}$ $(a, b) \in r^{I} \land (b, c) \in r^{' I} \Rightarrow (a, c) \in s^{I}$ . Then $A^{I} \subseteq {a \in Δ^{I} | \exists c \in Δ^{I} : (a, c) \in s^{I} \land c \in E^{I}}$ , that is, $I ⊨ A ⊑ \exists s . E$ .

$\frac{\exists r . A ⊑ B A^{'} ⊑ A B ⊑ B^{'} r^{'} ⊑ r}{\exists r^{'} . A^{'} ⊑ B^{'}}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ \exists r . A ⊑ B$ then ${a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{I} \land b \in A^{I}} \subseteq D^{I}$ . Also, $A^{' I} \subseteq A^{I}$ , $B^{I} \subseteq B^{' I}$ and for arbitrary $a, b \in Δ^{I}$ $(a, b) \in r^{' I} \Rightarrow (a, b) \in r^{I}$ . From this follows ${a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{' I} \land b \in A^{' I}} \subseteq B^{' I}$ , that is, $I ⊨ \exists r^{'} . A^{'} ⊑ B^{'}$ .

$\frac{A ⊓ B ⊑ ⊥ A^{'} ⊑ A B^{'} ⊑ B}{A^{'} ⊓ B^{'} ⊑ ⊥}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊓ B ⊑ ⊥$ then $A^{I} \cap B^{I} = \emptyset$ . Additionally, Also, $A^{' I} \subseteq A^{I}$ , $B^{' I} \subseteq B^{I}$ , from where we can derive $A^{' I} \cap B^{' I} = \emptyset$ , that is, $I ⊨ A^{'} ⊓ B^{'} ⊑ ⊥$ .

$\frac{A ⊓ B ⊑ ⊥}{A ⊓ B ⊑ E}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊓ B ⊑ ⊥$ then $A^{I} \cap B^{I} = \emptyset$ . Since $\emptyset \subseteq E^{I}$ for any concept $E$ then $A^{I} \cap B^{I} \subseteq E^{I}$ , that is, $I ⊨ A ⊓ B ⊑ E$ .

$\frac{\exists r . A ⊑ ⊥ A^{'} ⊑ A r^{'} ⊑ r}{\exists r^{'} . A^{'} ⊑ ⊥}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ \exists r . A ⊑ ⊥$ then ${a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{I} \land b \in A^{I}} = \emptyset$ . Additionally, $A^{' I} \subseteq A^{I}$ and for arbitrary $a, b \in Δ^{I}$ $(a, b) \in r^{' I} \Rightarrow (a, b) \in r^{I}$ . From this we get that ${a \in Δ^{I} | \exists b \in Δ^{I} : (a, b) \in r^{' I} \land b \in A^{' I}} = \emptyset$ , that is, $I ⊨ \exists r^{'} . A^{'} ⊑ ⊥$ .

$\bar{A ⊓ ⊥ ⊑ E}$

Let $I$ be an arbitrary interpretation. For an arbitrary concept $A$ $(A ⊓ ⊥)^{I} = A^{I} \cap \emptyset = \emptyset$ and for every concept $E$ we have $\emptyset \subseteq E^{I}$ . Hence $I ⊨ A ⊓ ⊥ ⊑ E$ .

$\frac{B ⊑ ⊥}{A ⊓ B ⊑ E}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ B ⊑ ⊥$ then $B^{I} = \emptyset$ . On analogy with the previous case we get $I ⊨ A ⊓ B ⊑ E$ for arbitrary concepts $A$ and $E$ .

$\frac{E ⊑ E^{'}}{A ⊓ E ⊑ E^{'}}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ E ⊑ E^{'}$ then $E^{I} \subseteq E^{' I}$ . For every concept $A$ $(A ⊓ E)^{I} = A^{I} \cap E^{I} \subseteq E^{I} \subseteq E^{' I}$ , hence $I ⊨ A ⊓ E ⊑ E^{'}$ .

$\frac{A ⊓ B ⊑ ⊥}{A ⊓ B ⊑ E}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊓ B ⊑ ⊥$ then $A^{I} \cap B^{I} = \emptyset$ . Since $\emptyset \subseteq E^{I}$ for every concept $E$ then $I ⊨ A ⊓ B ⊑ E$ .

$\frac{A ⊑ E \quad B ⊑ E \quad A^{'} ⊑ A \quad B^{'} ⊑ B \quad E ⊑ E^{'}}{A^{'} ⊓ B^{'} ⊑ E^{'}}$

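Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊑ E$ and $I ⊨ B ⊑ E$ then $A^{I} \subseteq E^{I}$ and $B^{I} \subseteq E^{I}$ . Note that $A^{I} \cap B^{I} \subseteq A^{I} \subseteq E^{I}$ , so $I ⊨ A ⊓ B ⊑ E$ . By analogy with case 1, the inclusions $A^{' I} \subseteq A^{I}$ , $B^{' I} \subseteq B^{I}$ , and $E^{I} \subseteq E^{' I}$ give $A^{' I} \cap B^{' I} \subseteq A^{I} \cap B^{I} \subseteq E^{I} \subseteq E^{' I}$ , that is, $I ⊨ A^{'} ⊓ B^{'} ⊑ E^{'}$ .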
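A finite check can also illustrate which premises the conclusion depends on. The sketch below is a hedged illustration rather than the authors' procedure: it assumes a finite domain, uses the hypothetical function `violations`, and counts interpretations in which the premises hold but $A^{'} ⊓ B^{'} ⊑ E^{'}$ fails, optionally with the premise $E ⊑ E^{'}$ dropped.

```python
from itertools import combinations, product


def powerset(items):
    """Return every subset of `items` as a frozenset."""
    items = list(items)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]


def violations(domain, require_e_premise=True):
    """Count interpretations where the premises hold but A' ⊓ B' ⊑ E' fails.

    With require_e_premise=False the premise E ⊑ E' is omitted, which is
    expected to admit counterexamples.
    """
    count = 0
    for a, b, e, a_sub, b_sub, e_sup in product(powerset(domain), repeat=6):
        premises_hold = a <= e and b <= e and a_sub <= a and b_sub <= b
        if require_e_premise:
            premises_hold = premises_hold and e <= e_sup
        if premises_hold and not (a_sub & b_sub) <= e_sup:
            count += 1
    return count


if __name__ == "__main__":
    print(violations(range(2)))                           # full rule
    print(violations(range(2), require_e_premise=False))  # premise E ⊑ E' dropped
```

With all premises the count is zero on the two-element domain. Without $E ⊑ E^{'}$ the rule breaks, for example when $A^{I} = B^{I} = E^{I} = A^{' I} = B^{' I} = \{0\}$ and $E^{' I} = \emptyset$ .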

$\frac{A ⊑ A^{'}}{A ⊓ ⊤ ⊑ A^{'}}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊑ A^{'}$ then $A^{I} \subseteq A^{' I}$ . By definition of $⊤$ , $(A ⊓ ⊤)^{I} = A^{I} \cap ⊤^{I} = A^{I}$ , hence $I ⊨ A ⊓ ⊤ ⊑ A^{'}$ .

$\frac{}{⊥ ⊑ \exists r . B}$

Follows immediately from the fact that $\emptyset$ is a subset of any concept interpretation.

$\frac{A ⊑ ⊥}{A ⊑ \exists r . B}$

Let $I$ be an arbitrary interpretation. Since $I ⊨ A ⊑ ⊥$ then $A^{I} = \emptyset$ . By analogy with case 14, $A^{I} = \emptyset \subseteq (\exists r . B)^{I}$ , so $I ⊨ A ⊑ \exists r . B$ .

$\frac{}{\exists r . A ⊑ ⊤}$

Follows immediately from the fact that $⊤^{I} = Δ^{I}$ .


DELE: Deductive E L + + Embeddings for Knowledge Base Completion
