Sage Journals: Discover world-class research

Abstract

Since the 2001 envisioning of the Semantic Web (SW) (Scientific American 284(5) (2001) 34–43), the main research focus in SW reasoning has been on the soundness and completeness of reasoners. While these reasoners assume the veracity of input data, the reality is that the Web of data is inherently noisy. Although there has been recent work on noise-tolerant reasoning, it has focused on type inference rather than full RDFS reasoning. Even though RDFS closure generation can be seen as a Knowledge Graph (KG) completion problem, the problem setting is different, which makes KG embedding techniques designed for link prediction unsuitable for RDFS reasoning. This paper documents a novel approach that extends noise-tolerance in the SW to full RDFS reasoning. Our embedding technique, which is tailored for RDFS reasoning, consists of layering RDF graphs and encoding them as 3D adjacency matrices in which each layer layout forms a graph word. Each input graph and its entailments are then represented as sequences of graph words, and RDFS inference can be formulated as the translation of these graph-word sequences, achieved through neural machine translation. Our evaluation shows 97% validation accuracy on the LUBM1 synthetic dataset and 87.76% on a subset of DBpedia, while demonstrating a noise-tolerance unavailable with rule-based reasoners.

Keywords

Deep learning, Semantic Web, RDFS reasoning, noise-tolerance, neural machine translation, graph words

1. Introduction

The Web is inherently noisy, and so is its extension, the SW. This noise results from inevitable human error when creating the content, designing the tools that facilitate data exchange, conceptualizing the ontologies that allow machines to understand the data content, mapping concepts from different ontologies, etc. For instance, noise can be a consequence of building Linked Open Data (LOD) from semi-structured or unstructured data. When LOD is built from unstructured data such as text using Named Entity Linking (NEL) or relation extraction tools, whose accuracy is not perfect, these tools generate erroneous triples. Thus, the integrity of the inference becomes questionable.

It is unrealistic to expect that the Web or the SW will ever be free of noise. Many research efforts concentrate on noise detection and data cleansing in the Web of data. Since some instances or types of noise will always be overlooked, other research efforts focus on noise-tolerance instead. Most of the current work in the latter category adds some noise-tolerant reasoning capabilities without aiming for full semantic reasoning.

Humans are able to learn from very few examples while providing explanations for their decision-making process. In contrast, deep learning techniques, even though robust to noise and very effective in generalizing across a number of fields including machine vision, natural language understanding and speech recognition, require large amounts of data and are unable to provide explanations for their decisions. Attaining human-level robust reasoning requires combining sound symbolic reasoning with robust connectionist learning, as outlined in [96]. “We argue that to face this challenge one first needs a framework in which inductive learning and logical reasoning can be both expressed and their different natures reconciled.” [5,96] However, connectionist learning uses low-level representations, such as embeddings, rather than the “symbolic representations used in knowledge representation” [23,55]. This challenge constitutes what is referred to as the Neural-Symbolic gap. The aim of this research is to provide a stepping stone towards bridging the Neural-Symbolic gap in the SW field, and for RDF Schema (RDFS) reasoning in particular.

This paper documents a novel approach that extends previous research efforts on noise-tolerance to full RDFS reasoning. The proposed approach applies recent advances in deep learning, which have shown robustness to noise in other machine learning applications such as computer vision and natural language understanding, to semantic reasoning. The first step towards bridging the Neural-Symbolic gap for RDFS reasoning is to represent Resource Description Framework (RDF) graphs in a format that can be fed to neural networks. The most intuitive representation to use is a graph representation. However, RDF graphs differ in a number of ways from simple graphs as defined in graph theory. We examine different graph models for RDF proposed in the literature and conclude that they were neither designed for the requirements of RDFS reasoning nor suitable as neural network input. The proposed graph model for RDF consists of layering RDF graphs and encoding them in the form of 3D adjacency matrices. Each layer layout in the 3D adjacency matrices forms what we term a graph word. Every input graph and its corresponding inference are then represented as sequences of graph words. RDFS inference thus becomes equivalent to the translation of graph words, which is achieved through neural machine translation.

The evaluation confirms that deep learning can in fact be used to learn RDFS rules from both synthetic and real-world SW data while showing noise-tolerance capabilities that rule-based reasoners lack.

1.1. Contributions and outline

The main contributions in this paper are:

Noise Intolerance Conditions. In order to illustrate the intolerance of rule-based reasoners to noise in SW data, we draw a taxonomy of noise types in SW data according to the impact of the noise on the inference. Additionally, we identify the necessary conditions for a noise type to be propagable (i.e. to affect the inference) by any RDFS rule.

Layered Graph Model for RDF. Even though the literature contains quite a few proposals for graph models for RDF, none of them was designed specifically for RDFS reasoning. We propose a layered graph model for RDF that fulfills this requirement.

Graph Words. Using the layered graph model, we propose a novel way of representing RDF graphs as a sequence of graph words. The main observation that led to this design is that, within a restricted domain, the layers of RDF graphs vary only slightly.

Graph-to-Graph Learning. By representing RDF graphs as sequences of graph words, we were able to use neural machine translation techniques to translate graph words. This constitutes a novel approach to graph-to-graph learning.

Full RDFS Reasoning with Noise Tolerance. Our evaluation shows not only results comparable to those of rule-based reasoners on intact data but also exceptional noise-tolerance compared to them: 99% for the deep reasoner vs 0% (by design) for Jena on the $\mathrm{UGS}_{100}$ dataset.

In Section 2, we use three aspects to position our research with respect to related work. Section 3 draws a taxonomy of noise types in SW data and illustrates the process of ground truthing and noise induction for LUBM and a subset of DBpedia, which are used as examples to describe the design of the overall approach. We examine different graph models for RDF and motivate the design of the layered graph model for RDF in Section 4. The creation of the RDF tensors and the RDF graph words, and the graph words translation, are then presented in Section 5 and Section 6 respectively. The results of the experiments are described in Section 7. In Section 8, we review the related literature in terms of noise-tolerance in the SW, deep learning and the SW, and graph embedding techniques, specifically KG embedding. Finally, Section 9 presents the lessons learned, the main contributions and future work.

2. Background and problem statement

In this section we use three aspects to position our research with respect to related work:

Noise handling strategies: active vs adaptive

Knowledge graph completion categories: schema-guided vs data-driven

Graph embedding output: node/edge embedding vs whole-graph embedding

2.1. Noise handling strategies

We classify the strategies of handling noise in SW data into two categories:

Active noise handling. This category consists of detecting noise and cleansing the data before performing any task that might be affected by the presence of noise.

Adaptive noise handling. The previous category provides solutions that are tailored to certain types of noise, as described in Section 8.1.1. Given the unrealistic expectation of cleansing every type of noise in SW data, adaptive noise handling approaches focus instead on building techniques that are noise-tolerant. The research described in this paper falls into this category, as we are building a noise-tolerant RDFS reasoner.

2.2. Knowledge graph completion categories

RDFS closure can be seen as a Knowledge Graph Completion (KGC) problem, and as a multi-relational link prediction problem in particular, where each RDFS rule (see Appendix A) generates different types of links:

(a) links between TBox concepts (RDFS10, RDFS11)

(b) links between TBox properties (RDFS5, RDFS6)

(c) links between ABox entities and TBox concepts (RDFS2, RDFS3, RDFS9)

(d) links between ABox entities (RDFS7)

We refer to the RDFS closure computation as schema-guided KGC because the links are generated according to the ontology (TBox), unlike data-driven KGC, where the links are predicted based on the analysis of the existing links in the KG. Data-driven KGC models “heavily rely on the connectivity of the existing KG and are best able to predict relationships between existing, well-connected entities” [5,88]. The links predicted by data-driven KGC might be seen as links between ABox entities and thus similar to case (d) of schema-guided KGC. However, there is a crucial difference: the relation (or link label) generated by the RDFS7 rule is a super-property that might not appear in the initial KG, as it is defined only in the TBox, where it is a node rather than a link type, whereas all the relations generated by data-driven KGC necessarily appear in the initial KG.

2.3. Graph embedding output

Graph embedding approaches can be classified using several criteria. One particular criterion of interest in our survey of the state of the art (Section 8) is the “problem setting” [12]. The problem setting uses the type of graph input as well as the embedding output to classify the embedding approach. For the input part of the problem setting, the graph can be either:

Homogeneous. All the nodes are of the same type, and all the edges are of the same type as well.

Heterogeneous. There are multiple types of nodes and/or multiple types of edges. This is the case for RDF graphs.

The majority of graph embedding approaches (detailed in Section 8) yield node representation in a low dimensional space. This is why graph embedding and node embedding are often used interchangeably. However, there are other types of graph embedding outputs such as:

Edge embedding. The output in this case is a low dimensional representation of the edges. This is particularly useful in the case of knowledge graphs [10] where the type of edges between nodes is crucial to determine their similarities.

Whole-graph embedding. The output is a vector representation of the whole graph, not only node or edge vectors. The embedding vectors of similar graphs should be neighbors in the embedding space. The embedding of RDF graphs for the purpose of learning their inference falls under this category.

2.4. Problem statement

For learning RDFS reasoning, whole-graph embedding is required because the input of the learning algorithm is the input graph and the output is the inference graph. However, existing embedding techniques for KGs were not designed for RDFS reasoning, and they would raise two main challenges if used for this task.

The first challenge is the need to check the validity of every possible triple using the scoring function $f_{r} (h, t)$ (described in Section 8) in order to generate the full materialization.
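To make the first challenge concrete, the following Python sketch materializes a closure with a TransE-style score $f_r(h, t) = -\lVert h + r - t \rVert_2$, used here only as an assumed stand-in for the scoring functions discussed in Section 8. The embeddings, the threshold and the resource names are hypothetical. Because no candidate can be skipped, the loop scores $|E| \times |R| \times |E|$ triples.

```python
import itertools
import math


def transe_score(h, r, t):
    """TransE-style plausibility -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def materialize(entity_emb, relation_emb, threshold):
    """Score every candidate triple (h, r, t) and keep those above the threshold.

    The loop visits |E| * |R| * |E| candidates, which is the cost of using a
    link-prediction scoring function to generate a full materialization.
    """
    kept, visited = [], 0
    for h, r, t in itertools.product(entity_emb, relation_emb, entity_emb):
        visited += 1
        if transe_score(entity_emb[h], relation_emb[r], entity_emb[t]) >= threshold:
            kept.append((h, r, t))
    return kept, visited


# Hypothetical 2D embeddings for three resources and one relation.
entities = {"s1": [0.0, 0.0], "c1": [1.0, 0.0], "c2": [0.0, 5.0]}
relations = {"type": [1.0, 0.0]}
triples, visited = materialize(entities, relations, threshold=-0.1)
print(visited)  # 9 candidates for 3 entities and 1 relation
print(triples)  # [('s1', 'type', 'c1')]
```

With thousands of resources per dataset, the number of candidates grows quadratically in the number of entities, even though only a few of them belong to the RDFS closure.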

The second challenge is embedding the relations that appear only in the inference, since the embeddings are learned only from the input graphs but must be used to generate the inference graphs. For instance, when the property mastersDegreeFrom in LUBM appears in the input graph, its super-property degreeFrom appears in the inference graph by applying the RDFS7 rule [37]. If degreeFrom never appears in the input graphs, its embedding is never learned.

The baseline experiments detailed in Section 7.3 illustrate these challenges empirically.

3. Ground truthing and noise induction

For this research, the input comes from one of two datasets: a synthetic dataset from LUBM and a real-world dataset from DBpedia [3]. The inferential target for these datasets is set using a rule-based SW reasoner (Jena [15]). Essentially, the goal for the deep reasoner is to learn the mapping between input RDF graphs and their entailed graphs in the presence of noise. To test the noise-tolerance of the deep reasoner, noise was therefore induced in the synthetic dataset.

3.1. Taxonomy of Semantic Web noise types

Fig. 1. Semantic Web noise taxonomy.

The literature contains a few taxonomies [41,103] of the types of noise that can impact RDF graphs; however, they are not drawn with respect to the impact of the noise on the inference. The taxonomy illustrated in Fig. 1 serves this purpose. It should be noted that whether noise propagates depends on the inference rule.

TBox Noise is the type of noise that resides within the ontology, such as in the class hierarchy or in the domains and ranges of properties. This type of noise impacts inference over the whole dataset. For example, in the DBpedia ontology, the property dbo:field has the domain dbo:Artist, which implies that every scientist in the DBpedia dataset who has a dbo:field value (such as dbr:Artificial_intelligence or dbr:Semantic_Web) will be labelled a dbo:Artist after inference. Reasoning with tolerance to TBox noise is outside the scope of this research for the following reason: using rule-based reasoners for ground truthing when the TBox contains noise biases the whole ground truth, which makes noisy inferences omnipresent rather than anomalies that can be detected and fixed.
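The following Python sketch shows how a single erroneous TBox axiom propagates through the RDFS2 (rdfs:domain) rule. It is a minimal illustration that assumes one domain class per property; dbr:Example_Scientist is a hypothetical resource.

```python
def apply_rdfs2(triples, domains):
    """RDFS2: (s, p, o) and (p, rdfs:domain, c) entail (s, rdf:type, c).

    `domains` maps each property to its (single, assumed) domain class.
    """
    return {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}


# The noisy domain axiom from the text: dbo:field rdfs:domain dbo:Artist.
tbox_domains = {"dbo:field": "dbo:Artist"}
abox = {("dbr:Example_Scientist", "dbo:field", "dbr:Semantic_Web")}
print(apply_rdfs2(abox, tbox_domains))
# {('dbr:Example_Scientist', 'rdf:type', 'dbo:Artist')}
```

Every scientist with a dbo:field value would receive the same wrong type, which is why this kind of noise contaminates the whole ground truth rather than appearing as an isolated anomaly.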

The following assumption is made in order to scope this research within a manageable framework:

Assumption 1 (Noise locality).

Noise resides only in the ABox; the TBox is free of noise.

Definition 1 (Triple corruption).

The process of altering an existing triple in an RDF graph by replacing one of the triple’s resources. This can result in either propagable or non-propagable noise.

Definition 2 (Non-propagable noise).

Any corrupted triple in the input graph that does not have any impact on the inference.

This can occur at least in these cases:

Neither the original triple nor the corrupted triple generates any inference.

The original triple does not generate any inference, but the corrupted triple generates an inference that is also generated by another triple in the input graph (for example, if the corrupted triple is equal to another triple in the graph).

The original triple and the corrupted triple generate the exact same inference.

The corrupted triple generates a set of triples that is a proper subset of the set of triples generated by the original triple, and the difference between the two sets is also generated by other input triples.

Definition 3 (Propagable noise).

Any corrupted triple in the input graph that changes the inference.
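Definitions 1 to 3 can be checked mechanically by comparing the inference of the original graph with that of the corrupted graph. The sketch below assumes, for illustration only, that the inference consists of RDFS9 entailments alone, and it uses a hypothetical superclass fragment that follows the statements made about LUBM in Section 3.2.1 rather than the full LUBM hierarchy.

```python
# Hypothetical fragment of a materialized ontology: class -> set of super-classes.
SUPERCLASSES = {
    "ub:GraduateStudent": {"ub:Person"},
    "ub:TeachingAssistant": {"ub:Person"},
    "ub:ResearchAssistant": {"ub:Person"},
    "ub:University": {"ub:Organization"},
}


def rdfs9_inference(graph, superclasses):
    """Triples entailed by RDFS9 alone: (z rdf:type x), (x subClassOf y) => (z rdf:type y)."""
    return {(s, "rdf:type", y)
            for s, p, o in graph if p == "rdf:type"
            for y in superclasses.get(o, ())}


def corrupt(graph, original, corrupted):
    """Definition 1: replace one triple with a copy in which one resource changed."""
    return (graph - {original}) | {corrupted}


def is_propagable(graph, original, corrupted, superclasses):
    """Definitions 2 and 3: the corruption is propagable iff the inference changes."""
    before = rdfs9_inference(graph, superclasses)
    after = rdfs9_inference(corrupt(graph, original, corrupted), superclasses)
    return before != after


grad = ("ex:student1", "rdf:type", "ub:GraduateStudent")
print(is_propagable({grad}, grad, ("ex:student1", "rdf:type", "ub:University"),
                    SUPERCLASSES))  # True: UGS-style noise
ta = ("ex:ta1", "rdf:type", "ub:TeachingAssistant")
print(is_propagable({ta}, ta, ("ex:ta1", "rdf:type", "ub:ResearchAssistant"),
                    SUPERCLASSES))  # False: RATA-style noise
```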

In order to discern the necessary conditions for RDFS rules to propagate noise, the input patterns of the premises of the RDFS rules [37] are first classified as TBox patterns or ABox patterns (Appendix B). The rules that have only TBox patterns, such as RDFS5 (which defines the property hierarchy) and RDFS11 (which defines the class hierarchy), are excluded because any corruption of triples matching these patterns would induce TBox noise. For the remaining rules, which have both TBox and ABox patterns (i.e. RDFS2, RDFS3, RDFS7 and RDFS9), only the ABox triple can be corrupted. Table 2 identifies the necessary conditions for the RDFS9 rule (Table 1) to generate a noisy inference from a corrupted triple. In plain English, the RDFS9 rule will generate a noisy inference if and only if the corrupted type $x'$ has a super-class $y'$ defined in the ontology and the original type $x$ either does not have a super-class or $y'$ is not a super-class of $x$. The necessary conditions for the remaining rules are listed in Appendix C.
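The plain-English condition for RDFS9 can be sketched as a predicate over a super-class map taken from the materialized ontology O. This is an illustration under assumptions (a hypothetical function name and a simplified LUBM fragment matching the UGS and RATA datasets of Section 3.2.1), not a transcription of Table 2.

```python
def rdfs9_propagates(x, x_corrupted, superclasses):
    """True if some super-class y' of the corrupted type x' is not a
    super-class of the original type x (this also covers the case where
    x has no super-class at all)."""
    supers_of_x = superclasses.get(x, set())
    return any(y not in supers_of_x for y in superclasses.get(x_corrupted, set()))


# Hypothetical fragment of the materialized LUBM ontology.
supers = {
    "ub:GraduateStudent": {"ub:Person"},
    "ub:University": {"ub:Organization"},
    "ub:TeachingAssistant": {"ub:Person"},
    "ub:ResearchAssistant": {"ub:Person"},
}
print(rdfs9_propagates("ub:GraduateStudent", "ub:University", supers))          # True
print(rdfs9_propagates("ub:TeachingAssistant", "ub:ResearchAssistant", supers))  # False
```

If the corrupted type has no super-class in the ontology, the predicate returns False, since RDFS9 then generates nothing from the corrupted triple.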

Table 1
RDFS9 rule from [37]

Table 2
Propagable noise by rule-based RDFS reasoners for the RDFS9 rule

3.1.1. Mapping to the DBpedia noise taxonomy drawn in “User-driven quality evaluation of DBpedia” [103]

In [103], the authors examine different types of noise in DBpedia instances; thus, all the categories of noise presented are ABox noise. Every category of noise can be either propagable or non-propagable depending on the property in the noisy triple. For example, in the category “Object value is incorrectly extracted” [103], the noise is propagable if the corresponding property has any super-properties defined in the ontology and non-propagable otherwise, as in the following example provided in the paper:

where the property dbpprop:map does not have any super-properties in the DBpedia ontology.

3.2. Ground-truthing in LUBM1

The Lehigh University Benchmark (LUBM) [33] is a benchmark for SW repositories. The LUBM ontology conceptualizes 42 classes from the academic domain and 28 properties describing these classes’ relationships. LUBM1, an RDF graph of one hundred thousand triples generated according to this ontology, contains 17,189 subject-resources within 15 classes (for instance, 5,999 resources of type ub:Publication and 15 resources of type ub:Department). Let R be the set of these subject-resources. For each resource r in R, a graph g is built by running the SPARQL DESCRIBE query. Appendix D contains the graph description of the resource GraduateStudent9. Let G be the set of graphs g obtained after this step. For each graph g in G, the RDFS inferential closure is generated according to the LUBM ontology using Jena. Let I be the set of inference graphs. Appendix E contains the inference graph of the input graph in Listing 1. Finally, G and I are split into 60% training (G_train, I_train), 20% validation (G_val, I_val) and 20% testing (G_test, I_test) sets using a stratified splitting technique in which the resource class is used as the stratification label. The goal of the stratification is to have the required percentage of each resource type in the training, validation and test sets; otherwise, there is a risk of having all the small classes in the training set, which would mistakenly inflate the accuracy. For instance, the stratified split leads to 9 graphs describing resources of type ub:Department in the training set and 3 graphs describing resources of the same type in each of the validation and test sets. The input of the supervised learning algorithm is the set of graphs G_train, the target is their corresponding inference graphs I_train, and the goal is to learn the inference generation.
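A stratified 60/20/20 split of this kind can be sketched in plain Python as follows. The function name and the per-class rounding policy are assumptions, since the text does not state which tool was used. With 15 ub:Department graphs, the sketch yields the 9/3/3 split mentioned above.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split items into train/validation/test lists, class by class.

    Each class is shuffled and cut according to `ratios`, so every resource
    type keeps roughly the same proportions in the three sets.
    """
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(ratios[0] * len(group))
        n_val = round(ratios[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical graph identifiers labelled by the class of the described resource.
graphs = [f"dept{i}" for i in range(15)] + [f"course{i}" for i in range(10)]
classes = ["ub:Department"] * 15 + ["ub:Course"] * 10
train, val, test = stratified_split(graphs, classes)
print(len(train), len(val), len(test))  # 15 5 5
```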

3.2.1. Noise induction in LUBM1

In [104], a methodology for noise induction in LUBM was proposed in which three datasets were constructed by corrupting type assertions according to a given noise level.

RATA

Instances of type TeachingAssistant were corrupted to be of type ResearchAssistant. This type of noise is non-propagable because both concepts, TeachingAssistant and ResearchAssistant, are sub-classes of the concept Person.

UGS

Instances of type GraduateStudent were corrupted to be of type University. This type of noise is propagable by the RDFS rule RDFS9 because these concepts are not siblings. A rule reasoner will generate a noisy inference by deducing that the student instance is of type Organization which is the super-class of University.

GCC

Instances of type Course were corrupted to be of type GraduateCourse. This type of noise is also non-propagable.

As [104] focuses only on noisy type assertions, two additional datasets with noisy property assertions were created for the purpose of this research.

TEPA

The property publicationAuthor is corrupted to be teachingAssistantOf. This noise is propagable by the RDFS rules RDFS2 and RDFS3 as the two properties have different domains and ranges.

WOAD

The property advisor is corrupted to be worksFor. This noise is non-propagable, as the property worksFor does not have any domain or range specification in the LUBM ontology; however, replacing the property advisor loses the type inferences previously made about the student and the advisor.
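The corruption of a given fraction of assertions, as in the RATA, UGS, GCC, TEPA and WOAD datasets, can be sketched as below. The random sampling strategy and the helper names are assumptions made for illustration; [104] may select the corrupted triples differently.

```python
import random


def induce_noise(triples, is_target, corrupt, noise_level, seed=0):
    """Return a copy of `triples` in which a `noise_level` fraction of the
    target triples (those for which is_target(t) is True) is replaced by
    corrupt(t); all other triples are left untouched."""
    noisy = list(triples)
    targets = [i for i, t in enumerate(noisy) if is_target(t)]
    rng = random.Random(seed)
    for i in rng.sample(targets, round(noise_level * len(targets))):
        noisy[i] = corrupt(noisy[i])
    return noisy


# UGS-style corruption: GraduateStudent type assertions become University.
def is_grad_type(t):
    return t[1] == "rdf:type" and t[2] == "ub:GraduateStudent"


def to_university(t):
    return (t[0], t[1], "ub:University")


# TEPA-style corruption: publicationAuthor becomes teachingAssistantOf.
def is_pub_author(t):
    return t[1] == "ub:publicationAuthor"


def to_ta_of(t):
    return (t[0], "ub:teachingAssistantOf", t[2])


data = [(f"ex:s{i}", "rdf:type", "ub:GraduateStudent") for i in range(10)]
data.append(("ex:pub1", "ub:publicationAuthor", "ex:s0"))
noisy = induce_noise(data, is_grad_type, to_university, noise_level=0.5)
print(sum(t[2] == "ub:University" for t in noisy))  # 5
```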

3.3. Ground truthing the scientist dataset from DBpedia

A dataset of scientists’ descriptions was built from DBpedia [7]: 25,760 URIs for scientists’ descriptions were retrieved. In order to diversify the classes in the scientists dataset, a few other classes related to the Scientist concept in DBpedia were also collected, namely EducationalInstitution, Place and Award. Table 3 lists the number of resources per class in the scientists dataset. The total number of triples obtained in the scientists dataset is approximately 5.5 million. No artificial noise was induced in this dataset, as it already contains noise. An example of a noisy type assertion is the resource dbr:United_States being of type dbo:Person. There are 1,761 resources in DBpedia that are of types dbo:Person and dbo:Place simultaneously, which indicates that one of the two type assertions is noisy.
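The co-typing check mentioned above can be sketched over an in-memory list of triples. The helper name and the three sample triples are hypothetical; on the full DBpedia dataset the same check would more naturally be expressed as a SPARQL query.

```python
from collections import defaultdict


def co_typed_resources(triples, class_a, class_b):
    """Sorted list of resources asserted to be of both class_a and class_b."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    return sorted(s for s, ts in types.items() if {class_a, class_b} <= ts)


# Toy sample: one co-typed resource and one ordinary scientist.
sample = [
    ("dbr:United_States", "rdf:type", "dbo:Person"),
    ("dbr:United_States", "rdf:type", "dbo:Place"),
    ("dbr:Example_Scientist", "rdf:type", "dbo:Person"),
]
print(co_typed_resources(sample, "dbo:Person", "dbo:Place"))  # ['dbr:United_States']
```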

Table 3
Number of resources per class in the scientists dataset

Class	Number of resources
dbo:Scientist	25,760
dbo:Place	22,035
dbo:EducationalInstitution	6,048
dbo:Award	1,166

4. Layered graph model for RDF

Despite its effectiveness as a standardized “framework for representing information in the Web” [22] and as an essential building block of the SW, the graph representation of the RDF model remains an open question in the SW research community. Even though the RDF conceptual model is designed as a graph, it differs in a number of ways from graphs as defined in graph theory. RDF graphs are heterogeneous multigraphs. Moreover, an edge in the ABox can be a node in the TBox (describing the property hierarchy, for example). Current research efforts to represent the RDF model as graphs, based on bipartite graph models [36], hypergraph models [55,63,100] or metagraph models [16], target different goals, ranging from storing and querying RDF graphs, to reducing space and time complexity, to solving the reification and provenance problems. Unfortunately, these goals do not coincide with RDFS reasoning. Moreover, these approaches use complex graph models that are not suitable for neural network input.

This paper describes a layered graph model that uses simple directed graphs to represent RDF graphs and their inference graphs according to the RDFS rules. It is important to note that the mapping from RDF to the proposed layered model is irreversible, meaning that the reconstruction of the original RDF graph is not guaranteed. Thus, the layered graph model is not suitable for storing and querying RDF data.

4.1. Notations and definitions

In Appendix B, the premises of RDFS rules were classified into ABox patterns and TBox patterns.

Definition 4.
A TBox rule is a rule whose premises are all TBox patterns.

The TBox rules in RDFS are:
RDFS5: the subPropertyOf transitivity rule

RDFS6: the subPropertyOf reflexivity rule

RDFS11: the subClassOf transitivity rule

RDFS10: the subClassOf reflexivity rule
As these rules’ patterns are present at the ontological level and there is only one ontology per training set, there are not enough samples to learn these rules. Thus, it is assumed that a materialized version of the ontology, in which the TBox rules are already applied, is available. This materialized version is inferred only once and is part of the training input. Let:

O: be the materialized ontology.

P: be the set of properties in O.

$P^{+} = P \cup \{\text{rdf:type}\}$

$n_p$: be the size of the set $P^{+}$

$(p_{1}, p_{2}, \dots, p_{n_p})$: be a tuple of the elements of $P^{+}$ (it is crucial to maintain the same order of elements in this tuple throughout the training process)

SubjObj(T): be the set of subject and object resources of the RDF graph T (formally defined in Appendix F).
Definition 5.
A layered directed graph is a graph that has multiple sets of directed edges, where each layer has its own set of edges.

An n-layered directed graph is a layered directed graph of n layers. More formally, an n-layered directed graph is defined as:

$G(V, (E_{1}, E_{2}, \dots, E_{n}))$, where the edge component is a tuple containing n sets of directed edges.
Definition 6.
Layered directed graph for RDF:

An RDF graph T is represented by a layered directed graph:

$G(\mathrm{SubjObj}(T), (E_{1}, E_{2}, \dots, E_{n_p}))$, where: $(e_{i}, e_{j}) \in E_{l} \iff \begin{cases} (e_{i}, p_{l}, e_{j}) \in T \\ e_{i} \in \mathrm{SubjObj}(T) \\ e_{j} \in \mathrm{SubjObj}(T) \end{cases}$
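Definition 6 can be sketched in Python as follows. The property ordering and the resource names are hypothetical, and SubjObj is simplified to all subjects and objects of T (Appendix F gives the formal definition).

```python
def subj_obj(graph):
    """Simplified SubjObj(T): every resource occurring as a subject or object."""
    return {r for s, _, o in graph for r in (s, o)}


def layered_graph(graph, properties):
    """Definition 6: (SubjObj(T), (E_1, ..., E_np)) with one edge set per property.

    `properties` is the fixed tuple (p_1, ..., p_np); its order must not change
    between graphs, otherwise layer l would not always mean property p_l.
    """
    vertices = subj_obj(graph)
    layers = tuple(
        {(s, o) for s, p, o in graph if p == p_l and s in vertices and o in vertices}
        for p_l in properties
    )
    return vertices, layers


P_PLUS = ("rdf:type", "ub:advisor", "ub:takesCourse")  # hypothetical ordering of P+
T = {
    ("ex:GraduateStudent9", "rdf:type", "ub:GraduateStudent"),
    ("ex:GraduateStudent9", "ub:advisor", "ex:Professor1"),
    ("ex:GraduateStudent9", "ub:takesCourse", "ex:Course3"),
}
V, E = layered_graph(T, P_PLUS)
print(E[1])  # {('ex:GraduateStudent9', 'ex:Professor1')}
```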

It is important to note that the transformation of an RDF graph into its layered directed graph representation is not bijective, as two non-isomorphic RDF graphs can have the same layered directed graph representation.
Proof by construction.
Let T be an RDF graph and $L_{T}$ be its layered directed graph representation according to the ontology O and its tuple of properties $(p_{1}, p_{2}, \dots, p_{n_p})$. If $(s, p, o) \notin T$ and $p \notin (p_{1}, p_{2}, \dots, p_{n_p})$, then the RDF graph $T' = T \cup \{(s, p, o)\}$ is not isomorphic to T but has the same representation $L_{T}$. □
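The construction in the proof can be replayed on a toy graph. The sketch assumes that the added triple reuses resources s and o already present in T, so that the vertex set is unchanged; ex:hypotheticalProperty is a made-up property outside the tuple of properties.

```python
def representation(graph, properties):
    """Vertex set plus one edge set per property, in the fixed property order."""
    vertices = frozenset(r for s, _, o in graph for r in (s, o))
    layers = tuple(frozenset((s, o) for s, p, o in graph if p == p_l)
                   for p_l in properties)
    return vertices, layers


props = ("rdf:type", "ub:advisor")
T = {("ex:gs9", "rdf:type", "ub:GraduateStudent"),
     ("ex:gs9", "ub:advisor", "ex:prof1")}
# Add a triple whose property is not in `props`, reusing existing resources.
T_prime = T | {("ex:prof1", "ex:hypotheticalProperty", "ex:gs9")}
print(T != T_prime, representation(T, props) == representation(T_prime, props))
# True True
```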

However, this transformation guarantees that if two RDF graphs have the same layered directed graph representation, then their RDFS inference graphs according to the ontology O are isomorphic.

Appendix G lists two examples of layered RDF graphs: one for the graph description of the resource GraduateStudent9 and one for its corresponding inference graph.
5. RDF tensors and the graph words embedding

Fig. 2 depicts how the generic methodology of supervised machine learning is applied in this work, where the pair (input, target) is the input graph and its corresponding inference. In a nutshell, tensors representing the input graph g and its corresponding inference i are created and then used in the training phase. The algorithm outputs the tensor of the graph and its encoding dictionary, which is used in the decoding phase to regenerate the original graph. In addition to preparing the RDF graph for input into a neural network, the main goal of this phase is to capture the pattern similarities between graphs in such a way that “similar” graphs have similar tensors. An example of “similar” graphs is two graphs containing RDF descriptions of two resources of the same type (such as two Publications’ descriptions in the LUBM1 dataset).

Fig. 2. Encoding/decoding in training and inference phases.

5.1. Tensor creation

The goal of this phase is to create RDF tensors using the layered graph model for RDF. Each RDF graph is represented as a 3D adjacency matrix, where each layer is the adjacency matrix relative to one property (Fig. 3). An ID must be assigned to each resource in the RDF graph so that the graph can be represented as a 3D adjacency matrix. The process of assigning these IDs for the input graphs and their corresponding inference graphs must satisfy the following requirement:

It is mandatory that the encoding dictionary for a given graph g contains all the possible resources that might be used in its corresponding inference graph i.

Fig. 3. 3D adjacency matrix.

The proof for this requirement is detailed in Appendix H.

5.1.1. Simplified version

In the simplified version of the algorithm, two dictionaries are created: one for the subject and object IDs, which is split into a global and a local dictionary, and one for the property IDs. The global_resources_dictionary contains the subject and object resources that are used throughout the set G (essentially the RDFS classes in the ontology). The local_resources_dictionary is created incrementally during the encoding routine for each graph g in G. It holds the IDs of the resources that are not present in the global_resources_dictionary. The local_resources_dictionary is populated with an offset equal to the length of the global_resources_dictionary, that is, 57 in the case of LUBM1. The largest ID in the local_resources_dictionary for every graph in G is less than 80. This value is used to initialize the size of the 3D adjacency matrix. These dictionaries are then used in the encoding routine to transform a layered graph representation into an RDF tensor, and vice versa in the decoding routine. The details of the encoding/decoding algorithms are in Appendix I.
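The simplified encoding can be sketched as follows. Assumptions: the property dictionary is implied by the order of the layers, local IDs start at the offset len(global_dict) and are assigned in order of first appearance, and edges are visited in sorted order to keep the encoding deterministic. The resource names are hypothetical; Appendix I describes the actual algorithm.

```python
def encode(layers, global_dict, size):
    """Encode a layered graph (tuple of edge sets) as a size x size 0/1 matrix
    per layer, returning the 3D matrix and the graph's local dictionary."""
    local_dict = {}

    def resource_id(resource):
        # Global resources (ontology classes) keep their fixed IDs.
        if resource in global_dict:
            return global_dict[resource]
        # Any other resource gets the next local ID after the global ones.
        if resource not in local_dict:
            local_dict[resource] = len(global_dict) + len(local_dict)
        return local_dict[resource]

    tensor = [[[0] * size for _ in range(size)] for _ in layers]
    for layer_index, edges in enumerate(layers):
        for s, o in sorted(edges):
            tensor[layer_index][resource_id(s)][resource_id(o)] = 1
    return tensor, local_dict


global_dict = {"ub:GraduateStudent": 0, "ub:Course": 1}  # hypothetical global part
layers = ({("ex:gs9", "ub:GraduateStudent")},             # rdf:type layer
          {("ex:gs9", "ex:course3")})                      # ub:takesCourse layer
tensor, local_dict = encode(layers, global_dict, size=5)
print(local_dict)  # {'ex:gs9': 2, 'ex:course3': 3}
```

An ID larger than size - 1 raises an IndexError, which is why the matrix size is initialized from the largest local ID observed over the dataset.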

The previously stated goal of capturing the pattern similarities between graphs describing resources of the same type can be achieved by this simplified encoding technique when the cardinality of each property varies within a small range. For instance, in LUBM1, students take roughly the same number of courses, and a publication has between one and seven authors. To get the full list of these statistics, the following SPARQL query is run:

The inner query counts the number of objects per property per class and the outer query concatenates the possible values. Appendix M contains a sample of these statistics in LUBM1.
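The statistic computed by the query can be sketched in Python over an in-memory list of triples. This is an assumed equivalent for illustration, not the SPARQL query used in this work; the helper name is hypothetical and rdf:type itself is excluded from the counts.

```python
from collections import defaultdict


def cardinality_stats(triples):
    """Map (class, property) to the sorted distinct numbers of objects per subject."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        else:
            counts[(s, p)] += 1  # inner step: objects per property per subject
    stats = defaultdict(set)
    for (s, p), n in counts.items():
        for c in types.get(s, ()):
            stats[(c, p)].add(n)  # outer step: collect the possible values per class
    return {key: sorted(values) for key, values in stats.items()}


triples = [
    ("ex:pub1", "rdf:type", "ub:Publication"),
    ("ex:pub1", "ub:publicationAuthor", "ex:a"),
    ("ex:pub1", "ub:publicationAuthor", "ex:b"),
    ("ex:pub2", "rdf:type", "ub:Publication"),
    ("ex:pub2", "ub:publicationAuthor", "ex:a"),
]
print(cardinality_stats(triples))
# {('ub:Publication', 'ub:publicationAuthor'): [1, 2]}
```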

Alas, this is not the case in real-world knowledge graphs such as DBpedia, where even graphs describing resources of the same type differ widely. For example, the DBpedia graph describing Professor James Hendler [24] has 40 objects for the property RDFt, including owl:Thing, foaf:Person, dbo:Person, dul:Agent, dbo:Agent, dbo:Scientist, schema:Person, yago:Scholar110557854, etc. Out of these 40 objects, 12 are in the global_resources_dictionary because they are concepts in the DBpedia ontology, and the other 28 objects populate the local_resources_dictionary. In contrast, the DBpedia graph describing Professor Yoshua Bengio [25] has only 12 links for the property RDFt, and all of their objects are in the global_resources_dictionary. This implies that the RDFt layers in the 3D adjacency matrices of Professor Hendler’s and Professor Bengio’s graphs will be very different. In fact, all the subsequent layers will be very different as well. For instance, when encoding the layer of the property dbo:almaMater for Professor Hendler’s graph, the resources dbr:Brown_University, dbr:Southern_Methodist_University and dbr:Yale_University will have IDs 29, 30 and 31 respectively, as there are already 28 resources in the local_resources_dictionary. When encoding the same layer for Professor Bengio’s graph, the resource dbr:McGill_University will have ID 1, as the corresponding local_resources_dictionary is still empty. Consequently, these shifted IDs have a domino effect on the rest of the layers. To overcome this limitation, a more advanced tensor creation method was necessary to capture the patterns of real-world knowledge graphs.

5.1.2. Advanced version

The main idea of the advanced encoding/decoding technique is to create a local_resources_dictionary per layer instead of one local_resources_dictionary for the whole graph being encoded. While this may seem sufficient to overcome the limitation of the simplified encoding technique, it raises a few challenges in the encoding of the inference graphs, as well as in the decoding phase for both the input and the inference graphs. These challenges and the proposed solutions for the advanced tensor creation technique are detailed in Appendix J. When using the advanced encoding technique, the number of properties is actually the number of “active” properties, where active properties are the properties in the TBox that are used in the ABox. This reduces the size of the 3D adjacency matrices dramatically, especially in the case of the scientists dataset, where only a small subset of the DBpedia properties is used.
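The domino effect, and how per-layer dictionaries remove it, can be reproduced on a toy version of the two DBpedia graphs. The sketch captures only the main idea of the advanced technique (the additional challenges described in Appendix J are ignored), uses a hypothetical one-entry global dictionary, and keeps a single dbo:almaMater value per graph.

```python
def assign_ids(layers, global_dict, per_layer):
    """Return one {resource: id} map per layer.

    per_layer=False mimics the simplified version (one local dictionary shared
    by the whole graph); per_layer=True mimics the advanced version's main idea.
    """
    offset = len(global_dict)
    shared_local = {}
    id_maps = []
    for edges in layers:
        local = {} if per_layer else shared_local
        ids = {}
        for s, o in sorted(edges):
            for r in (s, o):
                if r in global_dict:
                    ids[r] = global_dict[r]
                else:
                    if r not in local:
                        local[r] = offset + len(local)
                    ids[r] = local[r]
        id_maps.append(ids)
    return id_maps


def layouts(layers, id_maps):
    """Layer layouts as sets of (row, column) positions set to 1."""
    return [frozenset((ids[s], ids[o]) for s, o in edges)
            for edges, ids in zip(layers, id_maps)]


GLOBAL = {"dbo:Scientist": 0}
hendler = ({("dbr:James_Hendler", "dbo:Scientist"),
            ("dbr:James_Hendler", "schema:Person"),
            ("dbr:James_Hendler", "yago:Scholar110557854")},  # RDFt layer
           {("dbr:James_Hendler", "dbr:Yale_University")})    # dbo:almaMater layer
bengio = ({("dbr:Yoshua_Bengio", "dbo:Scientist")},
          {("dbr:Yoshua_Bengio", "dbr:McGill_University")})

print(layouts(hendler, assign_ids(hendler, GLOBAL, False))[1])  # frozenset({(1, 4)})
print(layouts(bengio, assign_ids(bengio, GLOBAL, False))[1])    # frozenset({(1, 2)})
print(layouts(hendler, assign_ids(hendler, GLOBAL, True))[1])   # frozenset({(1, 2)})
```

With a shared dictionary, the extra RDFt objects of the first graph shift the IDs of the almaMater layer; with per-layer dictionaries, both almaMater layers map to the same layout.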

5.2. Graph words

At this stage, every RDF graph is represented as a 3D adjacency matrix of size: (number_of_properties, max_number_of_resources, max_number_of_resources) where each layer represents an adjacency matrix according to one property.

In theory, the maximum number of possible layer layouts in a dataset of size dataset_size is $\min\left(2^{\text{max\_nb\_resources}^{2}},\ \text{dataset\_size} \times \text{nb\_properties}\right)$, since each layer is a binary matrix of size $\text{max\_nb\_resources} \times \text{max\_nb\_resources}$ and the dataset contains dataset_size × nb_properties layers in total. When encoding an RDF graph from the LUBM1 dataset, which contains 17,189 RDF graphs, a 3D adjacency matrix of size $(18 \times 800 \times 800)$ is obtained, where 18 is the size of the active properties set in LUBM1 and 800 is the maximum number of resources per graph. The maximum number of layer layouts is therefore $\min\left(2^{800^{2}} \approx 10^{192{,}659},\ 18 \times 17{,}189\right) = 309{,}402$ possible layouts. However, the actual number of layouts when encoding LUBM1 is much smaller than this theoretical bound (131 and 490 for the sets G and I respectively). This observation is a good indication that the encoding algorithm has achieved one of its major goals: having similar encodings for “similar” graphs. Let ${Catalog}_{G}$ and ${Catalog}_{I}$ be the layer catalogs for the sets G and I respectively, where each layout is assigned an ID. The 3D adjacency matrix can now be represented as a sequence of layout IDs, as shown in Figure 4. The layouts in the catalogs are termed “graph words”, as the sequence (or phrase) of graph words represents a 3D adjacency matrix and thus an RDF graph. Representing an RDF graph as a sequence of graph words has two main advantages:

Fig. 4.

From a 3D adjacency matrix to a sentence of graph words.

Reducing the size of the encoded dataset: only the ID of the layer’s layout along with a catalog of layouts is saved.

Exploitation of the research results in neural machine translation.
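The bound and the catalog construction above can be sketched in a few lines of Python. This is an illustration under assumptions: layouts are keyed by their raw bytes, IDs are assigned in order of first appearance starting at 0, and the names `layout_bound` and `build_catalog` are hypothetical rather than the authors' code.

```python
import math
from typing import Dict, List, Tuple

import numpy as np


def layout_bound(max_nb_resources: int, dataset_size: int, nb_properties: int) -> int:
    """min(2^(n^2), dataset_size * nb_properties): number of distinct binary
    n x n layouts vs. total number of layers in the dataset."""
    return min(2 ** (max_nb_resources ** 2), dataset_size * nb_properties)


def build_catalog(tensors: List[np.ndarray]) -> Tuple[Dict[bytes, int], List[List[int]]]:
    """Assign an ID to every distinct layer layout and rewrite each
    (nb_properties, n, n) tensor as a sentence of layout IDs (graph words)."""
    catalog: Dict[bytes, int] = {}
    sentences: List[List[int]] = []
    for tensor in tensors:
        sentence = []
        for layer in tensor:
            key = layer.astype(np.uint8).tobytes()  # hashable form of the layout
            sentence.append(catalog.setdefault(key, len(catalog)))
        sentences.append(sentence)
    return catalog, sentences


print(layout_bound(800, 17_189, 18))          # LUBM1 figures from the text
print(math.floor(800 ** 2 * math.log10(2)))   # base-10 exponent of 2^(800^2)

a = np.zeros((3, 4, 4), dtype=bool)
a[0, 0, 1] = True              # property 0: resource 0 -> resource 1
b = a.copy()
b[1, 1, 2] = True              # property 1: resource 1 -> resource 2
catalog, sentences = build_catalog([a, b])
print(len(catalog), sentences)
```

The two toy graphs share two of their three graph words (the property-0 layout and the empty layout), which is the kind of reuse that keeps the LUBM1 catalogs far below the theoretical bound.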

6. Graph words translation for RDFS reasoning

At this stage, there is a parallel corpus of graph words for the input and inference graphs. This representation has two drawbacks: difficulty handling “unknown” graph words, and insensitivity to similarities between graph words. Unknown graph words are encountered when a graph word appears in the test set but was not seen during the training phase; when noise is induced, most of the graph words will be unknown. A common technique in Natural Language Processing (NLP) is to assign the same ID to all unknown words, which is not a significant deterrent to success in most learning tasks involving natural language. However, in our case, if the same ID is assigned to every unknown graph word, the learning process is compromised and cannot generate the exact inference. (Briefly, the proof by construction that using the same ID for every unknown graph word hinders learning the graph words translation consists of building two graphs that have the same input representation but different targets.)
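The proof-by-construction argument can be illustrated with a small Python sketch. The graph-word names are hypothetical; `best_achievable_accuracy` computes the best accuracy any deterministic model can reach when identical inputs map to different targets.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

UNK = -1  # single shared ID for every unseen graph word


def to_ids(sentence: Sequence[str], vocab: Dict[str, int], use_unk: bool) -> Tuple[int, ...]:
    """Map graph words to IDs; unseen words collapse to UNK or get fresh IDs."""
    ids = []
    for word in sentence:
        if word not in vocab and not use_unk:
            vocab[word] = len(vocab)  # distinct ID per unseen layout
        ids.append(vocab.get(word, UNK))
    return tuple(ids)


def best_achievable_accuracy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Upper bound for any deterministic model x -> y: for each distinct
    input, at most its most frequent target can be predicted correctly."""
    by_input: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in pairs:
        by_input[x][y] += 1
    return sum(c.most_common(1)[0][1] for c in by_input.values()) / len(pairs)


train_vocab = {"W_student_type": 0, "W_takes_course": 1}   # hypothetical graph words
g1 = (["W_student_type", "W_unseen_A"], "I_person")
g2 = (["W_student_type", "W_unseen_B"], "I_organization")

with_unk = [(to_ids(s, dict(train_vocab), True), y) for s, y in (g1, g2)]
vocab = dict(train_vocab)
distinct = [(to_ids(s, vocab, False), y) for s, y in (g1, g2)]
print(best_achievable_accuracy(with_unk), best_achievable_accuracy(distinct))
```

With a shared UNK the two inputs collide and at most one of the two targets can be produced. Fresh IDs remove the collision but carry no learned meaning, which is why embedding the layouts themselves is attractive.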

Fig. 5.

Graph words translation model for LUBM1.

By encoding an RDF graph as a sequence of layers where each layer contains an adjacency matrix of a directed graph according to one property, homogeneous graph embedding algorithms can be used on each layer. The High-Order Proximity preserved Embedding (HOPE) algorithm [70] was used as it had the best reconstruction accuracy when tested on the catalog of graph words. The graph words embedding also solves the problem of capturing the similarities between graph words.
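A minimal NumPy sketch of HOPE-style embedding for one layer follows. It assumes the Katz proximity $S = (I - \beta A)^{-1} \beta A$ with $\beta = 0.01$, a dense SVD, and the common convention of concatenating source vectors $U\sqrt{\Sigma}$ and target vectors $V\sqrt{\Sigma}$ (each of dimension dim/2). The text does not state which HOPE variant, library or $\beta$ was used, so all of these are assumptions.

```python
import numpy as np


def hope_embedding(adj: np.ndarray, dim: int, beta: float = 0.01) -> np.ndarray:
    """HOPE-style embedding of one directed adjacency matrix (sketch).

    Katz proximity S = (I - beta*A)^-1 (beta*A) is factorized by SVD; the first
    dim//2 singular triplets give source vectors U*sqrt(s) and target vectors
    V*sqrt(s), which are concatenated into an (n, dim) matrix.
    """
    n = adj.shape[0]
    k = dim // 2
    # solve() computes (I - beta*A)^-1 (beta*A) without forming the inverse
    s_katz = np.linalg.solve(np.eye(n) - beta * adj, beta * adj)
    u, sigma, vt = np.linalg.svd(s_katz)
    root = np.sqrt(sigma[:k])
    source = u[:, :k] * root
    target = vt[:k].T * root
    return np.hstack([source, target])


adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:   # a directed chain, e.g. a subClassOf path
    adj[i, j] = 1.0

emb = hope_embedding(adj, dim=8)          # k = 4 = n, so reconstruction is exact
src, tgt = emb[:, :4], emb[:, 4:]
katz = np.linalg.solve(np.eye(4) - 0.01 * adj, 0.01 * adj)
print(np.allclose(src @ tgt.T, katz))     # asymmetric proximity is recovered

# One 800 x 800 layer with embedding dimension 4, flattened: 800 * 4 values
print(hope_embedding(np.zeros((800, 800)), dim=4).reshape(-1).shape)
```

Keeping separate source and target vectors is what lets the embedding represent asymmetric relations such as rdfs:subClassOf chains, since `src @ tgt.T` need not be symmetric.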

By representing the RDF graph input as well as its corresponding inference graph as two sequences of graph words, RDFS inference becomes equivalent to the translation of graph words. Thus, Neural Machine Translation (NMT) models can be applied to learn RDFS inference generation. The typical NMT architecture [17] is a Recurrent Neural Network (RNN) Encoder-Decoder, where the encoder RNN transforms the sequence of words from the input sentence into a fixed-length hidden representation and the decoder RNN generates the target sentence from that hidden representation. More recent architectures that use convolutional networks for NMT, such as [29], outperformed RNN-based architectures in terms of accuracy and training speed.

For designing the graph words translation model, we used Keras [18] with the TensorFlow [1] backend. It is essentially a sequence-to-sequence model [93] with a Bidirectional Recurrent Neural Network (BRNN) [85] encoder. The overall architecture of the model is depicted in Figure 5. The input layer consists of a tensor of shape $(18 \times 3200)$, where 18 is the length of the graph words sequence, i.e. the size of the active properties set in LUBM1. Each graph word represents a layout for an adjacency matrix of size $(800 \times 800)$. When embedding the adjacency matrices using the HOPE embedding technique, we chose an embedding dimension of 4; hence the second dimension of the tensor is $3{,}200 = 800 \times 4$. The second layer, graph_input_dense, is a densely-connected layer that transforms each graph word embedding into a vector of size 256. The gru_sequence_encoder is a Gated Recurrent Unit (GRU) [19] that transforms the sequence into a hidden representation of size 128. The bidirectional layer feeds the sequence to the GRU in both the positive and negative time directions, hence the size 256. The sequence_decoder layer decodes the hidden representation into a sequence of length 18. The softmax_layer is a densely-connected layer with softmax activation; its output is of size 490, which is the size of the inference graph words catalog. The TimeDistributed layer applies the softmax_layer to each of the 18 elements of the previous layer's output sequence to output a sequence of 18 graph words. The dropout layers have a dropout rate of 0.2 and are introduced in the model architecture to prevent overfitting and improve generalization.

For the training phase, we used the Adam [48] optimizer, with a learning rate of 0.001, a first moment decay rate of 0.9 and a second moment decay rate of 0.999, and categorical cross-entropy as the loss.
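The architecture and training settings described in this section can be approximated in Keras roughly as follows. This is a sketch under assumptions: Figure 5 is not reproduced here, so the decoder design (repeating the 256-d encoder summary 18 times and unrolling a 256-unit GRU over it), the number and placement of the dropout layers, and the layer names are guesses. The sizes 18, 3,200, 256, 128 and 490, the dropout rate, and the Adam and loss settings come from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN = 18          # active properties in LUBM1 = graph words per sentence
EMB_WIDTH = 800 * 4   # 800 resources x HOPE embedding dimension 4
NB_OUT_WORDS = 490    # size of the inference graph words catalog
DROPOUT = 0.2


def build_translation_model(decoder_units: int = 256) -> tf.keras.Model:
    """Seq2seq sketch: BiGRU encoder, repeated summary, GRU decoder."""
    inputs = layers.Input(shape=(SEQ_LEN, EMB_WIDTH), name="graph_input")
    # Dense on a 3D tensor acts on the last axis, i.e. on each graph word
    x = layers.Dense(256, name="graph_input_dense")(inputs)
    x = layers.Dropout(DROPOUT)(x)
    # 128 GRU units per direction -> 256-d summary of the input sentence
    h = layers.Bidirectional(layers.GRU(128, name="gru_sequence_encoder"))(x)
    h = layers.Dropout(DROPOUT)(h)
    # Feed the summary to the decoder at each of the 18 output positions
    d = layers.RepeatVector(SEQ_LEN)(h)
    d = layers.GRU(decoder_units, return_sequences=True, name="sequence_decoder")(d)
    d = layers.Dropout(DROPOUT)(d)
    # Softmax over the 490 inference graph words, applied at every position
    outputs = layers.TimeDistributed(
        layers.Dense(NB_OUT_WORDS, activation="softmax", name="softmax_layer"))(d)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_translation_model()
print(tuple(model.output.shape))
```

Training targets would be one-hot encoded graph-word IDs of shape (batch, 18, 490), which is what categorical cross-entropy expects.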

7. Evaluation

7.1. Hardware setup

The training was done on a server with four NVIDIA Tesla K40m Graphics Processing Units (GPUs). Each GPU has 2880 Compute Unified Device Architecture (CUDA) cores and 12 GB of memory. The models were trained using all the GPUs in parallel.

7.2. Data analysis

In this section, we perform a statistical analysis of the training data for both LUBM1 and the Scientists dataset. This analysis is based on the distribution of relations across the training input and inference graphs. The motive behind this analysis is to gain insight into the performance of type prediction approaches on these datasets in comparison with our proposed approach.

7.2.1. LUBM1 data analysis

When we consider all the triples in the inference of LUBM1 across all the graphs, there are 130,377 triples with the property rdf:type, which constitute 94.17% of the total number of generated triples. The remaining two properties in the inference, ub:degreeFrom and ub:memberOf, materialize in 6,988 triples (5.05%) and 1,080 triples (0.78%) respectively. However, when we consider the distribution of these two properties among the 17,189 graphs in the set G, it becomes apparent that they are spread across a larger portion of the graphs: 19.84% of the inference graphs contain the property ub:degreeFrom and/or ub:memberOf, while the remaining 80.16% of the graphs contain only triples with the property rdf:type. The Venn diagram in Figure 6 illustrates the distribution of properties in LUBM1 inference. This distribution is due to the frequency of the properties in the inferred graphs. As shown in Figure 7, every graph in LUBM1 inference contains at least 3 triples with the property rdf:type, and more than 50% of the graphs contain between 5 and 97 triples with this property. On the other hand, 50% of the graphs containing the property ub:degreeFrom have only one triple with this property, and 75% of the graphs containing the property ub:memberOf have only one triple with this property. Figure 8 illustrates the frequency of each property in the input and inference graphs of LUBM1, as well as the entailment rules that generated the triples containing these properties.

Fig. 6.

The distribution of properties in LUBM1 inference.

Fig. 7.

Frequency of the properties in LUBM1 inference.

Given these statistics about the frequency of properties in LUBM1 inference, it can be concluded that the upper bound of per-graph accuracy for type prediction systems is around 80%: even a perfect type predictor can reach a per-graph accuracy of at most 80%. This is because approximately 20% of the inference graphs contain at least one property that is not rdf:type, which is outside the scope of type prediction systems. This percentage is dramatically higher in the Scientists dataset, as presented in the following section.
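Both statistics used in this analysis, the per-triple share of each property and the per-graph upper bound for type predictors, can be computed with the Python sketch below. The per-triple counts are the LUBM1 figures given above; the four toy inference graphs are hypothetical and only illustrate the per-graph computation.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def property_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of inferred triples carried by each property."""
    total = sum(counts.values())
    return {p: 100 * c / total for p, c in counts.items()}


def type_prediction_upper_bound(inference_graphs: Iterable[Set[Triple]]) -> float:
    """Percentage of inference graphs made only of rdf:type triples, i.e. the
    best per-graph accuracy that a pure type predictor could reach."""
    graphs = list(inference_graphs)
    type_only = sum(all(p == RDF_TYPE for _, p, _ in g) for g in graphs)
    return 100 * type_only / len(graphs)


lubm1 = property_shares({"rdf:type": 130_377, "ub:degreeFrom": 6_988, "ub:memberOf": 1_080})
print({p: round(v, 2) for p, v in lubm1.items()})

toy = [  # hypothetical inference graphs
    {("s1", RDF_TYPE, "ub:Person")},
    {("s2", RDF_TYPE, "ub:Person"), ("s2", RDF_TYPE, "ub:Student")},
    {("s3", RDF_TYPE, "ub:Person"), ("s3", "ub:degreeFrom", "u0")},
    {("s4", RDF_TYPE, "ub:Organization")},
]
print(type_prediction_upper_bound(toy))
```

The two measures diverge because a single non-type triple is enough to put a whole graph out of reach of a type predictor, even when rdf:type dominates the triple counts.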

Fig. 8.

Distribution of properties in input and inference graphs in LUBM.

7.2.2. Scientists dataset analysis

The Scientists dataset contains more than fifty thousand graphs. When considering all triples in the inference of these graphs, there are 3,238,260 triples with the property rdf:type, which constitute 83.26% of the total. The remaining 651,028 triples (16.74%) contain one of the 33 properties of the inference (Figure 9). Similarly to LUBM1, triples with rdf:type are much more frequent across the inference graphs than triples with the other properties. For instance, more than 75% of the inference graphs contain between 43 and 491 triples with the property rdf:type, while 75% of the inference graphs contain fewer than 16 triples in total with the remaining properties. However, unlike LUBM1, the remaining properties are dispersed across far more of the inference graphs: 96.74% of the inference graphs contain at least one triple with a property that is not rdf:type. This analysis sets the upper bound for type prediction systems at 3.26% per-graph accuracy on the Scientists dataset.

7.3. Baseline experiments

As discussed previously in Section 2.4, existing KG embedding techniques that were designed for data-driven Knowledge Graph Completion (KGC) are not suitable for RDFS reasoning, for the following reasons:

For learning RDFS reasoning, the whole-graph embedding is required rather than node/edge embeddings.

The closure computation requires checking the validity of every possible triple using the scoring function.

More importantly, any relation that is seen only in the inference (for instance generated by the RDFS7 rule [37]) cannot be learned from the input graph.

On the other hand, these techniques are better suited to learning from the whole KG at once, without the need to partition the KG into subgraphs containing resource descriptions as in our approach.

In order to set the baseline and provide empirical evidence for the previous claims about using KG embedding techniques for RDFS reasoning, we ran the following experiments: the embedding of LUBM1 (100,867 triples containing 26,454 resources and 18 properties) was computed using 3 embedding techniques, namely TransH [98], ComplEx [94] and HolE [65], with the OpenKE toolkit [101]. OpenKE also provides a binary classifier to check the validity of triples. In order to generate the full materialization of LUBM1 using these techniques, the set of all possible triples, containing $26{,}454^{2} \times 18 \approx 12.6$ billion triples, would need to be generated, and the classifier would then be used to check the validity of each possible triple. To make the experiment more manageable, we instead generated 100,000 random negative triples that are neither part of the input graph nor of the inference graph, and used the classifier to check the validity of these random triples as well as of the LUBM1 inference (31,612 triples). The results of these experiments are presented in Figure 10 (the higher the better). Since our approach is generative, producing the inference graph from the input graph rather than classifying triples as valid or invalid, it assumes the validity of the input graph and is thus omitted from Figure 10(a). The three embedding techniques were able to validate the triples in the input training graph (Figure 10(a)) and to invalidate the negative triples (Figure 10(c)), but were not able to validate most of the triples in the inference graph (Figure 10(b)). HolE performs best among the three KG embedding techniques at 48.51%, but did not improve with more training epochs (vs 97.7% for our approach on the test set).
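The evaluation protocol of this baseline can be sketched in plain Python. OpenKE's classifier is not reproduced here: `is_valid` is a hypothetical stand-in for any trained triple classifier, and the three-entity graph is a toy example. Only the LUBM1 sizes (26,454 resources, 18 properties) come from the text.

```python
import random
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


def candidate_space_size(nb_entities: int, nb_relations: int) -> int:
    """Number of (subject, predicate, object) triples a classifier would
    have to score to materialize the closure by brute force."""
    return nb_entities ** 2 * nb_relations


def sample_negatives(entities: List[str], relations: List[str], known: Set[Triple],
                     k: int, seed: int = 0) -> Set[Triple]:
    """Draw k distinct random triples outside `known` (input + inference)."""
    capacity = candidate_space_size(len(entities), len(relations)) - len(known)
    if k > capacity:
        raise ValueError("not enough candidate triples outside `known`")
    rng = random.Random(seed)
    negatives: Set[Triple] = set()
    while len(negatives) < k:
        t = (rng.choice(entities), rng.choice(relations), rng.choice(entities))
        if t not in known:
            negatives.add(t)
    return negatives


def validation_rate(triples: Set[Triple], is_valid: Callable[[Triple], bool]) -> float:
    """Percentage of triples that the classifier accepts as valid."""
    return 100 * sum(is_valid(t) for t in triples) / len(triples)


print(candidate_space_size(26_454, 18))   # brute-force candidate space for LUBM1

entities, relations = ["a", "b", "c"], ["p"]
known = {("a", "p", "b"), ("b", "p", "c")}
negatives = sample_negatives(entities, relations, known, k=5)
toy_classifier = lambda t: t in known      # hypothetical stand-in for a trained model
print(validation_rate(negatives, toy_classifier), validation_rate(known, toy_classifier))
```

Scoring a random sample of negatives instead of all 12.6 billion candidates keeps the experiment tractable while still exposing a classifier that cannot tell inferred triples from random ones.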

7.4. Evaluation on intact LUBM1

Figure 11 shows the training process on the LUBM1 dataset. After approximately 12 minutes of training, $98.8 %$ training accuracy was achieved. When testing the trained model on the intact LUBM1 test set, an overall per-graph accuracy of $97.7 %$ was obtained.

Fig. 9.

Frequency of the properties in the scientists dataset inference.

Fig. 10.

RDFS inference with KG embedding techniques?

Table 4 presents the overall per-triple precision and recall, as well as the breakdown of these metrics for each property in the inference. Precision and recall are much higher for the properties rdf:type and ub:degreeFrom than for ub:memberOf. This can be explained by the fact that the property ub:memberOf is much less frequent in the LUBM1 training set, which contains only one university.
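The per-property and overall metrics in Table 4 follow the standard definitions $P = TP/(TP+FP)$ and $R = TP/(TP+FN)$. The short Python check below recomputes them from the table's counts. Its Overall row sums the counts before dividing (micro-averaging), which reproduces the reported 95.98% and 94.79%.

```python
from typing import Dict, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Per-triple precision and recall, in percent."""
    return 100 * tp / (tp + fp), 100 * tp / (tp + fn)


# (true positives, false positives, false negatives) from Table 4
TABLE_4: Dict[str, Tuple[int, int, int]] = {
    "rdf:type": (13_642, 602, 725),
    "ub:degreeFrom": (1_392, 0, 3),
    "ub:memberOf": (110, 32, 104),
}
# Overall row: sum the counts first (micro-averaging), then compute the ratios
overall = tuple(sum(col) for col in zip(*TABLE_4.values()))

for prop, counts in {**TABLE_4, "Overall": overall}.items():
    p, r = precision_recall(*counts)
    print(f"{prop:14s} precision={p:6.2f}%  recall={r:6.2f}%")
```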

7.5. Evaluation on noisy LUBM1 data

In this experiment, the trained model was tested on the noisy datasets created as described in Section 3. Two metrics were designed:

Macroscopic metric: per-graph accuracy. Inferences in this metric (depicted in Figure 13) are scored correct when $dr$ and $i$ are isomorphic, in other words, when the deep reasoner inference from the corrupted graph is isomorphic to the Jena inference from the intact graph.

Microscopic metric: per-triple precision/recall. The previous metric overlooks the fact that some triples generated by the deep reasoner but not by Jena were in fact valid. In this metric, three materialization graphs are generated:

Jena materialization from the intact graphs (J)

The deep reasoner materialization from the corrupted graphs ($DR$)

An OWL-RL [64] materialization of LUBM1 to check the validity of the false positives from the deep reasoner.

Let V be the set of valid false positive triples. The quasi-confusion metric is computed as shown in Figure 14. The macro and micro evaluation on the 5 noisy datasets (Fig. 12) shows exceptional noise-tolerance compared to rule-based reasoners: 99% for the deep reasoner vs 0% (by design) for Jena in the $UGS_{100}$ dataset.
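Both metrics can be sketched over sets of triples in Python. Two assumptions apply: the triples are ground (no blank nodes), so graph isomorphism reduces to set equality; and the sketch only computes the cells TP, FP, V and FN, since the way Figure 14 combines them into the refined precision and recall is not reproduced here. The toy graphs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def per_graph_accuracy(dr_graphs: List[Set[Triple]], jena_graphs: List[Set[Triple]]) -> float:
    """Percentage of graphs where the deep reasoner output equals Jena's.
    For ground triples (no blank nodes), isomorphism is plain set equality."""
    same = sum(dr == j for dr, j in zip(dr_graphs, jena_graphs))
    return 100 * same / len(jena_graphs)


def confusion_cells(dr: Set[Triple], jena: Set[Triple], owl_rl: Set[Triple]) -> Dict[str, int]:
    """Cells of a refined confusion matrix for one graph or a whole dataset."""
    false_pos = dr - jena
    return {
        "TP": len(dr & jena),
        "FP": len(false_pos),
        "V": len(false_pos & owl_rl),   # false positives confirmed by OWL-RL
        "FN": len(jena - dr),
    }


J = {("x", "rdf:type", "ub:Student"), ("x", "rdf:type", "ub:Person")}
DR = {("x", "rdf:type", "ub:Student"), ("x", "rdf:type", "ub:Agent")}
OWL_RL = J | {("x", "rdf:type", "ub:Agent")}   # hypothetical richer closure

print(per_graph_accuracy([DR, J], [J, J]))
print(confusion_cells(DR, J, OWL_RL))
```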

It is counter-intuitive that our approach performs better on propagable noise such as UGS than on non-propagable noise such as GCC. This can be explained by the fact that the propagable noise case is very unlikely, so there are no “similar” graphs in the training set. For instance, in UGS a ub:GraduateStudent is corrupted to be of type ub:University, which makes the university take a course in the input graph. Being more “similar” to students’ graphs, it is inferred to be of type ub:Person rather than ub:Organization. This is not the case for non-propagable noise such as GCC, where a ub:Course is corrupted to be of type ub:GraduateCourse, which is likely, i.e. “similar” graphs can exist in the training set.

7.6. Evaluation on the scientists dataset

The model used for the Scientists dataset is similar to the LUBM1 model, except for the hyper-parameters. Training to a validation accuracy of 87.76% takes over 16 hours (Figure 15). The person-place examples were used for the noise-tolerance evaluation; out of the 1,761 noisy person-place examples in DBpedia, the Scientists dataset contains 94. Unlike the LUBM1 case, where training was done on intact data and testing on controlled noisy data, the Scientists training data was noisy. When an input graph has a resource of type dbo:Person, Jena infers that it is also of type dbo:Agent, since dbo:Person is a subclass of dbo:Agent. For the person-place graphs, this constitutes noise propagation, because dbo:Agent and dbo:Place are disjoint classes. To evaluate noise-tolerance in the deep reasoner, a test is run to check whether it inferred that a person-place is of type dbo:Agent. Of the 94 examples, only 6 inferences contain this noisy inference. However, some of the remaining 88 inferences either had false positives or missed valid triples inferred by Jena. 38 inference graphs were perfect, containing exactly the inference from Jena minus the noisy triple; examples include dbr:Socialist_Republic_of_Croatia, dbr:Teylers_Museum and dbr:Meta_River. These make up 40% of the noisy examples. For the remaining person-place inferences, a few contain “false positive” triples not generated by Jena. For example, the deep reasoner inference from the dbr:Big_Ben graph missed these two triples compared to Jena:

(the first should be missed), and generated the following extra triple:

Fig. 11.

Training results on intact LUBM1 data.

Table 4

Per-property precision and recall on LUBM1 test set

Property	True Positives	False Positives	False Negatives	Precision	Recall
rdf:type	13,642	602	725	95.77%	94.95%
ub:degreeFrom	1,392	0	3	100%	99.78%
ub:memberOf	110	32	104	77.46%	51.4%
Overall	15,144	634	832	95.98%	94.79%

It should be noted that this information is neither explicit (i.e. embedded in the DBpedia graph of the resource dbr:Big_Ben) nor implicit (i.e. inferable). It is therefore counted as a false positive even though it “makes sense”. The deep reasoner inferred this information by capturing the generalization that resources with links similar to those of dbr:Big_Ben are usually of type dbo:HistoricPlace.
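The person-place noise propagation discussed in this section can be reproduced with a tiny rule-based sketch in Python. It applies the RDFS rules rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance) to a fixpoint, then uses an OWL-style disjointness check, which lies outside RDFS and serves only to expose the clash. The resource `ex:PersonPlace` and the explicit `owl:disjointWith` triple are hypothetical toy data.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS, DISJOINT = "rdf:type", "rdfs:subClassOf", "owl:disjointWith"


def type_closure(triples: Set[Triple]) -> Set[Triple]:
    """Fixpoint of rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance)."""
    closure = set(triples)
    while True:
        sub = {(a, b) for a, p, b in closure if p == SUBCLASS}
        new = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
        new |= {(s, TYPE, b) for s, p, o in closure if p == TYPE
                for a, b in sub if o == a}
        if new <= closure:
            return closure
        closure |= new


def disjointness_clashes(closure: Set[Triple]) -> Set[Tuple[str, str, str]]:
    """(resource, class_a, class_b) for resources typed with two disjoint classes."""
    pairs = {(a, b) for a, p, b in closure if p == DISJOINT}
    types: Dict[str, Set[str]] = {}
    for s, p, o in closure:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return {(s, a, b) for s, ts in types.items() for a, b in pairs
            if a in ts and b in ts}


graph = {
    ("ex:PersonPlace", TYPE, "dbo:Person"),   # hypothetical noisy resource
    ("ex:PersonPlace", TYPE, "dbo:Place"),
    ("dbo:Person", SUBCLASS, "dbo:Agent"),
    ("dbo:Agent", DISJOINT, "dbo:Place"),
}
closure = type_closure(graph)
print(("ex:PersonPlace", TYPE, "dbo:Agent") in closure)
print(disjointness_clashes(closure))
```

A sound and complete rule-based reasoner must derive the dbo:Agent triple once the noisy dbo:Person assertion is present, which is exactly the inference the deep reasoner avoided in most person-place examples.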

Fig. 12.

Macro and micro evaluation on noisy LUBM1 datasets.

Fig. 13.

Macroscopic metric: per-graph accuracy.

Fig. 14.

Refined confusion matrix.

Fig. 15.

Training results on DBpedia scientists dataset.

8. Related work

In this section we review the state of the art in terms of:

Handling noise in SW data

Graph embedding (KG embedding in particular)

Approximate semantic reasoning

Deep learning for semantic reasoning

8.1. Handling noise in Semantic Web data

The strategies for handling noise in SW data can either be active or adaptive:

8.1.1. Active noise handling

Most of the work in this category focuses on detecting and fixing noisy data in the LOD. LOD can be created from structured, semi-structured or unstructured data. DBpedia [3], for example, is created from semi-structured Wikipedia articles. Unstructured text can also feed NEL tools to create LOD. These two methodologies are more likely to generate noisy triples, for instance due to the imperfect accuracy of NEL tools.

In [75], the authors describe two algorithms that they designed to improve the quality of LOD. The SDType algorithm falls into the category of adaptive noise handling and will be described in the corresponding section. SDValidate identifies wrong triples when there is a large deviation between the resource types. The main idea of this algorithm is to assign a relative predicate frequency – describing the frequency of predicate/object combinations – for every statement. Probability distributions are then used to decide if a statement with low relative predicate frequency should be considered erroneous. Both algorithms are validated on DBpedia and Never-Ending Language Learning (NELL) [14] knowledge bases.

In [104], the authors focus on detecting noisy type assertions. They built a few synthetic noisy datasets based on LUBM. Then a multi-class classifier is trained to learn disjoint classes.

In [27,99], the focus is on incorrect numerical data in LOD datasets. [27] uses a two-phase detection approach. In the first phase, outliers of numerical values are detected for every property, and in the second phase, the owl:sameAs property is used to confirm or reject the outliers. [99] uses a few unsupervised learning techniques, including Kernel Density Estimation (KDE) [73] combined with semantic grouping, to identify the outliers.
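As a generic illustration of density-based detection of numerical outliers for a single property, the following NumPy sketch flags values whose leave-one-out Gaussian KDE density is far below the sample's median density. This is not the method of [27] or [99]: the robust Silverman-style bandwidth, the 0.1 relative threshold and the toy values are our assumptions, and the semantic grouping and owl:sameAs confirmation steps are not modeled.

```python
import numpy as np


def kde_outliers(values, rel_threshold: float = 0.1) -> np.ndarray:
    """Flag values whose leave-one-out Gaussian KDE density is far below
    the median density of the sample."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mad = np.median(np.abs(x - np.median(x)))
    sigma = 1.4826 * mad or np.std(x) or 1.0      # robust scale, with fallbacks
    h = 1.06 * sigma * n ** (-1 / 5)              # Silverman-style bandwidth
    z = (x[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(kernel, 0.0)                 # leave each point out of its own estimate
    density = kernel.sum(axis=1) / ((n - 1) * h)
    return density < rel_threshold * np.median(density)


heights_cm = [172, 180, 165, 178, 169, 17_500]    # hypothetical values, one corrupted
print(kde_outliers(heights_cm).tolist())
```

The robust scale matters here: a plain standard deviation would be inflated by the corrupted value itself and widen the kernels enough to hide it.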

8.1.2. Adaptive noise handling

Given the unrealistic expectation of cleansing every type of noise in SW data, adaptive noise handling approaches focus instead on building techniques that are noise-tolerant. In the SDType algorithm [74,75], the rdf:type inference uses information from the ABox rather than ontological descriptions from the TBox. For instance, instead of using the rdfs:domain and rdfs:range of the properties to infer the resources’ types, which would propagate noise, a weighted voting heuristic is used to determine the types of the resources. The weights are generated from the statistical distribution between predicates and types. For example, given that the property dbo:location is mostly connected to objects of type dbo:Place, this property will have a high weight for inferring the type dbo:Place.
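A minimal Python sketch of SDType-style weighted voting is shown below. It follows the general idea described above as we understand it: each property votes with the type distribution $P(t \mid p)$ of its typed subjects, weighted by how far that distribution deviates from the prior $P(t)$. The exact weight formula, the restriction to outgoing properties (the object side would be handled symmetrically), and the toy data are assumptions, not the published implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def sdtype_scores(triples: List[Triple], types: Dict[str, Set[str]],
                  resource: str) -> Dict[str, float]:
    """Weighted type voting from the outgoing properties of `resource`."""
    typed = [r for r, ts in types.items() if ts]
    prior_counts = Counter(t for r in typed for t in types[r])
    prior = {t: c / len(typed) for t, c in prior_counts.items()}

    # P(t | p): type distribution among the typed subjects of each property p
    subjects: Dict[str, List[str]] = defaultdict(list)
    for s, p, _ in triples:
        if types.get(s):
            subjects[p].append(s)
    cond = {p: {t: c / len(subs) for t, c in Counter(t for s in subs for t in types[s]).items()}
            for p, subs in subjects.items()}

    # Weight of p: squared deviation of P(t | p) from the prior P(t)
    weight = {p: sum((prior[t] - dist.get(t, 0.0)) ** 2 for t in prior)
              for p, dist in cond.items()}

    props = [p for s, p, _ in triples if s == resource and p in cond]
    total = sum(weight[p] for p in props)
    if total == 0:
        return {}
    return {t: sum(weight[p] * cond[p].get(t, 0.0) for p in props) / total for t in prior}


types = {"a": {"dbo:Place"}, "b": {"dbo:Place"}, "c": {"dbo:Person"}, "d": {"dbo:Person"}}
triples = [
    ("a", "dbo:country", "x"), ("b", "dbo:country", "y"),   # places have a country
    ("c", "foaf:name", "n1"), ("a", "foaf:name", "n2"),      # names are uninformative
    ("z", "dbo:country", "x"), ("z", "foaf:name", "n3"),     # untyped resource z
]
print(sdtype_scores(triples, types, "z"))
```

Because foaf:name is used equally by places and persons, its distribution matches the prior and its weight is 0, so the vote for z is driven entirely by dbo:country.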

To the best of our knowledge, most of the previous work in the literature about reasoning with noisy SW data focuses on type inference. This research is the first to aim at full RDFS reasoning with noise-tolerance capability.

8.2. Graph embedding

This review is partially based on three recent surveys of graph embedding techniques and their applications [12,31,97]. We update the latter survey by including the work on RDF graph embedding. The authors of [31] also provide an open source Python library (Graph Embedding Methods) for graph embedding comparison that we used to compare the discussed embedding techniques on RDF graphs.

There is no need to stress the omnipresence of graph-based representations for research problems and real-world applications, ranging from social network analysis to recommendation systems, protein interaction networks and knowledge graphs, SW graphs in particular. This omnipresence can be considered the main motive for graph analytics research. Graph analytics tasks include centrality analysis, node classification [6], link prediction [52], etc. The latter is the closest to our research, because the inference RDF graph can be seen as link prediction applied to the input graph.

8.2.1. Why embedding graphs?

In performing the previous tasks of graph analytics, two of the main challenges – especially when processing large scale graphs – are size and time complexity. One technique that tackles these challenges is graph embedding. In a nutshell, the embedding consists of finding a mapping from the original space to a continuous vector space of lower dimension while preserving certain required properties. In graph embedding, the desired properties to be preserved can be node proximity, node similarities or dissimilarities, structural proximity etc.

8.2.2. How to embed graphs?

In order to briefly describe the embedding process, a few preliminary notions from [31] should be introduced. Let:

S be the adjacency matrix of the graph $G(V, E)$, where: $s_{i,j} = \begin{cases} 0 & \text{if the nodes } v_{i} \text{ and } v_{j} \text{ are not connected} \\ w_{i,j} & \text{otherwise, where } w_{i,j} \text{ is the weight of the edge } e_{i,j} \end{cases}$

The first-order proximity between two nodes is defined as the weight of their edge.

The second-order proximity between two nodes is defined by the similarity between their respective immediate neighbors. More formally, let $s_{i}$ and $s_{j}$ be the ith and jth row vectors of the adjacency matrix respectively. These row vectors represent the first-order proximity between a given node and all the other nodes of the graph. The distance between $s_{i}$ and $s_{j}$ represents the second-order proximity between the nodes $v_{i}$ and $v_{j}$ .
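The two definitions can be illustrated with NumPy on a toy adjacency matrix $S$. The choice of Euclidean distance between rows is an assumption, since the definition above only speaks of "the distance" between $s_i$ and $s_j$.

```python
import numpy as np


def first_order(S: np.ndarray, i: int, j: int) -> float:
    """First-order proximity: the weight of the edge (v_i, v_j), 0 if absent."""
    return float(S[i, j])


def second_order_distance(S: np.ndarray, i: int, j: int) -> float:
    """Distance between rows s_i and s_j (small distance = high second-order proximity)."""
    return float(np.linalg.norm(S[i] - S[j]))


S = np.array([
    [0, 0, 1, 1, 0],   # v0 -> v2, v3
    [0, 0, 1, 1, 0],   # v1 -> v2, v3 (same neighbours as v0)
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],   # v4 -> v0
], dtype=float)
print(first_order(S, 0, 1), second_order_distance(S, 0, 1), second_order_distance(S, 0, 4))
```

Nodes $v_0$ and $v_1$ are not linked (first-order proximity 0), yet their identical neighbourhoods give them the highest possible second-order proximity.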

Similarly, higher-order proximity can be defined using the second-order proximity. Using these preliminary notions, [31] defines graph embedding as:

Given a graph $G = (V, E)$, a graph embedding is a mapping $f : v_{i} \to y_{i} \in \mathbb{R}^{d}\ \forall i \in [n]$ such that $d \ll |V|$ and the function f preserves some proximity measure defined on graph G. [31,96]

where $[n]$ denotes the set of indices $\{1, 2, \dots, n\}$.

8.2.3. Graph embedding methods

Based on the techniques used to compute such embeddings, a taxonomy for graph embedding approaches can be drawn:

Matrix factorization methods Matrix factorization consists of decomposing a matrix into two or more matrices whose product regenerates the original matrix. Graph embedding techniques using matrix factorization start by generating a matrix representation of the graph and then compute its factorization to obtain the embedding. In its simplest form, the matrix representation of the graph can just be the nodes’ adjacency matrix S. Other matrix representations of the graph include the Laplacian matrix [4] and the Katz similarity matrix [46], which measure the nodes’ centrality. A few examples of graph embedding approaches using matrix factorization are Locally Linear Embedding [81], Graph factorization [2] and HOPE [71]. The authors of the HOPE algorithm aimed to preserve asymmetric transitivity, which is an important property in directed graphs. Preserving asymmetric transitivity is desirable in RDF graph embedding, as rdfs:subPropertyOf and rdfs:subClassOf are asymmetric transitive properties. In order to speed up the matrix factorization of sparse matrices, the authors of HOPE use singular-value decomposition.

Random walks methods [58] defines random walks on graphs by:

Given a graph and a starting point, we select a neighbor of it at random, and move to this neighbor; then we select a neighbor of this point at random, and move to it etc. The (random) sequence of points selected this way is a random walk on the graph. [5,58]

When the graph is too large to traverse with reasonable time and space complexity, random walks can be used to approximate the computation of certain properties of the graph. In node2vec [32], the authors compute biased random walks to obtain a balanced traversal between depth-first and breadth-first traversal. They then apply a technique similar to word2vec [61], treating the graph walks as sentences, to compute the embedding.
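A minimal Python sketch of a node2vec-style second-order walk follows. It uses node2vec's return parameter p and in-out parameter q as we understand them, on an unweighted undirected toy graph. The function name and graph are hypothetical, and the word2vec step that consumes the walks is not shown.

```python
import random
from typing import Dict, List, Optional, Set


def biased_walk(adj: Dict[str, Set[str]], start: str, length: int,
                p: float = 1.0, q: float = 1.0,
                rng: Optional[random.Random] = None) -> List[str]:
    """node2vec-style second-order random walk on an undirected graph.

    From the current node, a neighbour x is weighted 1/p if it is the node we
    just came from, 1 if it is also a neighbour of that node, and 1/q otherwise.
    With p = q = 1 this is the plain random walk defined above.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(adj[walk[-1]])
        if not neighbours:
            break
        if len(walk) == 1:               # first step: no previous node yet
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = [1 / p if x == prev else 1.0 if x in adj[prev] else 1 / q
                   for x in neighbours]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
sentences = [biased_walk(adj, node, 6, p=0.5, q=2.0, rng=random.Random(i))
             for i, node in enumerate(sorted(adj))]
for s in sentences:
    print(" ".join(s))
```

A small q pushes the walk outward (closer to depth-first exploration), while a small p keeps it near its starting point (closer to breadth-first exploration).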

Graph neural network models One of the earliest works to propose a framework for consuming graph data with neural networks is GNN [83]. Deep autoencoders can be used for dimensionality reduction, and deep graph embedding techniques use this ability to reduce the dimension of the matrix representing the graph. The authors of [49] propose a Graph Convolutional Network (GCN) model, which is a variant of convolutional neural networks that operates on graphs. [50] applies variational autoencoders, where the encoder part is a graph convolutional network, in order to improve the embedding quality of unsupervised techniques.

8.2.4. Embedding of knowledge graphs

[97] classifies the embedding approaches of KG facts into:

Translational distance models. In the translational model TransE [9], both the head of the fact h (subject in RDF terminology) and the tail of the fact t (object in RDF terminology) are embedded in the same vector space.

Let $\mathbb{R}^{d}$ be the embedding space, where $d$ is the embedding dimension, and let $\mathbf{h} \in \mathbb{R}^{d}$ and $\mathbf{t} \in \mathbb{R}^{d}$ be the vector representations of the head entity $h$ and the tail entity $t$, respectively.

In these translational models, the relation $r$ (predicate in RDF terminology) is represented as a translation vector $\mathbf{r}$ such that $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. A scoring function $f_{r}(h, t)$ is defined to assign a plausibility score to the facts of the KG. The TransE model [9] does not support facts with 1-N relations, such as a student taking many courses, as all the courses in this case will have very close embedding vectors. The literature contains variations of the TransE model that support 1-N relations, such as TransH [98], which uses relation-specific hyperplanes, TransR [53] and TransD [44]. Gaussian embeddings in this class, such as KG2E [38], aim to model uncertainty in the entities and relations.
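The 1-N limitation of TransE and the way relation-specific hyperplanes lift it can be illustrated with NumPy. The scores follow the usual formulations ($-\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$ for TransE, and a squared distance between hyperplane projections for TransH with unit normal $\mathbf{w}_r$ and translation $\mathbf{d}_r$); sign and norm conventions vary between implementations, and the vectors are toy values.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """TransE plausibility: -||h + r - t|| (higher is more plausible)."""
    return -float(np.linalg.norm(h + r - t, ord=norm))


def transh_score(h, w, d, t):
    """TransH: project h and t onto the hyperplane with unit normal w, then
    translate by d on that hyperplane; returns -||h_perp + d - t_perp||^2."""
    w = w / np.linalg.norm(w)
    h_perp = h - (w @ h) * w
    t_perp = t - (w @ t) * w
    return -float(np.sum((h_perp + d - t_perp) ** 2))


rng = np.random.default_rng(0)
student, takes = rng.normal(size=4), rng.normal(size=4)

# TransE: a course that scores perfectly must sit exactly at student + takes,
# so every perfectly scored course collapses onto the same point.
course = student + takes
print(transe_score(student, takes, course))

# TransH: two courses that differ along the normal w both score perfectly.
w = np.array([0.0, 0.0, 0.0, 1.0])
d = np.array([1.0, -1.0, 0.5, 0.0])          # translation lies on the hyperplane
c1 = student - (w @ student) * w + d          # a point on the hyperplane
c2 = c1 + 3.0 * w                             # same projection, different embedding
print(transh_score(student, w, d, c1), transh_score(student, w, d, c2))
print(np.linalg.norm(c1 - c2))
```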

Semantic matching models. In the semantic matching models, the entities are represented by their latent semantic attributes and their relations “are encoded as bilinear operators on the entities [23,43]”. In other words, each relation is denoted as a matrix $M_{r}$ that represents the pairwise relations between the entities. The plausibility score of a fact in these models is computed by the bilinear map $f_{r}(h, t) = h^{T} M_{r} t$. This category includes RESCAL [66]; DistMult [102], where $M_{r}$ is simplified to a diagonal matrix; and ComplEx [95], which extends DistMult by using complex-valued embeddings in order to support asymmetric relations. HolE (Holographic Embeddings) [65] also supports asymmetric relations through circular correlations between the entities’ embeddings.
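The bilinear family can be sketched in NumPy as follows. RESCAL uses a full $M_r$; DistMult restricts it to $\operatorname{diag}(\mathbf{r})$, which makes the score symmetric in $h$ and $t$; ComplEx uses $\operatorname{Re}\left(\sum_i h_i r_i \overline{t_i}\right)$ with complex embeddings, which can break that symmetry. The vectors are toy values.

```python
import numpy as np


def rescal_score(h, M_r, t):
    """Bilinear score h^T M_r t with a full relation matrix (RESCAL)."""
    return float(h @ M_r @ t)


def distmult_score(h, r, t):
    """DistMult: M_r restricted to diag(r); symmetric in h and t."""
    return float(np.sum(h * r * t))


def complex_score(h, r, t):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))


h, r, t = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
print(distmult_score(h, r, t), distmult_score(t, r, h), rescal_score(h, np.diag(r), t))

hc, rc, tc = np.array([1 + 0j]), np.array([1j]), np.array([1 + 1j])
print(complex_score(hc, rc, tc), complex_score(tc, rc, hc))
```

Swapping head and tail leaves the DistMult score unchanged, whereas the purely imaginary relation component flips the ComplEx score from 1 to -1. This is why only the latter can model asymmetric relations such as rdfs:subClassOf.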

Neural network architectures for KG embedding The network models proposed in the literature for learning KG embeddings include:

Semantic matching energy [8] which computes the energy by matching the embedding of a left hand side containing the head and the relation of the triple and the embedding of the right hand side containing the tail and the relation of the triple.

Neural tensor network (NTN) [90] proposed an end-to-end deep neural network model that is parameterized by a 3-way tensor representing the relation in order to learn the plausibility of triples in a KG.

Relational Graph Convolutional Networks (R-GCNs) [84] adapt GCN [49] to KGs by introducing transformations that depend on the type and direction of the edges.

Embedding of RDF graphs RDF embedding techniques can be classified into:

Graph kernels for RDF. One of the earliest works in this class was [56], where the authors apply general graph kernel methods to RDF graphs and propose two kernels that are specific to RDF, namely intersection graph kernels and intersection tree kernels. In [26], the authors consider state-of-the-art graph kernels, namely Weisfeiler–Lehman graph kernels [87], and adapt them to RDF graphs. [69] proposes an h-hop neighborhood-based graph kernel for LOD and applies it in a linked data recommender system.

2vec RDF embedding. These approaches use the following generic method: generate sequences of entities from the RDF graph using graph walks or other graph kernels, and then apply a technique similar to word2vec [62], where each entity in the sequence is treated as a word in a sentence. In RDF2Vec [79], the sequences are generated using graph walks and using the Weisfeiler–Lehman adaptation to RDF graphs [26] mentioned previously. [20] improves RDF2Vec by using biased graph walks to generate the entities’ sequences. In order to explore the global patterns of the RDF graph instead of the local patterns as in RDF2Vec, the authors of [21] substitute word2vec with a technique similar to GloVe (Global Vectors) [78]. The authors report similar embedding quality to RDF2Vec, but with the ability to incorporate larger portions of the graph.

Knowledge graph completion One of the main applications of KG embedding is Knowledge Graph Completion (KGC). Data-driven KGC literature includes [45,54,80,86,91,94]. Logic Tensor Networks [86] allow the definition of logical constraints to improve KGC. [80] aims not only at predicting missing relations in a KG but also at inducing logical rules from it.

8.3. Approximate semantic reasoning

In 2010, Hitzler and van Harmelen [39] called for questioning model-theoretic semantic reasoning and for investigating machine learning (ML) for semantic reasoning, since ML techniques are more tolerant to noisy data.

8.3.1. Type inference

Type inference consists of inferring the corresponding classes from the TBox for resources in the ABox. It can be considered as a main step towards full RDFS reasoning as almost half of the rules in RDFS – namely RDFS1, RDFS2, RDFS3, RDFS4a, RDFS4b and RDFS9 [37] – generate type inference.

SDType algorithm [74,75] (mentioned previously) used a statistical distribution of types to predict the type of object and subject in a triple given that they are connected with a certain property. Their statistical approach makes this type inference mechanism robust to noise in RDF data. [68] targeted inferring the missing types in DBpedia resources through an inductive and an abductive approach. In the inductive approach, the k-Nearest Neighbors algorithm is used to determine the closest concepts from the DBpedia ontology to which the resource should be linked. In the abductive approach, the Encyclopedia Knowledge Paths [67] are used in a similarity metric.

8.3.2. Consistency checking

In [76], the authors aimed to detect systematic errors in DBpedia by aligning the DBpedia ontology with the upper-level ontology DOLCE-Zero [28]. By clustering the reasoning results, they found that 40 clusters cover 96% of the inconsistencies. This observation was among the motivations for the hypothesis that approximate semantic reasoning can cover most of the use cases where ontological reasoning is required. To speed up ABox consistency checking, [77] used an approximate semantic reasoning approach based on machine learning. The authors formalized the problem as binary classification, where a classifier C is trained for a specific TBox to decide whether a given ABox is consistent with respect to that TBox. Graph walks [57] were used to transform the RDF graphs into feature vectors. Their decision tree model achieves 95% accuracy while requiring only 2% of the time needed by a semantic reasoner.
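A deliberately small sketch can illustrate the formulation of [77] as a per-TBox binary classifier over walk-based features. The assumptions are ours. Features are type-labelled walks of length 0 and 1, and the decision tree is reduced to a single split (a decision stump). The toy ABoxes, and the disjointness of the hypothetical classes ex:Person and ex:Course, are invented for illustration. This is not the feature set or model of [77].

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def walk_features(abox):
    """Binary features: type-labelled walks of length 0 and 1 (sketch)."""
    types = defaultdict(set)
    for s, p, o in abox:
        if p == RDF_TYPE:
            types[s].add(o)

    def label(r):
        # A node is represented by the sorted set of its asserted types
        return "[" + ",".join(sorted(types.get(r, ()))) + "]"

    feats = {label(r) for r in types}                        # length-0 walks
    feats |= {f"{label(s)} {p} {label(o)}"                   # length-1 walks
              for s, p, o in abox if p != RDF_TYPE}
    return feats


def train_stump(samples):
    """Depth-1 decision tree: pick the feature whose presence best predicts
    inconsistency. samples: [(feature_set, is_inconsistent), ...]"""
    candidates = sorted(set().union(*(f for f, _ in samples)))
    return min(candidates,
               key=lambda feat: sum((feat in f) != y for f, y in samples))


def predict_inconsistent(stump, abox):
    return stump in walk_features(abox)


# Hypothetical training ABoxes for a TBox in which ex:Person and ex:Course are disjoint
samples = [
    (walk_features([("ex:x", RDF_TYPE, "ex:Person"), ("ex:x", RDF_TYPE, "ex:Course")]), True),
    (walk_features([("ex:y", RDF_TYPE, "ex:Person"), ("ex:y", "ex:takes", "ex:c"),
                    ("ex:c", RDF_TYPE, "ex:Course")]), False),
    (walk_features([("ex:z", RDF_TYPE, "ex:Course"), ("ex:z", RDF_TYPE, "ex:Person")]), True),
]
stump = train_stump(samples)
print(stump, predict_inconsistent(stump, [("ex:w", RDF_TYPE, "ex:Person"),
                                          ("ex:w", RDF_TYPE, "ex:Course")]))
```

The speed-up reported in [77] comes from this kind of substitution: once trained, classifying an ABox only requires extracting features, with no call to a reasoner.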

In [60], the authors extend the clash queries of [51] for DL-Lite [13] and combine them with caching to reduce the number of calls to a semantic reasoner needed to check ABox consistency. This approach achieved better running time and empirical accuracy than [77].
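The combination of clash queries and caching can be sketched for the simplest DL-Lite clash, a disjointness axiom A ⊑ ¬B: an individual clashes if its entailed types include both A and B. In this hypothetical sketch, entailed types are computed from an acyclic subclass map. Results are memoized per set of asserted types with `functools.lru_cache`, so individuals sharing a type signature are checked only once. Role-based clashes and functionality assertions, which DL-Lite also covers, are omitted; this is not the algorithm of [60].

```python
from functools import lru_cache


def make_checker(subclass_of, disjoint):
    """subclass_of: class -> set of direct superclasses (assumed acyclic);
    disjoint: set of frozenset({A, B}) pairs declared disjoint."""

    @lru_cache(maxsize=None)
    def closure(cls):
        # The class itself plus all of its transitive superclasses
        result = {cls}
        for sup in subclass_of.get(cls, ()):
            result |= closure(sup)
        return frozenset(result)

    @lru_cache(maxsize=None)
    def clashes(asserted):
        # Clash query: does some disjoint pair occur among the entailed types?
        entailed = frozenset().union(*(closure(c) for c in asserted))
        return any(pair <= entailed for pair in disjoint)

    def is_consistent(abox_types):
        # abox_types: individual -> set of asserted classes
        return not any(clashes(frozenset(ts)) for ts in abox_types.values())

    return is_consistent, clashes


is_consistent, clashes = make_checker(
    {"ex:GradStudent": {"ex:Student"}, "ex:Student": {"ex:Person"}},
    {frozenset({"ex:Person", "ex:Course"})},
)
print(is_consistent({"ex:a": {"ex:GradStudent"}, "ex:c": {"ex:Course"}}))  # True
print(is_consistent({"ex:b": {"ex:GradStudent", "ex:Course"}}))            # False
print(is_consistent({"ex:d": {"ex:Course", "ex:GradStudent"}}))            # False (cache hit)
```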

8.4. Deep learning for semantic reasoning

One of the closest research efforts to the scope of this work is [42]. Apart from the neural network model used, the main difference between their approach and ours is that they consider only learning from intact data and do not focus on noise tolerance. In [42], Relational Tensor Networks (RTN) are proposed as an adaptation of Recursive Neural Tensor Networks (RNTN) [92] to relational learning. RNTNs were originally designed by Socher et al. to learn from tree-structured data such as sentence parse trees, and they were used successfully to improve sentiment analysis results. The authors of [42] start by building a Directed Acyclic Graph (DAG) representation of the RDF input. Every resource in the graph is initially represented as an incidence vector that indicates the set of rdf:type(s) of the resource. The embeddings of the resources are then computed using the RTN model, which takes into account the types and relations of each resource. Two types of targets are considered: a unary target for type prediction and a binary target for predicate classification. The input for a binary target is the pair of embeddings of the two resources whose connecting predicate is being classified.
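For readers unfamiliar with tensor networks, the composition layer of an RNTN, on which the RTN of [42] builds, can be sketched in plain Python. The sketch follows the published RNTN form, h = tanh(x^T V[k] x + W x) with x the concatenation of two child vectors. It omits the bias term and uses hand-picked toy weights. It does not reproduce the relation-specific updates of RTN, whose details are in [42].

```python
import math


def rntn_compose(a, b, V, W):
    """Tensor composition in the style of Socher et al.'s RNTN (bias omitted):

        h_k = tanh(x^T V[k] x + (W x)_k),  with x = [a; b]

    a, b: child vectors of length d; V: d slices of shape 2d x 2d; W: d x 2d.
    """
    x = list(a) + list(b)
    n = len(x)
    h = []
    for k in range(len(V)):
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(W[k][i] * x[i] for i in range(n))
        h.append(math.tanh(bilinear + linear))
    return h


d = 2
V = [[[0.0] * (2 * d) for _ in range(2 * d)] for _ in range(d)]
V[0][0][2] = 1.0  # slice 0 multiplies a[0] by b[0]: an interaction a linear layer cannot express
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(rntn_compose([1.0, 0.0], [2.0, 0.5], V, W))  # [tanh(3.0), tanh(0.5)]
```

The bilinear tensor term is what distinguishes an RNTN from a standard recursive network: each slice V[k] lets the two inputs interact multiplicatively.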

9. Conclusions, discussions and future work

The main contribution of this paper is empirical evidence that deep learning (neural machine translation in particular) can in fact be used to learn semantic reasoning, specifically RDFS rules. The goal was not to reinvent the wheel by designing Yet another Semantic Reasoner (YaSR) with a new technology, but rather to fill a gap that existing rule-based semantic reasoners leave open: noise tolerance.

While the current approach demonstrates empirically that RDFS rules are learnable by sequence-to-sequence models with noise-tolerant reasoning capabilities, it only scratches the surface of noise-tolerant reasoning in general. This research can be extended in the following directions:

9.1. Generative adversarial model for graph words

The experiments on controlled noisy datasets from LUBM1 showed that the noise tolerance of the deep reasoner depends on the type of noise: it is better on noisy type assertions than on noisy property assertions. In the propagable noise cases, where Jena or any other rule-based reasoner generates noisy inferences, the deep reasoner showed noise tolerance with accuracy ranging from 93% down to 46%. However, in the non-propagable noise cases, which do not affect the inference of rule-based reasoners, Jena performed better than the deep reasoner. In the special case of WOAD noise, both Jena and the deep reasoner have the lowest accuracy, 0%. In these experiments, training was performed on intact data and noise was seen only during the test phase. One way to improve noise tolerance in these cases is to inject a small percentage of noise into the training set as well. Our previous experiments on naive sequence-to-sequence learning for RDFS reasoning [59] showed that training with a small percentage of noise improves noise tolerance dramatically. Generating noise of a specific type assumes prior knowledge of the type of noise encountered during the test phase, so we instead propose designing generative adversarial models for graph words. Adversarial approaches [30] are being used successfully in other fields to add robustness to unknown types of noise. In generative adversarial models, two networks are trained in competition with each other. The generator is trained to produce the most difficult samples, those that fool the discriminator into classifying them as intact. The discriminator is trained to distinguish between noisy and intact samples. The deep reasoner would then learn from the ground truth graph words as well as from the corrupted graph words produced by the adversarial generator.
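The simpler alternative mentioned above, injecting a small percentage of noise into the training set, could look like the following sketch. It corrupts a fixed fraction of rdf:type assertions by swapping the class for a different one drawn at random. The function name and the corruption scheme are assumptions for illustration. This is neither the noise model of the LUBM1 experiments nor the proposed adversarial generator, which would learn where to corrupt instead of sampling uniformly.

```python
import random

RDF_TYPE = "rdf:type"


def corrupt_type_assertions(triples, classes, rate, seed=0):
    """Return a copy of `triples` in which a fraction `rate` of the rdf:type
    assertions has its class replaced by a different, randomly chosen class."""
    rng = random.Random(seed)
    type_idx = [i for i, (_, p, _) in enumerate(triples) if p == RDF_TYPE]
    n_noisy = round(rate * len(type_idx))
    noisy = list(triples)
    for i in rng.sample(type_idx, n_noisy):
        s, p, o = noisy[i]
        alternatives = [c for c in classes if c != o]
        noisy[i] = (s, p, rng.choice(alternatives))
    return noisy


clean = [(f"ex:s{i}", RDF_TYPE, "ub:Student") for i in range(10)]
clean.append(("ex:s0", "ub:advisor", "ex:p"))
noisy = corrupt_type_assertions(clean, ["ub:Student", "ub:Course", "ub:Professor"], rate=0.2)
```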

9.2. OWL reasoning

In this work, we tackled the problem of noise-tolerant RDFS reasoning. Noise-tolerant Web Ontology Language (OWL) reasoning is another very promising research track, with potential applications in, for example, the biological and biomedical fields. We investigated some use cases using ontologies from the Open Biological and Biomedical Ontology (OBO) Foundry [89], specifically the Human Disease Ontology [47]. In this use case, some patients’ descriptions would contain misdiagnoses, and the goal is to generate correct inferences despite these misdiagnoses. The hurdle we faced in pursuing this use case was ground truthing, as we needed patient data with tagged noise. In this context, tagged noise means that the misdiagnosed cases are known, which is required to compare the inference from intact data with the inference from noisy data.

In [59], we tested the naive sequence-to-sequence learning approach on a subset of the OWL-RL rules. This subset includes what we call generative rules, i.e. rules that generate inference triples, and excludes the consistency-checking rules. The performance of the naive sequence-to-sequence approach on OWL-RL rules was comparable to its performance on RDFS rules. This is a preliminary indication that the graph words translation approach may also be applicable to learning OWL-RL rules.

9.3. Training with multiple “ABoxes”

Another limitation of the current approach is that training is done on a dataset that uses only one ontology for the inference. After training the graph-to-graph model on the LUBM1 dataset, we needed to adapt the model hyperparameters for the scientists’ dataset and restart training from scratch. We propose exploring transfer learning: instead of starting the training process from scratch when learning to infer with a new ontology, the neural network weights from the previous training can be used to initialize the new model. Transfer learning [72] aims to capitalize on the knowledge learned in one domain and adapt it to a new domain. In neural networks, the adaptation phase consists of fine-tuning the model weights after initializing them with the previous model’s weights. Research in this direction looks promising, especially for transferring weights between models of different widths, where the width of a model is determined by the length of the graph words sequence.
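One naive way to initialize a wider (or narrower) model from a previously trained one is to copy the overlapping block of each weight matrix and initialize the remaining entries randomly, before fine-tuning. The sketch below shows this for a single matrix stored as nested lists. The copy-the-overlap scheme, the uniform range and the function name are hypothetical choices, not a method evaluated in this work.

```python
import random


def transfer_weights(old, new_rows, new_cols, scale=0.05, seed=0):
    """Initialise a new_rows x new_cols matrix from a trained matrix `old`.

    The block where both shapes overlap is copied; every other entry is drawn
    uniformly from [-scale, scale] and left to fine-tuning.
    """
    rng = random.Random(seed)
    new = [[rng.uniform(-scale, scale) for _ in range(new_cols)]
           for _ in range(new_rows)]
    old_rows, old_cols = len(old), len(old[0]) if old else 0
    for i in range(min(new_rows, old_rows)):
        for j in range(min(new_cols, old_cols)):
            new[i][j] = old[i][j]
    return new


trained = [[0.3, -0.1], [0.7, 0.2]]      # hypothetical weights from a previous model
wider = transfer_weights(trained, 3, 4)  # hypothetical model for a longer graph-word sequence
```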

9.4. Towards the trust layer

In a recent position paper titled “Semantic Web: Learning from Machine Learning” [11], Brickley describes his vision of how the deep learning and SW fields can communicate and learn from each other. In this paper, we initiated the communication in one direction: deep learning for the SW. The other direction, SW for deep learning, is equally important and very promising, with many opportunities for research and discovery. One research effort in that direction is [82], where the authors use SW technologies to describe the inputs and outputs of neural networks.

We believe that our contribution, deep learning for noise-tolerant semantic reasoning, can be extended into a hub where both fields can communicate and benefit from each other. One way to create this hub is through provenance-based reasoning. Imagine that the deep reasoner has access not only to an erroneous triple in DBpedia but also to the provenance of that triple, i.e. the person who originally edited the Wikipedia page and entered the wrong information. On detecting that most of the triples originating from that user cause the reasoner to switch to noise-tolerance mode, the reasoner should not only ignore the triples contributed by that user but also assign a trust level to that user’s “facts”. This can be a step towards the trust layer of the SW layer cake.
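The following is a minimal sketch of the trust assignment imagined above. It assumes two hypothetical inputs: a mapping from each triple to the contributor who entered it, and a report from the reasoner of which triples triggered its noise-tolerance mode. A contributor’s trust is the share of their triples that were not flagged, and triples from contributors below a chosen threshold are set aside. The scoring rule and threshold are illustrative assumptions.

```python
from collections import Counter


def trust_scores(provenance, flagged):
    """provenance: triple -> contributor; flagged: triples that put the
    reasoner into noise-tolerance mode. Trust = share of unflagged triples."""
    total = Counter(provenance.values())
    bad = Counter(provenance[t] for t in flagged if t in provenance)
    return {user: 1 - bad[user] / n for user, n in total.items()}


def trusted_triples(provenance, trust, threshold=0.5):
    """Keep the triples whose contributor reaches the trust threshold."""
    return {t for t, user in provenance.items() if trust[user] >= threshold}


prov = {
    ("dbr:A", "dbo:birthPlace", "dbr:X"): "editor1",
    ("dbr:B", "dbo:birthPlace", "dbr:Y"): "editor1",
    ("dbr:C", "dbo:birthPlace", "dbr:Z"): "editor2",
    ("dbr:D", "dbo:birthPlace", "dbr:Z"): "editor2",
    ("dbr:E", "dbo:birthPlace", "dbr:Z"): "editor2",
}
flagged = {("dbr:C", "dbo:birthPlace", "dbr:Z"), ("dbr:D", "dbo:birthPlace", "dbr:Z")}
trust = trust_scores(prov, flagged)
kept = trusted_triples(prov, trust)
```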


Acknowledgements

We would like to thank DARPA SMISC, AFRL NS-CTA and IBM HEALS for sponsoring different stages of this research.

RDFS rules

Table 5. RDFS rules (from [37])

Pattern types of RDFS rules’ premises

Table 6. Pattern types of RDFS rules’ premises

Propagable noise by rule-based RDFS reasoners

Table 7. Propagable noise by rule-based RDFS reasoners

Input graph g

Inference graph of the input graph in Listing 1

RDF graph formalism

An RDF graph can be defined using the following formalism from [34,40,63], updated in this paper to conform to the more recent RDF 1.1 recommendation [22]:

Let I, B and L be pairwise disjoint sets of IRIs, blank nodes and literals, respectively.

A tuple $(s, p, o) \in (I \cup B) \times I \times (I \cup B \cup L)$ is called an RDF triple where s denotes the triple’s subject, p denotes its predicate and o denotes its object.

An RDF graph T is a set of RDF triples:

$T \subseteq (I \cup B) \times I \times (I \cup B \cup L)$

Let:

$\mathit{Subj}(T)$ be the set of subjects from $(I \cup B)$ that occur in the triples of T

$\mathit{Pred}(T)$ be the set of predicates from I that occur in the triples of T

$\mathit{Obj}(T)$ be the set of objects from $(I \cup B \cup L)$ that occur in the triples of T

$\mathit{SubjObj}(T) = \mathit{Subj}(T) \cup \mathit{Obj}(T)$
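These definitions translate directly into code. In the sketch below, IRIs, blank nodes and literals are modelled as distinct frozen dataclasses so that the sets I, B and L stay disjoint. Literals are simplified to their lexical form, ignoring datatypes and language tags. The helper names mirror Subj, Pred, Obj and SubjObj; all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str  # simplified: no datatype or language tag


def is_rdf_triple(t):
    """(s, p, o) in (I ∪ B) × I × (I ∪ B ∪ L)"""
    s, p, o = t
    return (isinstance(s, (IRI, BNode)) and isinstance(p, IRI)
            and isinstance(o, (IRI, BNode, Literal)))


def subj(T):
    return {s for s, _, _ in T}


def pred(T):
    return {p for _, p, _ in T}


def obj(T):
    return {o for _, _, o in T}


def subj_obj(T):
    return subj(T) | obj(T)


a, b0, p = IRI("ex:a"), BNode("b0"), IRI("ex:p")
T = {(a, p, b0), (b0, p, Literal("42"))}
assert all(is_rdf_triple(t) for t in T)
```

Frozen dataclasses compare by class as well as by field, so `IRI("x")` and `BNode("x")` remain distinct terms, as the disjointness of I and B requires.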

Layered graph examples

Let the tuple of properties in $P^{+}$ for LUBM have the following order: (rdf:type, ub:takesCourse, ub:advisor, ub:emailAddress, ub:memberOf, ub:name, ub:telephone, ub:undergraduateDegreeFrom, ub:publicationAuthor, ub:degree).

The RDF graph in Listing 1 has the corresponding layered graph in Listing 3 and its inference graph has the layered graph listed in Listing 4.
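Using the property order above, splitting a graph into layers can be sketched as follows. Here a layer is simply the list of (subject, object) pairs of one property, indexed by the property’s position in the tuple. This representation and the function name are assumptions made for illustration, since the exact layout of Listings 3 and 4 is not reproduced here.

```python
LUBM_PROPERTIES = (
    "rdf:type", "ub:takesCourse", "ub:advisor", "ub:emailAddress",
    "ub:memberOf", "ub:name", "ub:telephone", "ub:undergraduateDegreeFrom",
    "ub:publicationAuthor", "ub:degree",
)


def layer_graph(triples, properties=LUBM_PROPERTIES):
    """Return one list of (subject, object) pairs per property, in tuple order."""
    position = {p: k for k, p in enumerate(properties)}
    layers = [[] for _ in properties]
    for s, p, o in triples:
        if p not in position:
            raise ValueError(f"property {p!r} is not in the property tuple")
        layers[position[p]].append((s, o))
    return layers


layers = layer_graph([("ex:st1", "rdf:type", "ub:GraduateStudent"),
                      ("ex:st1", "ub:advisor", "ex:prof1")])
```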

Proof of the tensor creation requirement

Hence, it is mandatory that the encoding dictionary for a given graph g contains all the possible resources that might be used in its corresponding inference graph i.

Tensor creation detailed algorithms

In order to fulfill the requirement established in Proposition 1, it is mandatory that the global_resources_dictionary and the local_resources_dictionary for a given graph g contain all the possible resources that might be used in its corresponding inference graph i. To create the properties dictionary Properties_dictionary, the list of properties is collected using the following SPARQL Protocol and RDF Query Language (SPARQL) query:

which returns all the properties in the ontology that are used at least once. An ID is then assigned to each property. In the LUBM1 dataset, this query returns 32 properties, which means that the 3D adjacency matrix has 32 layers.

For the global resources dictionary, the list of RDFS classes is collected from the ontology using this SPARQL query:

A filter is used to eliminate blank nodes. In the LUBM1 dataset, this query returns 57 classes where each class is assigned an ID in a global_resources_dictionary.

The local_resources_dictionary is created incrementally during the encoding routine for each graph g in G. It holds the IDs of the resources that are not present in the global_resources_dictionary. Its IDs start at an offset equal to the length of the global_resources_dictionary, i.e. 57 in the case of LUBM1. For every graph in G, the largest ID in the local_resources_dictionary is less than 80; this bound is used to set the size of the 3D adjacency matrix.
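The dictionary scheme described above can be sketched with plain Python containers, together with the requirement that the inference graph be encoded by lookups only. The names global_dict and local_dict stand in for global_resources_dictionary and local_resources_dictionary. The toy class list, the nested-list tensor, the matrix size of 6 and the lookup_only flag are assumptions for illustration, and the SPARQL queries that populate the dictionaries are not reproduced.

```python
def build_global_dictionary(classes):
    """IDs 0..len(classes)-1 for ontology classes, shared by every graph."""
    return {c: i for i, c in enumerate(classes)}


def encode(triples, properties, global_dict, local_dict, size, lookup_only=False):
    """Encode a graph as a 3D adjacency matrix indexed [layer][subject][object].

    Resources missing from global_dict get local IDs starting at
    len(global_dict). With lookup_only=True (used for an inference graph),
    no new resource may be added to local_dict.
    """
    layer_of = {p: k for k, p in enumerate(properties)}
    tensor = [[[0] * size for _ in range(size)] for _ in properties]

    def resource_id(r):
        if r in global_dict:
            return global_dict[r]
        if r not in local_dict:
            if lookup_only:
                raise KeyError(f"{r} is not in the encoding dictionary")
            local_dict[r] = len(global_dict) + len(local_dict)  # offset IDs
        return local_dict[r]

    for s, p, o in triples:
        tensor[layer_of[p]][resource_id(s)][resource_id(o)] = 1
    return tensor


global_dict = build_global_dictionary(["ub:Course", "ub:GraduateStudent", "ub:Person"])
local_dict = {}
props = ["rdf:type", "ub:takesCourse"]
g = [("ex:st1", "rdf:type", "ub:GraduateStudent"), ("ex:st1", "ub:takesCourse", "ex:c1")]
x = encode(g, props, global_dict, local_dict, size=6)  # input graph
inference = [("ex:st1", "rdf:type", "ub:Person")]
y = encode(inference, props, global_dict, local_dict, size=6, lookup_only=True)
```

Raising on an unknown resource makes a violation of the shared-dictionary requirement visible immediately, instead of silently assigning a new ID that the model could never have produced.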

Advanced tensor creation technique

According to Corollary 1, for the encoding dictionaries of the input graph and the inference graph to be equal, the encoding algorithm of the inference graph should only use lookups from the encoding dictionary without adding any new resources. The simplified encoding technique achieved this because all the layers share the same local_resources_dictionary. However, by having a local_resources_dictionary per layer (i.e. per property) the following issues arise:

Advanced encoding algorithm

Graph words creation algorithm

Possible number of links per properties per classes in LUBM1

Table 8. Possible number of links per properties per classes in LUBM1

Class \ Property	rdf:type	ub:advisor	ub:teacherOf	ub:researchInterest
ub:GraduateStudent	1, 2	1	0	0
ub:Publication	1	0	0	0
ub:TeachingAssistant	2	1	0	0
ub:ResearchAssistant	2	1	0	0
ub:AssistantProfessor	1	0	2, 3, 4	1
ub:AssociateProfessor	1	0	2, 3, 4	1
ub:Lecturer	1	0	2, 3, 4	0
ub:Course	1	0	0	0
ub:GraduateCourse	1	0	0	0
ub:FullProfessor	1	0	2, 3, 4	1
ub:ResearchGroup	1	0	0	0
ub:Department	1	0	0	0
ub:University	1	0	0	0

The network of the relation rdfs:subPropertyOf in the DBpedia ontology (depicted without labels for readability)

References

1. Abadi, Barham, Chen, Davis, Dean, Devin, Ghemawat, Irving, Isard, Kudlur, Levenberg, Monga, Moore, D.G. Murray, Steiner, Tucker, Vasudevan, Warden, Wicke, Yu and Zheng, TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symp. Operating Syst. Design Implementation (OSDI 16), USENIX Association, Savannah, GA, USA, 2016, pp. 265–283, https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi.

2. Ahmed, Shervashidze, S.M. Narayanamurthy, Josifovski and A.J. Smola, Distributed large-scale natural graph factorization, in: Proc. 22nd Int. Conf. World Wide Web, Schwabe, V.A.F. Almeida, Glaser, R.A. Baeza-Yates and S.B. Moon, eds, ACM, New York, NY, USA, 2013, pp. 37–48. doi:10.1145/2488388.2488393.

3. Auer, Bizer, Kobilarov, Lehmann, Cyganiak and Z.G. Ives, DBpedia: A nucleus for a web of open data, in: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007, Aberer, Choi, N.F. Noy, Allemang, Lee, L.J.B. Nixon, Golbeck, Mika, Maynard, Mizoguchi, Schreiber and Cudré-Mauroux, eds, Lecture Notes in Computer Science, Vol. 4825, Springer, 2007, pp. 722–735. doi:10.1007/978-3-540-76298-0_52.

4. Belkin and Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Inform. Process. Syst. 14, T.G. Dietterich, Becker and Ghahramani, eds, MIT Press, Cambridge, MA, USA, 2002, pp. 585–591, http://papers.nips.cc/paper/1961-laplacian-eigenmaps-and-spectral-techniques-for-embedding-and-clustering.pdf.

5. Berners-Lee, Hendler and Lassila, The Semantic Web, Scientific American 284(5) (2001), 34–43, http://www.nature.com/scientificamerican/journal/v284/n5/pdf/scientificamerican0501-34.pdf.

6. Bhagat, Cormode and Muthukrishnan, Node classification in social networks, in: Social Network Data Analytics, C.C. Aggarwal, ed., Springer, 2011, pp. 115–148. doi:10.1007/978-1-4419-8462-3_5.

7. Bizer, Heath and Berners-Lee, Linked data – the story so far, Int. J. Semantic Web Inform. Syst. 5(3) (2009), 1–22. doi:10.4018/jswis.2009081901.

8. Bordes, Glorot, Weston and Bengio, A semantic matching energy function for learning with multi-relational data – application to word-sense disambiguation, Machine Learning 94(2) (2014), 233–259. doi:10.1007/s10994-013-5363-6.

9. Bordes, Usunier, García-Durán, Weston and Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Inform. Process. Syst. 26, C.J.C. Burges, Bottou, Ghahramani and K.Q. Weinberger, eds, Curran Associates, Inc., Red Hook, NY, USA, 2013, pp. 2787–2795, http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf.

10. Bordes, Weston, Collobert and Bengio, Learning structured embeddings of knowledge bases, in: Proc. Twenty-Fifth AAAI Conf. Artificial Intelligence, Burgard and Roth, eds, AAAI Press, 2011, pp. 301–306, http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3659.

11. Brickley, Semantic Web: Learning from Machine Learning, in: Joint Proc. Int. Workshops Hybrid Statistical Semantic Understanding Emerging Semantics, Semantic Statistics Co-Located with 16th Int. Semantic Web Conference, HybridSemStats, Capadisli, Cotton, X.L. Dong, R.V. Guha, Haller, Hitzler, Kalampokis, Kejriwal, Lécué, Sivakumar, Szekely, Troncy and M.J. Witbrock, eds, Vienna, Austria, CEUR-WS.org, 2017, pp. 1–6, http://ceur-ws.org/Vol-1923/article-08.pdf.

12. Cai, V.W. Zheng and K.C. Chang, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Trans. Knowl. Data Eng. 30(9) (2018), 1616–1637. doi:10.1109/TKDE.2018.2807452.

13. Calvanese, De Giacomo, Lembo, Lenzerini and Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. Autom. Reasoning 39(3) (2007), 385–429. doi:10.1007/s10817-007-9078-x.

14. Carlson, Betteridge, Kisiel, Settles, E.R.H. Hruschka Jr. and T.M. Mitchell, Toward an architecture for never-ending language learning, in: Proc. Twenty-Fourth AAAI Conf. Artificial Intelligence, AAAI, Fox and Poole, eds, AAAI Press, Cambridge, MA, USA, 2010, pp. 1306–1313, http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1879.

15. J.J. Carroll, Dickinson, Dollin, Reynolds, Seaborne and Wilkinson, Jena: Implementing the Semantic Web recommendations, in: Proc. 13th Int. World Wide Web Conf. Alternate Track Papers Posters, ACM, New York, NY, USA, 2004, pp. 74–83. doi:10.1145/1013367.1013381.

16. Chernenkiy, Gapanyuk, Nardid, Skvortsova, Gushcha, Fedorenko and Picking, Using the metagraph approach for addressing RDF knowledge representation limitations, in: 2017 Internet Technologies Appl. (ITA), 2017, pp. 47–52. doi:10.1109/ITECHA.2017.8101909.

17. Cho, van Merrienboer, Bahdanau and Bengio, On the properties of neural machine translation: Encoder-decoder approaches, in: Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, Wu, Carpuat, Carreras and E.M. Vecchi, eds, Association for Computational Linguistics, 2014, pp. 103–111, https://www.aclweb.org/anthology/W14-4012/. doi:10.3115/v1/W14-4012.

18. Chollet et al., Keras, GitHub, 2015, https://github.com/keras-team/keras.

19. Chung, Ç. Gülçehre, Cho and Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, http://arxiv.org/abs/1412.3555.

20. Cochez, Ristoski, S.P. Ponzetto and Paulheim, Biased graph walks for RDF graph embeddings, in: Proc. 7th Int. Conf. Web Intelligence, Mining Semantics, Akerkar, Cuzzocrea, Cao and Hacid, eds, ACM, New York, NY, USA, 2017, pp. 21:1–21:12. doi:10.1145/3102254.3102279.

21. Cochez, Ristoski, S.P. Ponzetto and Paulheim, Global RDF vector space embeddings, in: Semantic Web – ISWC 2017 – 16th Int. Semantic Web Conference, Part I, d’Amato, Fernández, V.A.M. Tamma, Lécué, Cudré-Mauroux, J.F. Sequeda, Lange and Heflin, eds, Vol. 10587, Springer, New York, NY, USA, 2017, pp. 190–207. doi:10.1007/978-3-319-68288-4_12.

22. Cyganiak, Wood and Lanthaler, RDF 1.1 Concepts and Abstract Syntax, W3C Recommendation, W3C, 2014, https://www.w3.org/TR/rdf11-concepts/.

23. A.S. d’Avila Garcez, T.R. Besold, L.D. Raedt, Földiák, Hitzler, Icard, Kühnberger, L.C. Lamb, Miikkulainen and D.L. Silver, Neural-symbolic learning and reasoning: Contributions and challenges, in: 2015 AAAI Spring Symposia, Stanford University, Palo Alto, California, USA, March 22–25, 2015, AAAI Press, 2015. doi:10.13140/2.1.1779.4243.

24. DBpedia, About: James Hendler, 2016, http://dbpedia.org/page/James_Hendler.

25. DBpedia, About: Yoshua Bengio, 2016, http://dbpedia.org/page/Yoshua_Bengio.

26. G.K.D. de Vries, A fast approximation of the Weisfeiler–Lehman graph kernel for RDF data, in: Mach. Learning Knowledge Discovery in Databases, Blockeel, Kersting, Nijssen and Zelezný, eds, Springer, Berlin Heidelberg, Berlin, Germany, 2013, pp. 606–621. doi:10.1007/978-3-642-40988-2_39.

27. Fleischhacker, Paulheim, Bryl, Völker and Bizer, Detecting errors in numerical linked data using cross-checked outlier detection, in: Semantic Web – ISWC 2014, Mika, Tudorache, Bernstein, Welty, Knoblock, Vrandečić, Groth, Noy, Janowicz and Goble, eds, Springer International Publishing, New York, NY, USA, 2014, pp. 357–372. doi:10.1007/978-3-319-11964-9_23.

28. Gangemi, Guarino, Masolo, Oltramari and Schneider, Sweetening ontologies with DOLCE, in: Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, 13th International Conference, EKAW 2002, Proceedings, Siguenza, Spain, October 1–4, 2002, Gómez-Pérez and V.R. Benjamins, eds, Lecture Notes in Computer Science, Vol. 2473, Springer, 2002, pp. 166–181. ISBN 3-540-44268-5. doi:10.1007/3-540-45810-7_18.

29. Gehring, Auli, Grangier, Yarats and Y.N. Dauphin, Convolutional sequence to sequence learning, in: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, Precup and Y.W. Teh, eds, Proceedings of Machine Learning Research, Vol. 70, PMLR, 2017, pp. 1243–1252, http://proceedings.mlr.press/v70/gehring17a.html.

30. I.J. Goodfellow, Shlens and Szegedy, Explaining and harnessing adversarial examples, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Bengio and LeCun, eds, Conference Track Proceedings, 2015, http://arxiv.org/abs/1412.6572.

31. Goyal and Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowl.-Based Syst. 151 (2018), 78–94. doi:10.1016/j.knosys.2018.03.022.

32. Grover and Leskovec, Node2vec: Scalable feature learning for networks, in: Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, Krishnapuram, Shah, A.J. Smola, C.C. Aggarwal, Shen and Rastogi, eds, ACM, New York, NY, USA, 2016, pp. 855–864. doi:10.1145/2939672.2939754.

33. Guo, Pan and Heflin, LUBM: A benchmark for OWL knowledge base systems, Web Semantics: Science, Services Agents World Wide Web 3(2–3) (2005), 158–182. doi:10.1016/j.websem.2005.06.005.

34. Gutiérrez, C.A. Hurtado and A.O. Mendelzon, Formal aspects of querying RDF databases, in: Proc. First Int. Conf. Semantic Web Databases, I.F. Cruz, Kashyap, Decker and Eckstein, eds, CEUR-WS.org, Aachen, Germany, 2003, pp. 279–293, http://dl.acm.org/citation.cfm?id=2889905.2889924.

35. A.A. Hagberg, D.A. Schult and P.J. Swart, Exploring network structure, dynamics, and function using NetworkX, in: Proc. 7th Python in Sci. Conference, Varoquaux, Vaught and Millman, eds, Pasadena, CA, USA, 2008, pp. 11–15.

36. Hayes and Gutiérrez, Bipartite graphs as intermediate model for RDF, in: Semantic Web – ISWC 2004: Third Int. Semantic Web Conference, S.A. McIlraith, Plexousakis and van Harmelen, eds, Vol. 3298, Springer, Berlin, Germany, 2004, pp. 47–61. doi:10.1007/978-3-540-30475-3_5.

37. P.J. Hayes and P.F. Patel-Schneider, RDF 1.1 Semantics, W3C Recommendation 25 February 2014, W3C, 2014, https://www.w3.org/TR/rdf11-mt/.

38. He, Liu, Ji and Zhao, Learning to represent knowledge graphs with Gaussian embedding, in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, ACM, New York, NY, USA, 2015, pp. 623–632. ISBN 978-1-4503-3794-6. doi:10.1145/2806416.2806502.

39. Hitzler and van Harmelen, A reasonable Semantic Web, Semantic Web 1(1–2) (2010), 39–44. doi:10.3233/SW-2010-0010.

40. Hogan, Arenas, Mallea and Polleres, Everything you always wanted to know about blank nodes, Web Semantics: Science, Services Agents World Wide Web 27 (2014), 42–69. doi:10.1016/j.websem.2014.06.004.

41. Hogan, Harth, Passant, Decker and Polleres, Weaving the pedantic Web, in: Proc. Linked Data Web Workshop (LDOW2010), Bizer, Heath, Berners-Lee and Hausenblas, eds, CEUR-WS.org, Vol. 628, Raleigh, NC, USA, 2010, pp. 1–10, http://ceur-ws.org/Vol-628/ldow2010_paper04.pdf.

42. Hohenecker and Lukasiewicz, Deep learning for ontology reasoning, 2017, http://arxiv.org/abs/1705.10342.

43. Jenatton, N.L. Roux, Bordes and Obozinski, A latent factor model for highly multi-relational data, in: Advances in Neural Inform. Process. Syst. 25, P.L. Bartlett, F.C.N. Pereira, C.J.C. Burges, Bottou and K.Q. Weinberger, eds, Curran Associates, Inc., Red Hook, NY, USA, 2012, pp. 3167–3175, http://papers.nips.cc/paper/4744-a-latent-factor-model-for-highly-multi-relational-data.pdf.

44. Ji, He, Xu, Liu and Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proc. 53rd Annu. Meeting Assoc. Computational Linguistics 7th Int. Joint Conf. Natural Language Process. (Volume 1: Long Papers), The Association for Computer Linguistics, Beijing, China, 2015, pp. 687–696, http://aclweb.org/anthology/P/P15/P15-1067.pdf. doi:10.3115/v1/P15-1067.

45. Ji, Liu, He and Zhao, Knowledge graph completion with adaptive sparse transfer matrix, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona, USA, February 12–17, 2016, Schuurmans and M.P. Wellman, eds, AAAI Press, 2016, pp. 985–991, http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11982. ISBN 978-1-57735-760-5.

46. Katz, A new status index derived from sociometric analysis, Psychometrika 18(1) (1953), 39–43. doi:10.1007/BF02289026.

47. W.A. Kibbe, Arze, Felix, Mitraka, Bolton, Fu, C.J. Mungall, J.X. Binder, Malone, Vasant, Parkinson and L.M. Schriml, Disease ontology 2015 update: An expanded and updated database of human diseases for linking biomedical knowledge through disease data, Nucleic Acids Research 43(D1) (2015), 1071–1078. doi:10.1093/nar/gku1011.

48. D.P. Kingma and Ba, Adam: A method for stochastic optimization, in: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA, May 7–9, 2015, Bengio and LeCun, eds, 2015, http://arxiv.org/abs/1412.6980.

49. T.N. Kipf and Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017, Conference Track Proceedings, Toulon, France, April 24–26, 2017, OpenReview.net, 2017, https://openreview.net/forum?id=SJU4ayYgl.

50. T.N. Kipf and Welling, Variational graph auto-encoders, 2016, http://arxiv.org/abs/1611.07308.

51. Lembo, Lenzerini, Rosati, Ruzzi and D.F. Savo, Query rewriting for inconsistent DL-Lite ontologies, in: Web Reasoning and Rule Systems – 5th International Conference, RR 2011, Proceedings, Galway, Ireland, August 29–30, 2011, Rudolph and Gutiérrez, eds, Lecture Notes in Computer Science, Vol. 6902, Springer, 2011, pp. 155–169. ISBN 978-3-642-23579-5. doi:10.1007/978-3-642-23580-1_12.

52. Liben-Nowell and Kleinberg, The link-prediction problem for social networks, J. Assoc. Inform. Sci. Technology 58(7) (2007), 1019–1031. doi:10.1002/asi.v58:7.

53. Lin, Liu, Sun, Liu and Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proc. Twenty-Ninth AAAI Conf. Artificial Intelligence, Bonet and Koenig, eds, AAAI Press, Cambridge, MA, USA, 2015, pp. 2181–2187, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9571.

54. Lin, Liu, Sun, Liu and Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015, Bonet and Koenig, eds, AAAI Press, 2015, pp. 2181–2187, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9571. ISBN 978-1-57735-698-1. doi:10.1016/j.procs.2017.05.045.

55. Liu, Dou, Jin, LePendu and Shah, Mining biomedical ontologies and data using RDF hypergraphs, in: 12th Int. Conf. Mach. Learning Applications, ICMLA 2013, IEEE, Miami, FL, USA, 2013, pp. 141–146. doi:10.1109/ICMLA.2013.31.

56. Lösch, Bloehdorn and Rettinger, Graph kernels for RDF data, in: Semantic Web: Res. Applications, Simperl, Cimiano, Polleres, Ó. Corcho and Presutti, eds, Springer, Berlin Heidelberg, Berlin, Germany, 2012, pp. 134–148. doi:10.1007/978-3-642-30284-8_16.

57. Lösch, Bloehdorn and Rettinger, Graph kernels for RDF data, in: The Semantic Web: Research and Applications – 9th Extended Semantic Web Conference, ESWC 2012, Proceedings, Heraklion, Crete, Greece, May 27–31, 2012, Simperl, Cimiano, Polleres, Ó. Corcho and Presutti, eds, Lecture Notes in Computer Science, Vol. 7295, Springer, 2012, pp. 134–148. ISBN 978-3-642-30283-1. doi:10.1007/978-3-642-30284-8_16.

58. Lovász, Random walks on graphs: A survey, Combinatorics, Paul Erdős is Eighty 2 (1993), 1–46, http://web.cs.elte.hu/~lovasz/erdos.pdf.

59. Makni and Hendler, Deep learning of RDFS rules, in: IJCAI Workshop “Semantic Machine Learning”, 2016, http://ist.gmu.edu/~hpurohit/events/sml16/pdf/SML2016_submission6_Bassem_revised.pdf.

60. Meilicke, Ruffinelli, Nolle, Paulheim and Stuckenschmidt, Fast ABox consistency checking using incomplete reasoning and caching, in: Rules and Reasoning – International Joint Conference, RuleML+RR 2017, Proceedings, London, UK, July 12–15, 2017, Costantini, Franconi, W.V. Woensel, Kontchakov, Sadri and Roman, eds, Lecture Notes in Computer Science, Vol. 10364, Springer, London, UK, 2017, pp. 168–183. ISBN 978-3-319-61251-5. doi:10.1007/978-3-319-61252-2_12.

61. Mikolov, Chen, Corrado and Dean, Efficient estimation of word representations in vector space, in: 1st International Conference on Learning Representations, ICLR 2013, Workshop Track Proceedings, Scottsdale, Arizona, USA, May 2–4, 2013, Bengio and LeCun, eds, 2013, http://arxiv.org/abs/1301.3781.

62. Mikolov, Sutskever, Chen, Corrado and Dean, Distributed representations of words and phrases and their compositionality, in: Proc. 26th Int. Conf. Neural Inform. Process. Syst., Vol. 2, Curran Associates Inc., Red Hook, NY, USA, 2013, pp. 3111–3119, http://dl.acm.org/citation.cfm?id=2999792.2999959.

63. A.A.M. Morales, A directed hypergraph model for RDF, in: Proc. KWEPSY 2007 Knowledge Web PhD Symp. 2007, E.P.B. Simperl, Diederich and Schreiber, eds, CEUR-WS.org, Vol. 275, Innsbruck, Austria, 2007, pp. 1–2, http://ceur-ws.org/Vol-275/paper24.pdf.

64. Motik, B.C. Grau, Horrocks, Wu, Fokoue and Lutz, OWL 2 Web Ontology Language Profiles, W3C Recommendation 27 October 2009, W3C, 2012, https://www.w3.org/TR/owl2-profiles/.

65. Nickel, Rosasco and Poggio, Holographic embeddings of knowledge graphs, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, AAAI Press, 2016, pp. 1955–1961, http://dl.acm.org/citation.cfm?id=3016100.3016172.

66. Nickel, Tresp and H.-P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, Omnipress, USA, 2011, pp. 809–816, http://dl.acm.org/citation.cfm?id=3104482.3104584. ISBN 978-1-4503-0619-5.

67. A.G. Nuzzolese, Gangemi, Presutti and Ciancarini, Encyclopedic knowledge patterns from Wikipedia links, in: Semantic Web – ISWC 2011, Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist, eds, Vol. 7031, Springer, Berlin Heidelberg, Berlin, Germany, 2011, pp. 520–536. doi:10.1007/978-3-642-25073-6_33.

68.

A.G. Nuzzolese, Gangemi, Presutti and Ciancarini, Type inference through the analysis of Wikipedia links, in: WWW2012 Workshop Linked Data Web, Bizer, Heath, Berners-Lee and Hausenblas, eds, CEUR-WS.org, Vol. 937, Lyon, France, 2012, pp. 1–9, http://ceur-ws.org/Vol-937/ldow2012-paper-13.pdf.

69.

V.C. Ostuni, T.D. Noia, Mirizzi and E.D. Sciascio, A linked data recommender system using a neighborhood-based graph kernel, in: E-Commerce Web Technologies, Hepp and Hoffner, eds, Springer International Publishing, New York, NY, USA, 2014, pp. 89–100. doi:10.1007/978-3-319-10491-1_10.

70.

Ou, Cui, Pei, Zhang and Zhu, Asymmetric transitivity preserving graph embedding, in: Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, ACM, New York, NY, USA, 2016, pp. 1105–1114. doi:10.1145/2939672.2939751.

71.

Ou, Cui, Pei, Zhang and Zhu, Asymmetric transitivity preserving graph embedding, in: Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, Krishnapuram, Shah, A.J. Smola, C.C. Aggarwal, Shen and Rastogi, eds, ACM, New York, NY, USA, 2016, pp. 1105–1114. doi:10.1145/2939672.2939751.

72.

S.J. Pan and Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22(10) (2010), 1345–1359. doi:10.1109/TKDE.2009.191.

73.

Parzen, On estimation of a probability density function and mode, Ann. Math. Statist. 33(3) (1962), 1065–1076. doi:10.1214/aoms/1177704472.

74.

Paulheim and Bizer, Type inference on noisy RDF data, in: The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Proceedings, Part I, Sydney, NSW, Australia, October 21–25, 2013, Alani, Kagal, Fokoue, P.T. Groth, Biemann, J.X. Parreira, Aroyo, N.F. Noy, Welty and Janowicz, eds, Lecture Notes in Computer Science, Vol. 8218, Springer, 2013, pp. 510–525. doi:10.1007/978-3-642-41335-3_32.

75.

Paulheim and Bizer, Improving the quality of linked data using statistical distributions, Int. J. Semantic Web Inform. Syst. 10(2) (2014), 63–86. doi:10.4018/ijswis.2014040104.

76.

Paulheim and Gangemi, Serving DBpedia with DOLCE – more than just adding a cherry on top, in: The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Proceedings, Part I, Bethlehem, PA, USA, October 11–15, 2015, Arenas, Ó. Corcho, Simperl, Strohmaier, d’Aquin, Srinivas, P.T. Groth, Dumontier, Heflin, Thirunarayan and Staab, eds, Lecture Notes in Computer Science, Vol. 9366, Springer, Bethlehem, PA, USA, 2015, pp. 180–196. ISBN 978-3-319-25006-9. doi:10.1007/978-3-319-25007-6_11.

77.

Paulheim and Stuckenschmidt, Fast approximate A-box consistency checking using machine learning, in: The Semantic Web. Latest Advances and New Domains – 13th International Conference, ESWC 2016, Proceedings, Heraklion, Crete, Greece, May 29–June 2, 2016, Sack, Blomqvist, d’Aquin, Ghidini, S.P. Ponzetto and Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 135–150. doi:10.1007/978-3-319-34129-3_9.

78.

Pennington, Socher and C.D. Manning, GloVe: Global vectors for word representation, in: Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), Moschitti, Pang and Daelemans, eds, Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1532–1543, http://www.aclweb.org/anthology/D14-1162. doi:10.3115/v1/D14-1162.

79.

Ristoski and Paulheim, RDF2Vec: RDF graph embeddings for data mining, in: Semantic Web – ISWC 2016, P.T. Groth, Simperl, A.J.G. Gray, Sabou, Krötzsch, Lécué, Flöck and Gil, eds, Vol. 9981, Springer International Publishing, New York, NY, USA, 2016, pp. 498–514. doi:10.1007/978-3-319-46523-4_30.

80.

Rocktäschel and Riedel, End-to-end differentiable proving, in: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017, Guyon, von Luxburg, Bengio, H.M. Wallach, Fergus, S.V.N. Vishwanathan and Garnett, eds, 2017, pp. 3791–3803, http://papers.nips.cc/paper/6969-end-to-end-differentiable-proving.

81.

S.T. Roweis and L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000), 2323–2326, http://science.sciencemag.org/content/290/5500/2323. doi:10.1126/science.290.5500.2323.

82.

M.K. Sarker, Xie, Doran, Raymer and Hitzler, Explaining trained neural networks with semantic web technologies: First steps, in: Proc. Twelfth Int. Workshop Neural-Symbolic Learning Reasoning, T.R. Besold, A.S. d’Avila Garcez and Noble, eds, CEUR-WS.org, Vol. 2003, London, UK, 2017, pp. 1–10, http://ceur-ws.org/Vol-2003/NeSy17_paper4.pdf.

83.

Scarselli, Gori, A.C. Tsoi, Hagenbuchner and Monfardini, The graph neural network model, IEEE Trans. Neural Netw. 20(1) (2009), 61–80. doi:10.1109/TNN.2008.2005605.

84.

M.S. Schlichtkrull, T.N. Kipf, Bloem, van den Berg, Titov and Welling, Modeling relational data with graph convolutional networks, in: The Semantic Web – 15th International Conference, ESWC 2018, Proceedings, Heraklion, Crete, Greece, June 3–7, 2018, Gangemi, Navigli, Vidal, Hitzler, Troncy, Hollink, Tordai and Alam, eds, Lecture Notes in Computer Science, Vol. 10843, Springer, 2018, pp. 593–607. doi:10.1007/978-3-319-93417-4_38.

85.

Schuster and K.K. Paliwal, Bidirectional recurrent neural networks, IEEE Trans. Signal Processing 45(11) (1997), 2673–2681. doi:10.1109/78.650093.

86.

Serafini and A.S. d’Avila Garcez, Logic tensor networks: Deep learning and logical reasoning from data and knowledge, in: Proceedings of the 11th International Workshop on Neural-Symbolic Learning and Reasoning (NeSy’16) Co-located with the Joint Multi-Conference on Human-Level Artificial Intelligence (HLAI 2016), New York City, NY, USA, July 16–17, 2016, T.R. Besold, L.C. Lamb, Serafini and Tabor, eds, CEUR Workshop Proceedings, Vol. 1768, CEUR-WS.org, 2016, http://ceur-ws.org/Vol-1768/NESY16_paper3.pdf.

87.

Shervashidze, Schweitzer, E.J. van Leeuwen, Mehlhorn and K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learning Research 12 (2011), 2539–2561, http://dl.acm.org/citation.cfm?id=1953048.2078187.

88.

Shi and Weninger, Open-world knowledge graph completion, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018, S.A. McIlraith and K.Q. Weinberger, eds, AAAI Press, 2018, pp. 1957–1964, https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16055.

89.

Smith, Ashburner, Rosse, Bard, Bug, Ceusters, L.J. Goldberg, Eilbeck, Ireland, C.J. Mungall, T.O. Consortium, Leontis, Rocca-Serra, Ruttenberg, S.-A. Sansone, R.H. Scheuermann, Shah, P.L. Whetzel and Lewis, The OBO foundry: Coordinated evolution of ontologies to support biomedical data integration, Nature Biotechnology 25 (2007), 1251. doi:10.1038/nbt1346.

90.

Socher, Chen, C.D. Manning and A.Y. Ng, Reasoning with neural tensor networks for knowledge base completion, in: Advances in Neural Information Processing Systems, Vol. 26, 2013.

91.

Socher, Chen, C.D. Manning and A.Y. Ng, Reasoning with neural tensor networks for knowledge base completion, in: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a Meeting Held, Lake Tahoe, Nevada, United States, December 5–8, 2013, C.J.C. Burges, Bottou, Ghahramani and K.Q. Weinberger, eds, 2013, pp. 926–934, http://papers.nips.cc/paper/5028-reasoning-with-neural-tensor-networks-for-knowledge-base-completion.

92.

Socher, Perelygin, Wu, Chuang, C.D. Manning, Ng and Potts, Recursive deep models for semantic compositionality over a sentiment treebank, in: Proc. 2013 Conf. Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, WA, USA, 2013, pp. 1631–1642, http://www.aclweb.org/anthology/D13-1170.

93.

Sutskever, Vinyals and Q.V. Le, Sequence to sequence learning with neural networks, in: Proc. 27th Int. Conf. Neural Inform. Process. Syst., Vol. 2, MIT Press, Cambridge, MA, USA, 2014, pp. 3104–3112, http://dl.acm.org/citation.cfm?id=2969033.2969173.

94.

Trouillon, C.R. Dance, É. Gaussier, Welbl, Riedel and Bouchard, Knowledge graph completion via complex tensor factorization, Journal of Machine Learning Research (JMLR) 18(130) (2017), 1–38.

95.

Trouillon, Welbl, Riedel, Gaussier and Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on International Conference on Machine Learning, ICML’16, Vol. 48, JMLR.org, 2016, pp. 2071–2080, http://dl.acm.org/citation.cfm?id=3045390.3045609.

96.

L.G. Valiant, Knowledge infusion: In pursuit of robustness in artificial intelligence, in: IARCS Annu. Conf. Foundations Software Technol. Theoretical Computer Science, Hariharan, Mukund and Vinay, eds, Vol. 2, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2008, pp. 415–422, http://drops.dagstuhl.de/opus/volltexte/2008/1770. doi:10.4230/LIPIcs.FSTTCS.2008.1770.

97.

Wang, Mao, Wang and Guo, Knowledge graph embedding: A survey of approaches and applications, IEEE Trans. Knowl. Data Eng. 29(12) (2017), 2724–2743. doi:10.1109/TKDE.2017.2754499.

98.

Wang, Zhang, Feng and Chen, Knowledge graph embedding by translating on hyperplanes, in: Proc. Twenty-Eighth AAAI Conf. Artificial Intelligence, C.E. Brodley and Stone, eds, AAAI Press, Cambridge, MA, USA, 2014, pp. 1112–1119, http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8531.

99.

Wienand and Paulheim, Detecting incorrect numerical data in DBpedia, in: Semantic Web: Trends Challenges, Presutti, d’Amato, Gandon, d’Aquin, Staab and Tordai, eds, Springer International Publishing, New York, NY, USA, 2014, pp. 504–518. doi:10.1007/978-3-319-07443-6_34.

100.

Wu, Li, Hu and Wang, System π: A native RDF repository based on the hypergraph representation for RDF data model, J. Computer Sci. Technology 24(4) (2009), 652–664. doi:10.1007/s11390-009-9265-9.

101.

R.X.Z.L.M.S. Xu Han Yankai Lin, OpenKE, THUNLP, 2017. http://openke.thunlp.org/home.

102.

Yang, Yih, He, Gao and Deng, Embedding entities and relations for learning and inference in knowledge bases, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Bengio and LeCun, eds, Conference Track Proceedings, 2015, http://arxiv.org/abs/1412.6575.

103.

Zaveri, Kontokostas, M.A. Sherif, Bühmann, Morsey, Auer and Lehmann, User-driven quality evaluation of DBpedia, in: Proc. 9th Int. Conf. Semantic Systems, ACM, New York, NY, USA, 2013, pp. 97–104. doi:10.1145/2506182.2506195.

104.

Zhu, Gao and Quan, Noisy type assertion detection in semantic datasets, in: The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Proceedings, Part I, Riva del Garda, Italy, October 19–23, 2014, Mika, Tudorache, Bernstein, Welty, C.A. Knoblock, Vrandecic, P.T. Groth, N.F. Noy, Janowicz and C.A. Goble, eds, Lecture Notes in Computer Science, Vol. 8796, Springer, 2014, pp. 373–388. doi:10.1007/978-3-319-11964-9_24.

Deep learning for noise-tolerant RDFS reasoning 1

Table 1
RDFS9 rule from [37]
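For orientation, the W3C RDF Semantics specification states rdfs9 as: if a graph contains `xxx rdfs:subClassOf yyy` and `zzz rdf:type xxx`, then `zzz rdf:type yyy` is entailed. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: triples are modelled as plain string tuples, and the function names `apply_rdfs9` and `rdfs9_closure` as well as the `ex:` resources are hypothetical. It applies this single rule until a fixpoint is reached and shows how a rule-based reasoner propagates a corrupted type assertion unchanged.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a triple is a (subject, predicate, object) tuple of prefixed names.
Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def apply_rdfs9(graph: Set[Triple]) -> Set[Triple]:
    """One pass of rdfs9: (xxx subClassOf yyy) and (zzz type xxx) => (zzz type yyy).

    Returns only the inferred triples that are not already in `graph`.
    """
    # Index the direct superclasses of every class that appears as a subClassOf subject.
    superclasses: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == SUBCLASS_OF:
            superclasses.setdefault(s, set()).add(o)

    new_triples: Set[Triple] = set()
    for s, p, o in graph:
        if p == RDF_TYPE:
            for sup in superclasses.get(o, set()):
                inferred = (s, RDF_TYPE, sup)
                if inferred not in graph:
                    new_triples.add(inferred)
    return new_triples


def rdfs9_closure(graph: Set[Triple]) -> Set[Triple]:
    """Apply rdfs9 repeatedly until no new triple is produced (a fixpoint)."""
    closure = set(graph)
    while True:
        new_triples = apply_rdfs9(closure)
        if not new_triples:
            return closure
        closure |= new_triples


if __name__ == "__main__":
    g = {
        ("ex:GraduateStudent", SUBCLASS_OF, "ex:Student"),
        ("ex:Student", SUBCLASS_OF, "ex:Person"),
        ("ex:Course", SUBCLASS_OF, "ex:Work"),
        ("ex:alice", RDF_TYPE, "ex:GraduateStudent"),
        # Hypothetical corrupted type assertion: the rule propagates it like any other triple.
        ("ex:bob", RDF_TYPE, "ex:Course"),
    }
    for t in sorted(rdfs9_closure(g) - g):
        print(t)
```

Running the demo derives three new triples: `ex:alice` is typed as both `ex:Student` and `ex:Person`, and the noisy `ex:bob rdf:type ex:Course` yields `ex:bob rdf:type ex:Work`. Iterating to a fixpoint is what carries types along subclass chains of any length; a single pass would only reach direct superclasses.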

Table 3
Number of resources per class in the scientists dataset

Class                          Number of resources
dbo:Scientist                  25,760
dbo:Place                      22,035
dbo:EducationalInstitution      6,048
dbo:Award                       1,166