Abstract
Machine Learning methods have been introduced in the Semantic Web for solving problems such as link and type prediction and ontology enrichment and completion (both at the terminological and the assertional level). Whilst early work mainly focussed on symbol-based solutions, numeric-based approaches have recently received major attention, motivated by the need to scale to the very large Web of Data. In this paper, the most representative proposals belonging to these categories are surveyed, jointly with an analysis of their main peculiarities and drawbacks. Afterwards, the main envisioned research directions for further developing Machine Learning solutions for the Semantic Web are presented.
Introduction
The Semantic Web (SW) vision has been introduced with the goal of making the Web machine-readable [3], by enriching resources with metadata whose formal semantics is defined in OWL1.
In order to fill some of these gaps, machine learning (ML) methods have been proposed [13]. Problems such as query answering, instance retrieval and link prediction have been regarded as classification problems. Suitable machine learning methods, often inspired by symbol-based solutions in the Inductive Logic Programming (ILP) field (aiming at inducing a hypothesised logic program from a background knowledge and a collection of examples), have been proposed [14,26,32,38,56]. Most of them are able to cope with the expressive SW representations and with the Open World Assumption (OWA) typically adopted, in contrast to the Closed World Assumption (CWA) that is usually made in traditional ML settings. Problems such as ontology refinement and enrichment at the terminological level, e.g. assessing disjointness axioms or complex descriptions for a given concept name, have been regarded as concept learning problems to be solved via supervised/unsupervised inductive learning methods for Description Logics [2] (DLs) representations [20,21,23,37,50,59].
Nowadays, the adoption of ML methods represents a major trend in several research fields such as computer vision, bioinformatics, image recognition, natural language processing and artificial intelligence. This is mostly due to the impressive scalability that recent methods, mainly grounded on numeric approaches (also called sub-symbolic), such as embeddings and deep learning [18], have shown. This trend has also occurred in the SW. Indeed, motivated by the need for scaling, many of the recent ML-based solutions, e.g. for performing link/type prediction, as well as data-intensive tasks exploiting the Web of Data and the emerging Knowledge Graphs (KGs) as background knowledge, are mainly grounded on embeddings [11,42,45], that is, methods for mapping high-dimensional representations into relatively low-dimensional vector spaces. Nevertheless, the important gain in terms of scalability that ML methods for the SW are obtaining comes at a cost: a) the interpretability of the models resulting from the learning process; b) the ability to exploit deductive (and complementary forms of) reasoning capabilities; c) the expressiveness of the SW representations and the compliance with the OWA.
In the following, the main problems and ML methods that have been developed in the SW are surveyed along two categories: symbol-based (Section 2) and numeric-based (Section 3), and their fundamental peculiarities and issues are discussed. The main envisioned research directions that need to be pursued for developing ML methods for the SW are illustrated in Section 4. Conclusions are drawn in Section 5.
Symbol-based methods for the Semantic Web
The first efforts in developing ML methods for the SW were devoted to solving deductive reasoning tasks over ontologies under an inductive perspective. This was motivated by the necessity of offering an alternative way to perform some forms of reasoning when deductive reasoning was not applicable, for instance because of inconsistencies within ontologies, but also by the need for a solution for reasoning in the presence of incompleteness (that is, when information is missing with respect to a certain domain of reference) and/or in the presence of noise (that is, when ontologies are consistent but the information therein is partly wrong with respect to a reference domain, e.g. missing disjointness axioms, missing and/or wrong assertions). Particularly, the incompleteness of knowledge bases, both at the assertional and at the schema level, drove the development of ML methods trying to specifically tackle this problem. The overall idea consisted in exploiting the evidence coming from assertional knowledge for drawing plausible conclusions, to be possibly represented with intensional models. In the following, the tasks that received major attention are reported, jointly with an analysis of the main solutions for them.
Instance retrieval
One of the first problems that has been investigated is the instance retrieval problem, which amounts to assessing whether an individual is an instance of a given concept. It has been regarded as a classification problem aiming at assessing the class-membership of an individual with respect to a query concept. Similarity-based methods, such as K-Nearest Neighbor and Support Vector Machine, have been developed since they are well known to be noise-tolerant [6,14,47]. This required coping with: 1) the OWA rather than the CWA generally adopted in ML; 2) the non-disjointness of the classes (since an individual can be an instance of more than one concept at the same time), while in the usual ML setting classes are assumed to be disjoint; 3) the definition of new similarity measures and kernel functions for exploiting the expressiveness of SW representations. Additionally, because of the OWA, new metrics for the evaluation of the classification results have been defined [14]. This is because, by using standard metrics such as precision, recall and F-measure, new inductive results were deemed mistakes whilst they could turn out to be correct inferences when judged by a knowledge engineer. The proposed solutions experimentally proved their ability to perform inductive instance retrieval when compared to a standard deductive reasoner. They also proved their ability to induce new knowledge that was not logically derivable, but they did not prove fully able to work at large scale.
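The nearest-neighbour idea can be sketched as follows. This is a minimal illustration, not any of the cited systems: the individuals, features and the Jaccard similarity are invented for the example, and a 0 label stands for membership that is simply unknown under the OWA (so the classifier abstains when no labelled evidence is available).

```python
from collections import Counter

def knn_membership(query, labeled, sim, k=3):
    """Classify `query` w.r.t. a concept from its k most similar individuals.

    `labeled` maps individuals to +1 (known member), -1 (known non-member)
    or 0 (unknown, as allowed under the OWA); unknown neighbours cast no vote.
    """
    neighbours = sorted(labeled, key=lambda x: sim(query, x), reverse=True)[:k]
    votes = Counter(labeled[n] for n in neighbours if labeled[n] != 0)
    if not votes:
        return 0  # abstain: no labelled evidence among the neighbours
    return votes.most_common(1)[0][0]

# Toy data: individuals described by feature sets, compared via Jaccard.
features = {
    "alice": {"Person", "Researcher"},
    "bob": {"Person", "Researcher", "Professor"},
    "acme": {"Organization", "Company"},
    "carol": {"Person"},
}
labeled = {"alice": +1, "bob": +1, "acme": -1}  # membership in some query concept
jaccard = lambda a, b: len(features[a] & features[b]) / len(features[a] | features[b])

print(knn_membership("carol", labeled, jaccard, k=3))  # prints 1
```

Note how the 0 label lets the method distinguish "known negative" from "unknown", which is exactly what the CWA-based standard setting cannot express.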
Methods characterized by more interpretable models have also been defined [23,51]. Inspired by the ILP literature concerning the induction of decision trees in clausal representation [5], a solution for inducing a Terminological Decision Tree (TDT) has been formalized [23]. A TDT is a tree structure, naturally compliant with the OWA, employing a DL language for representing nodes and inference services as the corresponding tests on the nodes. The tree-induction algorithm adopts a classical top-down divide-and-conquer strategy with the use of refinement operators for DL concept descriptions. Once a TDT is induced, similarly to logical decision trees, a definition for the target concept (namely the concept with respect to which classification is performed) can be drawn by exploiting the nodes in the tree structure. This solution showed the interesting ability to provide an interpretable model, but it turned out slightly less effective than similarity-based classification methods.
Concept learning for ontology enrichment
With the purpose of enriching ontologies at the terminological level, methods for learning concept descriptions for a concept name have been proposed. The problem has been regarded as a supervised concept learning problem aiming at approximating an intensional DL definition, given a set of individuals of an ontology acting as positive/negative training examples.
Various solutions have been proposed for this task. They proved able to learn approximated concept descriptions for a target concept name, but only relatively small ontological knowledge bases have been considered in the experiments.
Knowledge completion
Knowledge completion consists in finding new information at the assertional level, that is, facts that are missing from a considered knowledge base. This task has become very popular with the development of KGs, which are well known to be incomplete, and it is also strongly related to the link prediction task (see Section 3).
One of the best-known systems for knowledge completion of RDF knowledge bases is AMIE [26]. Inspired by the literature on association rule mining [1] and by ILP methods for learning Horn clauses, AMIE aims to mine logic rules from RDF knowledge bases with the final goal of predicting new assertions. AMIE (and its optimized version AMIE+ [25]) currently represents the most scalable rule mining system for learning Horn rules on large RDF data collections and is also explicitly tailored to support the OWA. However, it does not exploit any form of deductive reasoning. A related rule mining system, similarly based on a level-wise generate-and-test strategy, has been proposed in [16]. It aims to learn SWRL rules [31] from OWL ontologies while exploiting schema-level information and deductive reasoning during the rule learning process. Both AMIE and the solution presented in [16] showed the ability to mine useful rules and to predict new assertional knowledge. However, the solution proposed in [16] showed reduced scalability due to the exploitation of the reasoning capabilities.
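The core measures behind this kind of rule mining can be illustrated on a toy triple set. The sketch below is an illustration in the spirit of AMIE, not its implementation: the rule shape (two body atoms sharing a variable), the predicates and the facts are all invented for the example. It computes a rule's support, its standard (CWA-style) confidence, and a PCA-style confidence that restricts the denominator to subjects for which some head fact is known, so that unknown facts are not automatically counted as false.

```python
def mine_rule(triples, p1, p2, p_head):
    """Support and confidences of the rule p1(x,z) & p2(z,y) => p_head(x,y)
    over a collection of (subject, predicate, object) triples."""
    facts = set(triples)
    # All (x, y) bindings satisfying the two-atom body.
    body = {(s1, o2) for s1, q1, o1 in facts if q1 == p1
                     for s2, q2, o2 in facts if q2 == p2 and s2 == o1}
    support = sum((x, p_head, y) in facts for x, y in body)
    std_conf = support / len(body)
    # PCA-style confidence: only count bindings whose subject has at least
    # one known p_head fact, in line with the OWA.
    known = {s for s, q, o in facts if q == p_head}
    pca_body = [b for b in body if b[0] in known]
    pca_conf = support / len(pca_body) if pca_body else 0.0
    return support, std_conf, pca_conf

triples = [
    ("anna", "worksAt", "uni"), ("uni", "locatedIn", "rome"),
    ("anna", "livesIn", "rome"),
    ("ben", "worksAt", "lab"), ("lab", "locatedIn", "paris"),
    # ben's city of residence is simply unknown, not false
]
print(mine_rule(triples, "worksAt", "locatedIn", "livesIn"))  # prints (1, 0.5, 1.0)
```

The gap between the two confidences (0.5 vs. 1.0) shows why an OWA-aware measure matters: ben's missing residence fact penalizes the rule under the CWA reading but not under the PCA reading.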
Learning disjointness axioms
Disjointness axioms are essential for making explicit the negative knowledge about a domain, yet they are often overlooked during the modeling process (thus affecting the efficacy of reasoning services). To tackle this problem, automated methods for discovering these axioms from the data distribution have been devised.
A solution grounded on association rule mining [1] has been proposed in [59,60]. It is based on studying the correlation between classes comparatively, namely via association rules, negative association rules and the correlation coefficient. Background knowledge and reasoning capabilities are used to a limited extent.
A different solution has been proposed in [50]. Starting from the assumption that two or more concepts may be mutually disjoint when the sets of their (known) instances do not overlap, the problem has been regarded as a clustering problem, aiming at finding partitions of similar individuals of the knowledge base according to a cohesion criterion quantifying the degree of homogeneity of the individuals in an element of the partition. Specifically, the problem has been cast as a conceptual clustering problem, where the goal is both to find the best possible partitioning of the individuals and to induce intensional definitions of the corresponding classes expressed in the standard representation languages. Emerging disjointness axioms are captured by the employment of terminological cluster trees (TCTs) and by minimizing the risk of mutual overlap between concepts. Once the TCT is grown, groups of (disjoint) clusters located at sibling nodes identify concepts involved in candidate disjointness axioms to be derived. Unlike [59,60], which rely on the statistical correlation between instances, the empirical evaluation of [50] showed its ability to discover disjointness axioms also involving complex concept descriptions, thanks to the exploitation of the underlying ontology as background knowledge.
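The extensional intuition shared by these approaches can be made concrete with a small sketch. This is a deliberately simple heuristic invented for illustration (none of the cited systems works exactly this way): it proposes a disjointness axiom for every pair of concepts whose known instance sets are both non-trivial and do not overlap, and the comment stresses that under the OWA an empty overlap is only evidence, not proof, of disjointness.

```python
from itertools import combinations

def candidate_disjointness(instances, min_size=2):
    """Propose candidate disjointness axioms for concept pairs whose known
    instance sets do not overlap. Purely extensional: under the OWA an empty
    overlap is evidence of disjointness, not proof of it."""
    axioms = []
    for a, b in combinations(sorted(instances), 2):
        ia, ib = instances[a], instances[b]
        if len(ia) >= min_size and len(ib) >= min_size and not ia & ib:
            axioms.append((a, b))
    return axioms

# Toy assertional knowledge: known instances per concept name.
instances = {
    "Person": {"alice", "bob", "carol"},
    "Organization": {"acme", "unibo"},
    "Researcher": {"alice", "bob"},   # overlaps Person, so never proposed
}
print(candidate_disjointness(instances))
# prints [('Organization', 'Person'), ('Organization', 'Researcher')]
```

A `min_size` threshold is included because proposing disjointness from a single known instance would be far too weak a signal; the clustering-based approach in [50] replaces this naive pairwise check with cohesion-driven partitions and intensional cluster descriptions.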
Numeric-based methods for the Semantic Web
Whilst symbolic methods adopt symbols for representing the entities and relationships of a domain and infer generalizations that provide new insights into the data and are ideally readily interpretable, numeric-based methods typically adopt (propositional) feature-vector representations: they cannot provide interpretable models, but they usually prove rather scalable [40].
The problem that has been mainly investigated in the SW context by adopting numeric solutions is link prediction, which amounts to predicting the existence (or the probability of correctness) of triples in (a portion of) the Web of Data. The data are considered in their graph representation: mostly the RDF representation language has been targeted, almost no reasoning is exploited, and the more expressive SW languages are basically discarded. The attention towards this problem has also grown with the rise of KGs, which are known to be often missing facts [61]. In the KG context, link prediction is also referred to as knowledge graph completion. Methods borrowed from Statistical Relational Learning (SRL) [27] (whose main goal is the creation of statistical models for relational/graph-based data) have mostly been developed. In the following, the main classes of methods and solutions targeting link prediction in the SW are analyzed.
Probabilistic latent variable models
Probabilistic latent variable models explain relations between entities by associating each resource with a set of intrinsic latent attributes (i.e. attributes not directly observable in the data) and by conditioning the probability distribution of the relations between two resources on their latent attributes. All relations are considered conditionally independent given the latent attributes. This allows the information to propagate through the network of interconnected latent variables.
One of the first numeric-based link prediction solutions belonging to this category is the Infinite Hidden Semantic Model (IHSM) [48]. It formalizes a probabilistic latent variable model that associates a latent class variable with each resource/node and makes use of constraints expressed in First Order Logic during the learning process. IHSM showed promising results but proved limited in scaling to large SW data collections because of the complexity of probabilistic inference and learning, which is intractable in general [34].
Embedding models
With the goal of scaling to very large SW data collections, embedding models have been investigated. Similarly to probabilistic latent variable models, in embedding models each resource/node is represented with a continuous embedding vector encoding its intrinsic latent features within the data collection. Models in this class do not necessarily rely on probabilistic inference for learning the optimal embedding vectors, which makes it possible to avoid the issues related to the normalization of probability distributions, which may lead to intractable problems.
One of the first solutions belonging to this category is
Nevertheless, since embedding models proved an interesting ability to scale while maintaining performance comparable to probabilistic latent variable models in terms of predictive accuracy [7], with the goal of improving the model training phase employed by
An aspect that needs to be highlighted is that, since the RDF representation is targeted, most of the considered data collections only contain positive (training) examples, since false facts are usually not encoded. As training a learning model on all-positive examples can be tricky, because the model might easily overgeneralize, two different approaches are generally adopted for obtaining negative examples: either perturbing true/observed triples with the goal of generating plausible negative examples, or making a local closed-world assumption (LCWA) under which the data collection is assumed to be locally complete [44].
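The first strategy, corrupting observed triples, can be sketched as follows. The triples and entities are invented for the example; the rejection of corruptions that happen to be observed facts is a common precaution (a corrupted triple already present in the data would be a true fact, not a negative), though under the OWA even the accepted corruptions are only presumed false.

```python
import random

def corrupt(triple, entities, observed, rng):
    """Generate a plausible negative example by replacing the subject or the
    object of an observed triple, rejecting candidates that are themselves
    observed (those might be true facts rather than negatives)."""
    s, p, o = triple
    while True:
        e = rng.choice(entities)
        cand = (e, p, o) if rng.random() < 0.5 else (s, p, e)
        if cand not in observed:
            return cand

# Toy positive-only training data.
observed = {("anna", "worksAt", "uni"), ("ben", "worksAt", "lab"),
            ("uni", "locatedIn", "rome")}
entities = sorted({x for s, p, o in observed for x in (s, o)})
rng = random.Random(0)
negatives = [corrupt(t, entities, observed, rng) for t in sorted(observed)]
print(negatives)
```

During the training of an embedding model, one such corruption is typically drawn per positive triple at every epoch, so the model learns to score observed triples above their perturbed counterparts.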
Vector space embeddings for propositionalization
A complementary research direction has focused on the exploitation of vector space embeddings for obtaining a propositional feature-vector representation of RDF data collections. Specifically, inspired by the data mining (DM) literature on propositionalization [35], that is, a collection of methods for transforming a relational data representation into a (numeric) propositional feature-vector representation so that scalable propositional DM/ML methods can be applied, RDF2Vec [49] has been proposed. It formalizes a solution for learning latent numeric representations of entities in RDF graphs by adapting language modeling approaches. A two-step approach is adopted: first, the RDF graph is converted into a set of sequences of entities (for this purpose, two different approaches using local information are exploited: graph walks and Weisfeiler–Lehman Subtree RDF graph kernels); in the second step, the obtained sequences are used to train a neural language model estimating the likelihood of a sequence of entities appearing in a graph. The outcome of the training process is a representation of each entity in the graph as a vector of latent numerical features. DBpedia and Wikidata have been processed. In order to show that the obtained vector representation is independent of task and algorithm, an experimental evaluation involving a number of classification and regression tasks has been performed.
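The first of the two steps, turning the graph into sequences via random walks, can be sketched as below. This is a simplified illustration, not the RDF2Vec implementation: the tiny graph is invented, and the resulting token sequences would in practice be fed to a neural language model (e.g. a word2vec-style skip-gram model) to obtain the entity vectors.

```python
import random

def random_walks(graph, depth=4, walks_per_entity=2, seed=0):
    """Convert an RDF graph, given as (s, p, o) triples, into sequences of
    entity and predicate tokens via random graph walks of bounded depth."""
    rng = random.Random(seed)
    out = {}
    for s, p, o in graph:
        out.setdefault(s, []).append((p, o))
    walks = []
    for entity in sorted(out):
        for _ in range(walks_per_entity):
            walk, node = [entity], entity
            # Each hop appends a predicate and an object token.
            while node in out and len(walk) < 2 * depth + 1:
                p, o = rng.choice(out[node])
                walk += [p, o]
                node = o
            walks.append(walk)
    return walks

# Toy RDF graph (entity names invented for the example).
graph = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
    ("dbr:Berlin", "dbo:population", "3.6M"),
]
for w in random_walks(graph, depth=2):
    print(" -> ".join(w))
```

Each walk is a "sentence" whose tokens are resources and predicates; the language model then places entities that occur in similar graph contexts close together in the vector space.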
An upgrade of RDF2Vec has been presented in [11]. The proposed solution is grounded on the exploitation of global patterns, differently from RDF2Vec, which exploits local patterns. Neither of the two solutions can cope with literals.
Machine Learning for the Semantic Web: Next research directions
In this section, the most challenging envisioned research problems are illustrated. Then, additional ML settings and methods that could be usefully adopted for SW-related issues are briefly discussed.
Research problems
The need to cope with the fast growth of the Web of Data and the emerging very large KGs has required the SW community to show its ability to manage such a tremendous amount of data and knowledge.
This has mostly motivated the attention towards numeric ML methods, particularly for providing scalable solutions to manage the inherent incompleteness of the Web of Data. Indeed, current symbolic methods are not actually comparable, in terms of scalability, to numeric-based solutions. This gain does not come for free. It is obtained by giving up the expressive representation languages, such as OWL, that the SW community contributed to standardize with the goal of formalizing rich and expressive knowledge, but also by forgetting one of the most powerful characteristics of these languages, that is, being empowered with deductive reasoning capabilities that allow new knowledge to be derived. This means losing knowledge that is already available. Indeed, as illustrated in Section 3, almost all numeric methods focus on RDF as a representation language, and nearly no reasoning capabilities are exploited. Furthermore, differently from symbolic methods, numeric-based solutions lack the ability to provide interpretable models (see Section 3), thus limiting the possibility of interpreting and understanding the motivations for the returned results. Additionally, tasks such as learning concept descriptions or disjointness axioms cannot be performed without symbol-based methods, which can certainly benefit from very large amounts of information to provide potentially more accurate results.
Some discussions in this direction have been developed by the Neural-Symbolic Learning and Reasoning community [17,29], which seeks to integrate principles from neural network learning and logical reasoning. The main conclusion has been that neural-symbolic integration appears particularly suitable for applications characterized by the joint availability of large amounts of (heterogeneous) data and knowledge descriptions, which is actually the case for the Web of Data. A set of key challenges and opportunities has been outlined [17], such as: how to represent expressive logics within neural networks, how neural networks should reason with variables, or how to extract symbolic representations from trained neural networks.
Some preliminary results for some of these challenges have recently been provided. For instance, SimplE [33] has been proposed: a scalable tensor-factorization model that is able to learn interpretable embeddings and to incorporate logical rules through weight tying. Ideas for extracting propositional rules from trained neural networks given a SW background knowledge have also been illustrated [36], showing that the exploitation of background knowledge allows the extracted rule set to be reduced while reproducing the input-output function of the trained neural network. A conceptual sketch for explaining the classification behavior of artificial neural networks in a non-propositional setting, while using SW background knowledge bases, has been proposed in [53]. These initial results show the feasibility of this research direction while remarking the importance of pursuing this goal.
Hence, providing an explanation means opening the box of the reasoning process and making it understandable. In a complex setting such as the Web of Data, where knowledge may result from an automatic information acquisition and integration process over different sources, and may thus be noisy and contain conflicting information, multiple reasoning paradigms may be required, e.g. deduction (when rules and a theory are available), induction (for building models from the available knowledge), abduction (for filling in partial models and coping with an incomplete theory), commonsense reasoning, etc. Large research efforts have been devoted to studying each paradigm; however, in the considered complex scenario, multiple paradigms could be needed at the same time. This may require the formalization of a unifying reasoning framework.
Additional Machine Learning settings
As for the settings to be exploited, multiple research directions still need to be investigated. Several problems, such as instance retrieval but also link prediction and assertional knowledge completion, have been solved by casting them as classification tasks. However, as discussed in Section 2, when assessing the concept membership of an individual, it may turn out to be an instance of more than one concept at the same time. As such, a more suitable way to regard the problem is as a multi-label classification task [62], in which multiple labels (concepts, in this specific case) may be assigned to each instance. Some preliminary research has been presented in [41], focussing on type prediction in RDF data collections, where limited information from the available background knowledge is considered.
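The multi-label view can be illustrated with a binary-relevance sketch: each candidate type is decided independently, so an individual may receive several types at once. Everything here is invented for the example: the feature sets, the per-type scoring (average Jaccard similarity to the type's known members) is a deliberately simple stand-in for a trained per-label classifier, and the threshold is arbitrary.

```python
def predict_types(instance, type_instances, features, threshold=0.5):
    """Binary-relevance multi-label type prediction: score each type
    independently and return all types whose score clears the threshold."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    predicted = []
    for t, members in sorted(type_instances.items()):
        scores = [jaccard(instance, features[m]) for m in members]
        if sum(scores) / len(scores) >= threshold:
            predicted.append(t)
    return predicted

# Toy data: individuals described by the properties they are involved in.
features = {
    "alice": {"hasDegree", "authorOf"},
    "bob": {"hasDegree", "authorOf", "teaches"},
    "acme": {"hasCEO"},
}
type_instances = {"Person": ["alice", "bob"], "Researcher": ["alice", "bob"],
                  "Company": ["acme"]}
new_individual = {"hasDegree", "authorOf", "teaches"}
print(predict_types(new_individual, type_instances, features))
# prints ['Person', 'Researcher']
```

The key point is that the output is a set of labels rather than a single one, matching the non-disjointness of SW concepts discussed in Section 2.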
Multiple-instance learning (MIL) [8] is also a setting that would deserve investigation. It deals with the problem of incomplete knowledge concerning labels in training sets, as happens in SW knowledge bases due to the OWA. MIL is a type of supervised learning where training instances are not individually labeled but are collected in labeled bags. From a collection of labeled bags, the learner tries either (i) to induce a concept that will label individual instances correctly, or (ii) to learn how to label bags without inducing the concept. It may be fruitfully exploited for discovering correlations among resources and/or emerging concepts.
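The standard multiple-instance assumption underlying this setting can be stated in a few lines. The scenario below is invented for illustration (groups of resources as bags, with "being asserted a Researcher" as the hidden instance-level property): a bag is positive iff at least one of its individually unlabelled instances is positive, which is exactly the kind of weak supervision that incomplete, OWA-governed assertional knowledge provides.

```python
def label_bag(bag, is_positive):
    """Standard multiple-instance assumption: a bag of (individually
    unlabelled) instances is positive iff at least one instance is."""
    return any(is_positive(x) for x in bag)

# Toy scenario: bags are groups of resources; instance-level labels are
# incomplete, only bag-level labels would be available to the learner.
researchers = {"alice", "bob"}
bags = {"group1": {"alice", "acme"}, "group2": {"acme", "zorg"}}
labels = {name: label_bag(bag, lambda x: x in researchers)
          for name, bag in sorted(bags.items())}
print(labels)  # prints {'group1': True, 'group2': False}
```

An MIL learner would work in the opposite direction: given only the bag labels, it would try to recover which instance-level property (here, being a researcher) explains them.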
Other settings that would be useful for coping with the large number of unlabelled instances are semi-supervised learning (SSL) [9] and learning from imbalanced data. SSL makes use of both labeled and unlabeled instances during the learning process, in order to surpass the classification performance that could be obtained by discarding the unlabeled data (as would happen in a supervised learning setting). Very few research efforts have been made in this direction; some initial results have been presented in [43], where a link prediction problem is solved in a transductive learning framework. In learning from imbalanced data [28,39], that is, data collections where the label distribution is not uniform, sampling techniques are usually adopted in order to create a balanced dataset to be subsequently used for the learning task. Ensemble methods, which combine multiple learning algorithms to obtain better predictive performance, could also be fruitfully adopted, as illustrated in [24,51], where a boosting [24] and a bagging [51] technique are respectively employed.
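One of the simplest sampling techniques mentioned above, random oversampling, can be sketched as follows; the dataset is invented for the example, and in practice more refined techniques (undersampling, synthetic example generation) are often preferred since plain duplication can encourage overfitting.

```python
import random

def oversample(examples, labels, seed=0):
    """Random oversampling: duplicate minority-class examples at random
    until every class reaches the size of the majority class."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out = []
    for y, xs in sorted(by_label.items()):
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out += [(x, y) for x in xs + extra]
    return out

# Toy imbalanced dataset: four positives, one negative.
data = ["a1", "a2", "a3", "a4", "b1"]
labels = ["pos", "pos", "pos", "pos", "neg"]
balanced = oversample(data, labels)
print(sum(1 for _, y in balanced if y == "neg"))  # prints 4
```

After oversampling, both classes contribute four examples, so a learner trained on `balanced` no longer sees the skewed label distribution of the original data.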
As a last point, considering the increasing volume of the Web of Data, online and incremental learning, in which input data is continuously used to further train and extend the learned model, should naturally be investigated. To the best of our knowledge, no research efforts have been made in this direction.
Conclusions
In this paper, the progress that has been made in the SW by exploiting ML methods has been surveyed. Specifically, symbol-based and numeric-based solutions have been analyzed, highlighting their main peculiarities and drawbacks. Finally, the main envisioned research directions have been outlined.
