Abstract
Symbolic Systems in Artificial Intelligence which are based on formal logic and deductive reasoning are fundamentally different from Artificial Intelligence systems based on artificial neural networks, such as deep learning approaches. The difference is not only in their inner workings and general approach, but also with respect to capabilities. Neural-symbolic Integration, as a field of study, aims to bridge between the two paradigms. In this paper, we will discuss neural-symbolic integration in its relation to the Semantic Web field, with a focus on promises and possible benefits for both, and report on some current research on the topic.
Approaches in Artificial Intelligence (AI) based on machine learning, and in particular those employing artificial neural networks, differ fundamentally from approaches that leverage knowledge bases to perform logical deduction and reasoning.¹
¹ We focus herein on deductive reasoning. Logical inductive and abductive reasoning have also been looked at in the Semantic Web context, e.g. [14,22], but to keep the discussion concise, we have not included them in this treatise.
Symbolic and subsymbolic systems are rather complementary to each other. For example, the key strengths of subsymbolic systems are weaknesses of symbolic ones, and vice versa. Symbolic systems are brittle, i.e., susceptible to data noise or minor flaws in the logical encoding of a problem, which stands in contrast to the robustness of connectionist approaches. But subsymbolic systems are generally black boxes in the sense that the systems cannot be inspected in ways that provide insight into their decisions (despite some recent progress on this in the wake of the explainable AI effort) while symbolic knowledge bases can in principle be inspected to interpret how a decision follows from input. Most importantly, symbolic and subsymbolic systems contrast in the types of problems and data they excel at. Scene recognition from images appears to be a problem which in general lies outside the capabilities of symbolic systems, for example, while complex planning scenarios appear to be outside the scope of current deep learning approaches.²
² The topic is being investigated, of course, and some recent progress has been made. E.g., [1] reports on an application of deep learning to planning, and explicitly frames it as work towards bridging the “subsymbolic-symbolic boundary.”
On a more technical level, symbolic and subsymbolic systems differ fundamentally in how they represent data, information, or knowledge. Symbolic systems typically utilize structured representation languages, e.g. stemming from formal logic and the subfield of AI known as knowledge representation and reasoning. Trainable artificial neural networks, on the other hand, typically use representations based on high-dimensional Euclidean space, i.e. real-valued vectors, matrices, etc., and it is by no means obvious how reconciliations between these representation forms can be designed.³
³ It is possible to establish a formal, mathematical bridge in some cases, as e.g. laid out in [31], but so far with limited applicability [3].
The complementary nature of these methods has drawn a divide in the rich field of AI. The divide is technical in nature, as symbol manipulation as captured by logical, deductive reasoning, which lies at the core of symbolic approaches, cannot be sufficiently performed using current subsymbolic systems. Moreover, the training needed to study subsymbolic systems (involving probability theory, statistics, linear algebra, and optimization) differs from that needed for symbolic systems (involving logic and propositional calculus, set and recursion theory, and advanced computability reasoning) so strongly that AI researchers tend to find a side of the divide based on their intellectual interests and background. The divide is also cultural in nature, one of mindsets and prior beliefs, that in the past could sometimes split the academic AI research community by provoking (heated) fundamental discussions. The divide is even geographical: the prevalence of researchers working on symbolic approaches is much higher in the European Union than in the United States.
Neural-Symbolic Integration [2,4,16,28],4
See also
In the following, we will first lay out, in more detail, promises and possible benefits of neural-symbolic integration research for the Semantic Web. Then we will look at potential benefits of Semantic Web and neural-symbolic integration research for deep learning. Finally, we will also provide brief pointers to some current research going on in relation to this theme.
One of the issues that plagues the Semantic Web (as well as many other fields in Computer Science and its applications) is the knowledge acquisition bottleneck. It refers to the difficult issue of encoding or otherwise storing knowledge, as structured information, for use in Computer Science applications. The manual encoding of such information, e.g. from human experts’ knowledge, is a slow and time-consuming, thus costly, process involving both topic experts and knowledge engineers. At the same time, automated methods are a far cry from producing artifacts (e.g., from textbooks, technical documentation, and other written sources) which would be of sufficient quality for use in intelligent systems applications based on logical inference, such as expert systems, or for data curation and integration.
The underpinnings of key Semantic Web standards, such as RDF [9] and OWL [29], are explicitly logical, which reflects that Semantic Web applications often rely on high data (and schema/ontology) quality, similar to knowledge bases used primarily for deductive reasoning. The knowledge acquisition bottleneck in the Semantic Web field is very noticeable, e.g., given that the creation of ontologies as well as the creation of high-quality knowledge graphs involves high amounts of human expert labour and is correspondingly expensive.
The promise of integrated neural-symbolic systems is that they would be capable of both learning and (deductive) reasoning, and thus that they would be able to acquire, through machine learning, knowledge which is of sufficiently high quality to perform deductive reasoning. This anticipated capability directly addresses the knowledge acquisition bottleneck. There is, thus, a promise in this line of work that integrated neural-symbolic systems will lead to better methods for automated ontology construction, for ontology population (and, thus, knowledge graph construction), for ontology alignment, and for assessing the quality of knowledge graph content, and will similarly benefit other major lines of research central to the Semantic Web field.
At the same time, integrated neural-symbolic systems carry the promise of being able to perform deductive reasoning – after training – using a (highly parallel) artificial neural network architecture. Consequently, reasoning using such systems can be expected to be extremely fast. This contrasts with traditional deductive reasoning methods, which are usually designed to be provably sound and complete but suffer from long algorithm runtimes. While there has been significant progress on developing highly efficient deductive reasoning engines for Semantic Web content, this remains an issue given ever-increasing availability of data. In fact, the underlying problem is fundamental, as sound and complete reasoning over Semantic Web data necessarily suffers from high computational complexity [30].
Integrated neural-symbolic systems would perform reasoning after training, and presumably this form of reasoning would not be provably sound and complete, but would trade correctness guarantees for higher runtime efficiency, in the spirit of approximate reasoning; see e.g. [32] for an exposition of the underlying rationale. As such, integrated neural-symbolic systems carry a promise to elevate deductive Semantic Web reasoning to much larger amounts of data.
With integrated neural-symbolic systems capable of approximate deductive reasoning, this would furthermore open up possible investigations into combining deductive and inductive reasoning, as well as common-sense reasoning based e.g. on natural language, within a single (artificial neural network based) system.
Side products of such approaches would also be, e.g., entity encodings in formats suitable for artificial neural networks, such as vector or matrix representations. These in turn could be utilized to assess entity similarity with potential applications in data integration. Such encodings could furthermore be used as a sort of compression for data transfer and storage.
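To make the idea of assessing entity similarity from vector encodings concrete, the following minimal sketch compares entities by cosine similarity. The entity names and the three-dimensional vectors are hypothetical, hand-picked values; a real system would learn the encodings, and cosine similarity is only one of several possible similarity measures.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical, hand-picked encodings (assumption: a trained model would supply these).
entity_vectors = {
    "ex:Paris": [0.90, 0.10, 0.30],
    "ex:Paris_France": [0.88, 0.12, 0.31],
    "ex:Banana": [0.05, 0.95, 0.20],
}

def most_similar(entity, vectors):
    """Return the other entity whose vector has the highest cosine similarity."""
    candidates = [
        (cosine_similarity(vectors[entity], vec), name)
        for name, vec in vectors.items()
        if name != entity
    ]
    return max(candidates)[1]
```

In a data integration setting, pairs of entities from different sources whose similarity exceeds some threshold could be proposed as candidates for alignment, to be confirmed by other means.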
Benefits of Semantic Web technologies and neural-symbolic integration for deep learning
Semantic Web Technologies are designed for enabling better and more efficient data sharing, discovery, integration and reuse. These data management core capabilities of Semantic Web Technologies are designed to ease the data curation and preparation burden for the training of deep learning systems. Semantic Web data, provided in large amounts and freely available on the Web [51], furthermore provides a rich resource for training data, and deductive reasoning methods over such data can further extend it.
Integrated neural-symbolic systems will furthermore make it possible to utilize background knowledge, given as knowledge graphs or ontologies, as part of deep learning applications. Promises of this include the leveraging of background knowledge and deductive reasoning aspects for improved trainability, but also for interpreting trained deep learning systems by means of background knowledge. The former aspect attempts to reinforce the usefulness of deep learning models through injection of knowledge and has been successfully used in task-oriented conversational AI systems [23] and question answering [44]. The latter aspect touches on the Explainable AI theme currently being discussed, which aims at addressing the black-box nature of deep learning systems by making them more transparent, understandable, verifiable, and trustworthy. Most of the current work on this topic attempts to explain system behavior by means of input or output features; however, explanations by way of background knowledge carry the promise of being much closer to human conceptualizations, and thus more useful in applications.
Integrated neural-symbolic systems which incorporate deductive reasoning capabilities could furthermore naturally combine these with inferences based on statistics or similarities, including natural-language common-sense reasoning as demonstrated by some deep learning approaches. Such combinations should naturally lead to stronger deep learning systems.
Neural-symbolic systems have already been used on linked datasets like Freebase and DBpedia for different tasks like link prediction [62] and noise tolerant RDFS reasoning [41]. The links between linked datasets could further allow neural-symbolic systems to both integrate and reason over information coming from different sources. The advantage of this is twofold: firstly, the combined information can be used to extend the amount of training data for neural-symbolic systems; secondly, a neural-symbolic system can be used to learn to reason over a single knowledge graph and then links can be used as entry points to reason over a different one. This could be useful in contexts in which it is costly to learn to reason over a large dataset; one could thus use neural-symbolic methods over a smaller one (or a part of the large one) and then use the learned capabilities over the large one.
Recent years have also seen some progress in zero/few-shot relation learning over knowledge graphs, utilizing deep learning [12]. Zero/few-shot relation learning refers to the ability of the deep learning model to infer new relations between pairs of entities where that relation has not been seen or has only occurred a few times before in the training set [7]. This generalization capability is still quite limited and fundamentally different from the efforts that have been made under transfer learning and the domain adaptation paradigm in other machine learning tasks.
Selection of recent related work
Deductive Semantic Web reasoning using deep learning
Deductive reasoning over RDF(S) and OWL data has become a part of the standard toolbox for knowledge graphs, and the use of neural-symbolic systems for this purpose has begun to be investigated.
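For reference, the symbolic task that such systems are trained to emulate can be sketched as forward chaining over entailment rules. The minimal example below applies only two RDFS rules (rdfs9 and rdfs11 of the RDF Semantics specification) to triples written as plain string tuples; the `ex:` names are hypothetical, and a full RDFS reasoner covers more rules than shown here.

```python
def rdfs_closure(triples):
    """Forward-chain two RDFS entailment rules until no new triple appears.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B) => (x type B)
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        new = set()
        for a, b in subclass_pairs:
            for s, p, o in inferred:
                if p == "rdfs:subClassOf" and s == b:
                    new.add((a, "rdfs:subClassOf", o))  # rdfs11
                if p == "rdf:type" and o == a:
                    new.add((s, "rdf:type", b))         # rdfs9
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

# Hypothetical toy graph.
graph = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
closure = rdfs_closure(graph)
```

A neural reasoner would be trained on pairs of input graphs and such closures, and evaluated on how closely its output matches the symbolically computed one.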
[41] has proposed a noise-tolerant algorithm for deep-learning-based reasoning designed specifically for RDF(S) knowledge graphs. They have introduced a layered graph model representation of RDF graphs based on their predicates, in the form of 3D adjacency matrices where each layer layout forms a graph word. Each input graph and its corresponding entailments are then represented as a sequence of graph words and fed to a neural machine translation model. Their results show the noise-tolerance capabilities of their deep model compared to its symbolic counterpart. However, evaluation and training are done on a dataset that uses only one ontology for the inference, i.e., there is no learning of the general logical deduction calculus, and consequently no transfer thereof to new data.
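The layered representation can be pictured with a small sketch: one 0/1 adjacency matrix per predicate, stacked into a three-dimensional structure. This is only an illustration of the general idea under our own assumptions (sorted index order, nested Python lists, hypothetical `ex:` names); it does not reproduce the graph-word encoding or any other detail of [41].

```python
def layered_adjacency(triples):
    """Build one 0/1 adjacency matrix per predicate, returned as a 3D nested list.

    tensor[k][i][j] == 1 iff (nodes[i], predicates[k], nodes[j]) is a triple.
    """
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    predicates = sorted({p for _, p, _ in triples})
    node_index = {n: i for i, n in enumerate(nodes)}
    pred_index = {p: k for k, p in enumerate(predicates)}
    size = len(nodes)
    tensor = [[[0] * size for _ in range(size)] for _ in predicates]
    for s, p, o in triples:
        tensor[pred_index[p]][node_index[s]][node_index[o]] = 1
    return tensor, nodes, predicates

# Hypothetical toy graph.
example_graph = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "rdf:type", "ex:Person"),
]
tensor, nodes, predicates = layered_adjacency(example_graph)
```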
[33] applies Recursive Reasoning Networks (RRN) to OWL RL reasoning where recursive update layers are used to update the individual embeddings using the relations and class memberships in the knowledge base. Their results show the potential of neural-symbolic methods to attain accuracy similar to symbolic methods. However, as for the above mentioned [41], re-training is required for new ontologies to learn the embeddings for the new vocabularies in the ontology, i.e., the approach does not natively support transfer to new data.
[20] addresses the transferability issue by adapting end-to-end memory networks for emulating deductive RDFS reasoning. Transfer was achieved primarily by utilizing a preprocessing step consisting of a normalization. It was demonstrated that the resulting approach can perform reasoning over previously unseen RDFS knowledge graphs.
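The text above does not spell out the normalization, so the following sketch shows only one plausible reading, stated here as an assumption: every term outside the RDF/RDFS vocabulary is renamed to a placeholder token in order of first appearance, so that structurally identical graphs over different vocabularies yield identical inputs. The function name, the placeholder scheme, and the `ex:`/`bio:` names are hypothetical.

```python
RESERVED_PREFIXES = ("rdf:", "rdfs:")  # vocabulary kept as-is (assumption)

def normalize(triples):
    """Rename every non-reserved term to a placeholder, in order of first use."""
    mapping = {}

    def rename(term):
        if term.startswith(RESERVED_PREFIXES):
            return term
        if term not in mapping:
            mapping[term] = f"e{len(mapping)}"
        return mapping[term]

    normalized = [(rename(s), rename(p), rename(o)) for s, p, o in triples]
    return normalized, mapping

# Two hypothetical graphs with the same structure but different vocabularies.
animals = [
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
]
plants = [
    ("bio:Oak", "rdfs:subClassOf", "bio:Tree"),
    ("bio:t1", "rdf:type", "bio:Oak"),
]
```

Because both graphs normalize to the same token sequence, a model trained on one can be applied to the other without learning new vocabulary embeddings; the returned mapping allows inferred triples to be translated back.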
Knowledge graph embeddings
With the recent revival of interest in artificial neural networks, neural link prediction models have been applied extensively for the completion of knowledge graphs, understood in the sense of link-prediction.⁵
⁵ Traditionally, “completion” in the context of RDF(S) referred only to materialization of logical consequences; more recently, the term has also been used to refer to the adding of new relationships (graph edges) based on statistical or NLP methods.
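As an illustration of link prediction with embeddings, the sketch below uses a translation-based scoring function in the style of TransE, a model not named in the text above and chosen here only as a well-known example. A candidate triple scores highly when the head vector plus the relation vector lands close to the tail vector. All vectors are hypothetical, hand-picked values rather than trained embeddings.

```python
import math

def translation_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def predict_tail(head_name, relation_name, entities, relations):
    """Return the entity (other than the head) that best completes the triple."""
    head = entities[head_name]
    relation = relations[relation_name]
    scored = [
        (translation_score(head, relation, vec), name)
        for name, vec in entities.items()
        if name != head_name
    ]
    return max(scored)[1]

# Hypothetical two-dimensional embeddings (assumption: normally learned from data).
entities = {
    "ex:Berlin": [1.0, 0.0],
    "ex:Germany": [1.0, 1.0],
    "ex:Paris": [3.0, 0.0],
    "ex:France": [3.0, 1.0],
}
relations = {"ex:capitalOf": [0.0, 1.0]}
```

Ranking all candidate tails by this score and adding the top-ranked ones as new edges is what completion in the link-prediction sense amounts to; unlike materialization, the added edges are statistical guesses and not logical consequences.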
The use of additional information, such as text, can increase the quality of the representation [65,67,68]. Moreover, embedded representations of knowledge graphs can be extended by considering the logical axioms that appear in a knowledge base; for example, complex logical formulas can be aggregated using fuzzy logic [27].
A recent trend in knowledge graph embedding concerns approaches that use hyperbolic geometry in place of Euclidean geometry [46,54]; hyperbolic geometry generally appears to be more suited to represent hierarchical structures like terminologies and ontologies.
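One way to see why hyperbolic space suits hierarchies is to look at a concrete distance function. The sketch below assumes the Poincaré ball model, which is one common choice and not necessarily the model used in the cited works. Distances grow rapidly towards the boundary of the unit ball, which leaves room for the exponentially many leaves of a tree.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

# Hypothetical placement: a root concept at the origin, two leaves near the boundary.
root = [0.0, 0.0]
leaf_a = [0.9, 0.0]
leaf_b = [0.0, 0.9]
```

With this placement, the two leaves are each closer to the root than to one another, mirroring a tree in which the shortest path between siblings runs through their common ancestor.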
Node2vec [24] is a widely adopted approach that combines random walks and natural language techniques [43] to efficiently generate vector representations of network nodes; it has also been used to support knowledge graph embeddings [48]. In the same line of work, RDF2Vec [52] embeds RDF-based entities in a vector space by applying word embedding techniques [43] over a virtual document that contains lexicalized RDF graph walks; thus the generated representations are based on token-token co-occurrences.
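The walk-generation step can be sketched as follows. This is a simplified, uniform random walk over outgoing edges, written under our own assumptions (function name, parameters, and the hypothetical `ex:` graph); it omits the biased walk strategy of Node2vec and the specific walk variants of RDF2Vec, and the subsequent word-embedding training is not shown.

```python
import random

def graph_walks(triples, start, depth, num_walks, seed=0):
    """Generate token sequences by walking outgoing edges from a start node.

    Each walk alternates node and predicate tokens:
    [node, predicate, node, predicate, node, ...]
    """
    rng = random.Random(seed)
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    walks = []
    for _ in range(num_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = outgoing.get(node)
            if not edges:
                break  # dead end: no outgoing edge from this node
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks

# Hypothetical toy graph.
example_triples = [
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:hasOwner", "ex:alice"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:alice", "ex:livesIn", "ex:Vienna"),
]
walks = graph_walks(example_triples, "ex:rex", depth=2, num_walks=5)
```

The collection of walks over all entities plays the role of the virtual document: each walk is treated as a sentence and each node or predicate as a token, so entities that occur in similar walk contexts receive similar vectors.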
While most knowledge graph embedding approaches rely on a single encoding of triples, there is a recent line of work that tries to leverage the information that can be found in longer paths using recurrent neural networks [15,70].
Also recently, a number of works have addressed the problem of generalizing neural networks to work on arbitrarily structured graphs [17,34], opening promising directions for future research on reasoning on structured data.
While deep learning is highly successful [36] and even surpasses human capabilities [59,60] in many fields, it also lacks transparency or interpretability [26,40] regarding how these systems produce a decision. In safety-critical applications, e.g., in medical, legal or military contexts, this is deemed insufficient. Consequently, researchers are investigating how to produce explanations for the behavior of deep learning systems [25].
Explanations [55,71,72] produced from deep learning systems are mostly statistical and help to understand how a system produces its output; the additional use of domain information helps to enhance [19] the explanation. [49] used an ontology-based deep learning model which predicts human behaviour via Restricted Boltzmann Machines [61] and produces explanations of the output using domain ontologies. In the domain of transfer learning, [11] used domain knowledge to enhance explanations of which features are beneficial for the transfer and which are not.
[73] shows the use of semantic annotations to label objects in the hidden layers of popular CNN architectures. Labels ranging from colors, materials, textures, parts, objects and scenes help to get a better understanding of hidden parts of the deep network. Although the labels are not semantically structured, this shows that background knowledge can help to improve explainability.
[57] provides a feasibility study on how domain ontologies together with description logic based concept induction [37,56] can be used to explain the input-output behaviour of trained deep neural networks.
Although explanations produced solely using statistical techniques are beneficial, they are far from being trustworthy explanations [35,40]. The main limitation of statistical methods is that they do not take domain knowledge or general background knowledge into account when producing the output. A combined effort to use statistical techniques with Semantic Web technologies should be helpful to provide trustworthy explanations. An overview of using knowledge graphs to enhance explanation, and possible limitations of this, is described in [35].
Other systems for deductive reasoning using deep learning
The Neural Theorem Prover (NTP) [53] is an extension of the Prolog language in which strict atom unification is replaced with similarity of atoms in an embedded space; while originally NTP suffered from scalability issues, due to the complexity of the approach, there is evidence that proof-path selection strategies can reduce the complexity impact [45].
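The replacement of strict unification by similarity can be illustrated with a small sketch. Two symbols unify to the degree that their embeddings are close, and an atom matches a fact to the degree of its weakest symbol match. The exponential kernel, the min aggregation, and all vectors below are assumptions made for illustration, not a reproduction of the implementation in [53].

```python
import math

def soft_unify(u, v):
    """Similarity in (0, 1]: 1.0 for identical vectors, decaying with distance."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return math.exp(-distance)

def atom_match_score(query, fact, embeddings):
    """Score a (predicate, arg1, arg2) query against a fact.

    The overall score is the weakest of the position-wise symbol similarities.
    """
    return min(soft_unify(embeddings[q], embeddings[f]) for q, f in zip(query, fact))

# Hypothetical symbol embeddings (assumption: normally learned end to end).
symbol_vectors = {
    "grandpaOf": [1.0, 0.0],
    "grandfatherOf": [0.9, 0.1],
    "likes": [-1.0, 0.5],
    "abe": [0.0, 1.0],
    "bart": [0.5, -1.0],
}
query = ("grandpaOf", "abe", "bart")
```

Under strict unification the query would fail against the fact `("grandfatherOf", "abe", "bart")` because the predicate symbols differ; with soft unification it succeeds with a high score, while a fact about `likes` receives a low one.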
DeepProbLog [42] is a programming language that combines a probabilistic logic with neural networks, thus offering a framework that combines the strengths from both approaches.
Logic Tensor Networks [58] (LTNs) combine deep neural networks and first order fuzzy logic. Elements of the logic language are embedded in a vector space (e.g., constants are represented as vectors while predicates are neural tensor networks [62] that have been used on simple reasoning tasks [8,62]). LTNs can be trained over both facts and rules and after training they can be used to make novel logical inferences over data. LTNs have been applied to semantic image interpretation tasks [18] but they have also been shown to have some computational limitations [5].
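To indicate how fuzzy logic turns a rule into a quantity that can be optimized, the sketch below evaluates the satisfaction degree of a universally quantified implication over truth degrees in [0, 1]. The specific operators (product t-norm, Reichenbach implication, mean as universal quantifier) are one possible choice and an assumption on our part; the truth degrees are hypothetical stand-ins for the outputs of neural predicates.

```python
def fuzzy_and(a, b):
    """Product t-norm."""
    return a * b

def fuzzy_not(a):
    """Standard negation."""
    return 1.0 - a

def fuzzy_implies(a, b):
    """Reichenbach implication: 1 - a + a * b."""
    return 1.0 - a + a * b

def forall(degrees):
    """Universal quantifier approximated by the arithmetic mean."""
    degrees = list(degrees)
    return sum(degrees) / len(degrees)

def rule_satisfaction(body, head):
    """Degree to which 'forall x: body(x) -> head(x)' holds over the given individuals."""
    return forall(fuzzy_implies(body[x], head[x]) for x in body)

# Hypothetical truth degrees, standing in for neural predicate outputs.
is_cat = {"ex:tom": 0.9, "ex:rex": 0.1}
is_animal = {"ex:tom": 0.8, "ex:rex": 0.95}
satisfaction = rule_satisfaction(is_cat, is_animal)
```

Since every operator is differentiable, the satisfaction degree can serve as a training objective: parameters of the predicates are adjusted so that facts and rules are satisfied as fully as possible.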
PossibleWorldNet [21] is a variant of Tree Neural Networks (TreeNN) which has been successfully used for conducting entailment over propositional logic formulas. To evaluate whether A entails B, the PossibleWorldNet generates a set of “possible worlds,” and then evaluates A and B in each of those worlds. Their results show the clear advantage of using this model compared to sequence-to-sequence models which would capture the structure implicitly.
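The possible-worlds idea has an exact classical counterpart, sketched below: A entails B if and only if B is true in every truth assignment in which A is true. This exhaustive check is not PossibleWorldNet itself, which evaluates formulas in a set of learned worlds; it only illustrates the semantics being approximated. Formulas are represented as Python callables over a world dictionary, which is a convention chosen here for brevity.

```python
from itertools import product

def entails(a, b, variables):
    """True iff formula b holds in every truth assignment that satisfies formula a."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if a(world) and not b(world):
            return False  # counterexample world found
    return True

# Formulas as callables over a world (a dict from variable name to truth value).
p_and_q = lambda w: w["p"] and w["q"]
p_or_q = lambda w: w["p"] or w["q"]
just_p = lambda w: w["p"]
```

For example, `entails(p_and_q, just_p, ["p", "q"])` holds, whereas `entails(p_or_q, just_p, ["p", "q"])` does not, because the world in which only `q` is true satisfies the premise but not the conclusion. The exhaustive check enumerates a number of worlds exponential in the number of variables, which is the cost a learned approximation seeks to avoid.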
Neural multi-hop reasoners [15,70] deal with more complex reasoning on large knowledge bases where multi-hop inference is required. They combine the rich multi-hop inference of the symbolic logical reasoning paradigm with the generalization capabilities of attention-based recurrent neural networks.
Conclusion
In the wake of deep learning, neural-symbolic approaches are receiving renewed attention. We have laid out promises of neural-symbolic integration research for the Semantic Web field, and vice versa. It appears to be reasonable to expect that the corresponding lines of research will receive growing attention in forthcoming years. E.g., several articles in this issue point in similar directions [13,35,38].
