Abstract
Knowledge graphs (KGs) contain rich resources that represent human knowledge in the world. There are mainly two kinds of reasoning techniques in knowledge graphs: symbolic reasoning and statistical reasoning. However, each of them has its own merits and limitations. Therefore, it is desirable to combine them to provide hybrid reasoning in a knowledge graph. In this paper, we present the first survey of methods for hybrid reasoning in knowledge graphs. We categorize existing methods based on the applications of reasoning techniques and introduce their key ideas. Finally, we re-examine the remaining research problems to be solved and provide an outlook on future directions for hybrid reasoning in knowledge graphs.
Introduction
With the rapid development of Internet technology and Web applications, large amounts of data have been published online, becoming an important source for large-scale knowledge extraction. How to organize, represent and analyze this knowledge has attracted much attention. Knowledge graphs (KGs) contain rich resources that represent human knowledge in the world. Most KGs are directed labeled graphs composed of entities (nodes) and various relations (different semantic labels on edges) [1]. A fact in a knowledge graph is usually represented as a triple of the form (head entity, relation, tail entity), indicating that two entities are connected by a specific relation, e.g. (Barack Obama, BornIn, Honolulu Hawaii). Recent years have witnessed rapid growth in open KGs such as DBpedia [2], YAGO [3], NELL [4] and Probase [5], which have been widely used to support real applications of the Semantic Web.
The quality of a knowledge graph is critical to its applications, such as question answering. Two important factors that influence the quality of a knowledge graph are its completeness and its logical coherence. Knowledge reasoning, which plays an important role in the services built on KGs, aims at inferring implicit knowledge to enrich incomplete KGs and refine their logical correctness. There are two mainstream techniques for knowledge reasoning. One is symbolic reasoning, which formalizes the problem in a semantic framework and infers implicit knowledge according to predefined rules. The other is statistical reasoning, which tries to find suitable statistical models to fit the samples and predicts the expected probabilities of inferred relations between entities.
Unfortunately, both symbolic reasoning and statistical reasoning have drawbacks in knowledge graph applications. Symbolic reasoning is often based on rules or schematic knowledge, which are hard to obtain. In contrast, statistical reasoning draws imprecise conclusions, and its results may be hard to explain. Therefore, many researchers have tried to combine the advantages of both, obtaining encouraging performance in related tasks such as knowledge completion [6,7], schema induction [8,9], knowledge alignment [10,11], question answering [12,13] and so on. For example, one can merge symbolic information (e.g. paths, contexts or logical rules) into a statistical framework so as to constrain the objective functions or refine the predicted results.
So far, there has been no systematic and in-depth survey of hybrid reasoning methods in KGs covering the various goals of reasoning. In this paper, we summarize the latest research progress of such methods in knowledge graphs and look forward to future development directions and prospects. Specifically, we first give a short introduction to knowledge graphs and analyze the pros and cons of symbolic reasoning and statistical reasoning, respectively, which motivates the necessity of hybrid reasoning. Next, we provide a thorough review of current methods for various goals of reasoning in KGs. Finally, we re-examine remaining research challenges and give an outlook on future directions for hybrid reasoning in KGs.
Hybrid reasoning in knowledge graph
In this section, we present a short introduction to knowledge graphs and motivate hybrid reasoning in a knowledge graph. So far, several works have tried to provide a formal definition of a knowledge graph [14,15]. However, none has become the standard definition, as the term “knowledge graph” can be viewed in different ways. In this paper, we do not intend to provide such a definition, but instead consider the characteristics of a knowledge graph given in [16,17]:
mainly describes real world entities and their interrelations, organized in a graph.
defines classes and properties of entities in a schema.
allows for potentially interrelating arbitrary entities with each other.
covers various topical domains.
As shown in Fig. 1, entities represent real-world individuals (e.g. “

An example for a part of a knowledge graph.
There are two kinds of knowledge in a knowledge graph: one is called schematic knowledge and the other is called factual knowledge. Schematic knowledge consists of statements about concepts and properties, and factual knowledge consists of statements about instances. For example, the triple ⟨Asian Country,
Knowledge graphs have their logical foundations based on ontology languages, such as the Resource Description Framework (RDF)
In this section, we roughly categorize hybrid reasoning techniques into four groups based on the goals of reasoning in KGs: knowledge completion, schematic knowledge induction, knowledge alignment, and multi-hop reasoning for question answering. We also introduce some other hybrid reasoning methods that are hard to categorize into these groups.
Knowledge completion
To deal with the problem of incompleteness in knowledge graphs, much work has been done to apply statistical relational learning (SRL) models [18] to infer implicit relations between two entities in a knowledge graph. The path ranking algorithm (PRA) [19] and knowledge graph embedding (KGE) [1] are two typical kinds of SRL methods, and both have been widely used in knowledge completion. In this subsection, we first introduce the path ranking algorithm. We then introduce three categories of knowledge graph embedding models.
Path ranking algorithm and its extensions
The path ranking algorithm, based on random walk techniques, was proposed for discovering complex path features of relational data [19]. The key idea of PRA is to employ the paths that connect two entities as features to predict potential relations between them. For example, ⟨bornIn, capitalOf⟩ is a path linking
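PRA's core idea of turning relation paths into features can be illustrated with a minimal sketch. The toy triples, entity names and the helper `relation_paths` below are purely illustrative, not from any real PRA implementation or dataset:

```python
from collections import defaultdict

# Toy KG as (head, relation, tail) triples; names are illustrative.
triples = [
    ("Marie", "bornIn", "Paris"),
    ("Paris", "capitalOf", "France"),
    ("Lyon", "locatedIn", "France"),
]

# Index outgoing edges per entity.
out_edges = defaultdict(list)
for h, r, t in triples:
    out_edges[h].append((r, t))

def relation_paths(source, target, max_len=2):
    """Enumerate relation-type paths (PRA-style features) from source to target."""
    paths = []
    stack = [(source, [])]
    while stack:
        node, path = stack.pop()
        if node == target and path:
            paths.append(tuple(path))
        if len(path) < max_len:
            for r, t in out_edges[node]:
                stack.append((t, path + [r]))
    return paths

print(relation_paths("Marie", "France"))  # [('bornIn', 'capitalOf')]
```

A path type such as ⟨bornIn, capitalOf⟩ then serves as one binary (or probability-weighted, via random walks) feature in a classifier predicting a relation such as nationality between the two entities.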
Merging relational paths in KGE
Knowledge graph embedding encodes components of a KG including entities and relations into continuous vector spaces [1]. There are mainly three types of KG embedding models. The first is translational distance models, such as TransE, which exploit distance-based scoring functions and measure the plausibility of a fact as the distance between two entities [6]. The second is semantic matching models, like RESCAL [7], which measure plausibility of facts by matching latent semantics of entities and relations embodied in their vector space representations. Another type of KG embedding models is based on language modeling approaches that employ unsupervised feature extraction from sequences of words. RDF2Vec [22] generates a set of sequences of entities using two different approaches, i.e. graph walks and Weisfeiler-Lehman Subtree RDF graph kernels. Then, the authors utilized those sequences to train Word2vec for estimating the likelihood of a sequence of entities appearing in a graph. Cochez et al. [23] exploited a global pattern instead of local sequences generated for nodes in RDF2Vec. The authors combined Global Vectors (GloVe) with Bookmark-Coloring Algorithm to efficiently learn embeddings of entities.
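The translational-distance idea behind TransE can be sketched in a few lines: a fact (h, r, t) is considered plausible when h + r lies close to t in the embedding space. The vectors below are toy values rather than trained embeddings:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE plausibility: lower score means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm)

# Toy 2-dimensional embeddings (illustrative values only).
h = np.array([0.1, 0.2])
r = np.array([0.3, -0.1])
t = np.array([0.4, 0.1])

print(transe_score(h, r, t))  # ~0: the translation h + r lands on t
print(transe_score(h, r, h))  # larger: a corrupted tail scores worse
```

Training then pushes scores of observed triples below those of corrupted triples via a margin-based ranking loss.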
As triples in KGs are not independent, the interrelations among triples should not be ignored; they provide context information that can improve existing KG embedding models. PTransE [24] extends TransE by modeling path-based representations: the authors utilized connected relational facts between entity pairs instead of considering only the direct relation between two entities. Since not all relational paths are reliable, they designed a path-constraint resource allocation algorithm to measure the reliability of relation paths and represented these paths via semantic composition of relation embeddings. GAKE [25] defines three types of graph contexts, which capture different kinds of structural information in KGs for representation learning. Therefore, the score function of GAKE takes into account the connection between target entities (or relations) and their contexts. In addition, the authors designed an attention mechanism to learn the representative power of different vertices or edges. Gao et al. [26] proposed a triple context-based embedding method, called TCE, for knowledge graph completion. TCE takes two kinds of structural information of each triple into consideration: one is the set of neighboring entities along with their outgoing relations, and the other is the set of relation paths that connect a pair of target entities.
Employing logical rules in KGE
Logical rules can also enhance the performance of KG embedding models for knowledge completion. Wang et al. [27] utilized such rules to refine embedding models. In their work, KG completion was formulated as an integer linear programming problem constrained by the rules, so the inferred facts were the most preferred by the embedding models while complying with all the rules. Similarly, Wei et al. [28] combined rules and embedding models via Markov logic networks, in which they incorporated the similarity prior generated by embedding-based models into inference and designed a grounding network sampling strategy to improve the inference precision. On the other hand, logical rules can be represented as Horn clauses, e.g.
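The symbolic half of such hybrid methods boils down to grounding a Horn rule over known facts to obtain constraints (or candidate facts) for the statistical model. The rule, facts and helper below are a toy illustration, not the formulation of any cited paper:

```python
# Illustrative Horn rule: bornIn(x, y) AND capitalOf(y, z) -> nationality(x, z)
facts = {
    ("Marie", "bornIn", "Paris"),
    ("Paris", "capitalOf", "France"),
}

def ground_rule(facts):
    """Derive the facts entailed by the example Horn rule above."""
    derived = set()
    for (x, r1, y1) in facts:
        if r1 != "bornIn":
            continue
        for (y2, r2, z) in facts:
            if r2 == "capitalOf" and y1 == y2:
                derived.add((x, "nationality", z))
    return derived

print(ground_rule(facts))  # {('Marie', 'nationality', 'France')}
```

In an ILP-style combination as in [27], such groundings become hard constraints that the embedding model's top-scored predictions must satisfy.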
Preserving logical properties in KGE
Another type of KG embedding method has been proposed for preserving the logical properties of semantic relations. On2Vec [33] employs translation-based embedding models for populating ontologies, integrating matrices that transform the head and tail entities in order to characterize the transitivity of some relations. To represent concepts, instances and relations differently in the same semantic space, TransC [34] encodes instances as vectors and concepts as spheres so that the transitivity of isA relations is preserved. Sun et al. [35] proposed a model based on complex spaces, called RotatE. It employs the properties of complex numbers to effectively characterize the symmetry, antisymmetry and composition of relations.
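RotatE's key mechanism, modeling each relation as a rotation in the complex plane so that t ≈ h ∘ r with each rotation component of unit modulus, can be sketched as follows (toy values, not trained embeddings):

```python
import numpy as np

def rotate_score(h, phase, t):
    """RotatE plausibility: lower is better; the relation is e^{i*phase}."""
    return np.linalg.norm(h * np.exp(1j * phase) - t)

# Toy 2-dimensional complex embeddings and per-dimension rotation angles.
h = np.array([1 + 0j, 0 + 1j])
phase = np.array([np.pi / 2, np.pi])
t = h * np.exp(1j * phase)

print(rotate_score(h, phase, t))  # ~0: rotating h by the relation yields t
```

Because rotating by phase1 and then phase2 equals a single rotation by phase1 + phase2, composed relations correspond to summed phases; a phase of pi gives a symmetric relation, and r and its inverse differ only in the sign of the phase, which is how RotatE captures composition, symmetry and antisymmetry.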
Schematic knowledge induction
Existing KGs contain lots of triples but lack schematic knowledge, e.g.
One main category of methods for producing schematic knowledge combines rule mining algorithms with symbolic reasoning. The works in [36,39] defined association rule patterns to generate various kinds of axioms and performed inconsistency handling for ontology construction by enriching an original schema incrementally. Considering the open world assumption adopted by KGs, Galárraga et al. [8] adopted the partial completeness assumption to generate counterexamples for rules and redefined the standard measures of support and confidence. Its extension AMIE+ [40] further improved the precision by using type hierarchies and joint reasoning when learning association rules. Inspired by these methods, Gao et al. [41] exploited a type inference algorithm and defined a mining model with probabilistic type assertions to deal with noisy negative examples, which could generate high-quality
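The contrast between standard confidence and confidence under the partial completeness assumption (PCA), as used by AMIE [8], can be shown with toy counts. Under the PCA, counterexamples are drawn only from subjects that already have some known value for the head relation; the facts below are illustrative, not from any benchmark:

```python
# Subject-object pairs satisfying the rule body, and known head-relation facts.
body = {("a", "b"), ("c", "d"), ("e", "f")}
head = {("a", "b"), ("c", "x")}

support = len(body & head)              # body pairs confirmed by the head
heads_known = {s for (s, _) in head}    # subjects with some known head value
pca_body = [(s, o) for (s, o) in body if s in heads_known]

std_conf = support / len(body)          # closed-world style: all body pairs count
pca_conf = support / len(pca_body)      # PCA: ("e", "f") is ignored, not a counterexample
print(support, std_conf, pca_conf)      # 1, 1/3, 1/2
```

The pair ("e", "f") is not counted against the rule under the PCA because nothing is known about "e" for the head relation, which is why PCA confidence is better suited to incomplete KGs.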
The other main category combines machine learning techniques with logical reasoning. The work in [9] used inductive logic programming, which integrated machine learning with logic programming, and defined an
Knowledge alignment
Over the past decades, more and more knowledge graphs have become available on the Web, but the heterogeneity and multi-linguality gaps among KGs still hinder their sharing and reuse in the Semantic Web. Benefiting from hybrid reasoning, studies of knowledge alignment have obtained some encouraging results.
Cross-lingual taxonomy alignment (CLTA) refers to mapping each category in the source taxonomy of one language onto the most relevant category in the target taxonomy of another language. However, existing methods for CLTA mainly rely on features based on symbolic similarities. Wu et al. [10] proposed a bilingual topic model, called the Bilingual Biterm Topic Model (BiBTM). After identifying matched categories based on string similarity, they trained BiBTM on textual contexts extracted from the Web and obtained a topic vector of the extracted textual context for each category. Finally, they utilized the cosine similarity between topic vectors to compute the taxonomy alignment. Furthermore, they improved the performance of the proposed models by merging explicit category correlations, including co-occurrence correlation and structural correlation [46].
In addition, there exist some works that employ embedding-based ideas [6] for entity alignment (EA) among knowledge graphs. MTransE [11] separately trains the entity embeddings of two KGs and designs different techniques to represent cross-lingual transitions, including axis calibration, translation vectors and linear transformations. JAPE [47] learns the embeddings of two KGs in a unified space and leverages the attributes of triples to refine entity embeddings. To deal with the lack of prior alignment, IPTransE [48] and BootEA [49] employ an iterative process and design several sophisticated strategies based on the structure of KGs to refine new alignments. Chen et al. [50] proposed a method called KDCoE, which co-trains the embeddings of multilingual KGs and descriptions of entities. To utilize various features of KGs, Zhang et al. [51] proposed a framework that unifies multiple views of entities and learns embeddings for entity alignment. Furthermore, they designed two cross-KG identity inference methods, at the entity level as well as the relation and attribute level, to preserve and enhance the alignment between different KGs.
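The linear-transformation idea used for cross-lingual transitions can be sketched as follows: given seed pairs of embeddings (h in KG1, h' in KG2), fit a matrix M such that M·h ≈ h', then apply M to place unaligned KG1 entities in KG2's space. The embeddings and the hidden map below are synthetic toy data, and the least-squares fit is a simplification of the joint training used by the actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
M_true = np.array([[0.0, -1.0],
                   [1.0, 0.0]])        # hidden ground-truth cross-KG map
H1 = rng.normal(size=(10, 2))          # seed entity embeddings in KG1
H2 = H1 @ M_true.T                     # their aligned counterparts in KG2

# Least-squares fit of the transformation: solve H1 @ M.T ~ H2 for M.
M_T, *_ = np.linalg.lstsq(H1, H2, rcond=None)
M = M_T.T

new_entity = np.array([1.0, 2.0])      # an unaligned KG1 entity
print(M @ new_entity)                  # its predicted position in KG2's space
```

Nearest-neighbour search around M·h in KG2's embedding space then yields alignment candidates for the new entity.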
Multi-hop reasoning for question answering
Question answering (QA) is a hot topic that has recently been facilitated by large-scale knowledge bases. However, due to the variety and complexity of questions and knowledge, question answering over knowledge bases (KBQA) is still a challenging task, especially in multi-hop QA.
There are two typical categories of multi-relation questions: path questions [52] and conjunctive questions [53]. A path question contains only one topic entity, and its answer can be found by walking down an answer path consisting of a few relations and intermediate entities. A conjunctive question contains more than one subject entity, and its answer can be obtained by intersecting the results of multiple path questions. At present, semantic parsing models [12] and embedding-based models [13] tailored for QA are not adequate for multi-hop QA because of their heavy reliance on annotated data and their limited reasoning ability. Therefore, recent works have utilized hybrid ideas to improve performance and make the results explainable.
Zhang et al. [54] proposed a probabilistic modeling framework for multi-hop QA, which could simultaneously handle uncertain topic entities and multi-hop reasoning for QA. They introduced a new propagation architecture over KGs so that logical inference could be performed in the probabilistic model. Zhou et al. [52] designed an interpretable reasoning network (IRN). It can dynamically decide which part of an input question should be analyzed at each hop and predict a relation corresponding to the parsed results. Compared with existing methods, the intermediate entities and relations predicted by IRN construct traceable reasoning paths that reveal how the answer was derived. Hamilton et al. [53] introduced a framework to efficiently make predictions about conjunctive logical queries. They encoded graph nodes in a low-dimensional space and represented logical operators (i.e. the projection operator and the intersection operator) as learned geometric operations. Moreover, they further demonstrated how to map a practical subset of logic to efficient geometric operations in an embedding space. Vakulenko et al. [55] proposed a novel approach for complex QA using unsupervised message passing. It propagates confidence scores by parsing an input question and matching terms in a KG to a set of possible answers. This approach was implemented as a series of sparse matrix multiplications mimicking joins over small local subgraphs, so it can successfully be applied to very large KGs, such as DBpedia.
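The message-passing-as-matrix-multiplication idea can be sketched on a tiny dense adjacency matrix (real systems use sparse matrices over large KGs; the graph and scores below are toy values):

```python
import numpy as np

# Toy directed graph of 4 entities: 0 -> 1, 1 -> 2, 1 -> 3.
A = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

# Question terms matched entity 0 with full confidence.
scores = np.array([1.0, 0.0, 0.0, 0.0])

one_hop = scores @ A   # confidence flows to entity 1
two_hop = one_hop @ A  # then to entities 2 and 3, mimicking a two-step join
print(two_hop)         # candidate answers are the entities with nonzero score
```

Each multiplication pushes confidence one hop along the graph, which is exactly the join-like propagation over a local subgraph that makes the approach scale.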
Other hybrid reasoning methods
Other hybrid reasoning methods focus on boosting the performance of NLP tasks. Most of them merge symbolic information (e.g. the structure of a KG) into statistics-based methods and provide explanations for the results of reasoning [56].
Wang et al. [57] proposed a joint model that takes advantage of both explicit and implicit representations for short text classification. They incorporated character-level features of a KG into a convolutional neural network to capture fine-grained subword information. Experiments on real data showed that their method achieved significant improvement on this task.
To alleviate the limitations imposed by the quantity and quality of annotated data, Luo et al. [58] exploited the rich expressiveness of regular expressions at different levels within a neural network (NN). This combined framework can significantly enhance learning effectiveness and improve performance on the tasks of intent detection and slot filling.
To tackle the problem of learning and prediction with concept drifts, Chen et al. [59] revisited feature embeddings as semantic embeddings (i.e. consistency vectors and entailment vectors). Such embeddings can be exploited in the context of supervised stream learning to learn statistical models that are robust to concept drifts. Moreover, they explored an ontology-based knowledge representation and reasoning framework for transfer learning explanation [60]. It can model a learning domain in transfer learning with expressive OWL ontologies and complement the learning domain with common-sense knowledge related to the prediction task. Furthermore, the authors designed a correlative reasoning algorithm to infer three kinds of explanatory evidence for explaining a positive feature or a negative transfer from one learning domain to another.
Conclusion and future direction
Hybrid reasoning in knowledge graphs plays an important role in knowledge completion, schematic knowledge induction, knowledge alignment, complex question answering, explainable AI, etc. However, no previous survey has reviewed existing methods and discussed the challenging problems for this topic. In this paper, we gave an overview of existing methods for hybrid reasoning in KGs, provided a thorough review of current methods for various goals of reasoning, and introduced their key ideas. Although many methods for hybrid reasoning in knowledge graphs exist, several open problems remain to be solved.
Acknowledgements
The authors would like to thank all the reviewers for their insightful and valuable suggestions, which significantly improved the quality of this survey. Research presented in this paper was partially supported by the National Key Research and Development Program of China under grants (2018YFC0830200, 2017YFB1002801), the Natural Science Foundation of China grants (U1736204, 61602259), the Fundamental Research Funds for the Central Universities (3209009601), and the Judicial Big Data Research Centre, School of Law at Southeast University.
