Abstract
In recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term “Knowledge Graph” in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The results are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such refinement approaches, along with the evaluation methodologies used to assess them.
Introduction
Knowledge graphs on the Web are a backbone of many information systems that require access to structured knowledge, be it domain-specific or domain-independent. The idea of feeding intelligent systems and agents with general, formalized knowledge of the world dates back to classic Artificial Intelligence research in the 1980s [91]. More recently, with the advent of Linked Open Data [5] sources like DBpedia [56], and with Google’s announcement of the Google Knowledge Graph in 2012 (http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html), knowledge graphs have attracted increasing attention in both academia and industry.
There are various ways of building such knowledge graphs. They can be curated like Cyc, crowd-sourced like Wikidata and Freebase, extracted from large-scale, semi-structured sources such as Wikipedia, like DBpedia and YAGO, or harvested from unstructured sources on the Web, like NELL.
Whichever approach is taken for constructing a knowledge graph, the result will never be perfect [10]. As a model of the real world or a part thereof, formalized knowledge cannot reasonably reach full coverage, i.e., contain information about each and every entity in the universe; likewise, it is unlikely to be fully correct, since every construction method has to trade off completeness against correctness.
To address those shortcomings, various methods for knowledge graph refinement have been proposed, which aim at adding missing knowledge to a given graph or at identifying (and removing) erroneous information in it.
For this survey, we view knowledge graph refinement as any method that improves an existing knowledge graph – by completing it or by detecting errors in it – as opposed to methods that construct a knowledge graph from scratch.
It is important to note that for many knowledge graphs, one or more refinement steps are applied when creating and/or before publishing the graph. For example, logical reasoning is applied on some knowledge graphs for validating the consistency of statements in the graph, and removing the inconsistent statements. Such post-processing operations (i.e., operations applied after the initial construction of the graph) would be considered as knowledge graph refinement in the sense of this survey.
Decoupling knowledge graph construction and refinement has several advantages. Most notably, it allows – at least in principle – for developing methods for refining arbitrary knowledge graphs, which can then be applied to improve multiple knowledge graphs (see Section 7.2 for a critical discussion).
The rest of this article is structured as follows. Section 2 gives a brief introduction into knowledge graphs in the Semantic Web. In Sections 3 and 4, we present a categorization of approaches and evaluation methodologies. In Sections 5 and 6, we present the review of methods for completion (i.e., increasing coverage) and error detection (i.e., increasing correctness) of knowledge graphs. We conclude with a critical reflection of the findings in Section 7, and a summary in Section 8.
Knowledge graphs in the Semantic Web
From the early days, the Semantic Web research community has promoted a graph-based representation of knowledge, e.g., by pushing the RDF standard.
With the advent of Linked Data [5], it was proposed to interlink different datasets in the Semantic Web. By means of such interlinking, the collection of datasets can be understood as one large, global knowledge graph (although very heterogeneous in nature). To date, roughly 1,000 datasets are interlinked in the Linked Open Data cloud.
Table 1. Overview of popular knowledge graphs. The table depicts the number of instances and facts, as well as the number of different types and relations defined in their schemas.
In the scope of this article, a knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph; (ii) defines possible classes and relations of entities in a schema; (iii) allows for potentially interrelating arbitrary entities with each other; and (iv) covers various topical domains. By that definition, a lexical resource such as WordNet is a borderline case – the question of whether words as such are real world entities or not is of philosophical nature and not answered within the scope of this article. Nevertheless, WordNet is occasionally used for evaluating knowledge graph refinement methods, as we will show in the subsequent sections.
The first two criteria clearly define the focus of a knowledge graph to be the actual instances and their relations, with the schema playing a comparatively minor role.
The third criterion introduces the possibility to define arbitrary relations between instances, which are not restricted in their domain and/or range. This is a property which is hardly found in relational databases, which follow a strict schema.
Furthermore, knowledge graphs are supposed to cover at least a major portion of the domains that exist in the real world, and are not supposed to be restricted to only one domain (such as geographic entities). In that sense, large, but single-domain datasets, such as Geonames, are not considered knowledge graphs in the scope of this survey.
Knowledge graphs on the Semantic Web are typically provided using Linked Data [5] as a standard. They can be built using different methods: they can be curated by an organization or a small, closed group of people, crowd-sourced by a large, open group of individuals, or created with heuristic, automatic or semi-automatic means. In the following, we give an overview of existing knowledge graphs, both open and company-owned.
Cyc and OpenCyc
The Cyc project is one of the oldest endeavours to manually curate a universal knowledge base of formalized common sense knowledge; OpenCyc is the freely available subset of Cyc.
OpenCyc contains roughly 120,000 instances and 2.5 million facts defined for those instances; its schema comprises a type hierarchy of roughly 45,000 types, and 19,000 possible relations. (These numbers have been gathered by our own inspection of the 2012 version of OpenCyc.)
Curating a universal knowledge graph is an endeavour which is infeasible for most individuals and organizations. To date, more than 900 person years have been invested in the creation of Cyc [92], with gaps still existing. Thus, distributing that effort on as many shoulders as possible through crowdsourcing is a promising alternative, an approach taken, e.g., by Freebase and Wikidata.
The last version of Freebase contains roughly 50 million entities and 3 billion facts. (These numbers have been gathered by queries against Freebase’s query endpoint.)
Like Freebase, Wikidata is a collaboratively edited knowledge graph, operated by the Wikimedia Foundation, which also hosts Wikipedia.
To date, Wikidata contains roughly 16 million instances (see http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes).
The most recent version of the main DBpedia (i.e., DBpedia 2015-04, extracted from the English Wikipedia based on dumps from February/March 2015) contains 4.8 million entities and 176 million statements about those entities (see http://dbpedia.org/services-resources/datasets/dataset-2015-04/dataset-2015-04-statistics).
Like DBpedia, YAGO is also extracted from Wikipedia. YAGO builds its classification implicitly from the category system in Wikipedia and the lexical resource WordNet.
The latest release of YAGO, i.e., YAGO3, contains 4.6 million entities and 26 million facts about those entities. The schema comprises roughly 488,000 types and 77 relations [65].
NELL
While DBpedia and YAGO use semi-structured content as a base, methods for extracting knowledge graphs from unstructured data have been proposed as well. One of the earliest approaches working at web-scale was the Never-Ending Language Learning (NELL) project, which continuously extracts facts and relations from a large corpus of web pages.
In its most recent version (i.e., the 945th iteration), NELL contains roughly 2 million entities and 433,000 relations between those. The NELL ontology defines 285 classes and 425 relations. (These numbers have been derived from NELL’s promotion heatmap.)
Google’s Knowledge Graph
Google’s Knowledge Graph was introduced to the public in 2012, which was also when the term “Knowledge Graph” itself was coined. It is used, among other things, to enrich Google’s search results; details about its construction are sparse.
According to [21], Google’s Knowledge Graph contains 18 billion statements about 570 million entities, with a schema of 1,500 entity types and 35,000 relation types.
Google’s Knowledge Vault
The Knowledge Vault [21] is a further knowledge graph built at Google. It is constructed automatically by extracting facts from web content and fusing them with prior knowledge derived from existing knowledge graphs.
According to [21], the Knowledge Vault contains roughly 45 million entities and 271 million fact statements, using 1,100 entity types and 4,500 relation types.
Yahoo!’s Knowledge Graph
Like Google, Yahoo! also has an internal knowledge graph, which is used to improve search results. The knowledge graph builds on public data (e.g., Wikipedia and Freebase), as well as on closed commercial sources for various domains. It uses wrappers for different sources and monitors evolving sources, such as Wikipedia, for constant updates.
Yahoo’s knowledge graph contains roughly 3.5 million entities and 1.4 billion relations. Its schema, which is aligned with schema.org, comprises 250 types of entities and 800 types of relations [6].
Microsoft’s Satori
Microsoft uses its Satori knowledge graph to enrich the search results of its Bing search engine; as with Google’s Knowledge Graph, publicly available details about its contents and construction are sparse (see http://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/).
Facebook’s Entities Graph
Although the majority of the data in the online social network Facebook describes its users and their interrelations, Facebook also maintains a knowledge graph of general-interest entities – such as places, movies, or celebrities – to which users can relate, known as the Entities Graph. Details about its size and construction are not publicly available.
Table 1 summarizes the characteristics of the knowledge graphs discussed above. It can be observed that the graphs differ in the basic measures, such as the number of entities and relations, as well as in the size of the schema they use, i.e., the number of classes and relations. From these differences, it can be concluded that the knowledge graphs must differ in other characteristics as well, such as average node degree, density, or connectivity.
Categorization of knowledge graph refinement approaches
Knowledge graph refinement methods differ along several dimensions. For this survey, we distinguish the overall goal of the method, i.e., completion vs. correction of the knowledge graph, the refinement target (e.g., entity types, relations between entities, or literal values), as well as the data used by the approach (i.e., only the knowledge graph itself, or further external sources). All three dimensions are orthogonal.
There are a few research fields which are related to knowledge graph refinement.
Completion vs. error detection
There are two main goals of knowledge graph refinement: (a) adding missing knowledge to the graph, i.e., completion, and (b) identifying wrong information in the graph, i.e., error detection.
Target of refinement
Both completion and error detection approaches can be further distinguished by the targeted kind of information in the knowledge graph. For example, some approaches are targeted towards completing/correcting entity type information, while others are targeted to (either specific or any) relations between entities, or interlinks between different knowledge graphs, or literal values. While the latter can be of any datatype (strings, numbers, dates, etc.), most research focuses on numerical or date-valued literal values.
Another strand of research targets the extension of the schema (i.e., the ontology) of a knowledge graph rather than its instance data; such approaches are closely related to ontology learning.
Internal vs. external methods
A third distinguishing property is the data used by an approach. While internal methods use only the knowledge graph itself as input, external methods additionally exploit sources outside the graph, such as text corpora or other knowledge graphs.
Categorization of evaluation methods
There are different possible ways to evaluate knowledge graph refinement approaches. On a high level, we can distinguish methodologies that use only the knowledge graph at hand, and methodologies that use external knowledge, such as annotations provided by humans.
Partial gold standard
One common evaluation strategy is to use a partial gold standard. In this methodology, a subset of the graph’s entities or relations is selected and labeled manually. Other evaluations use external knowledge graphs and/or databases as partial gold standards.
For completion tasks, this means that all axioms that should hold for the selected entities are collected exhaustively, so that both precision and recall can be computed; for error detection tasks, the selected axioms are labeled as correct or incorrect.
Sourcing partial gold standards from humans can lead to high quality data (given that the knowledge graph and the ontology it uses are not overly complex), but is costly, so that those gold standards are usually small. Exploiting other knowledge graphs based on knowledge graph interlinks (e.g., using Freebase data as a gold standard to evaluate DBpedia) is sometimes proposed to yield larger-scale gold standards, but has two sources of errors: errors in the target knowledge graph, and errors in the linkage between the two. For example, it has been reported that 20% of the interlinks between DBpedia and Freebase are incorrect [110], and that roughly half of the owl:sameAs interlinks examined in studies of Linked Open Data are at least questionable.
Knowledge graph as silver standard
Another evaluation strategy is to use the given knowledge graph itself as a test dataset. Since the knowledge graph is not perfect (otherwise, refinement would not be necessary), it cannot be considered as a gold standard; hence, we refer to this method as a silver standard evaluation.
The silver standard method is usually applied to measure the performance of knowledge graph completion approaches: an approach is considered good if it is able to reproduce (parts of) the existing knowledge graph as accurately as possible.
There are two variants of silver standard evaluations: in the more common ones, the entire knowledge graph is taken as input to the approach at hand, and the evaluation is then also carried out on the entire knowledge graph. As this may lead to an overfitting effect (in particular for internal methods), some works also foresee the splitting of the graph into a training and a test partition, which, however, is not as straightforward as, e.g., for propositional classification tasks [72], which is why most papers use the former method. Furthermore, split and cross validation do not fully solve the overfitting effect. For example, if a knowledge graph, by construction, has a bias towards certain kinds of information (e.g., relations are more complete for some classes than for others), approaches overadapting to that bias will be rated better than those which do not (and which may actually perform better in the general case).
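To illustrate the second variant, the following sketch shows how a set of statements could be split into a training and a test partition and how predictions could then be scored against the withheld part. The triple representation and the split ratio are illustrative assumptions, not taken from any of the surveyed works.

```python
import random

def split_silver_standard(triples, test_ratio=0.2, seed=42):
    """Randomly withhold a fraction of the statements as a test set.

    triples is assumed to be a list of (subject, predicate, object) tuples;
    the knowledge graph itself serves as the (imperfect) silver standard.
    """
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def precision_recall(predicted, withheld):
    """Score predicted statements against the withheld statements.

    Due to the open world assumption discussed below, the precision computed
    this way is only a lower bound: correct predictions that happen to be
    missing from the graph are counted as false positives.
    """
    predicted, withheld = set(predicted), set(withheld)
    true_positives = len(predicted & withheld)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(withheld) if withheld else 0.0
    return precision, recall
```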
A problem with this approach is that the knowledge graph itself is not perfect (otherwise, it would not need refinement), thus, this evaluation method may sometimes underrate the evaluated approach. More precisely, most knowledge graphs follow the open world assumption, i.e., a statement which is not contained in the graph is not necessarily wrong, but merely unknown. Hence, a correct statement predicted by a completion approach that happens to be missing from the graph is counted as a false positive, although it is actually a desirable result.
Retrospective evaluation
For retrospective evaluations, the output of a given approach is given to human judges for annotation, who then label suggested completions or identified errors as correct or incorrect. The quality metric is usually accuracy or precision, along with a statement about the total number of completions or errors found with the approach, and ideally also with a statement about the agreement of the human judges.
In many cases, automatic refinement methods lead to a very large number of findings, e.g., lists of tens of thousands of axioms which are potentially erroneous. Thus, retrospective evaluations are often carried out only on samples of the results. For some approaches which produce higher level artifacts – such as error patterns or completion rules – as intermediate results, a feasible alternative is to evaluate those artifacts instead of the actually affected axioms.
While partial gold standards can be reused for comparing different methods, this is not the case for retrospective evaluations. On the other hand, retrospective evaluations may make sense in cases where the interesting class is rare. For example, when evaluating error detection methods, a sample for a partial gold standard from a high-quality graph is likely not to contain a meaningful number of errors. In those cases, retrospective evaluation methodologies are often preferred over partial gold standards.
Another advantage of retrospective evaluations is that they allow a very detailed analysis of an approach’s results. In particular, inspecting the errors made by an approach often reveals valuable findings about the advantages and limitations of a particular approach.
Table 2 sums up the different evaluation methodologies and contrasts their advantages and disadvantages.
Table 2. Overview of evaluation methods with their advantages and disadvantages.
In addition to the performance w.r.t. correctness and/or completeness of results, computational performance considerations become more important as knowledge graphs become larger. Typical performance measures for this aspect are runtime measurements, as well as memory consumption.
Besides explicit measurement of computational performance, a “soft” indicator for computational performance is whether an approach has been evaluated (or at least the results have been materialized) on an entire large-scale knowledge graph, or only on a subgraph. The latter is often done when applying evaluations on a partial gold standard, where the respective approach is only executed on entities contained in that partial gold standard.
Approaches for completion of knowledge graphs
Completion of knowledge graphs aims at increasing the coverage of a knowledge graph. Depending on the target information, methods for knowledge graph completion predict missing entities, missing types for entities, and/or missing relations that hold between entities.
In this section, we survey methods for knowledge graph completion. We distinguish internal and external methods, and further group the approaches by the completion target.
Internal methods
Internal methods use only the knowledge contained in the knowledge graph itself to predict missing information.
Methods for completing type assertions
Predicting a type or class for an entity given some characteristics of the entity is a very common problem in machine learning, known as classification. Hence, standard classification techniques can be applied to type prediction, provided that a suitable feature representation of the entities can be constructed.
For internal methods, the features used for classification are usually the relations which connect an entity to other entities [81,88], i.e., they are a variant of link-based classification.
In [79,80], we propose a probabilistic method, which is based on conditional probabilities, e.g., the probability of an entity being of the type Person, given that it is connected to other entities via a birthPlace relation.
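The following is a minimal sketch of such a conditional-probability-based type predictor. It is a strong simplification of the approach in [79,80] (e.g., it ignores relation weighting and incoming edges), and the data structures and threshold are illustrative assumptions.

```python
from collections import Counter, defaultdict

def learn_type_distributions(triples, types):
    """Count, for each relation, how often its subjects carry each type.

    triples: iterable of (subject, relation, object) tuples;
    types: dict mapping an entity to its set of known types.
    """
    counts = defaultdict(Counter)   # relation -> Counter over subject types
    totals = Counter()              # relation -> number of typed subjects seen
    for s, p, _ in triples:
        subject_types = types.get(s, set())
        for t in subject_types:
            counts[p][t] += 1
        if subject_types:
            totals[p] += 1
    return counts, totals

def predict_types(entity, triples, counts, totals, threshold=0.4):
    """Average P(type | relation) over the entity's outgoing relations."""
    relations = [p for s, p, _ in triples if s == entity]
    if not relations:
        return set()
    scores = Counter()
    for p in relations:
        if totals[p] == 0:
            continue
        for t, count in counts[p].items():
            scores[t] += (count / totals[p]) / len(relations)
    return {t for t, score in scores.items() if score >= threshold}
```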
In [98], the use of Support Vector Machines (SVMs) has been proposed to type entities in DBpedia and Freebase. The authors also exploit interlinks between the knowledge graphs and classify instances in one knowledge graph based on properties present in the other, in order to increase coverage and precision. Nickel et al. [73] propose the use of tensor factorization, i.e., the knowledge graph is represented as a three-way tensor, whose low-rank factorization is then used to predict missing statements, including type assertions.
Since many knowledge graphs come with a class hierarchy, e.g., defined in a formal ontology, the type prediction problem could also be understood as a hierarchical classification problem, in which the class hierarchy is exploited during prediction.
In data mining, association rule mining [38] is a method that analyzes the co-occurrence of items in itemsets and derives association rules from those co-occurrences. For predicting missing information in knowledge graphs, those methods can be exploited, e.g., in the presence of redundant information. For example, in DBpedia, different type systems (i.e., the DBpedia ontology and YAGO, among others) are used in parallel, which are populated with different methods (Wikipedia infoboxes and categories, respectively). This ensures both enough overlap to learn suitable association rules, as well as a number of entities that only have a type in one of the systems, to which the rules can be applied. In [77], we exploit such association rules to predict missing types in DBpedia based on such redundancies.
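A minimal sketch of this idea is shown below: rules of the form “type A implies type B” are mined from entities that carry types from both type systems, and then applied to entities typed in only one of them. The support and confidence thresholds, as well as the restriction to single-antecedent rules, are simplifying assumptions.

```python
from collections import Counter
from itertools import combinations

def mine_type_rules(entity_types, min_support=0.01, min_confidence=0.8):
    """Mine rules of the form antecedent_type => consequent_type from
    co-occurring type assertions (entity_types: entity -> set of types)."""
    n = len(entity_types)
    single = Counter()
    pair = Counter()
    for types in entity_types.values():
        for t in types:
            single[t] += 1
        for a, b in combinations(sorted(types), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        confidence = count / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, confidence))   # read as: a => b
    return rules

def apply_type_rules(entity_types, rules):
    """Propose missing types by firing the mined rules."""
    predictions = {}
    for entity, types in entity_types.items():
        new_types = {b for a, b, _ in rules if a in types and b not in types}
        if new_types:
            predictions[entity] = new_types
    return predictions
```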
In [99], the use of topic modeling for type prediction is proposed. Entities in a knowledge graph are represented as documents, on which Latent Dirichlet Allocation (LDA) [7] is applied for finding topics. By analyzing the co-occurrence of topics and entity types, new types can be assigned to entities based on the topics detected for those entities.
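A rough sketch of this pipeline is given below, using scikit-learn’s LDA implementation. Whereas [99] analyzes co-occurrences of topics and types directly, the sketch uses a simple classifier over the topic distributions as a stand-in; the pseudo-document construction and the number of topics are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def train_lda_type_model(entity_docs, entity_type_labels, n_topics=20):
    """entity_docs: one pseudo-document per entity (e.g., the labels of its
    relations and neighboring entities); entity_type_labels: the known type
    of each entity. Returns the fitted vectorizer, topic model and classifier."""
    vectorizer = CountVectorizer()
    term_matrix = vectorizer.fit_transform(entity_docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topic_matrix = lda.fit_transform(term_matrix)   # entity-topic distributions
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(topic_matrix, entity_type_labels)
    return vectorizer, lda, classifier

def predict_entity_type(entity_doc, vectorizer, lda, classifier):
    """Assign a type to a new entity based on its topic distribution."""
    topics = lda.transform(vectorizer.transform([entity_doc]))
    return classifier.predict(topics)[0]
```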
Methods for predicting relations
While primarily used for adding missing type assertions, classification methods can also be used to predict the existence of relations. To that end, Socher et al. [100] propose to train a neural tensor network to predict relations based on chains of other relations, e.g., if a person is born in a city which is located in a country, it is likely that the person also has the nationality of that country.
Likewise, association rule mining can be used for predicting relations as well. In [46], the mining of association rules which predict relations between entities in DBpedia from Wikipedia categories is proposed. (Note that since Wikipedia categories are part of the DBpedia knowledge graph, we consider this approach an internal one.)
External methods
External methods use sources of knowledge – such as text corpora or other knowledge graphs – which are not part of the knowledge graph itself. Those external sources can be linked from the knowledge graph, such as knowledge graph interlinks or links to web pages (e.g., Wikipedia pages describing an entity), or exist without any relation to the knowledge graph at hand, such as large text corpora.
Methods for completing type assertions
For type prediction, there are also classification methods that use external data. In contrast to the internal classification methods described above, external data is used to create a feature representation of an entity.
Nuzzolese et al. [75] propose the usage of the Wikipedia link graph to predict types in a knowledge graph using a k-nearest neighbors classifier. Given that a knowledge graph contains links to Wikipedia, interlinks between Wikipedia pages are exploited to create feature vectors, e.g., based on the categories of the related pages. Since links between Wikipedia pages are not constrained, there are typically more interlinks between Wikipedia pages than between the corresponding entities in the knowledge graph.
Aprosio et al. [4] use types of entities in different DBpedia language editions (each of which can be understood as a knowledge graph connected to the others) as features for predicting missing types. The authors use a k-NN classifier with different distance measures (i.e., kernel functions), such as the overlap of two articles’ categories. In their setting, a combination of different distance measures is reported to provide the best results.
Another set of approaches uses abstracts in DBpedia to extract definitional clauses, e.g., using Hearst patterns [36]. Such approaches have been proposed by Gangemi et al. [28] and Kliegr [47], where the latter uses abstracts in different languages in order to increase coverage and precision.
Methods for predicting relations
Like types, relations to other entities can also be predicted from textual sources, such as Wikipedia pages. Lange et al. [52] learn patterns on Wikipedia abstracts using Conditional Random Fields [51]. A similar approach, but on entire Wikipedia articles, is proposed by [109]. (Although neither approach explicitly mentions DBpedia – both aim at completing missing key-value pairs in Wikipedia infoboxes – the results can be directly transferred to extending DBpedia.)
Another common method for the prediction of a relation between two entities is distant supervision: pairs of entities for which a relation is known to hold in the knowledge graph are located in a large text corpus, and the sentences mentioning both entities are used as (noisy) training data for learning textual patterns or classifiers for that relation, which can then propose new relation instances.
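The following sketch illustrates only the distant supervision labeling step; it naively matches entity labels as substrings, whereas real systems use entity linking and more robust matching. All names and data structures are illustrative assumptions.

```python
import re

def distant_supervision_examples(sentences, relation_instances):
    """Generate (noisy) training examples from a text corpus.

    sentences: list of sentence strings; relation_instances: dict mapping a
    relation name to a set of (subject_label, object_label) pairs taken from
    the knowledge graph. A sentence mentioning both labels of a known pair is
    used as a positive example for that relation.
    """
    examples = []
    for sentence in sentences:
        for relation, pairs in relation_instances.items():
            for subject_label, object_label in pairs:
                if (re.search(re.escape(subject_label), sentence)
                        and re.search(re.escape(object_label), sentence)):
                    examples.append((sentence, subject_label, object_label, relation))
    return examples
```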
West et al. [107] propose the use of web search engines to fill gaps in knowledge graphs. Like in the works discussed above, they first discover lexicalizations for relations. Then, they use those lexicalizations to formulate search engine queries for filling missing relation values. Thus, they use the whole Web as a corpus, and combine information retrieval and extraction for knowledge graph completion.
While text is unstructured, some approaches have been proposed that use semi-structured sources, such as tables in Wikipedia or on the Web, for knowledge graph completion.
Muñoz et al. [69] propose extraction from tables in Wikipedia. They argue that for two entities co-occurring in a Wikipedia table, it is likely that the corresponding entities should share an edge in the knowledge graph. To fill in those edges, they first extract a set of candidates from the tables, using all possible relations that hold between at least one pair of entities in two columns. Then, based on a labeled subset of that extraction, they apply classification using various features to identify those relations that should actually be added to the knowledge graph.
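A simplified sketch of the candidate generation step is shown below; the subsequent feature-based classification of the candidates, which [69] uses to filter them, is omitted, and the table representation is an illustrative assumption.

```python
from collections import defaultdict

def candidate_relations_from_table(columns, kg_triples):
    """columns: dict column_name -> list of entities (aligned by row);
    kg_triples: iterable of (subject, relation, object) triples.

    For every pair of columns, collect the relations that already hold in the
    knowledge graph for at least one row, and propose those relations as
    candidate edges for the remaining rows of the same column pair.
    """
    relation_index = defaultdict(set)            # (subject, object) -> relations
    for s, p, o in kg_triples:
        relation_index[(s, o)].add(p)

    candidates = []
    column_names = list(columns)
    for i, left in enumerate(column_names):
        for right in column_names[i + 1:]:
            rows = list(zip(columns[left], columns[right]))
            observed = set()
            for s, o in rows:
                observed |= relation_index[(s, o)]
            for s, o in rows:
                for p in observed - relation_index[(s, o)]:
                    candidates.append((s, p, o))  # candidate edge, to be verified
    return candidates
```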
Ritze et al. [89] extend this approach to arbitrary HTML tables. This requires not only that pairs of table columns be matched to properties in the DBpedia ontology, but also that rows in the table be matched to entities in DBpedia. The authors propose an iterative approach to solve those two problems. The approach is evaluated on a gold standard mapping for a sample of HTML tables from the WebDataCommons Web Table corpus.
In [84], we have proposed the use of list pages in Wikipedia for generating both type and relation assertions in knowledge graphs, based on statistical methods. The idea is that entities appear together in list pages for a reason, and it should be possible to identify the common pattern shared by the majority of the instances in the list page. For example, the majority of instances linked from a page such as “List of physicists” are of the type Physicist, so that this type can also be proposed for the remaining instances on that page.
Many knowledge graphs contain links to other knowledge graphs. Those are often created automatically [71]. Interlinks between knowledge graphs can be used to fill gaps in one knowledge graph from information defined in another knowledge graph. If a mapping both on the instance and on the schema level is known, it can be exploited for filling gaps in knowledge graphs on both sides.
One work in this direction is presented by Bryl and Bizer [12], where different language versions of DBpedia (each of which can be seen as a knowledge graph of its own) are used to fill missing values in the English language DBpedia (i.e., the version which is usually meant when referring to DBpedia).
Dutta et al. [23] propose a probabilistic mapping between knowledge graphs. Based on distributions of types and properties, they create a mapping between knowledge graphs, which can then be used to derive additional, missing facts in the knowledge graphs. To that end, the type systems used by two knowledge graphs are mapped to one another. Then, types holding in one knowledge graph can be used to predict those that should hold in another.
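The following sketch shows the underlying idea for the case where both an instance-level and a schema-level mapping are already given; the dictionary-based representation of the mappings is an illustrative assumption and does not reflect the probabilistic machinery of [23].

```python
def fill_gaps_from_linked_graph(target_triples, source_triples,
                                instance_mapping, property_mapping):
    """Propose facts missing in the target graph based on a linked source graph.

    instance_mapping: dict mapping source entities to target entities
    (e.g., derived from owl:sameAs links); property_mapping: dict mapping
    source properties to target properties (schema-level mapping).
    """
    existing = set(target_triples)
    proposed = set()
    for s, p, o in source_triples:
        if s in instance_mapping and p in property_mapping:
            mapped = (instance_mapping[s],
                      property_mapping[p],
                      instance_mapping.get(o, o))   # map the object if it is linked
            if mapped not in existing:
                proposed.add(mapped)
    return proposed
```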
Approaches for error detection in knowledge graphs
Like the completion methods discussed in the previous section, methods for identifying errors in knowledge graphs can target various types of information, i.e., type assertions, relations between individuals, literal values, and knowledge graph interlinks.
In this section, we survey methods for detecting errors in knowledge graphs. Like for the previous section, we distinguish internal and external methods, and further group the approaches by the error detection target.
Internal methods
Internal methods use only the information given in a knowledge graph to find out whether an axiom in the knowledge graph is plausible or not.
Methods for finding erroneous type assertions
Type assertions in knowledge graphs tend to be more correct than relation assertions [80]. Hence, methods for finding erroneous type assertions are rather rare. One such method is proposed by Ma et al. [63], who use inductive logic programming for learning disjointness axioms, and then apply those disjointness axioms for identifying potentially wrong type assertions.
Methods for finding erroneous relations
For building Knowledge Vault, Dong et al. use classification to tell relations which should hold in a knowledge graph from those which should not [21]. Like the work by Muñoz et al. discussed above, each relation is used as an instance in the classification problem, with the existence of the relation in the knowledge graph being used as a binary class. This classification is used as a cleansing step after the knowledge extraction process. While the creation of Knowledge Vault relies heavily on external sources, the cleansing step itself uses only information from the graph, which is why we list it among the internal methods.
In [80], we have proposed a statistical method for finding wrong statements within a knowledge graph. For each type of relation, we compute the characteristic distribution of subject and object types for the edges, i.e., each instantiation of the relation. Edges in the graph whose subject and object type strongly deviate from the characteristic distributions are then identified as potential errors.
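The sketch below conveys the basic idea with a simplified, object-side-only scoring; the actual approach in [80] combines subject and object type distributions and weights relations differently. The data structures are illustrative assumptions.

```python
from collections import Counter, defaultdict

def score_relation_assertions(triples, types):
    """Assign each (s, p, o) edge a plausibility score based on how typical
    the object's types are for the relation p; low scores indicate candidates
    for erroneous statements."""
    type_counts = defaultdict(Counter)    # relation -> Counter over object types
    totals = Counter()                    # relation -> number of edges
    for s, p, o in triples:
        for t in types.get(o, ()):
            type_counts[p][t] += 1
        totals[p] += 1

    scores = {}
    for s, p, o in triples:
        object_types = types.get(o, ())
        if not object_types or totals[p] == 0:
            scores[(s, p, o)] = 0.0       # untyped objects cannot be judged
            continue
        scores[(s, p, o)] = max(type_counts[p][t] / totals[p]
                                for t in object_types)
    return scores
```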
For exploiting reasoning for error checking in knowledge graphs, a rich ontology is required, which defines the possible types of nodes and edges in a knowledge graph, as well as the restrictions that hold on them. For example, if a person is defined to be the capital of a state, this is a contradiction, since capitals are cities, and cities and persons are disjoint, i.e., no entity can be a city and a person at the same time. Reasoning is often used at the building stage of a knowledge graph, i.e., when new axioms are about to be added. For example, NELL and PROSPERA perform reasoning at that point to determine whether the new axiom is plausible or not, and discard implausible ones [14,70]. For real-world knowledge graphs, reasoning can be difficult due to the presence of errors and noise in the data [43,87].
Works using reasoning as a refinement operation for knowledge graphs have also been proposed. However, many knowledge graphs, such as DBpedia, come with ontologies that are not rich enough to perform reasoning for inconsistency detection – for example, they lack the class disjointness assertions needed for an inference as in the example above. Therefore, approaches exploiting reasoning are typically used in conjunction with methods for enriching ontologies, such as statistical methods, as proposed in [42] and [102], or association rule mining, as in [53]. In all of those works, the ontology at hand is enriched with further axioms, which can then be used for detecting inconsistencies. For example, if a reasoner concludes that an entity should be both a person and an organization, and a disjointness axiom between the two types has been added in the enrichment step, the reasoner can state that one out of a few axioms in the knowledge graph has to be wrong. Another source of additional disjointness axioms is the use of a top level ontology like DOLCE.
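A check of the kind described above can be sketched as follows, assuming that the (possibly enriched) ontology has been reduced to a set of pairwise class disjointness axioms and that type assertions have already been materialized; both assumptions are simplifications.

```python
def find_disjointness_violations(entity_types, disjoint_pairs):
    """entity_types: entity -> set of asserted (and inferred) classes;
    disjoint_pairs: set of frozensets {A, B} stating that A and B are disjoint.
    Returns entities that violate a disjointness axiom."""
    violations = []
    for entity, classes in entity_types.items():
        for pair in disjoint_pairs:
            if pair <= classes:              # both disjoint classes are asserted
                violations.append((entity, tuple(sorted(pair))))
    return violations

# Example: a hypothetical entity asserted to be both a city and a person.
print(find_disjointness_violations(
    {"ExampleEntity": {"City", "Person"}},
    {frozenset({"City", "Person"})},
))
```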
In [85], a light-weight reasoning approach is proposed to compare actual and defined domains and ranges of relations in a knowledge graph schema. The authors propose a set of heuristics for fixing the schema if the actual and the defined domain or range strongly deviate.
Methods for finding erroneous literal values
As outlier detection in most cases deals with numerical data, it is a natural fit for finding erroneous literal values, such as implausible numbers or dates.
To lower the influence of natural outliers, an extension of that approach has been presented in [24], where the instance set under inspection is first split into smaller subsets. For example, population values are inspected for countries, cities, and towns in isolation; thus, the distributions are more homogenous, which leads to a higher precision in error identification. Furthermore, the approach foresees a cross-checking step which uses values for the same entity from other, interlinked datasets to tell natural outliers (i.e., correct but unusual values) from actual errors.
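The following sketch illustrates per-class outlier detection on a single numerical property using the interquartile-range rule; the works cited above use different outlier detection techniques, and the cross-checking step against other datasets is not shown.

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside the interquartile-range fences (Tukey's rule)."""
    xs = sorted(values)
    if len(xs) < 4:
        return []
    def quantile(q):
        position = q * (len(xs) - 1)
        low = int(position)
        fraction = position - low
        high = min(low + 1, len(xs) - 1)
        return xs[low] + (xs[high] - xs[low]) * fraction
    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]

def outliers_per_class(entity_values, entity_classes):
    """Inspect a numerical property (e.g., population) separately per class,
    so that the distributions stay homogeneous, as described above."""
    values_by_class = {}
    for entity, value in entity_values.items():
        for cls in entity_classes.get(entity, ()):
            values_by_class.setdefault(cls, []).append(value)
    return {cls: iqr_outliers(vals) for cls, vals in values_by_class.items()}
```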
The work in [58] tries to learn arithmetic relations between attributes, e.g., that an entity’s population density should be equal to its population divided by its area; pairs of values that violate such a learned relation are flagged. Note that an error here is not a single statement, but a pair of statements that cannot be true at the same time. Thus, the approach does not trivially lead to a fully automatic repairing mechanism (unless both statements are removed, which means that most likely, one correct statement is removed as well).
In [78], we have shown that outlier detection is not only applicable to numerical values, but also to other targets, such as knowledge graph interlinks. To that end, the interlinks are represented as multi-dimensional feature vectors, e.g., with each type of the respective entity in both knowledge graphs being a binary feature. In that feature space, standard outlier detection techniques, such as Local Outlier Factor, can be applied to identify links which deviate from the majority and are thus likely to be erroneous.
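A sketch of this idea is given below, using the Local Outlier Factor implementation from scikit-learn on binary type features; the feature construction and parameters are illustrative assumptions and do not reproduce the exact setup of [78].

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def suspicious_interlinks(links, types_a, types_b, type_vocabulary, n_neighbors=20):
    """links: list of (entity_a, entity_b) interlink pairs; types_a / types_b:
    entity -> set of types in the respective graph; type_vocabulary: ordered
    list of all types used as binary features. Returns links flagged as outliers."""
    if len(links) < 3:
        return []
    index = {t: i for i, t in enumerate(type_vocabulary)}
    features = np.zeros((len(links), len(type_vocabulary)))
    for row, (a, b) in enumerate(links):
        for t in types_a.get(a, set()) | types_b.get(b, set()):
            if t in index:
                features[row, index[t]] = 1.0
    lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(links) - 1))
    flags = lof.fit_predict(features)        # -1 marks potential outliers
    return [link for link, flag in zip(links, flags) if flag == -1]
```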
External methods
Purely automatic external methods for error detection in knowledge graphs have limitations, e.g., in telling apart actual errors from unusual findings [94]. Semi-automatic approaches, which exploit human knowledge, have also been proposed.
Methods for finding erroneous relations
Most external methods are targeted on finding erroneous relations in knowledge graphs. One of the few fully automatic approaches searches the Web for evidence for a given statement: the statement is translated into natural language queries for a search engine, and the retrieved documents are analyzed for passages that support or contradict the statement.
Apart from fully automatic methods, semi-automatic methods involving users have been proposed for validating knowledge graphs, such as crowdsourcing with microtasks [1]. In order to increase the user involvement and motivation, game-based approaches (i.e., games with a purpose) have been proposed [37,48,97,105]. In a wider sense, those can also be viewed as external methods, with the human in the loop being the external source of information.
Generally, a crucial issue with human computation is the size of Web scale knowledge graphs. In [80], it has been argued that the time needed to validate the entire DBpedia knowledge graph with the crowdsourcing approach proposed in [1] – extrapolating the task completion times reported – would take more than 3,000 years. To overcome such scaling problems, we have recently proposed a clustering of inconsistencies identified by automatic means, which allows presenting only representative examples to a human judge for inspection [82]. We have shown that most of the clusters have a common root cause in the knowledge graph construction (e.g., a wrong mapping rule or a programming error), so that by inspecting only a few dozen examples (and addressing the respective root causes), millions of statements can be corrected.
Methods for finding erroneous literal values
While most of the crowdsourcing approaches above are focusing on relations in the knowledge graph, the work in [1] uses similar mechanisms for validating knowledge graph interlinks and literal values.
In [60], an automatic approach using knowledge graph interlinks for detecting wrong numerical values is proposed. The authors exploit links between identical resources and apply different matching functions between properties in the individual sources. Facts in one knowledge graph are assumed to be wrong if multiple other sources have a consensus for a conflicting fact (e.g., a radically different population figure).
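The consensus idea can be sketched as follows: a value is flagged if the other sources agree on a clearly different value. The median-based consensus and the relative tolerance are illustrative simplifications of the matching functions used in [60].

```python
from statistics import median

def conflicting_values(values_by_source, tolerance=0.1):
    """values_by_source: source name -> numerical value for the same fact
    (e.g., a city's population in several interlinked knowledge graphs).
    A value is flagged if it deviates by more than the relative tolerance
    from the median of the remaining sources."""
    flagged = {}
    for source, value in values_by_source.items():
        others = [v for s, v in values_by_source.items() if s != source]
        if len(others) < 2:
            continue                          # no meaningful consensus possible
        consensus = median(others)
        if consensus != 0 and abs(value - consensus) / abs(consensus) > tolerance:
            flagged[source] = (value, consensus)
    return flagged
```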
Findings from the survey
Overview of knowledge graph completion approaches (part 1). Abbreviations used: Target (T = Types, R = Relations), Type (I = internal, E = external), Evaluation (RE = retrospective evaluation, PG = partial gold standard, either available (a) or unavailable (n/a), KG = evaluation against knowledge graph, SV = split validation, CV = cross validation), Metrics (P/R = precision and recall, A = accuracy, AUC-PR = area under precision-recall-curve, ROC = Area under the ROC curve, T = total new statements). Comp.: evaluation or materialization carried out on whole knowledge graph or not, Performance: computational performance reported or not
Overview of knowledge graph completion approaches (part 2). Abbreviations used: Target (T = Types, R = Relations), Type (I = internal, E = external), Evaluation (RE = retrospective evaluation, PG = partial gold standard, either available (a) or unavailable (n/a), KG = evaluation against knowledge graph, SV = split validation, CV = cross validation), Metrics (P/R = precision and recall, A = accuracy, AUC-PR = area under precision-recall-curve, T = total new statements). Comp.: evaluation or materialization carried out on whole knowledge graph or not, Performance: computational performance reported or not
Overview of error detection approaches. Abbreviations used: Target (T = Types, R = Relations, L = Literals, I = Interlinks, S = Schema), Type (I = internal, E = external), Evaluation (RE = retrospective evaluation, PG = partial gold standard, either available (a) or unavailable (n/a), KG = evaluation against knowledge graph, SV = split validation, CV = cross validation), Metrics (P/R = precision and recall, A = accuracy, AUC-PR = area under precision-recall-curve, ROC = Area under the ROC curve, T = total new statements, RMSE = Root Mean Squared Error). Comp.: evaluation or materialization carried out on whole knowledge graph or not, Performance: computational performance reported or not
By taking a closer look at those results, we can derive some interesting findings, both with respect to the approaches, as well as with respect to evaluation methodologies.
A first interesting observation is that our distinction into completion and error detection is a strict one. That is, there exist no approaches which do both at the same time, i.e., which simultaneously add missing knowledge and identify erroneous axioms.
For many of the approaches, it is not obvious why they were only used for one purpose. For example, many of the probabilistic and NLP-based completion approaches seek evidence for missing axioms, e.g., by means of scanning text corpora. Similarly, many completion approaches ultimately compute a confidence score for each candidate axiom; the same scores, applied to axioms already in the graph, could equally be used to flag low-confidence statements as potential errors.
Furthermore, in particular in the machine learning area, approaches exist which can be used for simultaneously creating a predictive model and creating weights for pieces of information. For example, random forests can assign weights to attributes [59], whereas boosting assigns weights to instances [25], which can also be interpreted as outlier scores [16]. Likewise, there are anomaly detection methods that build on learning predictive models [34,83]. Such approaches could be a starting point for developing methods for simultaneous completion and error detection in knowledge graphs.
Along the same lines, there are hardly any error detection approaches which are also suitable for correcting the identified errors, e.g., by suggesting a replacement statement instead of merely flagging or removing the wrong one.
Another finding for error detection approaches is that those approaches usually output a list of potentially erroneous statements. Higher level patterns from those errors, which would hint at design level problems in the knowledge graph construction, are rarely derived (apart from the work presented in [82]). Such patterns, however, would be highly valuable for the parties involved in developing and curating knowledge graphs.
In addition to the strict separation of completion and correction, we also observe that most of the approaches focus on only one target, i.e., types, relations, literals, etc. Approaches that simultaneously try to complete or correct, e.g., type and relation assertions in a knowledge graph, are also quite rare.
For the approaches that perform completion, all works examined in this survey try to add missing types for, or relations between, entities that already exist in the knowledge graph; approaches that add entirely new entities to a graph are not observed.
Another interesting observation is that, although the discussed works address knowledge graphs, only few of them actually exploit the graph nature of the data, e.g., by using graph measures or graph-based features beyond an entity’s direct neighborhood.
Evaluation methodologies
For evaluation methodologies, our first observation is that a variety of evaluation metrics is used in the papers examined. Precision and recall (or precision and the total number of statements, for retrospective evaluations) are the most frequently used metrics, with others – such as ROC curves, accuracy, or Root Mean Squared Error – occasionally being used as well.
With respect to the overall methodology, the results are more mixed. Evaluations using the knowledge graph as a silver standard, retrospective evaluations, and evaluations based on partial gold standards appear roughly at equal frequency, with retrospective validations mostly used for error detection. The latter is not too surprising, since due to the high quality of most knowledge graphs used for the evaluations, partial gold standards based on random samples are likely to contain only few errors. For partial gold standards, it is crucial to point out that the majority of authors make those partial gold standards public. (For this survey, we counted a partial gold standard as public if there was a working download link in the paper; we did not make any additional efforts to search for the gold standard, such as contacting the authors.)
DBpedia is the knowledge graph which is most frequently used for evaluation. This, in principle, makes the results comparable to a certain extent, although roughly each year, a new version of DBpedia is published, so that papers from different years are likely to be evaluated on slightly different knowledge graphs.
That being said, we have observed that roughly two out of three approaches evaluated on DBpedia are evaluated on that knowledge graph alone, i.e., they are not applied to any second knowledge graph.
As discussed in Section 2, knowledge graphs differ heavily in their characteristics. Thus, for an approach evaluated on only one graph, it is unclear whether it would perform similarly on another knowledge graph with different characteristics, or whether it exploits some (maybe not even obvious) characteristics of that knowledge graph, and/or overfits to particular characteristics of that graph.
Last, but not least, we have observed that only a minority of approaches have been evaluated on a whole, large-scale knowledge graph. Moreover, statements about computational performance are only rarely included in the corresponding papers. (Here, we were relaxed and also counted informal statements about the computational performance as a performance evaluation.)
In order to make future works on knowledge graph refinement comparable, it would be useful to have a common selection of benchmarks. This has been done in other fields of the Semantic Web as well, such as for schema and instance matching [22], reasoning [8], or question answering [17]. Such benchmarks could serve for comparing both the qualitative and the computational performance of refinement approaches.
Conclusion
In this article, we have presented a survey on knowledge graph refinement methods. We distinguish completion from error detection, and internal from external methods. We have shown that a large body of work exists which applies a wide range of methods, from machine learning techniques to NLP-based techniques.
The survey has revealed that, at the moment, there are hardly any approaches which simultaneously try to improve completeness and correctness of knowledge graphs, and most approaches address only one target, such as type or relation assertions, or literal values. Holistic solutions which simultaneously improve the quality of knowledge graphs in many different aspects are currently not observed.
Looking at the evaluation methods, the picture is quite diverse. The different methodologies – using the knowledge graph itself as a silver standard, using a partial gold standard, and performing a retrospective evaluation – are applied at roughly equal frequency. Furthermore, approaches are often evaluated on only one specific knowledge graph. This makes it hard to compare approaches and make general statements on their relative performance.
In addition, scalability issues are only rarely addressed by current research works. In the light of the advent of Web-scale knowledge graphs, however, this is an aspect which will be of growing importance.
To sum up, this survey shows that knowledge graph refinement is a relevant and flourishing research area. At the same time, it has pointed out some uncharted territories on the research map, which we hope will inspire researchers in the field.
