Abstract
DBpedia is one of the earliest and most prominent nodes of the Linked Open Data cloud. DBpedia extracts and provides structured data for various crowd-maintained information sources such as over 100 Wikipedia language editions as well as Wikimedia Commons by employing a mature ontology and a stable and thorough Linked Data publishing lifecycle. Wikidata, on the other hand, has recently emerged as a user curated source for structured information which is included in Wikipedia. In this paper, we present how Wikidata is incorporated in the DBpedia eco-system. Enriching DBpedia with structured information from Wikidata provides added value for a number of usage scenarios. We outline those scenarios and describe the structure and conversion process of the DBpediaWikidata (
Introduction
In the past decade, several large and open knowledge bases were created. A popular example, DBpedia [6], extracts information from more than one hundred Wikipedia language editions and Wikimedia Commons [9] resulting in several billion facts. A more recent effort, Wikidata [10], is an open knowledge base for building up structured knowledge for re-use across Wikimedia projects.
At the time of writing, both databases grow independently. The Wikidata community is manually curating and growing the Wikidata knowledge base. The data DBpedia extracts from different Wikipedia language editions, and in particular the infoboxes, are constantly growing as well. Although this creates an incorrect perception of rivalry between DBpedia and Wikidata, it is on everyone‘s interest to have a common source of truth for encyclopedic knowledge. Currently, it is not always clear if the Wikidata or the Wikipedia community provide more up-to-date information. In addition to the independent growth of DBpedia and Wikidata, there is a number of structural complementarities as well as overlaps with regard to identifiers, structure, schema, curation, publication coverage and data freshness that are analysed throughout this manuscript.
We argue that aligning both knowledge bases in a loosely coupled way would produce an improved resource and render a number of benefits for the end users. Wikidata would have an alternate DBpedia-based view of its data and an additional data distribution channel. End users would have more options for choosing the dataset that fits better in their needs and use cases. Additionally, it would create an indirect connection between the Wikidata and Wikipedia communities that could enable a big range of use cases.
The remainder of this article is organized as follows. Section 2 provides an overview of Wikidata and DBpedia as well as a comparison between the two datasets. Following, Section 3 provides a rational for the design decision that shaped

Wikidata statements, image taken from Commons (
Wikidata and DBpedia
Wikidata Wikidata is a community-created knowledge base to manage factual information of Wikipedia and its sister projects operated by the Wikimedia Foundation [10]. In other words, Wikidata’s goal is to be the central data management platform of Wikipedia. As of April 2016, Wikidata contains more than 20 million items and 87 million statements1
@prefix wkdt:⟨

RDF serialization for the fact: Douglas Adams’ (Q42) spouse is Jane Belson (Q14623681) from (P580) 25 November 1991 until (P582) 11 May 2001. Extracted from [10], Fig. 3
DBpedia The semantic extraction of information from Wikipedia is accomplished using the DBpedia Information Extraction Framework (DIEF) [6]. DIEF is able to process input data from several sources provided by Wikipedia.
The actual extraction is performed by a set of pluggable Extractors, which rely on certain Parsers for different data types. Since 2011, DIEF is extended to provide better knowledge coverage for internationalized content [5] and further eases the integration of different Wikipedia language editions as well as Wikimedia Commons [9].
Both knowledge bases overlap as well as complement each other as described in the high-level overview below.
DBpedia uses human-readable Wikipedia article identifiers to create IRIs for concepts in each Wikipedia language edition. Wikidata on the other hand uses language-independent numeric identifiers. DBpedia starts with RDF as a base data model while Wikidata developed its own data model, which provides better means for capturing provenance information. Using the Wikidata data model as a base, different RDF serializations are possible. Both schemas of DBpedia and Wikidata are community curated and multilingual. The DBpedia schema is an ontology based in OWL that organizes the extracted data and integrates the different DBpedia language editions. The Wikidata schema avoids direct use of RDFS or OWL terms and redefines many of them, e.g. All DBpedia data is automatically extracted from Wikipedia and is a read-only dataset. Wikipedia editors are, as a side-effect, the actual curators of the DBpedia knowledge base but due to the semi-structured nature of Wikipedia, not all content can be extracted and errors may occur. Wikidata on the other hand has its own direct data curation interface called WikiBase,5
Both DBpedia and Wikidata publish datasets in a number of Linked Data ways, including datasets dumps, dereferencable URIs and SPARQL endpoints.
DBpedia provides identifiers for all structural components in a Wikipedia language edition. This includes articles, categories, redirects and templates. Wikidata creates common identifiers for concepts that exist in more than one language. For example, not all articles, categories, templates and redirects from a Wikipedia language edition have a Wikidata identifier. On the other hand, Wikidata has more flexible notability criteria and can describe concepts beyond Wikipedia. There has not yet been a thorough qualitative and quantitative comparison in terms of content but the following two studies provide a good comparison overview [3,7].
DBpedia is a static, read-only dataset that is updated periodically. An exception is DBpedia Live (available for English, French and German).
On the other hand, Wikidata has a direct editing interface where people can create, update or fix facts instantly. However, there has not yet been a study that compares whether facts entered in Wikidata are more up to date than data entered in Wikipedia (and thus, transitively in DBpedia live).
In this section we describe the design decisions we took to shape the DBpediaWikidata (
New IRI minting The most important design decision we had to take was whether to re-use the existing Wikidata IRIs or minting new IRIs in the DBpedia namespace. The decision dates back to 2013, when this project originally started and after lengthy discussions we concluded that minting new URIs was the only viable option.6
Re-publishing minted IRIs as linked data Since 2007, there has been many tools created by the DBpedia community to explore and exploit DBpedia data through the DBpedia ontology. Although there does not exists any thorough survey, some of these tools are collected on the DBpedia website and we refer the readers to publications related to DBpedia.7
Ontology design, reification and querying The DBpedia ontology is a crowdsourced ontology developed and maintained since 2006. The DBpedia ontology has reached a stable state where mostly additions and specializations are added in the ontology. At the time of writing, the DBpedia ontology defines 375 datatypes and units.8
The Wikidata schema on the other hand is quite new and evolving and thus, not as stable. Simple datatype support in Wikidata started from the beginning of the project but units were only introduced at the end of 2015. In addition, Wikidata did not start with RDF as a primary data representation mechanism. There were different RDF serializations of Wikidata data and in particular different reification techniques. For example the RDF we get from content negotiation9
The DBpedia Information Extraction Framework was greatly refactored to accommodate the extraction of data in Wikidata. The major difference between Wikidata and the other Wikimedia projects DBpedia extracts is that Wikidata uses JSON instead of WikiText to store items.
In addition to some DBpedia provenance extractors that can be used in any MediaWiki export dump, we defined 10 additional Wikidata extractors to export as much knowledge as possible out of Wikidata. These extractors can get labels, aliases, descriptions, different types of sitelinks, references, statements and qualifiers.
For statements we define a RawWikidataExtractor that extracts all available information but uses our reification scheme (cf. Section 5) and the Wikidata properties and the R2RWikidataExtractor that uses a mapping-based approach to map Wikidata statements to the DBpedia ontology. Figure 2 depicts the current
Wikidata property mappings
In the same way the DBpedia mappings wiki defines infobox to ontology mappings, in the context of this work we define Wikidata property to DBpedia ontology mappings. Wikidata property mappings can be defined both as Schema Mappings and as Value Transformation Mappings. Related approaches have been designed for the migration of Freebase to Wikidata [8].
Schema mappings
The DBpedia mappings wiki11
The value transformation takes the form of a JSON structure that binds a Wikidata property to one or more value transformation strings. A complete list of the existing value transformation mappings can be found in the DIEF.15
replaces the placeholder with the raw Wikidata value, e.g.
replaces the placeholder with an escaped value to form a valid MediaWiki title, used when the value is a Wikipedia title and needs proper whitespace escaping, e.g.
Using the schema class mappings, tries to map the current value to a DBpedia class. This function is used to extract
Geo-related functions to extract coordinates from values. The following is a complete geo mapping that the extracts geo coordinates similar to the DBpedia coordinates dataset.16
For every occurrence of the property P625, four triples — one for every mapping — are generated (Listings 2 and 3).

Geographical

Resulting RDF from applied mappings for Wikidata item Q64
Mappings application The R2RWikidataExtractor merges the schema and value transformation property mappings. For every statement or qualifier it encounters, if mappings for the current Wikidata property exist, it tries to apply them and emit the mapped triples. Statements or qualifiers without mappings are discarded by the R2RWikidataExtractor but captured by the RawWikidataExtractor (cf. Section 5).
Besides the basic extraction phase, additional processing steps are added in the workflow.
Type inferencing In a similar way DBpedia calculates transitive types for every resource, the DBpedia Information Extraction Framework was extended to generate these triples directly at extraction time. As soon as an
Transitive redirects DBpedia already has scripts in place to identify, extract and resolve redirects. After the redirects are extracted, a transitive redirect closure is calculated and applied in all generated datasets by replacing the redirected IRIs to the final ones.
Validation The DBpedia extraction framework already takes care of the correctness of the extracted datatypes during extraction. This is achieved by making sure that the value of every property conforms to the range of that property (i.e.
IRI schemes
As mentioned earlier, we decided to generate the RDF datasets under the wikidata.dbpedia.org domain. For example,
Reification In contrast to Wikidata, simple RDF reification was chosen for the representation of qualifiers. This leads to a simpler design and further reuse of the DBpedia properties. The IRI schemes for the

Simple RDF reification example
IRI splitting The Wikidata data model allows multiple identical claims with different qualifiers. In those not so common cases the

Example of splitting duplicate claims with different qualifiers using dbo:wikidataSplitIri
Description of the DBw datasets. (R) stands for a reified dataset and (Q) for a qualifiers dataset
Description of the
Number of triples comparison before and after automatic class mappings extracted from Wikidata SubClassOf relations
A statistical overview of the
Wikidata sitelinks are processed to provide three datasets: (1)
Mapped facts are generated from the Wikidata property mappings (cf. Section 4.1). Based on a combination of the predicate and object value of a triple they are split in different datasets. Types, transitive types, geo coordinates, depictions and external
Raw facts consist of three datasets that generate triples with
Wikidata statement references are extracted in the references dataset using the reified statement resource IRI as subject and the
The statistics we present are based on the Wikidata XML dump from January 2016. We managed to generate a total of 1.4B triples with 188,818,326 unique resources. In Table 1 we provide the number of triples per combined datasets.
Class & property statistics We provide the 5 most popular
Mapping statistics In total, 269 value transformation mappings were defined along with 185
Redirects In the current dataset we generated 854,578 redirects – including transitive. The number of redirects in Wikidata is small compared to the project size but is also a relatively new project. As the project matures in time the number of redirects will increase and resolving them will have an impact on the resulting data.
Validation According to Table 1, a total of 2.9M errors originated from 11 wrong Wikidata-to-DBpedia schema mappings and 42,541 triples did not pass the ontology validation (cf. Section 4.2).
Access statistics There were more than 10 million requests to wikidata.dbpedia.org since May 2015 from 28,557 unique IPs as of February 2016 and the daily visitors range from 300 to 2.7K (cf. Fig. 3).
The access logs were analysed by using WebLog Expert.18
Top classes
Top properties
Top mapped qualifiers
Top properties in Wikidata
This dataset is part of the official DBpedia knowledge infrastructure and is published through the regular releases of DBpedia, along with the rest of the DBpedia language editions. The first DBpedia release that included this dataset is DBpedia release 2015-04. DBpedia is a pioneer in adopting and creating best practices for Linked Data and RDF publishing. Thus, being incorporated into the DBpedia publishing workflow guarantees: (a) long-term availability through the DBpedia Association and (b) agility in adopting any new best-practices promoted by DBpedia. In addition to the regular and stable releases of DBpedia we provide additional dataset updates from the project website. Table 7 provides an overview of the dataset metadata.
Besides the stable dump availability we created
Although it is early to identify a big range of possible use cases for

Number of daily visitors in
Technical details of
Listings 6 and 7 provide query examples with simple and reified statements. Since DBpedia provides transitive types directly, queries such as select all places can be formulated in SPARQL endpoints without SPARQL 1.1 support or simple scripts on the dumps. Moreover,
At the time of writing, there is a mismatch of the actual Wikidata syntax reported from the Wikidata paper, the Wikidata RDF dumps and the official Wikidata SPARQL endpoint.
The fact that the datasets are split according to the information they contain eases data consumption when someone needs a specific subset, e.g. coordinates. An additional important use case is data integration. Converting a dataset to a common schema, facilitates the integration of data. The
Use cases for Wikidata Through

Queries with simple statement

Queries with reified statements
Combination of both datasets Currently there is indeed an overlap of facts that exist both in DBpedia and Wikidata however, there are also a lot of facts that are unique to each dataset. For instance, DBpedia captures the links between Wikipedia pages that are used to compute page-rank datasets for different Wikipedia/DBpedia language editions. Using the page links from DBpedia and Wikidata as an article association hub, a global page-rank score for Wikidata items that takes the interconnection graph of all Wikipedias is possible. In general
We present an effort to provide an alternative RDF representation of Wikidata. Our work involved the creation of 10 new DBpedia extractors, a Wikidata2DBpedia mapping language and additional post-processing & validation steps. With the current mapping status we managed to generate over 1.4 billion RDF triples with CC0 license. According to the web server statistics, the daily number of
Footnotes
Acknowledgements
This work was funded by grants from the EU’s 7th & H2020 Programmes for projects ALIGNED (GA 644055), GeoKnow (GA 318159) and HOBBIT (GA 688227).
