Abstract
Ontologies have been used on the Web to enable semantic interoperability between parties that publish information independently of each other. They have also played an important role in the emergence of Linked Data. However, many ontologies on the Web do not see much use beyond their initial deployment and purpose in one dataset and should therefore rather be called what they are: (local) schemas, which per se do not provide any interoperable semantics. Only a few ontologies are truly used as a shared conceptualization between different parties, mostly in controlled environments such as the BioPortal. In this paper, we discuss open challenges relating to true re-use of ontologies on the Web and raise the question: “are we better off with just one ontology on the Web?”
Introduction
Back in 1993, Gruber [24] introduced “ontologies”1 to computer science as explicit specifications of a conceptualization.
The plural use of the term “ontology” in computer science quite likely still raises eyebrows for anyone with a background in ontology in philosophy.
The use of ontologies as an approach to overcome the problem of semantic heterogeneity on the World Wide Web has since been well established. Semantic heterogeneity occurs whenever two contexts do not use the same interpretation of information. According to Goh [22], three causes of such semantic heterogeneity can be identified: confounding conflicts, scaling conflicts, and naming conflicts.
Many ontology-based approaches that address these causes of semantic heterogeneity have since been proposed [50,71]. The idea is that a shared ontology, which carries a formal semantics, acts as a gold standard for the definition of information in different contexts and applications. Many kinds of ontologies have been proposed; they can be classified on a spectrum from very lightweight ones, which may consist of terms only with little or no specification of their meaning, to rigorously formalized logical theories [66]. In this paper we focus on the latter, i.e., formal ontologies expressed in RDFS/OWL.

Fig. 1. Levels of abstraction in ontology design.
The ontology engineering community has proposed ontologies at different levels of abstraction to ease reuse and to allow ontologies to be layered upon each other. Although no agreed-upon ontology hierarchy exists, adapting the ontology classification of Guarino [26], we can largely distinguish four different levels of abstraction in ontology design, as shown in Fig. 1.
While upper ontologies experienced strong research interest in the early 2000s, their use on the Web has largely been confined to the biomedical domain, where the community, through the OBO Foundry, maintained and mandated the use of the BFO upper ontology. In fact, in an analysis of links [29] in the LOD Cloud [1], we discovered that not a single dataset in the corpus of 430 Linked Open Datasets investigated for that study reuses DOLCE or SUMO, the other two main open-source upper ontologies.
This lack of adoption of upper ontologies outside the biomedical domain can mostly be attributed to the complexity and rigidity of these ontologies and to the often unintended inferences that result from importing the upper ontology into a mid-level or domain ontology. Examples of such unintended inferences are global domain and range restrictions defined in an upper ontology (e.g., DOLCE+DnS Ultralite (DUL) uses global property restrictions) that may lead to inferences in the importing domain ontology that are inconsistent in its domain of discourse. Another example is the disjointness of a set of classes defined in an upper ontology, which results in an unintended restriction on the use of a domain class that is a subclass of such an upper-level class. For example, in the old SSN ontology, the mandatory alignment with DUL imposed such upper-level constraints on its domain classes, which is one of the reasons why the revised SSN/SOSA ontology [31] moved the DUL alignment into a separate, optional module.
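To illustrate the first kind of unintended inference, the following minimal sketch (with hypothetical upper and domain ontology namespaces, assuming the rdflib and owlrl Python libraries) shows how a global rdfs:range restriction inherited from an upper ontology forces an arguably nonsensical type onto an individual in the importing dataset:

```python
from rdflib import Graph
import owlrl

# Upper ontology with a global range restriction, plus domain data that
# uses the property for a virtual (non-physical) location.
DATA = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix up:   <http://example.org/upper#> .
@prefix ex:   <http://example.org/domain#> .

up:hasLocation rdfs:range up:PhysicalPlace .   # global range in the upper ontology
ex:meeting1 up:hasLocation ex:zoomRoom42 .     # a meeting held in a virtual room
"""

g = Graph()
g.parse(data=DATA, format="turtle")

# Apply RDFS entailment: the range axiom now types ex:zoomRoom42 as a
# up:PhysicalPlace, an inference that is wrong in the domain of discourse.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print(g.query("""ASK { <http://example.org/domain#zoomRoom42>
                        a <http://example.org/upper#PhysicalPlace> }""").askAnswer)
# -> True: the unintended inference has materialized
```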
Recognising the issues with the adoption of upper ontologies, the ontology engineering community has developed reusable ontology design patterns [19] that are suitable to be used as templates (i.e., guiding design principles) in lower-level ontologies. These patterns bring the benefits of a traditional upper-ontology-based integration approach while avoiding its pitfalls, i.e., the need to import the upper ontology with all its ontological commitments. Over 200 such patterns have since been submitted to the ontology design pattern initiative.2
See http://ontologydesignpatterns.org
Beyond the aforementioned challenges in reusing upper ontologies, evaluating which mid-level or domain ontology is suitable for a given use case is challenging for several reasons. Gómez-Pérez [23] has proposed a criteria-based approach to ontology evaluation. Yu et al. [72] have reviewed the various criteria that have been proposed for the evaluation of ontologies. These include clarity, coherence, extendibility, minimal ontological commitment, and minimal encoding bias as proposed by Gruber [24]; competency as proposed by Grüninger and Fox [25]; consistency, completeness, conciseness, expandability, and sensitiveness as proposed by Gómez-Pérez [23] and correctness as proposed by Guarino and Welty [27].
While some of these criteria (e.g., consistency) can be verified automatically using Description Logics reasoners such as Pellet [59], FaCT++ or HermiT, others (e.g., clarity or minimal ontological commitment) can only be assessed manually by experts against the intended use of the ontology.
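As a minimal sketch of such an automated check, assuming the Owlready2 Python library (which bundles the HermiT reasoner) and a hypothetical local ontology file, consistency can be verified as follows:

```python
from owlready2 import get_ontology, sync_reasoner, OwlReadyInconsistentOntologyError

# Load a (hypothetical) local ontology file and run the bundled HermiT reasoner.
onto = get_ontology("file:///tmp/my_ontology.owl").load()
try:
    with onto:
        sync_reasoner()  # raises if the ontology is logically inconsistent
    print("Ontology is consistent.")
except OwlReadyInconsistentOntologyError:
    print("Ontology is inconsistent.")
```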
In the following we identify a set of challenges that we have repeatedly encountered in ontology engineering consultancies with government and industry clients. These include some of the ontology evaluation criteria above (some of which, e.g., clarity, consistency, correctness and conciseness, are combined into one category, ‘quality’), but also other challenges that are specific to the reuse of distributed ontologies on the Web.
Availability For ontologies to be of any use in terms of serving Linked Data, they need to be highly available, preferably in perpetuity. This means that the file encoding the ontology needs to be permanently retrievable at the namespace URI of the ontology. Although studies have shown [29,56] that ontologies have higher availability than the Linked datasets built using them, various issues with accessing ontologies still exist. For example, purl.org, a service that was popular for over 15 years for creating permanent URLs on the Web and that was used for many ontology namespaces, including the Dublin Core Metadata Initiative, ran into availability issues in 2015, as it was mostly a volunteer-driven community service. The Internet Archive has since taken over the service and guarantees its continued support, while the W3C has introduced w3id.org, a permanent identifier service for the Web. However, both services only offer a solution for the permanence of the URI; the ontology file itself still has to be stored persistently somewhere else. Many ontologies are now hosted on GitHub, but the long-term availability of this service depends on its commercial viability, and as history has shown, not all such services survive: e.g., Google Code turned off its hosting services in 2016.3
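A minimal sketch of such an availability check (with a hypothetical w3id.org namespace URI, assuming the requests Python library) dereferences an ontology namespace with content negotiation and inspects the redirect chain:

```python
import requests

# Hypothetical persistent namespace URI registered at w3id.org.
NAMESPACE_URI = "https://w3id.org/example/ontology"

resp = requests.get(
    NAMESPACE_URI,
    headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    allow_redirects=True,
    timeout=10,
)
print(resp.status_code)                  # 200 only if the file is retrievable
print(resp.headers.get("Content-Type"))  # should be an RDF serialization
print([r.url for r in resp.history])     # redirect chain, e.g. w3id.org -> host
```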
Discoverability One of the main barriers to the uptake of ontologies has been the difficulty that data publishers face in discovering ontologies on the Web to describe the semantics of their data. Although, again, the biomedical community has developed and maintains its own successful repository, the BioPortal [49], there has been a lack of a general-purpose ontology search engine or a central ontology library [10], beyond the relatively recently proposed Linked Open Vocabularies (LOV) repository [68]. However, none of the major search engine providers supports the search or discovery of ontologies on the Web, and therefore a non-expert ontology user largely has to rely on their social network to find and reuse existing ontologies. Ideally, in order to facilitate discoverability, search engines would need to provide a dedicated concept/property search operator, similar to “filetype” or “site” in Google. We emphasise that such services existed in the past,4
For instance, we used services like Sindice [64] and SWSE [35] in the past for auto-completion of ontology term search in Drupal [7].
Completeness & adaptability Completeness of an ontology can only be evaluated against the purpose it was built for. Typically this purpose is expressed through a number of use cases against which the ontology has been validated [25]. Often, when reusing a specific ontology, the use case differs from the one the ontology was built for, and consequently not all concepts and axioms that are needed are included in the ontology to be reused. Ideally, the ontology should also be adaptable, i.e., the ontological commitment of the ontology should not prevent the reuse of a term in a different context (e.g., by not imposing global domain and range restrictions on the term). However, studies [38] have found that term reuse from existing ontologies is not widespread (most ontologies reuse less than 5% of their terms), while almost one in three terms in the investigated ontology corpus overlapped, i.e., they could have been reused. While the study itself did not present findings on why these terms were not reused, the ontological commitment and semantic completeness of a term often influence its potential reuse.
Maintenance & versioning Curating and maintaining reusable ontologies is a prerequisite for their continued relevance, since the mental models of the world that an ontology captures may change. Just imagine a mobile phone ontology created in the late 1990s: it would not include concepts for a ‘touchscreen’, ‘fingerprint sensor’ or even a ‘wifi antenna’. Such changes, as well as human factors (mistakes in the ontology design), can lead to semantic drift in ontologies over time. To address these, ontologies need to undergo regular revision. Some of the most used ontologies on the Web [44], such as FOAF [6], SIOC [5] or SKOS [46], have undergone several revisions. Ontologies managed by the W3C, for example, do undergo regular revisions; most recently, the W3C Time Ontology [8] underwent a revision more than 10 years after its first publication. When an ontology is revised, decisions have to be made on the versioning of the ontology namespace. In their seminal work on ontology versioning, Klein and Fensel [39] identified four different methods by which an ontology might be versioned: (1) the previous version is silently replaced by the new version; (2) the ontology is visibly changed, but the old version is replaced by the new version; (3) the ontology is visibly changed, and both versions are accessible at different URIs; or (4) there are two versions available at separate URIs and there is an explicit specification of the relation between terms in the new version and terms in the previous version. The authors also raise the question of at what point a new URI should be minted, and recommend changing the namespace URI only in cases where the conceptualization of the ontology changes.
Ideally, every ontology should follow the guidelines proposed by Klein and Fensel [39], in combination with more recent guidelines around content negotiation [37], and use version numbers for changes in the conceptualization of the ontology in combination with a persistent URI that redirects to the most recent version of the ontology [17]. Another possible approach to versioning is to use the Memento protocol [67], or components thereof, to express temporal versioning of a dataset and to allow access to the version that was operational at a given datetime.
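A minimal sketch of an ontology header following method (4), with both versions resolvable at separate URIs and the relation between them made explicit through standard OWL annotation properties (the w3id.org URIs are hypothetical):

```python
from rdflib import Graph

# Ontology header for version 2.0; the un-versioned persistent URI would be
# configured to redirect to the most recent owl:versionIRI.
HEADER = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://w3id.org/example/ontology>
    a owl:Ontology ;
    owl:versionInfo "2.0" ;
    owl:versionIRI  <https://w3id.org/example/ontology/2.0> ;
    owl:priorVersion <https://w3id.org/example/ontology/1.0> .
"""

g = Graph()
g.parse(data=HEADER, format="turtle")
print(g.serialize(format="turtle"))
```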
In many cases, however, one of the first three approaches mentioned above is chosen instead when publishing an ontology. Even the popular FOAF ontology violates some of the proposed versioning principles: although it uses different version numbers for the evolution of the ontology, it still uses the original namespace URI (i.e., http://xmlns.com/foaf/0.1/) for all of its revisions.
Modularization There are two different methods by which one can reuse terms from an ontology: (1) either by directly importing the source ontology using an owl:imports statement, which pulls in all of its axioms and thus its full ontological commitment, or (2) by referencing the URI of the individual term without importing its defining ontology, in which case the term’s axiomatisation is lost. A well-modularized ontology mitigates this trade-off, since an importing ontology can then pull in only the module that defines the terms of interest.
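A minimal sketch (with hypothetical example.org URIs) contrasting the two reuse methods, using the W3C SOSA ontology as the reused source:

```python
from rdflib import Graph

# (1) owl:imports pulls in the source ontology with all of its axioms.
IMPORTING = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<https://example.org/my-onto> a owl:Ontology ;
    owl:imports <http://www.w3.org/ns/sosa/> .
"""

# (2) Direct term reuse: only the term's URI is referenced; its
# axiomatisation in the source ontology is not pulled in.
REUSING = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ex:   <https://example.org/my-onto#> .
ex:AirQualitySensor rdfs:subClassOf sosa:Sensor .
"""

for ttl in (IMPORTING, REUSING):
    Graph().parse(data=ttl, format="turtle")  # both variants are valid RDF
```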
Quality Beyond syntactic and semantic errors that can be checked by reasoners as mentioned above, the notion of the quality of an ontology is rather imprecise. Some even argue that ontologies on the Web do not need to be consistent, and that systems should be able to deal with noise, different perspectives, and uncertainty [32]. In his dissertation, Vrandečić [15] investigates how to assess the quality of an ontology on the Web, concludes that a single measure to assess the overall quality of an ontology is elusive, and proposes ontology evaluation methods that identify shortcomings in ontologies instead. Few tools exist [54], though, that test for such common shortcomings in ontologies, and no framework is available that assesses and compares the quality of ontologies available on the Web. Some ontologies now undergo a peer-review process in scientific conferences and journals, while others are standardised, but the vast majority of ontologies are not assessed for their quality at all. Therefore, users of ontologies need the expertise to assess the quality of an ontology themselves. Since most naïve users do not possess this skill and cannot distinguish between high-quality and low-quality ontologies, they assess an ontology rather by its fit for a given use case.
Trust While ontologies are built in a truly decentralised manner, companies and organisations still need to trust the publisher when reusing a digital asset on the Web, such as an ontology. Consequently, the most popular ontologies have either been developed and/or are hosted by standardisation bodies such as the W3C (e.g., PROV-O [41], ORG [12], SSN/SOSA [31]), have a long history of availability, curation and community support (e.g., FOAF [6], SIOC [5]), or are supported through a community of best practices (e.g., the OBO Foundry). While the W3C has long resisted standardising ontologies, and still does not see itself in the business of doing so, the major search engines Google, Yahoo! and Bing have built their own ontology (schema.org [28]), while Facebook has built its own simple social profile ontology, the Open Graph Protocol.5
See https://ogp.me
The success story of schema.org as an ontology with very lightweight semantics, which already in 2015 was used on 31.3% of all pages on the Web [28] and which is backed by a trusted consortium of search engine providers, raises the question of whether it is an end-all solution for defining terminology on the Semantic Web [45]. Revisiting the above challenges, let us briefly discuss if and how schema.org addresses them (cf. also Table 1).
Table 1. Evaluation of reuse criteria for schema.org, wikidata.org and dbpedia.org ontologies
Availability While the schema.org ontology is neither hosted in a publicly funded open-access repository nor is its namespace registered with a persistent URI service such as w3id.org, the ontology and namespace are managed by a consortium of globally operating search engine providers, which implies high availability and support for the ontology.
Discoverability Although the schema.org ontology itself is surprisingly hard to find on Google,6 its terms are extensively documented on the schema.org website, and its backing by the major search engines ensures that Web developers encounter the vocabulary through the search engines’ own webmaster documentation.
E.g., a Google search for “product concept” or “product ontology concept” does not yield the schema.org “Product” class (which is core to the ontology) within the first 10 result pages.
Completeness & adaptability With a strong focus on the eCommerce domain, schema.org is far from being a complete ontology for general human knowledge. However, a mechanism is provided through which the community can propose extensions to schema.org. From personal experience (in the concrete case, a suggestion for an addition to the ontology from the SOSA/SSN specification [30]), the feedback process for proposals from outside the community is handled by a few individuals and is not very dynamic. Although this is sufficient for data publishers that are mainly interested in improving the appearance of their search results on Google or the inclusion of their data in the Google Knowledge Graph, it is an unsuitable process for governmental, industrial or science applications.
Maintenance & versioning Schema.org has been continuously curated since its launch in 2011 [28]. Although the process of change in schema.org is transparent, with a published release history that works through issues raised on the tracker, changes to terms in the ontology are not made explicit in the term definitions themselves; the class or property URI simply serves the new semantics of the term.
Modularization While schema.org is not published in a modular fashion, each term in the ontology is served by its own webpage, and through a Linked Data content negotiation technique the corresponding subgraph is served at the same URI.
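A minimal sketch of retrieving such a per-term subgraph, assuming (as described above) that the schema.org site answers content negotiation for JSON-LD, and using the requests Python library:

```python
import requests

resp = requests.get(
    "https://schema.org/Product",
    headers={"Accept": "application/ld+json"},
    allow_redirects=True,
    timeout=10,
)
print(resp.headers.get("Content-Type"))
print(resp.text[:300])  # JSON-LD describing (only) the schema:Product term
```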
Quality While an ontology like schema.org that is constantly evolving may not always be consistent or correct, there is a feedback mechanism in the form of an issue tracker. Also, schema.org uses lightweight semantics with annotation-style properties (schema:domainIncludes and schema:rangeIncludes) in place of formal rdfs:domain and rdfs:range axioms, which avoids most logical inconsistencies by design.
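The difference matters for reasoning; a minimal sketch (with hypothetical example.org data, assuming the rdflib and owlrl Python libraries) shows that schema:domainIncludes, unlike rdfs:domain, licenses no type inference:

```python
from rdflib import Graph
import owlrl

TTL = """
@prefix schema: <https://schema.org/> .
@prefix ex:     <https://example.org/> .

schema:name schema:domainIncludes schema:Thing .  # a hint, not a logical axiom
ex:x schema:name "something" .
"""

g = Graph()
g.parse(data=TTL, format="turtle")
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# Unlike rdfs:domain, schema:domainIncludes triggers no type inference:
print(g.query("ASK { <https://example.org/x> a <https://schema.org/Thing> }").askAnswer)
# -> False
```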
Trust Since schema.org is supported by a consortium of all major search engine providers (other than Baidu), there is little doubt that users (will) trust schema.org. While that is true for the ontology itself, the data modelled using schema.org has trustworthiness/reliability issues similar to any other data that is created on the Web for the commercial benefit of its publisher.
The analysis above shows that schema.org scores well on most of the considered reuse criteria. However, although we believe that schema.org will continue to evolve and will see even bigger uptake, we believe it is not yet the end-all ontology on the Web, for two reasons: (1) in terms of its Completeness, there is little indication that it will be extended beyond the eCommerce domain (with few exceptions, like the health and life sciences domain) any time soon. Moreover, data providers provide schema.org annotations mainly for commercial reasons, i.e., better ranking and visibility on search engines [43], while there is little to no incentive for them to annotate non-commercial knowledge with schema.org; (2) in regards to its Quality, while the lightweight semantics were deliberately chosen to make annotations on the Web easier for the average Web developer [28], they prevent the use of the ontology in environments with a requirement for stricter formal semantics, such as in science domains or in the governmental policy domain. Also, while community extensions are managed through an open process, the decision on additions to the ontology still sits with the providers of the ontology, i.e., the search engine companies.
The large uptake of schema.org [28,43] and of the Open Graph Protocol on the Web [44], however, are signs of an emerging long-tail trend in ontology use on the Web, with a few ontologies seeing the majority of use while most other ontologies are used only once, in the use case they were built for; a phenomenon that we also observed in a recent study [29].
There have been mainly two approaches, DBpedia7 and Wikidata, that aim at capturing general human knowledge in a single, broad knowledge base with an accompanying ontology. In the following, we assess both against the reuse criteria introduced above.
Yago [61] is another approach, very similar to DBpedia, with a stronger taxonomic backbone that ensures better quality than DBpedia. However, at the time of writing, the latest stable release of Yago is from 2017, whereas DBpedia releases a new version monthly. We therefore limit our analysis to DBpedia; both approaches can be considered largely equivalent in the assessment of the reuse criteria other than on the quality aspect.
Availability Both ontologies are highly available. That being said, while Wikidata is run by Wikimedia, the same organisation that has successfully hosted Wikipedia for more than 18 years, DBpedia is run by an association affiliated with the University of Leipzig.
Discoverability Although Wikidata does not yet have the same visibility as Wikipedia (its Alexa rank as of October 2019 is 8,496, compared to Wikipedia’s rank of 9), it can easily be reached through any page on Wikipedia. DBpedia, while extremely well known in the Semantic Web community, only ranks 158,385 on Alexa. From our own experience representing the W3C in Australia and chairing a Government Linked Data working group, it is largely unknown outside of the scientific Semantic Web community, even to people with ontology engineering skills. Assessing the discoverability of the ontology itself, Wikidata leaves a lot to be desired. To the best of our knowledge, it is impossible to download the entire ontology from the Wikidata site. There are pages listing some of the top-level concepts and relations,8
but to retrieve only the TBox statements from the Wikidata dump or SPARQL endpoint, one would need to write sophisticated queries. DBpedia, on the other hand, releases its ontology as one file that is easily discoverable from its namespace URI (i.e., http://dbpedia.org/ontology/).

Completeness & adaptability Neither Wikidata nor DBpedia is built for a specific use case; rather, they are generic knowledge bases that aim to capture the sum of all human knowledge (as per their vision statements). Studies have compared the breadth and depth of the knowledge captured and concluded that they are comparable [2]. Comparing the ontologies themselves is difficult, owing to the difficulty of obtaining the entirety of the Wikidata ontology. It has to be noted, though, that there is a fundamental difference in how the two ontologies are built and how they can be adapted. Anyone can add concepts or relations to the Wikidata ontology directly, whereas in DBpedia concepts and relations are added to the ontology through the “schema” of the info boxes in Wikipedia, i.e., they cannot be added to the ontology directly. Reusing and adapting specific entities of either ontology is easy, as both ontologies are served through Linked Data APIs that allow one to reference an entity by its URI (while retrieving only its subgraph). The implications of doing so differ, however, as DBpedia is an OWL-based ontology, whereas Wikidata does not rely on Description Logics’ (DL) semantics: the fact that Wikidata defines its own properties and classes for fundamental relationships such as “instance of” (P31) and “subclass of” (P279), instead of reusing rdf:type and rdfs:subClassOf, means that its ontology carries no formal semantics that a standard OWL reasoner could act upon.
Maintenance & versioning While both Wikidata and DBpedia are continuously evolving ontologies that rely on a manually developed core, the major difference is that large parts of the Wikidata ontology are generated in a collaborative, bottom-up fashion by a large number of contributors, while the DBpedia ontology is created by the maintainers of the mapping from the Wikipedia info boxes to the DBpedia data set. Each release of the DBpedia ontology corresponds to a new release of the DBpedia data set. In terms of versioning, the two approaches differ too. While DBpedia continuously uses the same namespace for the ontology, the version number is made explicit by an owl:versionInfo annotation in the ontology header. Wikidata, in contrast, does not publish versioned releases of its ontology; changes to its classes and relations are only traceable through the revision history of the respective entity pages.
Modularization Neither of the two ontologies is modularized. Whereas the DBpedia ontology is provided in one monolithic file, the Wikidata ontology can only be retrieved on a per-entity basis. The ontology can neither be transparently retrieved at its namespace URI nor, to the best of our knowledge, be downloaded from a single source. The ontology is, of course, retrievable through the Wikidata SPARQL API, but even for expert users it is a challenge to retrieve just the TBox statements, given that this SPARQL endpoint also gives access to the entire Wikidata ABox.
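A minimal sketch of the kind of query this requires, assuming the SPARQLWrapper Python library and the public Wikidata Query Service endpoint; this retrieves only subclass-of statements, and a full extraction of all TBox-like statements would need many more such queries:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "https://query.wikidata.org/sparql",
    agent="tbox-extraction-sketch/0.1 (example@example.org)",  # WDQS asks for a UA
)
sparql.setQuery("""
SELECT ?class ?super WHERE {
  ?class wdt:P279 ?super .   # P279 = "subclass of"
} LIMIT 100
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], "subClassOf", row["super"]["value"])
```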
Quality Both the Wikidata ontology and the DBpedia ontology are collaboratively created. While editors can directly manipulate the Wikidata ontology through the MediaWiki software, the DBpedia ontology is derived through a mapping from the Wikipedia info boxes, which are themselves created by contributors to the English Wikipedia. However, since these info boxes are created using natural language, the mapping of attributes from the info boxes to ontology relations in DBpedia leads to issues with the conciseness and minimal ontological commitment of the DBpedia ontology. For example, the current version of the ontology includes over two dozen relations that express essentially the same relationship under different names, an artefact of the varying attribute names used in the info boxes.
The Wikidata ontology does not introduce such redundancies, since the software alerts an editor if a relation already exists. It does, however, still suffer from modelling inconsistencies at lower levels of the class hierarchy. For example, in its current version as of October 2019, a “Beef Wellington” is placed in the class hierarchy in a way that is inconsistent with how comparable dishes are classified, mixing subclass-of and instance-of relations.
Trust Beyond a manually created core, the Wikidata ontology is created in a collaborative fashion. As such, its quality varies, similar to how the quality of Wikipedia articles varies. Still, users of Wikipedia trust that the moderation process and the many editors ensure that the information is largely correct. Similarly, Wikidatans have collaborated to create and maintain the Wikidata ontology, and one can expect that users will have fairly high trust in the ontology. While the same applies to DBpedia to a certain extent, its ontology is created through a mapping process and hosted by universities that do not have the same brand recognition as Wikipedia/Wikidata.
While DBpedia has been around since its first public release in 2007 and has seen great success as a core reference ontology and dataset in the Linked Data Cloud [29], it has not become the one general knowledge reference ontology on the Web. Also, studies have shown that the Linked Data cloud itself has become rather stale of late [13,53,69]. Interestingly, parts of the Wikipedia info boxes that are used to create the RDF graph in DBpedia are now created from Wikidata (with a plan to progressively create all Wikipedia info boxes from Wikidata). This should lead, in the long term, to a convergence between the Wikidata and DBpedia ontologies (essentially making the latter obsolete).
While a future of highly distributed ontologies on the Web with strong linkage between them is still possible, evidence from an analysis [29] of the most successful Linked Data project, the LOD cloud [1], largely paints a different picture. We believe, however, that the Wikidata ontology, which was only introduced in late 2012 together with the Wikidata project, may have more success in becoming this “one ontology on the Web”. Its strength lies in its bottom-up, collaborative development approach that strives to incorporate the source of a term: for the ontology part, it reuses and references existing ontologies where possible, but mints URIs for entities in the Wikidata namespace. This clearly sets it apart from the schema.org and DBpedia approaches; the former creates entities in its namespace without explicit reference to existing models, while the latter relies on such references being part of the Wikipedia info boxes. What this means for Wikidata is that it can incorporate existing, highly curated and high-quality ontologies. Ontologies that are built and maintained in domain portals, such as the BioPortal [49], the ETSI community building the Smart Appliances REFerence (SAREF) ontology [9] or the FiBO financial ontology,9 can thus continue to thrive alongside Wikidata, while Wikidata references their terms and thereby increases their visibility beyond these specialised communities.
Cf. https://spec.edmcouncil.org/fibo/
However, although Wikidata meets most of the reuse criteria outlined above, there are still challenges that need to be addressed for it to become a true reference ontology for general knowledge on the Web, in particular in terms of quality assurance and of better accessibility and discoverability of the TBox itself. For the former, there are efforts to improve the quality of entities by including shape expressions for entities in Wikidata [62]; this should lead, in the long term, to more consistency between similarly typed entities and, as such, also in its ontology. For the latter, we are not aware of any efforts to make the ontology more accessible, but we hope that this discussion paper may contribute to this issue being addressed.
In this paper we have asked the question of whether we “are better off with just one ontology on the Web”. Analysing the major challenges that publishers and users of ontologies face, and how schema.org addressed some of these challenges to become the most widely used ontology on the Web, we argue that we may indeed be better off with just one ontology on the Web. Similar to how the likes of Amazon, Google, Apple, Facebook or AirBnB benefit from the phenomenon of a “winner takes all” network effect, a single winner-takes-all ontology would be a true boon for data interoperability on the Web. We argue that schema.org, despite its success in the eCommerce domain, is not (yet) the end-all solution to our ontology woes. We further argue that a winner-takes-all ontology should follow the same approach as the one taken by Wikipedia and provide for a bottom-up development of the ontology by the Web community. This bottom-up development of content on Wikipedia helped it, through a network effect, to become the only encyclopedia in use on the Web.
Wikidata, the sister project of Wikipedia that manages factual human knowledge, is building such a community-driven ontology with a strong focus on incorporating and referencing existing ontologies, while at the same time minting URIs in the Wikidata namespace. This allows it to thrive alongside specialised, high-quality domain ontology repositories, while at the same time increasing their visibility to people outside of these specialised communities.
While the Wikidata ontology still has issues with its modularization and access, only partially addresses the ontology versioning problem through metadata annotations (but not versioned URIs), and exhibits variable quality in some knowledge domains due to its relatively young age, we believe and propose that, with small changes (the details of which still need to be worked out), its ontology could eventually become this one end-all solution to semantic interoperability on the Web.
