The Semantic Web: Two decades on

Abstract

More than two decades have passed since the establishment of the initial cornerstones of the Semantic Web. Since its inception, opinions have remained divided regarding the past, present and potential future impact of the Semantic Web. In this paper – and in light of the results of over two decades of development on both the Semantic Web and related technologies – we reflect on the current status of the Semantic Web, the impact it has had thus far, and future challenges. We first review some of the external criticism of this vision that has been put forward by various authors; we draw together the individual critiques, arguing both for and against each point based on the current state of adoption. We then present the results of a questionnaire that we have posed to the Semantic Web mailing list in order to understand respondents’ perspective(s) regarding the degree to which the original Semantic Web vision has been realised, the impact it can potentially have on the Web (and other settings), its success stories thus far, as well as the degree to which they agree with the aforementioned critiques of the Semantic Web in terms of both its current state and future feasibility. We conclude by reflecting on future challenges and opportunities in the area.

Keywords

Semantic Web, ontologies, Linked Data, knowledge graphs

1. Introduction

Arguably the first concrete milestones towards realising the Semantic Web were the 1998 release of the initial versions of the Resource Description Framework (RDF) [14] and RDF Schema (RDFS) specifications [50]. In 2001, Berners-Lee et al. [9] would position RDF as a key technology for realising their vision of what they called the “Semantic Web”, which would “bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users”. A slew of developments were to follow, culminating in the release of numerous standards, such as OWL, SPARQL, SKOS, RIF, RDB2RDF, SHACL, ShEx, as well as a variety of updates to existing standards. Each standard has received varying degrees of attention and acceptance from researchers, developers, and publishers alike. We refer the reader to the survey by Gandon [25] for further details on the developments and trends in the Semantic Web research area spanning the first two decades.

More than two decades on there are varying opinions on the extent to which the original vision of Berners-Lee et al. [9] has been realised – or indeed, the extent to which it can or should be realised.

Within the Semantic Web community, there has long been a consensus that while the vision has yet to be fully translated into reality, it was a question of when, not if. In 2006, Shadbolt et al. [74], while admitting that the Semantic Web wasn’t “yet with us on any scale”, argued that it soon would be once the “standards are well established”. In 2007, Horrocks [43], while likewise admitting that “fully realising the Semantic Web still seems some way off”, argued that OWL had “already been very successful” and had “become a de facto standard for ontology development in fields as diverse as geography, geology, astronomy, agriculture, defence and the life sciences”. The years that followed were marked by optimism with regard to Linked Data, with authors claiming an exponential growth of data published following these principles [20,47,57,64]. Optimism was further expressed with the selective adoption of Semantic Web technologies by household names, including the BBC [48], the New York Times [71], Oracle [87], Facebook [82], Google [11,30], Wikimedia [86], Amazon [2], and so forth. More recent announcements of the development of knowledge graphs by Google [76], LinkedIn [34], Bing [75], eBay [67], Amazon [49], Airbnb [17], etc., have also been viewed as a win for the Semantic Web community.

The Semantic Web has not only had numerous proponents down through the years, but also numerous vocal opponents. As early as 2001, impassioned criticism of the vision of the Semantic Web began to emerge, with Doctorow’s often-cited “Metacrap” essay [23] laying out the seven “insurmountable obstacles” that made the Semantic Web vision “a pipe-dream” in his view; in summary, he criticises the naivety of expecting users to create high-quality structured content, and of expecting domain ontologies to be globally agreed-upon given the many possible interpretations of how a particular domain may be described. Various other online articles and blog posts criticising the Semantic Web emerged through the years. Here we summarise a number of recent, prominent examples (found through web searches for Semantic Web-related terms combined with negative terms such as “fail”, “dead”, etc., further following hyperlinks to related articles):

In 2013, ter Heide [81] suggested that the Semantic Web had “failed” mainly due to: not catering to a typical user’s interests, not considering new streams of information such as messages, and expecting users to pull complex information rather than being pushed content relevant to them.

In 2014, Rochkind [69] discusses a thread on Hacker News, asking “is the Semantic Web still a thing?”, critiquing in particular the lack of incentive for publishers to invest in publishing Linked Data versus publishing the data in its native format; he highlights the lack of clear business models for doing so, noting that the infrastructure to exploit Linked Data had “not really materialized, and it’s hardly clear that it will”.

In 2016, Cagle [16] suggested that the Semantic Web had “failed”, primarily because it is hard to understand, and it does not fit with other familiar paradigms (citing Object Oriented Programming), arguing for more lightweight semantics (taxonomies) to alleviate the burden on users.

In 2017, Cabeda [15] suggested that the rapid advancement in Machine Learning techniques “leaves the Semantic Web in the dust”, and concluded that it “needs to evolve and integrate its ideas with artificial intelligence”.

In 2018, Target [80] – while giving a brief history on the major developments of the Semantic Web – suggests that it has “threatened to recede as an idea altogether”, observing that “work on the Semantic Web seems to have petered out”; while he acknowledges adoption in settings such as the Open Graph Protocol and schema.org, and commends technologies such as JSON-LD, he ultimately concludes that there are many “engineering and security issues” to be addressed before the original decentralised vision of the Semantic Web can be meaningfully realised.

These critiques of the Semantic Web raise a number of important issues in terms of the feasibility of realising its original vision and should be carefully considered in the context of the Semantic Web community: while the community is perhaps generally aware of such potential criticisms, it is not always clear what (if anything) should be done to address them.

Some such critiques have been addressed by members of the community, both formally and informally. In a 2013 keynote, Hendler [36] counters a number of criticisms of the Semantic Web – such as the lack of need for ontologies, the inability of the relevant technologies to scale, etc. – while ultimately concluding that there are open challenges to face, particularly in terms of uniting Ontologies and Linked Data, and developing practical reasoning methods for the Web. In a 2017 keynote, Mika [60] provides a brief history of the Semantic Web, noting a “chicken and egg” problem in the early days of applications requiring data and applications being needed to incentivise the publication of data, but discussing how more and more incentives are available for publishing data through initiatives such as Linking Open Data, schema.org, etc.; he further discusses some application domains – Semantic Search, eCommerce, Social Web – in which Semantic Web concepts are being deployed.

Given the differing opinions that yet exist two decades on, we believe it to be a fitting moment to understand the varying perspectives within the Semantic Web community itself regarding its impact thus far, the aforementioned critique, and the opportunities presented and challenges faced when looking to the future. Along these lines, in this paper:

we first review external critique of the Semantic Web, synthesising the primary criticisms raised, presenting an argument both for and against each;

we present the results of a questionnaire posed to the Semantic Web mailing list, aiming to ascertain the various perspectives of respondents regarding the extent to which Berners-Lee et al.’s original vision of the Semantic Web has been realised or can be realised, the level of perceived impact that the Semantic Web has had thus far on the current Web, the success stories of the Semantic Web, as well as opinions of the main points of critique resulting from the previous analysis;

we summarise the main success stories, opportunities, and challenges found regarding the past, present and future of the Semantic Web.

2. Critique of the Semantic Web

Based on the previous critiques of the Semantic Web, we now distil ten main criticisms paraphrased from these articles [15,16,23,69,80,81]; though the list of issues should not be considered comprehensive, it covers the main points in the articles found. We first summarise the point of criticism, providing references for sources that inspire its inclusion; we then argue both for and against each point in turn to better understand its implications.¹

¹
We do so in the style of a debate, meaning that the author does not necessarily hold the point-of-view being argued for/against.

The criticism presented stems from authors with different roles and perspectives, representing diverse points of view. We further categorise each criticism according to: (i) economic: relating to financial costs, incentives, market, etc.; (ii) human: relating to individual users in terms of usability, accessibility, etc.; (iii) social: relating to groups of people in terms of agreement, social trends, etc.; and (iv) technical: relating to issues such as computational cost, difficulty to implement, etc. Later we will use these criticisms to form a questionnaire posted to the Semantic Web mailing list in order to gain insights into the perspectives of experts from the community on these issues.

2.1. The Semantic Web addresses a niche problem [81]

Categories: Human, Social

Critique: Scenarios used to motivate the Semantic Web are fact-based and often overly specific and complex. The majority of users are only interested in finding individual webpages with simple facts, opinions, social recommendations, etc., rather than solving complex queries on factual content involving multiple sources. The current Web, with the help of search engines like Google, thus covers (and will continue to cover) the needs of the vast majority of users.

For: Search engines such as Google, Bing, Yandex, etc., have improved considerably over the years, where finding information on the Web is now easier than ever. In a July 2014 analysis of organic Google click-through rates, Petrescu [66] estimated that users click on a result listed on the first page for 71.3% of searches and on a later page for 5.6% of searches; these figures do not account for users clicking paid results, finding answers directly on the results page, refining their search, etc. With current search engines, most user searches can be quickly and easily resolved. Aside from search, use-cases relating to modelling complex domains using ontologies, data integration in enterprises, etc., are not tangible for ordinary web users.

Against: There are many niche problems of importance to society with which the Semantic Web can help, including, for example, drug discovery in the case of rare diseases [45]. However, the Semantic Web is not limited to niche use-cases. Search engines themselves have been adopting Semantic Web concepts to enable semantic search; for example, through schema.org [30], Knowledge Graphs [75,76], etc. On the other hand, while current search engines are excellent for finding individual webpages, the Semantic Web vision addresses more complex types of queries that require drawing information from multiple sources on the Web. While current searches generally appear to be resolved quickly (e.g., are answered by a single high-ranking result), users may not be currently issuing more “complex” searches as they know search engines will not offer useful results. Searches requiring cross-referencing multiple webpages are not necessarily niche, but may rather be personalised [9]; for example, finding the closest store open now selling aspirin does not appear to be niche, and could be better automated with Semantic Web techniques. Regarding users’ interests, the Semantic Web does not only address encyclopaedic data, nor does it only address search; for example, its graph-based data model can be used to integrate and find novel connections within social data [13]. Regarding other use-cases, though users may not know of the use of Semantic Web techniques within specific domains or enterprises, this does not prevent them from benefiting from such technologies.

2.2. The Semantic Web will be made redundant by advances in Machine Learning before it has a chance to take off [15]

Category: Technical

Critique: The Semantic Web assumes that the current (HTML-based) Web is poorly machine-readable. However, advances in Machine Learning are increasingly undermining this assumption. By the time the Semantic Web could reach enough maturity to have major impact on the Web, Machine Learning will have advanced to a point where such technologies for publishing/consuming structured content are made redundant.

For: Advances in areas such as Deep Learning have led to results that previously seemed unachievable in the short term. Machines can now perform more “human-like” tasks with increasing precision and recall. These advances, combined with developments in Information Extraction, increasingly blur the lines between human-readable and machine-readable content [56]; as a relevant example, in the TAC–KBP “Cold Start” challenge, which requires systems to extract knowledge-bases from scratch from text, systems improved their $F_{1}$ scores from 0.48 to 0.58 on English texts in the space of a year (from 2016 to 2017) [26]. The need for a specialised machine-readable Web becomes more tenuous as machines succeed in processing our natural language with increasing fidelity.

Against: Techniques like Deep Learning are still applied as a form of specialised Artificial Intelligence, requiring extensive training data to build models for one particular task. Though impressive gains are being made, the aforementioned $F_{1}$ score of 0.58 [26] still leaves much to be desired. Addressing the tasks discussed by Berners-Lee et al. [9] on the current Web – without structured content – would require a general form of Artificial Intelligence as yet without precedent (sometimes referred to as AI-complete tasks [88]). Many of the prominent data-driven AI-style applications found in practice – such as digital assistants (Siri, Alexa, etc.) – in fact already rely on Semantic Web resources to provide structured content [54]. While the Semantic Web undoubtedly stands to benefit from Machine Learning, so too can applications using Machine Learning benefit from advances in the Semantic Web.
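For readers less familiar with the measure, the $F_{1}$ score cited above is the harmonic mean of precision $P$ and recall $R$. The cited evaluation is summarised here only by its aggregate scores; the values of $P$ and $R$ in the worked line below are hypothetical, chosen purely to illustrate how a score close to 0.58 can arise:

$$F_{1} = \frac{2PR}{P + R}, \qquad \text{e.g. } P = 0.60,\ R = 0.56 \ \Rightarrow\ F_{1} = \frac{2 \cdot 0.60 \cdot 0.56}{0.60 + 0.56} = \frac{0.672}{1.16} \approx 0.58$$

Under these assumed values, roughly four in ten extracted facts would be wrong and a similar share of the true facts would be missed, which is the sense in which such a score “leaves much to be desired”.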

2.3. The Semantic Web depends too much on reliable publishers [16,23]

Categories: Human, Technical

Critique: The Semantic Web is founded on the idea that machines will automatically process structured content on the Web. Such processing is particularly brittle in the face of both indeliberate errors and deliberate deception due to unreliable publishers (as commonplace on the Web).

For: Automatically solving complex tasks on the Semantic Web involves processes such as inferencing to integrate information. Such processes work by assuming input data to be held true and computing other entailments that then follow; this assumption is clearly naive for Web data. Even small errors in the input data (e.g., inconsistent claims) can lead to nonsensical entailments; in previous work we found 301 thousand RDFS/OWL inconsistencies in a crawl of 4 million RDF documents (294 thousand relating to datatypes, 7 thousand relating to instances of disjoint classes) [12]. More complex tasks require more complex chains of inferencing, where each step accumulates a higher probability of error. Such processes could then be easily manipulated by deceptive agents.

Against: The Semantic Web community recognises that publishers are not always reliable, and though the issue of data quality is a major challenge, it is one that the community has been addressing [90]. Much like on the Web, rather than assume all information to be trustworthy, two elements are required: reliable sources of data, and methods to accurately estimate the reliability of sources. Specifically regarding inferencing, methods such as paraconsistent reasoning [53] are more robust to noisy inference, while methods such as authoritative and quarantined reasoning [68] select more trustworthy sources for inferencing based on link analysis. Finally – as acknowledged by the original vision paper [9] – users should not blindly trust results, but can rather be provided details (on-demand) of how these results were achieved, refining criteria as required.
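The brittleness argued above can be illustrated with a toy example. The following is a minimal sketch, assuming a single RDFS-style subclass rule and an OWL-style disjointness check over plain tuples; it is not a real reasoner, and the `ex:` terms and the two conflicting “sources” are hypothetical.

```python
# Toy forward chaining over (subject, predicate, object) tuples.
# Illustrative only: one subclass rule plus a disjointness check.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def infer_types(triples):
    """Apply (x type C), (C subClassOf D) => (x type D) until nothing new is added."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        types = [(s, o) for s, p, o in closure if p == TYPE]
        for x, c in types:
            for c2, d in subclass:
                if c == c2 and (x, TYPE, d) not in closure:
                    closure.add((x, TYPE, d))
                    changed = True
    return closure


def find_inconsistencies(closure):
    """Return (instance, class_a, class_b) for instances typed with two disjoint classes."""
    found = []
    for a, p, b in closure:
        if p != DISJOINT:
            continue
        for x, q, c in closure:
            if q == TYPE and c == a and (x, TYPE, b) in closure:
                found.append((x, a, b))
    return sorted(found)


data = [
    ("ex:City", SUBCLASS, "ex:Place"),
    ("ex:Place", DISJOINT, "ex:Person"),
    ("ex:Paris", TYPE, "ex:City"),     # hypothetical source A: the city
    ("ex:Paris", TYPE, "ex:Person"),   # hypothetical source B: a person sharing the identifier
]

closure = infer_types(data)
conflicts = find_inconsistencies(closure)
print(conflicts)
```

Neither source is inconsistent on its own; the conflict only surfaces after the subclass inference is applied to the merged data, which is why longer chains of inference over more sources offer more opportunities for such errors.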

2.4. The Semantic Web depends too much on ontological agreement [16,23]

Categories: Social, Technical

Critique: There is no single way to model a domain using an ontology. There is no global truth. Different stakeholders in the domain may consider different semantics for terms or even hold contradictory claims. The Semantic Web is brittle to differing views.

For: Is a tomato a “fruit” or a “vegetable”? Is Pluto a “planet”? Is Sherlock Holmes a “person”? The answer to each such question depends, either due to a lack of consensus, or ambiguity on what terms like “fruit”, “person”, etc., mean. While we might define in an ontology that all mayors are people, Bosco the Dog was elected mayor of Sunol, California while Duke the Dog was elected mayor of Cormorant, Minnesota. The real world is messy and hosts innumerable perspectives on what is true, or what “truth” even means. Edit wars on Wikipedia evidence such disagreement [89]. These ambiguities and conflicts are the true underlying cause of interoperability issues, and rather than solving them, ontologies (particularly expressive ones) require them to have been solved beforehand; doing so at the scope of the Web presupposes either a utopian (global agreement reached) or a dystopian (global agreement enforced) view of society.

Against: To be more precise, the Semantic Web benefits from – rather than requires – ontological agreement. The fact that full agreement cannot always be reached does not preclude the utility of formally capturing the agreement that can be reached. While agreement on detailed domain definitions is costly, ontologies such as SNOMED CT [51] show that it can be achieved with sufficient will and organisation. For the broader Web, initiatives such as schema.org [30] show that agreement is possible on lightweight semantic definitions (given sufficient incentives). The impact of collaboratively-edited datasets such as Wikidata [54,86] further exemplifies ways in which (partial) agreement can be fostered in an emergent way. Considerable attention has been given by the Semantic Web literature to resolving inconsistencies reflecting different views [12], to inferencing over contextual data reflecting different versions of truth [31], and so forth. Furthermore, ontologies are defined in a decentralised way [84], where stakeholders can adopt their preferred ontology or define their own, giving rise to an emergent agreement; exemplifying this, Schmachtenberg et al. [72] found that FOAF and Dublin Core were used by 69% and 56% of the 1,014 RDF datasets that they crawled. In the case of multiple competing ontologies, mappings can be computed or defined to enable interoperability by bridging the concepts on which they agree [24]; along these lines Vandenbussche et al. [84] find over 5 thousand links between different vocabularies in their collection.
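The idea of bridging competing vocabularies through mappings can be sketched as follows. This is a simplified illustration rather than a published alignment: the term names are drawn from FOAF and schema.org, but the equivalences declared in `MAPPINGS`, the helper `bridge`, and the example data are assumptions made for the sake of the example.

```python
# Hypothetical term-level mappings between two vocabularies (illustration only).
MAPPINGS = {
    "foaf:name": "schema:name",
    "foaf:homepage": "schema:url",
}


def bridge(triples, mappings):
    """Rewrite predicates that have a known mapping; leave all other triples unchanged."""
    return [(s, mappings.get(p, p), o) for s, p, o in triples]


# Two hypothetical sources describing the same resource with different vocabularies.
source_a = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]
source_b = [
    ("ex:alice", "schema:name", "Alice"),
    ("ex:alice", "schema:jobTitle", "Editor"),
]

merged = set(bridge(source_a, MAPPINGS)) | set(source_b)
for triple in sorted(merged):
    print(triple)
```

Only the terms on which the two vocabularies agree are aligned; the unmapped `foaf:knows` and `schema:jobTitle` statements are carried over untouched, reflecting the point that partial agreement is still useful.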

2.5. Publishing Semantic Web content on the Web has a prohibitively high cost [16]

Categories: Economic, Technical

Critique: Given data in a legacy format, a relational database, JSON, CSV, etc., there is a prohibitively high cost associated with publishing the data using the Semantic Web standards.

For: Publishing Semantic Web content in a suitable way – e.g., following Linked Data principles [35] – requires expertise. Where data are available in a structured format, conversion to RDF is far from straightforward, especially when issues such as offering dereferenceable IRIs, adding links, etc., are considered [42]. While certain types of data are easily conceptualised as RDF graphs, others require various forms of indirection (e.g., reification [37]) to be properly represented.

Against: Most websites are now based on data stored in databases. Standards have been developed to reduce the cost of publishing RDF from legacy data, key amongst which are the RDB2RDF mappings [6,22] for generating RDF data from relational databases, and JSON-LD for lifting JSON to an RDF-style data model [78]. Tools have been developed to help with tasks such as linking, most prominently Silk [85] and LIMES [63]. Exporters built into commonly-used platforms such as Drupal allow thousands of websites to begin publishing RDF quickly and easily [18]. Work continues to better support more and more types of data, such as the standardisation of the RDF Data Cube vocabulary for representing statistical data [19].
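To give a sense of what generating RDF from relational data involves, the following is a minimal sketch loosely inspired by the idea of mapping each row to a subject and each column to a predicate. It does not implement the RDB2RDF standards themselves; the base IRI, the IRI patterns, the function name and the example table are all hypothetical.

```python
BASE = "http://example.org/"  # hypothetical base IRI for minted identifiers


def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict) into N-Triples-style lines.

    The row becomes a subject IRI built from the table name and key value;
    each non-key, non-NULL column becomes a predicate with a plain string literal.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    lines = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # NULLs are simply omitted rather than represented
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{BASE}{table}#{column}> "{literal}" .')
    return lines


# Hypothetical rows, e.g. as fetched from a "product" table.
rows = [
    {"id": 1, "name": "Aspirin", "stock": None},
    {"id": 2, "name": 'Plaster "XL"', "stock": 40},
]

for row in rows:
    for line in row_to_triples("product", "id", row):
        print(line)
```

Such a sketch covers only the easy part; the harder issues raised above, such as offering dereferenceable IRIs, typing literals and linking to external datasets, are precisely what the standards and tools cited here aim to support.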

2.6. There are too few incentives for adopting Semantic Web technologies on the Web [69]

Categories: Economic, Social

Critique: Aside from the costs of using Semantic Web technologies on the Web, there is little incentive to do so, due in part to the fact that the infrastructure for publishing and/or exploiting such content on the Web has not been adequately developed or adopted.

For: The Semantic Web has long faced a chicken-and-egg problem [60]: incentives for publishing data require infrastructure to exploit those data, while infrastructure for exploiting data cannot develop without data. While the Linked Data community partially resolved this dilemma by successfully convincing various stakeholders to publish data on the (implicit) promise that applications would arrive to justify the cost, these applications did not emerge, and as a result, many datasets and related services went offline [5,41]; for example, Aranda et al. [5] estimated, in 2013, that around 29% of the 427 public SPARQL services they found had gone offline. The dearth of Linked Data applications hints at an important lesson: publishing data independently of a particular application implies higher costs for leveraging that data in that application; publishing data independently of any application then implies higher costs for all applications. Finally, one of the main incentives for publishing on the current Web is advertising revenue, where it is not clear how advertising would work on the Semantic Web where software agents, rather than humans, access websites [29].

Against: In the case of schema.org [30], publishers are incentivised to embed structured data in their webpages by the promise of “rich snippets”: having the data – denoting images, ratings, etc. – displayed in search engine results, offering a more eye-catching result summary that attracts more clicks; as a result, schema.org has been widely adopted on the Web, where Meusel et al. [58] found more than 700 thousand pay-level-domains (websites) hosting schema.org content in the 2014 WebDataCommons dataset. Such examples show that incentives do exist for Web publishers to provide more structured content: offering such content can, in the context of certain applications, help direct traffic back to a website or increase demand for a particular product or service it describes, which can drive new business models that replace traditional advertising revenues [29]. The varied use of datasets such as Wikidata [54,86] – whose SPARQL service received over 3.8 million queries per day in the first quarter of 2018 [54] – shows that a variety of applications – including some not originally envisaged – can benefit from the increasing availability of structured content offered by the Semantic Web.
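As an illustration of the kind of structured data a publisher might embed for this purpose, the following sketch serialises a product description using schema.org terms in JSON-LD, one of several syntaxes in which such markup can be expressed. The product, its values, the URL and the helper `jsonld_script` are hypothetical, and whether a given search engine renders a rich snippet for such markup depends on that engine’s own guidelines.

```python
import json


def jsonld_script(data):
    """Wrap a JSON-LD document in the script element used to embed it in an HTML page."""
    payload = json.dumps(data, indent=2, sort_keys=True)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical product described with schema.org types and properties.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Aspirin 500mg",
    "image": "https://example.org/aspirin.png",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.4",
        "reviewCount": "89",
    },
}

snippet = jsonld_script(product)
print(snippet)
```

The marginal cost for the publisher is small, since the same values are typically already rendered in the human-readable page; the image and rating are exactly the kind of data the text describes as being surfaced in result summaries.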

2.7. The Semantic Web standards are too verbose [16,80]

Category: Human

Critique: The Semantic Web standards are (unnecessarily) long, complex and difficult to understand. This creates a major barrier for attracting new adopters. More concise standards would have been better.

For: Most of the Semantic Web standards have been designed by committee, anticipating use-cases that had yet to arrive or be fully understood, sometimes focusing on academic rather than practical issues. The resulting standards are difficult to understand, with much of their complexity dedicated to relatively niche issues; as a result, we can find various calls to simplify the standards, with, e.g., Berners-Lee calling for the deprecation of various features in the RDF standard in 2010 [8]. In the same way that JSON has become more popular than its more complex XML cousin, simpler standards that suffice for common needs will tend to win out versus complex standards that (additionally) address more niche needs; along these lines, for example, Meusel et al. [59] found over five times more Microdata/Microformats statements than RDFa in the 2013 Common Crawl dataset; in previous works, we found that (pure) RDFS is much more prevalently used than OWL in Web data [27]; and so forth.

Against: When speaking of verbose standards, one should not overlook the SQL:2016 standard [44], which has 1,732 pages – yet the core of SQL is broadly adopted and understood. One does not need to understand the entire standard in order to profitably use parts of it. Along the same lines, one does not need to understand the model theoretic definitions of RDF to describe data in RDF, nor does one need to understand the semantic conditions defined for OWL to use it to describe an ontology, etc.; rather practitioners can start with a simple system based on the parts of the standards important for them, extending their use of the standards – as needs arise – towards building more complex (and powerful) systems that work for them. Simpler standards that arise can also be mapped to more complex standards; for example, Microdata and Microformats are directly convertible to RDF. More modern Semantic Web standards – such as JSON-LD [78] – have also had success in terms of adoption.²

²
See https://w3techs.com/technologies/details/da-jsonld/all/all; retrieved 2019-09-29: JSON-LD is used by 26.5% of websites.

2.8. The Semantic Web will not scale [7]

Category: Technical

Critique: Consuming data published using the Semantic Web standards requires algorithms with poor scalability and/or performance. Current implementations exhibit poor scalability and/or performance.

For: Even the most common tasks that one might consider over (most of) the Semantic Web standards are intractable. Deciding if two RDF graphs have been parsed from the same document, potentially with different blank node labels (aka. RDF isomorphism), is GI-complete [39]. SPARQL query evaluation is PSPACE-hard (PSPACE-complete for the original standard [65]). Entailment is undecidable for OWL (2) Full and N2EXPTIME-complete for OWL 2 DL [62]; infamously even the OWL “Lite” fragment of the original OWL standard – motivated as a more terse fragment permitting more efficient reasoning – was later found to have EXPTIME-complete entailment. Other experimental works have shown Semantic Web query engines to be considerably outperformed by relational databases; for example, with the Berlin SPARQL Benchmark, Bizer and Schultz [10] show that, in some cases, MySQL can execute 13 times more queries in a given time period than the best SPARQL store tested (Sesame) considering comparable queries.

Against: Such complexity results are not particular to Semantic Web proposals, where for example the complexity of SPARQL query evaluation is analogous to that for SQL [65]. More generally, worst-case complexity results rarely tell the whole story: the fact that there exists at least one input for which a task is difficult tells us little about how efficient solutions might be for practical inputs (see, e.g., [39]). Achieving scale and efficiency often requires trade-offs, where by trading in completeness, OWL reasoning has been shown to scale to billions of triples [68,83]; along similar lines, a variety of tractable profiles of OWL 2 have been defined that trade expressivity for efficiency of reasoning tasks [62]. More practically speaking, a poor implementation does not refute its underlying idea. With this aside, some more recent benchmarks show, for example, SPARQL engines being capable of outperforming graph databases and relational databases for more complex graph patterns [38]. Anecdotally, we can also point to Wikidata’s decision to use Semantic Web technologies (RDF, SPARQL, etc.) to publish and manage its content, with positive (performance) results [54]. Adoption of the Semantic Web standards by major vendors – such as Oracle [87] and Amazon [2] – further helps to (anecdotally) refute this criticism.
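The RDF isomorphism task mentioned above can be made concrete with a deliberately naive sketch: two graphs are isomorphic if some one-to-one relabelling of blank nodes makes their triple sets identical. The brute-force search below tries every relabelling and is an illustration of the problem statement only, not of the algorithms used by practical libraries; the representation of terms as strings with a `_:` prefix for blank nodes is an assumption of the example.

```python
from itertools import permutations


def is_blank(term):
    """Blank nodes are written with the '_:' prefix in this toy representation."""
    return term.startswith("_:")


def blank_nodes(graph):
    return sorted({term for triple in graph for term in triple if is_blank(term)})


def isomorphic(g1, g2):
    """Check whether a bijection between blank nodes maps the triples of g1 onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    # Worst case: factorial in the number of blank nodes.
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


g1 = [("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", '"Bob"')]
g2 = [("_:x", "ex:name", '"Bob"'), ("_:y", "ex:knows", "_:x")]   # same graph, relabelled
g3 = [("_:x", "ex:knows", "_:y"), ("_:x", "ex:name", '"Bob"')]   # different structure

print(isomorphic(g1, g2), isomorphic(g1, g3))
```

The factorial worst case is in keeping with the hardness result cited in the critique, while graphs with few blank nodes, such as those above, are checked almost instantly, in keeping with the point that worst-case results say little about practical inputs.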

2.9. The Semantic Web lacks usable systems & tools [16]

Categories: Human, Technical

Critique: Practitioners who are initially interested in adopting Semantic Web technologies are quickly alienated by a lack of usable tools for their use-cases.

For: While one may argue that end-users need not understand the Semantic Web to benefit from it – that the Semantic Web is something “under the hood” powering end-user applications – such an argument still supposes the availability of systems, tools, etc., for building these applications. While many systems and tools have been developed for the Semantic Web, the bulk have been created in an academic context for the purposes of proving a concept described in a paper. Systems often go offline after the paper is published; tools may rather be of a more prototypical nature; few resources are tested in terms of usability [46]; etc. On the other hand, newer competing technologies with more usable, developer-friendly resources are seeing more adoption, including formats such as JSON/Microdata/Microformats being more popular than RDF [59], the Neo4j graph database being far more popular than its closest SPARQL rival,³ Facebook’s GraphQL [33] being widely adopted for public query interfaces (versus SPARQL/Linked Data), etc. The Semantic Web is thus left in the wake of alternative, more lightweight, more usable technologies.

³
https://db-engines.com/en/ranking ranks graph databases (including SPARQL engines) in terms of popularity, where as of 2019/05/25, Neo4j is ranked first (49.46 points), while the highest-ranked SPARQL engine – Virtuoso – is ranked fifth (2.73 points).

Against: While the Semantic Web could always benefit from having more (usable) systems and tools, most standards have a variety of mature implementations to choose from (including from well-known vendors such as Oracle [87], Amazon [2], etc.). On the other hand, the adoption of similar, competing technologies is an opportunity for the Semantic Web, as in the case of JSON-LD [78] successfully leveraging the popularity of JSON to help (implicitly) bridge the gap between developers and the Semantic Web. Along similar lines, various works have looked at making property graphs – the model underlying many graph databases [3] – and RDF graphs interoperable [21,32]. The same story is borne out with proposals such as GraphQL-LD [79], this time bridging GraphQL and SPARQL. What we see, then, is increasing adoption of the core concepts underlying the Semantic Web: structured data formats, graph-based data modelling, public query APIs, etc.; with some syntactic glue, these advances can be leveraged as advances, in turn, for the Semantic Web.⁴

⁴

In a signed public comment in the questionnaire described later, Staab refers to this as a “hijacking strategy” (e.g., JSON-LD “hijacking” JSON, adding a core Semantic Web principle), expressing the opinion that is is an excellent way forward.

2.10. The Semantic Web advocates decentralisation, which is too costly [80]

Categories: Economic, Social, Technical

Critique: The original vision of the Semantic Web is a decentralised one (where, e.g., individual health care providers host their own web-site with their own structured content). On the other hand, on the current Web, centralisation has become the predominant paradigm (considering Google, Facebook, etc.). Decentralising the Semantic Web is too costly.

For: Berners-Lee et al. [9] talk about individual providers (doctors, physical therapists, etc.) hosting their own websites and agents, giving a decentralised setting for the Semantic Web. However, the Web has tended more and more towards centralisation, with individual providers rather collecting on central, specialised websites. For example, rather than hosting personal websites, most people rather host profiles on social networks. Likewise success stories sometimes quoted for the Semantic Web have involved some level of centralisation: Wikidata [86] centralises data creation and curation, schema.org [30] centralises the schema/ontology, and so forth. Decentralisation incurs significant conceptual and practical costs in terms of design, performance, etc. In terms of querying, for example, Schmidt et al. [73] demonstrate that local query processing is often orders of magnitude more efficient than federated querying over endpoints, even when statistics about remote data are made available for optimisation purposes. More generally, no precedent exists in the Semantic Web setting for the type of decentralised infrastructure envisaged by Berners-Lee [9].

Against: There is an emergent public awareness of the problems associated with growing centralisation in terms of users’ privacy, control of data, etc. Along these lines, the recently standardised Linked Data Platform [77], along with projects such as Solid [55], not only further a decentralised vision of the Semantic Web, but also position the Semantic Web as a path towards a more decentralised Web. Abstractly, the benefits of centralisation versus decentralisation are mostly technological – benefits that will inevitably shrink as technology continues to improve. Conversely, the benefits of decentralisation versus centralisation are mostly social, be they upholding privacy, avoiding hegemony and monopoly, averting censorship, etc. – benefits that will at least remain constant, or more likely grow, over time. Asymptotically speaking, the relative benefits of decentralisation will thus, over time, increasingly dominate those of centralisation.

3. Questionnaire

We have, thus far, presented ten points critiquing the Semantic Web, arguing both for and against each individual point; the goal in each case was not to reach a verdict, but rather to understand possible arguments on both sides. We are now rather interested to see what members of the Semantic Web community, more broadly, think of the current state of adoption of the Semantic Web, what impact it could have in future, what they view as the main success stories thus far, and finally, what they think of the previously raised points of critique. We are particularly interested in the perspectives of experts in the Semantic Web who have read and worked extensively on the topic and can thus offer a more informed opinion; it is important to keep in mind, however, that targeting experts in this way may in turn lead to a pro-Semantic Web bias.

We designed a questionnaire for these issues and sent it to the W3C Semantic Web mailing list,⁵ soliciting responses. All questions were left optional. The questionnaire was open to responses from May 12th to May 25th, 2019, in which time 113 responses were collected. In this section we present the details of the questionnaire and the responses received. Additional material is available online for the purposes of further analysis, including details of the questionnaire design, individual responses, public comments, keywords of success stories, and word clouds in SVG format [40].

⁵ semantic-web@w3.org; we also asked that members share the list with others who might be interested.

3.1. Expertise of participants

The questionnaire began with two questions to ascertain the self-assessed level of expertise of the respondent in terms of Semantic Web topics. The first question asked respondents to select one of the following options regarding their own level of expertise:

Zero expertise (e.g., I have not read about the topic nor worked on the topic)

Some expertise (e.g., I have read about the topic but not worked on the topic)

Considerable expertise (e.g., I have read about the topic and worked occasionally on the topic)

Strong expertise (e.g., I have read and worked extensively on the topic)

The results are shown in Fig. 1, indicating strong expertise on the Semantic Web amongst respondents, as was the goal of the questionnaire: to target experts.

Fig. 1.

Self-reported expertise of respondents.

We were further interested to know if respondents’ expertise was mainly relating to academia, industry, or other settings; we thus asked respondents to select all that applied to them from the following:

I have worked on the Semantic Web in academia (more than 1 year of experience).

I have worked on the Semantic Web in industry (more than 1 year of experience).

I have worked on the Semantic Web outside of both academia and industry (more than 1 year of experience).

None of the above.

The results shown in Fig. 2 reveal that 63.7% of respondents have an academic background, while 42.5% (also) have an industrial background.⁶

⁶ We highlight a possible ambiguity in the question for what students should choose (noticed after posting the questionnaire).

Fig. 2.

Type of expertise of respondents in terms of Academia, Industry, Other, and combinations thereof.

3.2. Realisation and impact

In order to understand to what extent the respondents believe that the original vision of the Semantic Web has been already realised, to what extent they believe it can be realised in future, the impact it has had thus far and the impact it will have (in terms of both the Web and other settings), we posed the questions shown in Fig. 3 to the participants. The results are shown in Fig. 4, displaying the distribution of votes, as well as the mean $\bar{x}$ and p-value computed using a two-tailed z-test (recalling that $n = 113$) for the null hypothesis of the results being approximately normally distributed with $\mu = 3$ (i.e., the mean being the neutral value: 3).
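By way of illustration, the test described above can be sketched as follows. This is a minimal sketch rather than the script used for the paper: it assumes that the standard error is estimated from the sample standard deviation, the function name `one_sample_z_test` is our own, and the responses listed are hypothetical placeholders rather than the questionnaire data.

```python
import math
from statistics import mean, stdev


def one_sample_z_test(responses, mu0=3.0):
    """Two-tailed z-test of the sample mean against a hypothesised mean mu0."""
    n = len(responses)
    x_bar = mean(responses)
    # Assumption: the standard error uses the sample standard deviation,
    # which is a common choice when n is large.
    se = stdev(responses) / math.sqrt(n)
    z = (x_bar - mu0) / se
    # Two-tailed p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return x_bar, z, p


# Hypothetical 5-point responses with n = 113; NOT the questionnaire data.
hypothetical = [4] * 40 + [3] * 30 + [5] * 20 + [2] * 15 + [1] * 8
x_bar, z, p = one_sample_z_test(hypothetical)
print(f"mean={x_bar:.3f}, z={z:.3f}, p={p:.4f}")
```

A small p-value here indicates that the mean response differs from the neutral value 3; the sign of $\bar{x} - 3$ then indicates the direction.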

Fig. 3.

Realisation and impact section of the questionnaire.

Fig. 4.

Responses to realisation and impact section of the questionnaire (shown in Fig. 3).

From these results, we observe the following:

regarding the original vision of the Semantic Web, the majority of respondents believe that it remains mostly or completely unrealised;

regarding the potential for realising the original vision of the Semantic Web in future, while 10 respondents believe it is completely unfeasible to realise, 14 believe it is completely feasible to realise; other responses were weighted towards believing it is mostly feasible to realise;

regarding current impact on the Web, responses were weighted towards the centre: that while Semantic Web technologies play some role on the Web, they do not play a key role;

regarding future impact on the Web, responses were weighted towards an optimistic view, with 76 respondents indicating their belief that Semantic Web technologies will play a significant or key role on the future Web;

regarding current impact in settings other than the Web, responses were weighted towards the centre: that while Semantic Web technologies play some role, they do not play a key role;

regarding future impact in settings other than the Web, responses were again weighted towards optimism, with 76 respondents again indicating their belief that Semantic Web technologies will play a significant or key role in the future.

While respondents tend to be reserved about the extent to which the Semantic Web has been realised and the impact that related technologies have had thus far, they tend to be much more positive regarding the future; per Q2, however, the bright future they envisage for the Semantic Web does not necessarily depend on completely realising the original vision.

We also performed a two-tailed z-test to look for statistically-significant differences in mean responses for each of the six questions between the 40 respondents who have only worked in academia (i.e., the Acad (Only) group of Fig. 2) and all other respondents; with $\alpha = 0.05$, we found no such statistically-significant differences. The most notable difference was for Q6, with means of 3.615 and 3.931, respectively, for the academic-only group and the other group, suggesting that the latter group may believe more in the potential impact of the Semantic Web for settings not directly involving the Web than the former group; however, as aforementioned, with $p \approx 0.131 > \alpha$, the difference was not statistically significant.
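The comparison between the two groups can be sketched in a similar manner. Again, this is a sketch under stated assumptions: we assume an unpooled two-sample z-test using sample variances, the function name `two_sample_z_test` is our own, and both lists of responses are hypothetical (only the group size of 40 is taken from the text).

```python
import math
from statistics import mean, variance


def two_sample_z_test(group_a, group_b):
    """Two-tailed z-test for a difference in means between two groups."""
    # Assumption: unpooled standard error based on the sample variances.
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    # Two-tailed p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


ALPHA = 0.05

# Hypothetical responses; NOT the questionnaire data.
academic_only = [3] * 15 + [4] * 15 + [5] * 5 + [2] * 5   # 40 respondents
others = [4] * 30 + [3] * 20 + [5] * 15 + [2] * 8         # 73 respondents

z, p = two_sample_z_test(academic_only, others)
verdict = "significant" if p < ALPHA else "not significant"
print(f"z={z:.3f}, p={p:.3f}: {verdict} at alpha={ALPHA}")
```

The difference is deemed statistically significant only when $p < \alpha$; a value such as $p \approx 0.131$ thus does not meet the threshold of $\alpha = 0.05$.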

Fig. 5.

Tag cloud of success stories for the Semantic Web (left) along with top-10 keywords (right).

3.3. Success stories

We next asked respondents to list success stories they associate with the Semantic Web; specifically:

What are the main success stories that you would associate with the Semantic Web thus far (if any)? Please specify one per line; you may use simple keywords referring to the name of a technology, system, standard, dataset, project, etc.

A text field was provided below the question.

A total of 90 non-empty responses were collected. In order to summarise the main success stories mentioned, the raw responses required some manual curation. While some respondents provided keywords on individual lines, others rather answered with full sentences or paragraphs of free text; in these cases, we manually extracted a list of keywords from such text. While some responses referred to concrete standards, datasets, initiatives, etc., other responses rather referred to more general concepts and domains. Regarding the latter cases, distinct but related terms – such as biology, bioinformatics, life sciences, etc. – were used by different respondents, potentially “splitting the vote”; in such cases, we manually selected and mapped related terms to a canonical term (e.g., in the previous case, we selected bioinformatics). A total of 394 occurrences of 136 unique keywords were found.

Figure 5 illustrates the main success stories referenced in the responses, with schema.org [30] being the most referenced project. Knowledge Graphs (e.g., [17,34,49,67,75,76]), Wikidata [86] and DBpedia [52] fill the next positions, followed by two keywords often mentioned side-by-side: Bioinformatics and Ontologies. Linked Data was next, followed by a sequence of three standards: RDF, JSON-LD and SPARQL. Informally, we noticed a number of clusters of responses: (1) those focused on the Web and Public Datasets, including search engines, embedded meta-data, Wikidata, DBpedia, etc.; (2) those focused on Semantics, including the use of ontologies in specific domains, particularly bioinformatics; (3) those focused on Enterprises, particularly relating to Knowledge Graphs, Data Integration and Data Governance, etc.; and (4) those focused on the Public Sector, including relevant initiatives within governments, libraries, museums, etc.

Fig. 6.

Example question for critique number 8.

3.4. Reaction to critique

The next part of the questionnaire sought feedback on the ten points of critique presented previously. More specifically, we presented the title and description of each point of critique as given in Section 2 without the associated arguments for or against. We then asked respondents to indicate the extent to which they agreed with the stated critique, both in terms of the current state of the Semantic Web, as well as how significant an obstacle it might pose to future development and adoption of the Semantic Web. In the cases of points (7) verbose standards, (8) does not scale and (9) lacks usable tools, we further asked respondents to indicate the standards they believe to be most problematic regarding the highlighted issue (if any), selecting zero-to-many from RDF (data model), RDFS, OWL and SPARQL. By way of example, Fig. 6 shows the question issued for point (8); the same structure was followed for other points, with C1 and C2 posed for all points, and C3 posed for points (7–9).

The results for C1 – level of agreement with respect to the current state of the Semantic Web – are summarised for all ten points of critique in Fig. 7, again showing the mean $\bar{x}$ and p-value computed as before. We see that respondents were most in agreement with the critiques regarding (9) a lack of usable tools, (6) a lack of incentives, and (3) a lack of tolerance to unreliable publishers. On the other hand, they mostly disagreed with the idea that (2) advances in Machine Learning render the Semantic Web redundant, and that (10) decentralisation is too costly. Other critiques rather saw a balance of responses.

Fig. 7.

Responses to C1 indicating the level of agreement for each critique regarding the current state of the Semantic Web, with options ranging from 1 Completely agree to 5 Completely disagree.

Fig. 8.

Responses to C2 indicating the level to which respondents think the highlighted issue will pose an obstacle to future adoption and development of the Semantic Web, with options ranging from 1 Insurmountable obstacle to 5 Not an obstacle / Trivial to resolve.

Looking to the future, the results for C2 are presented in Fig. 8; while in general we see few responses indicating that the presented issue is insurmountable (option 1), we see many responses indicative of major obstacles (option 2) to be overcome. More generally, the most critical challenges that the Semantic Web must face in future according to the respondents are (6) a lack of incentives, and (3) a lack of tolerance to unreliable publishers; when compared to responses for the current state of the Semantic Web, respondents are slightly more optimistic regarding (9) a lack of usable tools. Conversely, respondents do not see (2) redundant w/ML or (5) costly publishing as posing major challenges relative to other issues.

With respect to the four categories previously discussed (Economic, Human, Social, Technical), from Figs. 7 and 8, we observe that the criticisms for which there was most agreement related predominantly to Human issues: (3) unreliable publishers, (7) verbose standards, and (9) lacks usable tools; on the other hand, respondents tended to disagree with purely Technical issues: (2) redundant w/ML, and (8) won’t scale. These results suggest that respondents see more pressing issues relating to the human aspect of Semantic Web technologies rather than the technical aspect.

Fig. 9.

Responses to C3 for critiques 7–9 indicating the standards believed to be most problematic with respect to the highlighted issue.

Within these results, we again look at the difference between the respondents with only academic experience and other respondents; we find statistically-significant differences for three questions:

C2: Unreliable publishers. The means were different with statistical significance ($p \approx 0.002$) – 2.450 for the academic-only group, versus 2.986 for the other group – indicating that the academic-only group tends to view noisy/deceptive data as a greater future obstacle for the Semantic Web.

C1: Won’t Scale. The means were different with statistical significance ($p \approx 0.007$) – 2.900 for the academic-only group, versus 3.507 for the other group – indicating that the academic-only group tends to agree more that scalability is currently a significant issue facing the Semantic Web.

C2: Won’t Scale. The means were different with statistical significance ($p \approx 0.002$) – 2.775 for the academic-only group, versus 3.411 for the other group – indicating that the academic-only group tends to view a lack of scalability as a greater future obstacle for the Semantic Web.

Finally, regarding (7) verbose standards, (8) problems with scale, and (9) a lack of usable tools, Fig. 9 presents the results of C3 indicating the standards that respondents feel are most problematic. We see that OWL, followed by SPARQL, have the most responses in terms of being problematic for each of the three highlighted issues. Notably, the OWL 2 standard defines three tractable profiles [62] that aim to address issues (7) and (8), and a number of non-standard proposals such as RDFS⁺ [1] or OWL-LD [27] have also been put forward; despite these proposals, the responses show that the majority of the respondents view these issues as unresolved for OWL. Of the three critiques, (9) a lack of usable tools is the one identified as most universally affecting the standards according to respondents, 34 of whom identified all four standards as being problematic with respect to this issue.

3.5. Comments

The questionnaire ended with a comments section, where respondents could indicate both public and private comments. These comments varied in content.

Some comments, both positive and negative, spoke directly of the questionnaire. Aside from individual comments relating to the questionnaire being too long, the way in which options were ordered, and the lack of a “don’t know” option (rather each question was optional), a number of public comments suggested other issues not raised, specifically relating to: social aspects, shared vocabularies, complex information modelling, agility of standardisation, RDF syntaxes, semantic modelling, lack of high-level abstractions, etc.

Other comments expressed more detailed opinions on the overall theme of the questionnaire, on specific critiques, or on their outlook for the Semantic Web. Some comments related to being less focused on adoption of Semantic Web standards and more focused on the adoption of its concepts and best practices (even if not using RDF(S), SPARQL, OWL, etc.); how incentives may be bootstrapped; a lack of focus on how data are used; key use-cases such as data maintenance and research data management (under FAIR principles); the need for new/improved standards; the difficulty of modelling certain data in RDF; the need for more dogfooding, education and marketing; problems with the Semantic Web being driven primarily by academia; etc. Other comments rather took a more pessimistic view, noting that if the Semantic Web were useful we should have seen more of it by now, that the Web of “walled gardens” looks set to continue, etc. We refer to the public comments online for more details [40].

4. Discussion

Two decades on, the general consensus in the Semantic Web community appears to be that there is still a long way to go before the original vision of the Semantic Web is realised. On the other hand, the consensus is that Semantic Web technologies are presently having some impact on both the Web and in non-Web settings, and will continue to have more impact looking to the future. Along these lines, respondents to our survey cite success stories such as schema.org, Knowledge Graphs, Wikidata, DBpedia, Biomedical Ontologies, etc., as examples where the Semantic Web has had most impact thus far. On the other hand, a lack of usable tools, a lack of incentives, a lack of robustness for unreliable publishers, and overly verbose standards, in particular, are widely acknowledged as valid criticisms of the Semantic Web in its current state.

Looking to the future, the general consensus is that while none of the highlighted issues are insurmountable, many do pose non-trivial obstacles to the further adoption and development of the Semantic Web. A theme widely recognised as a key obstacle for the Semantic Web is the lack of availability of usable tools; such issues are known within the community and have been discussed, for example, by Karger et al. [46]. Part of the reason for the lack of usable tools may also be due to the largely academic nature of the Semantic Web, where work on such tools is difficult to publish (seen as “engineering” rather than “science”), while the community perhaps lacks expertise in areas such as Human Computer Interaction (HCI) relating to conducting and publishing usability studies. Another major issue is the lack of incentives, which, with some exceptions such as schema.org [30], remains a general challenge; while some authors have begun to tackle this issue from a more general point-of-view [29], more work is called for. The results of the questionnaire also highlight the need for more work on data quality [90] and methods to ensure robustness in the presence of unreliable publishers [53,68]. The results further reveal issues relating to the (perceived) verboseness of the core standards, particularly OWL, perhaps suggesting the need for (further [1,27,62]) work to better understand and address this issue. A more transversal theme is implicit in the responses: the Semantic Web needs more contributors from other research communities and from outside academia.

Among all of the mentioned issues, one that stands out, in particular, relates to the usability of Semantic Web technologies and their accessibility to newcomers. We thus call for more work on this particular topic – work that may take a number of directions. First, we require more work on tools and interfaces that reduce the cognitive load and expertise required for users to benefit from Semantic Web technologies; ideally, the design of such tools and interfaces should be guided by usability studies with target end-users. Second, we require more work on making the Semantic Web standards more accessible and appealing to newcomers; this may involve simplifying standards, creating more lightweight profiles of existing standards, creating interactive primers to motivate and introduce the standards in a more engaging manner, and so forth.

Along similar lines, we further call for more works that bridge the Semantic Web with other technologies having similar goals, particularly those that gain (or have gained) considerable traction. This may take a number of forms. In the case of languages, mappings can be created to make the technologies interoperable, as was done in the case of OBO and OWL [28], SPARQL and SQL [22], and so forth; more work can be done to align new query languages like Cypher [61], Gremlin [70] or GraphQL [33] with SPARQL [4,21,79], thus more closely aligning the graph database/NoSQL/Web developer community with the Semantic Web. A second option is to take existing technologies, and extend them to support Semantic Web concepts; this has worked particularly well for JSON-LD [78], taking a familiar concept for developers – JSON – and adding some additional syntactic sugar to create an RDF-compatible data format.
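To illustrate the second option, the following sketch shows how a JSON-LD context can turn plain JSON into RDF-style triples. It is a toy reading of a flat document under simplifying assumptions, not the expansion algorithm defined by the JSON-LD standard [78]: the function name `to_triples` is our own, the example.org identifiers are hypothetical, and only the keywords `@context`, `@id` and `@type` are handled.

```python
import json

# An ordinary JSON document, plus an "@context" that maps keys to IRIs.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "knows": {"@id": "http://schema.org/knows", "@type": "@id"}
  },
  "@id": "http://example.org/alice",
  "name": "Alice",
  "knows": "http://example.org/bob"
}
""")


def to_triples(document):
    """Toy conversion: flat documents with simple term definitions only."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords are not properties
        definition = context[key]
        if isinstance(definition, dict):
            predicate = definition["@id"]
            # "@type": "@id" marks the value as an IRI rather than a literal.
            is_iri = definition.get("@type") == "@id"
        else:
            predicate, is_iri = definition, False
        obj = f"<{value}>" if is_iri else json.dumps(value)
        triples.append((f"<{subject}>", f"<{predicate}>", obj))
    return triples


for s, p, o in to_triples(doc):
    print(s, p, o, ".")
```

A developer unaware of RDF can ignore `@context` and consume the document as ordinary JSON, while an RDF-aware consumer can read the same document as a graph.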

The results presented herein highlight that the original vision of the Semantic Web still eludes us, though major strides have been made in recent years. No matter how elusive, however, the Semantic Web vision remains an alluring one (at least to some, including the present author). We are all intimately aware of how the Web has revolutionised society, where the Semantic Web has the potential to further propel the Web to a new stage, marked by unprecedented levels of automation and convenience for users. Unlike twenty years ago, we now have the benefit of many years of experience and research on the topic, as well as established success stories like schema.org, Wikidata, Biomedical Ontologies, etc., to further build upon. Even a partial realisation of the Semantic Web vision will serve (and arguably is serving) as a great boon to society, much like how A.I. is finding more and more applications without ever having surpassed the Turing test. Part of the criticism, perhaps, stems from comparing the Semantic Web with the Web: a technological development to which almost anything else would pale in comparison; while the Semantic Web has not seen the same level of rapid growth and penetration as the Web, this does not devalue the (sometimes quiet) impact that the Semantic Web community can point to, while still hinting at the vast impact it could potentially have. Two decades on, it is thus still a vision that merits patient pursuit, even if – or perhaps even especially given that – there is much work left to be done before the Semantic Web holds the sorts of conclusive answers that might satisfy even its most ardent critics.

Acknowledgements

We thank the respondents to the questionnaire and the reviewers for their helpful comments. This work was funded in part by the Millennium Institute for Foundational Research on Data (IMFD) and Fondecyt, Grant No. 1181896.

References

1. Allemang and J.A. Hendler, Semantic Web for the Working Ontologist – Effective Modeling in RDFS and OWL, 2nd edn, Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5.

2. Amazon Neptune: Fast, reliable graph database built for the cloud, 2019, https://aws.amazon.com/neptune/.

3. Angles, Arenas, Barceló, Hogan, J.L. Reutter and Vrgoc, Foundations of modern query languages for graph databases, ACM Comput. Surv. 50(5) (2017), 68:1–68:40. doi:10.1145/3104031.

4. Angles, Thakkar and Tomaszuk, RDF and Property Graphs Interoperability: Status and Issues, in: Alberto Mendelzon International Workshop (AMW), Vol. 2369, CEUR-WS.org, 2019.

5. C.B. Aranda, Hogan, Umbrich and Vandenbussche, SPARQL Web-querying infrastructure: Ready for action? in: International Semantic Web Conference, 2013, pp. 277–293. doi:10.1007/978-3-642-41338-4_18.

6. Arenas, Bertails, Prud’hommeaux and Sequeda, A Direct Mapping of Relational Data to RDF, 2012, https://www.w3.org/TR/rdb-direct-mapping/.

7. Bergman, Scalability of the Semantic Web, 2006, http://www.mkbergman.com/227/scalability-of-the-semantic-web/.

8. Berners-Lee, The Future of RDF, 2010, https://www.w3.org/DesignIssues/RDF-Future.html.

9. Berners-Lee, Hendler and Lassila, The Semantic Web, Scientific American 284(5) (2001), 34–43.

10. Bizer and Schultz, The Berlin SPARQL benchmark, Int. J. Semantic Web Inf. Syst. 5(2) (2009), 1–24. doi:10.4018/jswis.2009040101.

11. K.D. Bollacker, Evans, Paritosh, Sturge and Taylor, Freebase: A collaboratively created graph database for structuring human knowledge, in: ACM SIGMOD International Conference on Management of Data, 2008, pp. 1247–1250. doi:10.1145/1376616.1376746.

12. P.A. Bonatti, Hogan, Polleres and Sauro, Robust and scalable Linked Data reasoning incorporating provenance and trust annotations, J. Web Semant. 9(2) (2011), 165–201. doi:10.1016/j.websem.2011.06.003.

13. J.G. Breslin, Passant and Vrandecic, Social Semantic Web, in: Handbook of Semantic Web Technologies, Springer, 2011, pp. 467–506. doi:10.1007/978-3-540-92913-0_12.

14. Brickley, R.V. Guha and Layman, Resource Description Framework (RDF) Schemas, 1998, https://www.w3.org/TR/1998/WD-rdf-schema-19980409/.

15. Cabeda, Semantic Web is Dead, Long live the AI!!! 2017, https://hackernoon.com/semantic-web-is-dead-long-live-the-ai-2a5ea0cf6423.

16. Cagle, Why the Semantic Web Has Failed, 2016, https://www.linkedin.com/pulse/why-semantic-web-has-failed-kurt-cagle.

17. Chang, Scaling Knowledge Access and Retrieval at Airbnb, 2018, https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95.

18. Corlosquet, Delbru, Clark, Polleres and Decker, Produce and consume Linked Data with Drupal!, in: International Semantic Web Conference, 2009, pp. 763–778. doi:10.1007/978-3-642-04930-9_48.

19. Cyganiak, Reynolds and Tennison, The RDF Data Cube Vocabulary, 2014, https://www.w3.org/TR/vocab-data-cube/.

20. Dadzie and Pietriga, Visualisation of Linked Data, Reprise, Semantic Web 8(1) (2017), 1–21. doi:10.3233/SW-160249.

21. Das, Srinivasan, Perry, E.I. Chong and Banerjee, A tale of two graphs: Property graphs as RDF in oracle, in: International Conference on Extending Database Technology (EDBT), OpenProceedings.org, 2014, pp. 762–773. doi:10.5441/002/edbt.2014.82.

22. Das, Sundara and Cyganiak, R2RML: RDB to RDF Mapping Language, 2012, https://www.w3.org/TR/r2rml/.

23. Doctorow, Metacrap: Putting the torch to seven straw-men of the meta-utopia, 2001, https://people.well.com/user/doctorow/metacrap.htm.

24. Euzenat and Shvaiko, Ontology Matching, 2nd edn, Springer, 2013, ISBN 978-3-642-38720-3.

25. Gandon, A Survey of the First 20 Years of Research on Semantic Web and Linked Data, 2018. doi:10.3166/ISI.23.3-4.11-56. hal-01935898.

26. Getman, Ellis, Song, Tracey and S.M. Strassel, Overview of linguistic resources for the TAC KBP 2017 evaluations: Methodologies and results, in: Text Analysis Conference (TAC), NIST, 2017.

27. Glimm, Hogan, Krötzsch and Polleres, OWL: Yet to arrive on the Web of data?, in: Linked Data on the Web (LDOW), CEUR-WS.org, 2012.

28. Golbreich, Horridge, Horrocks, Motik and Shearer, OBO and OWL: Leveraging Semantic Web technologies for the life sciences, in: International Semantic Web Conference (ISWC), Springer, 2007, pp. 169–182. doi:10.1007/978-3-540-76298-0_13.

29. Grubenmann, Bernstein, Moor and Seuken, Financing the Web of data with delayed-answer auctions, in: World Wide Web Conference (WWW), ACM, 2018, pp. 1033–1042. doi:10.1145/3178876.3186002.

30. R.V. Guha, Brickley and Macbeth, Schema.org: Evolution of structured data on the web, CACM 59(2) (2016), 44–51. doi:10.1145/2844544.

31. R.V. Guha, McCool and Fikes, Contexts for the Semantic Web, in: International Semantic Web Conference, Springer, 2004, pp. 32–46. doi:10.1007/978-3-540-30475-3_4.

32.

Hartig , Foundations of RDF⋆ and SPARQL⋆, in: Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, CEUR-WS.org, 2017.

33.

Hartig and

Pérez , Semantics and complexity of GraphQL, in: World Wide Web Conference (WWW), ACM, 2018, pp. 1155–1164. doi:10.1145/3178876.

34.

He ,

B.-C.

Chen and

Agarwal , Building the LinkedIn Knowledge Graph, 2016, https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph.

35.

Heath and

Bizer , Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. doi:10.2200/S00334ED1V01Y201102WBE001.

36.

Hendler , Why the semantic web will never work, in: Extended Sematic Web Conference (ESWC), Keynote, 2013, http://videolectures.net/eswc2011_hendler_work/.

37.

Hernández ,

Hogan and

Krötzsch , Reifying RDF: What works well with Wikidata? in: International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), Vol. 1457, CEUR-WS.org, 2015, pp. 32–47.

38.

Hernández ,

Hogan ,

Riveros ,

Rojas and

Zerega , Querying Wikidata: Comparing SPARQL, relational and graph databases, in: International Semantic Web Conference, 2016, pp. 88–103. doi:10.1007/978-3-319-46547-0.

39.

Hogan , Canonical forms for isomorphic and equivalent RDF graphs: Algorithms for leaning and labelling blank nodes, TWEB 11(4) (2017), 22:1–22:62. doi:10.1145/3068333.

40.

Hogan , Responses to “Semantic Web: Perspectives” Questionnaire, 2019. doi:10.5281/zenodo.3229401.

41.

Hogan ,

Hitzler and

Janowicz , Linked Dataset description papers at the Semantic Web journal: A critical assessment, Semantic Web 7(2) (2016), 105–116. doi:10.3233/SW-160216.

42.

Hogan ,

Umbrich ,

Harth ,

Cyganiak ,

Polleres and

Decker , An empirical survey of Linked Data conformance, J. Web Semant. 14 (2012), 14–44. doi:10.1016/j.websem.2012.02.001.

43.

Horrocks, Semantic Web: The story so far, in: International Cross-Disciplinary Conference on Web Accessibility (W4A), 2007, pp. 120–125. doi:10.1109/ICTAI.2007.181.

44.

Information technology – Database languages – SQL, 2016, SQL:2016; ISO/IEC 9075:2016.

45.

Kanza and J.G. Frey, A new wave of innovation in Semantic Web tools for drug discovery, Expert Opinion on Drug Discovery 14(5) (2019), 433–444. doi:10.1080/17460441.2019.1586880.

46.

D.R. Karger, The Semantic Web and end users: What’s wrong and how to fix it, IEEE Int. Comp. 18(6) (2014), 64–70. doi:10.1109/MIC.2014.124.

47.

Karnstedt, Sattler and Hauswirth, Scalable distributed indexing and query processing over Linked Data, J. Web Semant. 10 (2012), 3–32. doi:10.1016/j.websem.2011.11.010.

48.

Kobilarov, Scott, Raimond, Oliver, Sizemore, Smethurst, Bizer and Lee, Media meets Semantic Web – how the BBC uses DBpedia and Linked Data to make connections, in: European Semantic Web Conference (ESWC), 2009, pp. 723–737. doi:10.1007/978-3-642-02121-3.

49.

Krishnan, Making search easier: How Amazon’s Product Graph is helping customers find products more easily, 2018, https://blog.aboutamazon.com/innovation/making-search-easier.

50.

Lassila and R.R. Swick, Resource Description Framework (RDF) Model and Syntax Specification, 1998, https://www.w3.org/TR/1998/WD-rdf-syntax-19981008/.

51.

Lee, de Keizer, Lau and Cornet, Literature review of SNOMED CT use, Journal of the American Medical Informatics Association 21(e1) (2013), e11–e19. doi:10.1136/amiajnl-2013-001636.

52.

Lehmann, Isele, Jakob, Jentzsch, Kontokostas, P.N. Mendes, Hellmann, Morsey, van Kleef, Auer and Bizer, DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6(2) (2015), 167–195. doi:10.3233/SW-140134.

53.

Maier, Ma and Hitzler, Paraconsistent OWL and related logics, Semantic Web 4(4) (2013), 395–427. doi:10.3233/SW-2012-0066.

54.

Malyshev, Krötzsch, González, Gonsior and Bielefeldt, Getting the most out of Wikidata: Semantic technology usage in Wikipedia’s knowledge graph, in: International Semantic Web Conference, 2018, pp. 376–394. doi:10.1007/978-3-030-00668-6_23.

55.

Mansour, A.V. Sambra, Hawke, Zereba, Capadisli, Ghanem, Aboulnaga and Berners-Lee, A demonstration of the solid platform for social Web applications, in: International Conference on World Wide Web (WWW), 2016, pp. 223–226. doi:10.1145/2872518.2890529.

56.

J.L. Martinez-Rodriguez, Hogan and Lopez-Arevalo, Information Extraction Meets the Semantic Web: A Survey, Semantic Web, 2020. To appear, http://www.semantic-web-journal.net/content/information-extraction-meets-semantic-web-survey-0. doi:10.3233/SW-180333.

57.

P.N. Mendes, Mühleisen and Bizer, Sieve: Linked data quality assessment and fusion, in: Joint EDBT/ICDT Workshops, 2012, pp. 116–123. doi:10.1145/2320765.2320803.

58.

Meusel, Bizer and Paulheim, A web-scale study of the adoption and evolution of the schema.org vocabulary over time, in: International Conference on Web Intelligence, Mining and Semantics (WIMS), ACM, 2015, pp. 15:1–15:11. doi:10.1145/2797115.2797124.

59.

Meusel, Petrovski and Bizer, The WebDataCommons microdata, RDFa and microformat dataset series, in: International Semantic Web Conference, 2014, pp. 277–292. doi:10.1007/978-3-319-11964-9_18.

60.

Mika, What happened to the Semantic Web?, in: ACM Conference on Hypertext and Social Media (HYPERTEXT), 2017, p. 3. doi:10.1145/3078714.3078751.

61.

J.J. Miller, Graph Database Applications and Concepts with Neo4j, in: Southern Association for Information Systems Conference (SAIS), AIS ELibrary, 2013. doi:10.1002/bult.2010.1720360610.

62.

Motik, B.C. Grau, Horrocks, Wu, Fokoue and Lutz, OWL 2 Web Ontology Language Profiles, 2nd edn, 2012, https://www.w3.org/TR/owl2-profiles/.

63.

A.N. Ngomo and Auer, LIMES – a time-efficient approach for large-scale link discovery on the Web of data, in: International Joint Conference on Artificial Intelligence (IJCAI), 2011, pp. 2312–2317. doi:10.5591/978-1-57735-516-8/IJCAI11-385.

64.

Oren, Delbru, Catasta, Cyganiak, Stenzhorn and Tummarello, Sindice.com: A document-oriented lookup index for open linked data, IJMSO 3(1) (2008), 37–52. doi:10.1504/IJMSO.2008.021204.

65.

Pérez, Arenas and Gutiérrez, Semantics and complexity of SPARQL, ACM Trans. Database Syst. 34(3) (2009), 16:1–16:45. doi:10.1145/1567274.1567278.

66.

Petrescu, Google Organic Click-Through Rates in 2014, 2014, https://moz.com/blog/google-organic-click-through-rates-in-2014.

67.

Pittman, Srivastava, Hewavitharana, Kale and Mansour, Cracking the Code on Conversational Commerce, 2017, https://www.ebayinc.com/stories/news/cracking-the-code-on-conversational-commerce/.

68.

Polleres, Hogan, Delbru and Umbrich, RDFS and OWL reasoning for Linked Data, in: Reasoning Web, Springer, 2013, pp. 91–149. doi:10.1007/978-3-642-39784-4.

69.

Rochkind, Is the semantic web still a thing?, 2014, https://bibwild.wordpress.com/2014/10/28/is-the-semantic-web-still-a-thing/.

70.

M.A. Rodriguez, The Gremlin graph traversal machine and language, in: Symposium on Database Programming Languages (DBPL), ACM, 2015, pp. 1–10. doi:10.1145/2815072.2815073.

71.

Sandhaus, Build Your Own NYT Linked Data Application, 2010, https://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/.

72.

Schmachtenberg, Bizer and Paulheim, Adoption of the linked data best practices in different topical domains, in: International Semantic Web Conference (ISWC), Springer, 2014, pp. 245–260. doi:10.1007/978-3-319-11964-9.

73.

Schmidt, Görlitz, Haase, Ladwig, Schwarte and Tran, FedBench: A benchmark suite for federated semantic data query processing, in: International Semantic Web Conference (ISWC), Springer, 2011, pp. 585–600. doi:10.1007/978-3-642-25073-6.

74.

Shadbolt, Berners-Lee and Hall, The Semantic Web revisited, IEEE Intelligent Systems 21(3) (2006), 96–101. doi:10.1109/MIS.2006.62.

75.

Shrivastava, Bring rich knowledge of people, places, things and local businesses to your apps, 2017, https://blogs.bing.com/search-quality-insights/2017-07/bring-rich-knowledge-of-people-places-things-and-local-businesses-to-your-apps.

76.

Singhal, Introducing the Knowledge Graph: Things, not strings, 2012, https://www.blog.google/products/search/introducing-knowledge-graph-things-not/.

77.

Speicher, Arwe and Malhotra, Linked Data Platform 1.0, 2015, https://www.w3.org/TR/ldp/.

78.

Sporny, Longley, Kellogg, Lanthaler and Lindström, JSON-LD 1.0 – A JSON-based Serialization for Linked Data, 2014, https://www.w3.org/TR/json-ld/.

79.

Taelman, M.V. Sande and Verborgh, GraphQL-LD: Linked Data querying with GraphQL, in: International Semantic Web Conference – Posters & Demos (ISWC P&D), CEUR-WS.org, 2018.

80.

Target, Whatever Happened to the Semantic Web?, 2018, https://twobithistory.org/2018/05/27/semantic-web.html.

81.

ter Heide, Three reasons why the Semantic Web has failed, 2013, https://gigaom.com/2013/11/03/three-reasons-why-the-semantic-web-has-failed/.

82.

The Open Graph Protocol, 2017, http://ogp.me/.

83.

Urbani, Kotoulas, Maassen, van Harmelen and H.E. Bal, WebPIE: A web-scale parallel inference engine using MapReduce, J. Web Semant. 10 (2012), 59–75. doi:10.1016/j.websem.2011.05.004.

84.

Vandenbussche, Atemezing, Poveda-Villalón and Vatant, Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web, Semantic Web 8(3) (2017), 437–452. doi:10.3233/SW-160213.

85.

Volz, Bizer, Gaedke and Kobilarov, Silk – a link discovery framework for the Web of data, in: Linked Data on the Web (LDOW), 2009.

86.

Vrandecic and Krötzsch, Wikidata: A free collaborative knowledgebase, CACM 57(10) (2014), 78–85. doi:10.1145/2629489.

87.

Wu, Eadon, Das, E.I. Chong, Kolovski, Annamalai and Srinivasan, Implementing an inference engine for RDFS/OWL constructs and user-defined rules in Oracle, in: International Conference on Data Engineering, 2008, pp. 1239–1248. doi:10.1109/ICDE.2008.4497533.

88.

R.V. Yampolskiy, AI-Complete, AI-Hard, or AI-Easy – Classification of Problems in AI, in: Midwest Artificial Intelligence and Cognitive Science Conference (MAICS), CEUR Workshop Proceedings, 2012, pp. 94–101.

89.

Yasseri, Sumi, Rung, Kornai and Kertész, Dynamics of conflicts in Wikipedia, PLOS ONE 7(6) (2012). doi:10.1371/journal.pone.0038869.

90.

Zaveri, Rula, Maurino, Pietrobon, Lehmann and Auer, Quality assessment for Linked Data: A survey, Semantic Web 7(1) (2016), 63–93. doi:10.3233/SW-150175.

¹ We do so in the style of a debate, meaning that the author does not necessarily hold the point-of-view being argued for/against.

² See https://w3techs.com/technologies/details/da-jsonld/all/all; retrieved 2019-09-29: JSON-LD is used by 26.5% of websites.

³ https://db-engines.com/en/ranking ranks graph databases (including SPARQL engines) in terms of popularity, where as of 2019/05/25, Neo4j is ranked first (49.46 points), while the highest-ranked SPARQL engine – Virtuoso – is ranked fifth (2.73 points).

⁵ semantic-web@w3.org; we also asked that members share the list with others who might be interested.