Abstract

Introduction
Linked Data, and knowledge graphs more generally, are closely related notions. There are some notable differences: Linked Data is usually expressed in RDF, perhaps with an OWL ontology, is supposed to be available on the Web, and is “linked” with other datasets. Knowledge graphs more broadly speaking need not conform to any particular syntax, may not be linked to other external sources, and may not even be (easily) accessible on the Web. Nonetheless, many of these differences concern technologies rather than fundamentally different paradigms.
Much of Linked Data is generated by researchers as a contribution to the community effort. Due to the importance of this type of data for the advancement of research and applications in our field, the Semantic Web journal began in 2012 to solicit papers containing linked dataset descriptions.
The idea behind this was to provide the broader community of researchers and practitioners with concise summaries of the data, the tools and endpoints that have been made available to work with the data, worked examples, best practices, and so on, in order to foster the usage of Linked Data beyond the Semantic Web community. In addition, dataset descriptions allow authors to record their efforts in a citable paper publication, since this type of credit is still much more important than alternative measures of productivity and impact that take the reuse of other artefacts, such as datasets and software, into account. The initiative quickly became very popular.
It should be noted that we have in the meantime raised the bar for such papers, i.e., we have become more selective, in particular regarding tangible evidence for the usefulness of the presented datasets. The reason is that both the community and the technology have advanced, and the journal needs to keep up with these developments.
More recently, the issue of Linked Data quality came into focus [55]. This followed on the heels of application interest and the rise of questions regarding issues such as the trustworthiness of content, the reliability of resources, or the ease of reuse. Indeed, in the last few years a significant number of publications have laid out possible quality measures and corresponding algorithms for quality assessment. Truth be told, though, the final verdict is still outstanding on which dimensions or measures are in fact most relevant for assessing Linked Data quality.
One aspect that has so far not received sufficient attention is the relevance of a quality schema or vocabulary for the quality of a linked dataset. If it is indeed relevant, as has been argued [39], then this in turn raises the question of how to assess the quality of such a schema or vocabulary, in particular since such an evaluation may not be identical to a quality assessment of ontologies outside a Linked Data context.
To this end, some of us proposed a relatively simple schema – the 5-star LD vocabulary principles, or 5-star principles for short – for assessing Linked Data vocabulary quality, in an editorial published in this journal a few years ago [29], as an early contribution to the topic. A majority of the linked dataset papers that the Semantic Web journal has published so far had already been published or submitted at that time, and we did not require adherence to the ideas in that editorial for subsequent submissions, though since its publication we have encouraged authors to consider them.
So far, however, there has not been any coordinated assessment of the 5-star LD vocabulary principles and of how they hold up for real, existing datasets. Therefore, in this work, we look at all linked dataset papers from the perspective of the 5-star vocabulary principles. This serves both as a partial assessment of the quality of linked dataset papers in the Semantic Web journal and as an assessment of the 5-star principles and their applicability in practice.
In a sense, this paper also complements the earlier editorial [26], in which other quality aspects were studied.
The paper is structured as follows. Section 2 recalls the 5-star principles and discusses ambiguities and other issues arising when attempting to apply them, as well as explanations on how we resolved these for our present analysis. Section 3 presents the data we have collected, for all linked dataset papers which have so far been published in the Semantic Web journal. Section 4 contains a discussion of the data and what conclusions we can draw from this as we go forward.
Background and motivation
In 2010, Tim Berners-Lee published a non-technical schema [5] awarding between 0 and 5 stars to (Linked) Data depending on how easy it is to discover, use, and understand. However, 5-star Linked Data in this sense appears to be just a necessary precondition for what is really needed, and does not necessarily make the resulting Linked Data more reusable for humans or machines. Nor does it make any assumptions about the use of vocabularies. In practice, querying Linked Data that does not refer to a vocabulary is difficult, and ascertaining that the results reflect the intended query is impossible. A good vocabulary must restrict the potential interpretations of the classes and roles it uses towards their intended meaning.
In March 2014, the 5-star LD vocabulary principles [29] were introduced to encourage data owners, engineers, and practitioners to publish and use vocabularies on the Web. Similar to Tim Berners-Lee’s stars, which do not directly refer to the quality of the data, this star rating is also not directly concerned with the quality of the vocabularies themselves. Instead, it rates the vocabulary use of a linked dataset, following the intuition that data utilizing well-established, documented, maintained, and interconnected vocabularies is easier to (re)use than Linked Data that may be 5-star data in Berners-Lee’s sense but does not utilize such vocabularies.
The model
According to the model, a linked dataset is likewise awarded between 0 and 5 stars, with the individual criteria mirroring the rating steps described below.
We followed this model to manually rate the linked datasets described in the dataset papers accepted and published in the Semantic Web journal.
The rating
The following steps were followed to rate the datasets:
If a vocabulary was developed and used for the generation of the dataset, and a web-accessible, dereferenceable, human-readable description of that vocabulary was clearly present, the first star was awarded to the dataset.
If the vocabulary was well-defined, with all of the axioms that comprise it, the second star was awarded.
The third star was awarded if other standard, well-established vocabularies were reused in a supplementary fashion, i.e., linked to by this vocabulary through properties such as rdfs:subClassOf, rdfs:subPropertyOf, and owl:unionOf. This showed that the vocabulary defines only those classes and properties that do not already exist in some other standard, well-established vocabulary; where such classes and properties do exist elsewhere, they were reused by establishing links at the vocabulary level rather than at the individual level.
The fourth star was awarded if there was evidence of web-accessible metadata regarding the vocabulary, available in a standard format such as VoID or VOAF.
If it was clearly shown that the developed vocabulary is itself linked to by other vocabularies, in a manner consistent with point 3 above, the fifth star was awarded.
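The rating steps above can be sketched as a simple decision procedure. The following is our own minimal illustration, not part of the published principles: the boolean parameter names are hypothetical stand-ins for the manual judgments described above, the sketch encodes the resolution discussed under “Ambiguities &amp; issues” (metadata without outgoing links yields three stars), and it omits the special case of datasets that reuse an existing vocabulary outright rather than developing a new one.

```python
# Hypothetical sketch of the manual rating procedure; each boolean stands
# for a judgment made by the human raters, and the names are our own.

def rate_dataset(
    has_dereferenceable_description: bool,  # star 1: web-accessible, human-readable
    has_defining_axioms: bool,              # star 2: vocabulary well-defined by axioms
    links_to_other_vocabularies: bool,      # star 3: reuse via rdfs:subClassOf etc.
    has_vocabulary_metadata: bool,          # star 4: metadata in VoID, VOAF, ...
    linked_to_by_others: bool,              # star 5: third parties link to it
) -> int:
    stars = 0
    if has_dereferenceable_description:
        stars = 1
        if has_defining_axioms:
            stars = 2
            if links_to_other_vocabularies or has_vocabulary_metadata:
                # Metadata without outgoing links was resolved by awarding
                # three stars (see "Ambiguities & issues" below).
                stars = 3
            if links_to_other_vocabularies and has_vocabulary_metadata:
                stars = 4
                if linked_to_by_others:
                    stars = 5
    return stars

print(rate_dataset(True, True, True, True, False))  # 4
```

Note that under this reading the stars are essentially cumulative, with the metadata/links ambiguity as the one exception; the proposed reordering discussed later would remove that exception in most cases.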
A total of 49 datasets have been published so far in the Semantic Web journal since 2012 and our analysis, given below, shows that most of them are at least at the 4-star level.
Ambiguities & issues
The following ambiguities and issues came to light while applying the 5-star principles to the practical reality of the ways vocabularies were used in the generation of datasets.
For several datasets, the reference links given in the corresponding description paper no longer work, and there is no indication of how to find the most recent, updated ones. However, the description paper provided ample evidence that the necessary web-accessible information regarding the used vocabulary was present, which allowed the dataset to earn the star rating that it did.
For the generation of some datasets, no new vocabulary was developed; instead, other standard, already well-established vocabularies were reused. This was sufficient for the dataset to earn 4 stars. If the reused vocabulary was, in turn, reused in the generation of a different dataset, the dataset automatically earned 5 stars.
In some cases, there was no need for the developed vocabulary to be supplemented by other standard, well-established ones; it was therefore not linked to other vocabularies, and the dataset did not earn the third star. However, metadata regarding the vocabulary was present, so the dataset would have earned the fourth star without having earned the third. We resolved this by awarding the dataset 3 stars.
Linked datasets analysis
Table 1 presents the data from our analysis for the 49 papers and the star rating of vocabulary use.
Discussion
At this stage, our collected data is mainly descriptive. The editorial introducing the 5-star vocabulary categories presented a position on the importance of Linked Data vocabularies, together with an anticipation of which aspects may be indicative of quality, while at the same time providing simple guidelines and rules. As such, it was meant as a starting point for discussion rather than as an end in itself. And as we write this new editorial, the discussion about which aspects of linked datasets are actually important for quality is still ongoing.
Likewise, the point of publishing our analysis herein lies in stimulating further discussion.
Based on our analysis, a case may be made for interchanging the order of the third and fourth stars of the 5-star model. The first three stars would then reflect the completeness of information on the vocabulary itself, while the next two stars would reflect the vocabulary’s interactions with other vocabularies. Furthermore, this would, in some cases, avoid a dataset earning the fourth star without the third. Of course, if the metadata of the vocabulary were not defined, we could still see this happen.
Our assessment shows that all linked datasets with corresponding papers in the Semantic Web journal have between 3 and 5 stars for their vocabulary, with the overwhelming majority having 4 stars. If we accept the categories as a quality measure, then the datasets score high, though most do not reach the top 5 stars. At this stage we can only speculate as to the reasons. Due to the journal’s depublishing strategy, we cannot easily track whether rejected dataset description papers averaged fewer than 4 stars.
We note that obtaining the fifth star is in fact not easy, because it requires third parties to acknowledge the independent value of the used vocabulary by establishing links. This means that the authors of a dataset cannot always actively improve their dataset to obtain the fifth star. The fifth star thus constitutes an indirect quality measure, understanding reuse by third parties as a type of endorsement of the vocabulary. Authors of a dataset can actively obtain the fifth star only by basing their dataset on a vocabulary that is already established and already linked to by others. But of course such a vocabulary may not yet exist in the topic area of the linked dataset being constructed. Note that a 4-star dataset may become a 5-star dataset as time progresses and the utilized vocabulary gains impact.
Regarding the overall good quality of the vocabularies with respect to our rating, we of course acknowledge that the corresponding papers (and thus the underlying datasets) have undergone rigorous review for the journal. Although a rating according to our 5-star categories was not part of the review process, some of the categories – stars one through four – are arguably rather natural and would often be adhered to by authors concerned about a certain minimum quality of the artifact they create. Correspondingly, reviewers could be expected to look into such aspects.
This of course raises the open question of where other datasets, i.e., datasets that do not have corresponding dataset description papers in the Semantic Web journal, fall with respect to the 5-star rating. If their profiles look very similar to the ratings described herein, i.e., mostly four stars, then perhaps this could be taken as an argument that the 5-star rating is not very helpful, in the sense that it merely reflects already established practice. If the profiles look different, in particular if they include a significant body of ratings with fewer stars, then the hypothesis arises that the categories may indeed correlate with aspects of quality, although research on the Linked Data quality front will have to advance before this hypothesis can be investigated effectively.
More importantly, however, the fact that accepted dataset description papers typically clock in at 4 stars means that vocabularies and their quality are indeed considered important aspects of proper Linked Data publishing, and that it may indeed be best practice to select vocabularies wisely, following certain criteria. The star rating proposed back in 2014 reminds us that in earlier years many Linked Data enthusiasts questioned the need for shared (and formal) vocabularies altogether, and that this sentiment seems to be changing. We believe that proper vocabularies are a key driver for dataset discovery and reuse. Interestingly, the reuse of vocabularies themselves, and the best strategies for doing so (e.g., direct usage versus alignment), remains an open issue.
