Abstract
Many museums are currently providing online access to their collections. The state of the art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible at
Introduction
Publishing cultural heritage collections as Linked Data improves reusability of the data and allows for easier integration with other data sources [1,13]. Concepts providing context for collection items are often shared among multiple cultural heritage organisations, which is an ideal basis for creating connections between collections and allowing reuse of information [6,9]. The availability of data models tailored towards publishing cultural heritage data helps to make the data available in an interoperable way [3,4]. These benefits have become apparent to the sector, resulting in an increase of attention and the development of methodologies to help institutions overcome the hurdles involved in publishing data according to the Linked Data principles [1,13,16].
The Linked Data version of the Rijksmuseum collection has some unique features. The data is a result of a joint effort between the museum, CWI and VU University Amsterdam and has evolved with input from many research projects [12,15,17]. Nowadays, employees of the museum are in control of the publishing process, creating and maintaining a conversion layer from collection management system to Linked Data. The museum’s digitisation process includes the use of external datasets for adding contextual concepts (e.g. creator or technique), creating manually curated links towards external datasets [8]. The data is continuously extended: every day new objects and descriptions are added and both metadata and images are released under open licenses when possible.
This paper describes the current state of the Rijksmuseum Linked Data and provides insights into the lessons learned during its creation. In the next section, we describe the characteristics of the Rijksmuseum collection and its digitisation process. The historical development of the dataset is given in Section 3 and the conversion approach in Section 4. Sections 5 and 6 provide details on the data model and the number of digital objects currently available. In Section 7 we give an overview of the links from collection objects to external data sources. In Section 8 we illustrate uses of the data, before we conclude in Section 9 with a discussion.
The Rijksmuseum in a digital age
The Rijksmuseum Amsterdam is one of the most visited museums in the Netherlands, with a mission to provide a representational overview of Dutch art from the Middle Ages onwards. It is well known for its Golden Age paintings, including artworks by Rembrandt and Vermeer. The collection comprises over a million objects, of which only a fraction can be on display at a given time. To open up the remaining collection the museum started digitising objects and publishing them online.
Digitising large collections is a time consuming and costly endeavour. To address the backlog of items to be digitised, the Rijksmuseum started a dedicated digitisation project, employing cataloguers and a professional photographer. The cataloguers register objects in the collection management system and describe the objects, using structured vocabularies if available [8]. The photographer takes high-quality images which are released under a public domain license when possible, waiving the rights of the museum.
The digitised collection items are accessible through the website of the museum. Online visitors can explore the collection using categories or they can search for specific keywords. The presentation of the website focusses on high-quality images of collection objects, encouraging users to save, manipulate, and share them [7]. Developers can use an Application Programming Interface (API) to get access to information about the collection objects, sub-collections created by users, and event information.1
The Linked Data version of the Rijksmuseum dataset has a long history, influenced by a number of research projects. A first Resource Description Framework (RDF) version comprising 750 top pieces was created by converting a datadump from an educational database [5]. As a next step, in an effort to integrate Dutch cultural heritage collections, the datamodel was changed to follow the VRA Core specification,2
In a next version, contextual concepts from in-house thesauri of the Rijksmuseum were aligned with the Getty thesauri and WordNet, resulting in a dataset of 27,993 triples [12]. At the time, the Getty vocabularies were only available under license and in XML format, which resulted in the need for an internally maintained conversion to RDF. In a similar effort, the vocabulary Iconclass was converted and aligned using the Simple Knowledge Organization System (SKOS) to formalise its structure [6]. The experiences gained served as input for the SKOS specification.
The Rijksmuseum dataset was one of the first entries in the Europeana Thought Lab,4
The Europeana Data Model today has a set of core and contextual classes that can capture collection information. The data model is designed with reuse of existing classes and properties in mind. It includes elements from the Dublin Core metadata initiative and the Object Reuse and Exchange definition of the Open Archives Initiative.5
To create Linked Data, a conversion needs to take place from the data contained in the collection management system into RDF. As of March 2016, the collection management system includes 597,193 registered objects which can be described using 597 available fields. Multiple steps are taken to select and convert a subset of fields and objects, which we will describe in the remainder of this section.
Data from the collection management is harvested daily and loaded into a database which serves the website. Not all of the 597 available metadata fields are included in the output of the collection management system, a subset of 245 fields is specified in a dedicated file.6
On top of the database runs an API, which is used for outputting RDF. Not all of the 597,193 registered collection objects are included in the output, a subset is selected based on copyright statements and the ownership of the object. This results in a set of 351,814 objects which are under management of the museum and are free of rights. Whether a collection object is under management of the museum is loosely defined, it includes objects owned by the museum, the state and the city of Amsterdam, but also objects which are on permanent loan. Objects which are on loan for a period shorter than six months are not considered.
Selected collection objects are converted into RDF with a second XSLT file.8
For some fields only textual values are available, others are described using contextual concepts. These concepts are manually added in the collection management system by employees documenting the collection objects. Employees can select concepts from a combination of Rijksmuseum thesauri and external datasets and all concepts have a unique identifier. If for selected fields such an identifier is encountered during the conversion, a reference to the resource is added as well as text in the form of the label of the resource.
The output of the API is used to obtain a complete harvest of the data, which is in turn loaded into a graph database, also known as a triple store. These harvests are run on a monthly basis by an employee of the museum, who updates the triple store by loading the latest version and who provides links to downloads of older datadumps, which are versioned according to the year and month they were obtained. The file
The Linked Data version of the Rijksmuseum collection is modelled according to the Europeana Data Model (EDM). EDM reuses elements from existing models such as Dublin Core. The structure of the model is expressed with RDF Schema, using constructs like subclass and subproperty relations. The Web Ontology Language is used to relate EDM elements to other data models.
The data model makes a distinction between a collection item and its digital representation(s). This is achieved with three core classes:

Example of the painting “Jeremiah Lamenting the Destruction of Jerusalem” modelled according to the EDM data model.
An
Every
When a digital representation is available, the aggregation points to the URL were the image can be obtained. This URL is of type
Not all subtleties of the collection data can be captured by using constructs included in the Europeana Data Model description. While the original data includes detailed information about creator roles and fields like ‘rejected creator’, no such properties exist in Dublin Core. EDM allows for refining and extending the data model, so in the future the museum can choose to introduce its own more specific constructs or find others to reuse. This could increase the coverage of data in the collection management system included in the Linked Data version.
Overview of the predicates that describe collection items
Persistent identifiers in the form of handles9
As of March 2016 the Linked Data version of the Rijksmuseum collection11
Table 1 lists the predicates used to describe collection items. A title is provided for almost all artworks, of which the majority is unique. Although most of the titles are in Dutch, some of them are also available in English. Over half of the objects have the predicate
There are over thirty thousand unique creators, which are mostly described using resources from the person database of the museum. Half of the
The
The museum describes temporal aspects using the predicates
The predicate
Objects can be connected to each other with the
The predicate
As outlined in Section 4, some of the literals are based on the labels of resources. Although adding both the literal and resource introduces redundant information, this can support applications that do not handle the added complexity of resources well. The literals from
Institutions often maintain their own vocabularies containing their perspective on contextual concepts. When the contextual concepts of collection items are replaced with objects from standardised vocabularies such as the Getty vocabularies, these nuances in perspectives are in danger of disappearing. So while collection objects and contextual concepts in the thesauri of the Rijksmuseum are linked to an increasing number of available datasets maintained by other institutions, the Rijksmuseum chooses to also maintain and use its own. This allows the museum to preserve its own perspective and in a later stage vocabulary alignment tools can be used to match the concepts with similar concepts in external datasets [14].
Types in the Rijksmuseum thesaurus with more than 500 values
Types in the Rijksmuseum thesaurus with more than 500 values
Five contextual classes are defined in the Europeana Data Model for relating collection items to contextual information:

Diagrams of contextual concepts representing “panel” thesaurus terms and the person “Rembrandt”.
The thesaurus forms a hierarchy of terms using relations such as broader and narrower, which are represented using SKOS. Figure 2(a) shows two object terms, where the type of term is indicated using the
The person database contains concepts of type
The Art & Architecture Thesaurus14
The Rijksmuseum uses the Art & Architecture Thesaurus for the
The Iconclass vocabulary15
The museum uses the Iconclass vocabulary to describe subject matter. Iconclass codes are added by cataloguers during the registration process described in Section 2. Out of the 39,578 concepts in the vocabulary, 10,434 are used to add information to an object. Of the 351,814 collection objects, 172,059 have one or more Iconclass annotations. As Fig. 3(b) shows, many of the concepts are often used, while on average a code is used 27 times.

Frequency distributions of the top 50 concepts of AAT and Iconclass that the collection objects are linked to.
The Short-Title catalogue Netherlands (STCN) is ‘the retrospective national bibliography of the Netherlands in the period 1540–1800’,16 http://www.kb.nl/expertise/voor-bibliotheken/short-title-catalogue-netherlands (accessed on 04-07-2014).
The cataloguers of the Rijksmuseum add references to the National Library by adding textual descriptions of the books in a notes field. To create links these descriptions are scanned for objects from the STCN that match the title, publication date and publisher. This automated matching process resulted in 3,598 links from the Rijksmuseum collection to 501 publications in the STCN catalogue. The links are encoded as
Uses of the Linked Data of the Rijksmuseum include search, recommendation, collection integration and browsing. In this section we give an overview of how the museum data has been used in various research projects and provide statistics about the Rijksmuseum API. Most projects that contributed to the process of data development had demonstrators illustrating the power of Linked Data. The
Other ways of accessing data were introduced in subsequent years. The
The Rijksmuseum maintains an API (see footnote 1) for application developers, optionally returning data formatted according to the Europeana Data Model. 587 people have registered for access to the API as of August 2015 and many different applications have been build on top of it.17
For decades, Linked Data has been a promise for data publication and integration in the cultural heritage sector. Despite widespread interest and apparent advantages, only a limited number of institutions have managed to make their collection available as Linked Data.18 As of March 2016
The majority of the Rijksmuseum collection items are part of the public domain since their intellectual property rights have expired. Although general understanding is that digitised representations of public domain works should again be released under the same license terms, many institutions are hesitant to do so, in fear of losing a possible revenue stream. The Rijksmuseum did release their high-quality images in the public domain in 2013, arguing that the increase in attention and exposure would result in a higher number of visitors [11]. In turn it allowed the museum to gain more control over the digital representations that had appeared online, replacing many inferior versions by its high-quality images.
The quality and correctness of metadata is of paramount importance to museums [13]. The Rijksmuseum has an extensive quality control process in place to ensure the correctness of metadata. By adding a direct conversion layer to the collection management system it ensures that the same level of quality is translated to the Linked Data version. All the criteria for five star Linked Data as defined in [10] are met. There is a description of the data online,19
Data aggregators such as Europeana enticed many institutions to provide digital versions of their collection, often relying on external expertise for the conversion process. This led to an increase of available collections, although providing access to data through aggregators has the major drawback that it creates a gap between the institution and its data [1]. We believe it is therefore still desirable that institutions publish their own data, if the required expertise is available. They thereby remain in control of choosing the most suitable data model, URI naming schemes, links to other datasets, and update processes.
The data of the Rijksmuseum is subject to constant change: newly digitised objects are added on a daily basis and employees extend and refine information regularly. The Linked Data version places the artworks in a broader context, allowing others to benefit from the progress made through easy reuse and the possibility to add new perspectives to the data.
Footnotes
Acknowledgement
This publication was supported by the Dutch national program COMMIT/.
