Abstract
The provision, processing and distribution of research information are increasingly supported by the use of research information systems (RIS) at higher education institutions. National and international exchange formats or standards can support the validation and use of research information and, through consistent semantics, increase its informative value and comparability. These formats overlap considerably but represent different modeling approaches. This paper presents the data model of the Research Core Dataset (RCD) and discusses its impact on data quality in RIS. It then compares the RCD with the Europe-wide accepted Common European Research Information Format (CERIF) standard in order to support implementing the RCD with CERIF compatibility in RIS. This enables institutions to integrate their research information from heterogeneous internal and external data sources and ultimately to provide valuable information with a high level of data quality, which is fundamental to decision-making, knowledge generation and the presentation of research.
Keywords
Introduction
Standardization of research information helps universities and non-university research organizations to aggregate, reuse and share their research information. The demand for quality-assured and comparable research information has increased with the introduction of control mechanisms in accordance with New Public Management in the German higher education system. As a result of numerous and diverse reporting obligations, universities and non-university research institutions have begun to introduce research information systems (RIS) in recent years. An RIS is understood to mean a

RCD Entities and their relationships from the basic data model.
This section first introduces the RCD data model and, as an application case, its impact on the quality of RIS. Finally, it presents the international CERIF data model.

Research information quality management framework.
The recommendation for the development and implementation of an RCD pursues two goals: the standardized recording and updating of performance data on the research activities of universities and non-university research institutions in the context of decentralized data management [7], and best practice for achieving better data quality in research information. In 2016, the German Council of Science and Humanities published the recommendations for the specification of the RCD. Since February 2017, a central helpdesk of the German Center for Higher Education Research and Science Studies (DZHW) has supported the interpretation of the RCD specification. RCD defines six different areas of research reporting (

CERIF Entities and their relationship.

CERIF model metrics.

RCD model metrics.
Figure 1 shows the Entity Relationship Model (ERM) of the RCD. This contains the underlying objects of the specification, their attributes and the relationships between them.
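To make the notions of objects, attributes and relationships in an ERM more concrete, here is a minimal Python sketch. The entities (`Person`, `Publication`), their attributes and the `authored` relationship are hypothetical placeholders chosen for illustration; they do not reproduce the RCD's actual objects or the content of Figure 1.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt of an entity-relationship model. Entity and attribute
# names are illustrative assumptions, not taken from the RCD specification.

@dataclass(frozen=True)
class Person:
    person_id: str
    name: str

@dataclass(frozen=True)
class Publication:
    publication_id: str
    title: str
    year: int

@dataclass(frozen=True)
class Relationship:
    # A typed, directed link between two entities, identified by their keys.
    kind: str
    source_id: str
    target_id: str

@dataclass
class Model:
    persons: dict = field(default_factory=dict)
    publications: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_authorship(self, person_id: str, publication_id: str) -> None:
        # Only relate entities that exist, keeping the model referentially consistent.
        if person_id not in self.persons or publication_id not in self.publications:
            raise KeyError("both entities must exist before they can be related")
        self.relationships.append(Relationship("authored", person_id, publication_id))

    def publications_of(self, person_id: str) -> list:
        # Follow "authored" relationships from a person to their publications.
        return [self.publications[r.target_id] for r in self.relationships
                if r.kind == "authored" and r.source_id == person_id]


model = Model()
model.persons["p1"] = Person("p1", "A. Researcher")
model.publications["pub1"] = Publication("pub1", "On research data", 2018)
model.add_authorship("p1", "pub1")
print([p.title for p in model.publications_of("p1")])  # ['On research data']
```

Storing relationships separately from the entities mirrors how an ERM treats them as first-class elements with their own type.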

Comparison conditions.
The German approach to standardization of research information reflects the heterogeneous research landscape and federal governance structure of Germany [4]. The RCD serves as guidance for institutions that intend to represent it in their technical systems. Implementation can take place at the institutional level or at the level of the RIS provider; both variants can be observed in the German science system. Data structured according to the RCD’s XML Schema can serve as a data source before import into an RIS and/or as an export format that facilitates report creation.
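As a rough illustration of using XML data as a source before import into an RIS, the sketch below parses a small, hypothetical RCD-like export with Python's standard `xml.etree.ElementTree` module and separates complete records from incomplete ones. The element names and the set of required fields are assumptions; validating against the actual RCD XML Schema would require an XSD-capable library (for example `lxml.etree.XMLSchema`), which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical RCD-like export. Element names are illustrative only and do not
# reproduce the official RCD XML Schema.
SAMPLE = """<records>
  <publication id="pub1"><title>On research data</title><year>2018</year></publication>
  <publication id="pub2"><title>Untitled draft</title></publication>
</records>"""

REQUIRED = ("title", "year")  # assumed mandatory fields for this sketch

def load_publications(xml_text: str):
    """Return (accepted records, [(rejected record, missing fields), ...])."""
    root = ET.fromstring(xml_text)
    accepted, rejected = [], []
    for elem in root.findall("publication"):
        record = {"id": elem.get("id")}
        for name in REQUIRED:
            child = elem.find(name)
            record[name] = child.text.strip() if child is not None and child.text else None
        missing = [name for name in REQUIRED if record[name] is None]
        if missing:
            rejected.append((record, missing))  # hold back for correction before import
        else:
            accepted.append(record)
    return accepted, rejected

ok, bad = load_publications(SAMPLE)
print(ok)   # [{'id': 'pub1', 'title': 'On research data', 'year': '2018'}]
print(bad)  # [({'id': 'pub2', 'title': 'Untitled draft', 'year': None}, ['year'])]
```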
Mapping (basis data) between RCD and CERIF
Extract of Mappings (semantics) between RCD and CERIF.
While the introduction of the RCD likely has numerous effects on research information management processes and research information quality, we focus here on the effects we perceive to impact most immediately the data quality dimensions addressed in this paper. First, through its semantic specifications the standard provides the basis for a common understanding and interpretation of research information, thus likely improving the consistency of the data over time and across institutions, as well as its correctness and completeness. Second, it structures the data acquisition process at the institutional level and, especially if incorporated into RIS software, potentially reduces the need to harmonize previously heterogeneous data sources and formats. An impact on the correctness and completeness of the data is expected here as well. Third, it specifies relationships between research information entities, which, in combination with RIS capabilities, facilitate data integration. We expect this aspect of the RCD to impact the correctness, consistency and timeliness of the relationships described. All the impacts described here will be mediated by the data quality assurance procedures already in place at Higher Education and research institutions. Figure 2 provides an overview of the research information management process and the RCD’s impacts.
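The dimensions named above can be made measurable in many ways. The following sketch assumes two simple ratio-based definitions: completeness as the share of required field values that are present, and consistency as the share of records whose publication type comes from a shared controlled vocabulary, standing in for the RCD's semantic specifications. The field names and the vocabulary are hypothetical.

```python
# Minimal sketch of two data quality metrics. Field names, the controlled
# vocabulary and the ratio-based definitions are illustrative assumptions.
REQUIRED_FIELDS = ("title", "year", "type")
ALLOWED_TYPES = {"journal_article", "book", "conference_paper"}  # hypothetical vocabulary

def completeness(records):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(REQUIRED_FIELDS)
    filled = sum(1 for r in records for f in REQUIRED_FIELDS
                 if r.get(f) not in (None, ""))
    return filled / total if total else 1.0

def consistency(records):
    """Share of records whose type uses the shared vocabulary."""
    if not records:
        return 1.0
    return sum(1 for r in records if r.get("type") in ALLOWED_TYPES) / len(records)

records = [
    {"title": "A", "year": 2018, "type": "journal_article"},
    {"title": "B", "year": None, "type": "Zeitschriftenartikel"},  # local label, not in vocabulary
]
print(completeness(records))  # 0.8333333333333334
print(consistency(records))   # 0.5
```

Computing such ratios before and after adopting a shared vocabulary is one way to observe the expected effect of a standard on consistency.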
With the increasing integration of research information from various sources into RIS and the growing importance of RIS for institutional management, data quality is becoming an area of growing interest for Higher Education and research institutions. Incorrect, inconsistent, inaccurate and missing data lead to erroneous research information and can impair decision-making within an institution. To avoid unnecessary costs for academic institutions, a holistic data quality management process is required in RIS. The framework presented in this paper provides institutions with the means to improve the quality of research information before integration into RIS. We report positive results from applying our framework to sample publication data (details can be found in [2]).
The framework further sketches the impact of the German research information standard RCD on data quality. Our results show that data quality is to some extent contingent on standard adoption and that it will likely improve as a result of adoption. A standardized data model such as the RCD is an essential prerequisite for achieving data governance, in the sense of monitoring and strengthening data management in institutions. This makes it possible to establish quality as an overall institutional target for research information and to maintain it permanently.
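The framework's idea of improving quality before integration into RIS can be illustrated with a typical cleaning step for publication data from heterogeneous sources: normalizing identifiers and titles, then merging duplicates. This is a hypothetical sketch, not the procedure evaluated in [2]; the matching rule (DOI if present, otherwise normalized title plus year) and the sample records are assumptions.

```python
import re
import unicodedata

# Sketch of a cleaning step before RIS import. The matching rule
# (DOI, else normalized title plus year) is an illustrative assumption.

def normalize_doi(doi):
    if not doi:
        return None
    doi = doi.strip().lower()
    # Strip common resolver prefixes so "https://doi.org/X" and "X" match.
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)

def normalize_title(title):
    # Remove diacritics, lowercase, and collapse punctuation and whitespace.
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def match_key(record):
    doi = normalize_doi(record.get("doi"))
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record.get("title")), record.get("year"))

def deduplicate(records):
    merged = {}
    for record in records:
        target = merged.setdefault(match_key(record), {})
        for name, value in record.items():
            # Keep the first non-empty value seen for each field.
            if value not in (None, "") and target.get(name) in (None, ""):
                target[name] = value
    return list(merged.values())

sources = [
    {"doi": "https://doi.org/10.1234/ABC", "title": "Data Quality in RIS", "year": 2018},
    {"doi": "10.1234/abc", "title": "Data quality in RIS", "year": 2018, "journal": "Example Journal"},
    {"doi": None, "title": "Über Forschungsdaten", "year": 2017},
]
print(len(deduplicate(sources)))  # 2
```

Keeping the first non-empty value per field is the simplest merge policy; a real pipeline would more likely rank sources by reliability.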
Using the CERIF data model or a CERIF-compliant IT solution for current research information systems (CRIS) is a European Union recommendation to the member states [13]. The organization euroCRIS is committed to the development and distribution of the CERIF standard on data formats for research information. The uniform European format CERIF represents information about the entire research process (such as
Mapping RCD and CERIF
This section provides a mapping recommendation for the elements of the RCD data model and the CERIF data model in order to simplify the use of the RCD in existing CERIF-compliant systems. RCD and CERIF essentially comprise XML Schema, data model and semantics specifications for the exchange of research information. Figures 4 and 5 list and explain the metrics of RCD and CERIF.
RCD and CERIF are each expressed as classes and relationships in an ontology and as elements of an XML Schema. To make the implementation understandable, it is therefore necessary to record and manage the links between the content definitions and the various data models. The mapping of RCD base data to CERIF is straightforward, since many of the elements in the RCD basic data model are also present in CERIF. RCD thus extends existing CERIF elements with further attributes, but also adds aspects missing from CERIF, e.g. the promotion of young researchers and spin-offs. The CERIF data model captures data in full detail, whereas the RCD aggregate data model focuses on an aggregated presentation of research information for reporting. Our investigation suggests that linking the RCD with the concepts already defined in CERIF is sensible. These results were agreed with experts in the field at the workshop “Using the RCD Data Model as the Standard for Processing Research Information and Comparison with CERIF” organized by the RCD team.
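The basic-data mapping and CERIF's link entities can be sketched as follows. CERIF connects base entities such as `cfPers` and `cfResPubl` through link entities (for example `cfPers_ResPubl`) that carry a role classification and a validity period. The RCD-side names, the particular entity pairs, and the role and scheme identifiers are hypothetical placeholders, not the mapping reported in Table 1.

```python
from dataclasses import dataclass

# Illustrative mapping of hypothetical RCD object names to CERIF entities.
# CERIF names follow CERIF's "cf" naming convention; the pairs chosen here
# are assumptions for this sketch, not the mapping reported in Table 1.
RCD_TO_CERIF = {
    "Person": "cfPers",
    "OrganisationalUnit": "cfOrgUnit",
    "Project": "cfProj",
    "Publication": "cfResPubl",
    "Patent": "cfResPat",
}

@dataclass(frozen=True)
class LinkRecord:
    """A CERIF-style link entity row: two entity ids plus role and time span."""
    link_entity: str
    id1: str
    id2: str
    class_scheme_id: str  # roles are classifications within a scheme
    class_id: str
    start_date: str
    end_date: str

def to_cerif_link(rcd_type1, id1, rcd_type2, id2, role, scheme, start, end):
    # Assumes the caller passes both entities in CERIF's canonical order.
    e1, e2 = RCD_TO_CERIF[rcd_type1], RCD_TO_CERIF[rcd_type2]
    name = f"{e1}_{e2[2:]}"  # e.g. cfPers + cfResPubl -> cfPers_ResPubl
    return LinkRecord(name, id1, id2, scheme, role, start, end)

link = to_cerif_link("Person", "p1", "Publication", "pub1",
                     role="Author", scheme="hypothetical-role-scheme",
                     start="2018-01-01", end="2018-12-31")
print(link.link_entity)  # cfPers_ResPubl
```

In CERIF itself, classification identifiers typically refer to entries in its semantic layer rather than free-text labels such as `"Author"`.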
To make the comparison conditions for each RCD area or object and each CERIF table easier to follow, we use two different colors, as illustrated in Fig. 6.
Our mapping looks at two categories: (1) comparison of the basic data of the RCD with CERIF and (2) comparison of the semantics of the RCD with CERIF.
The results of these categories between RCD and CERIF are shown in Tables 1 and 2.
The results of the mapping (basic data, semantics and link entities) of RCD and CERIF show that the elements of the RCD can be mapped to the CERIF data model and share a common vocabulary, and that the two standards enable exchange between different research information systems. The RCD and CERIF formats provide models that structure the research domain into relevant objects and their relationships, while enabling high-quality integration of these objects into RIS and interoperability in a common format. This benefits not only information management but also data analysis and access to data, information and knowledge. In addition, the two standards bring clarity to the collection of research information, reduce the administrative burden, improve the data quality of research information and support sound and transparent decisions.
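The semantic category of the mapping can likewise be sketched as a lookup from RCD vocabulary terms to classification pairs on the CERIF side, with unmapped terms reported for review. All terms and identifiers below are hypothetical and do not reproduce Table 2.

```python
# Sketch of a semantic (vocabulary-level) mapping: each RCD term is mapped to a
# (classification scheme, class) pair on the CERIF side. All terms, scheme and
# class identifiers are hypothetical placeholders.
SEMANTIC_MAP = {
    "journal article": ("publication-types", "journal-article"),
    "monograph": ("publication-types", "book"),
    "conference paper": ("publication-types", "conference-proceedings-article"),
}

def map_terms(terms):
    """Return mapped classifications and the terms that still lack a mapping."""
    mapped, unmapped = {}, []
    for term in terms:
        key = term.strip().lower()  # tolerate case and whitespace differences
        if key in SEMANTIC_MAP:
            mapped[term] = SEMANTIC_MAP[key]
        else:
            unmapped.append(term)
    return mapped, unmapped

mapped, unmapped = map_terms(["Journal article", "Monograph", "Poster"])
print(mapped["Monograph"])  # ('publication-types', 'book')
print(unmapped)             # ['Poster']
```

Listing unmapped terms explicitly keeps gaps between the two vocabularies visible instead of silently dropping data.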
Conclusion
In summary, the two data models RCD and CERIF support the interoperability of research information in different formats, e.g. through the exchange, merging, sharing and mapping of data. CERIF and RCD can be considered basic data formats and thus increase the flexibility of RIS. However, for better integration and compatibility between CERIF and RCD, the changes outlined above should be implemented in RCD version 2.0.
Footnotes
Acknowledgements
This work has been funded by the German Center for Higher Education Research and Science Studies (DZHW) and by the German Federal Ministry of Education and Research (BMBF) in the context of the project “Helpdesk to facilitate implementation of the Research Core Dataset” (
) (project period: 2017–2019; grant number: KDS2016).
