Abstract
Data are the most important resource of the 21st century. The open data (OD) movement provides publicly available data for the development of a knowledge-based society. As such, OD is a valuable information technology (IT) tool that adds value to economic, social, and human development. To further develop these processes on a global scale, users need to manage the quality of OD in their practices. Otherwise, what is the point of using data for its own sake (in science or practice) without considering its compliance with norms, standards, and so forth? This article provides an overview of the (meta)data quality dimensions, sub-dimensions, and metrics used in OD assessment research. To achieve this, the authors performed a systematic literature review (SLR) and extracted data from 86 relevant studies dealing with the evaluation of OD. The article documents the progress made so far in OD assessment research. Reviewing OD assessment in light of existing (meta)data quality dimensions unveils the potential of metadata. Furthermore, the analysis disclosed the need for greater use of quantitative methods in research, to which metadata can greatly contribute.
Keywords
Introduction
The open data (OD) initiative has gained a lot of attention in recent years, as the usage of OD can benefit various stakeholders. Together with the growing interest in OD, certain concerns about it have arisen, too. Questionable quality, indeed, poses a risk to the success of the OD initiative. Yet a thorough examination of what has been done so far on this issue is lacking. This insufficient examination of OD quality in research was the primary motivation for carrying out this review.
The main objective of the research is to provide a review of the state of the art in OD assessment. This objective is further divided into two sub-objectives. The first is to identify papers that investigate OD assessment (excluding big data). To gain broader insight into what should be assessed in OD and how OD should be assessed from the metadata aspect, the literature review is carried out as a systematic review. The second sub-objective is to analyze the papers identified as relevant against the defined research questions.
This review paper is structured in the following way: OD’s potential is stressed in the “Theoretical Background” section. The research methodology that covers the process of choosing the relevant papers considering our research objectives is described in detail in the following section. To achieve the research objectives, six research questions are defined. The first research question is related to the identification of (meta)data quality (sub)dimensions that are used within studies about OD assessment. The term (sub)dimensions refers to dimensions as well as sub-dimensions, while the term sub-dimensions refers only to sub-dimensions. The second question covers measurements for identified (meta)data quality (sub)dimensions. The third research question focuses on methods used for the identification and development of (sub)dimensions. The fourth question is about the focus points of research within OD assessment papers. The fifth question investigates which research approach is used within studies identified as relevant. The sixth question provides guidelines for future research based on the literature review. In addition, the science mapping analysis is performed to identify and display how references are connected with each other and what the theoretical gap areas are. Answers to research questions are provided in the “Discussion” section. Findings of the research are highlighted together with limitations in the “Conclusion” section.
Theoretical Background
Our society strives to be knowledge based. Data are a basic prerequisite for knowledge discovery, and data should be free and open. The OD concept is still in its early stages, both in science and practice (Kassen, 2017). Roots of this political and socioeconomic phenomenon lie in the advancement of open government. The U.S. OD initiative was inaugurated in 2009 by the President’s Memorandum on Transparency and Open Government, followed by the U.K. government’s initiative regarding OD in 2011 (Meijer et al., 2014). Although the majority of OD initiatives are in public sectors, OD is not limited to “open government” but extends to other fields, too, including science, economics, and culture (Aiello et al., 2019; Uhlir & Schröder, 2007). OD is also becoming important in research and has the potential to improve the governance of public institutions (Schalkwyk et al., 2016). Thus, OD can be observed through various perspectives and offers various direct and indirect benefits (Schalkwyk et al., 2016). For instance, the economic perspective argues that innovation through OD leads to economic growth. The political and strategic perspectives focus on political issues such as privacy and security. The social perspective focuses on the benefits for society through data usage. Also, the social perspective explores how the benefits of OD can be visible to all citizens (Davies et al., 2013).
As seen previously, many studies found that OD initiatives aim toward societal values and benefits. Some examples of social, political, and economic benefits are highlighted below (Kassen, 2017). Political and social benefits yield more transparency, more participation and self-empowerment of citizens, their public engagement and creation of trust in government, new governmental services for citizens, innovative social services, improvement of policy-making processes, and simulation of knowledge developments. There are also various economic benefits, including economic growth and stimulation of competitiveness, stimulation of innovation, development of new products and services, not to mention the creation of a new sector adding value to the economy.
Despite many advantages, a heterogeneous and context-dependent OD concept makes analysis complex and requires the examination of a wide range of variables (Kassen, 2017). Furthermore, OD is still in an infancy stage. For it to develop further, it is necessary to explore the quality of OD and provide guidance for OD quality assessment. Hence, exploring its quality is a crucial next step for OD usage, especially if we want to analyze such data in a way that extracts meaningful new information and insights.
Research Methodology
An initial review of the literature did not uncover previous systematic literature reviews (SLRs) that would give a broader insight into the assessment of the OD within social sciences, computer science, and decision sciences research areas. To perform a replicable review of the literature with minimal bias, the SLR method was then applied for the current research. The review procedure is designed based on the application of this method by other authors (Attard et al., 2015; Kitchenham et al., 2010; Novak et al., 2019; Ruijer & Martinius, 2017) and guidelines proposed by Littell et al. (2008) and Petticrew and Roberts (2006). It is carried out as follows: framing the research questions, defining the exclusion criteria, determining the most relevant electronic sources, forming the search query, and selecting the most relevant papers according to identified research objectives.
Research Questions
Research questions represent guidelines through the process of conducting the SLR. These questions should be framed to be answerable. Respecting the defined research objectives, the following research question is framed: How can OD be assessed, with the greatest emphasis on assessment from the metadata aspect? It is further decomposed into the following subquestions:
Exclusion Criteria
Defining exclusion criteria revealed relevant papers for the current research. These criteria complement the defined research questions and assist in eliminating unsuitable papers from the research. The exclusion criteria are the following:
Papers that are not written in English. The whole paper should be written in English. If the paper retrieved by the search query has only the title and the abstract in English, it will be excluded from the review process.
Papers that are published on selected electronic sources after March 7, 2019.
Duplicate papers that have emerged as a result of searching through more electronic sources. Hence, it is possible that a paper has one or more duplicates.
Papers that meet the objectives of this study but belong to other research areas (e.g., chemistry, psychology, cardiology, geology, surgery, water resources, etc.) will be excluded if those papers do not contain enough relevant data or if the authors were not sufficiently familiar with the topic, research area, or terminology.
Papers that cover topics on OD, but are not focused on the assessment of OD.
Papers that examine quality of big data.
Types of articles that have a deficiency in their methodological rigor (notes, letters, reports, editorials, posters, etc.). Such deficiencies can stem from the length of the paper, from being commissioned by a biased source or company, and so on.
Search Query for Selected Electronic Sources
The relevant electronic sources were identified to yield papers that answer the research questions. Review papers in which the SLR method was applied were used as a basis for selecting relevant electronic sources, or rather, electronic databases in which research papers can be found (Attard et al., 2015; Herala et al., 2016; Kitchenham et al., 2010; Ruijer & Martinius, 2017). Therefore, selected electronic sources include Association for Computing Machinery (ACM) Digital Library, Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library, ScienceDirect, Scopus, Web of Science (only Web of Science Core Collection).
Apart from selecting electronic sources to obtain exact matches, selecting appropriate keywords and Boolean operators for a search query is important. Because some combinations return too many papers, different combinations of keywords (open, big, data, metadata, government, quality, assessment, benchmarking, analysis, framework, model, measurement, assessing) and Boolean operators (AND, OR, NOT) were used to search the electronic sources. The number of chosen papers should be reasonable in regard to various resources (time component, human component, etc.), but it should be sufficient and representative for analysis (Novak et al., 2019). Some selected electronic sources enabled searches of specific parts of a document (different sources use different terminology, e.g., field, data field), while others enabled searches of the entire document. Thus, the number of papers obtained in the results depends on the field within which the search query is run.
Selection of Relevant Papers
The electronic sources (ACM Digital Library, IEEE Xplore Digital Library, ScienceDirect, Scopus, Web of Science but only Web of Science Core Collection) were searched on March 7, 2019. The results of the searches are displayed in Table 2.
There are statements (e.g., PRISMA stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses, AMSTAR stands for Assessment of Multiple Systematic Reviews) to improve reporting of SLRs or meta-analyses (Littell et al., 2008). These statements can be used as quality standards to enhance the reporting of SLRs or meta-analyses but cannot be used as quality assessment tools for calculating the quality of a systematic review (Littell et al., 2008; Moher et al., 2009). Hence, the PRISMA statement is used as a guide through the process of identifying relevant papers. It consists of a checklist (27 items) and a flow diagram (nine steps arranged into four phases; Moher et al., 2009). The PRISMA diagram is modified because some additional steps were required for this research, while some existing ones were redundant. The reference management software Zotero was used for managing bibliographic data during the entire review process.
In addition, a detailed description of the steps, for replicability of the study, is given below. All papers were identified through database searches (S1). The search for papers included five electronic sources. Papers obtained in the search results were imported into the software Zotero. Other sources for searching additional papers were not used within the research (S2). Duplicates of the papers were removed from the further review process because some of the papers had two or more duplicates (S3). Most of the duplicates were removed automatically by Zotero, although some of them needed to be removed manually. Titles and abstracts of the papers were screened considering the defined objectives, research questions, and exclusion criteria (S4). Hence, most of the papers irrelevant to the research were excluded in the fifth step (S5), and the full text was then retrieved for the papers that remained. The inter-rater reliability (IRR) between raters was calculated at the end of steps S4 and S5. To further refine the list of the remaining papers, additional steps (S6, S7) were added to the review process. Along with the title and abstract, the introduction and conclusion sections of the papers were screened (S6, S7). Also, the IRR between raters was calculated at the end of steps S6 and S7. To obtain answers to the research questions, the full-text papers were read (S8). While reading the full text of the papers, it was noticed that some of them were not suitable for the study, so they were excluded, together with an explanation (S9). Papers that provided answers to most of the research questions were included in the qualitative synthesis (S10).
IRR tests
SLR studies are distinguished by their elaborate description of the process of choosing relevant papers as well as of how the analysis of relevant papers is carried out. Raters have essential roles in the process of choosing and analyzing the papers because they need to make a lot of decisions, often subjectively, during the whole process (Belur et al., 2018). Decisions made by raters are a black box for everyone except for them. Therefore, an SLR would not be transparent and replicable if decisions made by reviewers were not reported.
IRR is the degree of agreement between the decisions made by two or more independent raters (e.g., agreement about the exclusion of papers from an SLR). Different methods for calculating IRR have been proposed, for instance, Scott’s π, Cohen’s κ, and Krippendorff’s α (Belur et al., 2018). There are two reasons why Cohen’s κ is applied within the research. First, it is the most frequently used statistic that considers chance agreement between two or more raters (Belur et al., 2018). Second, it is a measure of agreement that has been adjusted for chance for categorical variables (Littell et al., 2008).
To calculate Cohen’s κ, the proportion of units on which the raters agreed (p_o) and the proportion of agreement expected by chance (p_e) are needed; κ is then computed as κ = (p_o − p_e) / (1 − p_e).
Table 1. Interpretation of Cohen’s κ.
There were two IRR tests conducted within the study. Both tests were conducted during the screening phase by two authors of this article using Microsoft Excel.
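To make the calculation concrete, the κ statistic can be sketched in a few lines of Python; the 2×2 contingency table below is purely illustrative and does not reproduce the study's actual screening counts.

```python
# Illustrative sketch of Cohen's kappa for two raters' binary include/exclude
# decisions. The counts below are invented, not the study's actual figures.

def cohens_kappa(table):
    """table[i][j] = number of papers rater A coded i and rater B coded j."""
    total = sum(sum(row) for row in table)
    # Observed agreement: share of papers both raters coded identically.
    p_o = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement expected from each rater's marginal proportions.
    p_e = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(len(table))
    )
    return (p_o - p_e) / (1 - p_e)

# Rows: rater A (0 = exclude, 1 = include); columns: rater B.
table = [[800, 20],
         [15, 70]]
print(round(cohens_kappa(table), 3))  # → 0.779, almost perfect agreement
```

In a spreadsheet tool such as Microsoft Excel, the same quantities (p_o, the marginal proportions, and p_e) can be derived from the contingency table with a handful of cell formulas.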
Science mapping analysis
Science mapping analysis is applied within the research to analyze the thematic evolution of OD assessment within the social sciences, computer science, and decision sciences research areas. The analysis is performed for the papers that were identified as most relevant according to the conducted SLR. Two types of techniques for building a science map are broadly used: document co-citation and co-word analysis (Cobo et al., 2011). The co-word analysis is used for building a map within this research as it is described as a technique that is effective in mapping the strength of association between information objects in textual data (Cobo et al., 2011). It analyzes the set of terms shared by documents, mapping the corresponding literature directly from the interactions of key terms. For example, the co-word technique has been employed to analyze the association among constructs of m-commerce technology adoption theory (Chhonker et al., 2018), to analyze the thematic evolution of the fuzzy sets theory field (Cobo et al., 2011), and to explore the nature of linkages between characteristics of big data and characteristics of cognitive computing (Gupta et al., 2018).
The co-word analysis results show clusters, the conceptual groups of different topics that can be further used for various purposes. The clusters in this research will be used to quantify the research topic by performance analysis through two consecutive time periods.
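As a minimal sketch of the co-word idea, keyword co-occurrence counts can be derived directly from documents' keyword sets; the documents and keywords below are invented for illustration.

```python
# Minimal co-word sketch: count how often keyword pairs co-occur across
# documents' keyword sets. Documents and keywords are invented examples.
from collections import Counter
from itertools import combinations

docs = [
    {"open data", "quality", "metadata"},
    {"open data", "quality", "assessment"},
    {"metadata", "quality"},
]

cooccurrence = Counter()
for keywords in docs:
    # Each unordered keyword pair within a document counts once.
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

# Strongly associated pairs are the raw material for the clusters
# that a co-word map (e.g., one built in SciMAT) would display.
print(cooccurrence.most_common(2))
```

Tools such as SciMAT automate this counting, normalize the association strengths, and cluster the resulting network into themes.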
Different software tools can be used for science mapping analysis, such as SciMAT, VOSviewer, CiteSpace, IN-SPIRE, VantagePoint (Cobo et al., 2012). The SciMAT software tool was used for researching deeper levels of data analysis within the study as it performs all steps of the science mapping workflow (Cobo et al., 2012).
Analysis and Discussion of Relevant Papers
Once the review procedure of the research was defined, diverse combinations of appropriate keywords and Boolean operators were tried out for searching papers on the electronic databases to choose a reasonable number of relevant papers that were sufficient and representative for analysis. A combination of keywords and operators (data OR “open data” OR metadata) AND (quality OR assessment OR benchmarking) returned 19,304 papers only on the Web of Science Core Collection, and a slightly different search term (data OR metadata OR government) -big AND (quality OR assessment OR benchmarking OR analysis OR ((framework OR model) AND (measurement OR assessing))) returned even more papers, that is, 97,242 on the Web of Science Core Collection. Therefore, the next search query was chosen and run within a field title:
(open AND (data OR metadata OR government)) -big AND (quality OR assessment OR benchmarking OR analysis OR ((framework OR model) AND (measurement OR assessing)))
This search query returned only 570 papers in comparison with previous queries that resulted in thousands of papers on the Web of Science Core Collection electronic source (see Table 2). Moreover, the final query is modified for every electronic source because the search rules on different sources are different, as listed in Appendix A.
Table 2. Number of Papers Returned by Electronic Sources.
Note. ACM = Association for Computing Machinery; IEEE = Institute of Electrical and Electronics Engineers.
As already mentioned, the PRISMA flow diagram is adjusted to the research. The extra steps (S6, S7) are added to the review process. Besides, the last step of the PRISMA flow diagram “Studies included in quantitative synthesis (meta-analysis)” was removed from the diagram because meta-analysis was not within the scope of this research (see Figure 1). In the ninth step of SLR, 11 papers are excluded because of the following reasons:
Papers that are directed to the measurement of citizen participation, adoption, or attitudes about open government (Gómez et al., 2017; Nam, 2016; Susha et al., 2015; Wirtz et al., 2017, 2018);
The OD policies are in the focus of the research (Chatfield & Reddick, 2018; Styrin, 2017; Zuiderwijk & Janssen, 2015);
The (meta)data quality aspects are not included in the research (Kalo et al., 2015; Palacios et al., 2013; Saxena, 2018).

Figure 1. Flow diagram of the conducted systematic literature review.
Those papers were not recognized as relevant. They are, therefore, excluded from qualitative analysis as well as from the discussion.
The IRR test plays a prominent role in demonstrating the transparency and replicability of the conducted SLR. Hence, the first IRR test was performed at the end of S4, when both raters had read the titles and abstracts of 905 papers. The second one was performed when the raters had screened the title, abstract, introduction, and conclusion of 198 papers. During the screening, raters made decisions about the papers by assigning the number 1 to a paper that had potential to be chosen as relevant, or 0 to a paper that should be excluded from the process. The results of these decisions are presented in a two-way contingency table of frequencies, with the columns and rows indicating the categories of response for each rater (Appendix B). The results of the tests are shown in Table 3.
Table 3. Results of IRR Tests.
Note. IRR = inter-rater reliability.
High values of the κ statistic for the first and second IRR tests indicate an almost perfect level of agreement between raters. Because disagreement between raters was low, it was decided that each paper assigned the number 1 by at least one rater would be included in the next step of the review process.
Qualitative analysis and discussion includes 86 papers that are identified as the most relevant regarding defined research objectives, research questions, and exclusion criteria. Those 86 papers will provide answers to all six research questions in the following subsections.
Identification of (Meta)Data Quality Dimensions
The inseparability of technology and OD is evident from the very definition and characteristics of OD. Because of that, it can be said that technology is an inherent part of OD. Metadata form a part of digital databases or systems and as such, the metadata are also part of OD. Because metadata are widely known as data about data (Xie & Matusiak, 2016), a lot of metadata quality dimensions are the same as data quality dimensions in general, but with slightly different definitions because of metadata’s characteristics. So, the identification of (meta)data quality dimensions in OD studies will be the focus of this subsection.
The levels of metadata quality assessment are determined by analyzing the most relevant papers; the synonyms for these levels are identified in the same way. Data quality assessment includes dimensions. Dimension is the fundamental term used by the authors to explain the core elements of (meta)data quality assessment. The analysis reveals that the term dimension is at the heart of OD assessment and is used in most of the significant studies. For these reasons, the first level of (meta)data quality assessment is called the dimension. Its synonyms are component, indicator, characteristic, factor, criteria, category, issue, phase, theme, and functionality. The most common synonyms for dimension, those mentioned or used in more than one relevant paper, are shown in Figure 2. Synonyms for the first level that appeared only once in relevant papers are not displayed in the graph; those are element, category, aspect, issue, characteristic, component, factor, criteria, approach, viewpoint, phase, benefit, initiative, theme, functionality, cluster, principle, and index.

Figure 2. Percentage of synonym occurrences for the first level of (meta)data quality assessment.
As already mentioned, the dimension is a fundamental part of (meta)data quality assessment. A dimension can be further decomposed into smaller parts, which form the second level of (meta)data quality assessment. Because the term sub-dimension is used most often within the relevant studies, the second level of assessment was named sub-dimension. Synonyms used for this level are characteristics, features, indicators, criteria, items, elements, categories, aspects, and factors. The percentage of synonym occurrences for the second level of assessment is presented in Figure 3. Terms that occurred only once (issues and components) are not included in the graphical representation.

Figure 3. Percentage of synonym occurrences for the second level of (meta)data quality assessment.
The analysis also reveals the third level of (meta)data quality assessment in studies that assess OD. Because the results of the analysis reveal that the term description is used in most of the relevant studies, this level is named description. Other terms used for this level in relevant studies are metric, formula, sub-dimension, attribute, criteria (see Figure 4). There are other synonyms that can be used for this level, such as metadata, factor, requirement, and aspect. As each is used only within one relevant study, they are not included in the graphic representation.

Figure 4. Percentage of synonym occurrences for the third level of (meta)data quality assessment.
Synonyms overlap across the first, second, and/or third levels of metadata quality assessment because different studies use the same terms for different levels.
Both the second and third levels of (meta)data quality assessment describe the highest level or, more precisely, the first level of data quality assessment in more detail in OD studies. So, dimension should be the focus of any (meta)data quality assessment, but at the same time, it has an assigned meaning, or rather, the description, and it can consist of smaller parts such as sub-dimension.
RQ1 focuses on the identification of existing data quality dimensions used for OD assessment. The 10 most-used data quality dimensions within the assessment of OD are highlighted in Table 4.
Table 4. Overview of (Meta)Data Quality Dimensions in Open Data Initiative Evaluations.
Definitions for the widely used (meta)data quality dimensions within OD studies are provided based on the literature. Completeness refers to the level of information present in the (meta)data; because metadata consist of keys and values, not only the presence of values but also the existence of keys should be evaluated. Accuracy indicates the level of error in the (meta)data; it can also be described as the compliance of real data with metadata, with a quality certification document, and so forth. Consistency is defined as the absence of contradictions, or rather as the level of (meta)data conformance to its previous values or to norms, standards, and so on. Accessibility refers to easy and quick access to (meta)data, taking its nondiscriminatory nature into consideration. Timeliness indicates the frequency of updating (meta)data. Usage refers to the level of presence of metadata keys that describe the data. Retrievability refers to the success of fetching the data, or rather content (metadata, resources, etc.), by an agent. Openness refers to the level of metadata conformance to open licenses, open formats, and machine readability. Transparency indicates that published data are clear, visible, easily perceived, and open to public scrutiny through the authenticity, understandability, and reusability of the data. Understandability refers to the comprehensibility of the (meta)data or of the format of a dataset.
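Because completeness, as defined here, spans both key existence and value presence, it lends itself to a simple quantitative metric. The sketch below is a hypothetical illustration; the expected-key list, the record, and the function name are assumptions, not taken from any reviewed study or metadata standard.

```python
# Hypothetical completeness metric for one metadata record: the share of
# expected keys that are present AND carry a non-empty value. The key list
# is an invented example, not a prescribed standard.

EXPECTED_KEYS = ["title", "description", "license", "publisher", "modified"]

def completeness(record):
    present = sum(
        1 for key in EXPECTED_KEYS
        if record.get(key) not in (None, "", [])
    )
    return present / len(EXPECTED_KEYS)

record = {"title": "Air quality 2018", "description": "", "license": "CC-BY"}
print(completeness(record))  # → 0.4 (2 of 5 expected keys carry values)
```

A metric like this can be computed automatically over every dataset on a portal, which is one way the quantitative approach advocated later in this article could be realized.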
Based on the analysis of 86 studies regarding OD assessment, authors conceptualized the use of (meta)data quality dimensions within OD studies as:
Quality assessment through standalone dimensions
Karafili et al. (2019) defined quality attributes of OD that are part of one dimension. Sáez Martín et al. (2015) developed a quality index consisting of several descriptions. Máchová and Lněnička (2017) identified characteristics for evaluating the quality of OD portals on the national level for one dimension. Yi (2019) grouped metrics into two dimensions. Some authors focused on only one dimension, such as Jarolímek and Martinec (2016), who investigated the data openness dimension. Matamoros et al. (2018) pointed out two dimensions: accessibility and completeness. Bellini and Nesi (2013) suggested three dimensions: completeness, accuracy, and consistency. The same dimensions were investigated by Torchiano et al. (2017).
Quality assessment through grouped dimensions
Some of the authors grouped or clustered dimensions with similar fundamental idea and characteristics. Various research papers recognized intrinsic and extrinsic data quality dimensions (Mocnik, 2018; Sheppard & Terveen, 2011). Others divided dimensions into intrinsic data quality and contextual data quality dimensions (Martin et al., 2017). A few papers identified inherent and system dependent groups of dimensions (Torchiano et al., 2017). “Datasets can be analysed and classified through different dimensions, which can be contextual, trustworthy, intrinsic, among others” (Caracas et al., 2018).
Quality assessment with dimensions as part of integrated catalogues, models, or frameworks
Kučera et al. (2013) suggest a catalogue record measuring quality from two different perspectives, with each perspective consisting of several levels and dimensions. Batini and Scannapieco (2016) grouped dimensions into clusters and examined interrelationships between dimensions. Stróżyna et al. (2018) designed a framework for the quality-based selection of OD based on several dimensions and tested it on a case from the maritime domain. Rani et al. (2018) analyzed and compared several existing frameworks on the quality of linked open data (LOD), each based on various dimensions. Utamachant and Anutariya (2018) performed an analysis of high-value datasets based on the International Organization for Standardization (ISO) formula. Wang et al. (2013) described an analysis of local open government data portals based on a framework consisting of various parts, OD quality being one of them. Sayogo et al. (2014) developed a framework for benchmarking open government data efforts. Zheng and Gao (2016) developed a framework for assessing China’s Open Government Data Platforms. It should be pointed out that the majority of research papers in this category are from the LOD or open government data platforms fields.
There are various exceptions to these three categories. For example, Marković and Gostojić (2018) identified variables for a comparative analysis of OD. Those variables could be synonyms for metrics; however, the authors did not identify dimension(s) for those metrics. In a similar vein are the suggestions of Nikiforova and Bicevskis (2018), who emphasize measuring null values as a basis for data quality assessment.
Measurement of (Meta)Data Quality (Sub)Dimensions
The second and third research questions are related to the measurement of (meta)data quality (sub)dimensions within the OD studies. Hence, to gain insight into how to measure OD through (meta)data quality dimensions, the relevant papers were examined. Metrics of (sub)dimensions are grouped according to how they are defined in the reviewed papers. Four groups of dimension measurements, or rather metrics of dimensions, are identified: definitions of dimensions, formulas for dimensions, metadata-based descriptions, and nothing mentioned. The occurrence percentage of each group is presented in Figure 5.

Figure 5. Percentage of dimensions’ measurements per group.
The second research question revealed that most of the identified quality metrics are descriptive, that is, qualitative. Because the metadata quality metrics in OD research are mostly expressed as definitions, there is still room for their improvement, especially through a more quantitative approach.
Certain methods for identification and development of (sub)dimensions and metrics are proposed in the literature. The literature review method is applied within all relevant papers for the identification of quality (sub)dimensions as well as for quality metrics. Other research methods used for identification and/or development of dimensions and metrics are listed in Table 5.
Table 5. Research Methods Used for Identification and Measurement of Dimensions.
Note. AHP = analytical hierarchy process; GMM = geometric mean method.
In addition to those methods, the use of other methods for forming metrics is essential for increasing the quality of the assessment. This is also confirmed by RQ5.
The Focus Points of OD Assessment
RQ4 is devised to determine what is evaluated by (meta)data quality dimensions in OD studies. The identified dimensions are not only used for quality assessment of OD in OD studies but for the evaluation of OD in general (e.g., assessing openness, assessing compliance with policies, evaluating transparency, etc.). Therefore, each relevant study is arranged into one of the following directions: (a) dataset direction, (b) portal direction, (c) portal direction including the dataset direction, and (d) other directions.
Content analysis reveals that most of the studies about OD assessment are directed to the assessment of OD portals and datasets published on them (see Figure 6).

Number of papers per focus point of OD assessment.
This indicates that (meta)data dimensions, as well as their measurement, should not be confined to the portal level or the dataset level alone, but should cover both. As previously stated, the OD initiative cannot be imagined without technology, and the opportunities technology provides should certainly be used to evaluate OD. Assessment at the portal level and the dataset level can be unified through metadata. Only one paper belongs to the other-directions group, as it is directed to data quality evaluation in citizen science projects.
Determination of Used Research Approach
The research approach is defined as a plan for conducting the research (Creswell, 2014). The three approaches to research presented by Creswell (2014) were used as possible responses to RQ5. Thus, each analyzed relevant paper could be identified as qualitative, quantitative, or mixed-methods research.
Only the qualitative and mixed-methods research approaches are used in the relevant papers. A mixed-methods research approach is identified in 46 studies, while a qualitative approach is used in 40 studies. No purely quantitative research approach is found in any of the relevant papers.
Combining quantitative and qualitative methods is becoming increasingly common in the social sciences (Creswell, 2014; Sekol & Maurović, 2017). Therefore, the results of the analysis confirm the importance of using both quantitative and qualitative methods in the evaluation of OD.
Recommendations for Future Research
Qualitative analysis disclosed heterogeneity among studies in various elements. The heterogeneity of terminology is described by RQ1, wherein different terms are used both for groups of elements (e.g., dimension, sub-dimension, description) and for elements within a group (e.g., existence as a synonym for completeness). (Sub)dimensions, as well as the metrics of a given (sub)dimension, vary among studies. The discrepancy is also caused by the fact that researchers evaluate only one portal of a country, multiple portals of the same country, or portals of only a few countries.
Despite the noticed heterogeneity, relevant papers are categorized regarding recommendations for future research. Some of them are placed in more than one category (see Table 6).
Number of Supporting Papers for Each Category.
Note. OD = open data.
Bibliometric analysis was performed with the aim of identifying the research topics and research trends of OD assessment. To do so, research papers were divided into two periods: period 1 includes papers published from 2011 to 2014, and period 2 includes papers published from 2015 to 2019. The results of the co-word analysis are presented using two visualization techniques: strategic diagrams and overlapping maps. The overlapping-items graph across the two consecutive periods is shown in Figure 7. The overlap can be measured through the stability index (Cobo et al., 2012).

Stability between periods.
Stability analysis generated the following results: 37 items are shared by both periods, 7 items are new in period 2, and 4 items are present in period 1 but not in period 2. The stability index between the two periods is 0.76, indicating a high degree of overlap between consecutive periods. The number of common elements between evolution areas is used along with the stability index to further explain how themes evolved across the two observed periods.
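Given the item counts above, the stability index can be sketched as a Jaccard-style overlap between the two periods' keyword sets. This is one common operationalization in co-word analysis; the exact formula used by Cobo et al. may differ slightly, so the value below is not claimed to reproduce the reported 0.76. The item sets are synthetic placeholders matching the reported counts.

```python
def stability_index(items_p1: set, items_p2: set) -> float:
    """Jaccard-style overlap between the keyword sets of two periods.

    One common operationalization of the stability index in co-word
    analysis (an assumption here, not necessarily the reviewed studies'
    exact formula): shared items divided by all distinct items.
    """
    shared = len(items_p1 & items_p2)
    return shared / len(items_p1 | items_p2)

# Synthetic sets matching the reported counts: 37 shared items,
# 4 only in period 1, 7 only in period 2.
p1 = set(range(37)) | {"a", "b", "c", "d"}                  # 41 items
p2 = set(range(37)) | {"e", "f", "g", "h", "i", "j", "k"}   # 44 items
index = stability_index(p1, p2)  # 37 / 48
```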
Keywords are grouped into themes by a clustering algorithm; each theme serves as the basis for a thematic network and is labeled with its most central (i.e., most significant) keyword. Strategic diagrams are then developed from the thematic networks: measures of centrality and density, with their ranges, are placed on Cartesian axes, defining four regions (Callon et al., 1991; Cobo et al., 2011, 2012). Because strategic diagrams can be enhanced by adding a third dimension, the number of citations received by the documents associated with each theme is added as the third dimension in this research (see Figure 8).

The strategic diagram based on citations. (A) The strategic diagram for period 1. (B) The strategic diagram for period 2.
Keywords in the first quadrant have strong centrality and high density; they are the motor themes of the research field. The second quadrant consists of basic and transversal themes that are important for the research field but not well developed. Themes in the third quadrant are weakly developed and marginal; hence, either emerging or declining themes may occur there. The fourth quadrant includes highly developed but isolated themes that are of only marginal importance.
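The two axes of the strategic diagram can be computed from a weighted co-word network as sketched below. The scaling constants follow the Callon et al. (1991) convention, but the mini-network and its keywords are hypothetical illustrations, not data from this review.

```python
from itertools import combinations

def theme_measures(theme: set, cooccurrence: dict):
    """Callon centrality and density of a theme (cluster of keywords).

    cooccurrence maps frozenset({kw1, kw2}) -> co-word link weight.
    Density sums the links internal to the theme; centrality sums links
    between the theme and outside keywords. The x10 / x100 scaling
    follows the Callon et al. (1991) convention.
    """
    internal = sum(cooccurrence.get(frozenset(pair), 0)
                   for pair in combinations(theme, 2))
    external = sum(w for pair, w in cooccurrence.items()
                   if len(pair & theme) == 1)  # exactly one endpoint inside
    centrality = 10 * external
    density = 100 * internal / len(theme)
    return centrality, density

# Hypothetical mini-network: one internal link within the theme and one
# link from the theme out to the keyword "e-government".
net = {frozenset({"open data", "data quality"}): 0.8,
       frozenset({"data quality", "e-government"}): 0.3}
c, d = theme_measures({"open data", "data quality"}, net)
```

Plotting each theme at (centrality, density), with the median of each measure splitting the plane, yields the four quadrants described above.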
The detected clusters of the first period in the three-dimensional space are motor themes (quadrant 1): data, data portal, participation, national government, and organization; basic and transversal themes (quadrant 2): OD, open government data, research, and research method; emerging or declining themes (quadrant 3): data quality, measurement, and assessment; and highly developed and isolated themes (quadrant 4): e-government, open government, data source, and public organization (see Figure 8A). Fundamental themes were thus grouped among the emerging themes. The detected clusters of the second period are motor themes (quadrant 1): organization, public organization, and data model; basic and transversal themes (quadrant 2): open government data portal, research, and research method; emerging or declining themes (quadrant 3): OD, data profile, and challenges; and highly developed and isolated themes (quadrant 4): open government, e-government, metadata, potential benefits, and decision-making (see Figure 8B).
Comparison between the periods leads to the following inferences. Organization remained a motor theme across both periods and, in the second period, was joined by the hitherto marginal theme public organization. The open government data portal became a basic theme, replacing open government data; such results indicate the maturity of the scientific field and its focus on the domain. The open data theme evolved from the second to the third quadrant, that is, from the basic themes to the emerging or declining ones. The national government theme moved from the first to the second quadrant, from the motor themes to the basic ones.
Discussion
OD assessment, especially assessment through (meta)data quality dimensions, is the focus of this article. Qualitative analysis of 86 relevant research papers identified patterns that could be used to develop complete OD assessment through three levels: dimensions, sub-dimensions, and metrics.
The main findings of the article are shown as answers to the defined research questions:
There are various measurements proposed in the literature. The issue is that most of them are only described theoretically and are not applied in the evaluation of OD. These measurements are categorized into one of the following categories: definitions, formulas, metadata, and without formulas or definitions. This means that dimensions and sub-dimensions are measured by (a) definitions and the application of those definitions (in most cases), (b) formulas for dimensions, and (c) metadata-based descriptions. The metadata-based approach is the most promising because it ensures that a dataset is treated as part of a portal and because such an approach supports generalization.
The literature review method is used to identify the quality dimensions in all relevant studies. In certain papers, other methods were applied alongside the literature review for the identification or development of (sub)dimensions and metrics. The issue is that the research methods used for identification and development are mainly qualitative, especially those for metrics.
Most of the reviewed studies focus on the evaluation of OD portals, including the assessment of datasets. Therefore, to continue improving the OD paradigm, neither direction should be neglected: doing so may affect all users of such data and negatively influence social and economic development.
Both the qualitative and the mixed-methods research approaches are used in the relevant papers: the qualitative approach is applied in a total of 40 studies, while a mixed-methods approach is applied in 46.
Guidelines for future research in the field of OD are successfully identified, regardless of the diversity of the analyzed studies. Numerous reviewed studies highlight the lack of compliance of OD, whether with government programs, different principles, or standards. Future research should also include the development of new metrics based on metadata. More participation and collaboration elements should be incorporated into the basic technology features of OD platforms, and new metrics for these elements should be proposed and then used for the evaluation of OD. Special emphasis should also be put on value for different stakeholders. The bibliometric analysis grouped OD, data quality, assessment, measurement, and challenges as emerging or declining themes; hence, these themes need to be further studied and developed so that they do not disappear.
Besides those already mentioned, the most compelling findings of the research are highlighted as follows. (a) The term dimension is the fundamental segment of (meta)data quality assessment, and it incorporates the other segments important for evaluating (meta)data quality. The authors therefore identified different levels of (meta)data quality assessment in OD studies and proposed an OD assessment that consists of three levels: dimensions, sub-dimensions, and metrics. (b) There are many synonyms for the terms at each of those three levels, but no agreed-upon terms exist in the literature; thus, the authors conducted an analysis and chose the most used terms that are, at the same time, relevant to the topic, that is, data quality. (c) The authors proposed a categorization of OD quality assessment concepts according to the concepts most applied in the literature. The evaluation of OD quality is therefore conceptualized through standalone dimensions, through grouped dimensions, and through dimensions as part of integrated catalogues, models, or frameworks. (d) The authors proposed a categorization of the measurements of the identified (meta)data quality dimensions. (e) As almost half of the analyzed studies are qualitative, it is not surprising that definitions have proven to be the most common form of measurement. The prevalence of the qualitative approach is also reflected in the methods used, with the literature review proving to be the most widely applied method. Other methods for forming quality metrics are necessary so that OD can be evaluated in a semi-automated or even automated way. (f) Given that the entire OD initiative is about the public disclosure of data so that they can be used by individuals and by private or public organizations, assessing the quality of portals/websites and data together is essential. Research may lean toward one direction, but not in such a way as to exclude the others.
(g) Unexpectedly, only mixed-methods and qualitative research approaches were applied in the analyzed studies; no quantitative research approach was identified in any relevant paper. Hence, the results of the determination of the used research approach show that there are not enough papers that use quantitative methods for evaluation. (h) By listing the disadvantages noticed in existing studies, a baseline for future research is provided. The identified recommendations should be taken into account in further research to improve the assessment of OD. The results of this study comply with the latest extensive studies of the Governance Laboratory (The GovLab) and the European Data Portal, both of which highlight the importance of cross-sectoral participants in collaboration and, thus, in data collaboration, which ultimately creates opportunities for value creation and various impacts based on OD usage (Huyer & Van Knippenberg, 2020; Verhulst et al., 2020).
The article has several limitations. Because the formed search query is focused on the quality, that is, the assessment of OD and not on the assessment of data in general, various papers regarding data quality were discarded. Diverse variations of the search term that did not include open were tried, but all resulted in an enormous number of returned papers. Furthermore, the authors did not perform coding for all entries (possible answers to the research questions).
Due to length constraints, it is not possible to include all papers identified by the database searches within this article. A list of all extracted papers can be obtained by sending a request to the authors by email.
Conclusion
OD is an emerging field of research. The initial literature review revealed that there is room for improvement of OD as well as of the current assessment of OD within existing research. This improvement can be directed to different aspects of the OD paradigm, for instance, generally recognized dataset attributes, requirements for publishing open datasets, compliance with various policies, required functionalities of OD infrastructure, quality assessment of datasets, openness, transparency, participation or collaboration, and assessment of social, economic, political, and human value in OD initiatives. The use of OD can be beneficial for different stakeholders because OD, by definition, are free, publicly available, nonexclusive (no restrictions from copyrights, patents, etc.), openly licensed, structured for usability, and so on. Those benefits can take the form of new job potential, economic growth, new products and services, improved existing products or services, increased citizen engagement, and support for decision-making. Hence, the OD paradigm is an example of how IT can contribute to social, economic, and human development.
In conclusion, the main contribution of this review is an exhaustive overview of the OD assessment field with an emphasis on the identification of present metadata quality dimensions, metrics, and methods for the identification and development of dimensions as well as metrics. The detailed overview revealed (a) the most used (meta)data quality (sub)dimensions within OD along with their definitions, (b) groups of (sub)dimensions measurement as (sub)dimensions of (meta)data quality can be measured differently, (c) research methods used for identification and development of (meta)data quality (sub)dimensions and metrics, (d) the most explored and the most promising directions of OD assessment, (e) the most applied research approach in the research of OD evaluation, and (f) guidelines for further research in the OD assessment domain.
Furthermore, this review paper contributes to the development of a framework in the field of OD assessment. Assessment of OD from the metadata aspect, as well as the improvement of OD metadata, is crucial for the further growth of the OD movement. Future research should be directed toward the development of a standardized, robust OD assessment framework.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
