Abstract
Since countries began releasing open data, different methodologies for publishing linked data have been proposed. However, early studies exploring linked data seem not to have adopted them, for different reasons. In this work, we conducted a systematic mapping of the literature to synthesize the different approaches around the following topics: common steps, associated tools and practices, quality assessment validations, and evaluation of the methodology. The findings show a core set of activities, based on the linked data principles, but with additional critical steps for practical use at scale. Furthermore, although a fair number of quality issues are reported in the literature, very few of these methodologies embed validation steps in their process. We describe an integrated overview of the different activities and how they can be executed with appropriate tools. We also present research challenges that need to be addressed in future work in this area.
Introduction
Open government data (OGD) has proliferated in the last decade in most countries, with an increase in the number of datasets available on the Web. It aims to transform democracy by leveraging the value of data for society through the principles of openness, participation, and collaboration [96]. Thus, open data serves as a mechanism to promote citizen engagement with governments [95]. However, these efforts still have limitations. According to a report from the World Wide Web Foundation [97], only 7% of the data is fully open, only half of the datasets are machine-readable, and only one-fourth have an open license.
With this increase in the amount of data available to the public, linking and combining datasets have become important research topics [46,63]. Although many data consumers can achieve their goals using only one dataset, more value can be obtained by exploring different, related data sources [12,14].
The pioneering initiatives in the U.S. and U.K. to produce linked government data have shown that creating high-quality linked data from raw data files requires considerable investment in reverse-engineering, documenting data elements, data clean-up, schema mapping, and instance matching [57,81]. In addition, a bulk of data files were converted with triplification tools and minimal human effort [56], without much curation, therefore limiting the practical value of the resulting RDF.
Alternatively, curated, high-quality datasets are limited to restricted subjects (e.g., life sciences, such as in
The production of linked data has been increasing since its conception, as can be seen from the number of datasets available in the Linked Open Data (LOD) Cloud [40], and compiled in Fig. 1. Government data has many vital applications [15,69,82,89,94], and it is one of the most popular categories of the LOD cloud, with almost 200 linked datasets to date. According to the Open Knowledge Foundation’s Open Data Index (

Number of datasets in the LOD cloud, since 2007 (numbers taken from
Even though Semantic Web technologies based on this idea have flourished, only a tiny portion of the information on the World Wide Web is presented in a machine-readable way (CSV, XLS, and XML files, in most cases). Notably, in open government data, this number is still low. For example, in [61], the authors elicited open datasets at the federal, state, and municipal levels in Brazil and encountered no files with linked data, with just one case in which RDF datasets were found. A similar picture emerges in Colombia [76], Italy [14], and Greece [3], with 5%, 5%, and 2% of the datasets at the 4th or 5th level, respectively. Although RDF is not the only serialization format for linked data, it is acknowledged as the most popular one and can be used here as a proxy for the use of linked open government data.
As will be outlined in the next section, some methodologies for the publication of linked open government data have been proposed, but adopters claim that they are too generic for their purposes, lacking guidelines for software tools, templates, techniques, or other artifacts that could help in the adoption of this technology [14,45,55]. Although there are many guidelines for publishing linked data on the Web, many producers do not have sufficient knowledge of these practices. Few studies detail the whole process, leaving out the methods, tools, and procedures used [23], and instead propose ad-hoc methods to produce linked open data, usually based only on the four principles, with different interpretations of how to implement them. In [86] it is indicated, based on interaction with practitioners, that the literature on publishing Linked Open Government Data (LOGD) has dealt with less complex, non-operational datasets and needs an engineering point of view, the identification of practical challenges, and consideration of organizational limitations. In [14] the authors raise similar issues, such as the quality of links to external datasets, the lack of domain-specific ontologies and their proper alignment when they do exist, and the expertise in SPARQL queries required when consuming linked data.
Besides, several problems have been reported regarding the quality of the linked data published on the Web. For instance, Hogan et al. [38] identified three recurrent problems by surveying LOD papers from the Semantic Web Journal, including the existence of inadequate
In this work, we systematically map the literature regarding the processes and methodologies developed to publish linked open government data on the Web, targeting data publishers who seek to publish LOGD systematically and correctly. We set out research questions aiming to discover the activities, tools, and quality control checks employed in LOGD publication and how they were evaluated. Finally, we integrate the findings into a unified model and discuss key challenges that remain to be explored.
Open government
Since the late 2000s, governments around the world have moved towards publishing increasing volumes of government data on the Web, perhaps most notably after the launch of national data portals in the United States (www.data.gov) and the United Kingdom (www.data.gov.uk). This opening has been happening according to the Open Data philosophy.3 Open data refers to data that “can be freely used, reused and redistributed by anyone”. Definition available at:
OGD provision presents some limitations that hamper data reuse. The organizational limitations originate mainly from the fact that, in public administration, each agency manages data according to its own norms, since there is no central entity assigned this role. Also, public agencies are organized into hierarchical structures with several administrative levels. This organizational structure of the public sector means that, in certain cases, public agencies at different administration levels and in different functional areas produce, maintain, and possibly disseminate similar data, i.e., data about the same real-world object (e.g., a specific school) or the same real-world class (e.g., schools) [47]. The decentralization of open data publishing leads to data heterogeneity, which makes the data hard to link, integrate, and analyze, even when the domain and technical expertise constraints are satisfied.
Many studies [37,64,101] illustrate that the use of OGD is often hampered by the multitude of different data formats and the lack of machine-readable data, imposing restrictions on their consumption by end-users in terms of discoverability, usability, understandability, access, and quality, among other aspects. Besides, even when the formats available are the same, the information may be structured differently – with different labels or different granularities. Although publishing government information as open data is a necessary step to realize the mentioned benefits, it is not sufficient. In practice, gaining access to raw data, placing it into a meaningful context, and extracting valuable information is extremely difficult [48]. One way to increase the reuse of open government data is to link it to other data, so that relationships with other data can be explored [12].
In summary, Linked Data is about using the Web to create typed links between data from different sources – with diverse combinations of organizations, data formats, and exchange standards [13]. It refers to data published on the Web in such a way that it is machine-readable, its meaning is explicitly defined, and it is linked from/to other external data sets. Berners-Lee [12] outlined a set of design principles for publishing and connecting data on the Web, so that it becomes part of a single global data space, establishing the principles of linked data:
Use URIs as names for things;
Use HTTP URIs so that people can look up those names;
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL);
Include links to other URIs, so that they can discover more things.
These were the initial principles for publishing linked data on the Web. Berners-Lee [12] extended them to include the concept of openness, defining the 5-star scheme for linked open data, with particular interest in government data, but arguing that it could also be used for other types of sources:
Available on the Web (whatever format) but with an open license, to be Open Data;
Available as machine-readable structured data (e.g. Excel instead of image scan of a table);
As (2) plus non-proprietary format (e.g. CSV instead of Excel);
All the above, plus use open standards from W3C (RDF and SPARQL) to identify things so that people can point at your stuff;
All the above, plus: link your data to other people’s data to provide context.
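While the surveyed papers do not prescribe code, the principles above can be illustrated with a minimal sketch using Python and the rdflib library; the example.org namespace, the school resource, and the DBpedia link target are all hypothetical:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical HTTP namespace for a government publisher (principles 1 and 2).
EX = Namespace("http://data.example.org/id/")
SCHOOL = URIRef(EX["school/31001"])

g = Graph()
g.bind("ex", EX)

# Useful, standards-based information when the URI is looked up (principle 3).
g.add((SCHOOL, RDF.type, EX.School))
g.add((SCHOOL, RDFS.label, Literal("Escola Municipal Prof. João Silva", lang="pt")))

# A link to another dataset, providing context (principle 4 / the 5th star).
g.add((SCHOOL, OWL.sameAs, URIRef("http://dbpedia.org/resource/Example_School")))

print(g.serialize(format="turtle"))
```

Serialized as Turtle, such a snippet already uses open W3C standards (the 4th star), and the owl:sameAs triple supplies the link to other people's data that earns the 5th.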
Related work
The production and publication of linked data are intensive engineering processes that demand significant effort to achieve high quality, and existing general guidelines may not be sufficient to make the processes repeatable [75]. Since the conception of linked data, several principles and processes have been proposed, with varying degrees of sophistication, practices, and tools.
The following studies presented some form of synthesis of previous methodologies. In [7] the authors presented a systematic review of OGD initiatives (not linked data in particular) and a lifecycle deduced from the related papers, along with related challenges at different levels (organizational, economic and financial, policy, legal, and cultural). In [87] the authors compiled the steps from 8 different linked open data methodologies but did not specify the criteria used to select the primary studies; the proposed framework is also at a high level of abstraction. The LOD2 Project [1] also developed a lifecycle for linked data and provided software tools for its steps, although leaving out important steps – such as data modeling, alignment, and the publication of the data on the Web.
This study complements other systematic mappings or reviews, such as that of [26], which surveyed the adoption of best practices for publishing linked data, discussing which of the W3C best practices [91] are explicitly more present in the literature, and the systematic review of software tools for linked data publishing conducted by [8], which points out that most of the current state-of-the-art tools concentrate on only a few steps of the publishing process, leaving important steps out. These systematic mappings did not provide information on the tasks involved in the process of linked data production. Moreover, in [23] the authors performed a systematic mapping of publishing and consuming data on the Web – a more generic approach than the one in the present study. One of their findings was that most of the papers surveyed did not mention publishing methodologies (28 out of 46), and most of the ones that did (12 of the remaining 18) just used the basic linked data principles as a guideline for the process. Other systematic mappings/reviews were carried out in different domains, such as enterprise linked data [73] and education [43,44], and for applications such as linked data mashups [85], recommender systems [27], and quality assessment [99].
To the best of our knowledge, there is no systematic mapping of linked open government data methodologies in the literature. Government data reflects the structural organization of the different bodies of the public administration. Even though they share the same top governance, which provides the general guidelines, each public body usually has the autonomy to collect, process, and publish under its own norms. Thus, in this work, we carried out a systematic mapping of methodologies proposed in the literature, to provide a synthetic comparison of the steps, tools, and validations proposed by these methodologies and how they were evaluated. Additionally, we propose a generic model integrating these findings and embedding some contemporary practices, such as those in the W3C’s
Methodology
In this paper, we use the systematic mapping method [72], aimed at identifying research related to a specific topic to answer a broad, essentially exploratory question (e.g.
Research questions
The research questions defined in this work aim to gather information about how to effectively publish linked open data in government settings, both in terms of the steps involved and the tools developed to accomplish them. We argue that this is an important contribution for the scientific community and practitioners alike, describing what has been done and the gaps that should be addressed to systematically publish LOGD. Data quality plays a crucial role in the reuse of government data [9,66,101], so we sought to investigate which tasks were systematically embedded along the process to assure the quality of the published data. Data quality spans many different dimensions [99]; in this work, we sought to find any kind of quality procedure, in particular verification and validation steps, embedded in the methodologies to assure data quality. Finally, we examined how the proposed methodologies were assessed, to understand the rigor applied in their evaluation. This is important to understand the limitations of the proposals, given the constraints under which they were evaluated. Thus, we defined the following research questions:
The answers to these questions provide a big picture of the relevant literature, with the steps needed to suggest a clear methodological framework for the publication of LOGD.
Search strategy
The following databases were used for this systematic mapping, since they are the most significant repositories for subjects involving Computer Science: ACM Digital Library, IEEE Xplore, Science Direct, Springer Link, ISI Web of Knowledge, and Scopus. Google Scholar was also included, since many studies are not indexed by these repositories, but only a fraction of its results was used, as discussed in the threats to validity subsection.
The keywords used to cover the research questions were
Terms used for the search
The selection of studies should capture the primary works identifying the different types of methods used to publish linked open government data. To that end, we elaborated the following criteria:
Inclusion criteria
The study provides a process for publishing linked data in government settings as the main contribution;
The study is from a peer-reviewed source;
The language of the study is English;
The text of the study is available.
Exclusion criteria
The study does not present a process for publishing LOGD;
The study is a previous version from another in the list;
The study focuses on the application of LD in a specific domain;
The study only investigates one step of the process;
The study does not investigate linked data, but open data more generally.
The procedure for selecting the primary studies for this mapping was carried out in late March 2020, with institutional access through the university’s subscriptions to these databases.4 University of São Paulo Integrated Search:

Procedure to select the final studies.
Systematic mappings may present multiple threats to validity [100]. We composed the search string from three aspects: process, publishing, and linked open government data. The use of synonyms was based on textual analysis. These terms, particularly for linked open government data, were difficult to specify, because they appear in different word orders and are sometimes not used together. We acknowledge that some synonyms may be missing, which may cause some studies to be left out. To control the quality of the results, we used the studies described in the W3C Linked Data Best Practices [91] as a control to tune the query string, i.e., the papers cited in the recommendation also had to be returned by the search string. We also restricted ourselves to the execution of the query in the data repositories, not applying manual searches on other platforms. Some papers were not available; for those, we searched the Web for a copy and contacted the first author to try to obtain one, which was not always possible. For Google Scholar, only the first one hundred results were considered, since it returned thousands of links and, beyond that point, no further studies were selected. Finally, the synthesis of the papers was based on the information provided in the papers’ full text; no implicit information was assumed.
Results
The final selection resulted in 25 primary papers, with dates ranging from 2011 to 2020, which were used to extract information regarding the research questions. Table 2 presents the selected papers. We notice that important studies appeared at the beginning of the last decade and that the topic has regained traction in the last few years. The reason for the creation of these methodologies in the 2011–13 period is arguably the deployment of governmental open data portals, such as those in the USA (2009) [35] and the UK (2010) [80], which released hundreds of datasets in their first years, glimpsing the opportunity for a “Web of data” [12]. However, none of the papers since 2016 cited a different reason for the absence of large-scale production of LOGD. One possible reason is the realization that publishing linked open data encompasses more than technological steps. In the last few years, the concept of a
Final set of primary papers selected
This research question aimed to map the commonalities and differences among the different methodologies that have been proposed for publishing linked open government data. A first challenge was to find the correct granularity for this analysis. Most of the studies divided the publication into phases and, in turn, into more atomic steps with clearer outputs. To analyze these data, we mapped out all the activities that were explicitly described as an important step in the papers, creating a matrix of steps

Mapping of tasks and studies for the selected methodologies. The last column accounts for the total number of appearances.
Figure 3 lists all the explicitly identified tasks, roughly in the order described in the papers.
The first step, sometimes implicit, concerns the
Next, some studies consider
One of the pillars of linked data is the unique and persistent identification of data resources. As such, the careful
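Although the surveyed methodologies differ on the exact scheme, a common convention is a stable HTTP pattern such as {base}/id/{type}/{key} that keeps implementation details out of the identifier. A minimal sketch of such a minting helper, with a hypothetical base domain:

```python
from urllib.parse import quote

BASE = "https://data.example.gov"  # hypothetical, stable base domain


def mint_uri(resource_type: str, key: str) -> str:
    """Mint a persistent, dereferenceable HTTP URI for a resource.

    Keeps implementation details (file extensions, technology names,
    query strings) out of the identifier, so it can outlive the stack
    that serves it.
    """
    return f"{BASE}/id/{quote(resource_type)}/{quote(str(key))}"


print(mint_uri("school", "31001"))  # https://data.example.gov/id/school/31001
```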
As different data sources may expose the same information in different representations, there is a need for a consensus on how to represent this data. A step is the
In most of the studies, the terms vocabularies, taxonomies, and ontologies are used interchangeably; in this work, we use the same approach.
The
Next, the careful
The great advantage of linked data is to
Some studies use this linking to perform the
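As one illustration of how such interlinking can be performed (not a procedure prescribed by the surveyed papers), the sketch below queries DBpedia's public SPARQL endpoint for a resource with a matching label and records an owl:sameAs link; the label, the local URI, and the naive acceptance of the first match are all illustrative:

```python
from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?resource WHERE {
        ?resource rdfs:label "São Paulo"@pt .
    } LIMIT 1
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()

g = Graph()
local = URIRef("http://data.example.org/id/city/3550308")  # hypothetical local URI
for row in results["results"]["bindings"]:
    # Naive strategy: accepts the first label match; real matchers
    # (e.g., dedicated link-discovery tools) score and filter candidates.
    g.add((local, OWL.sameAs, URIRef(row["resource"]["value"])))

print(g.serialize(format="turtle"))
```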
Having the original data, metadata, and mappings to vocabularies, the step of
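Whichever tool performs it, the essence of this conversion can be shown in a minimal sketch that triplifies a hypothetical schools.csv (columns code and name) with rdflib; real pipelines would typically rely on declarative mappings or the tools discussed later:

```python
import csv

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://data.example.org/id/")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

with open("schools.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        school = EX[f"school/{row['code']}"]  # URI minted from a stable key
        g.add((school, RDF.type, EX.School))
        g.add((school, RDFS.label, Literal(row["name"], lang="pt")))

g.serialize(destination="schools.ttl", format="turtle")
```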
A final task in data conversion is the
Since datasets and their distributions change over time, a
An important step for opening data is to
The point in opening government data is to make it available in
With these linked data and metadata resources, one needs to
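One common way to support this step, offered here as an illustrative sketch rather than a recommendation from the primary studies, is to describe the published resources with the W3C DCAT vocabulary so that catalogs and crawlers can discover them; all URIs, titles, and the license below are placeholders:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("http://data.example.org/dataset/schools")       # hypothetical
distribution = URIRef("http://data.example.org/dataset/schools.ttl")

# Dataset-level metadata: what it is, under which license it is released.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Public schools", lang="en")))
g.add((dataset, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((dataset, DCAT.distribution, distribution))

# Distribution-level metadata: the concrete downloadable serialization.
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.mediaType,
       URIRef("https://www.iana.org/assignments/media-types/text/turtle")))

print(g.serialize(format="turtle"))
```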
The publication step can be leveraged by publicizing it and
Some studies point to the importance of
After the publication of the linked data, the government must receive feedback from the data consumers,
With all set, it is important to have a plan to keep all this working over time. To that end, the studies specify tasks to
As the publication of government data should be planned for decades, other
Observing the last column (total occurrences), we can see that some steps are much more present than others. In particular, the most basic steps are:
Although the prescription of tools is not mandatory in a methodology, it surely offers a good starting point for practitioners when making decisions like
Figure 4 shows the mapping of tools used in the selected studies. When a particular task (on the left) has an empty row (on the right), it means that no tool or concrete guideline was specified in any of the studies.

Artifacts used or suggested by the studies, according to the steps previously identified.11
Given the number of tools, the list of references can be found in the full report, available at:
Although a crucial step for any data project, where it can take up to 60% of total time,12
For the
For the
The
The tools for
The biggest diversity in tools was found for the
Some tools provide the feature of
For the
Once the data is linked, enriched, and converted to RDF, some works applied tools to evaluate the correctness of the RDF generated and enable the
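As a hedged illustration of this kind of check, the sketch below validates the generated RDF against a SHACL shape using the pySHACL library; the shape, which requires every School to carry a label, is hypothetical:

```python
from pyshacl import validate
from rdflib import Graph

# Illustrative shape: every ex:School must have at least one rdfs:label.
shapes = Graph().parse(data="""
    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://data.example.org/id/> .

    ex:SchoolShape a sh:NodeShape ;
        sh:targetClass ex:School ;
        sh:property [ sh:path rdfs:label ; sh:minCount 1 ] .
""", format="turtle")

data = Graph().parse("schools.ttl", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(report if not conforms else "Data conforms to the shapes.")
```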
Regarding the
For the
Concerning the
The announcement of the datasets published on the Web is made by tools and practices that
To
For the
As for the
The
The tools can be categorized by their generalizability across tasks: most are specific to just one step of the process, while others cover most of the steps of the whole process. In the latter category, we find examples such as OpenRefine and the D2RQ stack.
Based on the coverage of the steps, both tools (OpenRefine and D2RQ) can be considered the most comprehensive ones. In fact, some works use them as the main instruments for the publication of linked open government data (e.g., for OpenRefine: W6, W17, W15; for D2RQ: W9, W16). No formal comparison between the two tools was found in the literature, apart from the publishing patterns proposed by [34], based on the underlying data type and storage. However, some empirical works seem to suggest that OpenRefine is very user-friendly thanks to its human-computer interface, but that it does not scale well for large datasets.17
A drawback of this list is the discontinuity of the tools. A major part of the tools elicited in Fig. 4 can no longer be found. Most of them were developed by universities as part of research projects, and, as the projects ended, so did the evolution of the tools’ features.
Some important gaps were found, such as the lack of tools for the proper management of metadata, an efficient mechanism to version the datasets coupled to the other tools, features for engagement with the community, and tools and guidelines for the definition of non-functional requirements.
In this work, we consider the methodologies for publishing linked open government data as artifacts18 Here we adopt the notion of
Evaluation methods adopted in the selected studies
As illustrated in Table 3, 52% of the studies (13 out of 25) did not provide an empirical evaluation in the paper, restricting themselves to a list of steps and recommendations, mostly justified by the basic principles of linked data and the 5-star scheme. The remaining papers (12 out of 25) provided illustrative scenarios of the application of the methodology. The actual validations varied, ranging from the visualization of weather statistics [
In software engineering,
Few studies proposed an explicit phase or mechanisms to perform validations throughout the lifecycle of linked data production.
Other studies mention the importance of validations during the process; however, they only offered suggestions and did not contemplate dedicated tasks.
Based on the steps extracted from the papers, we built the process model depicted in Fig. 5, with all the steps grouped by the most common phases present in the studies. In addition, there is a validation step at the end of each phase to ensure that the outputs are correct and valid. Thus, the model can be used as a roadmap for LOGD initiatives and resource estimation, where managers may decide what level of formalism should be implemented according to their context.
As publishing linked data is a complex process, we argue that these are essential aspects that must be taken into account when publishing open government data as linked data on the Web. In this work, we adopt the W3C’s Data on the Web Best Practices [92] as a complementary framework, since it suggests multiple practices aimed at facilitating the interchange of data using Web standards and is focused on data publishers. Moreover, it enables data consumption both by humans and machines – a desirable point for LOD initiatives. We also adopt verification and validation principles at the end of each phase to ensure that the data is produced to high-quality standards.

Unified process model proposed in this paper. The sequence of tasks flows from left to right, top to bottom. In orange, the mandatory tasks for having linked data; in blue, the optional tasks. Adapted from [70].
The first phase, named
The second phase, the
In the third phase, the
The fourth phase, the
In the fifth phase, the
The final phase, the
This process may also be seen as a lifecycle, since the tasks from the exploitation and maintenance phases can lead to refinements of the specification or the collection of new data, making it more usable for the community in further iterations. It is an integrative model, and we acknowledge its lack of formal validation; however, we argue for the utility of this model as a reference for practitioners.
Although the open government data movement is still producing large amounts of data worldwide, linked data still represents a tiny portion of it. This work sought to map methodologies developed to publish linked open government data on the Web and to propose a unified model covering the steps with established and modern practices. As its main contribution, our model raises awareness of the multiple aspects that should be considered when publishing open government linked data. Thus, adopting specific steps depends on assessing the risks of not considering those steps and their impacts on the data community.
According to our search results, multiple studies in the last few years apply a method to create linked data for a particular purpose, sometimes based on one of the studies listed here and, most of the time, creating an ad-hoc approach for their problems.
The justification is that the existing methodologies are too generic and do not consider the particularities of their domain. Some domains were more prevalent than others in linked data applications: geographical data, e-procurement, agricultural and environmental data, smart cities, and legislative data. Also, a subset of the studies that were ruled out investigated just one or a few steps of the whole process – for example, techniques for data quality enhancement, automatic interlinking of datasets, vocabulary/ontology development, licensing resolution, and semantic data extraction from HTML tables, among others.
Nonetheless, as pointed out in [45], the existing Linked Data methodologies have a varying number of steps but still generally cover the same activities. The main difference between the methodologies lies in the grouping of actions within different steps and at different levels of granularity. However, apart from some apparent differences, which we further examine, they cover the palette of actions involved in the process of generating and publishing a linked dataset and can thus be grouped into six general phases, as exemplified in Fig. 5. We argue that the model proposed in this work can be applied in different domains with varying strategies.
As this is a relatively mature area, we started from established practices from the literature and analyzed the different aspects embedded in the various methodologies. Therefore, we posed different research questions to discover and triangulate the steps and tools in each methodology and how they were empirically evaluated. Additionally, we sought to investigate the specific tasks related to quality control in these processes, since this is also an important issue, as pointed out in the literature.
Regarding our first research question, we showed the commonalities of the different methodologies. Most of the studies addressed the primary tasks of selecting data sources, converting them to RDF, linking them to other datasets, and publishing the resulting files. Although these are all essential tasks for publishing linked data, some studies did not explicitly mention them. For example,
The second research question assessed how these methodologies prescribed tools or practices to support their execution. The use of tools may be considered a systematic substantiation of the methodology, since it provides a common ground that can be applied and compared in different situations. However, most studies suggested a small set of tools, or just a single one, for the different steps. This oversimplification may also be a reason why they are perceived as too generic in later works. The major exception in this list was
Our third research question assessed how these methodologies were evaluated in their original proposals. The assessment framework we adopted here was based on the literature of information systems and design science research, which focuses on the design, development, and evaluation of artifacts to address real-world problems [36]. The artifact type here is a method, i.e., actionable instructions that are conceptual, not algorithmic. An essential phase of this framework is the evaluation process, with different degrees of formality. We found that some of the selected papers did not present any empirical evaluation of the methodology (only logical arguments), being primarily written to be used as a tutorial or a set of best practices rather than as a formal inquiry. That is arguably another reason why they are perceived as too generic and not adopted in later works. Three studies (
Our fourth research question explored what quality control validations were employed during the process of linked data production. As noted previously, data quality is still an essential issue for linked open data on the Web, so a validation model throughout the process could bring benefits to the availability of the final product. Our findings show that few studies presented explicit validation tasks during the process. Most of the studies either just recommended that some steps would be advisable or did not include them at all. The studies that did specify them either did not evaluate them with a real case study or did so only for specific steps of the process – particularly, to validate the format of the input data (mostly tabular data) or to validate the links to other datasets identified automatically. The exception was again
We list in this section some possible research directions concerning improvements in methodologies to publish LOGD in general. Other important aspects, such as data consumption, are outside the scope of this work.
Considering all the variabilities and commonalities among the different methodologies, we envision the creation of a process model for publishing LOGD. Since there are core activities that appear to be shared across all contexts (
Methodologically, the illustrative scenarios provided are cross-sectional studies in which the methodology was applied and evaluated for feasibility. It would be interesting to have longitudinal studies in which the application of the method is evaluated over time. Furthermore, such studies should consider the usage of the linked data, how the methodology evolved in the context in which it was applied, and the requirements driven by the maintenance phase. Although illustrative scenarios are helpful to demonstrate how a method can be applied to actual data, the production of (linked) open data is a sociotechnical process [98,101] in which there is a continual interplay between technological (process, tasks, technology) and social aspects (relationships, reward systems, authority structures), which may result in additional requirements to be sustained over time. Besides, the papers did not seem to consider how their methodologies fit into the context of public organizations – their administrative structures, hierarchies, the need for communication and sharing of information, among other aspects – focusing instead on the technological aspects of publishing data.
The inclusion of explicit validation steps along the process may help ensure a higher-quality product early on. Some validations can be automated, particularly those concerning structural aspects, while others require human analysis, especially in semantic modeling. Methodologies such as the V-model [29] for software development consider a validation point at the end of each phase and could be adapted to this end. Acceptance criteria for user stories, as used in agile methods, could also be applied. Quality frameworks such as the one provided by Zaveri et al. [99] and the Data on the Web Best Practices [92] could be used to support these steps.
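To make the idea of an automatable structural validation concrete, the following sketch implements a phase-end gate as a SPARQL query over the produced graph, flagging typed resources that lack a human-readable label; the specific criterion is only an example of an acceptance check:

```python
from rdflib import Graph


def gate_unlabeled_resources(g: Graph) -> list:
    """Phase-end check: return subjects that are typed but have no rdfs:label."""
    query = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?s WHERE {
            ?s a ?type .
            FILTER NOT EXISTS { ?s rdfs:label ?label }
        }
    """
    return [str(row.s) for row in g.query(query)]


g = Graph().parse("schools.ttl", format="turtle")
offenders = gate_unlabeled_resources(g)
# Fail the phase if any resource violates the acceptance criterion.
assert not offenders, f"{len(offenders)} resources lack labels: {offenders[:5]}"
```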
Another research direction is the possibility of large-scale deployment, reusing legacy open data. A large amount of structured and semi-structured data is already available in most countries and provides a valuable source to ‘cross the chasm’ and reach network effects on the already existing data. The task that requires the most effort is arguably modeling the data, either by carefully selecting existing and validated vocabularies or by creating new ones for each of the datasets and their distributions over time. We argue that this could be achieved by deriving ontologies from the data files, from simple automatic mappings [10] to more elaborate approaches [30,74], as a starting point, leveraging the mature state of the data and applying a pragmatic perspective of linked data [67], which considers ontologies a lightweight representation tool for an open and decentralized environment like the Web. The evolution of these vocabularies could be done collaboratively by data consumers and domain specialists inside or outside the government’s scope – thus, also in a decentralized way. In addition, since the same information can be structured in many different forms, the standardization of both file format and information structure may be necessary, which involves collaboration between public administration and certain communities (W3C, Open Government Partnership, Schema.org, etc.). A good starting point may be the development of ontologies for the most common categories of government data currently published (government budget and national statistics, as shown in the Introduction), such as [21] or the Core Vocabularies from the ISA2 Initiative of the European Commission.23
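A minimal sketch of the simplest end of that spectrum: a naive automatic mapping that derives a skeleton vocabulary (one class per file, one datatype property per column) from a CSV header, for specialists to refine collaboratively later; the namespace and file are hypothetical:

```python
import csv

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://data.example.org/def/")  # hypothetical vocabulary namespace


def derive_vocabulary(csv_path: str, class_name: str) -> Graph:
    """Derive a skeleton vocabulary: one class, one datatype property per column."""
    g = Graph()
    g.bind("ex", EX)
    cls = EX[class_name]
    g.add((cls, RDF.type, OWL.Class))
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    for column in header:
        prop = EX[column.strip().lower().replace(" ", "_")]
        g.add((prop, RDF.type, OWL.DatatypeProperty))
        g.add((prop, RDFS.domain, cls))
        g.add((prop, RDFS.label, Literal(column)))
    return g


print(derive_vocabulary("schools.csv", "School").serialize(format="turtle"))
```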
As argued previously, the distributed nature of the Web makes it difficult to ensure that all linked components keep working or retain high quality over time. Besides, the lifecycle of governmental datasets is very dynamic, reflecting administrative changes, domain refinements, new legislation or guidelines around the data, etc. Keeping track of these changes and making them transparently available is a big challenge. Thus, the maintenance phase is critical and should be developed further, to monitor whether what was once produced remains valid in this decentralized context.
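As an illustration of what such monitoring could automate, the sketch below re-checks whether the external URIs a dataset links to still dereference; a production monitor would also diff content, respect rate limits, and track provenance:

```python
import requests
from rdflib import Graph
from rdflib.namespace import OWL

g = Graph().parse("schools.ttl", format="turtle")

# Re-check every external owl:sameAs target recorded in the dataset.
for _, _, target in g.triples((None, OWL.sameAs, None)):
    try:
        status = requests.head(str(target), allow_redirects=True,
                               timeout=10).status_code
    except requests.RequestException:
        status = None
    if status != 200:
        print(f"Broken or unreachable link target: {target} (status {status})")
```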
We also point to the importance of success stories and pilot reports on the adoption of LOGD by public administrators. They may help promote and clarify this approach by detailing implementation steps, organizational contexts, and the challenges of adopting the practices. Initiatives such as the European Commission’s Semantic Interoperability Community (SEMIC) pilots24
Publishing LOGD is a complex sociotechnical task [98,101]. Although the release of OGD is still growing, the transformation of this data into linked data – with high quality – remains an open issue. As discussed in this work, there is relatively little linked data on the Web, and it presents quality problems. Although this is a complex, multidimensional phenomenon, some technological and methodological approaches may support its development. Some methodologies were carefully designed, but they seem not to have served as the basis for later works on publishing linked open government data. As argued in [90], there is no one-size-fits-all process and set of tools to publish linked data, given the different contexts, data sources, technologies, etc. However, the products of the process and most of the steps to achieve them are shared among different approaches. This paper followed this rationale by distilling what has been done in different contexts and deriving a unified methodology with practices adopted during the last decade.
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. The authors would also like to thank the reviewers for their thorough feedback on the manuscript.
