Abstract
The past decade has witnessed a rapid development of open government data practices and academic research. However, there is no systematic survey of existing research to understand the evolution of open government data. Such research can facilitate knowledge transfer within and across domains, and foster learning for countries in the early stages of open government data development. This study quantitatively extracted the evolution trajectory of open government data based on the main path analysis method and then analysed the underlying motivations. The results show that open government data research went through four main phases and that the open government data movement has spread towards developing countries and smart cities. Different challenges and issues faced by the researchers in each phase drove the evolution of open government data research. Finally, we discuss future directions of open government data research based on our findings and recent developments. There is a tendency to create sustainable open government data and smartness by employing artificial intelligence and creating data marketplaces.
Points for practitioners
Open government data efforts have evolved over the years into a global phenomenon. Countries have learned from each other and more and more efforts are focused on innovating with open government data by stimulating co-creation and using other incentives. The way that data are opened should focus on achieving goals like innovation, participation, transparency and accountability. There is a tendency to create sustainable open government data and smartness by employing artificial intelligence and creating data marketplaces.
Introduction
Open government data (OGD) initiatives have spread rapidly in recent years (Johnson and Robinson, 2014). OGD refers to publishing public sector information in open and reusable formats, without restriction or monetary charge for use by society (Kalampokis et al., 2011). This movement’s main purpose is to ensure transparent administration and stimulate citizen participation and engagement. Besides this, OGD can help to generate public value through innovation (Janssen et al., 2012; Zuiderwijk and Janssen, 2014a). Under these motivations, the European Union (EU) and the US took the lead in launching OGD activities. There has also been a recent spurt of OGD initiatives in other parts of the globe, where both the number of OGD websites and data sets have increased steadily (Saxena, 2017).
Similarly, academic research about OGD has also developed at a fast pace. There is currently an abundance of studies on OGD. For example, some publications have examined the factors that triggered the adoption of OGD initiatives across government agencies (Coglianese, 2009; Hossain and Chan, 2015; Zhao and Fan, 2018), while some studies have explored the innovation activities driven by OGD (Mergel et al., 2018), and some have evaluated the quality and effects of OGD initiatives (Sayogo et al., 2014; Zuiderwijk et al., 2019).
There have been several literature reviews on OGD. To provide guidance for future research development, Hossain et al. (2016) assessed the status of OGD research from three levels and proposed future directions, whereas Attard et al. (2015) investigated existing OGD tools and approaches, and extracted the challenges and issues that hinder initiatives from achieving their full potential. Finally, Safarov et al. (2017) attempted to frame the utilization of open government data to suggest future research directions. These reviews are beneficial to research development but no attempt has been made to understand research evolution.
Understanding the evolution of OGD research can help researchers comprehend the key developments, why certain developments happened and how OGD will develop in the future, facilitating knowledge transfer within and across domains (Barbieri et al., 2016; Liang et al., 2016). OGD initiatives were launched at different times in different countries, and countries might address different challenges. Research development derived from the evolution of studies in Western countries may help OGD action in developing countries (Saxena, 2018), despite the different drivers and situations.
This study carries out main path analysis to trace the OGD evolution trajectory from 2000 to 2019. The results are used to scrutinize the research topics, theoretical foundations and research motivations of previous research. Main path analysis, combined with motivation analysis, is used to identify emerging research frontiers. The following research questions are specifically addressed:
How has OGD research evolved over time?
What are the motivations underlying the evolution trajectory?
What further development and research needs are anticipated?
The second section of this article introduces the research method and the data-retrieval process. The third section presents the results of the main path analysis and motivation analysis. The fourth section contains a discussion of the findings and a future research agenda. Conclusions, implications for research and practice, and limitations are presented in the fifth section.
Research method and data
Research method
To achieve the aforementioned research aims, we employed two research methods: bibliometric analysis and document analysis. Figure 1 shows how the two methods work in conjunction with each other. The bibliometric analysis is used to obtain the main path of the research domain, while the document analysis is used to analyse the topics of the core articles on the main path and the research motivations. Finally, based on the analysis results and the latest research progress, future research directions are proposed.
Bibliometric analysis is a method that can be used to analyse the scientific development processes and structural relationships of a research domain based on citation, co-citation and authors (Chen, 2004). The citations in publications contain rich information on how knowledge disseminates or flows (Chen et al., 2013). Main path analysis is a common citation-based technique that aims to extract evolutionary trajectories from citation networks.
The first step is the construction of an article citation network. Thereafter, the weights of the links in the network are calculated. One measure suggested by Hummon and Doreian (1989) is the Search Path Link Count (SPLC). The SPLC of a directed link counts all possible paths in the network that traverse the link (Batagelj, 2003). The SPLC is considered the best measure of traversal weight since a higher SPLC value indicates a higher number of search paths (Batagelj, 2003; Hummon et al., 1990). Finally, the main path is extracted based on the traversal weight of the links. Existing studies have introduced several search algorithms for selecting the most important links, namely, the local approach, global main path and key-route search methods (Liu and Lu, 2012). The local approach chooses the links with the largest traversal count from the current start point, while the global main path is the path that has the largest overall traversal count (Liu and Lu, 2012). In comparison with the local main path, the global main path emphasizes the overall importance of knowledge flow. The key-route search takes the top links as seeds and searches backward and forward from them. The key-route search can be either local or global (Liu et al., 2019). To obtain the main evolution path, we constructed the
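The steps described above (building the citation network, weighting its links and searching for the global main path) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in this study. It assumes one common reading of the SPLC, in which the weight of a link is the number of paths from the tail article or any of its ancestors to the tail article, multiplied by the number of paths from the head article to any sink. The toy network and the names `build_graph`, `splc_weights` and `global_main_path` are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_graph(edges):
    """Index a citation DAG; an edge (u, v) means article v cites article u."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)
    # Cited articles are ordered before the articles that cite them.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())
    return succ, pred, order


def splc_weights(edges):
    """Return {(u, v): SPLC} under the assumed definition given in the text."""
    succ, pred, order = build_graph(edges)
    # n_in[u]: paths ending at u that start at u itself or at any ancestor.
    n_in = {}
    for n in order:
        n_in[n] = 1 + sum(n_in[p] for p in pred[n])
    # n_out[v]: paths from v to any sink (1 when v is itself a sink).
    n_out = {}
    for n in reversed(order):
        n_out[n] = sum(n_out[s] for s in succ[n]) if succ[n] else 1
    return {(u, v): n_in[u] * n_out[v] for u, v in edges}


def global_main_path(edges, weights):
    """Return (total weight, path) of the heaviest source-to-sink path."""
    succ, pred, order = build_graph(edges)
    best = {}  # node -> (summed weight down to a sink, path as a list)
    for n in reversed(order):
        if not succ[n]:
            best[n] = (0, [n])
        else:
            best[n] = max(
                ((weights[(n, s)] + best[s][0], [n] + best[s][1]) for s in succ[n]),
                key=lambda item: item[0],
            )
    sources = [n for n in order if not pred[n]]
    return max((best[s] for s in sources), key=lambda item: item[0])


# Hypothetical toy network: A is the oldest article, E the most recent.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("B", "E")]
weights = splc_weights(edges)
total, path = global_main_path(edges, weights)
```

In this toy network the link from D to E receives the largest weight because every earlier article reaches E through it, so the global search retains the route A, B, D, E. A local search would instead pick the heaviest outgoing link at each step, which can yield a different path when an early choice leads away from heavily traversed later links.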
The analysis of the research topics of the core nodes in the main path can shed light on the evolution of research activities in the area (Bindu et al., 2019). One of the commonly used methods of topic modelling is based on the latent Dirichlet allocation (LDA) model, which is designed to process large archives of documents based on latent topics characterized by a distribution over words (Blei and Jordan, 2003). However, it is theoretically impossible to identify topics from a small number of documents (Tang et al., 2014). Moreover, there is no best way to decide the right number of topics (Arun et al., 2010). We selected document analysis to capture a diversity of topics and to conduct the motivation analysis.
Data collection
The literature analysis starts from selecting appropriate data sources and metadata. The Web of Science (WoS) is widely considered the most comprehensive and high-quality database for scholarly work, as it indexes thousands of prominent journals (Dahlander and Gann, 2010). Therefore, we adopted the WoS Core Collection as our data source.
Establishing search keywords is particularly crucial to obtaining an accurate literature dataset (Lu and Liu, 2014). First, we searched the articles in the WoS and obtained a small number of studies. Then, the title and abstract of these articles were analysed to obtain a list of candidate terms. After consultation with experts and discussion among the authors, we finally determined ‘open govern*’, ‘open government data’ and ‘open data’ as final search terms. Considering that the EU published the Reuse of Public Sector Information Directive (PSI Directive) in 2003 and research articles come earlier than practice, we retrieved the literature published from 2000 to 2019. To reduce the risk of bias, we used the following inclusion and exclusion criteria to select articles for use in our review:
we only considered articles and conference papers that were written in English; and
we only included government data publication studies, so studies on other open data domains such as research data were excluded.
We then performed a manual study selection by reading the titles and abstracts. This resulted in a total of 1008 publications obtained for analysis.
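The retrieval and screening procedure described above can be sketched in Python as follows. This is a hedged illustration only: the `Record` fields, the helper names and the sample records are hypothetical, the wildcard handling merely approximates how a database search treats `*`, and the judgement of whether a study concerns government data is represented by a flag set by a human reader, as in the manual selection described in the text.

```python
import re
from dataclasses import dataclass

# Search terms and time window as reported in the text.
SEARCH_TERMS = ("open govern*", "open government data", "open data")
ALLOWED_TYPES = {"article", "conference paper"}
FIRST_YEAR, LAST_YEAR = 2000, 2019


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    language: str
    doc_type: str
    about_government_data: bool  # decided manually from title and abstract


def term_pattern(term):
    """Compile a search term, treating '*' as 'any word ending'."""
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


PATTERNS = [term_pattern(t) for t in SEARCH_TERMS]


def matches_search(rec):
    """True if any search term occurs in the title."""
    return any(p.search(rec.title) for p in PATTERNS)


def passes_screening(rec):
    """Apply the time window and the inclusion and exclusion criteria."""
    return (
        FIRST_YEAR <= rec.year <= LAST_YEAR
        and rec.language.lower() == "english"
        and rec.doc_type.lower() in ALLOWED_TYPES
        and rec.about_government_data
    )


# Invented sample records, for illustration only.
records = [
    Record("Open government data in cities", 2015, "English", "Article", True),
    Record("Open data for genomics research", 2016, "English", "Article", False),
    Record("Open governance and transparency", 1998, "English", "Article", True),
]
selected = [r for r in records if matches_search(r) and passes_screening(r)]
```

Only the first sample record survives: the second concerns research data rather than government data, and the third falls outside the 2000 to 2019 window.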
Results
We retrieved the literature data about OGD from the WoS between 2000 and 2019. As shown in Figure 2, the period from 2000 to 2008 did not yield much literature, although earlier research had already explored open data and open government issues (e.g. Brooks, 2008; Horsley, 2006). The number of articles increased significantly in subsequent years, which can be linked to the Open Government Directive issued at the end of 2009. Thereafter, OGD research received widespread attention.
Tracing the main evolution path
Our first question concerns the evolution of OGD research over time. We obtained the global main path shown in Figure 3. After that, we analysed the topic evolution of the articles on the main path using the document analysis method. Table 1 lists the details of the milestone articles on the main path.
According to the difference in research topics, the evolution’s main path can be classified in four phases, starting with the
Phase 1: OGD launch
The research emphasis in the first phase of OGD evolution was
Phase 2: Evaluation and learning
Following the development of the OGD movement, the next issue was the evaluation of OGD efforts. The major task in the second phase was
The second branch was continued by Janssen and Zuiderwijk (2014), who pointed out the need for
Phase 3: OGD adoption and use
The third phase’s research trend is
Phase 4: Implementation and comparison among countries
OGD initiatives originated from Western countries but recent studies have increasingly focused on
Table 1 shows the shifts to research conducted in other countries and the change of use of theory models by researchers. Most of the researchers who focused on OGD in the first phase mainly came from the US, while in the second phase, European researchers played an important role. After that, researchers from developing countries, such as China and India, joined this domain. Regarding research theories, the initial research was largely empirical and focused on the US and EU practice, whereas in the next phases, researchers started to translate existing theories to the OGD domain. For example, the technology acceptance model (TAM) and its extended model were used to analyse users’ adoption behaviour. In addition, Zuiderwijk and Janssen (2014b) developed the context–input–output model to evaluate open data policy implementation.
Motivation of evolution
One of the essential research motivations is to face challenges or solve current problems (Kothari, 2004), and these motivations also drive the evolution of a research field. Analysing the motivations behind each phase can help to identify which issues have not yet fully matured. The results are summarized in Table 2.
Phase 1: Creating transparency, participation, innovation and economic value
OGD initiatives were initially launched by the EU and US. The aim of European open data strategies was to develop the information market, enhance collaboration between public sectors and third parties, and drive innovation by disclosing data. In the US, President Obama (2007) promised to ‘create a transparent and connected democracy’ through OGD and to change the sphere of mistrust.
These initiatives have the potential to
Phase 2: Going beyond a technical focus and the national level
The concept of linked open government data was widely spread in the information system research domain. Many discussions on collecting, publishing and using data effectively are based on a technical perspective (Shadbolt et al., 2012; Stadler et al., 2012). However,
With the OGD movement’s progress, all kinds of public organizations have come under pressure to release their data since
Despite the narratives in the first phase emphasizing the transformative potential of OGD,
Phase 3: Usage lags behind and different open data policies
To realize the benefits of opening data, OGD need to be used by various stakeholders, particularly in private organizations. There are very
As the OGD movement spread, researchers started
Phase 4: Less focus on open data programmes in developing countries
Academic research on OGD was more concentrated in the West (e.g. Huijboom and Van den Broek, 2011; Ruijer et al., 2017), and
Future research agenda
Based on the preceding analysis and recent research advancements, a framework for a future research agenda was developed, as shown in Figure 4. It includes investigating the
Goal: purpose and benefits of OGD
As we mentioned in the motivation analysis, there are different objectives for OGD emphasized across countries and by different government levels. For example, in Europe, supporters of OGD often stress its economic value, whereas reasons given by developing countries are often strongly focused on transparency, accountability and citizen participation (Schwegmann, 2012). Prior research also found that the reason driving openness at the subnational level is the national or local legal framework (Cañares and Shekhar, 2016), or political leadership (Hossain et al., 2016), rather than the perceived benefits from open data (Yang and Wu, 2016).
Furthermore, the interpretation of OGD objectives such as transparency and accountability differs. People are likely to give different meanings to what constitutes transparency (Matheus and Janssen, 2019). In some countries, the publication of data is viewed as a form of transparency, whereas other countries consider that there is only transparency if the public are also able to interpret the data (van Zyl, 2014). Therefore, there is a need to clarify concepts and create consensus.
Process: artificial intelligence for creating smartness in OGD
In the motivation analysis, we found that the usage of OGD is lagging behind and there is a lack of evidence of creating value from OGD. With the rise of cloud computing and artificial intelligence (AI) technology, governments tend to employ data analytics through AI to create more value from open data (Gao and Janssen, 2020). Data analytic tools are often employed to gain new ideas from OGD. The emerging field of AI provides the potential to create even more value from OGD. This is creating a new research area of AI in open data and research questions on what AI techniques can be used and how value can be created from OGD using AI.
The publication of data needs to be accompanied by an infrastructure that is able to handle the data in a user-friendly way in order to lower the threshold for users (Janssen and Zuiderwijk, 2014). What should such an infrastructure look like and how should it be designed? Moreover, the internet of things (IoT) is opening a large amount of data collected by sensors. How can IoT be employed for open data? Open data infrastructures might integrate mechanisms to open data in real time and provide tools for the public to process data.
Result: innovating with OGD
Innovation can be defined as ‘the implementation of a new or significantly improved product (good or service)’ (OECD and Statistical Office of the European Communities, 2005: 46). As such, innovating with OGD can be described as users employing open data to create new applications or new insights into the current situation. The motivation analysis showed that usage lagged behind. To address this problem, some scholars focused on the adoption behaviour of open data users (e.g. Zuiderwijk et al., 2015). Some other researchers studied the open data platform and innovations after the adoption of open data (Ojo et al., 2015, 2016).
However, the issues about the open data innovation process or the outcome and impact of open data innovation have not been fully discussed. It is generally anticipated that OGD will contribute to economic growth and social development. However, the economic and social impact and success factors are yet to be reported. Future research is needed to understand the economic and social contributions of open data better. What are successful business models for open data? How do they create products and social value?
On the other hand, innovating with OGD needs high-quality data. Political decisions direct or mandate agencies to disclose data but bureaucrats may decide to release less useful or less valuable data (Peled, 2011; Zuiderwijk and Janssen, 2014b). Future research may also pay attention to how to change bureaucrats’ willingness to open more data by gamification (Kleiman, 2019; Kleiman et al., 2020). In addition, innovation contests like hackathons could help create new ideas that can be transformed into applications or services. This does raise further research questions, such as how to support the later stage of open data innovation (Juell-Skielse et al., 2014). How can governments meet the needs of users by creating an open data market while at the same time ensuring privacy?
Theory: an integral approach of OGD
As mentioned earlier, the case-study method was frequently adopted to obtain empirical knowledge in the initial stage of OGD. After that, theoretical models from other domains were adopted to answer research questions. For example, the multi-level perspective was applied to explore the nature of the barriers currently faced by the OGD agenda (Martin, 2014), and the TAM and its extended model were used to analyse the adoption behaviour of users (Zuiderwijk et al., 2015). In addition, innovation diffusion theory (Ojo et al., 2015) and social capital theory (O’Hara, 2012) are also used to study open data issues. All these researchers focused on a single area, whereas Harrison et al. (2012) advocated an ecosystem view. There is no integral theory or framework to understand the whole process of open data, including the purpose of open data, the innovation process and its outcome or impact. Therefore, we expect more integral approaches towards OGD to rise.
Direction: sustainable OGD initiatives
We also need more investigation to ensure the sustainability of open data initiatives. OGD might not bring the value as expected, and governments might decide not to update their data continuously. OGD sustainability means that governmental data continue to be released and used regularly with at least the same or improved quality and quantity. Only when data are relevant, up to date and accurate can they generate more value. Furthermore, sustainability is essential for companies that are creating OGD-based business models or for transparency efforts. If OGD are not available or change, then their business model might fail. Yet, there is decreasing evidence of sustained scaled-up open data examples, particularly in developing countries (World Bank Group, 2016). What makes OGD initiatives sustainable? Who is already using the data? Who will pay for the release of these data?
Conclusion
This study has systematically investigated the evolution of OGD research and proposed future research directions. We analysed the evolutionary trajectory of OGD and the underlying motivation. The themes represent four major phases of evolution. The first phase focused on creating transparency, participation, innovation and economic value, as characterized by the publication of OGD initiatives (e.g. the PSI Directive). In the second phase, researchers realized that the discussion on OGD projects should go beyond the technical and national level, and focus on finding more evidence for creating value from data. In this phase, benefits and barriers, the infomediary business model, and the open data ecosystem approach of OGD initiatives were introduced and studied. The third phase was mainly motivated by the limited usage of OGD. The academic research focus shifted to open data adoption and usage, and the evaluation of open data policies. The fourth phase focused on the implementation of open data initiatives and learning through comparisons of initiatives among countries. The next phase is expected to focus on ensuring the sustainability of OGD initiatives and the use of IoT and AI to create smartness, which should result in the creation of more value from open data. Finally, we derived future OGD research directions, which include re-understanding the purpose and benefits of OGD, creating smartness by employing AI and data marketplaces, innovating with OGD, developing a more integral approach, and ensuring the sustainability of OGD initiatives.
This article provides some implications for researchers and practitioners. First, the global main path concisely describes the major knowledge flow and helps researchers quickly obtain a clear understanding of OGD research through its basic knowledge framework. Second, the main path analysis method can be applied to conduct comprehensive reviews of other important subfields. Third, the article provides a future research agenda framework that both OGD researchers and practitioners can use to anticipate trends in OGD development in the near future. Practitioners are recommended to re-examine the purpose and expected benefits of OGD and to apply the latest technology to open data to create a smart and open government.
Although our study was as comprehensive as possible, there are still some limitations. First, citation motivation may influence the accuracy of our main path analysis. A theory called ‘remote citation’ argues that an article cites others not because of a close connection with the main subject, but merely because of a connection in a broad sense (Liu et al., 2013). This type of citation may influence the main path analysis result. In the future, multiple main paths could be investigated. Next, even though the WoS is one of the most comprehensive sources of scientific articles, it still cannot guarantee the coverage of all possible OGD publications. Future research could include a broader range of articles, including those written in other languages.
Supplemental Material
Supplemental material, sj-pdf-1-ras-10.1177_00208523211009955 for Understanding the evolution of open government data research: towards open data sustainability and smartness by Yingying Gao, Marijn Janssen and Congcong Zhang in International Review of Administrative Sciences
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NSFC project of China (No. 71734002). Furthermore, Yingying Gao was sponsored by the China Scholarship Council. Part of this work is funded by the European Commission within the Erasmus+ programme in the context of the CAP4CITY project under grant agreement No. 598273 EPP 1 2018 1 AT EPPKA2 CBHE JP.