Abstract
Despite the efforts made by most African countries to incubate the practice of research data sharing, little is known about the status of the research data sphere in the African landscape. This study was conducted to establish the status of research data and research data repositories (RDRs) in Africa. Specifically, the study sought to identify country-wise contributions of research data and research data repositories, trends in the publication of research data by African countries, and the content types and nature of the research data repositories hosted by African countries. The study applied a quantitative research approach. It drew on the Registry of Research Data Repositories (re3data), a prominent worldwide registry of research data repositories, and the Data Citation Index (DCI) hosted by the Web of Science. Re3data and DCI index research data repositories and datasets, respectively, from almost all countries around the world. The study identified parameters that were extracted from re3data and DCI using their browse and search facilities. The extracted data were recorded in Google Sheets, refined, and analysed using MS Office Excel. The study found that African countries contributed little research data and few research data repositories to re3data and DCI. It further found that, of the few repositories present, most restricted access for external users. The study concludes that African countries still have much to do before research data practices are fully incubated. It suggests that governments and other stakeholders should improve the facilitating conditions, including establishing relevant policies and infrastructure, for the practice to be fully embraced.
Keywords
Introduction
The advancement of Information and Communication Technologies (ICTs) has changed the way researchers keep and store their research data (Buhomoli and Muneja, 2023a). With the advancement of ICTs, research data are easily captured, collected, stored and shared with other researchers, both formally and informally (Peng et al., 2019). Some institutions have formalised their research data-sharing practices by establishing research data repositories, policies, and guidelines that dictate where data may be shared, who may access them, and how they may be accessed (Borghi and Van Gulick, 2021; David et al., 2022; Ramsay, 2022). Some journals have also introduced mandates that require research data to be archived along with the publication of journal articles (Bedeker et al., 2022; Borghi and Van Gulick, 2021), thereby increasing and facilitating research data-sharing practices among scholars.
The practices and status of research data sharing vary across the globe and among African countries as well. The practice is more dominant in developed countries than in developing countries (Archana and Padmakumar, 2023; Onyancha, 2016). In most African countries, the practice of research data sharing is still in its infancy (Buhomoli and Muneja, 2023a). Though the concept of research data sharing was first introduced in the early 2000s, it has been spreading at a very slow pace (Borghi and Van Gulick, 2019; Katabalwa et al., 2021). This has been partly associated with the limited evolution of ICTs in some African countries and the presence of contradicting laws, policies, and other regulatory frameworks that hinder the full adoption of research data sharing in their respective countries (Archana and Padmakumar, 2023; McMaster University, 2022; Thoegersen and Borlund, 2022). Moreover, lack of political will, poor mechanisms for data quality assurance, and insufficient knowledge and skills related to research data management are other factors behind the low uptake of research data sharing through research data repositories (RDRs) (McMaster University, 2022). The low incubation of research data-sharing practices in most African countries has left researchers, funders, and governments to continue suffering from duplication of research findings, data recollection, and poor mechanisms for critiquing and verifying research findings, and thus from wastage of money and time (Bangani and Moyo, 2019; Onyancha, 2016).
African countries have made efforts to advance the research data sphere. These efforts include, but are not limited to, technological infrastructure developments such as the establishment of fibre optic networks, advancements in digital technologies, formulation of related laws, policies and other regulatory frameworks, provision of research grants, and the establishment of institutions dealing with research data management (Avuglah, 2020; Buhomoli and Muneja, 2023a; Elsayed and Saleh, 2018; Lawal et al., 2017; Ng’eno and Mutula, 2018; Onyancha, 2016). Despite these efforts, the status of contributions of research data and RDRs by African countries needs to be better articulated. There is little documented evidence of the status of research data and RDRs in Africa based on a prominent registry and the Data Citation Index (DCI). Therefore, this study examined the African landscape of research data using re3data and DCI.
Re3data (https://www.re3data.org/) is a registry that indexes RDRs around the globe. It was launched in 2012 with funding from the German Research Foundation (DFG) (Archana and Padmakumar, 2023; Boyd, 2021). The registry indexes RDRs from a wide range of academic backgrounds (Kindling et al., 2017). Re3data aims to promote the culture of research data sharing. Funders, publishers, academic institutions, and researchers have used it to identify places where datasets can be uploaded or accessed (McMaster University, 2022; Pampel et al., 2013). Re3data also provides detailed information about RDRs, including their technical standards, policies, and scope. DCI, on the other hand, is a searchable index offered by Clarivate Analytics as one of the components of the Web of Science (Onyancha, 2016; Patience et al., 2017). DCI provides the means to reveal and cite datasets as well as track their impact (Onyancha, 2016). Re3data and DCI play complementary roles in the research data management ecosystem. While re3data provides an exhaustive registry of RDRs, DCI provides the means to enhance the discoverability and track the usage and citation of the datasets housed within these RDRs. This complementarity enhances the discovery, access and usage of datasets (Archana and Padmakumar, 2023; Onyancha, 2016; Patience et al., 2017).
Despite the presence of re3data and DCI, which act as tools for indexing RDRs and research data, the African landscape of the same is not well established. The status of RDRs in Africa is not well documented. This has led to a limited understanding of RDRs in the African context. It has also led research institutions, policymakers, researchers, and other stakeholders to miss opportunities associated with research data sharing (Lawal et al., 2017; Onyancha, 2016). The development of policies that are inefficient at addressing research data sharing in the African context and the underutilisation of African RDRs are among the issues brought about by insufficient documentation of research data sharing (Spallek et al., 2019). Consequently, the landscape of research data in African countries is not well known, and neither are these countries' contributions in terms of research data and RDRs. Moreover, few studies have been conducted in this area using re3data and DCI. Misgar et al. (2022) concentrated on open RDRs in the Brazil, Russia, India, China, and South Africa (BRICS) countries using re3data; Archana and Padmakumar (2023) examined the status of Indian RDRs indexed in re3data; Kindling et al. (2017) assessed the landscape of RDRs in 2015; Khan et al. (2023) gave an overview of RDRs using re3data; and Onyancha (2016) used DCI to study open research data in Sub-Saharan Africa. There was insufficient evidence of studies that assessed the landscape of African RDRs using both re3data and DCI. This is the gap that this study intends to fill.
Purpose of the study
The study aimed to investigate the African research data landscape using re3data and DCI through the lens of the diffusion of innovation theory, with a view to developing guidelines for policy development and practice.
Specific research questions
The following research questions guided the study.
What are the extent and trends of African countries' contributions to research data and RDRs?
What types of African RDRs are indexed in re3data?
What is the level of variability in research data access and upload policy among the African RDRs indexed in re3data?
What types of subjects and contents are represented in African RDRs indexed in re3data and DCI?
Which standards have been adopted by the African RDRs indexed in re3data?
What types of software are employed in African RDRs indexed in re3data?
Literature review
This section provides a review of the related literature. The section is organised according to the research questions guiding this study.
Country-wise contribution of research data and RDRs
Most scholars (Buhomoli and Muneja, 2022; Cho, 2019; Kim and Choi, 2017; Moyo and Bangani, 2023; Mushi et al., 2020; Onyancha, 2016) agree that one of the best places for sharing research data is a research data repository. The study by Kindling et al. (2017), which assessed the landscape of RDRs in 2015 using re3data, showed that the USA was leading in terms of the number of repositories, followed by Germany and then the UK. These results were supported by those of Khan et al. (2023), who found that the USA, Germany, and the UK pioneered these practices. On the other hand, India, ranked 12th by Khan et al. (2023) among the countries with a large number of RDRs, was named the leading country among BRICS by Misgar et al. (2022). However, contrary to Misgar et al. (2022), China appeared to lead in terms of RDRs in the study conducted by Khan et al. (2023). The reason for these variations could be the times at which the two studies were conducted, as re3data is constantly updated (Boyd, 2021; Khan et al., 2023; Misgar et al., 2022). The differences also suggest that China made considerable efforts, which contributed to the rise in the number of data repositories from 28 reported by Misgar et al. (2022) to 73 reported by Khan et al. (2023). On the same note, Onyancha (2016), who studied open research data for Sub-Saharan Africa using DCI, found that South Africa was leading in terms of the number of data repositories and data documents. Despite the overall increase in datasets, the studies also noted year-to-year variations in datasets brought about by research activities, policy changes, and advancements in technology (Khan et al., 2023; Misgar et al., 2022; Onyancha, 2016).
RDRs types and institutional responsibilities
Re3data indexes three types of RDRs: institutional RDRs, disciplinary RDRs and “other” types of RDRs (Boyd, 2021). Institutional RDRs are hosted within an institution (Kim and Choi, 2017). Under this type, the institution is responsible for the management and administration of the RDR (Boyd, 2021; Kim and Choi, 2017). Normally, institutional RDRs host datasets only from scholars within the institution, though in some cases they also allow members from outside the institution to archive their datasets, depending on the regulatory frameworks that guide the management of the RDR (Khan et al., 2023; Pampel et al., 2013). Disciplinary RDRs, on the other hand, host datasets from a certain academic domain (Pampel et al., 2013). They are normally owned by professional associations or institutions whose members have related academic orientations (Archana and Padmakumar, 2023; Kim and Choi, 2017; Kindling et al., 2017). The “other” category includes all RDRs that fall under neither institutional RDRs nor disciplinary RDRs (Khan et al., 2023). This categorisation supports the discoverability of RDRs, as they are organised according to their primary focus.
Additionally, all these types of RDRs may be categorised under four institutional responsibility types (Boyd, 2021). Re3data categorises these institutional responsibilities as general, technical, funding, and sponsoring (Archana and Padmakumar, 2023). The institutional responsibility type describes the specific roles and responsibilities that institutions have to perform in connection with RDRs (Kindling et al., 2017). The general responsibility type denotes the broader responsibility borne by an institution. It signifies an overall role in the maintenance and operation of RDRs, including funding, technical support, administration, and governance (Archana and Padmakumar, 2023). Under the technical responsibility type, an institution's support centres on the infrastructure that underpins RDR operations (Pampel et al., 2013). The support may include software, hardware, server hosting, technical expertise, and other related matters. Under the funding responsibility type, an institution provides financial support for managing and operating RDRs. Under the sponsoring responsibility type in re3data, the institution's major role is to provide support in all roles related to RDRs (Archana and Padmakumar, 2023; Cho, 2019; Pampel et al., 2013). The categorisation implies that institutions must consider how a repository would be supported before establishing any research data repository.
Variability in research data access and upload policy for the RDRs indexed in Re3data
The major aim of establishing RDRs is to ensure the reusability of data (Boyd, 2021; Pampel et al., 2013). A degree of reusability can be ensured by archiving data and restricting access to local or authorised users, but wide reusability of research data is achieved only by making the data accessible to the wider community, including the public (Pampel et al., 2013). According to re3data, access to RDRs falls into three categories. The first is access to the database of the RDR, the second is access to research datasets, and the third is access to upload data to the RDR (Boyd, 2021; Pampel et al., 2013). Moreover, for each category (database access, dataset access and upload access), access can be closed, restricted, embargoed, and/or open (Boyd, 2021; Kindling et al., 2017). Closed access means the external repository user cannot overcome the stated access barrier (Pampel et al., 2013). Restricted access means the external repository user may be able to overcome the stated access barrier (Pampel et al., 2013). Embargoed access means the external user cannot overcome the stated access barrier until the datasets are released as either open or restricted (Boyd, 2021). Open access means the facilities are open and there is no barrier to access at all (Boyd, 2021; Pampel et al., 2013). This categorisation shows the importance of balancing research data management practices with ethical and security considerations. Kindling et al. (2017) found that 468 RDRs required registration, 102 required fee payments, and 17 required institutional membership for the data to be accessed. The findings emphasise the complexities involved in deciding research data accessibility policies.
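The access classification described above can be sketched as a small data structure. This is a minimal illustration that follows the description in this section, not the registry's own schema: the class names `AccessLevel` and `RepositoryAccess`, the helper `external_user_outlook`, and the example repository are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The four access levels described in the text (names are illustrative)."""
    OPEN = "open"
    RESTRICTED = "restricted"
    EMBARGOED = "embargoed"
    CLOSED = "closed"


@dataclass(frozen=True)
class RepositoryAccess:
    """One access level for each of the three access categories."""
    database: AccessLevel  # access to the repository database
    dataset: AccessLevel   # access to the research datasets
    upload: AccessLevel    # access to upload data


def external_user_outlook(level: AccessLevel) -> str:
    """Summarise what a level means for an external repository user."""
    outlook = {
        AccessLevel.OPEN: "no barrier",
        AccessLevel.RESTRICTED: "barrier may be overcome",
        AccessLevel.EMBARGOED: "barrier holds until release",
        AccessLevel.CLOSED: "barrier cannot be overcome",
    }
    return outlook[level]


# Hypothetical repository: open database, restricted datasets, closed upload.
example = RepositoryAccess(
    database=AccessLevel.OPEN,
    dataset=AccessLevel.RESTRICTED,
    upload=AccessLevel.CLOSED,
)
```

Keeping the three categories as separate fields reflects the point made above that a single repository can be open in one respect and restricted or closed in another.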
Subject and content representation of RDRs in the Re3data
Re3data requires the data archiver to specify the types of data content being archived, together with the subject under which the archived data can be categorised. The study by Zhi-Feng (2014) showed that the most prominent data type was scientific and statistical data (370), while the least prominent was configuration data (14). The same pattern, with scientific and statistical data most prominent and configuration data least prominent, was also noted by Khan et al. (2023) and Kindling et al. (2017). Likewise, subjects are built into the system, whereby a person archiving data is required to specify the subject category. Zhi-Feng (2014) found natural sciences, followed by life sciences, to be the leading subjects, and computer science and engineering to be the least covered subjects in RDRs at that time. These findings contradict those of Khan et al. (2023) and Misgar et al. (2022), who found that humanities and social sciences were leading, followed by life sciences and natural sciences. On the other hand, Kindling et al. (2017) found natural sciences to be the leading subjects, followed by life sciences and then humanities and social sciences.
Standards adopted by RDRs in the re3data
Standards are very important for making RDRs uniform. They are also very important for the interoperability of research data, which helps research data to be findable, accessible, interoperable, and reusable (Boyd, 2021; Pampel et al., 2013). It is easier to transfer data between repositories that share the same standards than between repositories with different standards (Boyd, 2021; Kindling et al., 2017; Pampel et al., 2013). The common standards used in RDRs indexed in re3data include quality management, enhanced publication, versioning, persistent identifiers (PIDs), and syndication. The study conducted by Kindling et al. (2017) showed that FTP (File Transfer Protocol) was the most common API (Application Programming Interface) standard, while SPARQL (SPARQL Protocol and RDF Query Language) was the least used. This indicates a reliance on simple and straightforward APIs, which leaves room for further expansion of RDR technologies.
Software used in establishing RDRs indexed in the re3data
Most RDRs make use of available software platforms such as Dataverse, Comprehensive Knowledge Archive Network (CKAN) or DSpace (Archana and Padmakumar, 2023; Boyd, 2021; Kim and Choi, 2017; Kindling et al., 2017; Pampel et al., 2013). These studies have indicated Dataverse as the most popular software for establishing RDRs, followed by DSpace, CKAN, MySQL, EPrints, Fedora and Nesstar. Scholars have described Dataverse as user-friendly software that offers excellent metadata management and easy data citation and that can easily be integrated with a range of authentication systems (Boyd, 2021; Kindling et al., 2017). However, when working with large datasets the software might require extra optimisation effort (Boyd, 2021). Also, the software may not be as easily customised as DSpace (Kindling et al., 2017). DSpace focuses more on the preservation of various types of digital content, including research data (Boyd, 2021; Tripathi et al., 2017). The software has a large community, can easily be customised, and offers strong access control and user management features (Khan et al., 2023). CKAN, on the other hand, excels at data discoverability and has been widely employed for open data initiatives (Dora and Kumar, 2015). CKAN has a plugin ecosystem that allows users to extend its functionality (Dora and Kumar, 2015; Pampel et al., 2013). However, because this software focuses primarily on data access and discovery, it does not perform as well in the management of research data as DSpace and Dataverse (Boyd, 2021; Kindling et al., 2017).
However, the choice among these software platforms depends on a number of factors, including the size of the community that has to be supported, the existing systems that have to be integrated, the required level of expertise, and the types of research data to be supported (Archana and Padmakumar, 2023; Boyd, 2021; Kim and Choi, 2017; Kindling et al., 2017; Pampel et al., 2013).
The study conducted by Boyd (2021) on understanding research data infrastructures found that most of the RDRs registered in re3data were built on unclassified software, recorded as unknown software (1233), unspecified software (528), and other software (457). Moreover, the study revealed that, among the classified software, most RDRs were built on Dataverse (98), while Opus scored the lowest. On the same note, Khan et al. (2023) found that a large number of RDRs were built on unclassified software, that is, unknown (1207) and other (564). For the classified software, they also found that RDRs built on Dataverse took the lead (119), followed by DSpace (110), MySQL (86), and CKAN (85), while RDRs built on Opus were the fewest (2). These findings were contrary to those of Kindling et al. (2017), who found that, apart from the unclassified software, DSpace was the most used software, followed by Dataverse and CKAN, while eSciDoc was the least used. This implies that different types of RDRs have unique requirements that necessitate using particular software.
Diffusion of innovation theory
This study adopted the Diffusion of Innovation Theory (DoIT). The DoIT examines how innovations, that is, new ideas, practices, and technologies, are adopted and spread through a community (Rogers, 2006). The DoIT is composed of the following major concepts: innovators, adopters, diffusion, communication channels, time, relative advantage, compatibility, complexity, trialability, observability, and adoption critical mass. The theory explains how technologies or innovations, in this case research data and RDRs, are adopted and spread across different countries. However, the spread of these innovations may not be smooth and may be affected by other factors such as policies, technology, economics, culture and institutions. In the context of this study, the innovations are research data and research data repositories, and the adopters are the countries that decide to establish and use RDRs. Diffusion is the spread of the adoption and use of research data and RDRs in the African landscape, compatibility refers to the different standards used in RDRs, and observability relates to the visible advantages of RDRs.
Methodology
This study employed a two-stage method of data collection to achieve its objectives. The two stages were as follows:
Stage 1: selecting the data source
This study applied a quantitative research approach with a descriptive research design. Re3data and DCI were used as the data sources. Re3data is a prominent worldwide registry of RDRs, while DCI is an indexing tool used for the discoverability of the datasets indexed in the Web of Science. Moreover, re3data offers a broad view of RDRs across regions and disciplines, while DCI is a strong tool for bibliometric analysis, allowing scholars to measure the impact of datasets (Kindling et al., 2017; Onyancha, 2016). Re3data indexes repositories from almost all countries around the world. DCI also covers data worldwide, taking into account several factors such as the stability of the repository, links to the research literature, the age of the material, and language (Onyancha, 2016). The same method and data sources were also applied by Archana and Padmakumar (2023), Khan et al. (2023), Boyd (2021), Kim and Choi (2017), Kindling et al. (2017), and Pampel et al. (2013) in their studies.
Stage 2: data extraction
Data from the re3data portal and DCI were examined and extracted based on the parameters established by this study. The data extraction from the re3data portal and DCI was done in February 2024. For re3data, the parameters were repository language interface, country-wise contributions, subject-wise contributions, software used, nature of data access, data access restrictions, standards adopted, and application programming interface (API). For DCI, the parameters were year of publication, country-wise contributions, and subject-wise contributions. Data were extracted from DCI for the years 2001 to 2023. The year 2001 was chosen because most open science initiatives started in the early 2000s, while 2023 was chosen because the researchers wanted the data to be as current as possible.
Data were extracted from DCI using the names of 54 African countries. The country search tag in DCI was used in the advanced search, that is, CU = Country name (for instance, CU = South Africa). When the search tag CU = X was used, where X is the country name, the query retrieved all data records related to country X. The search was limited to the years 2001 to 2023. The results were then grouped into major subjects using the built-in analysis functionalities. In re3data, which was accessed through https://www.re3data.org, the browse and search facilities were used to access data. After navigating to the repository section using the registry web address, “Browse by Country” was selected from the browse option in the menu, which gave the list of countries with RDRs indexed in the registry. From the list, the researchers clicked on the names of African countries one by one and recorded the repository language interface, country-wise contributions, subject-wise contributions, software used, nature of data access, data access restrictions, standards adopted and API. The same method and parameters were also employed in the studies by Archana and Padmakumar (2023), Khan et al. (2023), Boyd (2021), Kim and Choi (2017), Kindling et al. (2017), Pampel et al. (2013) and Onyancha (2016). The extraction of data based on these parameters was done using the browse and search facilities, which are built-in functionalities of re3data. The extracted data were recorded in Google Sheets. Finally, the data were refined by removing unwanted information and then migrated to MS Office Excel. Data extracted from re3data and DCI were analysed separately using descriptive analysis and integrated during the discussion.
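The DCI search step described above can be sketched as follows. This is a minimal illustration assuming the searches are prepared programmatically before being entered into the advanced search interface; the study itself used the interface directly. The names `DciSearch` and `build_dci_search` are hypothetical, and the year limits are kept as separate fields because the text does not state how the year restriction was entered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DciSearch:
    """One advanced-search request: a country query plus a year window."""
    query: str
    start_year: int
    end_year: int


def build_dci_search(country: str, start_year: int = 2001,
                     end_year: int = 2023) -> DciSearch:
    """Build the 'CU = <country>' query described in the text."""
    name = country.strip()
    if not name:
        raise ValueError("country name must not be empty")
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return DciSearch(query=f"CU = {name}",
                     start_year=start_year, end_year=end_year)


# Illustrative subset only; the study searched 54 country names.
countries = ["South Africa", "Kenya", "Egypt"]
searches = [build_dci_search(name) for name in countries]

for search in searches:
    print(search.query, search.start_year, search.end_year)
```

Generating one search per country name mirrors the one-by-one procedure described above and keeps the year window identical across countries.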
Findings
This section presents the findings of this study. The findings are organised according to the research questions that guided the study.
The African country-wise contributions data sphere
Tables 1 and 2 show the distribution of the research data sphere by country. Specifically, Table 1 shows the distribution of RDRs by country, sourced from re3data, while Table 2 shows the contribution of research data, sourced from DCI. The findings show that South Africa was leading with 17 RDRs, equivalent to 41.5% of the 41 African RDRs indexed in re3data. The second position was occupied by Kenya with 6 (14.6%), and the third by Burkina Faso with 3 (7.3%). The other thirteen (13) nations contributed between one and two repositories each, as shown in Table 1. Of the sixteen (16) nations that contributed at least one RDR to re3data, eight were from West Africa, four from North Africa, three from Southern Africa, and only one from East Africa. Moreover, the West African countries contributed a combined total of twelve (12) RDRs, equivalent to 29.3% of the total African repositories; North African countries contributed four (4) repositories, equivalent to 9.8%; Southern African countries contributed nineteen (19) RDRs, equivalent to 46.3%; and the East Africa region contributed six (6) RDRs, equivalent to 14.6%. In terms of research data, the findings show that South Africa (8642) was leading, followed by Egypt (1232) and Kenya (1181). The other countries in the top ten were Tunisia (4th), Algeria (5th), Madagascar (6th), Cameroon (7th), Nigeria (8th), Morocco (9th), and Ethiopia (10th); the remaining countries are shown in Table 2. Among the ten leading countries, two were from Southern Africa, two from Eastern Africa, four from the Northern region and two from Western Africa. However, five countries were not listed in the DCI: Chad, Comoros, Sao Tome and Principe, Djibouti, and Guinea-Bissau.
Table 1. Country-wise contributions of RDRs in Re3data.
Table 2. Country-wise contributions of research data in the DCI, 2001–2023.
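The regional percentage shares reported above follow from dividing each regional count by the 41 African RDRs indexed in re3data. The short sketch below reproduces that arithmetic from the counts given in the text; the function name `percentage_shares` is illustrative, and the study itself performed this analysis in a spreadsheet.

```python
def percentage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must sum to a positive total")
    return {name: round(100 * value / total, 1)
            for name, value in counts.items()}


# Regional RDR counts as reported in the text (41 repositories in total).
rdrs_by_region = {
    "West Africa": 12,
    "North Africa": 4,
    "Southern Africa": 19,
    "East Africa": 6,
}

shares = percentage_shares(rdrs_by_region)
for region, share in shares.items():
    print(f"{region}: {rdrs_by_region[region]} RDRs ({share}%)")
```

The same calculation applies to the country-level figures, for example 17 of 41 repositories for South Africa.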
Trends of coverage of research data
Table 3 presents the trends in coverage of research data in DCI from 2001 to 2023, both worldwide and for Africa. The results show that Africa contributed an average of 0.25% of the total research data. Coverage was 27 (0.06%) in 2001 and 3059 (0.29%) in 2023, but there was no steady ascent from one year to the next, as the coverage of research data rose and fell across the years. For instance, in 2005, 2008, 2009, and 2016, African countries had a higher percentage share of the research data covered in the DCI than in the other years. In contrast, 2001, 2002, 2004, 2018 and 2019 recorded the lowest African shares of the world total. More details are shown in Table 3.
Table 3. Trends of coverage of research data in DCI, 2001–2023.
Types of African RDRs indexed in re3data
This study also inquired about the types of repositories established by African countries and indexed in re3data. The study (Table 4) found that the most common type of RDR in African countries was disciplinary, with 25 (53.2%), followed by institutional with 18 (38.3%), while other RDRs accounted for 4 (8.5%). Although disciplinary RDRs were the most numerous overall, in South Africa, the leading country in terms of the number of RDRs indexed in re3data, institutional RDRs (9) outnumbered disciplinary RDRs (8). Moreover, all the regions had both institutional and disciplinary RDRs, except for the North Africa region, where there were no institutional RDRs. Table 4 shows the details.
Table 4. Research data repository type in Re3data.
Note: It is not mandatory for the RDRs to be exclusively in one type of RDR.
The variability in research data access and upload policy for the African RDRs indexed in re3data
The study (Figure 1) found that, in terms of dataset access, the largest share of the RDRs indexed in re3data were open for anyone to access, 35 (49.3%), while 23 (32.4%) were restricted, 7 (9.9%) were closed to external members, and 6 (8.5%) were embargoed. Figure 1 further shows that, in terms of database access, 39 (95.1%) of the RDRs were open while 2 (4.9%) were closed. More details are shown in Figure 1.

Figure 1. Nature of access to RDR databases and datasets in Re3data.
Moreover, for those RDRs that were restricted, the findings revealed that most required registration, 16 (76.2%), followed by other restrictions, 10 (32.2%), institutional membership, 3 (14.3%), and fees, 2 (9.5%). On the other hand, RDRs with restricted databases required registration for external users to access them. This study also found that most of the RDRs indexed in re3data restrict external users from uploading datasets themselves, 29 (72.5%), while 9 (22.5%) are closed to uploads by external users and 2 (5%) are open for external users to upload research data themselves. The findings further show that the two repositories that allowed external researchers to upload their data were from Kenya and Egypt. See Table 5 for more details.
Table 5. RDRs nature of access to data upload in Re3data.
Note:
Moreover, the results (Table 6) show that, among the restricted repositories, 11 (31.4%) required registration while 17 (48.6%) required institutional membership for a person to be able to upload a dataset.
Table 6. RDRs data upload restriction level in Re3data.
Institutional responsibility type in Re3data
Findings (Figure 2) showed that the most common institutional responsibility type among the RDRs was general, 38 (45.7%), followed by technical support, 23 (27.7%), funding support, 21 (25.3%), and sponsoring support, 1 (1.2%).

Institutional responsibility type in Re3data.
Subject and content representation of African RDRs in the re3data
This study found that life sciences subjects, 26 (33.8%), were the most covered by RDRs indexed in re3data, followed by humanities and social sciences, 23 (29.9%), and natural sciences, 20 (25.8%), while engineering was the least covered, 8 (10.4%). In the DCI, natural sciences were the most covered (48.5%) and engineering the least covered (2.3%), as shown in Table 7.
Major subjects in DCI
The study also found that the re3data-indexed RDRs mostly held scientific and statistical data (34), followed by standard office documents (31), structured formats (22), and images (19). Other details are shown in Figure 3.

RDRs indexed in Re3data content types.
Standards adopted by RDRs in African countries
The findings indicated that RDRs indexed in re3data use various standards, as shown in Figures 4, 5, and 6. Quality management was in place in 22 RDRs, while for 17 RDRs it was unknown whether they had it. Moreover, 16 RDRs had enhanced publications, for 22 it was unknown whether they had enhanced publications, and 1 RDR was indicated as not having them. Similarly, 12 RDRs indicated that they had versioning and 5 lacked it, with no RDRs of unknown versioning status. The largest group of RDRs (17) used the digital object identifier (DOI) as their persistent identifier, 5 used Handle (HDL), and 13 did not use any persistent identifier. Regarding syndication, 2 RDRs used Atom, whereas 10 used Really Simple Syndication (RSS).

Quality management, enhanced publication, and versioning standards of the RDRs indexed in Re3data.

Persistent identifiers used by RDRs indexed in Re3data.

Syndication formats used by African RDRs indexed in Re3data.
Software used in the development of RDRs in Africa
Results in Figure 7 show that the most used software for establishing RDRs was Dataverse (5), followed by DSpace (3) and CKAN (2). However, the findings also showed that some RDRs in re3data were established with other or unknown software.

Software used in the development of RDRs in Africa from Re3data.
Application programming interface
This study showed that the most used API was REST (9), followed by SWORD (4), OAI-PMH (3), and others (1). For further reference, see Figure 8.

Application programming interface in Re3data.
Discussion
South Africa, which had a higher share of both research data and RDRs than other African countries, makes a significant contribution to research data management practices on the continent (Onyancha, 2016). This can be attributed to several reasons, including the presence of advanced infrastructural ecosystems, strong higher education systems, international collaborations, and the government's commitment to research data (Sambo, 2022). Notably, Kenya and Burkina Faso were also found to participate actively in research data practices, ranking second and third respectively in re3data. Based on their economic power, technical infrastructures, and financial resources dedicated to research and innovation, countries like Egypt, Nigeria, and Algeria were expected to do even better; however, this study found otherwise.
Therefore, the findings indicate that, besides a country's economic power, other factors such as legal and regulatory frameworks, human expertise, political will and researchers’ willingness, and the nature and extent of collaborations may need to be in place for research data practices to be fully adopted (Ng’eno and Mutula, 2018). South Africa was also leading in terms of datasets and data studies, but this time it was followed by Egypt and then Kenya. Burkina Faso, which was ranked 3rd in re3data, was ranked 18th in DCI. This finding suggests that Burkina Faso may have succeeded in establishing RDRs while lagging behind in research data sharing. It might also be that a significant portion of research data from Burkina Faso was not indexed in the DCI and was probably accessible through other platforms for reasons such as language and disciplinary focus. Moreover, the DCI recognised a further 33 countries that were not listed as contributors in re3data. This might be because re3data indexes repositories established in a particular country, while DCI indexes datasets, which are not necessarily archived in their own country's RDRs (Onyancha, 2016; Pampel et al., 2013).
Moreover, the Western African region, which contributed a smaller percentage of RDRs than the Southern African region, had a higher number of countries with RDRs registered in re3data. This may reflect a high level of research collaboration among these Western African countries, which is important for research data management practices that rely on shared human expertise and infrastructural ecosystems (Buhomoli and Muneja, 2023a). From the perspective of the DoIT, South Africa, with its numerous RDRs, can be termed an innovator and early adopter, benefiting from established legal and regulatory frameworks, supportive infrastructures, and human expertise. Kenya, Egypt, and Burkina Faso can be labelled as the early majority, countries with fewer repositories may be late adopters, while those countries that are not listed in either re3data or DCI may be termed laggards (Boyd, 2021; Rodgers, 2003). From these findings, it can be surmised that the more developed countries in Africa tend to be earlier adopters, while less developed ones tend to lag behind. The study also noted an increase in datasets from 2001 to 2023, though with ups and downs between years. Initially (2002, 2003, 2004), there was an increase in research data practices, which gave them some momentum. However, the practices then fluctuated, depending on factors such as changes in funders’ priorities, data breaches, and shifts in research paradigms. A similar trend was also observed by Onyancha (2016).
The presence of both disciplinary and institutional RDRs reflects a versatile approach that allows researchers to engage in a wide range of research data management practices (Kindling et al., 2017). This diffusion of an innovation (RDRs) reflects a dynamic landscape in which researchers have access to multiple options for sharing their data (Khan et al., 2023; Menzli et al., 2022). However, the higher incidence of disciplinary RDRs compared to institutional RDRs indicates that some countries place more weight on supporting research data sharing within specific academic disciplines. This approach can be particularly beneficial for researchers working in specialised fields, who are provided with tailored support and other resources, in line with the principles of the DoIT. On the other hand, this approach may limit interdisciplinary collaborations (Ng’eno and Mutula, 2018). The diffusion of innovation in specialised RDRs can enhance the visibility of datasets, as in most cases researchers prefer accessing disciplinary (specialised) repositories over multidisciplinary ones. This preference conforms with the theory's emphasis on relative advantage, where specialised RDRs offer unique benefits to adopters (Kim and Choi, 2017). While disciplinary repositories offer several advantages, it is essential for institutions to strike a balance between both types of repositories to promote all forms of research and research collaboration within the country (Boyd, 2021; Khan et al., 2023). The DoIT suggests that this balance is essential to accommodate the varying preferences and needs of adopter categories, from innovators seeking specialised resources to late adopters who may benefit from a more general diffusion strategy (Archana and Padmakumar, 2023; Rodgers, 2003).
Moreover, the coexistence of institutional and disciplinary RDRs in the African research data landscape represents a dynamic diffusion of innovation where countries adapt their repository strategies to meet the evolving needs of researchers (Rodgers, 2003; Tedersoo et al., 2021).
The higher number of institutions assuming the role of providing general support to RDRs aligns with the principles of the DoIT. This diffusion pattern suggests that RDRs benefit from a diverse array of support mechanisms offered by institutions (Elsayed and Saleh, 2018; Menzli et al., 2022). These encompass infrastructural, technical, financial, administrative, and other types of support, collectively driving the adoption and practices of research data sharing (Buhomoli and Muneja, 2023a). Moreover, the general responsibility type implies a wide spectrum of skills and knowledge that RDRs receive, ranging from expertise in ICTs, management, library sciences, funding, to the insights of researchers and various stakeholders (Rogers, 2006; Staunton et al., 2021). Conversely, the technical responsibility type ranking second implies that, given the newness of RDRs in most African countries, they may require technical expertise to facilitate their incubation and growth (Akmon, 2014). This technical support, as per the DoIT, may come from entities with higher expertise. Such support can encompass RDR hosting, data storage, software development, networking infrastructure, and other technical aspects necessary for the effective operation and diffusion of RDRs (Akmon, 2014; Archana and Padmakumar, 2023).
The findings also revealed a diverse mix of dataset subjects indexed in re3data. RDRs indexed in re3data cover a wide range of broad subjects, indicating a diffusion of innovation in data archiving practices across multiple domains. The higher number of RDRs related to life sciences suggests a strong presence of datasets in this field. Several factors may contribute to this phenomenon, including the size and enthusiasm of the life sciences research community, their early adoption and innovation in research data sharing, the presence of funders’ support, and their proactive publishing of data to RDRs (Gunjal and Gaitanou, 2017; Staunton et al., 2021). On the other hand, the limited representation of engineering subjects within RDRs indexed in re3data suggests room for growth in this field. Several factors may have contributed to this, such as a smaller number of scholars in engineering, less research activity, or a delayed adoption of RDR practices, potentially placing engineering in the category of late majority adopters (Pampel et al., 2013). Globally, life sciences subjects lead, followed by humanities and social sciences, natural sciences, and engineering. This trend is different from that found in African RDRs, where natural sciences were found to be ranked second. These variations can be attributed to multiple factors, including regional research priorities, the composition of research institutions, funding emphasis, collaboration dynamics, and RDR objectives (Obiora et al., 2021; Ramsay, 2022). These observations align with the DoIT, where different regions and countries may exhibit varying adopter categories based on their readiness to embrace innovations. Furthermore, the primary content of these subjects is scientific and statistical data, demonstrating a wide range of data types available for researchers to deposit in various formats.
However, the presence of data in formats requiring specialised software or expertise may pose challenges if repositories aim to transition to open research data repositories. The open data principle calls for datasets to be free from technical and legal restrictions (Buhomoli and Muneja, 2020). Thus, repositories considering a shift to open data may need to address the conversion of archived data into accessible and usable formats.
RDRs exhibit various access arrangements influenced by a range of factors. The predominance of RDRs open to external users signifies the commitment of African countries to open access and research data sharing. However, the substantial presence of RDRs with access restrictions, which has also been noted by this study, may stem from researchers’ concerns, including copyright issues, data privacy, data quality, sensitivity, and the cost of data collection (Aleixandre-Benavent et al., 2020; Katabalwa et al., 2021). RDRs falling under this category can be accessed by researchers after adhering to specific usage conditions. On the other hand, closed-access RDRs imply stricter access control, often limited to specific internal user groups with no public access (Khan et al., 2023). The presence of embargoed-access RDRs indicates that certain data are not immediately accessible, typically due to a waiting period before the data become public (Boyd, 2021). This delay aligns with the relative advantage principle, giving originators time to publish their research findings before sharing the associated data. These findings call on researchers and institutions to be aware of the different research data access policies when establishing RDRs. The findings also suggest that researchers must navigate ethical and legal implications when dealing with restricted, embargoed, and closed research data, ensuring compliance with governing rules, procedures, and agreements. Additionally, the higher number of RDRs requiring user registration enhances the repositories’ ability to track usage, communicate effectively, and tailor services to users’ needs.
The prevalence of restrictions on research data upload in most RDRs reflects a widespread commitment to moderating and controlling the data submission processes. These controls serve the purpose of ensuring compliance, maintaining data integrity, and upholding quality standards within RDRs (Buhomoli and Muneja, 2023b; Mbughuni et al., 2022). Additionally, the existence of closed RDRs that prohibit external users from depositing data underscores the strict adherence of these repositories to their standards, content guidelines, and governance policies (Akmon, 2014; Mbughuni et al., 2023; Zuiderwijk et al., 2012). Conversely, the presence of open RDRs that allow data upload encourages community engagement, promoting a more inclusive approach (Akmon, 2014). However, these repositories must also exercise caution to ensure the quality and expertise of their users concerning dataset submissions, reflecting the relative advantage principle of the DoIT.
Moreover, in this study, DSpace and Dataverse appeared to be the most popular software used in African RDRs. This could be due to their user-friendly interfaces and comprehensive features tailored to research data sharing (Boyd, 2021; Khan et al., 2023). However, CKAN's relatively lower count raises questions about its suitability for RDRs in the African context. Khan et al. (2023) showed that the choice of software to be used in RDRs depends on the specific needs and priorities of the research data management practices at hand. Mozersky et al. (2021) noted that DSpace is a robust choice for preservation and customisation, Dataverse excels in user-friendliness and metadata management, while CKAN is renowned for data discovery and open data initiatives. These findings emphasise the need for in-depth evaluations of CKAN's suitability in the African context, despite its renown as software for data discovery. The presence of a large number of “unknown” software instances indicates a potential lack of standardised software identification in RDRs (Khan et al., 2023). This is likely to impede research data sharing and collaboration, as it will be difficult to determine software interoperability (Sun et al., 2020). When viewed through the lens of the DoIT, the findings show a diverse landscape of software usage in the adoption of RDRs. Dataverse and DSpace are embraced by innovators and early adopters within the research data management community (Boyd, 2021; Zhang et al., 2015). CKAN suggests the emergence of an early majority (Boyd, 2021; Mozersky et al., 2021; Zhang et al., 2015). However, the presence of “unknown” software indicates a lack of awareness of which software to adopt, aligning with the characteristics of laggards (Kindling et al., 2017; Zhang et al., 2015).
The findings also showed that the African RDRs adopted various APIs. REST stood out as the most widely adopted API. This reflects its popularity, simplicity, and versatility in facilitating the sharing of research data (Kim and Choi, 2017). The presence of a notable number of RDRs using the OAI-PMH API signifies that many RDRs continue to rely on OAI-PMH as the standardised protocol for metadata sharing and harvesting (Borghi and Van Gulick, 2019). OAI-PMH is one of the standardised protocols that is well-established and widely used (Borghi and Van Gulick, 2019; Kim and Choi, 2017). The presence of SWORD suggests its significance as a tool for content deposition and management in digital repositories (Pampel et al., 2013). The “others” category points to the diversity in API usage, potentially encompassing more specialised or emerging APIs tailored to specific RDR needs. These findings signify that the digital library community values a mix of established and modern API technologies to enable effective data sharing, which is critical in an era where collaborative research and open access are of utmost importance. In the context of the DoIT, these findings illustrate that RDRs operate in diverse environments that are at different stages of API adoption. Innovators and early adopters gravitate towards innovative and emerging solutions like REST and SWORD, while the early majority often chooses established standards such as OAI-PMH (Al-Rahmi et al., 2019; Khan et al., 2023).
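To make the role of OAI-PMH in metadata harvesting concrete, the following is a minimal sketch of how a harvester might compose OAI-PMH request URLs. The base URL and the helper name `build_oai_pmh_url` are hypothetical and do not refer to any repository examined in this study; `Identify` and `ListRecords` are standard OAI-PMH verbs, and `oai_dc` is the Dublin Core metadata prefix that the protocol requires every repository to support. The sketch only builds the URLs and performs no network requests.

```python
from urllib.parse import urlencode


def build_oai_pmh_url(base_url, verb, **arguments):
    """Compose an OAI-PMH request URL from a base URL, a verb, and optional arguments."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

# Ask the repository to describe itself.
identify_url = build_oai_pmh_url(BASE_URL, "Identify")

# Ask for all records, with metadata expressed in unqualified Dublin Core.
list_records_url = build_oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_records_url)
```

A harvester would issue an HTTP GET request to each URL and parse the XML response; because every compliant repository answers the same verbs, a single harvester can collect metadata from many RDRs.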
This study also found that African RDRs have adopted various standards. The higher number (22) of RDRs implementing quality management practices demonstrates a commitment to maintaining and providing high-quality content (Archana and Padmakumar, 2023). However, the significant number of RDRs (17) marked as “unknown” in the aspect of quality management indicates a potential lack of transparency or documented quality assurance procedures in those RDRs (Kim and Choi, 2017). The number of African RDRs adopting enhanced publications reflects a growing interest in providing enriched, multimedia-rich content (Khan et al., 2023). Versioning, a crucial element for tracking changes in digital resources, was found to be implemented by 22 RDRs, but 15 repositories lacked this feature. It is positive to see the largest group of RDRs using DOIs as their persistent identifier (PID), as DOIs are widely recognised and provide a robust system for resource identification. However, the 5 repositories using HDL and the 13 repositories not using any PID require careful consideration to ensure interoperability and long-term access to their digital holdings (Paraskevas et al., 2013). In terms of syndication, RDRs using Atom and RSS reflect the diverse approaches to content distribution (Kindling et al., 2017). These findings emphasise the importance of standardisation, quality assurance, documentation, and metadata management in digital library repositories, as well as the need for clear communication of practices and the adoption of widely recognised standards to enhance the accessibility, usability, and preservation of digital resources.
Conclusion, recommendations, and area for further studies
This study, which was conducted within the framework of the DoIT, has illuminated various facets of the research data sphere in Africa. The coexistence of institutional and disciplinary repositories underscores the region's commitment to diverse data-sharing practices. Furthermore, the variation in access policies, ranging from open to restricted and embargoed, reflects the complex nature of data accessibility on the continent. The subjects covered by the African RDRs display a diversity of disciplines, with life sciences leading the way. However, the study also reveals disparities that suggest opportunities for growth in fields like engineering. Additionally, upload restrictions show the importance of quality control and compliance within RDRs.
This study recommends the following:
For the RDM practices to excel, African countries should foster collaboration to increase their contributions of research data and RDRs in indexing registries by improving the facilitating conditions, including the establishment of relevant legal and regulatory frameworks for the practices to be fully embraced.
African countries should aim for a balanced development of institutional and disciplinary RDRs to cater for the diverse needs of researchers.
There should be a balance between the openness, security, ethical, and compliance requirements surrounding research data and RDRs.
There should be deliberate efforts aiming at increasing the representation of datasets and RDRs from disciplines with a low number of datasets in indexing registries.
African countries should ensure with utmost determination that research data management standards are established, if possible, within the African context, and that they are followed to ensure the interoperability and quality of research data and RDRs.
The variability of software among the RDRs in African countries calls for more expertise.
The major limitation of this study is that it relied on data harvested from re3data and DCI; thus, other research data indices may yield different results. Moreover, re3data primarily categorises RDRs by the country where they are hosted or registered. Unfortunately, it does not provide a specific feature to indicate whether an RDR is managed or operated by multiple countries. Future research in this domain should focus on in-depth analyses of the factors influencing data access and data-sharing policies, and on comparisons with RDRs in other continents. The major contribution of this study lies in its comprehensive examination of the RDRs in Africa within the framework of the DoIT. It enlightens RDM stakeholders on the diverse practices and access policies within RDRs, thereby informing future policy development and fostering a sense of community among researchers and institutions engaged in research data sharing in the region. Additionally, it contributes to the body of knowledge by giving valuable insights into the evolving landscape of the research data sphere in Africa, thus building a foundation for further research and strategic decision-making to advance data-sharing practices on the continent.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
About the authors
