Abstract
Summary of the key findings
• The application of big data analytics in the healthcare industry, particularly in developing countries, faces various challenges, such as a lack of evidence for its practical benefits, inadequate skilled personnel, a lack of appropriate policy and regulatory frameworks, and the required substantial financial investment.
• The available literature focuses only on the qualitative benefits of applying big data analytics to the healthcare industry. Future studies should quantitatively assess the practical benefits and analyze the implementation costs of integrating big data analytics technologies in the healthcare sector, particularly in developing countries with competing priorities and limited resources.
• By leveraging the potential benefits of digitization of healthcare data, the healthcare industry in developing countries can harness the power of big data analytics to improve the quality of care and health outcomes.
Introduction
“Big data” refers to data that can neither be managed with traditional software or hardware nor easily handled with traditional data management tools and techniques due to their high complexity and heterogeneity.1,2,3 Big data has been classically defined based on its volume, variety, velocity, veracity, and value.4,5,6,7 Volume refers to the quantity of data, measured in petabytes, exabytes, zettabytes, or yottabytes.2,6,8,9 Variety implies that big data occur in different types and forms and have heterogeneous characteristics.10,11 Velocity is the speed at which data are generated, encompassing both static data and real-time, dynamic data.2,11,12 Veracity refers to the completeness, consistency, accuracy, reliability, credibility, and certainty of information. 13 It is usually affected by fraud, ambiguities, and missing values. 14 Veracity is determined by how accurately and consistently data are collected, and it is the most significant concern because informed decisions depend on high-quality and accurate data.2,14 Value refers to how useful the insights extracted from the data are in informing decisions and policymaking. 1 These characteristics make the traditional management and utilization of healthcare data overwhelming,14,15 not only because of their vast volume 16 but also because of their complexity, heterogeneity, and the rate at which they are generated.15,17
The field of big data analytics focuses on gaining an in-depth understanding of a massive volume of data by applying algorithms for analyzing and extracting valuable information from complex datasets.13,18,19 With the swift transition to digital technologies across various sectors, big data analytics has gained attention in recent years due to its wide range of applications. In the healthcare context, where data are highly complex and heterogeneous,2,13,20 applying big data analytics tools and techniques ensures the conversion of raw data into actionable information by discovering hidden patterns, trends, correlations, and associations within these complex datasets.2,21,22 This eventually leads to developing specific, insightful, data-driven interventions to improve overall health outcomes. 23
In developed countries, big data analytics technologies have been widely used to assist clinical decision support in targeted drug therapy, medical signalling analytics, disease surveillance, and bioinformatics.1,10,12,13,24 In contrast, big data analytics has yet to be integrated into healthcare settings in developing countries, leaving a significant percentage of healthcare data unstructured and underutilized.13,25 Exploring the opportunities and challenges of applying big data analytics in the healthcare industry in developing countries is crucial for developing effective strategies for implementation and for counteracting the related challenges.
This article reviews the concept of big data in healthcare and its sources. It points out potential applications of big data analytics technologies in the context of developing countries in terms of challenges and opportunities. It further identifies what can be implemented to address the existing challenges.
Methodology
This study used a narrative review design. A comprehensive literature search was conducted in several databases, including PubMed, ScienceDirect, MEDLINE, Scopus, and Google Scholar. Relevant references cited in recent articles were also reviewed. The literature review included all articles available in full text and written in English. Key findings were summarised and discussed.
The search strategy used the following search strings: “Big Data,” “Bioinformatics,” “Data Analytics,” “Data Mining,” “Machine Learning,” “Artificial Intelligence,” “Healthcare,” “Health Sector,” “Medical,” “Hospital,” “Health Information Systems,” “Developing Countries,” “Low- and Middle-Income Countries,” “LMIC,” “Emerging Economies,” and “Third World Countries”.
The search across the mentioned databases was done using these predefined search strings. The selected studies were critically reviewed to identify recurring topics related to the integration of big data analytics in healthcare, particularly in developing countries. These recurring topics were grouped into key thematic areas, such as sources of healthcare big data, potential applications of big data analytics, challenges of applying big data in the healthcare sector in developing countries, and mitigation measures for the application of big data analytics in developing countries. Themes were refined based on how often they recurred and on their relevance to the research objectives, ensuring a comprehensive representation of the literature’s findings.
Findings
Summary of the key findings from the review of the literature.
The sources of healthcare data
As one of the world’s largest and fastest-growing industries, the healthcare sector generates massive, complex and heterogeneous data.2,20 The generation of healthcare data is driven by record keeping, compliance with regulatory requirements, and the need to improve the quality of care. 20 These data are generated from different sources and occur in different structures and formats. 26 They include demographic characteristics, medical images, clinical notes, administrative databases, clinical databases, electronic health records, and laboratory information systems.27,28,29 Others include biometric data from wearable and smart devices, data from social media, biomarkers, and genetic data.30,31,32 Healthcare data occur in different types, from unstructured and semi-structured to structured datasets, which may also be discrete or continuous. 1 While these data are primarily stored in hard copies, especially in developing countries, there has been a recent digitization trend of healthcare data through electronic medical records and health information management systems.2,13 As the quest for evidence-based medicine continues, these vast quantities of complex data have the potential to improve the quality of care at a lower cost 22 through their wide range of insightful applications, such as clinical decision support, disease surveillance, and population health management.8,10,19,33,34,35,36
Potential applications of big data analytics in the healthcare sector in developing countries
Medical image processing and analysis
Medical image processing and imaging informatics have revolutionized disease screening and diagnosis. 12 The introduction of imaging modalities such as ultrasound, X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) has greatly improved the accuracy of diagnostic services in developing countries. These imaging modalities are widely used for screening and diagnosis of cancers,37,38,39 cardiovascular diseases,29,40 neurological conditions,41,42 and some prevailing infectious diseases such as pulmonary tuberculosis.43,44 The domain of medical imaging informatics also encompasses digitized cytopathological and histopathological images, which are used to detect different types of cancers. With the ongoing digitization of the healthcare sector in developing countries, medical image processing and imaging informatics can create a huge repository of big data, which can be used to train algorithms for computer-assisted image analysis and interpretation.2,12,19,44,45,46,47,48 This will result in the early detection of diseases and improved efficiency and accuracy, and thereby in better treatment and survival outcomes. For example, an algorithm developed from massive datasets of chest radiographs could accurately detect pulmonary tuberculosis in India, Nepal, and Cameroon.43,44 Although these breakthroughs are yet to be fully integrated into clinical practice, they have proved to be potentially valuable for increasing diagnostic accuracy and efficiency, enabling early detection of diseases in resource-limited settings.
Population health management and disease surveillance
Traditionally, disease surveillance and population health management are achieved by collecting epidemiological data to study how frequently diseases occur in a particular age group and geographical area. 12 This information is used to formulate and evaluate disease prevention interventions and provide guidelines for managing patients.12,49 The presence of laboratory-based detection tools has drastically reduced the time to accurate diagnosis in limited-resource and crisis settings, as observed during the 2014 West African Ebola virus outbreak and Zika virus epidemic.50,51,52,53 Furthermore, the availability of rapid immunologically-based diagnostic tests has ensured the near-real-time detection of emerging viral threats. 54 In addition to the increasing sophistication of these conventional disease surveillance systems, surveillance systems have rapidly developed using big data streams.55,56 These novel surveillance systems can incorporate big data streams from Internet search queries, social media data, and crowdsourcing to provide useful information about public health threats at a particular time and locale, a discipline known as public health informatics.12,55 The big streams of public or social media information can be used to predict, monitor, and diagnose different disease conditions.51,57 For example, Young et al., 2014 57 obtained a positive correlation between HIV-related tweets and HIV cases after extracting over 9800 keywords and geographical annotations that contain HIV risk words from 553,186,016 tweets from Twitter.
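The kind of association reported between social media activity and disease burden can be illustrated with a small correlation calculation. The following is a minimal sketch, assuming hypothetical per-region counts of disease-related tweets and reported cases; the region labels, the numbers, and the function name `pearson_r` are illustrative assumptions and are not taken from the cited studies, whose actual methods may differ.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-region counts (illustrative only, not real surveillance data).
tweets_by_region = {"A": 120, "B": 340, "C": 90, "D": 510}
cases_by_region = {"A": 14, "B": 39, "C": 8, "D": 61}

regions = sorted(tweets_by_region)
r = pearson_r(
    [tweets_by_region[k] for k in regions],
    [cases_by_region[k] for k in regions],
)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 indicates that regions with more disease-related tweets also tend to report more cases. Such a correlation is hypothesis-generating only and does not establish a causal relationship.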
In another study, Nambisan et al., 2015 58 used big data analytics tools to extract hidden insightful patterns from messages and tweets on social media to detect depression. They concluded that the behavioural and emotional patterns in messages depicted symptoms of depression. During the COVID-19 pandemic in December 2020, the UNDP accelerator laboratory extracted over 20,422 related tweets in four consecutive days, which were used to determine the level of misinformation about the pandemic in one of the districts in Tanzania. 59 Although the study involved only one district, it provided theoretical evidence for using big data analytics techniques to extract valuable information from social media data. In line with these ideas, and with the increase in internet use, big data analytics in population health management appears essential for real-time disease surveillance and for real-time situational awareness about public health.33,55
Clinical informatics
Unstructured clinical information such as clinical documents, laboratory investigation results, medical images, and patient discharge summaries is the major source of healthcare data, accounting for over 80% of data concerning patient health and wellbeing.12,60 Despite the rapid digitization of these data, a large portion remains unstructured and underutilized. 25 The unstructured format renders the data ineffective in informing data-driven decisions. Unstructured data must therefore be converted into structured datasets to extract valuable insights and discover trends and patterns within the data.12,61 Big data analytics tools can be effectively deployed to organize patients’ clinical data, laboratory tests, and reports into structured and computerized forms to improve data retrieval and extraction and inform clinical decision-making.60,61,62
Clinical informatics is, however, not implemented in developing countries, partly due to limited resources. If fully implemented, clinical informatics can transform the healthcare sector by ensuring maximum utilization of healthcare data through the extraction of trends, patterns, associations, and correlations within various datasets and, therefore, provide actionable information to inform evidence-based clinical practice and policy making.
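The conversion of unstructured clinical text into structured records described in this section can be sketched in a few lines. The example below is a simplified illustration, assuming a hypothetical free-text note format and hypothetical field names (`age_years`, `systolic_bp`, `diastolic_bp`, `temperature_c`). Real clinical notes are far more variable and would typically require dedicated natural language processing tools rather than simple patterns.

```python
import re

# Hypothetical patterns for a hypothetical note format (assumptions, for illustration).
PATTERNS = {
    "age_years": re.compile(r"(\d{1,3})[- ]year[- ]old", re.IGNORECASE),
    "systolic_bp": re.compile(r"BP\s*(\d{2,3})\s*/\s*\d{2,3}", re.IGNORECASE),
    "diastolic_bp": re.compile(r"BP\s*\d{2,3}\s*/\s*(\d{2,3})", re.IGNORECASE),
    "temperature_c": re.compile(r"temp(?:erature)?\s*(\d{2}(?:\.\d)?)", re.IGNORECASE),
}


def structure_note(note):
    """Extract numeric fields from a free-text note into a flat record.

    Fields that cannot be found are set to None rather than guessed.
    """
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        record[field] = float(match.group(1)) if match else None
    return record


note = "45-year-old male presenting with cough. BP 130/85, temp 38.2 C."
record = structure_note(note)
print(record)
```

Once notes are reduced to records with consistent fields, they can be aggregated and queried like any other structured dataset. Leaving missing fields as `None` keeps gaps in the source visible instead of hiding them.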
Bioinformatics
The concept of precision medicine has been gaining attention in recent years. 63 This entails tailoring preventive and curative interventions to a small group of people or even individual patients based on their genetic makeup, environmental exposure, and lifestyle.63,64,65 Bioinformatics is becoming increasingly valuable because of its vital role in achieving precision medicine. Bioinformatics is an interdisciplinary area involving molecular biology, computer science, mathematics, and statistics, focusing on the computational analysis of biological datasets.63,66 Bioinformatics has been widely used to understand disease processes at a molecular level, enabling disease risk stratification and early detection, developing individualized treatment strategies, and predicting treatment outcomes and patient response to therapy.66,67,68
The field of bioinformatics involves the analysis of a huge volume of data, which has been growing dramatically. 68 For example, a single sequenced human genome is approximately 200 gigabytes. 69 Although the availability of this enormous volume of data is essential for accurate analysis, it also poses significant analytical challenges, especially in developing countries where big data analytics and cloud computing tools have not been effectively integrated.66,68 Furthermore, the heterogeneous nature and the ethnic and geographical variations of these data increase the complexity of their analysis. 68 Applying big data analytics in organizing and analyzing biological data will ensure the maximum realization of the power of bioinformatics in disease risk stratification, targeted therapy, disease prevention, treatment, and cure.
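To give a sense of scale, the per-genome figure quoted above can be turned into a rough storage estimate for a sequencing cohort. This is a back-of-the-envelope sketch: the cohort sizes are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and the estimate ignores compression, replication, and derived files.

```python
GENOME_SIZE_GB = 200  # approximate size of one sequenced genome, as quoted in the text


def cohort_storage_tb(n_genomes, genome_size_gb=GENOME_SIZE_GB):
    """Raw storage for n_genomes in terabytes (decimal units, 1 TB = 1000 GB)."""
    if n_genomes < 0:
        raise ValueError("n_genomes must be non-negative")
    return n_genomes * genome_size_gb / 1000


# Hypothetical cohort sizes, chosen only to illustrate the scaling.
for n in (100, 10_000, 1_000_000):
    print(f"{n} genomes need about {cohort_storage_tb(n)} TB of raw storage")
```

Because storage grows linearly with cohort size, even modest national sequencing programmes quickly exceed what a single facility can hold, which is why cloud and distributed storage are discussed as enablers.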
Challenges to implementation of healthcare big data analytics in developing countries
Despite the potential benefits and opportunities for big data analytics in the healthcare sector, developing countries still lag in adopting it. 13 With the substantial repository of available data, it becomes difficult to understand which data will be used and for what purpose. 70 The healthcare sector also lacks appropriate infrastructure, such as storage facilities and a reliable internet connection, which are essential for the management and analysis of big streams of data.19,71,72 Despite the ongoing digitization of healthcare data, most patient-related data are paper-based, posing challenges of inconsistency, questionable validity, incompleteness, and unreliability, and rendering them incompatible with big data analytics tools.6,73 Therefore, healthcare systems must be redesigned using distributed data processing.74,75
The resistance to incorporating digitized healthcare systems15,76 and the need for massive investment 77 make it more challenging to utilize big data technologies in developing countries with existing competing priorities.27,78,79 Studies have further shown that due to a lack of technical knowledge about the best algorithms to be used 70 and the unavailability of trained health data scientists and big data managers,22,28,72,79 most developing countries are far from realizing the potential benefits of big data analytics in the healthcare sector. Another big concern that has been raised regarding the application of big data analytics in the healthcare sector is the processing of information without human supervision, which can lead to erroneous conclusions and false-positive associations.80,81,82 Therefore, it becomes imperative to develop simple, convenient, and transparent big data analytics algorithms that can be utilized for real-time cases. 2 Integration and interoperability challenges are also among the setbacks for applying big data analytics in healthcare industries.3,73,75 This might be because healthcare data are fragmented,70,83 arise from different sources, 28 and occur in different forms and structures. 26 Lastly, data security, patient privacy, and confidentiality are paramount. Integrating big data analytics in the healthcare sector may involve sharing patients’ data between stakeholders and facilities.76,81 With the lack of data protocols and regulatory frameworks, informed consent and privacy have become critical areas of concern.25,78
Discussion
This review provides a comprehensive narrative of the opportunities and challenges of adopting big data analytics in healthcare, specifically in developing countries. It bridges a critical knowledge gap by synthesizing existing literature and highlighting potential applications and obstacles. It further identifies unique issues such as data fragmentation, regulatory challenges, and limited infrastructure, particularly prevalent in developing countries, while proposing solutions for healthcare transformation through big data. These insights are crucial for policymakers, healthcare practitioners, and researchers advancing digital health in resource-limited settings.
The concept of big data encompasses various definitions, ranging from data that is challenging to handle using conventional analytical tools and methods to data characterized by immense volume, high speed, diverse variety, and varying accuracy.1,2,3,11 While most studies defined big data based on the mentioned characteristics, one study characterized big healthcare data as having energy and lifespan. 3 The lack of a consensus definition indicates that, although the term “big data” is widely known, its objective definition is still unclear. This may affect the implementation of these technologies, since there is no objective criterion for which data really count as “big data”. Furthermore, defining “big data” by the fact that it cannot be analyzed with traditional data analytics software is somewhat subjective, because the sophistication and capacity of this “traditional” software vary according to the level of technological advancement in a particular region. Moreover, it has not been objectively stated which software is considered “traditional”. These controversies must be addressed to facilitate the adoption of big data analytics technologies in developing countries.
It has also been demonstrated that big data analytics techniques are distinct from traditional data analysis approaches. While conventional analysis typically utilizes deductive reasoning to monitor care and quality outcomes retrospectively, big data analytics relies on inductive reasoning for prospective data analysis. 77 This approach generates hypotheses rather than testing them, identifying associations and correlations within observational data without focusing on the causal relationships between variables. 13 Nonetheless, validating these hypotheses through hypothesis testing is crucial before implementing the findings into clinical practice. The implementation of big data analytics in the healthcare sector should, therefore, focus on testing and validating hypotheses rather than just generating them.
Literature on applying big data analytics in the healthcare industry revealed that big data analytics technologies could be helpful in clinical decision support, personalized medicine, and disease surveillance. Big data analytics techniques can be integrated into different healthcare domains, including medical image analysis and imaging informatics,38,39,40,41,42,43,44,45 population health management,55,56,57,58,59 clinical informatics,60,62 and bioinformatics.66,68,71 Integrating big data technologies in healthcare can enhance the quality of care and identify high-risk patients early through real-time analytics, optimising clinical operations and saving lives at a lower cost.1,13,84 Studies on applying big data analytics technologies in cardiovascular diseases, diabetes, oncology, infectious diseases, mental health, and clinical research have shown that they can improve timely care delivery and reduce cost.22,52,55 A few studies, however, suggested that it is essential to incorporate human judgment and supervision in interpreting insights obtained through big data analytics algorithms.80,81,82 This is necessary to mitigate the potential occurrence of adverse events that may arise from relying solely on the results generated by big data analytics algorithms. Despite the qualitative description of the application of big data analytics technologies in healthcare in developing countries, none of the literature reviewed here presented a quantitative assessment of the practical impact, cost implications, and operational efficiency. We believe that assessing the impact and cost of integrating big data technology in healthcare through large-scale implementation studies would provide strong evidence for large-scale adoption. Therefore, future research should focus on assessing the quantitative impacts of big data analytics technologies in the healthcare sector and their cost implications. This will better inform decision-makers on integrating big data technologies in the healthcare sector, especially in developing countries with limited resources.
Despite the significant value that big data analytical tools can add to the healthcare industry in developing countries, 85 the adoption of this technology has been limited due to various challenges. These include inadequate IT infrastructure, high implementation costs, data privacy and security concerns, fragmented data ownership, lack of sustainable and reliable internet connection, and technical challenges such as data quality and multidimensionality.19,79 Additionally, the lack of evidence regarding the practical benefits of big data analytics in healthcare, the shortage of skilled big data analysts with healthcare knowledge, and the complexity of the analytical systems have also contributed to the limited integration of this technology.22,28 Measures proposed to harness the potential of big data analytics technologies in the healthcare sector in developing countries include establishing proper governance and regulatory frameworks, promoting health information exchange, training key health personnel, developing simple and transparent big data systems, utilizing cloud storage and distributed data processing, and strengthening IT security.19,76,83,86,87
Recommendations
Several challenges exist when applying big data analytics in the healthcare sector, particularly for developing countries. The following measures might be useful in addressing such challenges, where applicable:
• Provision of necessary training in big data analytics to health professionals: To extract meaningful insights and valuable information from large sets of healthcare data, health professionals should be trained in big data analytics competencies. 13 This training is critical because incorrect analysis and interpretation could lead to unforeseen outcomes. 19 This can be achieved by introducing optional modules within academic curricula, introducing short courses on the application of big data analytics in healthcare for health professionals, and increasing the awareness and engagement of health professionals in understanding the potential applications of big data analytics in the healthcare sector, especially in developing countries.
• Implementing big data governance as well as policy and regulatory frameworks: Inadequate management of healthcare data and a lack of regulatory and policy frameworks for big data analytics result in significant expenses related to big data infrastructure. 35 Implementing appropriate governance will lead to effective utilization of healthcare data, provided that proper policy and regulatory frameworks are in place.76,79
• Creating a data-sharing culture: Developing an information-sharing culture between stakeholders and facilities in developing countries is crucial in addressing the interoperability challenges and maximizing the predictive and prescriptive potentials of big data analytics in the healthcare industry.26,83 By sharing consolidated healthcare data, stakeholders can effectively utilize these resources to improve health outcomes.
• Implementing data security measures: To address the concern of data security, measures such as robust data encryption, source data verification, data access control, authentication, anonymization, and de-identification can be effectively used to ensure data security and confidentiality.86,87 These measures are crucial in protecting sensitive information from unauthorized access.
• Incorporating cloud computing in healthcare data storage: To address the enormous volume of healthcare data, healthcare organizations in developing countries can incorporate cloud computing in their data storage and management strategy. 13 This would allow even small healthcare organizations to overcome storage challenges and reduce data storage-related costs. 19 Integration of cloud computing in data storage and management will enable healthcare organizations in developing countries to store and manage large volumes of healthcare data efficiently.
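As an illustration of the de-identification measure mentioned among the data security recommendations, the following minimal sketch removes direct identifiers from a patient record and replaces the patient identifier with a keyed pseudonym. The field names, the `DIRECT_IDENTIFIERS` set, and both function names are hypothetical assumptions. A real deployment would also need key management, access control, and an assessment of indirect identifiers such as rare diagnoses or precise locations, which this sketch does not cover.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers (an assumption).
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymize_id(patient_id, secret_key):
    """Derive a stable pseudonym from a patient ID using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so records can still
    be linked, but the original ID cannot be read back from the pseudonym.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record, secret_key):
    """Return a copy of the record without direct identifiers or the raw ID."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize_id(str(record["patient_id"]), secret_key)
    return cleaned


# Illustrative record and key; in practice the key must be stored securely.
example_key = b"replace-with-a-securely-stored-secret"
example_record = {
    "patient_id": "TZ-000123",
    "name": "Jane Doe",
    "phone": "0700000000",
    "address": "Example Street 1",
    "diagnosis": "pulmonary tuberculosis",
    "age_years": 45,
}
shared_record = deidentify(example_record, example_key)
print(sorted(shared_record))
```

Using a keyed hash rather than a plain hash means that someone who knows the identifier format cannot regenerate the pseudonyms without the secret key. This matters when de-identified data are shared between facilities.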
Limitations
The exclusion of non-English literature from the data synthesis of this review might have limited the extent of potential information included in this study, and hence its discussion, conclusion, and recommendations. Moreover, the study might be limited by the inconsistent availability and quality of data across different developing countries. Data fragmentation and lack of standardization may have led to incomplete or inaccurate data synthesis. Furthermore, this study’s findings and conclusion might be generalizable to only some developing countries due to possible differences in healthcare infrastructures, policies, and socio-economic conditions.
Conclusion
Integrating big data analytics in developing countries’ healthcare industry can open new avenues for enhancing healthcare delivery. By leveraging the potential benefits of digitization of healthcare data, the healthcare industry in developing countries can harness the power of big data analytics to improve quality and outcomes while reducing costs. Big data analytics’ predictive nature and pattern recognition capabilities can revolutionize the healthcare sector’s quest for transformation from an experience-based approach to evidence-based practice. To realize these benefits, the strategies highlighted above should be implemented to address the challenges these countries encounter in using this technology.
Moreover, political will is essential to ensure the sustainability and effectiveness of big data analytics technology in developing countries. Furthermore, since most of the currently available literature focuses on the qualitative benefits of applying big data analytics in the healthcare industry in developing countries, this review emphasizes the need to quantitatively assess the practical impacts and implementation costs of integrating big data technology in the healthcare sector, where resources are limited and priorities are competing.
Footnotes
Acknowledgements
We acknowledge Professor Muhammad Bakari for reviewing the first draft of the manuscript and providing constructive comments.
Author contributions
Project Supervision and Administration: Harold L. Mashauri. Study conceptualization and design: David Muhunzi, Lucy Kitambala and Harold L. Mashauri. Acquisition of data: David Muhunzi, Harold L. Mashauri and Lucy Kitambala. Analysis and interpretation of findings: David Muhunzi. Drafting of the manuscript: David Muhunzi. Critical review: Lucy Kitambala and Harold L. Mashauri. Approval of manuscript: David Muhunzi, Harold L. Mashauri and Lucy Kitambala.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of generative AI and AI-assisted technologies in the writing process
No generative AI or AI-assisted technology was used in the preparation of this manuscript.
Data availability statement
Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.
