Abstract
Objective
The relationship between health data quality and disease management is of utmost importance. High data quality enhances decision making, minimizes medical errors, and boosts efficiency in disease management. Hence, the present study maps the co-occurrences and the relationship between health data quality and disease management.
Methods
The present study employed a scientometric approach known as co-occurrence analysis to analyze data extracted from the Web of Science. We aimed to investigate the association between health data quality and disease management, encompassing all documents produced in this field from 1991 to September 2023. The data were scrutinized and recorded on 1 October 2023. Excel 2019 was used to generate graphs and tables, while VOSviewer scientometric software was used to create co-occurrence maps.
Results
The top three countries with the highest scientific production in this field were the United States, England, and Australia. The study also uncovered three clusters centered on data quality, management, and public health. Co-occurrence maps drawn from the study showed that accuracy, coverage, and completeness of health data are related to the provision of healthcare and therapeutic interventions, as well as treatment outcomes. Furthermore, management was found to be associated with prevention, epidemic, prevalence, disease, and mortality rate. The study also revealed that data quality is linked to evaluation, validity, reliability, public health, and quality of life.
Conclusion
It is clear from this research that accuracy, coverage, and completeness are crucial characteristics of data quality. These factors play a key role in the management of prevention, epidemics, and disease outbreaks, as well as mortality rates. Evaluating data quality is also important, as it can have a positive impact on quality of life, public health, and health information technologies. This study has the potential to be very useful in improving the quality of health data and the performance of clinical informationists and, in turn, promoting more effective disease management.
Introduction
The enhancement of service delivery and the formulation of strategies must be backed by dependable health data that generate a coherent body of evidence regarding health conditions. To maintain the data's trustworthiness, a data quality management process must be implemented. 1 One of the main factors that negatively affect health decision making is the low quality, unavailability, and lack of integrity of health data. These factors lead to data quality issues, which, in turn, cause information loss. Despite their large volume, the data remain decentralized when they should be centralized to aid the decision-making process.2,3
Data plays a critical role in various domains and sectors such as healthcare, finance, social media, transportation, scientific research, and e-commerce. Each type of data has its own challenges and requirements in terms of quality, standardization, and privacy. 4
In the healthcare sector, data heterogeneity is related to the diverse and complex information produced by health services and research. Medical language is highly heterogeneous and constantly evolving, making it sometimes ambiguous. Process automation generates a huge amount of data, and new technologies continue to emerge, adding to the already complex nature of health data. 5
There exist diverse perspectives on the characteristics of high quality data. According to the World Health Organization, quality data must reflect the information provided by the primary source and incorporate accuracy and validity, reliability, completeness, readability, timeliness and punctuality, accessibility, usefulness, confidentiality, and security. 6 The quality of data can be undermined at various stages, including collection, coding, or nonstandardization of terminologies by clinical informationists.7–10 Technical, organizational, behavioral, and environmental factors can impact the quality of data. 11
Health information systems are important for improving healthcare systems in countries, especially those with low or middle incomes. However, despite the attention given to these systems, it is believed that their data is of low quality and does not contribute much to decision-making processes.7,12,13
A disease management approach to patient care involves the coordination of resources across the healthcare system. This includes patient education, the use of practice guidelines by healthcare providers, appropriate counseling, and the provision of medications and ancillary services. All of these components work together to form an effective disease management approach. 14 The integration of information into care can yield numerous advantages for both healthcare providers and consumers. By facilitating the exchange of more precise and timely information, it can enhance operational efficiency, reduce redundant tasks, and improve decision making. The comprehensive sharing of precise and accurate information, such as during the transition of care, is imperative to ensure the uninterrupted and secure provision of patient care across primary and acute healthcare services. Governments have also acknowledged the significance of the effective use of clinical information systems and electronic decision support tools to gather, disseminate, and use information to guide ongoing health reform, develop policies, and create strategic work plans that aim to implement safe, efficient, and coordinated care throughout the entirety of the patient's journey within the healthcare system. 15
Computational and visual analytic techniques facilitate systematic scientometric reviews, enhancing the accessibility, reproducibility, and timeliness of research literature studies. 16
A scientometric overview of a research area serves as a crucial foundation for conducting systematic reviews, especially in cases where relevant and current reviews are lacking. The integrity of the input data is vital for ensuring the overall quality of subsequent analyses and reviews. This issue becomes even more critical when we need to select specific subsets of articles from a vast pool of available research or when we aim to narrow the focus of a study to particular disciplines instead of leaving it open to all fields. By prioritizing high-quality data, we enhance the reliability and relevance of our research outcomes. 16
An increasingly popular trend is the rise of scientometric evaluations aided by science mapping tools. These tools take a collection of bibliographic records within a specific research domain and provide a comprehensive overview of the underlying knowledge landscape. One such tool is VOSviewer. 17 VOSviewer has been used to conduct systematic evaluations on topics like citizen science and the intersection of climate change and medical sciences. A scientometric overview of a research field provides valuable information for conducting systematic evaluations, especially when relevant and current systematic reviews may not be easily accessible. Science maps are important because they offer a wide-ranging context for a particular research interest. 18 The purpose of this study is to conduct a scientometric review of health data quality, with a specific focus on its use in disease management.
Methods
The current study was conducted using a cross-sectional descriptive method with a scientometric approach. The data required for the study were collected from the Web of Science (WOS) from 1991 until the end of September 2023, using a census-type collection method.
In this study, only the WOS database was chosen because the authors aimed to select the highest quality and most relevant research.
The database includes several citation indexes, such as SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, BKCI-S, BKCI-SSH, and ESCI. To collect the data, the following search strategy was used: “Results for ((TS = (“data quality” OR “information quality”)) AND TS = (health OR med*)) AND ALL = (Disease manag*) and Health Care Sciences Services or Medical Informatics or Health Policy Services (Web of Science Categories)”.
The inclusion criteria for this review focused on articles published in Web of Science from 1991 to 2023. These articles were identified using specific keywords from the search strategy. However, records such as letters to the editor and other non-article formats were excluded based on the established exclusion criteria.
The 521 retrieved documents were saved using the storage section of the Web of Science website. The records were stored in plain text format in packages of up to 500 records, and all packages were then combined into a single file. To analyze the findings, the analysis section of the WOS was used first. Then, Excel 2019 was used to draw tables. To draw science maps and co-occurrence maps and to determine scientific clusters and newly formed co-occurrence clusters, VOSviewer 1.6 was used. 19
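The step of combining the exported packages into a single file can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: it assumes the usual layout of WOS plain-text exports, in which each record starts with a `PT ` line and ends with a line containing only `ER`, and the function names and the header written to the merged file are hypothetical.

```python
from typing import Iterable, List


def split_records(text: str) -> List[str]:
    """Split one plain-text export into individual records.

    Assumption: every record starts with a 'PT ' line and ends with a
    line that contains only 'ER'. File-level lines (FN, VR, EF) are skipped.
    """
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("PT "):
            current = [line]            # a new record begins
        elif line.rstrip() == "ER":
            if current:                 # close the record in progress
                current.append("ER")
                records.append("\n".join(current))
            current = []
        elif current:
            current.append(line)        # field or continuation line
    return records


def merge_exports(batches: Iterable[str]) -> str:
    """Concatenate the records of several export packages into one file."""
    records: List[str] = []
    for text in batches:
        records.extend(split_records(text))
    # Assumed file header and end-of-file marker; adjust to the actual export.
    header = "FN Clarivate Analytics Web of Science\nVR 1.0\n"
    return header + "\n\n".join(records) + "\nEF\n"
```

With 521 documents and packages of up to 500 records, two exports would be passed to `merge_exports`, and the result written to a single text file for loading into the mapping software.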
Network visualization algorithms were utilized to create the scientific map, employing default values to illustrate layers and clusters in VOSviewer, with a scale of 1 and circular shapes. Additionally, cartographic maps were generated.
In the visualization, items were weighted by the co-occurrence of subjects, so that circle size varied with weight. The Max. length option was set to 30, a Min. strength setting was applied, and the association strength method was used to normalize the network visualization. Before the data were reviewed, irrelevant records were removed by screening the data.
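The association strength normalization mentioned above can be illustrated with a short sketch. It assumes the common proportional form, in which the co-occurrence count of two items is divided by the product of their total co-occurrence weights; the exact scaling constant applied internally by VOSviewer is not reproduced here, and the function name and the counts in the usage note are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def association_strength(cooc: Dict[Pair, int]) -> Dict[Pair, float]:
    """Normalize raw co-occurrence counts by association strength.

    Assumed form: s_ij = c_ij / (w_i * w_j), where w_i is the total
    co-occurrence weight of item i summed over all of its links.
    """
    totals: Dict[str, int] = {}
    for (a, b), count in cooc.items():
        totals[a] = totals.get(a, 0) + count
        totals[b] = totals.get(b, 0) + count
    return {
        (a, b): count / (totals[a] * totals[b])
        for (a, b), count in cooc.items()
    }
```

For example, with illustrative counts of 4 for (accuracy, completeness), 2 for (accuracy, mortality), and 2 for (completeness, mortality), the totals are 6, 6, and 4, so the first pair receives 4 / (6 × 6) and the second 2 / (6 × 4). The normalization thus corrects for the fact that frequent terms co-occur with many other terms simply because they are frequent.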
Co-occurrence results offer valuable insights into the frequency with which different health concepts—such as diseases, medications, and procedures—are associated in patient data. By revealing these connections, we can significantly improve our understanding of health data quality in disease management. Utilizing this knowledge not only enhances data quality assessments but also aids in identifying potential errors and refining disease management strategies, ultimately leading to better patient outcomes.
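For bibliographic records such as those analyzed here, co-occurrence frequencies of this kind can be tallied from the keyword lists of the documents. The sketch below is a minimal illustration under stated assumptions: each document is represented as a list of keywords, keywords are matched after lowercasing and trimming, and a pair is counted once per document. The function name and the sample keywords are hypothetical and are not drawn from the study's data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def count_cooccurrences(docs: Iterable[List[str]]) -> Counter:
    """Count how many documents each keyword pair appears in together."""
    counts: Counter = Counter()
    for keywords in docs:
        # Normalize and de-duplicate so a pair counts once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Illustrative input only.
sample = [
    ["Data quality", "Accuracy", "Mortality"],
    ["data quality", "accuracy"],
    ["Accuracy", "Public health"],
]
pair_counts = count_cooccurrences(sample)
```

Sorting the keywords before pairing means each pair is stored under a single key, so ("accuracy", "data quality") is never split across two orderings.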
All visual representations in this study were created using VOSviewer.
It should be noted that the examined samples were published scientific documents such as research articles and reviews. Therefore, the present study raised no ethical considerations.
Results
Figure 1 showcases the collaborative efforts of authors in the subject of health data quality, a crucial factor in disease management. Upon examination of the co-authorship map, two distinct clusters are evident. The first cluster, shaded in red, includes 10 active authors, including Jawn, Loide t, Zaman, El Areifeen, and Kong. The second cluster, shaded in green, is comprised of 7 main authors, including Kimberly, Avinash, and Claudia, among others, who have worked together to advance research in this field.

Co-authorship network in health data quality.
By analyzing this network, one can gain insights into the pattern of collaborations among these authors. The network can help identify the most significant authors in this field based on the number and strength of their collaborations. Overall, this network provides a comprehensive overview of the collaborations and connections among the authors in this field, which can be useful for researchers, practitioners, and policymakers alike.
Figure 2 displays the leading countries in producing scientific documents related to health information quality for disease management.

Top countries producing scientific documents in the field of health data quality.
The data is organized according to the authors’ affiliations, and to effectively manage collaborations across multiple countries, the contributions of all authors were fully counted based on their affiliations.
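The full counting described above can be sketched as follows. The sketch assumes that each country receives one full credit for every document on which at least one author is affiliated with it, regardless of how many other countries share that document; the function name and the sample affiliations are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def full_count(doc_countries: Iterable[List[str]]) -> Counter:
    """Full counting: one credit per country per document."""
    counts: Counter = Counter()
    for countries in doc_countries:
        # A country is credited once per document, even when several
        # authors of that document share the same country.
        for country in set(countries):
            counts[country] += 1
    return counts


# Illustrative affiliations only.
sample_docs = [
    ["USA", "England"],
    ["USA", "USA", "Australia"],
    ["Iran"],
]
country_counts = full_count(sample_docs)
```

Under full counting, the country totals can sum to more than the number of documents, because an internationally co-authored paper is credited in full to every participating country.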
It is evident from the diagram that the United States holds the first position, followed by England and Australia in second and third place, respectively. Notably, among Asian countries, China and Iran appear in the top 12, with China ranked fourth and Iran ranked eleventh. South Africa is the only African country that appears in the top 12 countries producing scientific documents in this field. Because of the significant disparity between these 12 countries and the others, this study focused exclusively on these 12 countries.
Figure 3 shows the collaboration between countries regarding the quality of health data, which is effective in controlling diseases. The visual representation reveals two main clusters. The first cluster includes Iran, France, Spain, Italy, Portugal, and Greece, while the second cluster shows the collaboration between the United States, England, Sweden, Bangladesh, and several other countries. Both clusters consist of countries actively involved in improving the quality of health data.

Cooperation between countries in health data quality.
Figure 4 outlines the evolution of the field of health data quality over time. Before 2015, the most critical domains related to data quality included mortality, risk, management, accuracy, disease, and quality of life. However, from 2017 to 2019, the focus shifted to outcomes, care, prevalence, diagnosis, and other related spheres. Subsequently, from late 2019 through the start of 2020 and beyond, the field of health data quality has exhibited a more pronounced interest in information quality, public health, data accuracy, and COVID-19. This shift in focus is attributed to the COVID-19 epidemic, which has brought about significant changes in the field of data quality.

Co-occurrences between data quality and health management.
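A time-based view like the one in Figure 4 is commonly obtained by assigning each keyword the average publication year of the documents in which it occurs. The sketch below illustrates that calculation under the assumption that Figure 4 is an overlay map of this type; the function name and the sample records are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_year_per_keyword(
    records: Iterable[Tuple[int, List[str]]]
) -> Dict[str, float]:
    """Average publication year of the documents containing each keyword."""
    years: Dict[str, List[int]] = defaultdict(list)
    for year, keywords in records:
        # Normalize and de-duplicate keywords within one document.
        for kw in {k.strip().lower() for k in keywords if k.strip()}:
            years[kw].append(year)
    return {kw: sum(vals) / len(vals) for kw, vals in years.items()}


# Illustrative records only: (publication year, keywords).
sample_records = [
    (2013, ["mortality", "accuracy"]),
    (2015, ["mortality"]),
    (2021, ["COVID-19", "accuracy"]),
]
avg_years = mean_year_per_keyword(sample_records)
```

A keyword with a low average year belongs to the earlier phase of the field, while a keyword with a high average year, such as one tied to COVID-19, reflects more recent attention.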
Figure 5 illustrates the key co-occurrences associated with health data quality. The analysis identifies three main clusters within this topic. The findings indicate that the most significant co-occurrences in the field of health data quality include management, prevalence, epidemiology, care, health, disease, mortality, and diagnosis.

The most important co-occurrence with health data quality.
The first cluster formed around the concept of data quality. This cluster primarily deals with concepts related to healthcare, HIV, and care. The data quality criteria that have been addressed in this cluster are accuracy, completeness, and coverage. These criteria are considered to be very important in the relationship between data quality and disease management.
On further analysis of this cluster, we can observe that the mentioned data quality criteria have the greatest impact on disease management. Accuracy is essential to ensure that the data being collected is precise and free from errors. Completeness of data is necessary to ensure that all relevant information has been recorded, and there is no missing data. Coverage of data is important, too.
Therefore, it can be concluded that the three data quality criteria mentioned in this cluster play a vital role in disease management. By ensuring that data quality is maintained at all times, healthcare professionals can make informed decisions and provide better care to patients.
The second cluster focuses on management-related topics that are essential for effective disease control and epidemic management. The cluster encompasses a range of concepts related to disease, diagnosis, mortality, and prevention, all of which are vital considerations for ensuring effective control measures. Importantly, this cluster does not include data and information quality concepts, as its focus is primarily on management and its impact on disease control and prevention efforts. Overall, this cluster provides valuable insights into the complex interplay between management, disease control, and epidemic management, and highlights the importance of effective management in mitigating the impact of disease outbreaks.
The third cluster consists of nine items, which are grouped based on the quality of information. In the current era, heavily influenced by the COVID-19 pandemic, this cluster highlights the importance of information quality in various areas such as cancer, COVID-19, public health, and more. Moreover, mobile health is another crucial concept that falls under this cluster, emphasizing the significance of information quality in this field. Hence, the quality of information in cancer, COVID-19, public health, and other similar fields is critical, and the importance of information quality in mobile health cannot be overstated.
Discussion
Among the 12 most productive countries in this field, the United States ranked first, while China and Iran ranked 4th and 11th, respectively.
There are two main clusters of countries that show cooperation. The first cluster includes Iran, France, Spain, Italy, Portugal, and Greece. The second cluster represents cooperation between the United States, England, Sweden, Bangladesh, and several other countries.
Prior to 2015, the areas of greatest importance to data quality included mortality, risk, management, accuracy, morbidity, and quality of life. The findings from that era suggest that data quality played a vital role in disease management by reducing mortality and risk while increasing quality of life. Notably, accuracy is the most crucial of the different dimensions of data quality. Chen et al. 20 reviewed data quality assessment methods in public health information systems.
In that review, the most important dimensions of data quality in public health systems were completeness, accuracy, and timeliness. In the current study, accuracy was likewise highlighted as one of the crucial concepts before 2015. Similarly, in Mashoufi et al.'s 21 study, accuracy was mentioned as one of the most important dimensions of data quality in emergency medicine.
There are several reasons why the findings of these two studies are not fully similar to those of the present study. One reason is that the two studies had different objectives: the present study examined the quality of data that are effective in disease management, while the other two studies examined the quality of data within disease management.
Between 2017 and 2019, there was a shift in focus towards outcomes, care, prevalence, diagnosis, and other related areas. This indicates that the most critical role of data quality during this time was in disease management for patient care, accurate diagnoses, and reducing the prevalence of diseases. This finding was also reported by D'Ambrosio et al., 22 whose study highlights the significance of data quality in issues related to disease prevalence. Other studies that have pointed out the importance of data quality in prevalence include O'Bryan et al. 23 and Harsh Vivek Harkare et al. 24 Studies such as Dixon et al. 25 and Chen et al. 20 mentioned the existence of a relationship between data quality and diagnosis.
From late 2019 to early 2020 and beyond, interest in the field of health data quality grew notably, particularly in relation to public health, data accuracy, and COVID-19. This shift in focus may be attributed to the impact of the COVID-19 pandemic. As evidenced in a study by Cristina Costa-Santos et al., 26 COVID-19, as the most significant epidemic of our time, has brought to the fore the importance of data quality in public health management. Accuracy represents the most fundamental dimension of data quality, and it can be argued that its role in the management of the disease is equally critical. Cristina Costa-Santos et al. 26,27 explored the role of data quality in the context of COVID-19, and their findings highlight the crucial role of accurate data in managing the pandemic.
The studied area is characterized by the presence of three co-occurring clusters, the foremost of which centers on data quality. This cluster is of paramount importance in the context of disease management. Within this cluster, three distinct criteria are identified as being critical: accuracy, completeness, and coverage. These criteria are highly significant in ensuring that the data collected is complete, accurate, and reliable.
In Behrouz Ehsani-Moghaddam et al.'s 28 study, accuracy and completeness were identified as important criteria for evaluating data quality in the field of health. Similarly, in the present study, these criteria were also found to be significant. Their study did not investigate the coverage criterion, which may be attributed to the difference between the aims of the current study and Ehsani-Moghaddam's study. Likewise, David Overman and Clarke 29 state in their book that accurate and complete data are the foundation of any database. Without such data, the database cannot be trusted to provide accurate information to its users and participants.
The second cluster focuses on management for effective disease control and epidemic management.
Cluster three comprises nine items that are grouped based on the quality of information. This cluster is heavily impacted by the COVID-19 pandemic and emphasizes the significance of information quality in different domains, including cancer, public health, and COVID-19. Additionally, mobile health is another crucial concept included in this cluster that underscores the importance of information quality in this field. After the COVID-19 pandemic, many articles discussed the relationship between data quality and COVID-19. This shows the importance of this keyword and supports the findings of the present study. Examples include the studies by Ferrari et al., 30 Ayele W et al., 31 and Ngueilbaye et al. 32
Conclusion
The study area is divided into three clusters. The first cluster focuses on data quality, while the second cluster deals with effective management of diseases and epidemics. The third cluster has nine items related to information quality, which have been significantly impacted by the COVID-19 pandemic. The third cluster also highlights the importance of mobile health and emphasizes information quality in this field.
For future research endeavors, it is advisable to conduct a comprehensive study that focuses on the scientometric assessment of data quality within the health domain, leveraging the PubMed and Scopus databases. Moreover, it is important for health managers to utilize insights derived from similar studies to inform and enhance health policy decisions.
In managing health records, it is important to prioritize the accuracy, coverage, and completeness of data. By ensuring that all relevant information is properly recorded and maintained, healthcare professionals can make more informed decisions about patient care and treatment options. This is especially crucial when dealing with serious illnesses or chronic conditions, where even small oversights or inaccuracies can have a significant impact on the outcome. It is therefore vital that healthcare organizations invest in the tools and resources needed to keep their data quality at an adequate level and to obtain the insights required to provide the best possible care to their patients. Limitations of this study include reliance on a single database (WOS), the exclusive use of co-occurrence analysis, and potential bias from the search strategy.
Footnotes
Acknowledgements
Not applicable.
Ethics approval and consent to participate
This research is part of a thesis approved by the Ethics Committee of Isfahan University of Medical Sciences under code “IR.MUI.NUREMA.REC.1402.025”.
Consent for publication
Not applicable.
Informed consent
None.
Contributorship
AR and SS conceived the study, prepared the analysis plan, conducted the analysis, and prepared the draft manuscript. HBH, MJ, and RR conceived the study, prepared the analysis plan, performed the search and study screening, and prepared the draft manuscript. All authors contributed to the final version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received support from Isfahan University of Medical Sciences (Code: 340248).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Guarantor
None.
Peer review
None.
