Abstract
Objective
This study aims to explore healthcare professionals' awareness of, attitudes toward, and actual use of medical big data platforms, and to provide practical guidance and theoretical support for improving data quality and advancing medical informatization.
Method
Semistructured interviews were conducted with 19 doctors and nurses from a tertiary hospital in Wuhan City between April and June 2024.
Results
The analysis yielded seven major themes and nine subthemes: cognitive status, value of medical big data platforms, data trust (subjective data, objective data), purposes of data recording (patient condition observation, self-protection, task completion), practical challenges (conflict between work purposes and recording requirements, inconsistent departmental training standards, influence of leadership style and team culture), standardization of data recording, and concerns about data privacy.
Conclusion
Healthcare professionals' insufficient understanding of medical big data platforms affects the quality of recorded data and its research value, underscoring the need to strengthen training, standardize data recording, provide technical support, and ensure data security.
Introduction
In the global healthcare sector, informatization has become key to enhancing services. The rapid development of the healthcare industry has generated vast amounts of medical data characterized by the “5V” features of big data: Volume, Velocity, Variety, Variability, and Veracity.1 These features pose data management and quality control challenges that differ from those of traditional health information systems.
Unlike traditional electronic health record (EHR) systems, which focus on the electronic storage and retrieval of patient records, medical big data platforms integrate a wide range of data sources, including inpatient information systems, outpatient electronic medical records, laboratory information management systems, and nursing information systems.2 By collecting, integrating, cleaning, and standardizing these data, such platforms give researchers access to real-world patient data, opening new avenues for clinical research and patient management.3 This capability is important for improving medical quality and efficiency, reducing costs and risks, and promoting medical innovation.4
However, medical big data platforms face significant challenges in realizing their potential, and these challenges center primarily on the quality of the data itself. Issues such as inconsistency, incompleteness, and textual ambiguity can produce incomplete and confusing information, undermining medical decision-making and severely limiting how far the value of the data can be realized and used.5–7
These issues arise from multiple factors and processes at different levels.8,9 At the data source level, problems include unreliable, duplicated, and inconsistent data. At the data generation level, human data-entry errors, inaccurate sensor readings, irregular social media data, and the difficulty of processing unstructured data all threaten data quality. At the data processing and application level, problems in data collection and transmission, together with difficulties in data transformation and integration during preprocessing, further degrade overall data quality.
To address these challenges, some scholars10 have proposed management systems based on the full data life cycle and a general management process for source data. This process covers the entire data life cycle, from generation, collection, and processing to analysis and visualization, so that data meet quality requirements at each stage. For the individual stages of data processing, scholars have developed and applied corresponding technical tools, such as extraction, transformation, and loading (ETL) and data traceability technologies.11 A Web-based automation framework for integrated medical data management, comprising data evaluation, quality control, and standardization modules, has also been proposed. The main approaches to data quality control are outlier detection, data similarity assessment, and duplicate removal.12 Preprocessing ensures that data meet defined quality standards before entering subsequent processing stages,13,14 and automation technologies and detection programs have improved the efficiency and accuracy of data validation.15,16
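To make two of the quality-control steps named above more concrete, the following is a minimal Python sketch of outlier detection and duplicate removal. It assumes Tukey's interquartile-range fences (k = 1.5) as the outlier rule and treats records as duplicates when a chosen set of key fields matches; the cited studies may use different methods, and the function and field names are hypothetical illustrations, not part of any platform described in this article.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def drop_duplicates(records, key_fields):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Example: hypothetical body-temperature readings (degrees C) and lab records.
temps = [36.5, 36.7, 41.5, 36.6, 36.9, 36.8, 37.0]
print(iqr_outliers(temps))  # [2], i.e. the 41.5 reading

labs = [
    {"patient_id": "P1", "test": "WBC", "time": "2024-04-01 08:00", "value": 6.1},
    {"patient_id": "P1", "test": "WBC", "time": "2024-04-01 08:00", "value": 6.1},
    {"patient_id": "P1", "test": "WBC", "time": "2024-04-02 08:00", "value": 5.8},
]
print(len(drop_duplicates(labs, ["patient_id", "test", "time"])))  # 2
```

In practice, flagged values would usually be reviewed rather than deleted automatically, since an extreme reading may reflect a genuine change in the patient's condition.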
Although existing research has proposed various measures to improve the quality of medical big data, these measures focus mostly on the processing, storage, and use stages. Controlling data quality at the source is equally important: errors in raw data propagate through every downstream step and can compromise the validity of results. At the data creation stage, when data are first recorded and entered into the system, ensuring accuracy and completeness is key to improving overall data quality. Medical staff are the direct recorders and users of medical data, so the accuracy, completeness, and consistency of their records directly determine data quality.
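Source-level control of the kind described above is often implemented as validation at the point of entry. The Python sketch below checks that required fields are present and that numeric values fall within plausibility ranges before a record is accepted. The field names and ranges are hypothetical placeholders chosen for illustration, not clinical thresholds or features of the platform studied here.

```python
from typing import List

# Hypothetical values for illustration only; real limits would be set by clinical experts.
REQUIRED_FIELDS = ("patient_id", "recorded_at", "recorded_by")
PLAUSIBLE_RANGES = {
    "temperature_c": (30.0, 45.0),
    "heart_rate_bpm": (20, 250),
    "systolic_bp_mmhg": (50, 300),
}


def validate_entry(entry: dict) -> List[str]:
    """Return a list of problems found in one data entry (empty if none)."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    # Plausibility: numeric values that are present must fall within range.
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = entry.get(field)
        if value is not None and not low <= value <= high:
            problems.append(f"{field}={value} outside plausible range [{low}, {high}]")
    return problems


entry = {"patient_id": "P1", "recorded_at": "2024-04-01T08:00",
         "temperature_c": 36.8, "heart_rate_bpm": 300}
for problem in validate_entry(entry):
    print(problem)
# missing required field: recorded_by
# heart_rate_bpm=300 outside plausible range [20, 250]
```

Rejecting or flagging such an entry while the recorder is still at hand is what makes source-level checks cheaper than cleaning the same error after it has reached the platform.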
Unlike existing studies, this study focuses on the subjective agency of healthcare professionals in the data recording process of medical big data platforms. Through qualitative interviews, it explores their knowledge, attitudes, and behaviors regarding data quality control, offering new perspectives on data quality issues and theoretical support for the further development of medical informatization.
Methods
Study design
This study, grounded in phenomenological theory, complies with the Consolidated Criteria for Reporting Qualitative Research (COREQ).17 It was carried out from April to June 2024 in a tertiary hospital in Wuhan that has a large medical staff and serves as a partner institution for Beijing Yiduyun's medical big data platform.
Participant selection
Purposive sampling was used. The inclusion criteria were: (1) healthcare staff currently employed in the hospital, including both doctors and nurses; (2) at least one year of clinical work experience; and (3) voluntary participation with informed consent. Visiting students and interns were excluded. Potential participants were identified through meetings and expert recommendations and then contacted via WeChat, a popular social networking application in China. The sample size was determined by content saturation: sampling stopped when newly collected data no longer provided new, valuable information. The number of people who declined to participate or withdrew midway, and their reasons, were documented.
Data collection
Two female members of the research team conducted the interviews: a chief nurse (HSF) with over 20 years of clinical experience and a master's student (HJJ) with a relevant research background in utilizing medical big data platforms. Both interviewers had undergone training in interview methodologies. They had no relationship with the participants before recruitment, and the participants were unaware of the interviewers’ contact information and research objectives before the study. The participants had limited knowledge of the interviewers’ personal goals and reasons for conducting the research before the interviews. The interviewers maintained neutrality throughout the study to prevent personal biases and assumptions from influencing the research outcomes.
The data were gathered in the participants’ offices, conference rooms, or online through video, with only the participants and researchers in attendance. Key characteristics of the participants, such as age, gender, educational background, and work experience, were documented.
The interview outline was developed by a team comprising one chief nurse from a tertiary general hospital, one doctoral student, and four graduate students. After reviewing the relevant literature and gaining a comprehensive understanding of the medical big data platform, the researchers drew on their clinical work experience to design a preliminary outline. Preinterviews were then conducted with two doctors or nurses, and the outline was revised into its final form based on the results. The final interview outline was as follows:
Cognition: (1) What do you know about the medical big data platform currently used in our hospital? (2) What do you think is the value of medical big data?
Beliefs: (1) What is your attitude towards the medical big data platform and its use? (2) How much do you trust medical big data? (3) What is your view on the authenticity and reliability of medical big data?
Behavior: (1) What do you think is your motivation and purpose for recording and providing medical big data? (2) What considerations and reasons do you have? (3) What difficulties have you encountered in recording and providing medical big data that affect the data?
No repeat interviews were conducted. The entire interview process was recorded, and field notes were taken during the interviews. Each interview lasted at least 30 min. Interviews and transcription reviews continued until no new themes emerged, at which point theoretical saturation was deemed to have been achieved. The interview transcripts were not returned to the participants for their comments.
In addition to conducting interviews, the research team observed how some medical staff actually recorded data into healthcare information systems in practice. Observations covered the accuracy and timeliness of data recording and how staff handled problems they encountered. These observations supplemented the interview data to make the findings more comprehensive and objective.
Data analysis
Four members of the research team, all master's students, independently coded the transcribed texts. The coding process, including initial coding, thematic coding, and the construction of thematic frameworks, was documented in detail. Themes were derived from the data rather than preset. All members of the research team reviewed the final themes in a discussion meeting to ensure accurate interpretation of the data.
Patient and public involvement
No patients or members of the public were involved in the design and implementation of this study.
Results
Sample characteristics
Based on the criterion of theoretical saturation, 19 participants were included; four invitees declined to participate. The sample comprised 10 nurses (coded N1–N10) and 9 physicians (coded D1–D9), of whom 7 were male and 12 female. Participants were aged 26 to 50 years (median 33.5 years) and had 2 to 28 years of work experience (median 11.5 years). Basic information is shown in Table 1.
Table 1. Basic information of participants.
Themes identified
The current study identified seven themes through interviews with healthcare professionals, encompassing their cognitive status regarding medical big data platforms, perceived value of these platforms, trust in data, purposes of data recording, practical challenges, standardization of data recording, and concerns about data privacy. These themes reveal the complex issues that healthcare professionals face in terms of cognition, trust, and practice while utilizing medical big data platforms. The specific themes and their contents are presented in Table 2.
Table 2. Summary of themes and subthemes.
Theme 1: Cognitive status
The interviews show that healthcare professionals generally lack sufficient awareness of medical big data platforms. Some respondents did not fully understand the functions and purposes of these platforms and sometimes even mistook them for literature search tools. N3: “I haven't heard of it before. I only know about it because you mentioned it. Does our hospital have this? … Is it something like an online library for searching literature?” D1: “I've heard of it before, but I haven't used it myself. I don't know how to extract data from the medical big data platform.” Some respondents also mentioned a lack of training and guidance on the platform's potential applications and operating procedures, which further limited their understanding and use of it. N5: “We're usually too busy to learn about these new things, and the hospital doesn't provide specific training.”
Theme 2: Value of medical big data platforms
Most healthcare professionals recognize the value of medical big data platforms, noting that they save manpower and resources and allow data extraction according to research needs. N9: “It's definitely helpful. When we used to work on projects, we had to go to the medical records department to apply, check each medical record one by one, and flip through the materials. It was very troublesome. Now, it's much easier and saves a lot of time. It's very valuable and convenient.” N10: “It's really difficult for us to do research in clinical settings because we have to collect data ourselves. Data collection is very challenging and time-consuming. We don't have time to collect data, design research, or analyze data. But if we have such a database now, it would be very helpful for our research.” D3: “With this platform, we can quickly get the data we need, reducing many cumbersome steps.”
Theme 3: Trust in data
Trust in Objective Data: Trust in objective data was generally high. N1: “Data like lab results and vital signs are measured and unchangeable, so the trust level is higher.” Respondents generally considered objective data highly reliable because of standardized measurement methods and recording processes. D2: “Laboratory data are usually very accurate because of strict operating procedures.”
Distrust in Subjective Data: There is lower trust in subjective data (such as personal assessments and descriptions) due to inconsistent standards and potential biases. N5: “Everyone has different standards, and assessments can be biased. It could be due to changes in the patient's condition, but that's an uncontrollable factor.” Some respondents also mentioned that subjective data recording is often influenced by the recorder's personal experience and judgment, leading to lower reliability and consistency. N8: “Subjective descriptions can vary because of the recorder's experience, so it's hard to fully trust them.” D7: “Each doctor's experience and writing habits are different. This kind of textual content still needs to be standardized.”
Theme 4: Purposes of data recording in healthcare information systems
Healthcare professionals' purposes for recording data into healthcare information systems reflect practical considerations: monitoring patient conditions, self-protection, and completing work tasks.
Monitoring Patient Conditions: Healthcare professionals record patient data into EHRs and other healthcare information systems to monitor the patient's health status. D3: “For example, if you complete the treatment measures within the golden time frame, it will be the best. If not, you need to track and improve the effect to ensure the patient's condition.” N4: “We record this data mainly to understand the patient's condition changes in time and ensure the effectiveness of the treatment.”
Self-Protection: Detailed records are kept to avoid potential medical disputes. D4: “On the other hand, we need to keep these records complete so that during inspections or if there are disputes with patients, we can provide evidence of our work.” N10: “As someone who has worked for over ten years, I mainly do it for self-protection. If the records are complete, patients won't argue.” N6: “Detailed records can protect us and provide evidence if disputes arise in the future.”
Completing Work Tasks: Ensuring that medical orders are executed and work tasks are completed. N2: “Completing work tasks, executing doctors’ orders.” D9: “This is part of our daily routine, writing medical records and progress notes.” N7: “Recording data also ensures that we complete all our tasks and avoid omissions.”
Theme 5: Practical challenges
Conflict between work purposes and standards
Busy work environments make it difficult to fully adhere to standards. N6: “Especially in the hematology department, everyone is very busy. It's the busiest department, with only a dozen people. Sometimes, the records are not very detailed. Many blood samples are coming in every day. There's no time to constantly measure vital signs. Besides vital signs, there are many subjective descriptions in nursing records. Sometimes, they are not very comprehensive. As long as the patient has no life-threatening issues, we don't go into too much detail.” D2: “There are so many surgeries scheduled every week, and we are rushing to complete them. It's impossible to record medical records in such detail. We don't have the time or energy to write them. Sometimes, we hire undergraduates to write medical records. But internal medicine might be stricter.” N5: “When work is too busy, it's really hard to record all the data according to standards. Sometimes, we can only make brief notes.”
Inconsistent training
Training content and effectiveness differ across rotation periods and departments. N7: “We need to rotate through different departments, each with different recording focuses. The rotations are quick, staying in one department for a month, so it's impossible to fully train.” D3: “Wave after wave of rotating and intern students, each department has different ways of writing medical records. Everyone is different.” N9: “Training content is not uniform, and each department has different requirements, which often confuses us when recording data.”
Impact of leadership style and team culture on data recording quality
The emphasis on data recording quality varies depending on the leadership and team culture. N9: “Quality control depends on each department and the leadership. Even if there is unified training, if the department doesn't enforce it strictly and the leaders don't care, the training won't be very effective.” N2: “If everyone writes like this, then I will too, as long as there are no problems.” D6: “If the leadership doesn't emphasize the quality of data recording, it's hard for us to maintain high standards. After a while, things go back to the way they were.”
Theme 6: Standardization of data recording
Stricter data recording standards are needed to reduce difficulties in data extraction caused by differences in personal writing styles. N2: “For example, writing the date, some people use Chinese characters, some use Arabic numerals. Two thousand eighteen and 2018, right? And the format of year, month, and day. Different writing habits and units need to be standardized to make later processing easier.” D2: “We can set a standardized template for describing diseases, including symptoms and their order. This way, during quality checks, we can know what is included and what is missing.” N10: “Unified recording standards can greatly reduce the difficulty of data processing later and improve data utilization efficiency.”
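Participant N2's example of mixed date formats is the kind of inconsistency that must be reconciled before recorded data can be extracted. As an illustration only, the Python sketch below normalizes dates written with Chinese numerals (二〇一八年三月五日) or Arabic numerals (2018年3月5日, 2018/3/5) to ISO 8601 form. It assumes years are written digit by digit and months and days with 十 (for example 十二, 二十五); the patterns and function names are hypothetical and do not describe the hospital's actual system.

```python
import re
from datetime import date

CN_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
NUM = r"[0-9〇零一二三四五六七八九十]+"
DATE_PATTERNS = [
    re.compile(rf"^({NUM})年({NUM})月({NUM})[日号]?$"),              # 2018年3月5日, 二〇一八年三月五日
    re.compile(r"^([0-9]{4})[-/.]([0-9]{1,2})[-/.]([0-9]{1,2})$"),  # 2018-03-05, 2018/3/5
]


def cn_to_int(text):
    """Convert Arabic digits or a Chinese numeral (二〇一八, 十二, 二十五) to an int."""
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    if "十" in text:  # month/day style: [tens]十[ones]
        tens, _, ones = text.partition("十")
        return (CN_DIGITS[tens] if tens else 1) * 10 + (CN_DIGITS[ones] if ones else 0)
    value = 0  # year style: one character per digit
    for ch in text:
        value = value * 10 + CN_DIGITS[ch]
    return value


def normalize_date(raw):
    """Return an ISO 8601 date string, or raise ValueError for unknown formats."""
    text = raw.strip().replace(" ", "")
    for pattern in DATE_PATTERNS:
        match = pattern.match(text)
        if match:
            try:
                year, month, day = (cn_to_int(g) for g in match.groups())
            except KeyError:  # malformed numeral such as 二十二十
                raise ValueError(f"malformed numeral in date: {raw!r}") from None
            return date(year, month, day).isoformat()  # also rejects impossible dates
    raise ValueError(f"unrecognized date format: {raw!r}")


for raw in ["二〇一八年三月五日", "2018年3月5日", "2018/3/5", "二〇一八年十二月二十五日"]:
    print(raw, "->", normalize_date(raw))
```

Enforcing a single format at entry, in line with the unified recording standards participants called for, would reduce the need for this kind of after-the-fact normalization.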
Theme 7: Concerns about data privacy
There are concerns about the security and privacy protection measures of the data platform. N8: “Although there are measures like watermarks and transfer restrictions if someone wants to leak the data, they can still do it.” Some respondents also mentioned that despite technical measures to protect data privacy, the risk of data leakage still exists in practice. N10: “We worry that the data might be misused. Although there are some protection measures, it still doesn't feel very safe.”
Discussion
Through interviews and observations, this study explored how medical staff influence the data quality of medical big data platforms in terms of cognition, beliefs, and behaviors. The findings suggest that insufficient understanding of the platforms, low trust in subjective data, deviations in the purpose of data recording, practical challenges at work, and the influence of leadership style and team culture jointly affect platform data quality. A deeper understanding of the platforms may strengthen medical staff's trust in the data, and this trust appears to underpin positive data-recording behaviors, encouraging staff to record and provide data more actively.
This study found that some healthcare professionals still lack an understanding of these platforms or hold misconceptions about them. This lack of knowledge may prevent the platforms' potential from being fully realized. When recording patient information in healthcare systems, healthcare professionals usually focus on immediate work needs, such as observing patient conditions, self-protection, and task completion, rather than on the long-term research value of the data. This bias may affect the purpose and quality of data recording. The phenomenon is more pronounced among young healthcare professionals with less research experience, who may not fully recognize the research value of the data they record. It may also be related to the early stage of development of China's medical big data platforms and their insufficient promotion. Since 2016, China has gradually built medical big data centers, and the volume of medical data has grown rapidly. However, owing to the lack of uniform standards and qualified professionals, data quality is uneven and data silos have formed, hindering the development of health and medical big data.18,19 Strengthening the training and education of healthcare professionals on medical big data platforms is therefore particularly important for improving their understanding of these tools and their ability to use them.20,21
This study also found that medical staff doubted the reliability of subjective data. Because subjective data are shaped by individual experience and judgment during recording, different staff may assess the same situation differently, and variations in writing style make standardization and consistency even harder to achieve. Distrust of subjective medical data thus reflects underlying problems of data standardization and consistency.19 Similar studies have also highlighted the importance of standardized data recording for data quality.22,23 Improving the quality and research utility of recorded data requires raising healthcare professionals' awareness of its research value and developing and promoting standardized guidelines.24
High workloads and time pressure also affect the data-recording behaviors of medical staff. Because healthcare professionals deal with a wide variety of diseases and conditions and rotate frequently between departments, traditional uniform training methods may not meet their needs. High personnel mobility can lead to discontinuity of knowledge and skills and weaken collective cohesion and group trust, affecting service quality and the consistency of data recording.25,26 This finding is consistent with other studies.23,27 To address this issue, technical support such as machine-assisted data recording systems should be provided to reduce healthcare professionals' workload and improve the accuracy and efficiency of recording; more flexible, customized training programs should be developed for the specific needs of different departments and disease types; and continuing education and professional development should be encouraged and supported.28
Leaders play a decisive role in emphasizing the importance of data quality.29,30 If leaders do not value the quality of data recording, healthcare professionals may not prioritize it even after training. Specialized leadership training should therefore help leaders understand the importance of data quality and learn to encourage team members to follow best practices through positive role modeling and communication strategies.31,32 Team members also tend to imitate the behavior of others, especially when that behavior does not appear to cause problems. This “conformity effect” can lower standards across the entire team and thereby affect data quality.33,34 Teams can be encouraged to improve data quality by rewarding members who adhere to high recording standards,35 organizing team-building activities that promote communication and collaboration, and fostering a team culture in which everyone is committed to high-quality data recording.
The concerns of healthcare professionals about data privacy may stem from worries about data leaks and misuse.36 These concerns may hinder their use of medical big data platforms. To alleviate these concerns, it is recommended that medical institutions strengthen data security measures and provide comprehensive data privacy training to ensure that healthcare professionals can handle and use patient data safely.
Limitations
Firstly, the interview guidelines were designed by the research team and may not cover all relevant topics, which could affect the comprehensiveness of the study results. Secondly, due to limitations in time and resources, the sample size was relatively small, which may impact the representativeness of the findings. Additionally, the study was conducted in only one tertiary hospital, which may limit the generalizability of the results to other medical institutions.
Conclusion
Through qualitative interviews and observations, this study explored the cognition, beliefs, and behaviors of medical staff regarding data quality control on medical big data platforms. Medical staff's insufficient understanding of the platforms, distrust of subjective data, deviations in the purpose of data recording, practical challenges at work, and the influence of leadership style and team culture all affect the data quality of these platforms. By focusing on the subjective agency of medical staff, this study provides new theoretical support and practical guidance for improving the quality of medical big data. The findings underscore the importance of enhancing training, providing technical support, and optimizing leadership style and team culture, and they offer medical institutions specific measures for improvement. These contributions can help enhance data quality and promote the further development of medical informatization.
Supplemental Material
sj-pdf-1-dhj-10.1177_20552076251326697 - Supplemental material for Leveraging healthcare professionals’ insights to enhance data quality in medical big data platforms: A qualitative study
Supplemental material, sj-pdf-1-dhj-10.1177_20552076251326697 for Leveraging healthcare professionals’ insights to enhance data quality in medical big data platforms: A qualitative study by Huang Jingjing, Huang Sufang, Lang Xiaorong, Liu Yuchen, Zhang Kexin and Liu Shiya in DIGITAL HEALTH
sj-docx-2-dhj-10.1177_20552076251326697 - Supplemental material for Leveraging healthcare professionals’ insights to enhance data quality in medical big data platforms: A qualitative study
Supplemental material, sj-docx-2-dhj-10.1177_20552076251326697 for Leveraging healthcare professionals’ insights to enhance data quality in medical big data platforms: A qualitative study by Huang Jingjing, Huang Sufang, Lang Xiaorong, Liu Yuchen, Zhang Kexin and Liu Shiya in DIGITAL HEALTH
Footnotes
Acknowledgments
The authors would like to express our sincere gratitude to all the medical staff interviewed for this study. Their willingness to share their experiences and insights has been invaluable to our research.
Contributorship
SFH supervised the study; JJH finalized the analysis and wrote the drafts of the manuscript; YCL, KXZ, XRL, and SYL interpreted the interviews and commented on and helped revise drafts of the manuscript. All authors read and approved the final manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This work was approved by the Ethical Committees of Tongji Medical College (TJ-IRB202403047).
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
