Dilemmas and prospects of artificial intelligence technology in the data management of medical informatization in China: A new perspective on SPRAY-type AI applications

Abstract

Objectives: This study aims to address the critical challenges of data integrity, accuracy, consistency, and precision in the application of electronic medical record (EMR) data within the healthcare sector, particularly within the context of Chinese medical information data management. The research seeks to propose a solution in the form of a medical metadata governance framework that is efficient and suitable for clinical research and transformation. Methods: The article begins by outlining the background of medical information data management and reviews the advancements in artificial intelligence (AI) technology relevant to the field. It then introduces the “Service, Patient, Regression, base/Away, Yeast” (SPRAY)-type AI application as a case study to illustrate the potential of AI in EMR data management. Results: The research identifies the scarcity of scientific research on the transformation of EMR data in Chinese hospitals and proposes a medical metadata governance framework as a solution. This framework is designed to achieve scientific governance of clinical data by integrating metadata management and master data management, grounded in clinical practices, medical disciplines, and scientific exploration. Furthermore, it incorporates an information privacy security architecture to ensure data protection. Conclusion: The proposed medical metadata governance framework, supported by AI technology, offers a structured approach to managing and transforming EMR data into valuable scientific research outcomes. This framework provides guidance for the identification, cleaning, mining, and deep application of EMR data, thereby addressing the bottlenecks currently faced in the healthcare scenario and paving the way for more effective clinical research and data-driven decision-making.

Keywords

artificial intelligence, medical metadata governance, knowledge-based data, informatization

Introduction

With the aid of information technology (IT), human society is embarking on a new era of advancement. An economic framework anchored on intelligence, networks, and big data is taking shape, characterized by “fusion,” or the profound integration of IT and industrial manufacturing, humans and machines, and information and material resources. This integration has triggered a seismic shift in various facets of human life, including education, medical treatment, public service, and social interaction, moving from local to global intelligence, thereby altering people’s lifestyles and behavioral patterns.¹ Despite these gains, healthcare IT remains mired in inherent problems. Information construction lacks application scenarios, informatization design falls short of complete informatization and digital operation, hospital information system (HIS) integration is poor, data center construction lags behind, and overall informatization emphasizes construction over use.² The purpose of this paper is to shed light on the inherent dilemma that artificial intelligence (AI) technology faces in acquiring knowledge data through data governance in China’s healthcare informatics. To that end, it briefly outlines the current state of healthcare data informatization, technological advances in the scientific governance of clinical data, and a medical metadata governance framework for generating knowledge data.

Informatization and medical data

In the era of AI, medical data serves as the cornerstone and foundation of hospital informatization.^3,4 High-quality medical data support is essential for the effective functioning of medical AI applications. Hospital informatization in China started in the 1990s, with financial accounting and fees as the core of hospital management informatization. Over time, the HIS has gradually expanded to encompass all clinical business and life-cycle diagnosis and treatment data for a vast number of patients, aided by IT development.⁵ However, for historical development reasons, there are several issues with the data. Informatization planning in most hospitals is not uniform, resulting in different information systems being introduced at various times to different business departments. This has led to scattered clinical diagnosis and treatment data across multiple information systems with varying data types, versions, structures, and formats, creating challenges for system interconnection and data integration.^6,7 With the construction and extensive application of medical informatization platforms, more medical procedures rely on the HIS, and the volume, complexity, and sources of health medical data are skyrocketing. This explosion of data poses new challenges for the governance and application of healthcare data.

Poor data consistency and completeness

Data fragmentation resulting from the lack of interoperability between different business systems within hospitals has led to the emergence of information silos.^8,9 The current state of HISs highlights several issues, such as incomplete data schema descriptions, unclear data connections between systems, and inconsistent system value domain standards.¹⁰

The primary cause of poor data consistency is the lack of uniform data standards. For instance, multiple information systems in hospitals adopt different reference standards and coding modalities for the same subject dictionary. Some data entries are manually customized and not generated according to dictionary tables, such as customized medical orders and check items. The business table and the dictionary table field lengths are not set optimally, leading to truncated fields.
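The two failure modes described above, entries made outside the dictionary table and values cut off by short field lengths, can be flagged mechanically. The following is a minimal Python sketch rather than a production check; the dictionary contents, the field length, and the function name `audit_entries` are illustrative assumptions.

```python
from typing import Dict, List

def audit_entries(entries: List[str], dictionary: Dict[str, str],
                  field_len: int) -> Dict[str, List[str]]:
    """Classify entered item names against a dictionary table.

    dictionary maps code -> standard name (hypothetical content).
    field_len is the column width of the business table (assumed).
    """
    names = set(dictionary.values())
    report: Dict[str, List[str]] = {"ok": [], "truncated": [], "unknown": []}
    for entry in entries:
        if entry in names:
            report["ok"].append(entry)
        elif len(entry) == field_len and any(n.startswith(entry) for n in names):
            # The entry fills the whole column and is a prefix of a longer
            # dictionary name, so it was probably cut off by the field length.
            report["truncated"].append(entry)
        else:
            # Manually customized entry with no dictionary counterpart.
            report["unknown"].append(entry)
    return report
```

A report of this kind only locates suspicious values; deciding how to repair them still requires the source system's owners.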

One reason for the poor data integrity is the lack of a verification mechanism in the information system, which results in inadequate quality control during data generation. This may be due to HIS construction not keeping pace with the development of medical management needs.¹¹ Another reason is inadequate integration, where data are dispersed among different systems, and logically linked data cannot be related because the association information is not stored in the database, as is the case with data from lung function testing devices.

Disparities in data standards

Uniform data standards are essential for ensuring data consistency and improving data quality in the process of hospital informatization in China. The lack of such standards has resulted in issues such as illegal data formats, non-standard coding, and ambiguous business logic.¹² For instance, earlier versions of the International Classification of Diseases 10 (ICD10) used for case number coding in hospitals differ across provinces. Information shared across different business systems also varies in terms of format and content. Furthermore, standards for fields like gender, region, and occupation differ across systems. Without proper master data management, integrating data associations from various systems becomes challenging.
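As an illustration of what master data management does for a field such as gender, the sketch below maps system-specific value domains onto one master value. It is a simplified example; the system names (`his`, `lis`, `emr`) and their local codes are hypothetical and will differ in any real hospital.

```python
# Hypothetical local value domains for the same attribute in three systems.
GENDER_MAPS = {
    "his": {"1": "male", "2": "female"},
    "lis": {"M": "male", "F": "female"},
    "emr": {"男": "male", "女": "female"},
}

def to_master(system: str, raw: str) -> str:
    """Map a system-specific gender value to the master value.

    Unmapped systems or values fall back to 'unknown' so that they can be
    reviewed instead of being silently guessed.
    """
    return GENDER_MAPS.get(system, {}).get(raw.strip(), "unknown")
```

Keeping the mapping in one table, rather than in each integration script, is what allows records from different systems to be associated on a shared value.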

Data security and privacy protection

Currently, many HISs can no longer be physically isolated, and the construction of internet hospitals must be based on HISs to provide online medical services such as appointment registration, fee payments, report examinations, and health monitoring. However, data aggregation and outsourcing of such services can pose security risks such as data leaks, illegal access, and denial of service attacks.¹³ With regard to personal privacy, hospitals have not fully recognized the significance of data analysis and utilization. Patient information desensitization services are inadequate, and targeted healthcare data security prevention systems have yet to be established.
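To make the notion of patient information desensitization concrete, the following Python sketch pseudonymizes an identifier with a keyed hash and masks a name. It is only one possible approach under stated assumptions: the record fields (`patient_id`, `name`, `phone`) are hypothetical, and key management, which matters most in practice, is out of scope.

```python
import hashlib
import hmac

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), hex-encoded."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_name(name: str) -> str:
    """Keep the first character of a name and mask the rest."""
    return name[:1] + "*" * max(len(name) - 1, 0)

def desensitize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers hidden."""
    out = dict(record)
    out["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    out["name"] = mask_name(record["name"])
    out.pop("phone", None)  # drop a field assumed not to be needed for analysis
    return out
```

Because the hash is keyed, the same patient maps to the same pseudonym across datasets that share the key, which preserves linkage for research while hiding the original identifier.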

Difficulties in data mining

In the HIS, structured, semi-structured, and unstructured data coexist, with a significant portion of medical data being stored in unstructured formats such as text and images.¹⁴ Extracting valuable insights from unstructured medical text requires structured processing, posing a major challenge in data mining. Presently, an authoritative Chinese-language medical terminology and text structuring tool is lacking, and extracting accurate information from hospital free text remains a significant challenge.¹⁵ For instance, medical records are often rife with irregular symbols, typos, and inconsistencies resulting from doctors’ writing errors.
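Part of the irregular-symbol problem can be reduced before any structuring step. The sketch below, which assumes nothing beyond the Python standard library, folds full-width characters to their ASCII forms and collapses repeated punctuation and spacing; it does not attempt to correct typos, and the choice of punctuation marks to collapse is an assumption.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Normalize width variants and collapse noisy punctuation and spacing."""
    # NFKC folds full-width letters, digits, and punctuation to ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of the same punctuation mark, e.g. ",," becomes ",".
    text = re.sub(r"([,.;:!?])\1+", r"\1", text)
    # Collapse whitespace runs into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

Note that collapsing repeated marks also shortens an intentional ellipsis, so a real pipeline would need to decide whether that loss is acceptable.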

China’s medical data issues are diverse and pressing, exposing four main problems: inconsistent standards, poor consistency and integrity of electronic medical record (EMR) data, difficulties in data mining and utilization, as well as security and privacy protection concerns. Addressing these data challenges necessitates urgent adoption of data governance strategies complemented by AI.

AI in data governance

A mounting number of medical institutions are realizing that medical data is underutilized due to poor quality. Effective medical data governance and data mining can provide vital support for medical information sharing, personal health planning, personalized clinical decision-making, diagnostic optimization of treatment processes, disease prevention and management, and national health strategy development.¹⁶

Currently, there is no standardized definition of data governance, either domestically or internationally. The Global Data Management Community defines data governance as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.¹⁷ The Chinese National Standard “Specification of data governance” defines data governance as a collection of control activities, performance, and risk management related to data resources and their applications.¹⁸ The medical data governance platform comprises data storage, metadata management, data quality control, and data desensitization. Building such a platform requires various techniques, such as extraction, transformation, and loading (ETL)¹⁹; textual information extraction and structuring²⁰; knowledge mapping; and multi-source data fusion. Additionally, the analysis and application of the data necessitate the use of a variety of statistical methods and AI algorithms.
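The ETL step mentioned above can be illustrated with a very small example. This is a sketch under assumptions, not a description of any particular platform: the source row keys (`pid`, `test`, `value`) and the target table `lab_result` are hypothetical, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3

def run_etl(source_rows):
    """Extract rows, transform them to a common shape, and load into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lab_result (patient_id TEXT, test_code TEXT, value REAL)"
    )
    cleaned = []
    for row in source_rows:                  # extract
        try:
            value = float(row["value"])      # transform: enforce a numeric value
        except (TypeError, ValueError):
            continue                         # skip results that cannot be parsed
        cleaned.append((row["pid"].strip(), row["test"].upper(), value))
    conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?)", cleaned)  # load
    conn.commit()
    return conn
```

In a real pipeline the skipped rows would be logged for quality control rather than discarded silently.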

Metadata management

As a key link in data governance, data integration is hampered mainly by poor data consistency and integrity.²¹ Metadata management is therefore needed to gain a better understanding of the data held in business systems. Metadata are data that describe the attributes of data, which provide a means of identifying, defining, and classifying data in a subject domain.²² Metadata management can facilitate the integration of data from multiple sources.²³

One of the core elements of metadata management is uniform and clear metadata standards. China already has relatively detailed industry standards, including the Specification for drafting of a health information basic dataset, the Electronic Medical Records Basic Framework, and the Data Standards. In terms of coding standards, there are Medical Subject Headings (MeSH),²⁴ the Unified Medical Language System (UMLS),²⁵ ICD,²⁶ the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT), and Logical Observation Identifiers Names and Codes (LOINC). However, in practice, these standards are abridged and expanded to varying degrees depending on the purpose of the application. Therefore, establishing a metadata management mechanism based on metadata standards can help complement the standards and correlate different systems. In terms of technology, relational association rule mining, named entity recognition, and relation extraction can be applied to “metadata governance” of medical data.

Information privacy security architecture

With the rapid development of the Internet, cloud computing, 5G, and other information technologies, users have increasingly high requirements for network and data security. In big data applications in particular, data application security has become a top priority owing to the increasing amount of data. Given the unique sensitivity of medical data, medical institutions proactively strengthen data privacy protection measures to prevent active leakage.²⁷ The common approach is to follow the national framework of cybersecurity laws and regulations, as well as the national standard system of cybersecurity hierarchical protection, and give different access rights to medical data for different users.^28,29 There are also AI methods such as two-factor authentication and facial recognition, which enable human-machine isolation and in-hospital and out-of-hospital isolation so as to achieve data accessibility without abuse.
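Giving different users different access rights to medical data is commonly implemented as role-based filtering. The sketch below shows the idea at field level; the role names and the fields each role may read are hypothetical, and authentication itself is assumed to happen elsewhere.

```python
# Hypothetical roles and the record fields each role may read.
ROLE_FIELDS = {
    "attending_physician": {"name", "diagnosis", "lab_results"},
    "researcher": {"diagnosis", "lab_results"},
    "billing_clerk": {"name", "fees"},
}

def view_for(role: str, record: dict) -> dict:
    """Return only the fields of a record that the given role may read."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}
```

Defaulting unknown roles to an empty view keeps the filter fail-closed, which is the safer choice for sensitive data.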

In addition to the abovementioned technical means, as medical personnel have direct contact with medical data, whether these personnel can adhere to the principle of protecting patient privacy directly affects the medical information security. At present, organizations at all levels in China regulate the patient privacy protection behaviors of medical staff by formulating policies related to patient privacy protection and imposing privacy protection requirements on medical staff.³⁰ Our team also promotes better patient privacy protection by constructing a theoretical framework for doctor-patient confidentiality.³¹

Exploiting EMRs with artificial intelligence

An EMR contains information on demographics, medical history, vital signs, diagnoses, tests, treatments, and disease progression. In clinical data, in addition to structured data such as diagnoses and laboratory tests, there is also a significant amount of textual data such as medical records and examination text reports. Many of these data are stored in XML, HTML, TXT, and other formats. The data contain valuable information, such as symptoms in the history of a present illness and tumor location in radiology reports, which needs to be extracted from the free text by technical means, that is, by structuring the medical text. At present, there are two ways to realize the structuring of medical text.

One is based on the natural language processing (NLP) algorithm model, which mainly includes steps such as named entity recognition and relation extraction. This approach requires large-scale raw data to be labeled for algorithm training, incurs a high labor cost in the early stage, and places high demands on computer performance. Over large, multi-site text collections, a well-trained NLP model outperforms regular-expression (rule-based) matching in both effect and efficiency.

The other is based on a standardized coding system, which makes it relatively easy to extract structured medical data. First, keywords are extracted using term extraction technologies such as the Medical Language Extraction and Encoding System (MedLEE), UMLS, MetaMap, HITEx,³² and the knowledge map concept identifier. Then, based on clinical experience and guidelines, rules are formulated by automatic rule generation methods such as the RETE algorithm, JBoss rule engine, feature selection algorithm, and discriminant analysis model.³³ For unstructured text, instead of extracting keywords and forming rules from standardized medical language, we use NLP techniques such as coreference resolution, time analysis, assertion, semantic network, and unstructured information management architecture (UIMA).
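The keyword-then-rule approach can be sketched in a few lines. The example below is a deliberately simplified stand-in, not MedLEE, MetaMap, or any of the tools named above: the term dictionary and its codes are placeholders rather than real UMLS or SNOMED-CT identifiers, the single rule is a crude negation check, and the text is English, whereas Chinese text would first need word segmentation.

```python
import re

# Hypothetical term-to-concept dictionary; the codes are placeholders.
TERMS = {"chest pain": "T001", "fever": "T002", "cough": "T003"}

# A negation cue counts only if no clause boundary follows it.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.;,]*$", re.IGNORECASE)

def extract_concepts(text: str):
    """Find dictionary terms and mark each as asserted or negated."""
    found = []
    lowered = text.lower()
    for term, code in TERMS.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Rule: a negation cue earlier in the same clause negates the term.
            prefix = lowered[:match.start()]
            negated = bool(NEGATION.search(prefix))
            found.append((term, code, "negated" if negated else "asserted"))
    return found
```

A rule this simple misses cases such as a negation that spans a comma-separated list, which is one reason the rule sets used in practice are built from clinical experience and guidelines.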

Medical metadata governance framework

The acquisition of labeled data primarily involves annotating entities, attributes, and relationships within unstructured data found in EMR text and medical images.²⁵ The quality of labeled data plays a crucial role in training deep learning or neural network models. To ensure proper governance of labeled data, it is imperative to establish comprehensive labeling specifications for entities of varying granularity. This involves creating a medical terminology based on data from diagnoses, surgeries, tests, and pharmaceuticals, in conjunction with basic, bridge-building, and clinical disciplines. It is also necessary to standardize the management of all labeling process elements and perform cross-validation of labeling results. Data cleansing processes primarily consist of four steps: analyzing dirty data types, defining cleaning strategies, cleaning based on said strategies, and verifying data quality. The core component lies in defining the cleaning strategy, including standard data type definitions, data integrity constraints, and cleaning function rules.
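The four cleansing steps can be read as a small pipeline: profile the dirty data, define a strategy, apply it, and profile again to verify. The sketch below assumes flat records with string values; the field names, the required-field constraint, and the per-field rule functions are illustrative assumptions rather than a prescribed strategy.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]

def profile(records: List[Record], required: List[str]) -> Dict[str, int]:
    """Steps 1 and 4: count records with missing required values and exact duplicates."""
    missing = sum(1 for r in records if any(not r.get(f) for f in required))
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

def clean(records: List[Record], required: List[str],
          rules: Dict[str, Callable[[str], str]]) -> List[Record]:
    """Steps 2 and 3: apply per-field cleaning rules, then enforce constraints."""
    out, seen = [], set()
    for r in records:
        fixed = {f: (rules[f](v) if f in rules and v is not None else v)
                 for f, v in r.items()}
        if any(not fixed.get(f) for f in required):
            continue  # integrity constraint: required fields must be present
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue  # drop exact duplicates revealed by cleaning
        seen.add(key)
        out.append(fixed)
    return out
```

Running `profile` before and after `clean` gives a simple, comparable measure of whether the chosen strategy actually improved data quality.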

Because knowledge-based data representation varies across agencies and complex relationships exist between knowledge, it is essential to uniformly represent knowledge-based data and clarify their relationships. Many informatization vendors have completed the initial governance of the database layer through a business understanding of integrated data, but the results cannot be applied directly to clinical or scientific research tools.

Structured fields in the HIS, such as test names, diagnoses, doctors’ orders, and inspections, have no relevant national standards, let alone mandatory standardization requirements. However, diagnoses and surgeries in medical records are coded according to the National Health Commission’s requirements using ICD10 and ICD-9-CM-3. As a result, most domestic medical big data companies reference national standards and medical literature to generate a set of internal standards. These companies typically collaborate with hospitals to develop corresponding normalization algorithms. Common normalization standards in HISs and related fields are shown in Table 1. While data cleansing is essential for ensuring the quality and reliability of research data, it also poses potential risks, primarily legal in nature, which are often overlooked. The inadvertent degradation of data quality due to improper data cleansing can lead to legal challenges, especially under the scrutiny of medical device regulations. This issue requires careful consideration and adherence to stringent legal and ethical standards to maintain the integrity of the research and to safeguard against any adverse consequences that may arise from such practices.

Table 1.

Application scope of common medical-related standards.

Standard name	Common application content	Application status
ICD10	Diagnosis	The home page of medical records in third-class A hospitals in various localities is currently encoded according to ICD10, but there are many versions. The National Health Commission of the People’s Republic of China has issued the national standard version 2.0 to try to unify the diagnoses
ICD11	Diagnosis	The National Health Commission of the People’s Republic of China has issued a Chinese version and recommended its application in third-class A hospitals in various localities, but the local implementation is unknown
ICD-9-CM-3	Surgical operation	The operation is currently encoded in medical record homepages in third-class A hospitals in various localities according to this standard, but there are many versions. The National Health Commission of the People’s Republic of China has issued version 2.0 of the national standard to try to unify the code
SNOMED-CT	Clinical information such as diagnoses and symptoms	A Chinese translation version was published by Concorde in 1997, but it has not been updated for many years. No official Chinese version has been introduced. Domestic medical big data companies refer to SNOMED-CT when building their own terminology systems
LOINC	Laboratory test-related fields	There is neither an official Chinese version nor an official institutional application. Domestic companies related to medical treatment big data use LOINC as a reference for internal standard tables
UMLS	Clinical, pharmaceutical, and other information	There is neither an official Chinese version nor an official institutional application. Drug terminology and coding are mainly American-listed drugs. These can be referred to in the coding rules, but the application of content is of little value
MeSH	Diagnosis	The Chinese translation version has been published, and the Chinese copyright belongs to the Institute of Medical Information of the Chinese Academy of Medical Sciences. At present, it is in use in only some hospitals and medical schools, and there is no official promotion. Domestic medical big data companies use MeSH as the diagnostic index label for searches and queries
Anatomical Therapeutic Chemical (ATC)	Drugs	The official classification system of the World Health Organization for drugs is an integral part of the World Health Organization Drug Dictionary and is mainly used in clinical trials in China
MedDRA	Adverse drug reaction	Standard medical terminology, adopted by drug regulators in the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use, for naming adverse drug reactions/adverse events in individual case reports; it is mainly used in clinical trials in China

In summary, we believe that scientific governance of clinical data can only be achieved by integrating metadata management and master data management based on clinical practices, medical disciplines, and scientific exploration, within an information privacy security and medical law architecture. AI mining can successfully produce knowledge-based data based on this framework, which we call medical metadata governance, as depicted in Figure 1.

Figure 1.

Medical metadata governance framework. Similar to multivariate linear regression analysis, EMR data culminate in the formation of knowledge data based on clinical practices, medical disciplines, scientific exploration, metadata management, master data management, and AI mining. AI, artificial intelligence; EMR, electronic medical record.

Application of SPRAY-type AI

Although the background of informatization and the weak connectivity of medical data make it difficult to transform the clinical research on EMR data, this does not prevent the use of the “Service, Patient, Regression, A-base/Away, Yeast” (SPRAY)-style AI application in the current environment. In selecting the acronym “SPRAY” for our custom designation, we were motivated by the observation that this application paradigm aligns seamlessly with the principles of stepwise development. While the foundation is indispensable, the progression is equally crucial. This represents a quintessential example of an artificial intelligence application in medical data that evolves from a focal point to a comprehensive coverage. Even if the problems of data consistency and integrity are solved through metadata management, data standard harmonization through master data management, data security and privacy protection through information security architecture, and mining difficulties through multiple AI tools, this is only a small segment of the scientific applications that have been completed for clinical data. To fully benefit from the value of EMR data, we must exploit all that AI has to offer. SPRAY serves the informational base parts, upholding patient-centeredness, giving back to clinical research in a sustainable manner, following the three laws of robotics, and always acting based on the three quadrants of medical ethics. We can thus finally realize the value of EMR data (Figure 2).

Figure 2.

Applications of SPRAY-type AI.

Base services

Over the past two decades, China’s hospital informatization has gradually caught up with other developed countries, but it remains a constantly evolving and optimizing process that requires updates and advancements in IT, particularly with the help of the Internet of Things, cloud computing, and machine learning. With the emergence of COVID-19 and its latest variant, Omicron, there is a growing trend towards strengthening hospital infrastructure, based on the principle of being prepared. Some cities and provinces have built “XiaoTangShan”-style hospitals that combine 5G, AI, the Internet of Things, and big data to deal with outbreaks.³⁴ The launch of health codes during the pandemic has made it more effective to implement epidemic prevention management in cities. The health code uses big data from multiple sources, such as healthcare, civil aviation, railways, and geographic location information. Big data governance and statistical analysis of real data are crucial. Hospital infrastructure, metadata governance, and the use of AI are essential for achieving high-quality data. For medical big data text, the relevant AI techniques include classification, entity extraction, relationship extraction, event extraction, reading comprehension, normalization, and generation. For medical imaging, they include classification, segmentation, registration, detection, reconstruction, generation, super-resolution, denoising, deblurring, depolarization, and pseudo-imaging. NLP-based EMR structuring can convert most natural language data and unstructured text into structured medical records.^35,36 Convolutional neural networks are used to identify medical images, including ocular fundus identification, tumor detection, tumor progression tracking, and pathologic interpretation.
Sentiment analysis can be used to improve patient education and doctor-patient communication through opinion mining, opinion information extraction, emotion mining, subjectivity analysis, bias analysis, emotion analysis, and comment mining on doctor-patient communication or social platforms. Heterogeneous clinical data should also be considered in the application process, with multimodal outcome exploration of clinical text data, image waveform data, and omics data achieved through knowledge mapping or knowledge libraries in multimodal technologies and decision fusion.^37,38

Patient-centeredness

How can hospitals effectively provide patients with their medical data in a patient-centered environment? Although the storage of medical data in a hospital’s electronic business system is necessary due to informatization requirements, challenges such as misinterpretation of medical data and litigation issues can limit solutions to this problem. Therefore, maximizing the clinical, scientific, and human value of patients’ personal data is crucial in real-world research data.

In December 2016, the US Congress approved the use of “real-world evidence” in place of traditional clinical trials for expanded indications, providing further insights into real-world research.³⁹ In 2018, China published its first Real World Research Guide,⁴⁰ and in 2021, the National Drug Administration’s Drug Review Center developed “Real-World Data Guidelines (Pilot) for Generating Real-World Evidence” to guide and standardize the use of real-world data by applicants to generate real-world evidence for drug development.⁴¹ Our real-world data sources are categorized by functional type, including HIS data, medical insurance payment data, registry study data, active monitoring of drug safety, natural population cohort data, mortality registration data, patient-reported outcome data from mobile devices, individual health monitoring data, and patient-specific follow-up data. Collecting real-world data has become the key direction for medical AI, which includes structured extraction from EMR systems, optical character recognition for electronic data capture, semantic analysis, and algorithmic models for clinically assisted decision-making.

Furthermore, out-of-hospital data collection requires distributed relational databases, real-time stream analysis, and the resulting model of whole-process intelligent patient services, which can comprehensively cover multiple scenarios such as pre-consultation, referral, in-hospital navigation, consultation, follow-up, and medication services to properly address the issues of “unmanaged health, unguided care, and undirected medication.” Edge computing on IoT wearables is also essential. Only through the collection of these real-world data, combined with the application of AI algorithms in knowledge mapping, can we truly achieve patient-centered HISs and give back to patients.

Data regression

As patients serve as intermediaries between clinical and scientific research, data forms a crucial link in the entire healthcare process. The Chinese philosophy of “Yin and yang are what is called Dao” emphasizes the closed loop of “clinical-data-research-clinician,” with data discovery and regression being the key features of AI (Figure 3). AI is extensively utilized in personalized medicine, relying on various machine learning algorithms such as artificial neural networks, decision trees, random forests, and support vector machines.⁴² In addition to the integration, governance, and structuring of data, which need to be aligned with EMR, there is a pressing need to optimize scientific computing with embedded statistical analysis modules to address clinical challenges. These modules include conventional descriptive analysis, differential analysis, and influencing factor analysis, as well as advanced techniques such as logistic regression, support vector machines, naive Bayes, decision trees, random forests, boosted trees, the K-nearest neighbors algorithm, hierarchical classification algorithms, survival analysis, and the Gaussian mixture model.
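As a small illustration of one technique in this list, the K-nearest neighbors algorithm classifies a new case by majority vote among the most similar known cases. The sketch below uses only the Python standard library; the feature tuples and labels in the usage note are invented for illustration and carry no clinical meaning.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train is a list of (features, label) pairs, where features are numeric
    tuples of the same length as query. Distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with three training points labeled "low" near (1, 1) and three labeled "high" near (5, 5), a query at (1.1, 1.0) is assigned "low". In practice, features on different scales would need to be standardized before distances are compared.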

Figure 3.

Data links for communicating clinical and scientific research and for use in pre-diagnosis, diagnosis, and post-diagnosis. In the diagnosis and treatment global link, real-world data from patients and real-world evidence complement each other.

At the statistical analysis stage, dirty data clearly expose the problem, and data cleansing provides a temporary solution rather than a permanent cure. The long-term remedy is to standardize data governance or EMR data entry stages and to work backward from scientific research requirements when defining data acquisition rules. The ideal approach involves adhering to real-world thinking in research design and data management, strictly implementing experimental design to collect data within and outside of hospitals, without limitations on collection systems or methods. As mentioned above, AI can be beneficial at every stage.

Medical AI rule: based on (bAse) doctors and not far away (away)

With the increasing popularity of medical data and the development of algorithms, AI has become a trend in healthcare.^43,44 If there is one idiom to describe the relationship between AI and doctors, it is “be neither friendly nor aloof.” AI is designed to assist doctors and improve clinical and scientific efficiency. However, given the inherent thinking of the Three Laws of Robotics and the rapid development of AI, the physician community generally finds it difficult to accept AI interventions in medical settings and tends to hold that doctors cannot be replaced.^45–47 By contrast, several studies have shown that the general public has a positive attitude toward AI technologies and applications in medical scenarios, and that most people expect AI to replace human doctors completely or partially.⁴⁸ Applications in which AI works more closely with doctors include clinical assisted decision-making, surgical robotics, and digital therapy. With regard to medical ethics and robotics, and to strengthen the supervision and management of AI medical software products, the National Drug Administration has issued Guidelines for the Classification of Artificial Intelligence Medical Software Products.⁴⁹ Digital therapy is a mixed bag, drawing support and raising worry among experts.^50,51 In summary, there is a common understanding that AI and doctors can work together at the intersection of medical data production. We strongly agree that AI in the medical setting is primarily about assisting doctors and improving the efficiency of their clinical research, not replacing them.

Three quadrant diffuse development (yeast)

In the context of information privacy security, technology and medical personnel are the safeguards that hold the framework firmly in place. Specifically, medical ethics, law, and technology form the three quadrants that ensure the safe operation of AI in healthcare. Among these, medical ethics is an indispensable barrier and the core of the application of medical big data. Following the publication of the Personal Information Protection Act,⁵² the Health and Medical Data Security Guide,⁵³ and other laws and regulations, it has become necessary to train all participants in security and to assess them, so that they are committed to and accountable for big data-related work. Our team previously conducted an exploratory study on the intrinsic mechanism of doctors’ patient-protective behavior in Chinese public medical institutions and constructed a theoretical model framework for such behavior.³¹ At the technical level, the privacy algorithm creates bridges between data elements and data values by employing secure multi-party computation, blockchain technology, homomorphic encryption, and zero-knowledge protocols, together with anonymization, access control models, special processing and control, and role control models based on privacy expansion, encryption, and correlation technologies for datasets. We also require the adoption of decentralized storage solutions to safeguard data security.

In a scholarly contribution to the field of medical ethics, Mueller et al. have set out ten foundational ethical principles intended to guide the practical application of AI in medicine. These principles are: Identifiability; Communicative Transparency; Accountability; Transparency and Interpretability; Comprehensibility and Reproducibility; Explanation Based on Current Scientific Theory; Non-Misleading; Legality and Non-Maleficence; Non-Discrimination; and Objective Setting, Control, and Monitoring. They serve as a beacon for the responsible integration of AI into the medical field and emphasize the importance of ethical considerations in the pursuit of technological advancement.⁵⁴ Furthermore, the ethical considerations in the application of AI within healthcare encompass critical issues such as privacy, bias, and patient consent. Implementing AI solutions requires a comprehensive approach that integrates ethical, legal, and societal considerations. This approach must include a clear ethical framework, interdisciplinary collaboration, and measures to ensure transparency and interpretability, alongside robust regulatory mechanisms. Such strategies are imperative to address the multifaceted challenges posed by the integration of AI in healthcare, thereby safeguarding the well-being of patients and upholding the highest standards of medical practice.^55,56
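As an illustration of two of the simpler technical controls named above, role-based access control and anonymization of identifiers, the following is a minimal sketch using only the Python standard library. The roles, field names, and policy table are hypothetical, and keyed hashing (HMAC-SHA256) is one common pseudonymization technique chosen here for brevity; it is not presented as the privacy algorithm of this framework, and it does not demonstrate homomorphic encryption, secure multi-party computation, or zero-knowledge protocols.

```python
import hashlib
import hmac

# Hypothetical role-to-field policy; roles and field names are illustrative.
ROLE_FIELDS = {
    "attending_physician": {"patient_id", "name", "diagnosis", "lab_results"},
    "researcher": {"patient_id", "diagnosis", "lab_results"},
}
# Fields that identify a patient directly and must be pseudonymized for research.
DIRECT_IDENTIFIERS = {"patient_id"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def view_for_role(record: dict, role: str, secret_key: bytes) -> dict:
    """Return only the fields a role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    view = {k: v for k, v in record.items() if k in allowed}
    if role == "researcher":
        # Researchers can link records of the same patient but not identify them.
        for field in DIRECT_IDENTIFIERS.intersection(view):
            view[field] = pseudonymize(str(view[field]), secret_key)
    return view


record = {
    "patient_id": "P001",
    "name": "Example Patient",
    "diagnosis": "J45",
    "lab_results": {"fev1_percent": 72},
}
key = b"demo-key-not-for-production"  # assumption: a real key comes from a key store

researcher_view = view_for_role(record, "researcher", key)
print(sorted(researcher_view))               # ['diagnosis', 'lab_results', 'patient_id']
print(view_for_role(record, "visitor", key))  # {}
```

Because the same key always maps a given identifier to the same pseudonym, records belonging to one patient remain linkable for research while the original identifier stays with the clinical roles. The design choice to deny unknown roles by default reflects the general principle that access should be granted explicitly rather than assumed.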

In summary, medical ethical thinking grounded in the principles of autonomy, non-maleficence, beneficence, and justice can be realized at the application level by selecting the relevant medical ethical processes, such as privacy protection, informed consent, and data sharing. This safeguards data application across the board and correctly guides the diffuse development of AI in medical big data. It is achievable, however, only with medical ethics at the core of a framework built on law and technology, under three-quadrant management.

Conclusion

Hospital EMRs and the related informatization built on medical data still face problems such as inconsistent standards, poor integrity, insufficient security and privacy protection, and difficulty in mining and application. Our “metadata governance” framework ties closely to clinical scenarios so that medical knowledge data can be put to use. In addition, by extending SPRAY-type medical AI, we propose creating more value from EMR data. The account given in this paper of medical information data management in China, and of AI technology for transforming EMR data into scientific research achievements, provides guidance and supporting evidence for the identification, cleaning, mining, and further application of EMR data.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Projects of the Science and Technology Commission of Shanghai Municipality (STCSM) (20Y11909500 to H.L.F.).

Ethical statement

ORCID iD

Zhongzhou Xiao

Appendix

References

1. Tan, Liu. System integration project management engineer course (Intermediate Level). 2nd ed. Beijing: Tsinghua University Press, 2016, p. 1.

2. Zhou. Research on hospital information construction and management in the era of intelligence. Computer knowledge and technology, 2020; 2020: 8377674.

3. Lee, Yoon. Application of artificial intelligence-based technologies in the healthcare industry: opportunities and challenges. Int J Environ Res Publ Health 2021; 18: 271.

4. Chen P-T, Lin C-L, W-N. Big data management in healthcare: adoption challenges and implications. Int J Inf Manag 2020; 53: 102078.

5. Nie, Yao. Problems and Development Countermeasures of hospital informatization construction. Medical information 2021; 34: 16–18.

6. Wang. Application of electronic medical record in hospital information management. Information recording materials 2018; 19: 201–202.

7. Wang. Research on hospital informatization construction under the background of Medical Union. Wireless Communications and Mobile Computing. 2022; 2020: 1–12.

8. Kopanitsa. Integration of hospital information and clinical decision support systems to enable the reuse of electronic health record data. Methods Inf Med 2017; 56: 238–247.

9. Torab-Miandoab, Samad-Soltani, Jodati, et al. Interoperability of heterogeneous health information systems: a systematic literature review. BMC Med Inf Decis Making 2023; 23: 18.

10. Gamal, Barakat, Rezk. Standardized electronic health record data modeling and persistence: a comparative review. J Biomed Inf 2021; 114: 103670.

11. Yue, Zhang, et al. Blockchain-based verification framework for data integrity in edge-cloud storage. J Parallel Distr Comput 2020; 146: 1–14.

12. Lan, Zeng, et al. A framework for big data governance to advance RHINs: a case study of China. IEEE Access 2019; 7: 50330.

13. Hassan, Rehmani, Chen. Differential privacy techniques for cyber physical systems: a survey. IEEE Communications Surveys & Tutorials 2019; 22: 746–789.

14. Sun, Cai, et al. Data processing and text mining technologies on electronic medical records: a review. Journal of healthcare engineering 2018; 2018: 4302425.

15. Tian, Yang, Le Grange, et al. Smart healthcare: making medical care more intelligent. Global Health Journal 2019; 3: 62–65.

16. Zhang, et al. Problems and Countermeasures of medical data governance in the context of interconnection. Beijing: Chinese Digital Medicine, 2021.

17. Brackett, Earley. The DAMA guide to the data management body of knowledge (DAMA-DMBOK guide). Sedona: Technics Publications, 2009.

18. Committee NITST. Information technology service governance Part 5: data governance specification. Beijing: Chinese Standards Press, 2018.

19. Mate, Köpcke, Toddenroth, et al. Ontology-based data integration between clinical and research systems. PLoS One 2015; 10: e0116656.

20. Hyppönen, Saranto, Vuokko, et al. Impacts of structuring the electronic health record: a systematic review protocol and results of previous reviews. Int J Med Inf 2014; 83: 159–169.

21. Mahanti. Data governance and compliance. In: Data governance and compliance. Berlin: Springer, 2021, pp. 109–153.

22. admin. Metadata in data governance. Mlada Boleslav: Secoda, 2012.

23. Brewster. Informational institutions in the agrifood sector: meta-information and meta-governance of environmental sustainability. Curr Opin Environ Sustain 2016; 18: 73–81.

24. MeSH Database. Bethesda (MD): National library of medicine (US). California: National Center for Biotechnology Information.

25. PML, Los, Choy. Improving China's corporate governance within the big data era: integration of knowledge management and data governance. In: International conference on intellectual capital and knowledge management and organisational learning. Oxfordshire: Academic Conferences International Limited, 2015, p. 183.

26. Classification of diseases, functioning, and disability. U.S. Department of health & human services. Hyattsville: National Center for Health Statistics, 2021.

27. Gao. Big data security and privacy protection research. Mumbai: Wireless and connected technologies, 2019, vol 14.

28. Security policy for Oracle databases. New technical products in China 2018: 137–138.

29. Zheng, Song, Guo, et al. Security testing study of B/S architecture software. Computer technology and development 2012; 22: 221–224.

30. Liu. Medical ethics. Hunan: Central South University Press, 2003.

31. JLL, Xing, Shi, et al. Theoretical approach and scale construction of doctors’ protection behavior of patients’ privacy in Chinese public medical institutions. JMIR Preprints 2022; 6(12): e39947.

32. Zeng, Goryachev, Weiss, et al. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inf Decis Making 2006; 6: 1–9.

33. Tang, Jiang. Applications and perspectives of artificial intelligence in EHR based research. Acta of the Second Military Medical University 2018; 39: 928–934.

34. Pang, Cao, et al. Hospital infrastructure development and Countermeasures in China under the background of ‘new infrastructure’. Building economy 2020; 41: 24–29.

35. Huang. Research and implementation of a structured processing approach for medical semantic understanding. Chongqing: Chongqing Jiaotong University, 2019.

36. Jiang, Guan. Syntactic analysis fusion model for Chinese electronic medical records. Acta aut 2019; 45: 276–288.

37. Zhai, Pan, Xie, et al. Parallel loading algorithms supporting multimodal medical data fusion. Data Acquisition and Processing 2018; 33: 758–768.

38. Holzinger, Haibe-Kains, Jurisica. Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data. Eur J Nucl Med Mol Imag 2019; 46: 2722–2730.

39. Sherman, Anderson, Dal Pan, et al. Real-world evidence—what is it and what can it tell us? Massachusetts: Mass Medical Soc, 2016, pp. 2293–2297.

40. Wujieping Medical Foundation CtorcgC. Chinese real world research guidelines. 8th China oncology clinical trials Development Forum, 2018; 8: 777698.

41. Administration CfdroND. Real world data guidelines for generating real world evidence (Trial). Nanshan: Administration CfdroND, 2021.

42. Erickson, Korfiatis, Akkus, et al. Machine learning for medical imaging. Radiographics 2017; 37: 505.

43. Beam, Kohane. Translating artificial intelligence into clinical care. JAMA 2016; 316: 2368–2369.

44. Lancet. Artificial intelligence in health care: within touching distance. Lancet 2017; 390: 2739.

45. Blease, Kaptchuk, Bernstein, et al. Artificial intelligence and the future of primary care: exploratory qualitative study of UK general practitioners’ views. J Med Internet Res 2019; 21: e12802.

46. Kim, Choi S-W, et al. Physician confidence in artificial intelligence: an online mobile survey. J Med Internet Res 2019; 21: e12422.

47. Pinto Dos Santos, Giese, Brodehl, et al. Medical students' attitude towards artificial intelligence: a multicentre survey. Eur Radiol 2019; 29: 1640–1646.

48. Gao, Chen, et al. Public perception of artificial intelligence in medical care: content analysis of social media. J Med Internet Res 2020; 22: e16649.

49. Administration. Product classification definition guidelines for AI medical software, 2021. https://www.nmpa.gov.cn/ylqx/ylqxggtg/20210708111147171.html.

50. Kim H-S. Apprehensions about excessive belief in digital therapeutics: points of concern excluding merits. J Kor Med Sci 2020; 35: e373.

51. Food and Drug Administration. Enforcement policy for digital health devices for treating psychiatric disorders during the coronavirus disease 2019 (COVID-19) public health emergency. US Food and Drug Administration. 2020. https://www.regulations.gov/document/FDA-2020-D-1138-0068.

52. Zhu. Personal information protection act of the people's Republic of China, 2021.

53. Council NISST. Information security technology: a health care data security guide. MandiGobindgarh: Council NISST, 2021.

54. Muller, Mayrhofer, Veen EBV, et al. The ten commandments of ethical medical AI. Computer 2021; 54: 119–123.

55. Guan. Artificial intelligence in healthcare and medicine: Promises, Ethical challenges and governance. Chinese Journal of Medical Sciences 2019; 34: 8.

56. Golnar, Elena, Evers SMAA. The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review. AI and Ethics 2022; 2: 539–551.