A rapid review on the application of common data models in healthcare: Recommendations for data governance and federated learning in artificial intelligence development

Abstract

Objective

This rapid review was undertaken to summarize contemporary knowledge on the application of common data models (CDMs) for semantic data standardization in the field of healthcare and provide a set of recommendations to guide the development of a CDM.

Methods

The review adapted the Cochrane methodological recommendations for rapid reviews, namely (1) topic refinement, (2) setting eligibility criteria, (3) searching, (4) study selection, (5) data extraction, and (6) synthesis.

Results

A total of 69 studies were included in the analysis. The analysis resulted in three interconnected layers covering (1) the federated network, (2) the iterative application process of a CDM, and (3) the data management process of each partner.

Conclusion

Development and implementation of CDMs is a collaborative and iterative process, highly affected by the boundaries set by the individual federated learning partners, and the nature of their data. Interdisciplinary collaboration in application of CDMs for federated learning and data governance of health data is mandatory, with a call to increase domain expert involvement in data management.

Keywords

Common data model, federated learning, data governance, delivery of health care, informatics, standardization

Introduction

The growing availability of health data provides promising and exciting opportunities to advance healthcare through digital health technologies.¹ Appropriate health data governance practices can enhance data privacy and security, data management and linkage, data access management, and secondary data use outcomes.² Data governance entails the strategic control and regulation of data management processes, including data authority, policies, used data standards, and procedures.³

The identification of internal and external factors tied to data governance is supported by the information technology–governance of IT–governance of data (ISO/IEC 38505-1:2017) standard.⁴ Adapted to digital health research, data governance strategies can increase the value of healthcare data, manage its costs and complexities, and ensure adherence to regulations and confidentiality requirements.⁵ In health practice, data governance promotes ethical and professional protocols to ensure confidential, secure, and reliable patient data management.⁶ Data governance policies can also be complemented to consider more complex issues such as data justice, promoting socially just health data use and collection.⁷

Artificial intelligence (AI) is an umbrella term to describe systems functioning with a level of autonomy, processing inputs and functions from humans and machines to achieve given objectives.⁸ They may use machine learning approaches to produce predictions, classifications, recommendations, decisions, and generative outputs to influence the environment in which the system operates.⁸ AI technologies provide ample opportunities to transform approaches related to patient care and administrative processes to better support healthcare provision.⁹ Combined with the massive amounts of patient data collected in healthcare, AI and particularly machine learning hold the potential to improve and speed up care processes, improve diagnostic accuracy, optimize the delivery and timing of treatments, and further improve care quality.^10–12

Using highly confidential and sensitive patient data for research purposes raises substantial ethical and legal questions and risks concerning the patient's privacy as well as data security, including cyber security.^13,14 Furthermore, training AI models using data from single sources creates a risk of biased analysis outcomes.¹⁵ These biases can be harmful to the most vulnerable patient groups, exacerbating underlying inequalities stemming from patients’ social, ethnic, cultural, racial, or gender status.¹⁶ The necessity to collect diverse large-scale data sets to support sustainable AI model development is evident.¹⁵

Federated learning is a framework to guide decentralized research collaboration, predominantly in machine learning. In this approach, AI models are trained locally and only model updates are shared instead of collecting and combining data sets, which preserves the data privacy of individual participants.^17,18 It presents a viable solution to enable development of powerful AI models and support large-scale AI-driven analytics within healthcare, increasing the potential to conduct impactful universal research in a data-secure way.^17–19 Federated learning in healthcare has historically focused on tasks relevant to medical fields such as radiology and internal medicine, with limited practical clinical applications.²⁰
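The idea of sharing model updates rather than data can be illustrated with a minimal sketch of one common aggregation scheme, sample-weighted averaging of locally trained parameters. This is an illustration under stated assumptions and not the method of any reviewed study: the partner names, the toy one-parameter linear model, and the functions `local_update`, `federated_average`, and `run_rounds` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each partner keeps its own (x, y) pairs locally.
PARTNER_DATA: Dict[str, List[Tuple[float, float]]] = {
    "hospital_a": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    "hospital_b": [(1.0, 2.2), (2.0, 3.8)],
}


def local_update(weight: float, data: List[Tuple[float, float]],
                 lr: float = 0.05, steps: int = 20) -> float:
    """Run gradient descent on y ~ weight * x using only local data."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates: Dict[str, float], sizes: Dict[str, int]) -> float:
    """Combine local weights, weighting each partner by its sample count."""
    total = sum(sizes.values())
    return sum(updates[name] * sizes[name] for name in updates) / total


def run_rounds(rounds: int = 5) -> float:
    """Repeat local training and central averaging for a few rounds."""
    global_weight = 0.0
    sizes = {name: len(rows) for name, rows in PARTNER_DATA.items()}
    for _ in range(rounds):
        # Only the updated weight leaves each site, never the raw records.
        updates = {name: local_update(global_weight, rows)
                   for name, rows in PARTNER_DATA.items()}
        global_weight = federated_average(updates, sizes)
    return global_weight
```

In this toy setting both partners hold data close to y = 2x, so the shared weight settles near 2. Real deployments add secure communication and privacy safeguards that this sketch leaves out.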

Benefits of federated learning approaches include preserving patient confidentiality and ensuring data security while providing a standard-based collaborative learning strategy.²¹ However, issues related to varying data types between the federated learning partners need to be addressed to promote the development and training of AI-based models.²²

Health informatics standards are officially approved documents, developed through consensus and evidence, setting rules, guidelines, or characteristics for activities and outcomes related to health information and communications technology.⁶ They address multiple aspects of data use and exchange, covering different functional layers and levels of abstraction. Structural standards, such as data exchange standards HL7 and FHIR®, are developed to promote interoperability and data sharing between different stakeholders and systems within healthcare ecosystems.²³

Semantic standards, such as standardized terminologies, consist of standardized terms and definitions providing a unified, discipline-specific language that can be used for data entry, storage, and use.²⁴ Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a globally recognized and widely adopted standardized terminology, covering a broad range of health-related topics, with concepts including clinical findings, procedures, body structures, social contexts, and clinical qualifiers.^25,26 Another widely used standardized clinical terminology, Logical Observation Identifiers Names and Codes (LOINC), provides codes for health measurements, observations, and documentation.²⁷

Compared to standardized terminologies, standardized classification systems present broader categories for data organization, such as the International Classification of Diseases (ICD)²⁴ and the Anatomical Therapeutic Chemical (ATC) classification to classify pharmacological substances.²⁸ Despite the continuous work conducted in the field of data governance, the systems, customs, structures, and terminologies used in healthcare documentation vary between institutions and geographical locations.²⁹

Common data models (CDMs) have been developed to facilitate the utilization of heterogeneous data sources in large-scale collaborative research.³⁰ They are frameworks to consistently organize and store data from different sources into large, standardized data storages, promoting cross-institutional collaboration.³⁰ Data standardization is the process of converting data from different sources to a common format to promote data sharing.³¹ It is commonly carried out using the extract, transform, and load (ETL) process that captures both the structure and the semantics of the data.³² The ETL process comprises methods used to retrieve source data (extract), convert the data to a standardized format (transform), and integrate the transformed data to the target data repository (load).³³ Semantic models are used to standardize various clinical observations and other patient data needed to conduct analytical tasks in health research.³⁴ In federated learning, CDMs are implemented to enable sharing analytical codes between institutions without a need for model modifications.³⁵
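The three ETL stages described above can be sketched as follows. This is a minimal illustration, not a reference implementation of any CDM: the source rows, the lookup table `CODE_TO_CONCEPT`, its placeholder identifiers, the target field names, and the use of 0 for unmapped codes are all assumptions made for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical source rows as they might come out of a local system.
SOURCE_ROWS = [
    {"patient": "p1", "dx_code": "I10", "dx_date": "2023-05-01"},
    {"patient": "p2", "dx_code": "E11.9", "dx_date": "2023-06-12"},
]

# Hypothetical lookup from local codes to target concept identifiers.
# The identifiers are placeholders, not real vocabulary entries.
CODE_TO_CONCEPT = {"I10": 1001, "E11.9": 1002}


def extract(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Extract: retrieve the source records (here, copy from memory)."""
    return [dict(row) for row in rows]


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Transform: rename fields and map local codes to the common format."""
    out = []
    for row in rows:
        out.append({
            "person_id": row["patient"],
            # 0 marks a code with no mapping (an assumption of this sketch).
            "condition_concept_id": CODE_TO_CONCEPT.get(row["dx_code"], 0),
            "condition_source_value": row["dx_code"],
            "condition_start_date": row["dx_date"],
        })
    return out


def load(rows: List[Dict[str, object]], repository: List[Dict[str, object]]) -> int:
    """Load: append the standardized rows to the target repository."""
    repository.extend(rows)
    return len(rows)
```

Keeping the original code in a source-value field alongside the mapped identifier is one way to preserve traceability when a mapping later needs review.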

The Observational Health Data Sciences and Informatics (OHDSI) community is a prime example of an active multinational research collaboration to facilitate large-scale data analytics. The Observational Medical Outcomes Partnership (OMOP) CDM, managed by the OHDSI group, is developed to standardize observational healthcare data in structured and free-text formats.³⁶ OMOP CDM has been widely adopted in health research for the standardization of data to support AI development, especially within machine learning and natural language processing.³⁷ SNOMED CT and LOINC are the most notable health informatics standards incorporated into the OHDSI vocabularies used in the OMOP CDM.³⁶

The aim of this rapid review was to summarize contemporary knowledge on the application of CDMs for semantic data standardization in the field of healthcare and to provide a set of recommendations to guide the development and utilization of CDMs in AI technology development.

Materials and methods

A rapid review was undertaken to efficiently summarize the current knowledge regarding the implementation of CDMs for semantic data standardization in healthcare research to support the CDM development for a federated learning project.³⁸ Rapid reviews are a streamlined approach to knowledge synthesis, modifying and accelerating the process of systematic reviews to provide timely results to support decision making.³⁹ The review adapted the Cochrane methodological recommendations proposed by Garritty et al., namely (1) topic refinement, (2) setting eligibility criteria, (3) searching, (4) study selection, (5) data extraction, and (6) synthesis, as presented in Figure 1.³⁹ The PRISMA guideline for reporting systematic reviews was adapted to report the results.⁴⁰

Figure 1.

Overview of the rapid review process following the Cochrane rapid reviews methods group recommendations.

Topic refinement

The topic refinement included identifying the research questions and developing a review protocol.³⁹ The review was set to answer the following research questions:

What elements have been reported on the development and use of CDMs when dealing with health data?

What considerations need to be undertaken when implementing CDMs in the development of AI-based technologies for healthcare?

A protocol was published on the OSF registries (https://osf.io) on 17 January 2024, and it can be accessed through https://doi.org/10.17605/OSF.IO/YJGBS.

Setting eligibility criteria

The eligibility criteria were set using the “PICo strategy” used commonly for qualitative reviews, with the core elements including Population (P), phenomenon of Interest (I), and Context (Co).⁴¹ The following PICo strategy was constructed within the research team to define the search phrase:

Population = all healthcare users, all healthcare settings

Phenomenon of Interest = semantic data standardization using a CDM

Context = all healthcare contexts, textual real-world health data

Searching

The literature search was conducted in August 2024 using the PubMed (MEDLINE) and CINAHL (Ebsco) databases that comprehensively cover biomedical and health sciences literature. No explicit date range was set, and all historic studies accessible in the databases that were published by August 6, 2024, were admitted.

To comply with the PICo strategy, the search was conducted using the search phrase “Common data model” OR ((“data harmoni*” OR “data standard*” OR “data model” OR “data interoper*”) AND (“federated learning” OR “distributed machine” OR “distributed learning” OR “decentralized learning” OR “decentralised learning” OR “collaborative learning”)) in both databases. No applicable MeSH and CINAHL subject headings were available or added to the search phrase. No filters or refinements provided by the databases were used, including but not limited to filters for the text availability, article attribute, article or source type, publication date, or article language.
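The Boolean structure of the search phrase can be reproduced programmatically, which helps when the same query has to be run in several databases. The sketch below only assembles the string quoted above. The helper names `or_group` and `build_query` are hypothetical, and straight quotation marks are used in place of the typographic ones.

```python
from typing import List

# Terms taken from the search phrase reported in the text.
DATA_TERMS: List[str] = [
    "data harmoni*", "data standard*", "data model", "data interoper*",
]
LEARNING_TERMS: List[str] = [
    "federated learning", "distributed machine", "distributed learning",
    "decentralized learning", "decentralised learning", "collaborative learning",
]


def or_group(terms: List[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query() -> str:
    """Assemble: "Common data model" OR ((data terms) AND (learning terms))."""
    combined = f"({or_group(DATA_TERMS)} AND {or_group(LEARNING_TERMS)})"
    return f'"Common data model" OR {combined}'


QUERY = build_query()
```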

Study selection

All peer-reviewed original study designs describing the process of standardizing textual data from real-world healthcare settings using a CDM were included in the study. No exclusions were made regarding the study participants or setting. Only articles written in English were included. Studies were excluded if they did not contain a clear description of the process of standardizing and transforming textual real-world health data using a CDM. For this review, health data were defined as any patient-level data created by health professionals within a healthcare system. These data include, but are not limited to, electronic health record (EHR) data and administrative data containing patient information, observations, or outcomes.

Identified article titles and abstracts were downloaded into the Rayyan web application (www.rayyan.ai) for title and abstract screening. Duplicates were identified using the automatic detection provided by Rayyan and removed manually by comparing the identified titles and deleting the confirmed duplicates by one researcher. Two researchers then independently and manually reviewed the titles and abstracts, followed by full-text screening. The title, abstract, and full-text screenings were conducted using the “Blind On” function provided by the Rayyan web application, where the decisions, labels, or notes made by individual researchers are not visible to the others. All contradicting screening results related to inclusion or exclusion were discussed between the reviewers to reach a consensus.
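Title-based duplicate screening of the kind described above can be approximated with a simple normalization step. The sketch below is a stand-in for illustration only and does not represent the detection algorithm used by Rayyan, which is not described in this review. The record fields `id` and `title` and the function names are assumptions.

```python
import re
from typing import Dict, List


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def flag_duplicates(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group record ids by normalized title and keep groups with repeats."""
    groups: Dict[str, List[str]] = {}
    for record in records:
        key = normalize_title(record["title"])
        groups.setdefault(key, []).append(record["id"])
    # Candidates only: a researcher still confirms each group manually.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The function returns candidate groups rather than deleting anything, mirroring the workflow in which a researcher confirms each duplicate before removal.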

Data extraction

A data extraction strategy was created to extract the relevant information from the included studies. The strategy was tested and refined using a sample (n = 10) of the included full-text articles. The following data elements were extracted and collected manually by one researcher from all the included articles using the Webropol Survey & Reporting application (webropol.com):

Manuscript details (authors, year of publication, countries, aim of the study, and setting)

Source data details (data source, sample size, data type, and data elements)

Clinical coding systems used in source and target data

Phases of the data standardization process and considerations

Clearly discussed elements of data governance

Considerations regarding the application of CDMs in federated learning tasks

Synthesis

The data were downloaded into a spreadsheet. Quantitative data were calculated manually and summarized using descriptive analysis, and qualitative data were analyzed using content analysis methods.⁴² The elements of data governance were analyzed deductively, using the good data governance practices checklist for real-world health data (adopted from Solà-Morales et al.).² To evaluate the acceptability, quality, and integrity of data governance practices, the elements of data governance were divided into four sections, namely

Data privacy and security, including patient consent, data de-identification, anonymization, or pseudonymization

Data management and linkage, including data standardization, source data quality, consistency, accuracy and completeness, and possible data bias

Data access management, including ethical or institutional review board grant for data access

Generation and use of real-world evidence, including minimum quality criteria for the data for secondary use purposes

Results

Overview of the included studies

A total of 890 manuscript titles were retrieved based on the database searches for this study. After removing the duplicates, 672 unique entries were included in the title and abstract screen, as illustrated in the PRISMA flowchart diagram in Figure 2. A total of 314 full-text manuscripts were screened, resulting in 69 original studies included in this review, after applying the exclusion criteria as described in Figure 2.

Figure 2.

PRISMA flowchart diagram of the article screening. Adapted from Page et al.⁴⁰

Nearly half (n = 30, 43.5%) of the studies were conducted in Europe, followed by North America (n = 21, 30.4%), Asia (n = 8, 11.6%), South America (n = 1, 1.4%), and Oceania (n = 1, 1.4%), with eight (11.6%) studies conducted as collaborative intercontinental research, as described in Figure 3.^21,43–110 The studies were published between 2010 and 2024, with more than half (n = 36, 52.2%) of them published after 2020.
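The reported shares follow from dividing each count by the 69 included studies and rounding to one decimal place. A short check of that arithmetic, using only the counts stated above, is given below. The dictionary and function names are illustrative.

```python
# Counts of included studies by region, as reported in the text.
TOTAL_STUDIES = 69
REGION_COUNTS = {
    "Europe": 30,
    "North America": 21,
    "Asia": 8,
    "South America": 1,
    "Oceania": 1,
    "Intercontinental": 8,
}


def share(count: int, total: int = TOTAL_STUDIES) -> float:
    """Return the percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage per region, computed from the counts above.
REGION_SHARES = {region: share(n) for region, n in REGION_COUNTS.items()}
```

The regional counts sum to 69, so every included study is assigned to exactly one category.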

Figure 3.

Characteristics of the admitted studies. *In setting: healthcare includes data combined from various healthcare environments not explicitly defined by the authors; other includes, for example, skilled nursing facilities and memory clinics. **In data sources: data warehouse includes data from multiple sources such as electronic health records, health data registries, and biobanks.

The aim of the studies was predominantly to standardize data by applying an existing CDM (n = 48, 69.6%). Over half of the studies used data that was derived from different healthcare settings (n = 37, 53.6%) that were not explicitly described by the authors. Over half of the data (n = 40, 58.0%) used in the studies were stored in data warehouses that included multiple data sources, such as EHRs, health data registries, and biobanks, using mostly structured data (n = 50, 72.5%). The number of patients in the datasets used was reported in 43 (62.3%) studies, ranging from 100 to 88 million patients, with an average of 4.3 million patients.

The data extraction results are presented in full in the Supplemental materials.

The most used standardized coding systems reported in the source data were ICD-10 (n = 25, 36.2%), SNOMED CT (n = 13, 18.8%), and ICD-9 (n = 11, 15.9%), as presented in Figure 4. The standardized data elements were commonly related to medication and prescriptions (n = 51, 73.9%), followed by conditions and diagnoses (n = 49, 71.0%), patient demographics (n = 40, 58.0%), and procedures (n = 35, 50.7%). Other data elements (n = 31, 44.9%) included a variety of clinical data, symptoms, assessments, and risk factors. These elements were predominantly standardized using SNOMED CT (n = 34, 49.3%), RxNorm (n = 24, 34.8%), and LOINC (n = 15, 21.7%) in the CDMs.

Figure 4.

Reported standardized coding systems and data elements used in the reviewed articles (n = 69) for source and standardized data.

CDM adoption in federated learning projects

The elements related to the development and implementation of CDMs in federated learning projects were divided into three categories: (1) the federated network, (2) the development and application of CDM, and (3) the federated learning partners, as visualized in Figure 5. To facilitate successful data standardization, communication and collaboration were perceived as key in the reviewed studies.

Figure 5.

Overview of the elements related to development and application of common data models (CDM) in federated learning projects, where ETL represents the “extract, transform, and load” process and arrows indicate an iterative development process highlighting the importance of communication and collaborative work between partners in the federated network for the targeted CDM.

The federated network

The tools, strategies, analytical variables, and research questions steering the implementation of a CDM under development were discussed within the network and refined based on the feedback provided by the partners. Communication between the project partners and interdisciplinary working teams was portrayed as a key element to facilitate successful collaboration in the reviewed studies.^44,53,106 It was important that the partners shared an understanding of the research questions and roles and responsibilities for each participant.^21,62 This included exchanging knowledge and understanding the differences between the research datasets.^107,108 These differences could include a variation in used coding systems, terminologies, data elements, data structures, and even languages.^44,45,78,93,107,108 CDMs were perceived as a facilitator for multicenter research to overcome the issues regarding heterogeneity of the source data, but their deployment required careful and consistent planning regarding, for example, clearly defined variables and data elements needed for the analysis as well as the tools used in data standardization.^44,46,74,93,104–106

Development and application of CDMs

Development and application of CDMs was driven by the needs of federated networks while also considering possibilities and constraints of the federated learning partners regarding the used coding systems, methods, and required data elements. In the reviewed studies, the majority (n = 51, 73.9%) applied or extended an existing CDM to standardize raw data, predominantly the OMOP CDM (n = 43, 62.3%). The choice of a CDM was guided by the use case of the target data and basic data elements needed for the data analysis.^44,51,54,57,62,70,80,86,93,99,104,108 If an existing CDM was considered inadequate to meet the needs of the project, it could be extended through the implementation of new concepts or vocabularies.^49,54,55,57,62,64,72,97

A study-specific CDM might result in less data loss than using a standard CDM.^55,78 In total, 18 (26.1%) studies sought to resolve this issue by developing a novel CDM. A successful CDM development process required collaboration between domain experts, information system architects, and researchers.^50,75 The key data elements, relevant ontologies, and standards were identified through expert consultations, publicly available data, established procedures, or various clinical indicators.^43,48,75,77,78,86,89

The federated learning partners

Individual partners within the federated network were responsible for creating suitable environments as well as providing the data to facilitate data standardization. All partners were required to have access to hardware that fulfilled the memory, space, and performance requirements needed to run the processes.^44,76 The information technology infrastructure was required to promote data extraction, posing demands for the project-specific software and tools used for data storage and preparation.^44,55,77,101

The standardization process was predominantly (n = 49, 71.0%) explained through the ETL process, in which the source data was retrieved from the EHR or database, standardized to the CDM format, and loaded into a data repository. ETL was usually (n = 52, 75.4%) accompanied by various concept mapping methods. The standardization process was guided by the source data as well as the target CDM, combining automated and manual methods.
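The combination of automated and manual concept mapping mentioned above might look like the following sketch. An exact label match is tried first, then a table of expert-curated decisions, and anything still unmapped is queued for manual review. The vocabulary, the placeholder identifiers, and the function names are hypothetical and do not correspond to real vocabulary entries.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical target vocabulary: label -> placeholder concept identifier.
TARGET_VOCAB: Dict[str, int] = {
    "hypertension": 2001,
    "type 2 diabetes mellitus": 2002,
}

# Hypothetical mappings decided manually by domain experts.
MANUAL_MAP: Dict[str, int] = {"high blood pressure": 2001}


def map_term(term: str) -> Optional[int]:
    """Try an automated exact match first, then the curated manual map."""
    key = term.strip().lower()
    if key in TARGET_VOCAB:
        return TARGET_VOCAB[key]
    return MANUAL_MAP.get(key)


def map_terms(terms: Iterable[str]) -> Tuple[Dict[str, int], List[str]]:
    """Return mapped terms and a queue of terms needing expert review."""
    mapped: Dict[str, int] = {}
    review_queue: List[str] = []
    for term in terms:
        concept_id = map_term(term)
        if concept_id is None:
            review_queue.append(term)
        else:
            mapped[term] = concept_id
    return mapped, review_queue
```

Once an expert resolves a queued term, adding the decision to the manual table makes the next iteration of the mapping fully automatic for that term.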

Finally, data quality and characterization checks were important to ensure the completeness and validity of the standardization process.^43,45,48,53,55,65,71,73,94,96,101,106
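Simple post-standardization checks of this kind can be expressed as small functions. The sketch below shows three illustrative checks: field completeness, the fraction of rows with a mapped concept, and preservation of row counts. The field name `concept_id`, the convention that 0 marks an unmapped row, and the function names are assumptions of the example rather than checks reported by the reviewed studies.

```python
from typing import Dict, List, Optional


def completeness(rows: List[Dict[str, Optional[object]]], field: str) -> float:
    """Fraction of rows where the field is present and not empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def mapped_fraction(rows: List[Dict[str, Optional[object]]],
                    concept_field: str = "concept_id",
                    unmapped: int = 0) -> float:
    """Fraction of rows whose concept field holds a mapped identifier."""
    if not rows:
        return 0.0
    mapped = sum(1 for row in rows if row.get(concept_field, unmapped) != unmapped)
    return mapped / len(rows)


def row_count_preserved(source_rows: List[dict], target_rows: List[dict]) -> bool:
    """Check that no records were lost or duplicated during the load."""
    return len(source_rows) == len(target_rows)
```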

Data governance in CDM development and federated learning

A total of 13 (18.8%) of the included studies mentioned the concept of data governance. When looking at individual categories of the data governance checklist (see Materials and methods), practices related to section 1—data privacy and security were discussed in 46 studies (66.7%), as presented in Figure 6. These included addressing methods to safeguard patient identity through anonymization (n = 23) meaning permanent removal of all identifiable information, pseudonymization (n = 4) meaning replacement of identifiable information with pseudonyms or codes, and de-identification (n = 21), which can include either or both aforementioned methods. Data protection measures included discussion related to data privacy (n = 20), security (n = 17), and confidentiality (n = 6), whereas participant rights were discussed addressing informed consent (n = 14).
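The difference between pseudonymization and removal of identifiers can be shown with a short sketch. Here a keyed hash replaces a direct identifier with a stable code, while the second function only drops the identifying fields. The field names, the placeholder key, and the function names are hypothetical. Dropping direct identifiers alone does not guarantee anonymity, because the remaining attributes may still allow re-identification.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical secret kept by the data holder; never shared with partners.
SECRET_KEY = b"replace-with-a-locally-held-secret"

# Hypothetical direct identifiers in a source record.
DIRECT_IDENTIFIERS = ("name", "national_id")


def pseudonymize(record: Dict[str, str], key: bytes = SECRET_KEY) -> Dict[str, str]:
    """Replace direct identifiers with a stable keyed-hash pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(key, record["national_id"].encode("utf-8"),
                      hashlib.sha256).hexdigest()
    out["pseudonym"] = digest[:16]  # shortened for readability
    return out


def drop_identifiers(record: Dict[str, str]) -> Dict[str, str]:
    """Remove direct identifiers without leaving any linking code."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
```

Because the pseudonym is deterministic for a given key, records from the same patient can still be linked within one site, which is not possible once the identifiers are simply removed.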

Figure 6.

Reporting of data governance elements guided by the data governance checklist, where IRB* refers to the institutional or ethical review boards appraising the compliance of the research with ethical and regulatory standards.

As data standardization using a CDM was one aspect of data governance in section 2—data management and linkage, all studies (n = 69, 100%) touched on subjects in this category. Other practices discussed included data quality (n = 13) in terms of data consistency, precision, coverage, and timeliness, followed by discussion related to data bias and representation (n = 9). The practices covering section 3—data access management were mentioned in 40 (58.0%) studies, mainly disclosing the approval for data use as reviewed by institutional or ethical review boards (n = 28).

Finally, practices related to section 4—generation and use of real-world evidence were referred to in 28 (40.6%) studies, as the quality assurance (n = 14) and replicability (n = 9) practices were discussed for future evidence creation.

Discussion

This rapid review presented the growing body of knowledge on standardizing health data using established and novel CDMs from a federated learning perspective. The most common feature we found was the iterative nature of the process which was highlighted through continuous collaborative efforts when developing compatible, adaptable, and user-friendly solutions. While recognizing the sensitivity and privacy concerns related to using real-world health data, the issue of data governance has received limited attention in research surrounding CDM applications.

The results of this review underline the importance of acknowledging not only the possibilities provided by the utilization of standardized data in large-scale federated learning projects, but also the boundaries set by the individual partners regarding the source data as well as the resources to conduct the standardization process. This is in line with previous literature on federated learning projects aiming to standardize data using an established CDM. Participants of the European Health Data & Evidence Network project highlighted the significance of multiprofessional teams with expert knowledge on the source data, the CDM, and the implementation of the process to facilitate successful data standardization.¹¹¹ The choice of CDM should be guided by its suitability for the intended use, resulting in a need to carefully inspect its completeness, simplicity, integration, and implementability for the project.¹¹²

Issues related to data quality were reported in some of the reviewed articles, underlining the institutional and national diversities of data governance policies and the associated complications. Previous studies have reported the dual nature of data governance in big data research, which is predominantly defined by over- or underregulation, calling for interdisciplinary roadmap development to support efficient data governance policies.¹¹³ In the European Union, relevant regulations governing data use in AI development include the General Data Protection Regulation, the European Health Data Space, the Artificial Intelligence Act, and the Medical Device Regulation.¹¹⁴ While the key elements of data governance have been adopted in most economies, it has been reported that local governments seldom utilize collected feedback in revising the legislation.¹¹⁵ The need to develop a global framework and a gold standard for data governance and reporting in healthcare through interdisciplinary collaboration is evident to ensure ethical, consistent, and accurate data storage and use in future technology development.

The second issue related to data quality is closely tied to the healthcare professionals’ competencies and contextual understanding of documentation practices, data models, and system logic. Commonly used data structures and standards should serve a broad range of users within the healthcare domain, including physicians and nurses, affecting the primary and secondary use of generated health data. Future research is needed to investigate the AI literacy of healthcare professionals and means to develop their readiness to engage in such projects. Previous research investigating quality of healthcare documentation suggests a need to revise current documentation practices to ensure the quality, conformity, usability, and readiness of health data for secondary purposes.¹¹⁶ Comprehensive guidelines to increase the understanding and guide the use of different health informatics standards, complemented with tools to evaluate the quality of documentation, are needed to ensure the completeness and cohesion of healthcare data to comply with the technical requirements related to sustainable, safe, and ethical AI development. Increasing commitment and engagement with these guidelines as well as novel technologies through educational interventions and support from healthcare management will result in increased work efficiency, more reliable AI tools, and safer patient care.

Limitations to this study concern the nature of the rapid review, as only two (albeit the most widely used) databases were searched. Moreover, the included articles were not assessed for their scientific quality. Additionally, only articles written in English were admitted to this study, which may contribute to a bias.

Current research literature acknowledges the lack of comprehensive methodological frameworks to guide CDM development.³⁰ Additional and repeated research efforts are warranted to refine methodological principles to streamline CDM development and data standardization in healthcare research. Furthermore, reporting guidelines on CDM development and implementation are needed to ensure adequate transfer of knowledge.

Conclusions

This rapid review summarizes current knowledge on the development and applications of CDMs in healthcare with a particular perspective on supporting the development and implementation of AI technologies in federated learning. Our findings

emphasize the essential role of interdisciplinary collaboration in the iterative process of development and application of CDMs for federated learning in healthcare and

highlight the importance of developing unified data governance policies to ensure safe and reliable AI development, with an urgent call to increase domain expert involvement in data management.

Healthcare professionals representing a variety of domain knowledge should actively seek engagement in interprofessional collaborations. This is expected to increase the number of healthcare professionals with sufficient competencies, skills, knowledge, and motivation to facilitate not only the development of AI and the secondary use of health data but also interdisciplinary teamwork. The competencies and knowledge possessed by information technology professionals and researchers are vital to ensure functioning technological infrastructures and facilitate the implementation of necessary tools. Likewise, stakeholder, medical, and health professional knowledge is key to understanding the data elements, concepts, and vocabularies as well as the clinical needs and favorable outcomes. Further, health informatics specialists representing different health domains are needed to promote a common understanding within the team, as well as to enhance the ethical and regulatory knowledge guiding the process. Systematic strategic leadership is required to facilitate the organizational and national infrastructures that promote participation and motivation in interprofessional research and development initiatives. It would be of particular interest to investigate the AI literacy of health professionals and means to develop their readiness to engage in such projects, with the knowledge and the ability to engage in discussions and understand the complexities related to large-scale AI development. More education and guidance to enhance the quality of documentation, along with tools to evaluate it, are needed.

Supplemental Material

sj-xlsx-1-dhj-10.1177_20552076251395536 - Supplemental material for A rapid review on the application of common data models in healthcare: Recommendations for data governance and federated learning in artificial intelligence development

Supplemental material, sj-xlsx-1-dhj-10.1177_20552076251395536 for A rapid review on the application of common data models in healthcare: Recommendations for data governance and federated learning in artificial intelligence development by Hanna von Gerich, Taridzo Chomutare, Ville Kytö, Peter Lundberg, Troels Siggaard and Laura-Maria Peltonen in DIGITAL HEALTH

Footnotes

Acknowledgments

We extend our gratitude to Hercules Dalianis (Department of Computer and Systems Science, Stockholm University, Kista, Sweden) for his support and assistance in preparing this review.

Ethical considerations

Ethical approval was not required to conduct this review study.

Author contributions

Conceptualization: HvG, TC, PL, TS, and L-MP. Data curation: HvG. Formal analysis: HvG. Investigation: HvG and L-MP. Methodology: HvG, TC, PL, TS, and L-MP. Visualization: HvG and L-MP. Writing – original draft: HvG. Writing – review and editing: HvG, TC, VK, PL, TS, and L-MP.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Nordic Innovation; Vinnova & MedTech4Health, Sweden (PL); and the National Board of Health and Welfare, Sweden (PL).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

All research data associated with this review are available in the supplemental material.

Supplemental material

Supplemental material for this article is available online.
