Privacy-by-Design with Federated Learning will drive future Rare Disease Research

Abstract

Up to 6% of the global population is estimated to be affected by one of about 10,000 distinct rare diseases (RDs). RDs are, to this day, often not understood, and thus, patients are heavily underserved. Most RD studies are chronically underfunded, and research faces inherent difficulties in analyzing scarce data. Furthermore, the creation and analysis of representative datasets are often constrained by stringent data protection regulations, such as the EU General Data Protection Regulation. This review examines the potential of federated learning (FL) as a privacy-by-design approach to training machine learning models on distributed datasets while ensuring data privacy by keeping patient data local and only sharing model parameters, which is particularly beneficial in the context of sensitive data that cannot be collected in a centralized manner. FL can enhance model accuracy by leveraging diverse datasets without compromising data privacy. This is particularly relevant in rare diseases, where heterogeneity and small sample sizes impede the development of robust models. FL further has the potential to enable the discovery of novel biomarkers, enhance patient stratification, and facilitate the development of personalized treatment plans. This review illustrates how FL can facilitate large-scale, cross-institutional collaboration, thereby enabling the development of more accurate and generalizable models for improved diagnosis and treatment of rare diseases. However, challenges such as non-independent and identically distributed (non-IID) data and significant computational and bandwidth requirements still need to be addressed. Future research must focus on applying FL technology to rare disease datasets while exploring standardized protocols for cross-border collaborations that can ultimately pave the way for a new era of privacy-preserving and distributed data-driven rare disease research.

Keywords

rare diseases, federated learning, data protection, artificial intelligence, personalized medicine

Introduction

Rare diseases (RDs) are complex health conditions characterized by their low prevalence in the general population. RDs affect between 1 in 2000 and 1 in 200,000 people, with definitions varying by region.^1,2 A disease is classified as an ultra-rare disease (URD) if it affects fewer than 1 in 50,000 individuals.³ Despite their low individual prevalence, collectively, RDs impact approximately 3.5% to 5.9% of the world's population.^1,2,4 Their phenotypes are frequently chronic, progressive, and potentially life-threatening,⁵ presenting unique challenges in diagnosis, treatment, and research compared to more common diseases.⁶ This results in RD patients being chronically underserved due to limited advancements in RD management. For example, the accuracy of RD diagnosis, even when molecular data is considered, is only 50%.⁷ Improvement is hampered by limited expertise, lack of research funding, and patient populations that are too small for clinical trials.^8,9 However, RD research can reveal fundamental insights into biological pathways and genetic mutations, leading to broader scientific advancements that benefit rare and common conditions alike.¹⁰

One inherent roadblock in RD research is the limited availability of comprehensive datasets that can facilitate the development of precise diagnostic tools and personalized treatment strategies.¹¹ To address this challenge, several crucial repositories for RD datasets have been established, including the Global Rare Diseases Registry and Repository,¹² Orphadata,¹³ DECIPHER,¹⁴ and PhenomeCentral.¹⁵ Because these data sources are siloed, they can impede the discovery of disease causes across the overall spectrum of RDs. To tackle this issue, the Matchmaker Exchange Application Program Interface¹⁶ initiative has been developed, which uses a common data-sharing protocol to enable seamless searches and interactions across multiple databases while allowing each to maintain its own data organization schema. However, despite these valuable resources, the vast diversity of RDs necessitates even more comprehensive open-source datasets that include various genetic, phenotypic, and clinical information. The existing repositories, while important, each cover only around 6000–7000 of the estimated 10,000 RDs and, in particular, lack newly discovered ones.^1,17 Hence, collaboration among different hospitals or clinics to share their patient data for specific rare diseases is essential for advancing research in this field. This gap highlights the need for additional data sources to ensure robust research that can enhance accuracy and improve patient outcomes.

Artificial intelligence (AI) has demonstrated superior performance over traditional statistical methods in processing complex datasets essential for RD research.¹⁸ A subcategory of AI is machine learning (ML), which describes the ability of algorithms to extract correlations, distributions, probabilities, or other metrics from data that have predictive or classifying value.¹⁹ AI models, and in particular the subcategory of deep learning (DL) models, have proven to be essential tools in enhancing our understanding of RDs by leveraging neural network structures to learn feature correlations,²⁰ integrating multimodal data sources, and enabling comprehensive analysis.^21,22 These methodologies can facilitate personalized medicine development by identifying patterns that correlate with treatment success, which benefits RDs with limited treatment options and variable patient responses.^23–25 AI-driven biomarker discovery already facilitates disease subtyping, patient stratification, and the identification of therapeutic targets.^26,27 In this way, AI contributes to developing novel interventions and drug repurposing strategies for many diseases.^28–30

While AI also offers transformative potential for RD research, the methods used to train these models present distinct challenges in balancing data privacy, model performance, and generalizability. Common approaches to model training come with specific limitations. Local training within one clinic (Figure 1A) is privacy-friendly but often results in models that do not generalize well due to limited and potentially biased datasets.³¹ Centralized learning (Figure 1B) involves data collection from multiple sources, potentially balancing the data and improving model accuracy. However, it encounters significant administrative and patient privacy-related barriers as, in many cases, patient data cannot be freely shared between collaboration partners. This limitation arises from concerns regarding patient privacy, compliance with legal regulations, preventing unauthorized access, and maintaining data integrity.³² In recent years alone, millions of clinical records have been affected by data breaches,³³ leading to patient stigmatization,^34,35 even though data protection is always a critical concern when collecting medical data.^36,37 Consequently, strict data protection regulations, like the EU's General Data Protection Regulation (GDPR), are crucial and indispensable to ensuring patient privacy.³⁸ However, data protection laws further increase the hurdles for collaborative research projects, rendering centralized data collection more challenging due to administrative overhead. Therefore, innovative approaches that balance the need for comprehensive datasets with stringent privacy requirements are crucial, especially in the context of RDs.^28,39

Figure 1.

AI approaches for Rare Disease Research and Treatment—an overview of data processing methods in machine learning and specific applications and areas in healthcare. There are generally three approaches to analyzing patient data with AI techniques: local, centralized, and federated learning. Patients suffering from a rare disease are highlighted. In non-collaborative research scenarios, institutions limit their model training to locally available data (A). In comparison, centralized learning (B) collects data from all patient groups centrally. Federated learning (C) combines the advantages of the two previous approaches. Local datasets are used to create local models, and their parameters are then aggregated into a global model. (D) shows possible applications of the global model that offer improved performance while protecting patient data by applying the FL approach.

Federated learning (FL) has emerged as a promising solution to overcome the challenges associated with centralized AI models (Figure 1C).⁴⁰ In FL, the local datasets of participating clients are used to create local AI models. After each local training round, the parameters of these models are aggregated to create a global model without the need to share sensitive data.⁴¹ After multiple training rounds, this approach yields a model that has effectively been trained on all local datasets without any patient data being communicated, which reduces data bias and supports high model accuracy.⁴² Crucially, as the data remains within each institution, FL ensures privacy-preserving training of distributed datasets while maintaining data security.^36,43 This characteristic is particularly relevant for RDs, where the number of individual cases in a single institution for a given condition is typically very small,^44,45 and federated collaborative research opens the gates for previously unthinkable amounts of data without compromising privacy.^28,46 This way, FL has the potential to advance RD research and treatment in an era of stringent data protection requirements.^28,37,47
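
The training loop described above can be illustrated with a minimal sketch. It assumes a sample-size-weighted parameter average (commonly called federated averaging) as the aggregation rule, which this review does not prescribe, and a toy one-feature linear model in place of a clinical model. All function names, the learning rate, and the synthetic client datasets are hypothetical and chosen only for illustration.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on y = w * x + b."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Aggregate parameters, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    )


def run_federated_training(client_datasets, rounds=50):
    """Only model parameters leave a client; raw (x, y) pairs never do."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in client_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in client_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


def mse(weights, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def make_client_data(n, x_lo, x_hi, rng):
    """Synthetic stand-in for one site's data (assumed relation y = 2x + 1)."""
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]


rng = random.Random(0)
# Three hypothetical sites with different sizes and feature ranges.
clients = [
    make_client_data(8, 0.0, 0.5, rng),
    make_client_data(20, 0.3, 1.0, rng),
    make_client_data(5, 0.6, 1.2, rng),
]
final_weights = run_federated_training(clients)
pooled = [pair for d in clients for pair in d]  # pooled here only to evaluate the toy
print("loss before:", mse((0.0, 0.0), pooled))
print("loss after: ", mse(final_weights, pooled))
```

In this toy setting the pooled data is used only to evaluate the result; in a real deployment, evaluation would also have to run locally at each site.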

This review examines AI's contribution to healthcare to highlight its possibilities, with a specific focus on FL's role and potential in RD research. We assess AI's current state, limitations, and applications in healthcare, particularly for RDs, and discuss how FL can address these limitations, highlighting existing FL applications in healthcare. Additionally, we explore how FL can advance our understanding of RDs while addressing data privacy concerns.

Main

Artificial intelligence in healthcare

AI systems, specifically ML and DL approaches, are revolutionizing healthcare, offering unprecedented opportunities to enhance patient care and clinical efficiency. From advanced diagnostic tools to operational automation, AI is reshaping how healthcare challenges are addressed. Integrating AI technologies promises significant improvements in early disease detection, personalized treatment plans, and overall patient management. As the healthcare sector continues to generate vast amounts of data, AI's role in processing and deriving actionable insights becomes increasingly crucial.

State of the art and current solutions

Approximately 10% of global healthcare expenditure is currently attributable to fraud and abuse, which can be reduced through AI.⁴⁸ Apart from means of protection, AI has demonstrated remarkable progress in healthcare applications, consistently outperforming traditional statistical methods in processing large and complex datasets.^18,49 As computer vision is a highly advanced field within AI, medical imaging and radiology have been at the forefront of AI adoption in healthcare.^50,51 New means of analysis were required because the increasing speed and resolution of medical imaging devices, which produce more and higher-quality output, have led to a significant rise in the workload of healthcare professionals, who struggle to keep up with processing all the data.^48–50 AI systems can process the growing volume and complexity of imaging data, providing faster and more precise analyses that complement or even outperform the work of healthcare professionals. The early detection of lung cancer is one area where computer vision has shown potential, facilitating the use of imaging techniques such as computed tomography or X-ray imaging.⁵² On the other hand, analysis of molecular-level biological data enables more accurate disease diagnoses and deepens our understanding of disease mechanisms.⁵³ This is exemplified by using AI models to identify diagnostically and prognostically relevant correlations between genetic variants and cytomorphological changes in myelodysplastic syndrome, with the long-term goal of achieving disease classification based solely on genomic data.⁵⁴ More recently, the integration of multimodal data sources has enabled comprehensive analysis that further enhances our understanding of diseases, allowing genomics, imaging, and clinical records to be integrated jointly into one model.^21,22,55,56 Such models are used in various contexts. For example, Unlearn.AI uses digital twin models that validate clinical studies on Alzheimer's disease and multiple sclerosis by simulating subjects with the same clinical attributes as the actual subjects.^22,57,58 The data used include a variety of clinical and demographic covariates.⁵⁸ All these systems provide decision support to healthcare specialists, enabling them to analyze and interpret larger amounts of data in a significantly shorter time and more accurately.^24,25

Even more possibilities arise from the recent emergence of large language models (LLMs) like OpenAI's GPT, Meta's Llama, or Google's Gemini.^59–61 Their potential to enhance patient education and support, given a diagnosis or treatment, is already evident.^62,63 Also, specialized LLMs for healthcare professionals, such as Med-PaLM,⁶⁴ have emerged, offering significant promise for further advancement. Possible application areas include LLM-driven assistants capable of comprehensive analysis and contextualization of a patient's medical history or electronic health records (EHRs), which could greatly benefit physicians.⁶⁴ In addition, EHRs and other unstructured data can now be analyzed and categorized more easily, expanding their role as information sources in healthcare applications.⁶⁴ This includes the extraction of medical relationships from doctors' notes in EHRs, such as the identification of drug-side effect relationships, as well as the resolution of medical questions.⁶⁴

In drug development and repurposing, AI allows for the identification of new target molecules, potentially leading to more effective drugs.^22,23 By analyzing the molecular structure of existing drugs, drug discovery processes can be accelerated, particularly for immune-related genes, enabling the identification of off-targets or repurposing potential.^26,27 For instance, in 2019, Insilico Medicine used AI to design and synthesize a novel drug candidate for fibrosis in 21 days and validate it in just 46 days, a process that traditionally takes years.^65,66 Additionally, the introduction and improvement of DeepMind's AlphaFold has revolutionized protein structure prediction, a critical component of drug research, by achieving near-experimental accuracy in determining protein structures from amino acid sequences.^67,68

As the healthcare industry continues to shift towards more patient-centric and technology-driven care models, remote healthcare delivery has become an increasingly important aspect. The field of telemedicine, which provides clinical services remotely, is gaining importance in healthcare due to its capacity to improve access to medical services and reduce costs for both patients and providers.^69,70 Integrating AI into telemedicine has significantly enhanced its capabilities by improving diagnostic accuracy, enabling timely virtual triage, and providing personalized treatment recommendations.⁷⁰ These improvements are achieved by utilizing the knowledge and expertise distributed across the data on which the model is based.⁷¹ Telemedicine has also proven invaluable during public health crises and for individuals in underserved areas or with limited mobility by enabling remote consultations, monitoring, and treatment.^69,72,73 A prime example of a successful telemedicine platform is Teladoc Health, which conducted up to 20,000 virtual visits per day during the COVID-19 crisis,⁷⁴ significantly reducing exposure risks and ensuring continuous care for patients, particularly in underserved regions, serving the demand for telemedical solutions at the time.^75,76

These diverse applications demonstrate the broad impact of AI across healthcare domains, from molecular-level research to direct patient care. As AI technologies evolve, they promise to transform healthcare practices further, potentially leading to more precise, efficient, and personalized medical care.

Limitations of AI in medical applications

The advancement of precise AI models in the healthcare sector is confronted with many considerable obstacles, the most pressing being the need for extensive and uniform datasets.⁷⁷ As data harmonization is often challenging due to data scarcity and lack of homogeneity, the power of AI remains limited, particularly in the context of RDs, where data is scarce and fragmented.^1,78 These issues have led researchers to propose generating artificial time series data for electrocardiograms (ECG) and electroencephalograms to counter data scarcity and class imbalance.⁷⁹

On the other hand, a global trend towards data protection is evident, with more countries having enacted or preparing to enact relevant legislation,⁸⁰ all while RD patients are reported to be comparatively open to data sharing yet concerned about data misuse.^81,82 Additionally, the EU AI Act introduces stringent regulations that could limit the application of AI in the medical field. It considers AI systems in healthcare as potentially high-risk, as they can assess individuals’ health or profile patients.⁸³ This classification may lead to increased compliance requirements such as extensive risk management, data quality assurance, transparency measures, human oversight, and robustness checks, increasing the time, complexity, and overall cost of developing and deploying AI in healthcare.⁸³ However, to support organizations in preparing for these regulations, the European Commission has introduced the AI Pact, encouraging voluntary early compliance with the AI Act's requirements.⁸⁴

Conversely, even when comprehensive data protection measures are implemented and centralized data aggregation is feasible, enforcing data quality and standardization remains a significant challenge.⁸⁵ This is due to the lack of consensus on how to evaluate the various dimensions of data quality and their measures.⁸⁵ One source of quality issues is data entry errors that arise from the manual input of patient information.⁸⁶ Clinical records, and EHRs even more so, often include non-standardized questions, free-text patient responses, and personal comments from treating physicians, which introduce subjectivity and create substantial barriers to data quality and standardization.⁸⁷ Similarly, in genomics research, issues like inconsistent naming conventions and varying experimental protocols are prevalent.⁸⁸ These and other quality and standardization obstacles inherently compromise the effectiveness of AI in medical applications by limiting the accuracy and reliability of the data used to train and operate AI systems.

AI in rare disease research

The potential for improving RD research and treatment has significantly increased with the advent of AI approaches. The developed models have been shown to outperform traditional statistical methods in processing large and complex datasets essential for studying RDs such as Huntington's disease,⁸⁹ hypophosphatasia,⁹⁰ multiple osteochondromas,⁹¹ and amyotrophic lateral sclerosis.^18,92–94

From the initial stages of detection and diagnosis to the subsequent phases of treatment planning and drug development, AI models are transforming numerous aspects of RD management as they continue to evolve. For the detection and diagnosis of RD, AI models have shown remarkable effectiveness in phenotypic and genetic analysis.⁹⁵ By analyzing patient data encompassing symptom-related and genetic information, AI enables more expedient and accurate RD diagnosis, facilitating the identification of patterns that might otherwise be overlooked due to potential human error. For instance, the capacity to predict the progression of rare neurodegenerative diseases, such as multiple system atrophy, represents a promising avenue of research.^93,96 Furthermore, AI models have exhibited promise in detecting RDs, such as Gaucher's disease and generalized pustular psoriasis, by detecting subtle patterns in clinical, genomic, and imaging data.^78,95 Several studies use EHR data for the diagnosis of acute hepatic porphyria, thereby enabling a reduction in diagnostic delay with the potential to enhance patient outcomes.^97,98 Another important advancement is the interpretability of results, one approach being a model that predicts infection risks in pediatric leukemia patients.⁹⁹ Using interpretable models ensures that the predictions are understandable to clinicians, enabling better clinical decision-making. By incorporating these sophisticated models into the decision-making process, clinicians can enhance their ability to select the most effective treatments, potentially improving patient outcomes and overall quality of care.^24,25

Drug discovery and development are other highly relevant fields for RD research, and AI's ability to rapidly analyze vast amounts of scientific data is proving invaluable. It can facilitate the identification of novel therapeutic targets for various RDs by pinpointing specific molecular pathways and potential drug candidates with higher precision and efficiency.²³ A study has used various AI models to develop a new method for evaluating factors related to Metachromatic Leukodystrophy, a rare genetic disorder that destroys the protective fatty layers around nerves in the central and peripheral nervous systems with the accumulation of sulfatides.¹⁰⁰ This method helps understand the disease's pathogenesis, progression, and potential treatment options, identifying single and dual drug combinations as promising therapeutic targets.

Advancements in gene and protein replacement therapies and stem cell and genome editing technologies have been driven by detailed studies of the biological pathways involved in rare genetic disorders.¹⁰¹ For example, AI approaches have successfully identified key genetic variants and pathways associated with mitochondrial dysfunction in rare neurodegenerative disorders, which are crucial for developing targeted therapies.¹⁰² Additionally, AI models were developed to examine gene expression patterns to predict and identify genes that could serve as early diagnostic or therapeutic markers, resulting in valuable findings for Chronic Nonbacterial Osteomyelitis, a rare autoinflammatory bone disease caused by an abnormality in the immune system.²⁷ Another exciting approach is using AI to detect splicing defects in genetic data that are often overlooked in standard diagnostic workflows. This is advantageous in the context of RDs such as spinal muscular atrophy, in which splicing defects are known to play a significant role in the manifestation and progression of the disease.¹⁰³ From identifying genetic variants to uncovering specific pathways, AI has proven its power in advancing our understanding of RDs and paving the way for more effective treatments.¹⁰⁴ TRANSLATE NAMSE showed that AI can identify ultra-rare genetic disorders and novel gene-disease associations.¹⁰⁵ It identified 370 different genetic causes and discovered 34 new and 23 potential genotype-phenotype associations, mainly under the umbrella of neurodevelopmental disorders.

While AI shows great promise in improving RD research and treatment,²⁸ it is important to note that demonstrated applications are still relatively limited and often focus on the more common RDs.¹⁰⁶ For URDs, AI applications remain scarce, but ongoing efforts are expanding AI's reach, as exemplified by the development of TRANSLATE NAMSE or AI-MARRVEL, a machine learning system designed to prioritize potentially causative variants for a broader range of Mendelian disorders, including ultra-rare cases.¹⁰⁷ Recently, AI has been employed to examine genetic data and discern novel genetic variants linked to congenital disorders of glycosylation, a collection of ultra-rare metabolic disorders resulting from deficiencies in glycosylation, the process by which sugars are attached to proteins and lipids. In general, the use of AI has enhanced diagnostic precision and facilitated a more complete comprehension of associated disease mechanisms.¹⁰⁵

Federated learning in healthcare

The potential of federated learning

FL emerges as a promising approach to overcome the challenges associated with centralized AI training in healthcare (Figure 1C) by effectively combining the benefits of centralized and local training.⁴⁰ This combination of local data utilization and collaborative learning allows institutions to benefit from training on large-scale, heterogeneous datasets without being constrained by data privacy or security concerns.^108,109 By limiting the transferred information to only model updates, FL mitigates the risk of data breaches, thereby addressing critical privacy and security concerns.^33,43 At the same time, global models trained with FL on diverse datasets are usually more robust, thus less susceptible to bias, and can achieve high accuracy.⁴² A study utilizing FL to diagnose patients with COVID-19 infection, based on chest radiographs from five international healthcare systems, showed that local models trained on local data and integrated into a global model achieved greater accuracy when applied to previously unseen datasets.¹¹⁰ Further, FL may significantly reduce communication costs and bandwidth usage, as the total size of communicated parameters is usually smaller than the raw data size.^111,112 It has additional potential to be more cost-efficient than centralized learning, as it can utilize otherwise unused, locally available computing resources, in contrast to the need for investing in a central, high-performance computing system.¹¹² Since each clinic only requires sufficient computing power to process its own data, FL provides a scalable solution for large-scale machine-learning applications.^112,113

Continuous FL techniques can be employed to periodically train adaptable and optimized models, as data is constantly generated but not actively reshared at all times. This reduces regulatory hurdles that might arise from direct data sharing.^105,108,109,114 This approach enables quicker adaptation to new data, which is particularly beneficial in healthcare, where large amounts of data are generated and data patterns frequently change.¹¹⁵

Several FL applications already capitalize on these advantages. For instance, FL has been demonstrated to facilitate the development of more robust and accurate models by integrating data from diverse healthcare institutions and patient populations following privacy-compliant protocols, for example, for the computer-aided diagnosis of cancer.¹¹⁴ Furthermore, a federated framework has been validated for predicting the risk of sepsis and acute kidney injury in ICU patients from a range of clinical settings.¹¹⁶ Moreover, FL offers potential for improvements in treatment decision support systems as new and more data, improved diagnostic procedures, and new medications become relevant to disease treatment and detection.²⁸ For non-small cell lung cancer (stage I-III), a two-year survival model has been trained on routine data from radiation oncology in a federated fashion.¹¹⁷

A political incentive to establish FL infrastructures is also evident from more than 28 ongoing EU-funded projects indexed by the Community Research and Development Information Service.¹¹⁸ Notable projects include FLUTE, which applies FL techniques to prostate cancer research,¹¹⁹ and dAIbetes, which aims to enable personalized prediction of type 2 diabetes treatment outcomes by developing a federated health data platform that consolidates data across multiple international cohorts.¹²⁰ Additionally, the extensive research project “Controls for Deep and Federated Learning” aims to develop new FL technologies with privacy preservation guarantees.¹²¹ This extensive research initiative and other ongoing projects demonstrate the growing recognition of FL's potential, upcoming solutions, and investment in FL research within the European Union.

Challenges in implementing FL

Despite its potential, FL faces several significant challenges in implementation, particularly in healthcare settings. One of the most fundamental issues is the presence of non-independent and identically distributed (non-IID) data originating from different hospitals.⁴⁷ These significantly disparate statistical characteristics of the data, which can arise, for example, due to the different ethnic distribution of patients in the individual hospitals, can propagate biases and inconsistencies into the global model.⁴⁷ One potential solution to this issue is data augmentation, designed to expand and enrich local datasets. The objective is to achieve a more uniform data distribution.¹²² Another proposed solution is the combination of FL with transfer learning (TL) methods. TL uses the knowledge acquired in one situation to improve performance in a related situation. The application of TL in FL is known as federated transfer learning (FTL)¹²³ and shares insights gained by a model at one location with models at other locations in a federated manner, potentially mitigating data scarcity issues and enhancing the model's capacity for generalization.¹²⁴ FTL techniques can counteract these problems by either pre-processing the data or trying to address the challenges in the model itself.¹²³ Another issue that can be addressed by FTL is the heterogeneity of labels.¹²³ This refers to the situation in which different labeling practices and criteria are employed in different hospitals, resulting in inconsistent and heterogeneous labels and meanings behind identical labels.^125,126 Such inconsistencies can potentially introduce biases into the AI model.¹²⁷ Further, different hospitals may display disparate distributions of characteristics, including variations in units of measurement, scales, and the availability of specific characteristics.^123,126 Such discrepancies can result in the model performing inadequately when generalizing across disparate sources, as it relies on characteristics that are not uniformly present or similarly distributed. To address this issue, feature normalization, standardization, and transformation techniques can be employed to align feature distributions.^18,128,129 Furthermore, the temporal variability of data represents a significant challenge.^114,123 Data collected over different time periods may exhibit temporal variability, such as changes in medical practice, diagnostic criteria, or patient demographics. Variability over time can cause a model to perform well with data from the past but poorly with more recent or future data, reducing its practical utility and sustainability.¹³⁰ To counteract this, the inclusion of temporal validation and time series analysis techniques is crucial. Continuous learning approaches that regularly update models with new data can also help to maintain their relevancy and accuracy over time.¹²³
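
As one possible illustration of the feature alignment mentioned above, the sketch below harmonizes units locally and then standardizes a feature using a global mean and standard deviation computed only from per-site summary statistics (count, sum, sum of squares). This is an assumed approach, not one taken from the cited works, and the feature (body height), the site values, and all function names are hypothetical. Sharing summary statistics can itself leak information at very small sites, which matters for RD cohorts.

```python
import math


def metres_to_cm(values_m):
    """Local unit harmonization: one site records height in metres."""
    return [v * 100.0 for v in values_m]


def local_stats(values):
    """Summary statistics a site shares instead of raw measurements."""
    return (len(values), sum(values), sum(v * v for v in values))


def global_mean_std(all_stats):
    """Combine per-site statistics into a global mean and standard deviation."""
    n = sum(s[0] for s in all_stats)
    total = sum(s[1] for s in all_stats)
    total_sq = sum(s[2] for s in all_stats)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # guard against rounding
    return mean, math.sqrt(variance)


def standardize(values, mean, std):
    """Z-score a site's values with the shared global statistics."""
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sites: A records centimetres, B records metres.
site_a_cm = [160.0, 170.0, 180.0]
site_b_cm = metres_to_cm([1.5, 1.9])

mean, std = global_mean_std([local_stats(site_a_cm), local_stats(site_b_cm)])
site_a_z = standardize(site_a_cm, mean, std)
site_b_z = standardize(site_b_cm, mean, std)
print(mean, std)
print(site_a_z, site_b_z)
```

After this step, both sites feed features on the same scale into local training, so that a given feature value has the same meaning across clients.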

While FL aims to protect privacy by keeping data where they are primarily generated or stored, the risk of reconstruction attacks remains a concern. Attackers listening to the communicated information in an FL network could infer original data from model updates, compromising patient confidentiality.¹⁸ This risk necessitates implementing robust security measures, including access control, authentication, and verification mechanisms, as well as privacy-enhancing techniques like secure multi-party computation and differential privacy,^131,132 which further increases the complexity and demands on the infrastructure.³⁶
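
To make the differential privacy idea more concrete, the following sketch shows one common pattern: each client clips the norm of its model update and adds Gaussian noise before sending it. This is a simplified illustration under stated assumptions, not a complete mechanism. The clipping bound `max_norm` and the `noise_multiplier` are hypothetical parameters, and a formal privacy guarantee would additionally require privacy accounting across training rounds, which is omitted here.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatize_update(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


# Hypothetical model update from one hospital (e.g., a weight difference).
rng = random.Random(42)
raw_update = [3.0, 4.0]
noisy_update = privatize_update(raw_update, max_norm=1.0,
                                noise_multiplier=0.5, rng=rng)
print(noisy_update)  # this, not raw_update, would be sent for aggregation
```

Clipping bounds how much any single update can influence the global model, and the noise masks individual contributions; larger noise strengthens protection but typically costs model accuracy, which is a particular concern for the small cohorts typical of RDs.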

Setting up an FL system also comes with its own issues. FL requires a robust technical infrastructure, presenting a significant obstacle for smaller or technically limited organizations or countries.^40,47 Failure to participate in an ongoing FL training process due to setup-related issues could result in skewed models in which some populations are underrepresented, as important health data could be missed.³⁷ Moreover, even if training on every FL node of a federated network is successful, FL often leads to longer training times due to increased communication and waiting times.⁴⁰ This increases the complexity of implementing models in comparison to classic centralized approaches. The issue is compounded by the limited availability of high-quality, labeled, and harmonized data in real-world healthcare settings, which poses a significant hurdle for effective model training.¹²⁴ This complexity highlights the necessity for international collaboration within the scientific community.²⁸

Existing FL applications in healthcare

The potential of FL in healthcare is immense, but practical implementations remain limited. A recent study found that only 5.2% of the results obtained from FL studies in healthcare are applied in practice, with most studies being feasibility studies.⁴³ Nevertheless, several noteworthy applications demonstrate FL's potential in various healthcare domains.

For example, in cardiovascular health, FL has been successfully used to detect hypertrophic cardiomyopathy, a condition in which the heart muscle thickens abnormally, by combining ECG and echocardiogram data from multiple institutions.¹³³ Similarly, the ICU4Covid project developed an FL-based decision support system that enhances the early identification of high-risk hypertensive patients while preserving data privacy, demonstrating its potential as a reliable predictive tool.⁷⁰ The FL framework, proposed on top of ICU4Covid's telemedical network, aims to significantly reduce communication costs and latency and to enable small and medium-sized healthcare organizations to benefit from collective intelligence.⁷⁰ Finally, ADMarker shows that it is feasible to identify comprehensive multidimensional biomarkers, thereby facilitating the precise and early detection of the diverse manifestations of Alzheimer's disease.¹³⁴

Another significant area where FL has opened up substantial opportunities is health monitoring, particularly through Internet of Medical Things (IoMT) devices, that is, Internet of Things (IoT) devices dedicated to applications within the healthcare sector. These devices exchange data over the Internet to support clinical management and the monitoring of patient well-being.¹³⁵ The FedHome model highlights FL's ability to enable collaborative model training across multiple devices without sharing raw data.¹³⁶ The proposed approach effectively handles imbalanced data and reduces communication costs, both common challenges in health monitoring scenarios. In doing so, FedHome provides health monitoring in a residential setting through edge devices (end devices), which facilitate the detection of falls, the monitoring of health-related activities, and the delivery of personalized health recommendations and alerts based on individual health data.¹³⁶ FL's capacity to enable personalized healthcare is evident in its ability to leverage the computational power of individual devices, allowing for the refinement of global models while maintaining data privacy, as exemplified by ClusterGAN for stress-level prediction using ECG signals.¹³⁷ This makes FL a compelling solution for health monitoring applications involving wearable devices. Privacy-protecting smart monitoring systems that employ edge devices and sensors can alleviate pressure on healthcare systems and caregivers through continuous monitoring and early detection of potential health issues.^138,139 Building on this, IoMT and FL can also be employed to construct decentralized networks that specialize, for instance, in neurological and metabolic diseases, thereby enhancing the collective analysis of sensor data for early detection and differential diagnosis.¹⁴⁰

Federated learning for rare diseases

FL has emerged as a powerful solution to the limitations mentioned above and to the privacy concerns that limit the usability of general AI systems.¹¹⁴ However, the scarcity of medical data for RDs poses significant challenges in training effective predictive models. This limitation stems from two main factors: the inherent rarity of these conditions and the heightened privacy concerns surrounding patient information.⁸² As a result, the already limited number of available datasets can suffer from severe bias, for instance with respect to patients' socioeconomic status,¹²³ making it extremely difficult to develop accurate and generalizable models for disease prediction.³⁷

FL frameworks have the potential to enhance the accuracy of RD detection by drawing information from diverse data sources.³⁷ Furthermore, as the data do not leave the hospital, models can be trained on distributed datasets in a privacy-preserving manner, maintaining data security.^36,43 This is particularly valuable for RDs, as the number of cases of a given disease at a single institution is usually very small.^44,45

Recently, to explore its practicality, FL was applied to detect Tall Cell Morphology (TCM), a feature of a rare but aggressive thyroid cancer variant, from Whole Slide Images (WSIs). Federated training was simulated across WSI datasets held by three virtual clients to classify tissue patches as “tall” (expressing TCM) or “non-tall”. Model parameters from each client were then aggregated to ensure convergence. In various experiments, the FL models achieved accuracy comparable to that of centralized models, demonstrating the potential of FL to enhance diagnostic accuracy for rare but clinically significant findings.¹⁴¹
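The aggregation step in such a simulation can be sketched as follows. The text does not state which aggregation rule the study used; the example assumes a sample-size-weighted average of client parameters in the style of federated averaging,¹¹¹ and the function name `federated_average`, the parameter shapes, and the client sizes are all hypothetical.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameter lists.

    client_params: one list of arrays (e.g. layer weights) per client,
                   with identical shapes across clients.
    client_sizes:  number of local training samples per client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_params[0])
    return [
        sum(params[i] * (n / total) for params, n in zip(client_params, client_sizes))
        for i in range(n_layers)
    ]

# Three virtual clients, each holding a weight matrix and a bias vector (toy values).
clients = [
    [np.full((2, 2), 1.0), np.array([0.0, 0.0])],
    [np.full((2, 2), 2.0), np.array([1.0, 1.0])],
    [np.full((2, 2), 4.0), np.array([2.0, 2.0])],
]
sizes = [100, 100, 200]  # assumed number of tissue patches per client

global_params = federated_average(clients, sizes)
```

In a full training loop, the aggregated parameters would be sent back to the clients, which continue local training for another round; weighting by sample size gives larger sites proportionally more influence on the global model.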

Another noteworthy application of FL in the detection of RDs is a model for glioblastoma, an aggressive form of brain tumor.³¹ The comprehensive study used data from 6314 glioblastoma patients at 71 geographically distinct sites across six continents, making it one of the most extensive global FL studies. This approach enabled the creation of a robust and generalizable model for tumor boundary detection, with performance improvements of 27% for enhancing tumor (ET), 33% for tumor core (TC), and 16% for whole tumor (WT) against local validation data, and of 15% for ET, 27% for TC, and 16% for WT against unseen data.³¹

In the context of RD research, novel FL approaches have been developed. One notable contribution is the Dynamic Federated Meta-Learning approach, which addresses the small sample sizes that make per-site models less usable and also optimizes model performance in the RD setting, where very few positive samples make it difficult to train a robust model.¹⁴² Separately, FedIIC proposes a method specifically designed to address class imbalance in medical image classification, such as having far fewer examples of RDs than of healthy cases.¹⁴³ Another promising approach is feature-context-driven federated meta-learning, which dynamically weights clients according to the accuracy of each local model to enhance the prediction accuracy for RDs.¹⁴⁴ This approach has been demonstrated to be effective on adapted datasets that simulate RD data based on cases of cardiac arrhythmias and skin injuries. Another innovative initiative is GenoMed4All, which creates an FL platform to connect European clinical and biomedical datasets on RDs. It uses a common data model to maintain uniform client data standards.¹⁴⁵ In contrast, FedRare implements a technique called “contrastive learning”, facilitating more effective representation learning in the context of data heterogeneity.¹⁴⁶ This technique allows a model to learn to group similar data points together and to keep dissimilar ones apart.¹⁴⁷ As illustrated in Figure 1 (D), FL presents many potential applications in the context of RDs. These are listed in Table 1 with corresponding studies. Because research into RDs is still much needed, the table also includes comparative studies that do not relate specifically to RDs but are available for common diseases. While these models (“General FL”) may not have been explicitly designed for RDs, they nonetheless demonstrate potential for adaptation. With the requisite datasets for the desired RDs, these models could be constructed similarly.
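The principle behind contrastive learning, pulling similar data points together and pushing dissimilar ones apart, can be illustrated with a generic InfoNCE-style loss. The sketch below is not the FedRare loss, whose exact form is not given here; the function name `info_nce_loss`, the `temperature` value, and the toy embeddings are assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.5):
    """InfoNCE-style contrastive loss on two batches of embeddings.

    Row i of `positives` is the positive partner of row i of `anchors`;
    all other rows in the batch act as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy embeddings (assumption): two samples with orthogonal representations.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
matched = info_nce_loss(anchors, anchors)           # positives coincide with anchors
mismatched = info_nce_loss(anchors, anchors[::-1])  # positives swapped between samples
```

The loss is small when each embedding is closest to its own partner (`matched`) and large when partners are confused (`mismatched`), which is the behavior that encourages well-separated representations.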

Table 1.

Overview of progress in general FL compared to rare disease applications. The table compares healthcare-related scenarios and the associated application of FL in general (General FL) or specifically for RDs (FL on Rare Diseases). A lack of previous work is indicated with an “x”.

Application case	Scenarios	General FL	FL on Rare Diseases
Patient Education	Personalized education programs	¹⁴⁸	x
Data-driven Diagnosis	AI-based early diagnosis	^133,149	^31,142,146
Personalized Treatment	Tailoring treatment to individual genomic data	^150–152	¹⁴⁵
Flexible Intervention	Real-time adjustment of therapy based on patient data	¹⁵³	x
Treatment Response Forecasting	Predicting therapy effectiveness using machine learning	¹⁵⁰	x
Precision Medicine	Genome-based treatment strategies	^154–156	¹⁴⁵
Telemedicine	Remote monitoring and consultation for patients	⁷⁰	x
Lifestyle & Health Monitoring	Monitoring and analyzing patient lifestyle	^136,139	x
Treatment Response	Assessing and optimizing treatment outcomes	^152,157	x
Drug Discovery	Identifying new drugs and new uses for existing drugs	^158–161	x
Disease Mechanotyping	Identification and classification of disease patterns	¹⁰⁹	x
Biomarker Identification	Discovering new biomarkers for diagnosis and treatment	¹³⁴	x

The future of federated learning in rare diseases

The technological advancement of FL, coupled with its scalable and privacy-preserving architecture, presents a promising avenue for utilizing sensitive data in RD research while adhering to stringent data privacy regulations. Key advancements in FL, including edge learning techniques^162,163 and continuous FL approaches,^164,165 will significantly enhance the applicability and effectiveness of this technology in tomorrow's healthcare. It will particularly benefit RD research, where data and expertise are more widely distributed across multiple clinics.^28,37 Innovations in the field of FL help to make it more accessible to a broader range of clinicians and researchers, potentially accelerating the detection and treatment of various RDs.

Building on these technological advancements, the increasing user-friendliness, scalability, and computing efficiency of FL frameworks^41,146,166 have reduced the technical barriers to studying conditions that have traditionally been challenging due to limited data and resources, like RDs. Moreover, FTL has opened up new possibilities for cross-regional collaboration, addressing the persistent challenge of data heterogeneity in RD research.¹²³

Importantly, FL's distributed nature promotes more ethical use of patient data,⁴³ fostering greater trust and involvement from patients in the research process. This aspect is particularly crucial in RD research, where patient participation is often a prerequisite for successful research initiatives.¹⁶⁷

Looking to the future, applying these cutting-edge technologies promises to profoundly enhance our understanding and treatment of RDs. Large-scale collaboration will promote standardization, as data need to be harmonized. Improved data privacy and security may increase patient involvement, yielding more data as a basis for more promising research. These benefits could lead to a significant acceleration in the research and treatment of RDs. However, realizing the full potential of FL in RD research will require sustained efforts from the scientific community. There must be increased collaboration between healthcare institutions and technology developers to further refine and implement FL technologies. Additional research is required to address the remaining pressing challenges, such as ensuring model fairness and robustness in the face of diverse and potentially biased datasets, so as not to perpetuate existing inequalities and biases.

Conclusion

FL shows great potential to benefit patients with RDs by helping to identify fundamental causes, explore treatment options, and deliver more accurate and faster diagnoses. However, this potential has yet to be realized by transferring the expertise present in centralized AI development to federated RD research. With the increasing availability of general FL frameworks, incentives for cross-border collaborations must be established. This approach will facilitate the development of new RD-specific models based on existing AI or newly proposed paradigms to address the unique challenges of RDs. Simultaneously, the refinement and development of FL technologies can accelerate the transition to more FL-driven RD research, ultimately leading to improved patient outcomes worldwide. Future work should focus on implementing data standardization protocols for impactful cross-border collaborations.

Footnotes

Acknowledgments

was created with BioRender.com.

ORCID iDs

Md Shihab Ullah

Niklas Probul

Andreas Maier

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The dAIbetes project has received funding from the European Union's Horizon research and innovation programme under the Grant Agreement no: 101136305. The information contained in this press release are however those of the author(s) only and do not necessarily reflect those of the European Union. This work was developed as part of the FeMAI project and is funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01IS21079.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Nguengang Wakap, Lambert, Olry, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet 2020; 28: 165–173.

2. Smith, Bergman, Hagey. Estimating the number of diseases – the concept of rare, ultra-rare, and hyper-rare. iScience 2022; 25: 1–11.

3. Schlander, Garattini, Kolominsky-Rabas, et al. Determining the value of medical technologies to treat ultra-rare disorders: a consensus statement. J Mark Access Health Policy 2016; 4: 33039.

4. Abozaid, Kerr, McKnight, et al. Criteria to define rare diseases and orphan drugs: a systematic review protocol. BMJ Open 2022; 12: e062126.

5. Wästfelt, Fadeel, Henter J-I. A journey of hope: lessons learned from studies on rare diseases and orphan drugs. J Intern Med 2006; 260: 1–10.

6. Griggs, Batshaw, Dunkle, et al. Clinical research for rare disease: opportunities, challenges, and solutions. Mol Genet Metab 2009; 96: 20–26.

7. Boycott, Rath, Chong, et al. International cooperation to enable the diagnosis of all rare genetic diseases. Am J Hum Genet 2017; 100: 695–705.

8. Forman, Taruscio, Llera, et al. The need for worldwide policy and action plans for rare diseases. Acta Paediatr 2012; 101: 805–807.

9. Shafie, Chaiyakunapruk, Supian, et al. State of rare disease management in Southeast Asia. Orphanet J Rare Dis 2016; 11: 07.

10. Boycott, Vanstone, Bulman, et al. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet 2013; 14: 681–691.

11. Solebo, Hysi, Horvat-Gitsels, et al. Data saves lives: optimising routinely collected clinical data for rare disease research. Orphanet J Rare Dis 2023; 18: 85.

12. Rare diseases registry program (RaDaR). [cited 7 Aug 2024]. Available: https://ncats.nih.gov/research/research-activities/RaDaR.

13. Orphadata – orphanet datasets. [cited 7 Aug 2024]. Available: https://www.orphadata.com/.

14. Chatzimichali, Brent, Hutton, et al. Facilitating collaboration in rare genetic disorders through effective matchmaking in DECIPHER. Hum Mutat 2015; 36: 941–949.

15. Buske, Girdea, Dumitriu, et al. Phenomecentral: a portal for phenotypic and genotypic matchmaking of patients with rare genetic diseases. Hum Mutat 2015; 36: 931–940.

16. Buske, Schiettecatte, Hutton, et al. The matchmaker exchange API: automating patient matching through the exchange of structured phenotypic and genotypic profiles. Hum Mutat 2015; 36: 922–927.

17. Zhu, Nguyen D-T, Grishagin, et al. An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD). J Biomed Semantics 2020; 11: 13.

18. Decherchi, Pedrini, Mordenti, et al. Opportunities and challenges for machine learning in rare diseases. Front Med 2021; 8: 747612.

19. Salehi, Burgueño. Emerging artificial intelligence methods in structural engineering. Eng Struct 2018; 171: 170–189.

20. Janiesch, Zschech, Heinrich. Machine learning and deep learning. Electronic Markets 2021; 31: 685–695.

21. Moynihan, Monaco, Ting, et al. Cluster analysis and visualisation of electronic health records data to identify undiagnosed patients with rare genetic diseases. Sci Rep 2024; 14: 5056.

22. Acosta, Falcone, Rajpurkar, et al. Multimodal biomedical AI. Nat Med 2022; 28: 1773–1784.

23. Wang, et al. The use of artificial intelligence in the treatment of rare diseases: a scoping review. Intractable Rare Dis Res 2024; 13: 12–22.

24. Adam, Rampášek, Safikhani, et al. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol 2020; 4: 19.

25. Choi, Chung, et al. Development of a machine learning-based clinical decision support system to predict clinical deterioration in patients visiting the emergency department. Sci Rep 2023; 13: 8561.

26. Roman-Naranjo, Parra-Perez, Lopez-Escamez. A systematic review on machine learning approaches in the diagnosis and prognosis of rare genetic diseases. J Biomed Inform 2023; 143: 104429.

27. Wang, Zou, et al. Transcriptome analysis based on machine learning reveals a role for autoinflammatory genes of chronic nonbacterial osteomyelitis (CNO). Sci Rep 2023; 13: 6514.

28. Rieke, Hancox, et al. The future of digital health with federated learning. NPJ Digit Med 2020; 3: 119.

29. Cortial, Montero, Tourlet, et al. Artificial intelligence in drug repurposing for rare diseases: a mini-review. Front Med 2024; 11: 1404338.

30. Catacutan, Alexander, Arnold, et al. Machine learning in preclinical drug discovery. Nat Chem Biol 2024; 20: 960–976.

31. Pati, Baid, Edwards, et al. Federated learning enables big data for rare cancer boundary detection. Nat Commun 2022; 13: 7346.

32. Weissler, Naumann, Andersson, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials 2021; 22: 1–15.

33. Nemec Zlatolas, Welzer, Lhotska. Data breaches in healthcare: security mechanisms for attack mitigation. Cluster Comput 2024; 27: 8639–8654.

34. Hansson, Lochmüller, Riess, et al. The risk of re-identification versus the need to identify individuals in rare disease research. Eur J Hum Genet 2016; 24: 1553–1558.

35. Seh, Zarour, Alenezi, et al. Healthcare data breaches: insights and implications. Healthcare (Basel) 2020; 8: 133–133.

36. Brauneck, Schmalhorst, Kazemi Majdabadi, et al. Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review. J Med Internet Res 2023; 25: e41588.

37. Wang. Federated learning for rare disease detection: a survey. Rare Dis Orphan Drugs J 2023; 2: 22–22.

38. General Data Protection Regulation (GDPR) – Official Legal Text. In: General Data Protection Regulation (GDPR) [Internet]. [cited 29 Aug 2022]. Available: https://gdpr-info.eu/.

39. Data Protection and Privacy Legislation Worldwide. In: UNCTAD [Internet]. [cited 21 Jun 2024]. Available: https://unctad.org/page/data-protection-and-privacy-legislation-worldwide.

40. Wen, Zhang, Lan, et al. A survey on federated learning: challenges and applications. Int J Mach Learn Cybern 2023; 14: 513–535.

41. Matschinske, Späth, Bakhtiari, et al. The FeatureCloud platform for federated learning in biomedicine: unified approach. J Med Internet Res 2023; 25: e42621.

42. Rahmani, Yousefpoor, et al. Machine learning (ML) in medicine: review, applications, and challenges. Sci China Ser A Math 2021; 9: 2970.

43. Teo, Jin, et al. Federated machine learning in healthcare: a systematic review on clinical applications and technical architecture. Cell Rep Med 2024; 5: 101419.

44. Art. 5 GDPR – Principles relating to processing of personal data - General Data Protection Regulation (GDPR). In: General Data Protection Regulation (GDPR) [Internet]. [cited 26 Jun 2024]. Available: https://gdpr-info.eu/art-5-gdpr/.

45. Yaacoub J-PA, Noura, Salman. Security of federated learning with IoT systems: issues, limitations, challenges, and solutions. Internet of Things and Cyber-Physical Systems 2023; 3: 155–179.

46. Joshi, Pal, Sankarasubbu. Federated learning for healthcare domain - pipeline, applications and challenges. ACM Trans Comput Healthc 2022; 3: 1–36.

47. Ali, Ahsan, Tasnim, et al. Federated Learning in healthcare: model misconducts, security, challenges, applications, and future research directions – A systematic review. arXiv [cs.CR]. 2024. Available: http://arxiv.org/abs/2405.13832.

48. Johnson, Wei W-Q, Weeraratne, et al. Precision medicine, AI, and the future of personalized health care. Clin Transl Sci 2021; 14: 86–93.

49. Busnatu, Niculescu A-G, Bolocan, et al. Clinical applications of artificial intelligence-an updated overview. J Clin Med Res 2022; 11: 2265–2265.

50. Recht, Dewey, Dreyer, et al. Integrating artificial intelligence into the clinical practice of radiology: challenges and recommendations. Eur Radiol 2020; 30: 3576–3584.

51. Rubin. Informatics in radiology: measuring and improving quality in radiology: meeting the challenge with informatics. Radiographics 2011; 31: 1511–1527.

52. Ghaffar Nia, Kaplanoglu, Nasab. Evaluation of artificial intelligence techniques in disease diagnosis and prediction. Discov Artif Intell 2023; 3: 5–5.

53. Ahmed, Wan, Zhang, et al. Artificial intelligence for omics data analysis. BMC Methods 2024; 1: 4–4.

54. Walter, Haferlach, Nadarajah, et al. How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene 2021; 40: 4271–4280.

55. Chen, Wang, et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans Med Imaging 2022; 41: 757–770.

56. Venugopalan, Tong, Hassanzadeh, et al. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci Rep 2021; 11: 3254.

57. Fisher, Smith, Walsh, Coalition Against Major Diseases, Abbott, Alliance for Aging Research, Alzheimer’s Association, Alzheimer's Foundation of America, AstraZeneca Pharmaceuticals LP, Bristol-Myers Squibb Company, Critical Path Institute, CHDI Foundation, Inc., Eli Lilly and Company, F. Hoffmann-La Roche Ltd, Forest Research Institute, Genentech, Inc., GlaxoSmithKline, Johnson & Johnson, National Health Council, Novartis Pharmaceuticals Corporation, Parkinson's Action Network, Parkinson's Disease Foundation, Pfizer, Inc., Sanofi-aventis. Collaborating Organizations: clinical Data Interchange Standards Consortium (CDISC), Ephibian, Metrum Institute. Machine learning for comprehensive forecasting of Alzheimer’s disease progression. Sci Rep 2019; 9: 13622.

58. Walsh, Smith, Pouliot, et al. Generating digital twins with multiple sclerosis using probabilistic neural networks. bioRxiv 2020. DOI: https://doi.org/10.1101/2020.02.04.934679

59. Imran, Almusharraf. Google Gemini as a next generation AI educational tool: a review of emerging educational technology. Smart Learn Environ 2024; 11: 1–8.

60. OpenAI, Achiam, Adler, Agarwal, et al. GPT-4 Technical Report. 2023. Available: http://arxiv.org/abs/2303.08774

61. Touvron, Lavril, Izacard, et al. LLaMA: Open and Efficient Foundation Language Models. 2023. Available: http://arxiv.org/abs/2302.13971.

62. Sallam. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel) 2023; 11: 877–877.

63. Nori, King, McKinney, et al. Capabilities of GPT-4 on Medical Challenge Problems. arXiv [cs.CL]. 2023. Available: http://arxiv.org/abs/2303.13375.

64. Yang, Chen, PourNejatian, et al. A large language model for electronic health records. NPJ Digit Med 2022; 5: 94.

65. Mak K-K, Balijepalli, Pichika. Success stories of AI in drug discovery - where do things stand? Expert Opin Drug Discov 2022; 17: 79–92.

66. Zhavoronkov, Ivanenkov, Aliper, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 2019; 37: 1038–1040.

67. Jumper, Evans, Pritzel, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021; 596: 583–589. Available: https://www.nature.com/articles/s41586-021-03819-2

68. Abramson, Adler, Dunger, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024; 630: 493–500.

69. Stoltzfus, Kaur, Chawla, et al. The role of telemedicine in healthcare: an overview and update. The Egyptian Journal of Internal Medicine 2023; 35: 1–5.

70. Paragliola, Ribino, Ullah. A federated learning approach to support the decision-making process for ICU patients in a European telemedicine network. Journal of Sensor and Actuator Networks 2023; 12: 78.

71. Sharma, Rawal, Shah. Addressing the challenges of AI-based telemedicine: best practices and lessons learned. J Educ Health Promot 2023; 12: 38.

72. Telemedicine for healthcare: capabilities, features, barriers, and applications. Sensors International 2021; 2: 100117.

73. Pamplin, Davis, Mbuthia, et al. Military telehealth: a model for delivering expertise to the point of need in austere and operational environments. Health Aff 2019; 38: 1386–1392.

74. Pifer. Coronavirus drives surge in Teladoc virtual medical visits. In: Healthcare Dive [Internet]. 15 Apr 2020 [cited 22 Jul 2024]. Available: https://www.healthcaredive.com/news/coronavirus-COVID-surge-teladoc-telehealth-virtual-medical-visits/576031/.

75. Ohannessian, Duong, Odone. Global telemedicine implementation and integration within health systems to fight the COVID-19 pandemic: a call to action. JMIR Public Health Surveill 2020; 6: e18810.

76. Wosik, Fudim, Cameron, et al. Telehealth transformation: COVID-19 and the rise of virtual care. J Am Med Inform Assoc 2020; 27: 957–962.

77. Bekbolatova, Mayer, Ong, et al. Transformative Potential of AI in Healthcare: Definitions, Applications, and Navigating the Ethical Landscape and Public Perspectives. Healthcare (Basel) 2024; 12: 125–125.

78. Schaefer, Lehne, Schepers, et al. The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis 2020; 15: 45.

79. Maweu, Shamsuddin, Dakshit, et al. Generating healthcare time series data for improving diagnostic accuracy of deep neural networks. IEEE Trans Instrum Meas 2021; 70: 1–15.

80. Greenleaf. Sheherezade and the 101 data privacy laws: origins, significance and global trajectories. SSRN Electron J 2013; 40: 1–29.

81. Courbier, Dimond, Bros-Facer. Share and protect our health data: an evidence based approach to rare disease patients’ perspectives on data sharing and data protection - quantitative survey and recommendations. Orphanet J Rare Dis 2019; 14: 75.

82. McCormack, Kole, Gainotti, et al. “You should at least ask”. The expectations, hopes and fears of rare disease patients on large-scale data and biomaterial sharing for genomics research. Eur J Hum Genet 2016; 24: 1403–1408.

83. High-level summary of the AI act. [cited 22 Jul 2024]. Available: https://artificialintelligenceact.eu/high-level-summary/.

84. AI Pact. In: Shaping Europe’s digital future [Internet]. [cited 22 Jul 2024]. Available: https://digital-strategy.ec.europa.eu/en/policies/ai-pact.

85. Bernardi, Alves, Crepaldi, et al. Data quality in health research: integrative literature review. J Med Internet Res 2023; 25: e41446.

86. Honeyford, Expert, Mendelsohn, et al. Challenges and recommendations for high quality research using electronic health records. Front Digit Health 2022; 4: 940330.

87. Weng. Detecting systemic data quality issues in electronic health records. Stud Health Technol Inform 2019; 264: 383–387.

88. Bernasconi. Data quality-aware genomic data integration. Computer Methods and Programs in Biomedicine Update 2021; 1: 100009.

89. Odish OFF, Johnsen, van Someren, et al. EEG may serve as a biomarker in Huntington’s disease using machine learning automatic classification. Sci Rep 2018; 8: 16090.

90. Garcia-Carretero, Olid-Velilla, Perez-Torrella, et al. Predictive modeling of hypophosphatasia based on a case series of adult patients with persistent hypophosphatasemia. Osteoporos Int 2021; 32: 1815–1824.

91. Mordenti, Ferrari, Pedrini, et al. Validation of a new multiple osteochondromas classification through Switching Neural Networks. Am J Med Genet A 2013; 161A: 556–560.

92. Iskrov, Raycheva, Kostadinov, et al. Are the European reference networks for rare diseases ready to embrace machine learning? A mixed-methods study. Orphanet J Rare Dis 2024; 19: 1–19.

93. Visibelli, Roncaglia, Spiga, et al. The impact of artificial intelligence in the odyssey of rare diseases. Biomedicines 2023; 11: 87.

94. Welsh, Jelsone-Swain, Foerster. The utility of independent component analysis and machine learning in the identification of the amyotrophic lateral sclerosis diseased brain. Front Hum Neurosci 2013; 7: 51.

95. Wojtara, Rana, Rahman, et al. Artificial intelligence in rare disease diagnosis and treatment. Clin Transl Sci 2023; 16: 2106.

96. Kiryu, Yasaka, Akai, et al. Deep learning to differentiate parkinsonian disorders separately using single midsagittal MR imaging: a proof of concept study. Eur Radiol 2019; 29: 6891–6899.

97. Cohen, Chamberlin, Deloughery, et al. Detecting rare diseases in electronic health records using machine learning and knowledge engineering: case study of acute hepatic porphyria. PLoS One 2020; 15: e0235574.

98. Bhasuran, Schmolly, Kapoor, et al. Reducing diagnostic delays in Acute Hepatic Porphyria using electronic health records data and machine learning: a multicenter development and validation study. medRxiv. 2023 [cited 29 Jul 2024]. DOI: https://doi.org/10.1101/2023.08.30.23293130

99. Al-Hussaini, White, Varmeziar, et al. An interpretable machine learning framework for rare disease: a case study to stratify infection risk in pediatric leukemia. J Clin Med 2024; 13: 1788.

100.

Esmail

Danter

. DeepNEU: artificially induced stem cell (aiPSC) and differentiated skeletal muscle cell (aiSkMC) simulations of infantile onset POMPE disease (IOPD) for potential biomarker identification and drug discovery. Front Cell Dev Biol 2019; 7: 25.

101.

Koch

Koster

. Rare genetic disorders: novel treatment strategies and insights into human biology. Front Genet 2021; 12: 714764.

102.

Jiang

Han

Min

, et al. Identification of the methotrexate resistance-related diagnostic markers in osteosarcoma via adaptive total variation netNMF and multi-omics datasets. Front Genet 2023; 14: 1288073.

103.

Wang, Helbig, Edmondson, et al. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24: bbad284.

104.

De La Vega, Chowdhury, Moore, et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med 2021; 13: 53.

105.

Schmidt, Danyel, Grundmann, et al. Next-generation phenotyping integrated in a national framework for patients with ultrarare disorders improves genetic diagnostics and yields new molecular findings. Nat Genet 2024; 56: 1644–1653.

106.

Hurvitz, Azmanov, Kesler, et al. Establishing a second-generation artificial intelligence-based system for improving diagnosis, treatment, and monitoring of patients with rare diseases. Eur J Hum Genet 2021; 29: 1485–1490.

107.

Baylor College of Medicine. Using AI to improve diagnosis of rare genetic disorders. Science Daily. 25 Apr 2024. Available: https://www.sciencedaily.com/releases/2024/04/240425131345.htm. Accessed 28 Jul 2024.

108.

Zheng, Lai, Liu, et al. Aggregation service for federated learning: an efficient, secure, and more resilient realization. IEEE Trans Dependable Secure Comput 2023; 20: 988–1001.

109.

Sheller, Edwards, Reina, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep 2020; 10: 12598.

110.

Loftus, Ruppert, Shickel, et al. Federated learning for preserving data privacy in collaborative healthcare research. Digit Health 2022; 8: 20552076221134455.

111.

McMahan, Moore, Ramage, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv [cs.LG]. 2016. Available: http://arxiv.org/abs/1602.05629

112.

Asad, Moustafa, Ito. Federated Learning Versus Classical Machine Learning: A Convergence Comparison. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2107.10976

113.

Sáinz-Pardo Díaz, López García. Study of the performance and scalability of federated learning for medical imaging with intermittent clients. Neurocomputing 2023; 518: 142–154.

114.

Rauniyar, Hagos, Jha, et al. Federated learning for medical applications: a taxonomy, current trends, challenges, and future research directions. arXiv [cs.LG] 2022. Available: http://arxiv.org/abs/2208.03392.

115.

Feng, Phillips, Malenica, et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med 2022; 5: 66.

116.

Pan, Rajendran, et al. An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals. Patterns (N Y) 2024; 5: 100898.

117.

Field, Vinod, Delaney, et al. Federated learning survival model and potential radiotherapy decision support impact assessment for non-small cell lung cancer using real-world data. Clin Oncol (R Coll Radiol) 2024; 36: e197–e208.

118.

CORDIS. [cited 30 Jul 2024]. Available: https://cordis.europa.eu/search?q=%2Fproject%2Fkeywords%3D%27Federated+Learning%27+AND+endDate%3E%3D2024-07-31&p=1&num=10&srt=Relevance:decreasing.

119.

Federate Learning and mUlti-party computation Techniques for prostatE cancer. In: CORDIS | European Commission [Internet]. Publication Office/CORDIS; 13 Jul 2023 [cited 30 Jul 2024]. Available: https://cordis.europa.eu/project/id/101095382.

120.

Federated virtual twins for privacy-preserving personalised outcome prediction of type 2 diabetes treatment. In: CORDIS | European Commission [Internet]. Publication Office/CORDIS; 8 Jan 2024 [cited 30 Jul 2024]. DOI: https://doi.org/10.3030/101136305

121.

Control for Deep and Federated Learning. In: CORDIS | European Commission [Internet]. Publication Office/CORDIS; 1 Nov 2023 [cited 30 Jul 2024]. DOI: https://doi.org/10.3030/101096251

122.

Morafah, Reisser, Lin, et al. Stable Diffusion-based data augmentation for Federated Learning with Non-IID data. arXiv [cs.LG]. 2024. Available: http://arxiv.org/abs/2405.07925

123.

Guo, Zhuang, Zhang, et al. A Comprehensive Survey of Federated Transfer Learning: Challenges, Methods and Applications. arXiv [cs.LG]. 2024. Available: http://arxiv.org/abs/2403.01387

124.

Wang, Yang, Azimi, et al. Differential private federated transfer learning for mental health monitoring in everyday settings: a case study on stress detection. arXiv [cs.LG] 2024. Available: http://arxiv.org/abs/2402.10862.

125.

Yan, Wei, et al. Label-efficient self-supervised federated learning for tackling data heterogeneity in medical imaging. IEEE Trans Med Imaging 2023; 42: 1932–1943.

126.

Taha, Yaw, Koh, et al. A survey of federated learning from data perspective in the healthcare domain: challenges, methods, and future directions. IEEE Access 2023; 11: 45711–45735.

127.

Jin, Liu, Chen, et al. Federated learning without full labels: A survey. arXiv [cs.LG]. 2023. Available: http://arxiv.org/abs/2303.14453

128.

Demircioğlu. The effect of feature normalization methods in radiomics. Insights Imaging 2024; 15: 2.

129.

Banerjee, Taroni, Allaway, et al. Machine learning in rare disease. Nat Methods 2023; 20: 803–814.

130.

Gupta, Kayode, Bhatt, et al. Hierarchical Federated Learning Based Anomaly Detection Using Digital Twins for Smart Healthcare. 2021 IEEE 7th International Conference on Collaboration and Internet Computing (CIC). IEEE 2021, pp.16–25. DOI: https://doi.org/10.1109/cic52973.2021.00013

131.

Zhao, et al. Secure multi-party computation: theory, practice and applications. Inf Sci 2019; 476: 357–372.

132.

Hsu, Gaboardi, Haeberlen, et al. Differential privacy: an economic method for choosing epsilon. 2014 IEEE 27th Computer Security Foundations Symposium. IEEE 2014, pp.398–410. DOI: https://doi.org/10.1109/CSF.2014.35

133.

Goto, Solanki, John, et al. Multinational federated learning approach to train ECG and echocardiogram models for hypertrophic cardiomyopathy detection. Circulation 2022; 146: 755–769.

134.

ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer’s Disease. [cited 15 Jul 2024]. Available: https://arxiv.org/html/2310.15301v2

135.

Khan. Privacy-preserving computing in the healthcare using federated learning. In: AI-Driven Marketing research and data analytics. IGI Global, 2024, pp.263–280. DOI: https://doi.org/10.4018/979-8-3693-2165-2.ch015

136.

Chen, Zhou, et al. FedHome: cloud-edge based personalized federated learning for in-home health monitoring. arXiv [cs.NI] 2020. Available: http://arxiv.org/abs/2012.07450.

137.

Jiang, Firouzi, et al. Federated clustered multi-domain learning for health monitoring. Sci Rep 2024; 14: 1–12.

138.

Mercado-Asis, Domingo-Maglinao. Geriatric medicine in the medical curriculum: a MUST in the globally aging world. JMUST 2022; 6: 944–951.

139.

Ghosh. FEEL: fEderated LEarning framework for ELderly healthcare using edge-IoMT. IEEE Trans Comput Soc Syst 2023; 10: 1800–1809.

140.

Rani, Kataria, Kumar, et al. Federated learning for secure IoMT-applications in smart healthcare systems: a comprehensive review. Knowl Based Syst. 2023; 274: 110658. DOI: https://doi.org/10.1016/j.knosys.2023.110658

141.

Shukla, Brandwein-Weber, Samankan, et al. Federated learning in computational pathology: classification of tall cell patterns in papillary thyroid carcinoma. In: Medical imaging 2024: digital and computational pathology. SPIE, 2024, pp.206–215. DOI: https://doi.org/10.1117/12.3006890

142.

Chen, Zeng, et al. DFML: dynamic federated meta-learning for rare disease prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 21: 880–889.

143.

Yang, et al. FedIIC: towards robust federated learning for class-imbalanced medical image classification. arXiv [cs.CV] 2022. Available: http://arxiv.org/abs/2206.13803.

144.

Chen, Zeng, et al. Feature-context driven federated meta-learning for rare disease prediction. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2112.14364

145.

Cremonesi, Planat, Kalokyri, et al. The need for multimodal health data modeling: a practical approach for a federated-learning healthcare platform. J Biomed Inform 2023; 141: 104338.

146.

Yang, et al. FedRare: Federated learning with intra- and inter-client contrast for effective rare disease classification. ArXiv. 2022. abs/2206.13803. DOI: https://doi.org/10.48550/arXiv.2206.13803

147.

Khosla, Teterwak, Wang, et al. Supervised Contrastive Learning. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2004.11362

148.

Błajda, Barnaś, Kucab. Application of personalized education in the mobile medical app for breast self-examination. Int J Environ Res Public Health 2022; 19: 4482.

149.

Khalil, Khan Mamun, Sherif, et al. A federated learning model based on hardware acceleration for the early detection of Alzheimer’s disease. Sensors (Basel) 2023; 23: 8272.

150.

Sheller, Reina, Edwards, et al. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. Brainlesion 2019; 11383: 92–104.

151.

Dasaradharami Reddy, Gadekallu. A comprehensive survey on federated learning techniques for healthcare informatics. Comput Intell Neurosci 2023; 2023: 8393990.

152.

Alawadi, Kebande, Dong, et al. A federated interactive learning IoT-based health monitoring platform. New Trends in Database and Information Systems 2021; 1450: 235–246.

153.

Qayyum, Ahmad, Ahsan, et al. Collaborative federated learning for healthcare: multi-modal COVID-19 diagnosis at the edge. IEEE Open J Comput Soc 2022; 3: 172–184.

154.

Federated Learning on Transcriptomic Data: Model Quality and Performance Trade-Offs. [cited 15 Jul 2024]. Available: https://arxiv.org/html/2402.14527v1

155.

Kolobkov, Mishra Sharma, Medvedev, et al. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front Big Data 2024; 7: 1266031.

156.

Danek, Makarious, Dadu, et al. Federated learning for multi-omics: a performance evaluation in Parkinson’s disease. Patterns (N Y) 2024; 5: 100945.

157.

Dayan, Roth, Zhong, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med 2021; 27: 1735–1743.

158.

Oldenhof, Ács, Pejó, et al. Industry-scale orchestrated federated learning for drug discovery. arXiv [cs.LG]. 2022. Available: http://arxiv.org/abs/2210.08871

159.

Huang, Zhang, et al. Collaborative analysis for drug discovery by federated learning on non-IID data. Methods 2023; 219: 1–7.

160.

Hanser. Federated learning for molecular discovery. Curr Opin Struct Biol 2023; 79: 102545.

161.

Xiong, Cheng, Lin, et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci China Life Sci 2022; 65: 529–539.

162.

Cao, Lyu, Zhu, et al. An overview on over-the-air federated edge learning. IEEE Wirel Commun 2024; 31: 202–210.

163.

Tak, Cherkaoui. Federated edge learning: design issues and challenges. IEEE Netw 2021; 35: 252–258.

164.

Ali, Naeem, Tariq, et al. Federated learning for privacy preservation in smart healthcare systems: a comprehensive survey. IEEE J Biomed Health Inform 2023; 27: 778–789.

165.

Guo, Chen, Ren, et al. Federated learning empowered real-time medical data processing method for smart healthcare. IEEE/ACM Trans Comput Biol Bioinform 2022; 21: 1–12.

166.

Beutel, Topal, Mathur, et al. Flower: A friendly federated learning research framework. 2020. Available: https://hal.science/hal-03601230/.

167.

Geißler, Isham, Hickey, et al. Patient involvement in clinical trials. Commun Med (Lond) 2022; 2: 94.