Transition database for rare diseases and its use for clinical documentation

Abstract

Patients with rare diseases commonly suffer from severe symptoms as well as chronic and sometimes life-threatening effects. Not only the rarity of the diseases but also the poor documentation of rare diseases often leads to an immense delay in diagnosis. One of the main problems here is the inadequate coding with common classifications such as the International Statistical Classification of Diseases and Related Health Problems. Instead, the ORPHAcode enables precise naming of the diseases. So far, only a few approaches report in detail how the ORPHAcode is technically implemented in clinical practice and for research. We present a concept and implementation for storing and mapping ORPHAcodes. The Transition Database for Rare Diseases contains all the information of the Orphanet catalog and serves as the basis for documentation in the clinical information system as well as for monitoring Key Performance Indicators for rare diseases at the hospital. The five-step process for setting up the Transition Database (in particular using open source tools and the DataVault 2.0 logic) allows the approach to be adapted to local conditions as well as to be extended for additional terminologies and ontologies.

Keywords

clinical documentation rare diseases semantic mapping ORPHAcodes

Introduction

Rare diseases are defined with an incidence of 1 in 2,000 individuals in the European Union and 1 in 2,500 in the USA.¹ Because there are about 6,000 to 8,000 different rare diseases, probably 263 to 446 million people worldwide are affected.² Rare diseases are diverse, highly variable and thus a “global public health burden”³: They all have in common that they have chronic effects on the quality of life of the patients and are sometimes life-threatening.⁴ Due to a lack of expertise and geographically dispersed experts, false diagnoses or long waiting periods – on average of about 7 years until the disease is correctly diagnosed – are common.^3,5 50 % of patients remain undiagnosed.⁶

In addition to the difficult diagnostics and the diagnostic delay, the documentation of the diagnoses is a huge hurdle. Rare diseases are currently only insufficiently described by classifications, such as by the International Statistical Classification of Diseases and Related Health Problems 10th Version (ICD-10). Aymé et al. showed that only 355 of up to 8,000 different diseases can be clearly classified by a specific ICD-10 code and that only 162 can be mapped to a set of ICD-10 codes.⁷ Some rare diseases are grouped into codes, such as for example Fabry disease, which is mapped as “other sphingolipidoses” with code E75.2 in ICD-10.⁸ But E75.2 also includes other syndromes, such as Farber disease⁹ and Krabbe disease.¹⁰ This means that the ICD code does not indicate which rare disease was actually diagnosed. This creates a gap between clinical documentation and the actual medical diagnoses. That also has a negative impact on research because the observational (and claim) data cannot be clearly assigned to a specific rare disease and this is making secondary data use difficult, if not impossible.

The Orphanet approach (https://www.orpha.net) represents an alternative to coding with ICD-10.¹¹ It is a multilingual portal containing information about rare diseases and orphan drugs. An important component is the ORPHAcode, which is a unique and stable identifier for a single rare disease. In this way, significantly more diseases can be provided with a unique code than with ICD-10. The resulting code system enables mapping to other terminologies, such as ICD-10-WHO, Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), HUGO Gene Nomenclature (HGNC) or Online Mendelian Inheritance in Man (OMIM). There are also references to the Human Phenotype Ontology (HPO) to describe symptoms and phenotypes.¹² Using a specific coding system for rare diseases significantly increases the documentation quality and representation of rare diseases.^11,13 Such coding systems thus form the basis for both quality-assured secondary data use and specific additional documentation, for example through registers.

In Germany, diagnoses were long documented exclusively with the German Modification of ICD-10 (ICD-10-GM), and coding with ORPHAcodes was voluntary. Since 2023, documenting rare diseases with the ORPHAcode and the German Alpha-ID-SE has been mandatory.^14,15 However, few approaches have been published on how to integrate the specific coding systems into clinical practice and IT infrastructure.

The University Center for Rare Diseases at the University Hospital Carl Gustav Carus in Dresden developed a parametrized form within the clinical information system to enable the obligatory coding of rare diseases using ORPHAcodes for all inpatients.¹⁶ As part of the project “Collaboration on Rare Diseases” of the German Medical Informatics Initiative (CORD-MI), the underlying data basis of the form was refactored. This resulted in the Transition Database for Rare Diseases. The aim of this work is to review current approaches for documentation of diagnoses for rare diseases and to explain how the information in the Orphanet catalog can be used to support clinical documentation of rare diseases by using a database with semantic mappings. The steps necessary to store and version ORPHAcodes and their mappings, as well as to structure them according to local specifics, are described. This results in lessons learned that allow the adaptation of the approach to other clinical IT infrastructures.

State of the art

To obtain an overview of methods already in use to support rare disease documentation, two authors conducted a rapid scoping review. The methodology was based on the “PRISMA Extension for Scoping Reviews”,¹⁷ which consists of the following steps:

(1) Identifying the research questions

(2) Identifying relevant studies (search strategy)

(3) Study selection

(4) Charting the data

(5) Collating, summarizing and reporting the results.

Research questions

The primary research question was: What approaches are there to support documentation of rare diseases and how is the support provided? This led to secondary research questions to detail the approaches:

• Which terminologies or ontologies are used for the documentation?

• In which stage is the approach (e.g., proof-of-concept or routine)?

• Which systems or applications are supported (e.g., in the clinical information system, in the documentation system, as an external application)?

Search strategy

A search in PubMed (on 10 January 2023) covering the last 50 years with the search term (Documentation [Title/Abstract] OR Code* [Title/Abstract] OR Coding* [Title/Abstract]) AND (Rare Disease* [Title/Abstract] OR Orphan Disease* [Title/Abstract]) resulted in 762 papers (see Figure 1). Two authors screened these papers and resolved disagreements by discussion.

Figure 1.

PRISMA flowchart to identify approaches for the state of the art (model from¹⁸).

Restriction to rare diseases is important because, on the one hand, they require specific terminologies and, on the other hand, multiple diagnosis codes are also needed, e.g. for billing purposes as well as for more precise designation, for the exact indication of suspected and excluded diagnoses.

Study selection

The two authors used the Rayyan tool^19,20 for the screenings and adhered to inclusion and exclusion criteria (see Table 1).

Table 1.

Inclusion and exclusion criteria of the rapid scoping review.

Inclusion criteria	Exclusion criteria
Articles of the last 50 years	Articles older than 50 years
Articles in German and English	Articles in a language other than German and English
Articles on documentation and coding of rare diseases, especially their diagnosis in outpatient and inpatient care	Articles on a different scope, e.g. coding genes, registries, orphan drugs, case studies on specific diseases

Of the 762 articles, 20 papers were included through title-abstract screening. Of these, five studies were included and 15 studies were excluded by the full-text screening in accordance with the inclusion and exclusion criteria (see Table 1). Only three papers discuss explicit approaches to documentation support; two additional papers discuss coding support by extra applications.

A large number of papers found by the search string were excluded because they dealt with genes encoding a specific phenotype or problems in public health and epidemiology but did not show any relation to rare disease coding or documentation.

Charting the data

Table 2 shows what information was extracted to answer the research questions. Table 3 summarizes the information from the identified papers.

Table 2.

Extracted information according to research questions.

Research question	Extracted information
Metadata of the paper (with no regard to any research question)	• Country
	• Publication year
(1) What approaches are there to support documentation of rare diseases and how is the support provided?	• Approach to support documentation
(2a) Which terminologies or ontologies are used for the documentation?	• Supported terminologies or ontologies
(2b) In which stage is the approach (e.g., proof-of-concept or routine)?	• Stage of approach
(2c) Which systems or applications are supported (e.g., in the clinical information system, in the documentation system, as an external application)?	• Kind of IT system

Table 3.

Information extracted from the articles.

Article	Country	Year	Approach	Terminologies or ontologies				Stage		System
Article	Country	Year	Approach	Orpha code	HPO	ICD-10	SNOMED	Proof-of-concept	Routine	Information system	External application
Choquet et al.²⁴: LORD	France	2015	Web application	●	●	●		●			●
Kretschmer et al.^15,16: Parameterized form at the Dresden University Hospital	Germany	2021	Parameterized form	●		●			●	●
Martin et al.¹⁵: Parameterized form at the Würzburg University Hospital	Germany	2022	Parameterized form	●	●	●			●	●
Martin et al.¹⁵: Diagnosis hit list at the Tübingen University Hospital	Germany	2022	Parameterized form	●		●			●	●
Alves et al.²²: RDD	Spain	2016	Symptom-based differential diagnosis generator	●				●			●
Pilehvar et al.²³: PheneBank	UK	2021	Generation of a database of phenotypes and their relationships using NLP		●		●	●			●

Reporting the results

Three papers specifically describe how rare diseases can be documented. Choquet et al.²⁴ described their web portal named “Linking Open data for Rare Diseases (LORD)”, which allows the user to browse the Orphanet catalog by phenotypes and genotypes. The semantic link between the different terminologies (e.g. ORPHAcode, OMIM, HPO) is provided by Semantic Web and Resource Description Framework (RDF) technology. This mapping is stored in a NoSQL database. Integration into the clinical information system is possible via an Application Programming Interface (API). Kretschmer et al.¹⁶ developed a parameterized form to code rare disease diagnoses with ORPHAcodes. Martin et al.¹⁵ described the problem of coding rare disease diagnoses in general and with a special focus on Germany. As a solution, they mentioned three approaches of German university hospitals. In addition to the approach of Kretschmer et al.,¹⁶ the approaches of Würzburg University Hospital and Tübingen University Hospital were also individually developed and adapted additions to the clinical information systems. In Würzburg, a parameterized form is used to document ICD-10-GM, Alpha-ID-SE and ORPHAcodes. In Tübingen, a diagnoses hit list is used to document ORPHAcodes.

The two additional papers were about IT systems for correct coding as part of diagnostic support. Alves et al.²² described the development of their tool for symptom-based Differential Diagnosis (DDX) called “Rare Disease Discovery (RDD)”. The web interface allows a search for symptoms to predict rare diseases. Pilehvar et al.²³ introduced a web-based tool named “PheneBank”. It offers the retrieval of Medline abstracts with NLP by entering phenotypes using common terminologies like SNOMED or HPO.

Thus, six separate approaches to documenting rare disease diagnoses were identified:

1. Choquet et al.²⁴: LORD

2. Kretschmer et al.^15,16: Parameterized form at the Dresden University Hospital

3. Martin et al.¹⁵: Parameterized form at the Würzburg University Hospital

4. Martin et al.¹⁵: Diagnoses hit list at the Tübingen University Hospital

5. Alves et al.²²: RDD

6. Pilehvar et al.²³: PheneBank

All approaches have in common that they use international terminologies and ontologies for coding diagnoses (see Table 3). The majority (n = 5) use ORPHAcodes, half (n = 3) also use HPO, and the approaches developed and used in Germany additionally use German terminologies, such as ICD-10-GM (n = 3) or Alpha-ID-SE (n = 1). However, SNOMED (n = 1) and OMIM (n = 1) are rarely used.

The identified approaches are already in routine use (n = 4) or are being evaluated as proof-of-concepts (n = 2) (see Table 3). The approaches used in Germany are integrated into the clinical information system or the documentation system (n = 3). The other approaches are external applications (n = 3) (see Table 3).

Despite the commonalities, a uniform approach to documenting rare disease diagnoses is not evident.

Concept

In a cooperation between the University Center for Rare Diseases and the IT division of the University Hospital Carl Gustav Carus in Dresden, a form for the clinical documentation of rare diseases was designed. In the following, the functionality of the form is explained briefly, but the focus is on the Transition Database for Rare Diseases used for the underlying data catalog as well as their interaction.

Parametrized form for clinical documentation of rare diseases

The parametrized form allows coding of rare diseases using ORPHAcode in the clinical information system ORBIS of the Dedalus Healthcare Group. The form allows searching by ORPHAcode, disease name, ICD-10 code and genes based on the Orphanet catalog. The search result can be selected and stored in the electronic patient record. Since the majority of rare diseases are mainly chronic, documentation is per patient and not per medical case. This is unusual for the German healthcare system, as documentation is strongly based on case-related billing of medical services.

On the one hand, the form is used for more accurate medical coding of rare diseases and is mandatory for inpatient care at the hospital. On the other hand, the results are used for statistical analysis to create greater awareness of rare diseases and their incidence. For this purpose, statistical parameters are displayed within a dashboard.²⁵

The clinical information system expects one comma-separated values (CSV) file per searchable field (i.e. ORPHAcode, name, ICD-10 code, genes). Previously, the CSV files were updated manually using an Excel form every six months. The refactoring presented here enables an automated monthly update.

Transition database for rare diseases

In 2022, the form was refactored. The aim was to update the underlying data catalog on a monthly basis in order to always use the most recent status of the Orphanet catalog. New rare diseases are constantly being researched and included in the Orphanet catalog. The update of the source data is determined along a strict decision path by Orphanet Nomenclature Manager and a medical and scientific validating authority (see Procedural document: Orphanet nomenclature and classification of rare diseases). This process includes both the collection of information through literature and expert knowledge, as well as discussion in a selected committee. For the provision of the updated information, there is a release cycle in which the files required for the database are updated monthly. This high variability requires constant updates. The Transition Database combines all monthly data as well as changes of the Orphanet catalog since 2020.

The use of more flexible approaches, such as terminology servers or APIs, is unsuitable for the present case, as the clinical information system requires CSV files as input. Thus, a more static approach was deliberately chosen: the Transition Database is the basis for the data catalog of the parametrized form for clinical documentation of rare diseases.

The automated workflow by which the database is loaded and used as the basis for the data catalog of the parametrized form is shown in Figure 2. How the individual steps were implemented is explained in the following section.

Figure 2.

Workflow of updating the transition database for rare diseases representing the DAGs from Airflow (without step (E) “Manual input of the CSV files into the clinical information system”; dv = DataVault, h = hub, s = satellite).

Implementation

The following section focuses solely on the implementation of the Transition Database for Rare Diseases, following the steps of the automated workflow (see Figure 2). To manage the workflow orchestration, we used Apache Airflow (https://github.com/apache/airflow), an open source tool for workflow management. Directed acyclic graphs (DAGs) with tasks and dependencies are programmed in Python and scheduled by Apache Airflow.²⁶
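The idea of tasks and dependencies forming a directed acyclic graph can be illustrated without Airflow itself. The following minimal sketch uses Python's standard-library graphlib module in place of Apache Airflow, and the task names are hypothetical labels loosely following the steps in Figure 2, not the names used in the actual DAGs.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "check_for_new_release": set(),
    "download_xml": {"check_for_new_release"},
    "stage_to_database": {"download_xml"},
    "update_hub": {"stage_to_database"},
    "update_satellites": {"update_hub"},
    "write_csv_files": {"update_satellites"},
    "send_email": {"write_csv_files"},
}


def execution_order(dependencies):
    """Return one valid execution order of the tasks.

    graphlib raises CycleError if the dependencies are not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

A scheduler such as Airflow additionally handles timing, retries and logging; the sketch only shows how dependencies constrain the order of execution.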

Reading the underlying files from Orphanet

Orphadata (https://www.orphadata.com) provides aggregated information from Orphanet in the form of predefined datasets (referred to as the Orphanet catalog).

Orphadata offers several packages; the German Orphanet Nomenclature Pack used here contains several files. The Orphanet Nomenclature File is the main source for the concept of the Transition Database: regularly updated, moderated XML files containing every valid ORPHAcode, the name of the disease, status (active or inactive), synonyms, disorder type (disease, syndrome or histopathological subtype) and classification level (disease or group) can be downloaded from the GitHub repository (https://github.com/Orphanet/Orphadata_aggregated).

To track changes, the GitHub repository is checked daily. A SHA256 hash is created for the data and compared with the hash value of the previous data export. If the hashes differ, the new XML files are automatically downloaded, read, transformed, written to the database and saved as a backup.
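The hash comparison can be sketched as follows. This is a minimal illustration, assuming that the hash of the previous export is kept in a small text file; the function names and the storage location are hypothetical and not taken from the actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the SHA256 hex digest of the downloaded data."""
    return hashlib.sha256(data).hexdigest()


def has_changed(data: bytes, hash_file: Path) -> bool:
    """Compare the hash of `data` with the hash stored from the previous export.

    If the hashes differ (or no previous hash exists), the new hash is stored
    and True is returned, signalling that the update steps should run.
    """
    new_hash = sha256_of(data)
    old_hash = hash_file.read_text().strip() if hash_file.exists() else None
    if new_hash == old_hash:
        return False
    hash_file.write_text(new_hash)
    return True
```

In a daily check, the download and transformation tasks would only be triggered when `has_changed` returns True.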

Storage of required information in the transition database

The Transition Database contains all the information of the Orphanet catalog (i.e. ORPHAcode, name, status, synonyms, ICD-10-WHO codes and, more recently, ICD-11-WHO codes with associated mapping relationships, codes from OMIM, UMLS, MeSH, GARD and MedDRA, link to the Orphanet website) as well as an ID, the date of the underlying file and a generated unique job ID for the executed update (see Figure 3).

Figure 3.

Sample excerpt from the database (tool: DbVisualizer; mapping relations are hidden).

When a new version of the data is available, the associated XML file is parsed into a flat structure for later processing in a relational database. This requires multiple transformation and pivoting steps to obtain the target structure containing the unique ID, ORPHAcode, link to the resource, name, synonyms as well as the status and the external references. Using Apache Airflow, these functions are orchestrated to stage the data into the database.
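The flattening and pivoting can be sketched as follows for a subset of the fields. The XML element names and the sample record below are assumptions that are only loosely modeled on the Orphanet nomenclature files and should be checked against the actual file; the ORPHAcode, link and ICD-10 code for Fabry disease are those given in this article, and the two synonyms are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; element names are assumptions, not the verified schema.
SAMPLE_XML = """
<JDBOR>
  <DisorderList>
    <Disorder id="1">
      <OrphaCode>324</OrphaCode>
      <ExpertLink>https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&amp;Expert=324</ExpertLink>
      <Name>Fabry disease</Name>
      <SynonymList>
        <Synonym>Alpha-galactosidase A deficiency</Synonym>
        <Synonym>Anderson-Fabry disease</Synonym>
      </SynonymList>
      <ExternalReferenceList>
        <ExternalReference>
          <Source>ICD-10</Source>
          <Reference>E75.2</Reference>
        </ExternalReference>
      </ExternalReferenceList>
    </Disorder>
  </DisorderList>
</JDBOR>
"""


def flatten(xml_text):
    """Parse the nested XML into one flat dict (row) per disorder."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for disorder in root.iter("Disorder"):
        row = {
            "id": disorder.get("id"),
            "orphacode": disorder.findtext("OrphaCode"),
            "link": disorder.findtext("ExpertLink"),
            "name": disorder.findtext("Name"),
            # Several synonyms are collapsed into one pipe-separated field.
            "synonyms": "|".join(
                s.text for s in disorder.iter("Synonym") if s.text
            ),
        }
        # Pivot: each external source becomes its own column.
        for ref in disorder.iter("ExternalReference"):
            source = ref.findtext("Source")
            code = ref.findtext("Reference")
            if source and code:
                column = source.lower().replace("-", "")  # "ICD-10" -> "icd10"
                row[column] = "|".join(filter(None, [row.get(column), code]))
        rows.append(row)
    return rows


rows = flatten(SAMPLE_XML)
```

Each resulting row can then be written to a staging table; the pipe-separated fields are one possible design choice and could equally be stored as separate rows.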

Every downloaded XML file contains the complete and valid ORPHAcode catalog. Each ORPHAcode and its information is written line by line to the Transition Database and then displayed in whole in its most recent version.

Update of mappings with DataVault 2.0 logic

The mappings are updated and versioned using the DataVault 2.0 logic. DataVault is a logic of data warehousing, which is based on a two-layer architecture (stage and data warehouse) with the goal to have an exact copy of the original data.²⁷ The model of DataVault consists of a single hub table and multiple satellite tables, which are linked by unique business keys. For the Transition Database we created a hub containing the unique keys and five satellites containing all the describing attributes (see Figure 4).

Figure 4.

Database model of DataVault 2.0 logic with hub (h) and satellites (s).

The hub contains all ORPHAcodes ever written to the Transition Database, an MD5 hash as business key for each one, and valid_from and valid_to dates. This is the main table of the DataVault. The five satellites contain the available information about the name, status, synonyms, ICD-10 codes (and, more recently, ICD-11 codes) and associated genes for each ORPHAcode, as well as valid_from and valid_to dates for each satellite. The entries are connected by the business key to ensure fast transactions. The hub and each satellite can be displayed as a most recent valid view or as a historicized view with all changes recorded since the DataVault was created. The DataVault is created with a self-written Airflow operator and uses Structured Query Language (SQL) queries. This makes it possible to generate and customize a DataVault for different use cases independently of commercial software. In our case, the most recent views of each satellite are used for the creation and update of five CSV files (ORPHAcode and name, synonym, ICD-10, genes, names of genes).
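The hub-satellite logic can be sketched with one hub and one satellite. This is a minimal illustration using SQLite from the Python standard library rather than the database and Airflow operator actually used; the table, column and view names are hypothetical, and the convention that an empty valid_to marks the currently valid entry is an assumption.

```python
import hashlib
import sqlite3


def business_key(orphacode: str) -> str:
    """MD5 hash of the ORPHAcode, used to link the hub and its satellites."""
    return hashlib.md5(orphacode.encode("utf-8")).hexdigest()


# Hypothetical schema: one hub, one satellite (names), one "most recent" view.
SCHEMA = """
CREATE TABLE h_orphacode (
    bk TEXT PRIMARY KEY,
    orphacode TEXT NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to TEXT
);
CREATE TABLE s_name (
    bk TEXT NOT NULL REFERENCES h_orphacode(bk),
    name TEXT NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to TEXT
);
CREATE VIEW v_name_current AS
    SELECT h.orphacode, s.name
    FROM h_orphacode h JOIN s_name s ON s.bk = h.bk
    WHERE s.valid_to IS NULL;
"""


def load_name(con, orphacode, name, load_date):
    """Insert the hub entry if missing and version the name satellite."""
    bk = business_key(orphacode)
    con.execute(
        "INSERT OR IGNORE INTO h_orphacode (bk, orphacode, valid_from) "
        "VALUES (?, ?, ?)",
        (bk, orphacode, load_date),
    )
    current = con.execute(
        "SELECT name FROM s_name WHERE bk = ? AND valid_to IS NULL", (bk,)
    ).fetchone()
    if current and current[0] == name:
        return  # unchanged, so no new satellite row is written
    # Close the previous version and open a new one.
    con.execute(
        "UPDATE s_name SET valid_to = ? WHERE bk = ? AND valid_to IS NULL",
        (load_date, bk),
    )
    con.execute(
        "INSERT INTO s_name (bk, name, valid_from) VALUES (?, ?, ?)",
        (bk, name, load_date),
    )


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
load_name(con, "324", "Fabry disease", "2022-01-01")
load_name(con, "324", "Fabry disease", "2022-02-01")  # no change
# Hypothetical label change, only to show how history is kept:
load_name(con, "324", "Fabry disease (hypothetical new label)", "2022-03-01")
```

Querying `v_name_current` returns only the latest name, while `s_name` keeps both versions with their validity dates; further satellites would follow the same pattern.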

Generation and automated e-mailing of the CSV files

The required CSV files correspond to the satellites and contain additional information that the clinical information system expects.

Each CSV file is represented by a table in the Transition Database, which is updated with new information and then exported again as a CSV file. A task in the Airflow DAG compares the CSV table to the most recent view of the satellite associated with it. Entries that are no longer valid receive today's date as valid_to, and new entries are created in the CSV table with today's date as valid_from. In the next step, all these CSV tables are written to separate CSV files, compressed into a ZIP archive and saved to our local cloud storage. After that, the last task sends an email with a link to the archive to the recipient list responsible for updating the clinical information system.
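The comparison and export steps can be sketched as follows. This is a minimal in-memory illustration, assuming a CSV table with the hypothetical columns orphacode, name, valid_from and valid_to, where an empty valid_to marks a valid entry; saving to cloud storage and sending the email are omitted.

```python
import csv
import io
import zipfile

FIELDS = ["orphacode", "name", "valid_from", "valid_to"]  # hypothetical columns


def sync_csv_table(csv_table, current_view, today):
    """Align the CSV table with the most recent satellite view.

    csv_table: list of dict rows; current_view: set of (orphacode, name).
    """
    active = {(r["orphacode"], r["name"]) for r in csv_table if not r["valid_to"]}
    # Entries no longer present in the view are closed with today's date.
    for row in csv_table:
        if not row["valid_to"] and (row["orphacode"], row["name"]) not in current_view:
            row["valid_to"] = today
    # Entries new in the view are added with today's date as valid_from.
    for orphacode, name in sorted(current_view - active):
        csv_table.append(
            {"orphacode": orphacode, "name": name, "valid_from": today, "valid_to": ""}
        )
    return csv_table


def to_csv_bytes(rows):
    """Serialize the rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def build_archive(tables):
    """Write each table (filename -> rows) as a CSV file into one ZIP archive."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, rows in tables.items():
            archive.writestr(filename, to_csv_bytes(rows))
    return out.getvalue()
```

Keeping closed entries in the CSV table, rather than deleting them, preserves the history that the validity dates are meant to record.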

Manual input of the CSV files into the clinical information system

The sent CSV files are then manually inserted into the information system by an employee of the IT division. After that, the new information is available to all users.

Lessons learned and conclusion

The correct and clear documentation of rare disease diagnoses is essential for patient care and research. However, there are only a few published approaches, as the scoping review shows. This is in line with the statement of Choquet et al., who described that processing is necessary because, although the data is publicly available, it is highly heterogeneous and frequently updated, and thus difficult to use and integrate into existing systems.²¹ Therefore, we propose the Transition Database for Rare Diseases containing information from the Orphanet catalog as well as mappings of the ORPHAcode. We explained this approach using the example of the parameterized form at the University Hospital Carl Gustav Carus in Dresden. Because the individual steps are explained, it can also be adapted to local specifics.

The lessons learned during development were diverse: (1) Automation for increased timeliness, (2) database as a solid basis, (3) DataVault 2.0 logic for flexible updates, (4) use of open source tools for easy embedding and (5) transferability and expandability.

(1) Manual updating of the database for the parameterized form is very complex, which is why automating the process is important. Apache Airflow is used to automate and orchestrate the workflows for downloading, saving, updating, transforming and sending the data. The automation increases the timeliness of the data and creates an overview of the frequency and content of the updates of both the data source (Orphadata) and the prepared data (Transition Database for Rare Diseases).

(2) Although terminology servers are often used to store and update terminologies and classifications, this approach is not suitable for local requirements. Since the clinical information system requires CSV files as input, the authors decided to store the data in a database and transform it according to the required data structure. This also corresponds to the described state of the art.²⁴

(3) The rapid frequency of change in the source data set, which contains highly linked information, requires a dynamic method to update the data. Therefore, the DataVault 2.0 logic was used. The hub-satellite relationship corresponds to the logic of the required CSV files, making programming easier. In addition, this structure allows for quick expansion. Thus, a new satellite could be programmed for ICD-11 codes already present in the source without changing the original program code. Therefore, we assume that a transfer to other terminologies and classifications is easily and quickly possible. So far, the focus is on the ORPHAcodes, which corresponds to the state of the art.^15,21,22

(4) The use of open source tools enables independence from commercial tools. Thus, the database could be easily and quickly embedded into the existing IT infrastructure. Apache Airflow is already used for different dashboards²⁵ and the DataVault 2.0 logic is an essential part of the clinical data warehouse²⁸ (even if the commercial tool DataVaultBuilder is used). The use of other tools is conceivable; for example, the prototype of the Transition Database was created with Pentaho Data Integration (https://www.hitachivantara.com/en-us/products/dataops-software/data-integration-analytics/download-pentaho.html).

(5) The concept can also be applied to other use cases, for example, it is successfully used in the clinical data warehouse for mappings to an international research data repository²⁹ and for mappings of medication data. In general, the step-by-step approach allows for adaption to local conditions (e.g. use of other tools, different structure of the output).

For the future, we plan to expand the database. In particular, it will be expanded to include terminologies and ontologies for genetic rare diseases, i.e. HPO and HGVS. In addition, the database will be used as a basis to store mappings to other rare disease documentation approaches: The focus here is on minimal data sets^21,30 and registries^31–33. By accurately documenting diagnoses of rare diseases, patients can be included in registries in a more precise and targeted manner. This in turn forms the basis for research on quality-assured data and enables access to improved care. Examples of the successful implementation of registries include the German registry for Cystic Fibrosis (https://www.muko.info/), the LORIS MyeliNeuroGene rare disease database³⁴, the United Kingdom Primary Immune Deficiency registry³⁵ as well as the GAIN Registry³⁶. In addition, registry software tools can also use the described approach to update the underlying terminologies.

The Transition Database for Rare Diseases is routinely used as a basis for the documentation of rare diseases at the University Hospital Carl Gustav Carus in Dresden. The data obtained from this serves both patient care and the monitoring of key performance indicators for rare diseases within a dashboard.²⁵

The structured storage and continuous updating of terminologies and ontologies as well as their mappings supports the documentation of rare diseases in patient care and hence the research of these conditions. Accurate naming of diseases will increase the chance of personalized medicine for patients with rare diseases and can shorten the odyssey of diagnosis.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the German Federal Ministry of Education and Research within the project “Collaboration on Rare Diseases” (01ZZ1911I) of the German Medical Informatics Initiative.

Open Access funding enabled and organized by Projekt DEAL. We gratefully acknowledge support from the SLUB / TU Dresden Open Access Publication Fund. The funders for the publication had no influence on the study design, data collection, analysis, decision to publish, or preparation of the manuscript.

ORCID iD

Michele Zoch

References

1. Schieppati, Henter, Daina, et al. Why rare diseases are an important medical and social issue. Lancet 2008; 371(9629): 2039–2041.

2. Shourick, Wack, Jannot. Assessing rare diseases prevalence using literature quantification. Orphanet J Rare Dis 2021; 16: 139.

3. Dharssi, Wong-Rieger, Harold, et al. Review of 11 national policies for rare diseases in the context of key patient needs. Orphanet J Rare Dis 2017; 12(1): 63.

4. Aymé, Schmidtke. Networking for rare diseases: a necessity for Europe. Bundesgesundheitsblatt - Gesundheitsforsch - Gesundheitsschutz 2007; 50(12): 1477–1483.

5. Blöß, Klemann, Rother, et al. Diagnostic needs for rare diseases and shared prediagnostic phenomena: results of a German-wide expert Delphi survey. Palau F, editor. PLOS ONE 2017; 12(2). https://dx.plos.org/10.1371/journal.pone.0172532

6. Graessner, Zurek, Hoischen, et al. Solving the unsolved rare diseases in Europe. Eur J Hum Genet 2021; 29: 1477.

7. Aymé, Bellet, Rath. Rare diseases in ICD11: making rare diseases visible in health information systems through appropriate coding. Orphanet J Rare Dis 2015; 10(1). https://www.ojrd.com/content/10/1/35

8. Orphanet INSERM. Orphanet: Fabry disease. 2021. https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=324

9. Orphanet INSERM. Orphanet: Farber disease. 2021. https://www.orpha.net/consor/cgi-bin/OC_Exp.php?Expert=333

10. Orphanet INSERM. Orphanet: Krabbe syndrom. 2021. https://www.orpha.net/consor/cgi-bin/OC_Exp.php?Lng=GB&Expert=487

11. Rath, Olry, Dhombres, et al. Representation of rare diseases in health information systems: the orphanet approach to serve a wide range of end users. Hum Mutat 2012; 33(5): 803–808.

12. Köhler, Vasilevsky, Engelstad, et al. The human phenotype ontology in 2017. Nucleic Acids Res 2017; 45(D1): D865–D876.

13. Robinson, Graessner. Datenstandards für Seltene Erkrankungen. Bundesgesundheitsblatt - Gesundheitsforsch - Gesundheitsschutz 2022; 65: 1126. DOI: 10.1007/s00103-022-03591-2

14. Weber, Dávila. German approach of coding rare diseases with ICD-10-GM and Orpha numbers in routine settings. Orphanet J Rare Dis 2014; 9(1): O10.

15. Martin, Rommel, Thomas, et al. Seltene erkrankungen in den daten sichtbar machen – kodierung. Bundesgesundheitsblatt - Gesundheitsforsch - Gesundheitsschutz 2022; 65(11): 1133–1142.

16. Kretschmer, Danker, Müller, et al. Wie häufig ist selten wirklich? Eine Erhebung zur Häufigkeit Seltener Erkrankungen an einem Universitätsklinikum. Gesundheitswesen 2021. https://www.thieme-connect.de/DOI/DOI?10.1055/a-1388-7095

17. Tricco, Lillie, Zarin, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018; 169(7): 467–473.

18. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021; 372: n71.

19. Ouzzani, Hammady, Fedorowicz, et al. Rayyan - a web and mobile app for systematic reviews. Syst Rev 2016; 5(1): 210.

20. Rayyan. Intelligent systematic review. https://rayyan.ai/cite

21. Choquet, Maaroufi, de Carrara, et al. A methodology for a minimum data set for rare diseases to support national centers of excellence for healthcare and research. J Am Med Inform Assoc 2014; 22(1): 76–85.

22. Alves, Piñol, Vilaplana, et al. Computer-assisted initial diagnosis of rare diseases. PeerJ 2016; 4: e2211.

23. Pilehvar, Bernard, Smedley, et al. PheneBank: a literature-based database of phenotypes. Bioinforma Oxf Engl 2021; 38: btab740.

24. Choquet, Maaroufi, Fonjallaz, et al. LORD: a phenotype-genotype semantically integrated biomedical data tool to support rare disease diagnosis coding in health information systems. AMIA Annu Symp Proc AMIA Symp 2015; 2015: 434–440.

25. Leutner, Bathelt, Sedlmayr, et al. Development of a dashboard for rare diseases - a technical case report. Stud Health Technol Inf 2021; 283: 78–85.

26. Apache Software Foundation. Apache airflow. 2021. https://airflow.apache.org/

27. Linstedt, Olschimke. Building a scalable data warehouse with data vault 2.0. Burlington, MA: Morgan Kaufmann, 2015.

28. Lorenz, Gebler, Bathelt, et al. Evaluation of Modeling Approaches for a Clinical Data Warehouse in a Highly Dynamic Environment. Stud Health Technol Inform 2023; 302: 753–754.

29. Kümmel, Reinecke, Gruhl, et al. Transition Database for a harmonized mapping of German patient data to the OMOP CDM. 2020. https://www.ohdsi.org/2020-eu-symposium-showcase-13/

30. Abaza, Kadioglu, Martin, et al. Domain-specific common data elements for rare disease registration: conceptual approach of a European joint initiative toward semantic interoperability in rare disease research. JMIR Med Inform 2022; 10(5): e32158.

31. Luisetti, Campo, Scabini, et al. The problems of clinical trials and registries in rare diseases. Proc 3rd Int Congr Rare Pulm Dis Orphan Drugs 2010; 104: S42–S44.

32. Berger, Rustemeier, Göbel, et al. How to design a registry for undiagnosed patients in the framework of rare disease diagnosis: suggestions on software, data set and coding system. Orphanet J Rare Dis 2021; 16(1): 198.

33. Ruseckaite, Mudunna, Caruso, et al. Current state of rare disease registries and databases in Australia: a scoping review. Orphanet J Rare Dis 2023; 18(1).

34. Spahr, Rosli, Legault, et al. The LORIS MyeliNeuroGene rare disease database for natural history studies and clinical trial readiness. Orphanet J Rare Dis 2021; 16(1).

35. Shillitoe, Bangs, Guzman, et al. The United Kingdom Primary Immune Deficiency (UKPID) registry 2012 to 2017. Clin Exp Immunol 2018; 192(3).

36. Staus, Rusch, El-Helou, et al. The GAIN Registry - a New Prospective Study for Patients with Multi-organ Autoimmunity and Autoinflammation. J Clin Immunol 2023; 43(6).