Abstract
The exposure of clinicians to patients with rare gastrointestinal diseases is limited. This hampers clinical studies and impedes the accumulation of scientific knowledge on the natural disease course, treatment outcomes and prognosis in these patients. A registry study is an excellent method to detect patterns at an aggregate level that cannot be discovered in individual cases. This paper aims to describe a template for creating a successful international registry for rare diseases. We focus mainly on rare hepatic diseases, but the lessons from this paper apply to other fields of medicine as well.
Introduction
Increasing our knowledge about rare liver disorders is imperative. A rare disorder is commonly defined as one that affects <1 in 2000 citizens. 1 Because most physicians are not exposed to large numbers of patients with a given rare disease, their knowledge of its natural course, treatment response and prognosis is incomplete. This limits our understanding and is an obstacle to research efforts that aim to improve the outlook of patients with rare diseases.
Registries may be the answer to the lack of solid evidence. By definition, a registry is an organized system that uses observational study methods to collect existing or uniform clinical data from individual patients. 2 A registry offers a unique opportunity to conduct research on populations and conditions that are not generally studied in clinical trials, yet are important to clinical decision-makers. 3
The steps in creating a registry study do not differ much from those in implementing a clinical trial. All the fundamental elements, such as design, study population, timeline and data management, are likewise present. However, there is no standard guidance as to how to design a registry. A helpful open access resource is
Main aspects in the design of a clinical registry.
Objectives
Examples of multi-country liver disease registries
DILI: drug-induced liver injury; LTx: liver transplantation; PBC: primary biliary cirrhosis; PLD: polycystic liver disease; PSC: primary sclerosing cholangitis.
Natural course, quality of life and epidemiology
One of the goals of a registry could be to study the natural course of disease and associated factors. We designed a polycystic liver disease (PLD) registry with exactly this in mind. PLD is a disorder in which patients progressively develop liver cysts. To date, information on the natural course of PLD is lacking, as are answers to questions such as which factors predict an aggressive disease course. This registry will help us to elucidate the behavioral risk factors for disease and assess the differences in treatment choices between countries.4,5
The UK Primary Biliary Cirrhosis (UK-PBC) collaboration is an excellent example of a network that has already established a large, successful national registry. 6 Primary biliary cirrhosis (PBC) is a rare disease (with a prevalence of 30 per 100,000 individuals in the population) with a highly variable phenotype and a high prevalence among women (the male to female gender ratio is 1:10). 7 The sheer size of this registry makes it possible to study the clinical profile seen in the subgroup of male PBC patients. In addition, this consortium recently developed a UK-PBC risk score to assess prognosis in PBC patients. 6 Finally, this registry makes it possible to map the natural history of the disease in the total PBC population, to link genetic susceptibility with phenotype and outcome, and to study the impact of PBC on the patients’ quality of life.7,8 Registries also facilitate studies on incidence and prevalence, provided that they sample cases from a confined geographical area. Studies from the Primary Sclerosing Cholangitis (PSC) Study Group are a fine example: all PSC patients were identified in an area of six adjacent provinces that comprises 50% of the Dutch population. 9
Long-term efficacy
A registry is well suited to studying the long-term efficacy of therapeutic interventions. For example, the relative probability of death and graft loss after primary liver transplantation (LTx) for a number of rare liver disorders is difficult to estimate. This is the reason for the European Liver Transplant Registry, 10 which collects data on death and graft loss as rare outcome measures in 8,840 transplanted patients.
Safety
A patient registry can be used to investigate safety by collecting data on unexpected adverse events of drugs. Drug-induced liver injury (DILI) is the most cited reason why approved drugs are withdrawn from the market by the US Food and Drug Administration (FDA). 11 Bromfenac and troglitazone are two well-known examples of drugs that were withdrawn because of severe hepatotoxicity that became apparent in the post-approval period. 12 A specific registry, such as the Spanish DILI Registry, collects real-life data on drug safety and therefore allows better estimation of the magnitude of the side effects of a drug, in terms of incidence or prevalence.
Cost-effectiveness
Registries are a tool to investigate cost-effectiveness. This has become an important aspect of the market access package for novel interventions. The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database has been used to measure comparative treatments and the cost-effectiveness of treatment modalities for hepatocellular carcinoma (HCC). This has resulted in a clear picture of the costs of treatment modalities (LTx, chemotherapy, radiation, resection or no treatment) over various HCC stages, in relation to survival (effectiveness). 13 It goes without saying that registries such as the SEER database can be used to address other related questions. 14
Materials and methods
Study population
Target population
The purpose of a registry is a key factor that determines the target population. This is the population for whom the results are relevant and which, at the same time, is the source of the registry data. The population actually enrolled is a mere reflection (and probably a fraction) of the complete patient population. Only in the case of an extremely rare disease is it possible to reach a coverage rate that approaches completeness. For example, the Dutch national Multiple Endocrine Neoplasia Type 1 database has been able to capture >90% of the total patient population in The Netherlands. 15 This contrasts with the situation in PBC, as the UK-PBC group has managed to include approximately 25% of all PBC patients in the UK. 7
In order to appreciate the variability in phenotypic presentation of a disorder such as PLD, it is paramount to sample a large number of patients who are followed for a considerable time period. We have found it difficult to formulate a watertight disease definition for PLD. A cut-off for the number of cysts (such as the presence of >20 liver cysts) is rather arbitrary and is not always strictly applied by physicians. Some PLD mutation carriers (who will most likely develop the disease phenotype with time) do not have the required number of cysts and may be asymptomatic at the time of inclusion. Overly strict inclusion criteria increase the risk of excluding relevant patient populations, which leads to sampling bias and compromises the external validity of the results. It is therefore key to consider the consequences of inclusion criteria that are too strict.
For some diseases, there is a wide variation in terms of the disease complexity and the treatment strategies used between university and general district hospitals. In view of this, the UK-PBC consortium managed to include thousands of patients from general centers, as well as specialist centers across the entire UK. 7 This resulted in a geographically representative cohort, avoiding specialist center bias.
A large epidemiological study in PSC patients highlights the influence of selection and/or referral bias in population-based studies. The median survival until liver transplantation or PSC-related death was 13.2 years in tertiary referral centers, while transplant-free survival was 21.3 years in the total cohort (
Design
International collaboration
National and international collaboration is crucial in order to collect a large study population. Isolated PLD is a rare liver disease with a prevalence of 1 in 158,000 people, and may also occur in the context of autosomal dominant polycystic kidney disease, which carries a prevalence of 1 in 1000.16,17 Currently, our local registry includes approximately 500 patients. We used our professional network, established for clinical trials, in order to achieve a larger study population. Promoting a registry online or through presentations increases the visibility of the project and enables collaboration with international researchers.
The global PBC Study Group is a multicenter collaboration between 15 centers that have developed a registry, including the medical information of almost 5000 PBC patients in Europe and North America, based on individual databases. 18 These data were used to develop a validated scoring system to predict transplant-free survival in ursodeoxycholic acid-treated PBC patients and to elucidate predictors for development of HCC.19,20 International successes like these emphasize that combining several national databases constitutes a unique opportunity to obtain the power to execute studies.
International cohort studies facilitate our understanding of heterogeneity in rare diseases, by stratification of the at-risk groups. Risk stratification helps to identify the patient subgroups with low and high risk profiles, and allows us to select the patients who have the greatest potential to benefit from treatment.21,22 The GLOBE-score is a validated risk stratification tool that predicts transplant-free survival of PBC patients who were treated with ursodeoxycholic acid, leading to more stratified and evidence-based individualized care. 19
Stakeholders
In the process of creating a registry, it is pivotal to consider the target audience for whom the outcomes matter. Identifying the stakeholders helps to determine the objectives of a registry, as they have an essential role in using or disseminating its results. Patients, physicians, scientific societies, insurance companies, hospital staff and policymakers, all of whom may have a vested interest in the development of the registry, should be involved, and they are needed for public support. Key success factors are engagement, i.e. active influence on the shaping of the registry, and long-term commitment. This can be achieved by organizing open sessions with different stakeholders, to introduce the concept of a registry in an early phase of its development. In addition, it is important to motivate all parties by making the benefits of the registry visible. For example, authorship is important for the visibility of individual participants, and it is advisable to set up agreements on authorship early in the process of registry development.
Data management
A reliable data management system is essential. Direct communication between electronic patient records and registries would be ideal for the collection of registry data, as it saves money and time. Since most hospital systems are not yet set up to accommodate this, the most accurate and reliable method to collect data is through a web-based data management system. Although its costs are higher than those of a non-electronic data management system, it enhances quality, because validation rules can be formulated that allow monitoring of data integrity. The host of a web-based registry can determine which roles the data collectors will have in the electronic environment. Every role comes with its own responsibilities. There can be a role for patients, who complete a questionnaire, and a role for researchers, who collect the medical data. Another benefit of a web-based registry is that it allows decentralized data entry and thus makes it possible to collect data internationally. Examples of electronic international registries are the Hepatitis C virus-TARGET and the Hepatitis Delta International Network, both of which were used for longitudinal observational studies.23,24 An electronic registry is a financial investment, but in view of quality monitoring and efficiency, it will certainly pay off.
Timeline
Registries can have a fixed or open-ended timeline, depending on the overall purpose of the registry. Most studies using a registry as an observational method have begun as open-ended projects, without a pre-defined stopping point. If continuation of the registry does not add any valuable information to the already captured data, the registry should be terminated and its data reported. 2
Data collection
Data elements
Data collection is a time-consuming process, and it is essential to consider all data elements that are central to the objective of the registry, to avoid the collection of high volumes of data with limited value. It helps to divide the main goals into specific objectives, and to subdivide these further into measurable outcomes. 2 For example, our goal is to study the natural and clinical course of PLD, so one important objective is to obtain information on the determinants associated with treatment. As such, we need to include at least the following elements: current age, gender, age at diagnosis, date of first treatment and treatment strategy. We used an expert panel in order to capture all the relevant variables. Ultimately, a small number of the most important variables remained. 25
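As an illustration only, the minimal set of data elements listed above could be written down as a typed record before any data are collected. The sketch below is not the structure of our registry: the class name, the field names, the research code format and the example values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegistryRecord:
    """One patient's minimal data elements (all field names are hypothetical)."""

    research_code: str                        # anonymous code, never the patient name
    current_age: int                          # years
    gender: str
    age_at_diagnosis: int                     # years
    date_of_first_treatment: Optional[date]   # None if the patient is untreated
    treatment_strategy: Optional[str]         # None if the patient is untreated

    def years_since_diagnosis(self) -> int:
        """Derived outcome: follow-up time since diagnosis, in whole years."""
        return self.current_age - self.age_at_diagnosis

    def is_treated(self) -> bool:
        """A patient counts as treated once a first treatment date is known."""
        return self.date_of_first_treatment is not None


# Made-up example values, for illustration only.
record = RegistryRecord(
    research_code="NL-01-0001",
    current_age=52,
    gender="female",
    age_at_diagnosis=41,
    date_of_first_treatment=date(2012, 3, 14),
    treatment_strategy="strategy_A",
)
```

Writing the elements down in this form makes it explicit which variables are mandatory, which may be empty, and which outcomes are derived rather than entered.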
Self-reported data by patients and patient-reported outcome measures
Collection of variables in patient registries can be performed by patients, researchers or physicians, depending on the origin of data. Another option is to involve patients in this process. The UK-PBC group has utilized this concept, as the authors used self-reported information from a large national cohort of PBC patients (
As the patient’s view on their health status and treatment preferences has obtained a central position in the choice of treatment strategies, it is desirable to include the patient-reported outcome measures (PROMs). PROMs are ideal instruments to measure health gain.26,27 This development is endorsed, as illustrated by the guidance on PROMs that is offered by the US FDA. 28 Web-based questionnaires are an ideal modality to collect PROMs. 29
Data quality
Data quality and monitoring
All elements that are included in a registry should be pre-defined, so that during data collection it is clear to the data collector which information should be entered. For our PLD registry, we tested whether all definitions were interpreted in the same manner by performing a pilot study. Two researchers collected data from the medical records of the same patients, and the results were compared. We were able to clarify obscurities and vague definitions, and to include some missing questions or variables. Several options are available to verify the reliability and reproducibility of data. The gold standard for data entry is the double entry of 5–10% of all patient points, to check and verify. 30 An even better option is to include a quality and control committee, for central and/or local monitoring, in order to guarantee the quality of data. Such a committee should monitor electronic data collection and visit different sites for quality checks. By formulating validation rules in the electronic data management system, incorrect or inconsistent data (for instance, pre-menopausal status in men) can be easily found and rectified.
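The two checks described above, validation rules and comparison of double-entered records, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular data management system: the field names, the rule texts and the function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Each rule pairs a human-readable message with a check that returns True when the record is consistent.
Rule = Tuple[str, Callable[[Record], bool]]

RULES: List[Rule] = [
    (
        "age at diagnosis cannot exceed current age",
        lambda r: r["age_at_diagnosis"] <= r["current_age"],
    ),
    (
        "pre-menopausal status cannot be recorded for men",
        lambda r: not (r["gender"] == "male" and r.get("menopausal_status") == "pre"),
    ),
]


def validate(record: Record) -> List[str]:
    """Return the messages of all validation rules that the record violates."""
    return [message for message, is_consistent in RULES if not is_consistent(record)]


def double_entry_discrepancies(first: Record, second: Record) -> List[str]:
    """Return the names of the fields on which two independent entries disagree."""
    fields = first.keys() | second.keys()
    return sorted(name for name in fields if first.get(name) != second.get(name))


# Two hypothetical entries of the same patient by two data collectors.
entry_a = {"gender": "male", "current_age": 60, "age_at_diagnosis": 45,
           "menopausal_status": "pre"}
entry_b = {"gender": "male", "current_age": 60, "age_at_diagnosis": 54,
           "menopausal_status": None}

violations = validate(entry_a)
discrepancies = double_entry_discrepancies(entry_a, entry_b)
```

Here `violations` flags the inconsistent menopausal status in the first entry, and `discrepancies` lists the two fields that need to be checked against the medical record.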
Handling missing data
Registry data are often collected routinely and therefore carry a risk of incompleteness. There are several options to deal with this during data analysis. Imputation, a statistical method that replaces missing data with substituted values, may be applied here. There are several imputation techniques, but multiple imputation, in which several imputed data sets are generated and the outcomes are averaged across them, is the most popular. The main advantage of multiple imputation is that the sample size and variability are preserved. 31 The global PBC studies handled missing data by multiple imputation, which did not affect the results.18,19
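The pooling step of multiple imputation can be illustrated with a short sketch. It assumes that each imputed data set has already been analyzed, yielding one estimate and its variance per data set, and it combines them with Rubin's rules. That choice is an assumption of this example, since the cited studies are not described in that detail here, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool per-data-set estimates and variances using Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputed data sets, with one variance per estimate")
    pooled_estimate = mean(estimates)        # average of the estimates
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Made-up estimates from three imputed data sets, for illustration only.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what preserves the variability: it widens the confidence interval to reflect the uncertainty introduced by the missing values, which a single imputation would ignore.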
Privacy: anonymous data entry
Anonymous data entry in research is important, particularly for rare disease registries, as the patients may be traced back easily. According to privacy rules, the patient names should be substituted by specific codes. We used anonymous codes for all the PLD patients in our registry, with separate codes for their country and hospital. In order to trace back patients during follow-up, we use a decoding list for every center, which includes the research number, gender, birth date and hospital number. Care should be taken to check the registry for double inclusion of patients. This can be done by checking the names and, if needed, the data of patients with similar birth dates.
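A coding scheme and a first screen for double inclusion could look like the sketch below. The code format (country, hospital, sequence number), the field names and the matching criterion (identical birth date and gender) are assumptions made for illustration; they are not the exact scheme of our registry, and any flagged pair still needs manual review against the local decoding lists.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class DecodingEntry(NamedTuple):
    """One row of a center's decoding list (kept locally, never in the registry)."""

    research_code: str
    gender: str
    birth_date: date
    hospital_number: str


def make_research_code(country: str, hospital: str, sequence: int) -> str:
    """Build an anonymous code from a country code, a hospital code and a sequence number."""
    return f"{country}-{hospital}-{sequence:04d}"


def possible_duplicates(entries: List[DecodingEntry]) -> List[List[str]]:
    """Group research codes that share a birth date and gender, as candidates for review."""
    groups: Dict[Tuple[date, str], List[str]] = defaultdict(list)
    for entry in entries:
        groups[(entry.birth_date, entry.gender)].append(entry.research_code)
    return [sorted(codes) for codes in groups.values() if len(codes) > 1]


# Made-up decoding lists from two hypothetical centers.
entries = [
    DecodingEntry(make_research_code("NL", "01", 1), "female", date(1960, 5, 2), "H123"),
    DecodingEntry(make_research_code("DE", "03", 7), "female", date(1960, 5, 2), "K456"),
    DecodingEntry(make_research_code("NL", "01", 2), "male", date(1971, 11, 30), "H124"),
]

candidates = possible_duplicates(entries)
```

In this example the two female patients born on the same day are flagged as a candidate pair, after which the participating centers can compare names locally to decide whether they are the same person.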
Conclusions
Registries in medical science offer the opportunity to fill important gaps in knowledge about rare diseases, through national and international collaboration. This paper provides a framework for the development of a clinical registry and describes the important aspects that need attention during this process.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of interest
None declared.
