Abstract
Our research compares the ethical and institutional conditions that govern the sharing and secondary use of longitudinal population health data from multiple cohorts. The data use and data sharing conditions applicable to 27 population health cohorts were assessed. This assessment was performed by consulting the informed consent materials and institutional policies applicable to the use of data. Descriptions drawn from the research ethics consent materials were refined through dialog with the institutional staff responsible for overseeing access to data, where possible. Our results demonstrate that the data of the longitudinal population health cohorts assessed can generally be shared and used for secondary purposes. However, the purposes of secondary use and the preconditions applicable thereto are highly variable. Heterogeneous use conditions can also impede the storage of legacy research data and the pooling thereof for the purpose of common reuse.
Introduction
Our research addresses the capacity to reuse longitudinal population health cohort data in compliance with institutional policies, research ethics guidance, and the law. The ethical and institutional data use conditions applicable to 27 longitudinal population cohorts were assessed and compared to one another. The principal objective of this research was to determine if the ethical and institutional use conditions applicable to data of cohorts allow for the liberal secondary use thereof by other health institutions, and for the combination of such datasets according to compatible governance requirements.
The majority of the longitudinal population cohorts selected are participants in the euCanSHare consortium. EuCanSHare is a collaboration between research institutions in Canada and the European Union. The goal of euCanSHare is to create a centralized platform for the cross-border sharing of data relating to cardiovascular health. To this end, euCanSHare is collaborating with the principal investigators of participating cohorts to align both the technical formatting and the ethicolegal use conditions of participating cohorts as much as possible. The cohorts included in the consortium are principally longitudinal health cohorts having collected data from participants in successive waves of data collection across multiple years, many of them having first started to collect data in the 1980s.
EuCanSHare will integrate participating datasets into a common cohort browser allowing future researchers to search and access data produced in different research institutions over long periods of time. The cohort browser is a tool for platform science, serving as a long-term infrastructure for the scientific community to benefit from. The availability of such a resource encourages the future integration of additional, compatible datasets thereto. This can facilitate scientific research, reducing the need to expend financial and human resources in generating health data. Researchers can therefore make secondary use of research data available on the euCanSHare platform.
Secondary use of research data refers to the potential to retain and use such data for additional research purposes other than the primary purposes of the original research project during which such data are generated. The permissible conditions of secondary research use are established in the research ethics consent granted by research participants. Further details relating thereto are also integrated in the institutional policies and cohort-specific or project-specific governance documentation applicable to the research data, which establishes the conditions according to which such data can be used. Such conditions are usually reviewed and approved by a research ethics board, or equivalent body, as well as by other institutional personnel responsible for the oversight of research data. From a normative standpoint, these ethical informed consent requirements derive from international biomedical research ethics norms and the regional implementations thereof. Researchers are generally required to respect these norms as a precondition to performing research using the personal health data of identifiable individuals.
In sum, secondary use conditions reflect the data stewardship commitments that the original researchers and research institution made to the research participants. These conditions also integrate additional data stewardship commitments that the research institution hosting the data adheres to and imposes on the downstream users of such data. Our study considers whether the research ethics consent and associated, institution-specific governance conditions applicable to secondary data use are interoperable among the cohorts under study.
In computer science, interoperability refers to the capacity of different informatics systems to communicate and exchange data with one another. Normative interoperability refers to the potential for information to be exchanged between institutions and used for secondary purposes, in compliance with applicable ethical, institutional, and legal norms. 1 Datasets are interoperable if the ethical, institutional, and legal conditions that govern their use are compatible, even if different. Datasets are not interoperable if such ethical, institutional, and legal conditions conflict, or if the formalities that must be respected in accessing and using each dataset differ considerably. 2
If datasets are interoperable, it is possible to streamline the administration of permissions to access and use such data. This can expedite the secondary use of the data by researchers, and may allow researchers to combine such datasets or perform research projects using multiple datasets. This can be important for the purposes of replicating prior research to confirm the conclusions thereof, and to create larger research cohorts for the purposes of deriving more definitive research results (i.e., results of greater statistical significance). This can also be useful for the purposes of using existing data to answer novel research questions, without being required to recruit a novel research cohort.
Our research demonstrates that the data use conditions applicable to the cohorts studied generally allow for them to be reused for broad research purposes, and by a wide range of entities. Nonetheless, differences in the conditions that must be respected in using each dataset, and the requirement to request access to each dataset independently, could create practical barriers to the expedient reuse of existing health data by researchers for secondary research.
Materials and Methods
The list of cohorts included in our study is found in Figure 1 (n = 27). The methods used to obtain our data and draw conclusions therefrom are as follows:

Fig. 1. Comparative table detailing the permissible secondary uses of studied cohorts. * Each of the study sites participating in the CAHHM cohort had separate consent materials, which were reviewed independently. The CAHHM nonetheless constitutes one singular cohort and has been treated as such in the discussion below. † The consent materials for this study, except data and limited other materials, are lost or otherwise unavailable. CAHHM, Canadian Alliance of Healthy Hearts and Minds.
First, the research consent materials used to gather consent from research participants were obtained from the principal investigator responsible for the oversight of the cohort, or in a limited subset of instances, from publicly accessible webpages relating to the concerned cohort. The research consent materials generally include the informed consent forms used to obtain consent from research participants, and often also include participant information forms used to describe the research project. Such documentation describes the commitments of original researchers and future researchers regarding the permissible uses of concerned data. These commitments are made to the original research participants before their agreement to research participation, and to the research ethics boards or equivalent bodies responsible for ensuring that the research performed is ethical and that the uses of samples and data are subject to appropriate oversight.
Second, supplementary documentation describing the ethical and institutional data use conditions applicable to the cohorts was obtained from the principal investigator responsible for the oversight of the cohort, or from publicly accessible webpages relating to the concerned cohort. Supplementary documentation was not available for all the cohorts, and the contents thereof varied from one cohort to another. Generally speaking, the supplementary documentation consulted includes publication policies, data access agreements and data access policies or procedures, data security policies, research protocols, and confidentiality policies. These policies capture institutional approaches to data use that reflect international bioethics norms, local law, and the usual practices of the concerned health institution.
In certain instances, original English-language copies of all of the relevant documents could not be obtained (n = 15). The software application DeepL, or an equivalent application, was used to translate such materials into English. For foreign-language documents that could only be obtained in the form of images or paper copies, optical character recognition (OCR) software was first used to convert them into machine-readable text before translation.
Third, the ethical and institutional permissions described in the above documentation were synthesized into standardized profiles representing them. For a single cohort, multiple different standardized profiles could be created if the cohort had different tiers of data access permissions (e.g., open access data for nonsensitive data and managed access data for sensitive data) or if different permissions or restrictions applied to different waves of data collection. The data were organized using two methods of representation: Automatable Discovery and Access Matrix (ADA-M) profiles and plaintext descriptions.
The ethical and institutional permissions in the data were initially analyzed using the Global Alliance for Genomics and Health (GA4GH) and the International Rare Diseases Research Consortium (IRDiRC) ADA-M. 3 ADA-M profiles are spreadsheets used to describe the purposes for which data can be used and the conditions applicable to such use. The presence or absence of restrictions as to the purposes for which data can be used is expressed using one of a limited number of designated values, ranging in permissiveness from “Unrestricted” to “Forbidden.” The presence or absence of conditions of data use that relate to how data can be used is expressed in binary terms (i.e., “True” or “Untrue”). Both of these sections are supplemented with free text fields that permit the integration of text detailing the nature of the permissions, restrictions, and use conditions described.
A number of other fields allow for information to be entered about miscellaneous topics. Such fields include descriptions of categories of data that the dataset contains, rules of interpretation to resolve potential ambiguities regarding the purposes for which data can be used, and information about the institution and principal investigators responsible for the stewardship of data. The ADA-M profiles created were supplemented with plain textual descriptions of the use conditions applicable to each dataset.
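As a rough illustration of the profile structure described above, the following Python sketch models a single standardized profile. It is not the ADA-M schema itself: the class and field names, the example cohort, and the intermediate value LIMITED are assumptions introduced for illustration only, since the text names just the two endpoints of the permission scale.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class Permission(IntEnum):
    """Designated values ordered from most to least permissive.

    Only "Unrestricted" and "Forbidden" are named in the text;
    LIMITED is an assumed placeholder for intermediate values.
    """
    UNRESTRICTED = 0
    LIMITED = 1
    FORBIDDEN = 2


@dataclass
class UseProfile:
    """One standardized profile (a cohort may have several, one per tier or wave)."""
    cohort: str
    tier: str  # e.g., "open access" or "managed access"
    # Purpose of use -> designated permission value
    permissions: Dict[str, Permission] = field(default_factory=dict)
    # Condition of use -> binary value (True / Untrue)
    conditions: Dict[str, bool] = field(default_factory=dict)
    # Free text detailing a permission, restriction, or condition
    notes: Dict[str, str] = field(default_factory=dict)

    def permission_for(self, purpose: str) -> Optional[Permission]:
        """Return the recorded value, or None when the profile is silent."""
        return self.permissions.get(purpose)


# Hypothetical example profile; not drawn from any cohort in the study.
profile = UseProfile(
    cohort="Example Cohort",
    tier="managed access",
    permissions={
        "cardiovascular_research": Permission.UNRESTRICTED,
        "commercial_use": Permission.FORBIDDEN,
    },
    conditions={"collaboration_required": False, "dac_review_required": True},
    notes={"commercial_use": "Consent form excludes commercial entities."},
)

print(profile.permission_for("commercial_use").name)
print(profile.permission_for("data_linkage"))  # silent on this purpose
```

Returning None for an unrecorded purpose keeps a profile that is silent on an issue distinct from one that explicitly forbids or permits the use.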
Fourth, the principal investigators responsible for the cohorts were contacted to confirm the accuracy of the profiles and summaries created, or to provide clarifications and changes. The clarifications and changes were integrated into the finalized ADA-M profiles and their companion textual descriptions.
The contents of ADA-M profiles were used to determine if ethical and institutional use conditions posed limits to the interoperability of the concerned datasets, and if so, the degree to which such limitations could preclude the secondary research use of data, or its centralization in a common data platform.
Two interoperability assessments were conducted. The first assessment sought to determine if data of concerned cohorts could be used to perform cardiovascular disease research, and if international data sharing could be performed. The second assessment sought to compare the categories of permissions and the data use conditions applicable to each cohort, and to determine if such permissions and conditions were harmonized across cohorts. Our results demonstrate that the data of the majority of cohorts reviewed could be shared internationally and used for the purpose of performing cardiovascular research. However, the categories of use and sharing permissible, and the conditions applicable to such use and sharing, often limit the interoperability of data and could preclude the harmonized secondary use thereof.
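The two assessments can be sketched in code under simplifying assumptions. In the Python sketch below, the cohort names, category keys, and the "Limited" value are hypothetical, and a profile that is silent on an issue is represented as None. The check for identical values across cohorts is a deliberately crude proxy: as noted earlier, conditions can be compatible even if different, and judging compatibility remains an interpretive exercise.

```python
from typing import Dict, Optional

# Simplified profile: category -> recorded value (None when the profile is silent).
Profile = Dict[str, Optional[str]]

# Hypothetical cohorts; not drawn from the study data.
profiles: Dict[str, Profile] = {
    "Cohort A": {"cardiovascular_research": "Unrestricted",
                 "international_sharing": "Unrestricted"},
    "Cohort B": {"cardiovascular_research": "Unrestricted",
                 "international_sharing": None},
    "Cohort C": {"cardiovascular_research": "Limited",
                 "international_sharing": "Forbidden"},
}


def answer(profile: Profile, question: str) -> str:
    """First assessment: does the profile allow the activity in question?"""
    value = profile.get(question)
    if value is None:
        return "silent"
    if value == "Unrestricted":
        return "permitted"
    if value == "Forbidden":
        return "not permitted"
    return "permitted with limitations"


def is_harmonized(cohorts: Dict[str, Profile], category: str) -> bool:
    """Second assessment (crude proxy): do all cohorts record the same value?"""
    return len({p.get(category) for p in cohorts.values()}) == 1


for name, profile in profiles.items():
    print(name, "->", answer(profile, "international_sharing"))
print("Harmonized:", is_harmonized(profiles, "cardiovascular_research"))
```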
Limitations
The limitations of our research are as follows:
First, due to the nature of documentation that was reviewed (e.g., informed consent materials, institutional policies, and agreements), the source of each barrier to interoperability identified is ambiguous. It can be difficult to establish if data use limitations arise due to local legal norms, research ethics requirements, or institutional policy, or if a limitation is imposed at the discretion of researchers responsible for the oversight of data. It can therefore be difficult to determine if the barriers identified are inspired by preclusions in local law that are relatively challenging for researchers to change, or if the barriers arise due to institutional policies or local research practices that are susceptible to variation and amendment. It is also possible that the normative barriers described in our research will be subject to shifts in the future as the law, bioethics norms, and institutional practices are modified.
Second, the characterization of ethical and institutional permissions established in institutional policy documentation and informed consent materials involves a measure of subjective appreciation. It may not always be possible to create perfectly accurate representations of ethical and institutional permissions in data, and to draw exact comparisons between the permissions applicable to different datasets. The description of ethical and institutional permissions in data is not a process of pure representation, but an interpretive process.
Third, our research pertains principally to longitudinal population health cohorts in Central and Eastern Europe in the area of cardiovascular health. A small number of cohorts included are located in other parts of Europe or in Canada. Consequently, it is possible that the results of our research would have been significantly different if the cohorts selected had related to a different area of scientific research, or if the cohorts had been selected from a different part of the world. Therefore, our conclusion that datasets tend toward normative interoperability could prove inapplicable in other contexts, for instance, in areas of health research other than population health, which do not have an established culture of retaining data for long-term use, or relative to health institutions that are exposed to alternate cultural, economic, and legal paradigms and shepherd their health data resources accordingly.
Fourth, ethical and institutional data governance practices are intimately entwined with the legal paradigms that govern the use of personal data. Data protection law is perhaps the most notable example thereof. Such regulation can constrain the transfer of data across institutional and territorial boundaries, or can establish the conditions according to which such transfers can proceed. Recent developments in data protection law include the adoption of novel or updated data protection laws in numerous countries or regions that formerly did not stringently regulate data use (e.g., California, Brazil, certain parts of Canada, and South Korea). Future changes in data protection legislation, or in the interpretation thereof by courts and regulators, could therefore affect the institutional data governance practices and normative permissions applicable to the cohort data considered.
It is also important to note that other applicable conditions not described in the documentation made available to us could materially limit the potential to reuse health data, or to share such data with other research institutions. Such conditions include intellectual property licenses, and practical limitations in ensuring that the data are meaningfully comparable from a scientific and technical standpoint (i.e., ensuring that data formats, mandatory metadata elements, and ontologies are chosen to allow the meaningful comparison of data originating from different cohorts).
Results
In the first interoperability assessment, the following potential limitations to normative interoperability were assessed:
First, do the ethical and institutional use conditions applicable to the cohort allow for cardiovascular disease research? Second, do the ethical and institutional use conditions applicable to the cohort allow for international data sharing?
The results of the review have been synthesized in the table below (Fig. 1). The results of all 27 cohorts reviewed have been integrated into the table.
The results demonstrate that the cohorts assessed generally allow for secondary cardiovascular disease research to be performed (n = 21). Some cohorts impose limitations on performing cardiovascular disease research, but do not preclude such research entirely (n = 2). One cohort (i.e., MONICA Kaunas) appears to impose limitations on the potential to perform secondary cardiovascular disease research.
The results demonstrate that the cohorts assessed generally allow for the liberal international sharing of study data for secondary research purposes (n = 14). Of the remaining cohorts, four are silent as to the potential for international data sharing. Three cohorts allow for the international sharing of study data once special preconditions have been satisfied. Three other cohorts impose certain preclusions or restrictions on the international sharing or international storage of study data.
Overall, the cohorts studied generally allow for both the secondary use of concerned study data for cardiovascular research purposes, and for international sharing thereof.
In the second interoperability assessment, the following potential limitations to normative interoperability were assessed:
Geographic or jurisdictional restrictions on data sharing.
Restrictions on use based on categories of entities (e.g., hospitals, universities, etc.) or sector of activities (e.g., commercial and noncommercial).
Data reuse restricted to specific categories of research.
Requirements to use data in direct collaboration with the institution having collected or provided the data.
Different data permissions across each instance of collection.
Different data permissions dependent on the consent choices of individual research participants.
Requirement that the institution having collected or provided the data, or another specified institution, perform the scientific or ethical approval process, or data management, directly.
Data are collected and managed without the use of formal research consent materials.
Data permissions are granted based on discretionary criteria that are difficult to automate.
Limitations on participant recontact (for future research participation).
Limitations on data linkage.
The results of the review have been synthesized in the table below (Fig. 2). The results of all 27 cohorts reviewed have been integrated into the table.

Fig. 2. Comparative table detailing the ethicolegal interoperability of studied cohorts. † This cohort description comprises the following CAHHM collection waves: CAHHM (Alberta's Tomorrow Project), CAHHM (Atlantic Alliance), CAHHM (CARTaGENE), CAHHM (British Columbia Generations Project), and CAHHM (Ontario Health Study). The permission levels for the second interoperability assessment table were established as follows: “Data governance practices do not present such restriction” generally means that the cohort's ADA-M profile explicitly states that no such restriction is applicable. In a subset of cases, a green description has been used if the ADA-M profile is silent to the issue, but the context implies that a lack of restriction can be presumed. “Data governance practices impose limited restrictions or requirements” generally means that the cohort's ADA-M profile explicitly asserts that restrictions of the described category are applicable to the dataset, but the textual description associated with the restriction establishes that such restriction is not a significant impediment to data sharing. “Data governance practices silent to this issue” generally means the cohort's ADA-M profile is silent to the presence or absence of the concerned restriction. “Data governance practices present such restriction” generally means that the cohort's ADA-M profile imposes a strong restriction of the category described, and that the restriction can be considered an impediment to the centralization of data, or to the free movement thereof. “Data governance practices forbid this activity or impose conditions that could act as a significant barrier to data sharing” generally means that the cohort's ADA-M profile categorically forbids the concerned activity, or that the concerned restriction is present and sufficiently strong to preclude most or all data sharing.
“It cannot be assessed due to a lack of documents and/or the absence of such information in the documents provided” means that it is impossible to know what the concerned restrictions are in the dataset, as the consent materials or other documents required to assess the issue are unavailable or are insufficiently detailed to provide the requisite information. ADA-M, Automatable Discovery and Access Matrix.
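The six permission levels defined in the legend lend themselves to a small enumeration, which makes per-category counts such as those reported below straightforward to reproduce. The following Python sketch is illustrative only: the level labels are quoted from the legend, while the member names, cohort names, category keys, and matrix contents are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class Level(Enum):
    """Permission levels, with labels quoted from the figure legend."""
    NO_RESTRICTION = "Data governance practices do not present such restriction"
    LIMITED = "Data governance practices impose limited restrictions or requirements"
    SILENT = "Data governance practices silent to this issue"
    RESTRICTION = "Data governance practices present such restriction"
    FORBIDDEN = ("Data governance practices forbid this activity or impose "
                 "conditions that could act as a significant barrier to data sharing")
    NOT_ASSESSABLE = ("It cannot be assessed due to a lack of documents and/or the "
                      "absence of such information in the documents provided")


# Hypothetical cohort-by-category matrix; not the study's actual results.
matrix: Dict[str, Dict[str, Level]] = {
    "Cohort A": {"data_linkage": Level.NO_RESTRICTION, "recontact": Level.SILENT},
    "Cohort B": {"data_linkage": Level.SILENT, "recontact": Level.SILENT},
    "Cohort C": {"data_linkage": Level.NOT_ASSESSABLE, "recontact": Level.NOT_ASSESSABLE},
}


def tally(table: Dict[str, Dict[str, Level]], category: str) -> Counter:
    """Count how many cohorts fall under each level for one restriction category."""
    return Counter(row[category] for row in table.values())


for level, count in tally(matrix, "recontact").items():
    print(f"{count} cohort(s): {level.name}")
```

Keeping NOT_ASSESSABLE separate from SILENT preserves the distinction the legend draws between missing documentation and documentation that does not address the issue.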
The results reveal the following characteristics in the normative permissions regulating the secondary use of datasets.
Availability of consent materials and secondary documentation
For the majority of cohorts (n = 22), we performed a direct review of cohort consent materials and secondary documentation. For three other cohorts, such materials had been lost or destroyed, or were otherwise unavailable, which made it impossible for us to conduct a review of their ethical and institutional data access and data use conditions (i.e., MONICA Newcastle, MONICA Warsaw, and PAMELA). For two cohorts, it was not possible to directly access the relevant materials, but cohort principal investigators (PIs) provided descriptions of the relevant access and use conditions (i.e., MONICA Brianza and MONICA Friuli).
Geographic or jurisdictional restrictions on data sharing
Roughly one-third of cohorts explicitly allow for the international and interjurisdictional sharing of cohort data on the same conditions as local data sharing (n = 9). Certain cohorts allow for the sharing of data generally; however, these cohorts do not explicitly specify if data can be shared with other countries or other jurisdictions (n = 8). Of the remaining cohorts, many allow for the international or interjurisdictional sharing of cohort data, subject to certain limitations (n = 6). Limitations observed include the requirement that the recipient jurisdiction conforms to certain specified data protection requirements. Other limitations observed include the requirement for applicant researchers to obtain the explicit approval of cohort PIs to access data from a third country. One cohort, MONICA Kaunas, does not allow for the international or interjurisdictional sharing of data at all.
Entity-specific or sector-specific limitations
Five of the cohorts studied explicitly allow for the sharing of data with institutions and entities of all nature (i.e., commercial and noncommercial entities, academic and industry researchers, etc.). Of the remaining cohorts, three are silent as to the categories of entities that can access the cohort data.
Ten cohorts describe the entities that can access their data in generally permissive terms that nonetheless impose certain limitations. Of these 10 cohorts, some stipulate that broad categories of entities can access their data, which can include companies, hospitals, research institutions, and universities (e.g., Barts Bioresource), or that research groups specializing in specific categories of research can access the data (e.g., Northern Sweden Health and Disease Study MONICA 2004, 2009).
Two of these 10 cohorts generally do not allow external entities to access their data, but reserve the possibility to grant external entities or collaborators access to data subject to exceptional permissions (i.e., Hamburg City Health Studies and MONICA Novosibirsk).
Of the remaining cohorts, three preclude the commercial use of data or its sharing with commercial entities (i.e., ESTHER, KORA, and the Moli-Sani Study), and one precludes the external sharing of data altogether (i.e., MONICA Kaunas).
Collaboration requirements
The majority of cohorts studied do not require entities accessing their data to collaborate with the original research institution in performing secondary research (n = 15). Seven cohorts do, however, impose some form of collaboration requirement applicable to the use of their data. These can range from a more limited obligation to list the cohort's original study PIs as coauthors of publications, to a more onerous obligation to collaborate directly with the PIs from the original research institution in performing secondary research.
Reuse restricted to specific categories of research
Most cohorts impose certain restrictions regarding the categories of research that can be performed using their data (n = 18). In contrast, some cohorts allow for secondary research writ large to be performed using their data (e.g., CAHHM and U.K. Biobank). Other cohorts still allow for broad secondary use of data for health research purposes, but explicitly enumerate the categories of research for which the data can be used (e.g., ESTHER and Hamburg City Health Studies). Certain cohorts, especially those that have been collecting data for a long time, frame the permissible purposes of data use more narrowly. For example, some cohorts only allow the secondary use of data to perform research related to specific diseases, such as cardiovascular disease (e.g., Barts Bioresource and MATISS). The shift toward more permissive secondary use conditions in more recently collected cohorts reflects the growing acceptance of consent for broad secondary use purposes among the international research community.4–6 Indeed, obtaining an informed broad consent to future research use should be favored.
Different data permissions within cohort records based on participant choices
Perhaps due to the epidemiological and longitudinal biobanking (resource) nature of the majority of the cohorts studied, most cohorts did not allow individual research participants to independently select among multiple consent options regarding the permissible uses of their data (n = 16).
For some cohorts, however, different research participants may have provided consent to different categories of permissible uses within a single cohort (e.g., NSHDS MONICA 2014). If different research participants provide consent to different options within a single study or single longitudinal cohort, it is important to consider how such choices can be efficiently represented in metadata without requiring researchers to manually verify the different consent options on each consent form at the time of reuse.
Some cohorts have allowed individual research participants to make personalized choices that do not materially affect the research purposes for which data can be used. Such choices can include decisions as to the return of material incidental findings or the linkage of study data to health record data (e.g., Moli-Sani Study).
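One possible way to represent participant-level consent choices in metadata is sketched below in Python. It is a minimal illustration, not a description of how any cohort in this study records consent: the record structure, the use category names, and the participant identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ParticipantConsent:
    """Machine-readable consent metadata for one participant (hypothetical)."""
    participant_id: str
    permitted_uses: FrozenSet[str]        # research purposes consented to
    record_linkage: bool = False          # choice that does not alter research purposes
    return_of_findings: bool = False      # idem


def eligible_participants(records: Iterable[ParticipantConsent],
                          requested_use: str) -> List[str]:
    """Select participants whose consent covers the requested use."""
    return [r.participant_id for r in records if requested_use in r.permitted_uses]


# Hypothetical records for a single cohort with multiple consent options.
records = [
    ParticipantConsent("P001", frozenset({"cardiovascular_research", "general_health_research"}),
                       record_linkage=True),
    ParticipantConsent("P002", frozenset({"cardiovascular_research"})),
    ParticipantConsent("P003", frozenset({"general_health_research"}),
                       return_of_findings=True),
]

print(eligible_participants(records, "cardiovascular_research"))
```

Storing the choices as structured fields lets a data custodian filter records automatically at the time of reuse, rather than rereading each consent form.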
Specified institution performs permission management
For the majority of cohorts studied, a specified research institution or a named body (e.g., a data access committee [DAC], steering committee, or regional ethics review board) must review and approve each application for access to data (n = 15). If cohorts require a specified research institution to perform the oversight of access to their data, this could be an impediment to the common management of permissions in data from multiple cohorts by a central oversight body. 7
Discretionary permission management criteria
Approximately half of the institutions reviewed explicitly incorporated contextual analysis and discretionary decision-making powers into the processes used to grant external researchers access rights to their data (n = 13). It would be difficult to standardize data access processes for such cohorts, and the preconditions for accessing such datasets would prove difficult to represent in a standardized or uniform manner.
Limitations on participant recontact (for future research participation)
The majority of cohorts studied had not established through their documentation and local practice whether it is possible to recontact research participants to invite them to participate in future research (n = 15). Of the eight cohorts that did explicitly address the recontact of research participants for future research purposes, many established that only the original study team or original research institution could initiate such contact. This is consistent with prevailing norms in bioethics, which generally do not allow third parties or other researchers to initiate contact with historical research participants. 8 One cohort explicitly precluded the possibility of recontacting existing research participants for the purpose of requesting their participation in future research (i.e., ATBC).
Limitations on data linkage
For slightly less than half the cohorts studied, the relevant consent materials and institutional procedures are silent as to the potential to link participant research data with other sources of information about the concerned individual (e.g., medical records, administrative health databases, etc.) (n = 11). The remaining cohorts allow for external sources of data about the research participants to be linked to study data (n = 13). In the majority of instances, the data linkage is required to be performed by the institution responsible for collecting study data at the time of original data collection, or by another designated body that is specialized in performing health data linkage. The materials studied generally establish the categories of data that can be collected through linkage. Usually, such data are limited to clinical data about the research participant extracted from their health records, or a subset thereof that is of clear relevance to the goals of the study. Linkage here refers to the potential to combine multiple sources of data about a singular individual, rather than the potential to combine different sources of data about different individuals.
Discussion
Our research on the ethics and data governance conditions of population cohorts reveals common measures regulating data availability for future use by other researchers. Such measures are implemented through the informed consent process, and are reinforced through institutional policies and practices. As already mentioned, we did not examine the constraints of data protection legislation.
Territorial restrictions to the sharing of concerned longitudinal cohort data do not appear to be significant. Generally, data are available for reuse by a wide range of entities and for a broad range of secondary research purposes. Provided that the sample of research cohorts surveyed is representative of the population research community, it appears that the research data of longitudinal population health cohorts can generally be used for broad secondary research purposes by a large number of entities.
However, a number of barriers to the interoperable secondary use of data still exist. The research purposes for which data can be used are often heterogeneous. Each health research institution generally requires its own custodian to review applications for secondary access to data, and further, the considerations accounted for by each institution are variable. For these reasons, it may be challenging for researchers to gain access to the population health data of a large number of cohorts to perform comparative research. The procedures to obtain access to data may need to be performed independently to gain access to data from multiple cohorts, and the compatibility of data use conditions applicable to each dataset may need to be verified on a case-by-case basis.
Consequently, the principal inference drawn from our research is that the data of longitudinal population health cohorts can be used for broad secondary research purposes. However, the unharmonized nature of access policies, variable conditions of use, and eligible research purposes applicable to existing legacy datasets could act as practical barriers to external researchers obtaining convenient access to existing research data for secondary research purposes. This is especially the case if researchers intend to use multiple datasets in the ambit of one research project, as the researchers may need to duplicate their access processes for each cohort and carefully monitor the compatibility between the data use conditions applicable to each dataset.
Some of the barriers to normative interoperability identified could limit the integration of datasets of differing provenance into a single research initiative. Examples thereof include collaboration requirements and discretionary or institution-specific permission management procedures. These barriers to interoperability cause misalignment between the conditions of use applicable to multiple datasets, rather than precluding the secondary use of the affected datasets altogether. Such misalignment can subject the combined use of affected datasets to unduly onerous, or contradictory, preconditions of use due to the cumulative character of the use conditions applicable to each.
Other barriers to interoperability could entirely frustrate the secondary use of data, even absent the need to combine the concerned data with data of differing provenance. Examples thereof include territorial restrictions on data use, as well as the limitation of data use to specified categories of institutions or to noncommercial research purposes.
Two insights follow from this distinction. Use conditions that prescribe which categories of entities can use data, or the purposes for which data can be used, could limit the future usability of such data by narrowly limiting the actors that can avail themselves of them. Conversely, use conditions that require specified formalities or preconditions to be discharged before data use risk prohibiting collaborative or combined uses of such data. Datasets that are most useful in combination with other datasets (e.g., population health data that are best suited to comparative research, or genomic data that are best suited to pooling and subsequent analysis using computational techniques) should therefore not be subject to burdensome options or governance formalities that apply at a dataset-specific level, as this could unduly impede the utility of such data by precluding their effective combination.
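The cumulative character of use conditions described above can be illustrated with a short sketch. It is not drawn from the study's own materials: the profile fields, cohort labels, and condition names are hypothetical simplifications chosen for illustration. The sketch assumes that, when datasets are used together, permissions narrow to what every dataset allows (set intersection) while formalities accumulate across datasets (set union).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UseConditions:
    """Hypothetical, simplified profile of one cohort's data use conditions."""
    cohort: str
    permitted_purposes: frozenset
    permitted_users: frozenset
    formalities: frozenset = field(default_factory=frozenset)


def combine(profiles):
    """Return the conditions that apply when all datasets are used together.

    Permissions narrow (intersection) and formalities accumulate (union).
    """
    profiles = list(profiles)
    if not profiles:
        raise ValueError("at least one profile is required")
    purposes = frozenset.intersection(*(p.permitted_purposes for p in profiles))
    users = frozenset.intersection(*(p.permitted_users for p in profiles))
    formalities = frozenset().union(*(p.formalities for p in profiles))
    label = "+".join(p.cohort for p in profiles)
    return UseConditions(label, purposes, users, formalities)


# Illustrative profiles only; these do not describe any real cohort.
cohort_a = UseConditions(
    "A",
    frozenset({"cardiovascular", "general_biomedical"}),
    frozenset({"academic", "commercial"}),
    frozenset({"dac_review"}),
)
cohort_b = UseConditions(
    "B",
    frozenset({"cardiovascular"}),
    frozenset({"academic"}),
    frozenset({"dac_review", "local_collaborator"}),
)
pooled = combine([cohort_a, cohort_b])
```

In this toy example, the pooled dataset can be used only for cardiovascular research by academic users, yet it carries both cohorts' formalities. An empty intersection of purposes or users would indicate that the datasets cannot be used together at all under their existing conditions.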
To further ensure that data can be used interoperably for secondary research purposes, the following measures are recommended. In the future, research consortia or research networks should harmonize the permissible purposes for which data can be used before the collection of such data. The procedures anticipated to apply to the oversight of data, and the conditions that researchers must respect in using them, should be made consistent across institutions, where possible. This can be achieved by ensuring that the content of informed consent materials is compatible and anticipates further data sharing. This can further be achieved by formulating compatible data access procedures and data use conditions at each institution participating in a consortium. The latter objective could be difficult to achieve if local laws, local research customs, or applicable research ethics guidance varies across the participating institutions. 9
Moving forward, for research data to be interoperable with the data of other research institutions and research data gathered from other countries, it is recommended to obtain consent and ethics approval for all the following uses, when appropriate, based on the nature and purpose of the research and the populations under study 10 :
Further biomedical research
Open-access sharing of aggregated or anonymized data
Controlled-access sharing of individual-level identifiable (coded) data
International sharing of research data
The use of data for both commercial and noncommercial research purposes
The storage of data on a cloud platform
The use of data in combination with other sources of research or health data
However, if researchers intend to create a research consortium or a research network using data that have already been collected and are governed by a pre-existing legacy consent, the following practices can be followed to integrate such data into the consortium or network:
First, researchers should establish the anticipated purposes for which the consortium data are to be used and the formalities governing such use. Second, researchers should compare the permissions enshrined in the informed consent materials applicable to each dataset intended for contribution to the consortium and determine if the conditions consented to are compatible with the use conditions intended by the consortium. Third, for datasets that do not meet the minimum requirements to contribute data to the consortium, it is a recommended best practice to recontact the research participants concerned by the data and obtain a new consent to include those individuals' data in the consortium. If it is impracticable or impossible to obtain new consent to include data in the consortium, then the data can instead be incorporated into the consortium upon obtaining an ethics waiver of consent from the local research ethics board, or the local equivalent thereof, when permissible by law.
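The three steps above can be sketched as a simple triage routine. This is an illustrative sketch only: the function name, the route labels, and the use labels are hypothetical, and the final "exclude" outcome is an assumption added to cover the case where neither new consent nor a lawful waiver is available. Whether recontact is practicable, and whether a waiver is permissible, are judgments for researchers and ethics boards that the code merely takes as inputs.

```python
from enum import Enum


class Route(Enum):
    INCLUDE = "include under existing consent"
    RECONSENT = "recontact participants for new consent"
    WAIVER = "seek an ethics waiver of consent"
    EXCLUDE = "do not incorporate"  # assumption: not stated in the text


def triage(required_uses, consented_uses, recontact_feasible, waiver_lawful):
    """Compare a legacy dataset's consented uses with the consortium's needs.

    Returns the suggested route and the set of uses not covered by consent.
    """
    missing = set(required_uses) - set(consented_uses)
    if not missing:
        return Route.INCLUDE, missing
    if recontact_feasible:
        return Route.RECONSENT, missing
    if waiver_lawful:
        return Route.WAIVER, missing
    return Route.EXCLUDE, missing


# Step 1: uses the (hypothetical) consortium intends to make of the data.
required = {"further_biomedical_research", "international_sharing", "cloud_storage"}

# Steps 2 and 3: compare a legacy consent and pick a route for the gap.
legacy_consent = {"further_biomedical_research", "international_sharing"}
route, gap = triage(required, legacy_consent,
                    recontact_feasible=False, waiver_lawful=True)
```

Here the legacy consent does not cover cloud storage and participants cannot practicably be recontacted, so the routine suggests seeking a waiver. The returned gap identifies which uses a new consent or waiver application would need to address.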
Anonymized data, or sufficiently de-identified data, should be centralized in a common, open database. Applications for secondary access to coded data should be subject to the oversight of a DAC or similar body (i.e., managed/controlled access). This can facilitate the performance of research using multiple datasets from a single consortium, and ensure that researchers have consistent and equitable access to the consortium's data. 11
Robust data governance and the appropriate supervision of data access and data use should be favored as mechanisms for protecting data from misuse, while promoting data-driven scientific research. Specialist personnel, technological safeguards, staff training, and the adoption of policies and strategies on data stewardship are all mechanisms that can ensure a high standard of data governance is maintained across all of the participating nodes of a research consortium. 12
Restricting data use based on the intended categories of research, on a territorial basis, or through the use of complex formalities should be avoided where possible and permissible. Such limitations can bar scientists from using health data for secondary research purposes. While protectionist on their surface, they do not demonstrably create additional protection for the individuals concerned by the data. 13
It may be necessary for researchers to adjust their data governance practices and informed consent processes to mitigate risks applicable to vulnerable groups, and to respect the practices of specified communities. Researchers collaborating with carceral populations, pediatric research participants, and other vulnerable groups should consult stakeholders from the concerned communities and representative groups that advocate for their interests, and should respect relevant legal norms and community standards regarding the stewardship of such data (e.g., laws that govern the research participation of minors, self-governance principles applicable to data relating to specific subpopulations).
Conclusion
While at first glance the overall results of our study bode well for some degree of international data sharing in euCanSHare, the kaleidoscopic granularity of Figure 2 reveals both the ongoing barriers to data sharing and the consent elements needed to foster future interoperability. The fact that a number of cohorts date back to the 1980s may offer some explanation for current challenges, but the timidity of research ethics boards in their interpretation of consents is also a contributing factor. Irrespective, our consent recommendations, if adopted moving forward, will hopefully serve to prospectively build new collaborative opportunities. However, more is needed. Data privacy and data protection rights (e.g., the General Data Protection Regulation; data nationalism) are not, and should not be, in opposition to data sharing. In the health sector, both serve individual and societal interests. 14
Indeed, the heterogeneous nature of modern populations requires representative genomic reference maps that reflect such diversity to target resource allocation and build risk stratification categories for subpopulation screening strategies in modern, sustainable health systems. This is in the interest of all citizens, and the secure linkage of their medical records with health administrative databases in real time for access by researchers will ensure the building of clinical support tools for precise diagnoses when citizens become patients. 15
Perhaps a more dynamic legal catalyst is needed for such systemic change, one that will focus beyond antithetical dualisms such as privacy and autonomy and restructure the foundations of the freedom of biomedical research as an expression of the universal human right of everyone to benefit from science and its applications.16–18
Footnotes
Authors' Contributions
Conceptualization: A.B. and B.M.K.; data curation: A.B.; formal analysis: A.B.; funding acquisition: B.M.K.; investigation: A.B.; resources: B.M.K.; software: A.B.; visualization: A.B.; writing – original draft: A.B. and B.M.K.; and writing – review and editing: A.B. and B.M.K.
Acknowledgments
We wish to acknowledge the contributions of Anne-Marie Tassé and Myriam Koayes, who initiated the collection of information from the participating cohorts and the creation of their structured profiles as part of their work for the Population Partnerships Project for Genomic Governance (P3G2). We also wish to thank Teemu Niiranen and Kari Kuulasmaa for their tireless help in obtaining data of MORGAM cohorts and in communicating with the MORGAM cohort PIs. The authors also wish to thank the principal investigators and institutional personnel of the participating cohorts, who dedicated their time and energies to providing us with the information needed to conduct our study, and helped us to interpret the results.
Ethics Approval
The research does not require ethics approval.
Animal Research
The research does not involve animals.
Consent to Participate
The research does not involve human participants, human data, or human biological samples.
Consent for Publication
The research does not involve human participants, human data, or human biological samples.
Availability of Data and Material
The data (ADA-M profiles and textual summaries) used in this study are available on request. The data have been published on the euCanSHare cohort browser and will be available to researchers through that portal, once the portal is made available to the public.
Code Availability
The research does not involve code or software.
Author Disclosure Statement
No conflicting financial interests exist.
Funding Information
EuCanSHare: “EuCanSHare: an EU-Canada joint infrastructure for next-generation multi–Study Heart research”/Recherche de Santé du Québec 278114/Canadian Institutes of Health Research 16033/European Commission (2018–2024) Canada Research Chair in Law and Medicine.
