Abstract
Recent progress in genomics research is providing unprecedented insight into human genetic variance, susceptibility to disease and risk stratification. Current trends suggest that a massive amount of genomic data will be produced in the upcoming years which, coupled with the fast-paced development of the field, will create new social, ethical, and legal challenges. In the complex legislative environment of the European Union, genomic data sharing policies will have to weigh the benefits of scientific discovery against the ethical risks posed by sharing sensitive data. In this interconnected environment, blockchain offers a novel solution to accountability, traceability, and transparency issues in genomic data sharing. Implementing a database based on distributed ledger technology could empower both patients and citizens to use genomic data pertaining to them responsibly, because it allows a higher degree of control over who receives their data and how it is used. Blockchain technology will engage both data owners and policymakers in addressing the multiple issues of genomic data sharing and allow us to redefine the way we look at genomics.
Main Piece
Genomics enables researchers to use powerful computational and statistical methods to decode the functional information hidden in DNA sequences. 1 Progress in the analysis of whole-genome sequencing data has provided unprecedented insights into human genetic variance, disease, evolution, and migration patterns. 2 Estimates based on current growth trends predict that genetics research will generate between 2 and 40 exabytes of data within the next decade (1 exabyte equals one million terabytes). 3
This amount of data has the potential to transform health care, revolutionize the understanding of human health and disease, and lead to a personalized approach to prevention and care. Big data allows for an in-depth assessment of health outcomes based on patient characteristics; genomic biomarkers are increasingly used to stratify disease outcomes and susceptibility into subgroups that reflect underlying heterogeneity and potential response to interventions. 4 For example, one study compared clinical radiologists' original decisions with a big data system's cancer predictions, using biopsy-confirmed breast cancer as the reference. The big data approach produced 5·7% fewer false positives and 9·4% fewer false negatives than the radiologists. 5 Another example concerns screening for diabetic retinopathy. A deep learning system, fuelled by genetic data, demonstrated enhanced health care decision-making with automated image analysis and accelerated the manual evaluation of image data used to screen for eye disease in people with diabetes. 6 Genomics can improve individual and population health outcomes, but making genomic data widely available creates ethical, legal and social challenges.
In February 2021, the EU launched the Joint Action “Towards a European Health Data Space” (TEHDAS), which will further develop the principles regarding secondary use of data (including genomic data), promote health data sharing between member states, support scientific research and foster digital health services in which clear liability rules are implemented and enforced. Additionally, many national governments and supranational organizations are collaborating to make genomic data sharing easier and more feasible. In the EU, the “1+ Million Genomes” (1+MG) initiative brings together 22 EU countries, plus the United Kingdom and Norway, to ensure secure, federated access to genomic data while keeping that access legally and ethically sound.
However, the EU has a complex legislative environment regarding data protection. Article 8 of the EU Charter of Fundamental Rights stipulates the right of EU citizens to the protection of personal data, fair processing for specified purposes, and the necessity of consent. 7 Every individual has the right to access and rectify personal data concerning them that has been collected. The 2016 General Data Protection Regulation (GDPR) further strengthens individuals' fundamental rights in the digital age. Genetic data, which is defined in Article 4 of the GDPR, is included among the special categories of personal data under Article 9 and requires explicit consent for its use and processing. 8 Thus, in the EU, the discussion of genomic data sharing is still open, and emerging technologies such as blockchain will play a key role in addressing priority issues.
Data sharing policies must weigh the scientific discovery benefits against the ethical risks posed to the people contributing their data and delicately balance the appropriate level of shareability (for the benefit of the community) and individual protection. Although some consent procedures for genomic data sharing already exist, full genomic data ownership is a central focus because it allows individuals to personalize the degree to which their data is shared and the recipients of such data. 9
Even though publicly shared data are fully anonymized, they may still be vulnerable to breaches via different routes, such as cross-referencing different sources or identity tracking. A blockchain is a database containing information on transactions that occurred within a given network, which is simultaneously used and shared within a large, decentralized, publicly accessible network. 10 Its name comes from the way blocks of transaction data are chained together: a digital fingerprint of each block is incorporated into the following one. Consequently, each block is tightly linked to the previous and the following one, making it impossible to alter one without affecting the others as well. 11 Once a block is filled with data, it is timestamped and chained onto the previous block, so that the data are linked in chronological order.
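The chaining of fingerprints can be illustrated with a minimal Python sketch. It assumes SHA-256 as the digital fingerprint and a plain dictionary as the block format; the function names (block_hash, append_block, is_valid) and the sample entries are hypothetical and are not taken from any specific blockchain platform.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Return the SHA-256 fingerprint of a block's contents (assumed hash function)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str, timestamp: int) -> None:
    """Append a timestamped block that embeds the fingerprint of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(
        {"index": len(chain), "timestamp": timestamp, "data": data, "prev_hash": prev_hash}
    )


def is_valid(chain: list) -> bool:
    """Check that each stored fingerprint still matches the preceding block."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


# Hypothetical sample entries, for illustration only.
chain = []
append_block(chain, "access granted: lab A", 1)
append_block(chain, "access granted: lab B", 2)
append_block(chain, "access revoked: lab A", 3)
```

Changing the data in an earlier block changes its fingerprint, so the value stored in the next block no longer matches and is_valid returns False. In this sketch the final block has no successor, so a change there would only be noticed by comparing it with copies held by other participants.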
Distributed ledger technology is the infrastructure and set of protocols that allows blockchain records to be accessed, validated, and updated simultaneously and immutably across a network spread over multiple entities or locations. 12 Distributed networks eliminate the need for a central authority to guard against manipulation, while cryptography ensures that information is stored securely and accurately. In addition, once stored, the information cannot be edited or modified, and it becomes governed by the rules of the network. The decentralized nature of such a ledger makes it highly resistant to cyberattacks, as hackers would need to attack all the copies stored across the network at the same time. This also avoids the centralization of knowledge, as the individual blocks do not have access to information stored in other blocks. By contrast, in centralized forms of data storage, the companies or entities in charge of the database have full control over the data, which raises doubts about data security and reliability. This kind of network has many uses, the most famous and controversial being Bitcoin. 13 In the field of data sharing, however, blockchain opens a world of possibilities, especially when the type of data being handled is particularly sensitive, as in the case of genomic data.
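A short Python sketch can show why holding copies across many nodes makes manipulation harder to hide. It assumes that each node keeps a full copy of the ledger and that a simple majority comparison of fingerprints stands in for a real consensus protocol, which is considerably more involved; the names ledger_digest and find_divergent and the node labels are hypothetical.

```python
import hashlib
from collections import Counter


def ledger_digest(entries: list) -> str:
    """Fingerprint a whole ledger copy by hashing its entries in order."""
    digest = hashlib.sha256()
    for entry in entries:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_divergent(replicas: dict) -> list:
    """Return the nodes whose copy differs from the one held by the majority."""
    digests = {name: ledger_digest(entries) for name, entries in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical network of three nodes; node-3 holds an altered copy.
honest = ["grant lab-A", "grant lab-B"]
replicas = {
    "node-1": list(honest),
    "node-2": list(honest),
    "node-3": ["grant lab-A", "grant lab-X"],
}
```

Here find_divergent(replicas) flags only node-3, because its copy disagrees with the two matching copies. An attacker would have to alter most copies at once for the change to go unnoticed in this simplified model.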
A successful case study is the Estonian application of blockchain technology to health care. Starting in 2016, the Estonian government applied blockchain to store medical records with immediate, positive results: it eased the sharing of data among authorized parties and bridged traditional data silos, decreased medical care costs through better handling of insurance claims, and improved data mining through the immutable records maintained by the blockchain. 14 Similarly, a prospective cohort study investigated the use of a blockchain-authenticated platform for sharing genomic data, together with de-identified electronic health records and imaging data, with encouraging results. The pilot blockchain-based infrastructure did not differ significantly from gold-standard data-sharing platforms in the completeness of available data, but it was more robust when recording certain data elements in the registry format, making them more accurate for analyses that aim to further personalized medicine. 15
People's motivation to share their own genomic data can vary depending on several variables, including their familiarity with the concepts of DNA and their trust in the recipient. 16 Ensuring that data owners feel empowered to choose who uses their data and for what purposes is key to increasing data sharing for secondary use and to promoting the use of genomic big data for disease outcome stratification and accurate prediction of response to pharmaceutical or clinical interventions. 17 On a blockchain-based sharing platform, all information regarding data access and management is transparent, so the user can know at any time which stakeholders are using the data and for what purpose. The blockchain's innate accountability and traceability of transactions help clarify how, when, and how many times a user's genomic data is transferred.
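As an illustration of this traceability, the following Python sketch queries a list of access events for a single data owner. It assumes that each transfer is recorded as an entry with an owner, a recipient, a purpose, and a timestamp; the AccessEvent record, the function names, and the sample data are hypothetical and do not describe any existing platform.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One recorded transfer of an owner's genomic data (assumed record format)."""
    owner: str
    recipient: str
    purpose: str
    timestamp: int


def transfers_for(log: list, owner: str) -> list:
    """Return the owner's transfers in chronological order (the how and when)."""
    return sorted((e for e in log if e.owner == owner), key=lambda e: e.timestamp)


def transfer_counts(log: list, owner: str) -> Counter:
    """Count how many times each recipient received the owner's data."""
    return Counter(e.recipient for e in log if e.owner == owner)


# Hypothetical log entries, for illustration only.
log = [
    AccessEvent("owner-1", "hospital-A", "clinical care", 100),
    AccessEvent("owner-1", "biobank-B", "research", 200),
    AccessEvent("owner-2", "biobank-B", "research", 250),
    AccessEvent("owner-1", "biobank-B", "research", 300),
]
```

Calling transfers_for(log, "owner-1") lists that owner's three transfers with their recipients and purposes, and transfer_counts(log, "owner-1") shows that biobank-B received the data twice and hospital-A once.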
In addition, blockchain-based services allow genomic data to be tokenized, giving users a form of control over the sharing of their own data. Young businesses, such as Genecoin, have built marketplace-like systems where consumers can trade their private genomic data for non-monetary tokens that they can later exchange for goods and services. This resulted in increased user satisfaction, increased interest in how the data was used, and a greater willingness among users to share their data when asked by entities other than the one storing it. 18
Although blockchain is still a relatively new technology, it could be a real game-changer in the field of genomics and beyond. Blockchain gives stakeholders novel tools to simultaneously protect sensitive data and empower data owners to exert full control over its use, as it allows a case-by-case consent evaluation for each data access query. Progress in the field of personalized medicine will urge policymakers to address the multiple issues of data sharing and cement blockchain as a promising technology to redefine the way we look at genomics.
Footnotes
Author contributions
FC conceptualized and wrote (review and editing) the paper. FB wrote (original draft) the paper. FAC searched the literature, conceptualized and wrote (original draft) the paper. AG conceptualized and wrote (review and editing) the paper. AM searched the literature, conceptualized and wrote (original draft) the paper. SB revised the paper critically for important intellectual content. WR conceptualized and wrote (review and editing) the paper.
Conflicts of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Guarantor
FC
