Abstract
With the advent of the big data era, data security issues are becoming more common. Healthcare organizations have more data available for analysis than ever, but they lose money every year because they cannot prevent data leakage. To address this problem, research on data protection technologies in healthcare is actively underway, particularly on state-of-the-art technologies such as federated learning, introduced by Google, and blockchain, which has recently attracted attention. To survey these efforts, we examined the research, methods, and limitations of the most widely used privacy technologies. We investigated related papers published between 2017 and 2023, identified the latest technology trends, and then selected and reviewed papers on the relevant technologies. Four technologies are the focus of this study: blockchain, federated learning, homomorphic encryption, and differential privacy. Overall, our analysis gives researchers insight into privacy technology research by identifying the limitations of current privacy technologies and suggesting future research directions.
Keywords
Introduction
As part of the fourth industrial revolution, demand for technologies that process and analyze big data in the healthcare sector has led to active research in the field. However, as the use of medical data increases, concerns about protecting the personal information contained in the data are being raised. These concerns are especially prominent because the electronic records and billing data used in research contain personal information, such as patient gender, age, and address. Many medical data breaches have occurred as a result, and cases of personal information leakage are reported in many papers. 1 Many countries have enacted laws to prevent such leaks, such as the Personal Information Protection Act (PIPA) in Korea 2 and the General Data Protection Regulation (GDPR) in Europe. 3
Under such laws, conducting epidemiological research with patients’ personal information conflicts with the obligation to protect that information. 4 Therefore, various attempts are being made to analyze medical data while protecting patients’ personal information, specifically by using privacy protection technologies. Examples of such technologies are differential privacy (DP) and homomorphic encryption. More recently, concepts such as blockchain and federated learning have also been used to ensure the privacy of personal information.
When blockchain is decentralized, it performs well in terms of security, indicating that patients’ data can be used safely. 5 Blockchain has also been applied to drug supply network management. However, because the risk of counterfeiting is high, it is applied only to certification, sales, and distribution monitoring. In an early application, drug transactions were monitored by storing all the information on the transactions in a blockchain. 6 Another technology, federated learning, enables data diversity when applied in the medical field. By connecting data from multiple medical institutions, federated learning improves artificial intelligence (AI) training for patients with similar symptoms. Furthermore, the more data collected for similar patients, the greater the predictive accuracy of AI for that patient group. This can provide valuable insight to clinicians alongside the findings of individual physicians. 7
Which methods were used to store and protect medical information before these developments emerged? Conventional personal information protection technologies can be divided into two categories: firewalls and encryption. A firewall is a type of network security that controls the entry and exit of data by monitoring received and transmitted information. 8 Although building a firewall on a network is expensive, it is generally one of the most effective data protection methods. Types of firewall include packet filters, stateful packet inspection, and application gateways. Encryption protects against attacks by encrypting the object to be protected, so that the data can be accessed only through decryption or through a user-specified identification (ID)/password (PW) method. 9
Despite the use of such technologies, there have been instances where personal information was compromised. In the United States, medical data leaks occur every year through attacks on specific organizations, suspension of services, and data leakage. For example, the 2021 Accellion FTA hacking incident was a large-scale data leak that affected more than 100 companies. It is estimated that the health information of more than 3.51 million people was stolen. In the same year, 20/20 Eye Care Network, a Florida-based eye and ear management service provider, exposed the protected health information of 3,253,822 individuals because of an incorrect configuration of Amazon Web Services S3 cloud storage buckets. 10 Such vulnerabilities have revealed limitations in applying personal information protection technologies and the need to investigate solutions.
According to a report released on 23 March by the OECD, 11 more and more institutions are considering applying privacy technology. However, the report notes that the application of these technologies has been delayed by many limitations in terms of their completeness. The report divides the technologies into four representative categories: data obfuscation, encrypted data processing, federated and distributed analytics, and data accountability tools. With this in mind, we wrote this article to give researchers a guide to privacy technology by selecting and reviewing technologies that are of high interest (many studies, popularity, etc.).
With this aim, the current study examines the methods, technology, research, and limitations of the most widely used personal information protection technologies for storing and protecting medical big data in recent years. The results of this study aim to provide robust insight into personal information protection technology for researchers to utilize medical big data safely for future research efforts.
What sets this paper apart from other papers is that it does not focus on a single technology but provides insight into several privacy technologies. This allows researchers to learn about the overall state of the related technologies in a short period of time.
Methods
There are many privacy protection technologies. Data are protected in various ways, such as by adding noise to the data or by protecting the data while they are in transit. Among the related tools, Automated Validation of Internet Security Protocols and Applications (AVISPA), 12 announced in 2005, enables the analysis of privacy protection methods. It is a push-button tool for the automatic verification of security-sensitive internet protocols and applications, and it greatly helps to convert a technology into a protocol in a systematic way. However, we do not cover this tool in this paper; instead, we examine research trends by focusing on the protection technologies themselves.
A specific period was selected to explore recent trends in personal information protection technologies: the 7 years up to December 2023. Thus, papers published between 2017 and December 2023 were examined. Next, we selected two databases for obtaining relevant research papers: IEEE and PubMed. We used the search terms “privacy” and “healthcare” to maximize the number of search results. Then, we reviewed the search results and selected appropriate works.
Inclusion and exclusion criteria
We selected and reviewed papers that improve a technology through analysis or research on it, or that address the limitations of a technology. Papers addressing limitations were selected with a focus on whether they make the limitations of the technology understandable and reveal new trends. Review papers on topics similar to that of this paper and papers that require payment for access were excluded. The final selection was made after considering duplication and the quality of the studies.
Data analysis
To investigate recent research trends and specific privacy technologies, the titles and abstracts of the finalized research papers were extracted and organized as shown in Figure 1. We grouped the derived keywords into “Healthcare data” and “Privacy protection technology” sections. Electronic medical records (EMR), claim data, omics data, and patient generated health data (PGHD) were the most common keywords concerning healthcare data. For privacy protection technology, we examined privacy protection and technology separately and obtained the four keywords that this study focuses on: blockchain, federated learning, homomorphic encryption, and DP.

Figure 1. A schematic diagram of keyword calculation through word counting.
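The word-counting step can be illustrated with a minimal sketch. It assumes that each paper is available as a (title, abstract) pair and that a keyword is counted once per paper in which it appears; the keyword list, the sample records, and the function name `count_keywords` are hypothetical and are not taken from the study itself.

```python
import re
from collections import Counter

# Hypothetical keyword list; the vocabulary actually used in the study is not given.
TECH_KEYWORDS = [
    "blockchain",
    "federated learning",
    "homomorphic encryption",
    "differential privacy",
]

def count_keywords(records, keywords=TECH_KEYWORDS):
    """Count how many records (title + abstract) mention each keyword."""
    counts = Counter()
    for title, abstract in records:
        # Lowercase and collapse whitespace so multi-word keywords match reliably.
        text = re.sub(r"\s+", " ", f"{title} {abstract}".lower())
        for keyword in keywords:
            if keyword in text:
                counts[keyword] += 1
    return counts

# Made-up records for illustration only.
records = [
    ("Blockchain for EHR exchange", "We propose a blockchain framework."),
    ("Federated learning on ICU data", "Federated learning with differential privacy."),
    ("Private GWAS", "Homomorphic encryption and federated learning for genomic data."),
]
counts = count_keywords(records)
print(counts.most_common())
```

Counting each keyword once per paper, rather than once per occurrence, keeps a single long abstract from dominating the ranking; whether the study counted papers or occurrences is not stated.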
The databases used to extract the data were IEEE and PubMed. IEEE is the Institute of Electrical and Electronics Engineers, which publishes research papers on engineering and technology. PubMed is a free website for searching the academic literature on biomedicine and healthcare. Research papers were searched and organized through these two databases to cover keywords spanning science and technology as well as medical care.
Literature review
The IEEE and PubMed platforms hold a huge database of research papers published between January 2013 and December 2023 that contain the keywords “healthcare” and “privacy.” We found nearly 12,000 (11,742) research papers published from 2017 to 2023. We chose this timeframe to observe relatively recent research trends. Papers that were not freely accessible, were unrelated to data privacy, or did not utilize medical big data were excluded from the analysis. A total of 922 research papers were selected in the first search. Then, four technical keywords (blockchain, federated learning, homomorphic encryption, and DP) were derived through word counting in the 922 research papers. Related papers were selected from the 922 papers using the four technical keywords, and 15 papers were selected for each technology through manual review. Figure 2 summarizes the process of selecting the research papers.

Figure 2. Flow of research for systematic literature review.
Results
Figure 3 shows the research trends from 2013 to 2023 in the application of each personal information protection technology to medical big data. Only data confirmed in PubMed are represented. Research papers on the four technologies have been published only since 2015, and papers on all four began to appear in 2017, when active research began. Blockchain, which was introduced in 2008, has been the most actively studied topic until recently, and federated learning has also seen a sharp increase in research interest.

Figure 3. Personal information protection technology trends (2013–2023 in PubMed).
Privacy protection technologies
Blockchain
Blockchain is derived from electronic money technology developed to overcome the weaknesses of money transaction systems that rely on authorized third parties (banks, countries). 13 A blockchain can be interpreted as a series of timestamped data, in which each block is formed by combining the original data with the output of a hash function. Each block includes the hash of the previous block, so the blocks form a chain; hence the term ‘blockchain.’ Blockchain uses the timestamped blocks to authenticate transactions and prevent duplicate payments, thereby replacing the role of an authorized third party, such as a country or bank. If a Block 1 user with malicious intent tries to contact an external block immediately after contacting Block 2, the timestamp history of Chain 1, which has already been processed, blocks the attempt. 14 Figure 4 presents a conceptual schematic design of blockchain.

Figure 4. Conceptual schematic diagram of blockchain. 13
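The chaining of timestamped blocks through hashes can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a description of any system reviewed here: SHA-256 is assumed as the hash function, the timestamps are fixed integers so that the example is reproducible, and consensus, signatures, and networking are omitted. The names `block_hash`, `build_chain`, and `is_valid` are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    """Hash the block contents, including the hash of the previous block."""
    fields = {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")}
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def build_chain(records):
    """Link records into a chain; each block stores its predecessor's hash."""
    chain, prev_hash = [], GENESIS_PREV_HASH
    for index, data in enumerate(records):
        block = {"index": index, "timestamp": index, "data": data,
                 "prev_hash": prev_hash}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev_hash = block["hash"]
    return chain

def is_valid(chain):
    """Recompute every hash and check that each block points to the previous one."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True

chain = build_chain(["record A", "record B", "record C"])
valid_before = is_valid(chain)
chain[1]["data"] = "tampered record"  # modify a block after the fact
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Because each block's hash covers the previous block's hash, changing one stored record invalidates that block and every later link, which is the property that makes retroactive edits detectable.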
Blockchain has been largely utilized for preserving, managing, and exchanging electronic health records (EHRs) (Table 1). EHRs were developed to overcome the difficulties of tracing and managing records under existing medical documentation methods and to provide better medical services to patients. EHRs can help patients improve their health conditions by providing accurate information about previous diagnoses and treatments to their doctors. 23 These electronically stored data are also very useful when a patient is rushed to another hospital. If electronically stored patient data can be safely transferred to another hospital, better medical treatment can be provided by quickly referring to the patient's previous medical records, especially in emergencies. However, the most important concern regarding EHRs is ensuring the safe storage and movement of patients’ personal and medical information because the data comprise very sensitive personal information. This study explains how blockchain can provide a solution for this concern.
Table 1. Blockchain (Bc) technology paper list.
All the blockchain papers examined in this study suggest ways to use blockchain to protect and utilize EHR data. Two papers focused on securely exchanging EHR data and cited leakage of personal information as the biggest barrier to health information exchange (HIE).20,25 One of them focused on patient identification through blockchain transaction information, an issue pointed out in conventional blockchain-based HIE research. 20 It explained that even when the data contained in the blocks are encrypted, a patient can be identified by analyzing the callers and recipients of transactions, or inferred from the treatment they have received. Therefore, the study provided a blockchain-based personal information protection solution that conceals the personal information of the caller and receiver. 20 Both papers developed their frameworks on a private Ethereum blockchain.20,25
Other studies have explained how blockchain has been applied to all aspects of the storage, management, and exchange of medical data rather than to EHR exchange alone.21–24,26–29 Some studies have focused solely on the security of health information derived from wearable devices and Internet of Things (IoT) devices.21,22,26,27 IoT refers to a network of physical objects with built-in internet-enabled software and sensors, in which the collected data are managed and exchanged without direct human operation. 21 This technology is being applied extensively in the medical field. Devices such as electrocardiogram (ECG), electroencephalogram (EEG), and blood pressure monitors can easily collect and monitor medical data. 22 In addition to blockchain, research on IoT technologies has introduced several key privacy technologies, such as fog computing, the Public Health Information Management System (PHIMS), and the InterPlanetary File System (IPFS). Fog computing is a service structure that creates a new relay layer between cloud servers and IoT devices using a simpler concept than cloud computing. One study reported that combining fog computing and blockchain in a medical environment would benefit both patients and hospitals. 21 Rather than introducing new or additional technologies, many of the relatively recent studies aimed to improve the throughput or processing speed of existing technologies, with the goal of making them more practical and applicable to real life while preserving privacy.15–19 Advances of this kind can help overcome the current scalability problem of blockchain. This scalability problem stems from the operating principle of the technology, and solving it is a prerequisite for addressing the other problems the technology faces (accommodating a large amount of data or protecting various types of data).
By improving processing speed and throughput, blockchain can grow further into a technology closely tied to real life. 30
Federated learning
Deployed through cloud infrastructure, federated learning offers an additional way to improve models, training them through the interaction between mobile devices and their users. Google has applied this technology to Gboard, a virtual keyboard app for Android. 31 When the approach is extended to multi-institution research, each institution first downloads a common global model. Each institution then trains the downloaded model on its own data to create a local model. The difference from the global model is encrypted by mixing in noise known to both the cloud server and the institution and is then transmitted to the cloud. After decoding, the transmitted information is used to update the model. These steps are repeated until the model achieves the desired performance or the final deadline is reached. 32 Figure 5 presents a conceptual diagram of the federated learning technology.

Figure 5. Conceptual diagram of the federated learning technology. 32
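The download, local training, upload, and aggregation loop can be sketched as follows. This is a simplified illustration of federated averaging on a one-parameter linear model (y = w * x). The data, learning rate, and function names are assumptions made for the example, and the noise masking and encryption of the transmitted update described above are deliberately omitted.

```python
def local_update(w_global, data, lr=0.01, epochs=5):
    """One institution refines the downloaded weight on its private (x, y) pairs."""
    w = w_global
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, institutions):
    """Server step: average the local weights, weighted by local sample count."""
    total = sum(len(data) for data in institutions)
    return sum(local_update(w_global, data) * len(data)
               for data in institutions) / total

# Hypothetical private datasets; the raw pairs never leave their institution.
institutions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]

w = 0.0  # initial global model
for _ in range(50):  # repeat until a round budget (or target accuracy) is reached
    w = federated_round(w, institutions)
print(round(w, 4))
```

Only the locally updated weight is sent to the server in each round. In a real deployment that update would additionally be protected, for example by the masking step described above or by DP.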
Most studies on federated learning have sought to prove its validity by first using open-source datasets to create an environment similar to a real one (Table 2).38,40,43,45–47 A typical dataset, MIMIC-III, has been the one primarily used in these studies. MIMIC-III is an open-source database containing data on patients aged 16 years or older admitted to the Beth Israel Deaconess Medical Center (BIDMC), a teaching hospital of Harvard Medical School. 46 Other studies have extracted actual data from specific hospitals to make actual predictions, such as predicting abnormalities based on CT images of COVID-19 patients 41 and hospital EHR data. 42
Table 2. Federated learning (Fe) technology paper list.
Research on federated learning is divided into two themes. The first compares federated learning models with conventional machine learning models to prove the validity of the former.40–43,47 The second addresses issues related to federated learning, such as communication costs.38,40,44 Some of these studies have applied federated learning to textual data,43,47 mobile data, 40 and image data. 41 All of these studies examined whether federated learning can be applied to the selected data type and, if so, what advantages it offers over existing methods. The results were divided between (a) no difference from existing methods 43 and (b) an advantage in terms of performance.40–42,47 One study explained that although there was an advantage in time, the advantage in terms of the area under the curve was slight. 46
Another study predicted a person's emotional state using data collected through wearable mobile devices. 40 Wearable devices provide a way to obtain patient data in real time. Various types of patient data, such as body temperature, heart rate, and electrodermal activity, can be extracted from a device worn by the patient, which allows data related to an individual's cognitive, behavioral, and emotional states to be collected in real time. Rather than training the predictive model directly on a central server, the AI model created by the central server is trained briefly on each user's device with the data collected by that device. A copy of the parameters calculated through the training on each device is sent to the central server and used to enhance the model's accuracy. Through this process, a model for predicting a patient's emotional state is created. 40 As with blockchain, relatively recent papers show many efforts to bring this technology to real-world applications, such as handling non-independent and identically distributed (non-IID) data, 33 which resemble real-world datasets,34,35 or studying communication costs. 36 These efforts aim to find answers to the problems currently facing federated learning. Federated learning was designed from the outset as a way to protect personal information, but it also has limitations. These range from attacks that infer membership or data to attacks that maliciously tamper with the model. 48 Future research urgently needs to develop models that can block these malicious attacks and can be used in realistic situations.
Differential privacy
DP is a technology developed to reconcile two conflicting goals: privacy and learning from data. DP aims to extract useful information from the data as a whole while learning nothing about the individuals in the data. 55 The primary purpose of DP is to share data. To use the data, the data user first sends a query to a trusted data provider (data curator or database). The data provider then protects personal information by adding noise to the result of the query before returning it. 60 Figure 6 illustrates a simple model depicting the principle of DP. 49

Figure 6. Conceptual diagram of the DP technology. 48
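The query-and-noise model can be made concrete with a minimal sketch of the Laplace mechanism, one common way of realizing DP (the papers reviewed here may use other mechanisms). The example assumes a counting query, whose sensitivity is 1, so noise drawn from a Laplace distribution with scale 1/epsilon is added to the true count. The data, the value of epsilon, and the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with added noise; a count has sensitivity 1."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)                   # fixed seed so the sketch is reproducible
ages = [34, 71, 65, 52, 80, 29, 67]      # made-up patient ages held by the curator
true_count = sum(1 for age in ages if age >= 65)

# The data user only ever sees noisy answers like these.
noisy_answers = [private_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng)
                 for _ in range(2000)]
mean_answer = sum(noisy_answers) / len(noisy_answers)
print(true_count, round(mean_answer, 2))
```

A smaller epsilon means a larger noise scale and stronger privacy, but less accurate answers. Note also that repeating the same query, as done here purely to show that the noise averages out, consumes privacy budget in a real system.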
Research on DP is often aimed at strengthening the privacy capabilities of other technologies rather than using DP as the direct and primary technology. The technologies it supports include blockchain,55,61 federated learning,55–57 deep learning,58,64 binary classification,59,62 and data sharing (Table 3).60,63 DP is applied to these technologies because each has limitations in protecting personal information. For example, federated learning requires a central server for model training and data movement, which makes the central server a single point of failure that an attacker can target, and blockchain has difficulty withstanding attacks by malicious participants. 55 Therefore, two or three technologies are often combined to prevent the leakage of personal information that can occur when only one technology is used.
Table 3. Differential privacy (DP) technology paper list.
DP has been applied to gradient descent methods,55,58,61 original data before sharing,59–63 reduced data, 56 and aggregated parameters, 57 among others. The main technologies in this research, such as blockchain and federated learning, were designed to protect personal information. However, there are still limits to the seamless protection of personal information when such technologies work independently. Such vulnerabilities often allow experienced attackers to infer or leak patient information through paradoxical systems. 58 Therefore, researchers have consistently suggested improving the protection of personal information through the application of DP.
A study on pattern prediction models utilized distributed EHR data to predict patient states. 57 It employed sequential pattern mining, a method of studying data recorded in chronological order to discover unexpected patterns. The authors aggregated the pattern data learned by each institution on a central server to discover abnormal patterns. These patterns involve patients' medical evidence and prescriptions, which are very sensitive to data leakage. At the same time, a large amount of EHR data is required to detect patterns, leading to a high risk of data leakage. Therefore, DP was applied to the data to prevent leakage. As with the two technologies above, most of the relatively recent studies aim to protect privacy and utilize data through the practical application of DP, as illustrated by studies that use real-world data.50,53 The recent trend has been to create technologies that use DP directly in real-world applications (or in research that assumes real-world-like situations). However, such practical applications have long struggled to achieve both privacy and performance, which are difficult to reconcile. The paper of Mueller et al. 51 shows that the two are in fact compatible. Beyond this compatibility, future studies should apply extended functions and performance to real-world data.
Homomorphic encryption
Homomorphic encryption is an encryption method that can perform operations on encrypted data without decrypting the ciphertext. Figure 7 shows a simple schematic representation of homomorphic encryption. 65 In conventional encryption, encrypted data are transmitted with an encryption key, the transmitted data are decrypted, and then the data are processed. Homomorphic encryption omits this decryption step before processing. A data user encrypts the data with homomorphic encryption and transmits them to a data processor or a cloud server. The transmitted data are processed in an encrypted state, and the data user decrypts the processed result and records it. 80

Figure 7. Conceptual diagram of the homomorphic encryption technology. 65
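The idea of processing data in an encrypted state can be illustrated with a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This scheme is chosen here only for illustration; the papers reviewed may use other (for example, fully homomorphic) schemes. The primes below are far too small to be secure, and the function names and sample values are hypothetical.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt 0 <= m < n as (n + 1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

rng = random.Random(42)
n, private_key = keygen()

# The data user encrypts two values and sends only ciphertexts to the server.
c1 = encrypt(n, 120, rng)
c2 = encrypt(n, 135, rng)

# The server computes on ciphertexts without ever holding the private key.
c_sum = add_encrypted(n, c1, c2)

# Only the data user can decrypt the processed result.
result = decrypt(n, private_key, c_sum)
print(result)
```

The server in this sketch never sees 120, 135, or their sum in the clear, which mirrors the encrypt, process, and decrypt flow described above. Practical systems use keys thousands of bits long, which is one source of the processing cost.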
Like DP, homomorphic encryption has been applied to protect personal information in combination with other technologies (Table 4). Among medical data, genomic data, such as personal genes, have been a main target of studies applying homomorphic encryption.70,73,75 In other studies, homomorphic encryption has been applied to privacy protection in deep learning analysis74,77 or in an authentication management system that combines biometric information with the personal ID/PW method. Overall, homomorphic encryption is used to add privacy protection capabilities that a single technology lacks. 76 Among these applications, genetic research stands out as particularly widespread. Human genetic research, such as genome-wide association studies (GWASs), has become a standard for customized medical treatment. 75 Such studies continue to develop in size and technology, and the amount of data they require has increased accordingly. However, constructing a database through large-scale data exchange carries a risk of personal information leakage and of the inference and reconstruction of medical and personal data. As a result, the data exchange required for large-scale research is strictly limited by numerous data protection regulations that vary from region to region. 73 To comply with these regulations, technologies including blockchain and federated learning have been applied to GWAS research. However, each technology has its own limitations. One study investigated personal information protection in the exchange of human genetic data by using homomorphic encryption. 73 In particular, the authors used federated learning and homomorphic encryption together for biomedical research.
In addition, unlike conventional encryption-based personal information protection, in which encryption may affect the analysis result, each organization analyzes its own data and then encrypts and shares the analyzed result. Consequently, the survival analysis was able to protect personal information, and the study was executed efficiently. 73 More recently, researchers have been working to improve the processing speed of homomorphic encryption, which is its most problematic aspect.67,71,72 This can be seen as an effort to apply the technology described above to real-world environments so that data can be shared and analyzed while privacy is protected. Although the approach in 73 is not very suitable for real data, it provides an example of applying homomorphic encryption to a study on genetic data and shows that homomorphic encryption can be used for both analysis and information protection. If these technologies can be applied to real-world environments on the basis of the improvements made in these studies, much further research and improvement can be expected.
Table 4. Homomorphic encryption (He) technology paper list.
Discussion
Our study examined the methods, technology, research, and limitations of the most widely used personal information protection technologies for the storage and protection of medical big data in recent years. Many researchers are investigating the protection of personal information in the financial and medical fields, where it is critical. However, research papers published up to December 2021 exhibit limitations in security management, such as transferring responsibility for data management to patients, even though the technology has been applied to patient data. Blockchain is a personal information protection technology with many advantages in ensuring data integrity, reliability, and transparency. Although it has great security advantages, there are ways to compromise the technology, 20 which limits its usability. Therefore, research on actual data exchange across multiple institutions that maintains the security and reliability of the privacy protection technology is a pertinent topic for future blockchain research.
Two studies highlighted the shortcomings of federated learning and sought to compensate for them using different methods.37,38 One study used federated generalized tensor factorization, a method with the unique capability of expressing high-dimensional data that is used in applications such as recommendation systems, spatiotemporal data analysis, and signal processing. By sharing all related variables while restricting how existing tensor methods are used, it achieves a federated learning method that increases privacy while reducing communication costs. 37 The other study showed how differences in the amount of data, the number of computational nodes, and the distribution of data in federated networks affect the performance of federated learning. 38
DP and homomorphic encryption both protect data by transforming the data themselves (through added noise or through encryption); therefore, they have similar limitations. In the case of DP, it is difficult to balance performance and security, as shown in one study. 53 In other words, the more security increases, the more memory or training time is needed. Homomorphic encryption has a similar limitation, in that its relative accuracy is reduced 73 or it is not secure enough. 74
Even though these technologies are applied for security purposes, there are still cases where security weaknesses arise from technical limitations.54,59,62,75 Therefore, future research on DP and homomorphic encryption must focus on addressing these weaknesses and the methods that sacrifice accuracy or security. Many of the relatively recent papers apply the above technologies in practice, which indicates that the field is moving toward a stage where a wide range of research can be conducted while protecting personal information.
Conclusion
Our study investigated the current development of privacy protection technologies for processing and analyzing the big data produced in healthcare and healthcare research. Although medical data create value in research, they contain sensitive personal and medical information and should be used with caution. For this reason, this study was designed to investigate privacy protection technologies and to offer insights that help research progress while medical data are used responsibly and safely.
Footnotes
Acknowledgments
We would like to thank all participants for their many contributions to this study.
Contributorship
SL contributed to the conception and design of study. KR contributed to the acquisition of data. KR and HS contributed to the analysis and/or interpretation of data and drafting the manuscript. SL and J-YK contributed to revising the manuscript critically for important intellectual content. HS, KR, J-YK, and SL provided approval of the version of the manuscript to be published.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics Approval
This paper is a review paper that does not use personal information or personal data and, therefore, does not require consent or permission from the individual. Therefore, we hereby declare that this paper does not require IRB approval or permission.
Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education [NRF-2021R1I1A3044287]. This work was also supported by the Gachon University research fund of 2024 [GCU-202404180001].
Guarantor
Suehyun Lee.
