Abstract
Artificial intelligence (AI) systems have the potential to significantly enhance many aspects of human life, including healthcare, employment, energy management, finance, and creative endeavors. However, the rise of larger, computationally intensive AI models has also raised concerns about their negative societal impacts, such as increasing economic concentration, higher energy emissions, and threats to data privacy. In this paper, we examine the societal implications of federated learning, a promising machine learning approach often promoted as a way to mitigate the economic concentration, environmental harm, and privacy risks associated with AI. We review the main types of federated learning systems and their ongoing research challenges. While federated learning holds significant promise, it is currently at an early stage of development and faces critical technical challenges, including data leakage risks and high communication costs, which must be addressed before widespread adoption can occur.
Artificial intelligence and its societal concerns
The development of deep learning (LeCun et al., 2015) has driven incredible advancements in artificial intelligence (AI) systems. Prominent examples include AlphaGo (Silver et al., 2016), which defeated the Go world champion; AlphaFold (Jumper et al., 2021), which solved the protein folding problem in biology; and ChatGPT (OpenAI, 2022), which became the fastest application to reach 100 million users (Hu, 2023). These more advanced AI systems are in turn expected to drive improvements in medicine, hiring, energy, finance, writing, and nearly every aspect of life. Yet AI progress has also sparked concerns about possible adverse effects on economic concentration, energy emissions, and data privacy. Popular image generators like StableDiffusion (Rombach et al., 2022) can create economic concentration by displacing millions of visual artists with much cheaper machine learning (ML) models owned by large tech companies. Advanced AI models can generate harmful emissions by consuming large amounts of energy during the computationally intensive training process (Luccioni et al., 2022). Finally, these models incentivize companies to infringe on users’ privacy because the training process requires very large datasets collected from users, often without their explicit consent or knowledge (Paullada et al., 2021). OpenAI's prominent large language model (LLM), GPT-3, for example, was trained on a combination of public and private text data taken from books, Wikipedia, and a crawl of the internet (Brown et al., 2020).
Policymakers around the world have sought to address these negative consequences of widespread AI adoption (Aho and Duffield, 2020; Pernot-Leplay, 2020), but many of their proposed solutions are either technically difficult or impossible to implement (Kuru and de Miguel Beriain, 2022). Because of these limitations, AI researchers have sought new methods that can balance building powerful models without creating significant societal harm. Federated learning is one such promising method for addressing economic concentration, increased energy emissions, and privacy violations (McMahan et al., 2017; Qinbin et al., 2023).
Whereas traditional ML methods require centralizing data into a single training set, federated learning is partially or fully decentralized. This decentralization helps prevent a small number of organizations from controlling all the data and computational resources needed to train models. Federated learning has also shown great promise in reducing overall emissions by improving smart monitoring devices that track sustainability metrics such as energy emissions (Victor et al., 2022; Zhang et al., 2022). Finally, federated learning enables a group of clients to collaboratively train a single ML model without sharing training data with one another, better preserving the privacy of each client's users.
In this paper, we begin by providing an overview of the different types of federated learning systems. Next, we describe the current challenges in the field and introduce ongoing research into potential solutions. Our main contribution lies in analyzing how federated learning systems can mitigate the adverse effects of deep learning systems. We conclude that federated learning systems must improve data protection during client–server communications and reduce the overall communication overhead to effectively address issues such as economic concentration, increased energy emissions, and data privacy violations.
Overview of federated learning
In federated learning, a group of clients (devices, servers, or organizations) works together to train a single ML model without exchanging raw data (Qinbin et al., 2023). This process allows individual clients to decide what information they share with others, giving them greater control over users' data privacy. In a federated learning system with a central server, the training process typically repeats four steps (Yang et al., 2019):
1. Participating clients calculate the information needed to update an ML model, such as gradients.
2. The clients apply privacy protection techniques, such as differential privacy and homomorphic encryption (HE), to this information and send it to a central server.
3. The central server aggregates the information to update a global ML model.
4. The server redistributes the updated model to the clients.
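To make these steps concrete, the sketch below runs a few rounds of federated averaging (FedAvg) (McMahan et al., 2017) with a central server. It is a minimal illustration: the linear model, squared-error loss, and randomly generated client data are hypothetical stand-ins, and the privacy protection of step 2 is omitted.

```python
import numpy as np

def client_update(global_weights, local_data, lr=0.01):
    """Step 1: a client computes the information needed to update the model.

    Here each client runs one gradient step on a linear model with a
    squared-error loss; both are illustrative stand-ins for a real model.
    """
    X, y = local_data
    grad = 2 * X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def server_aggregate(client_weights, client_sizes):
    """Step 3: the server averages client models, weighted by data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical clients with randomly generated local datasets.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
global_weights = np.zeros(5)

for _ in range(10):
    # Steps 1-2: each client computes an update locally and sends it
    # (the privacy protections of step 2 are omitted in this sketch).
    updates = [client_update(global_weights, data) for data in clients]
    # Steps 3-4: the server aggregates and redistributes the global model.
    global_weights = server_aggregate(updates, [len(y) for _, y in clients])
```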
For applications that require complete decentralization, clients may follow this same process, replacing the central server with a peer-to-peer consensus mechanism for aggregating client information into a global ML model (Abdulrahman et al., 2021). Compared to traditional ML techniques, federated learning is far more decentralized, giving clients much greater ownership over privacy protection.
Types of federated learning
Depending on the distribution of data among clients, federated learning can be separated into two broad categories (Yang et al., 2019): horizontal and vertical federated learning.
Horizontal federated learning applies when clients hold different samples with the same features (Li et al., 2020a). For example, hospitals, no matter where they are, will likely record similar features about their patients: age, gender, height, weight, medical history, etc. Yet hospitals in different regions will see different patients, meaning they have different data samples. Although pooling data might be attractive for hospitals hoping to leverage additional data to build more accurate ML models, sharing such sensitive medical data is often legally impermissible (US Department of Health & Human Services, 1996). Horizontal federated learning allows these hospitals to train a single model that learns from the combined set of patients across all the hospitals, without requiring any individual hospital to share sensitive patient data with others.
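As a rough illustration, a horizontal partition might look as follows; the features and values are hypothetical.

```python
import numpy as np

# Hypothetical horizontal partition: both hospitals record the same
# features (age, weight, systolic blood pressure) for different patients.
feature_names = ["age", "weight_kg", "systolic_bp"]
hospital_a = np.array([[34, 70, 120],
                       [61, 82, 140]])  # hospital A's patients
hospital_b = np.array([[45, 90, 135],
                       [29, 64, 110]])  # hospital B's patients

# Each hospital trains the shared model on its own rows and sends only
# model updates to the server; patient records never leave the site.
```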
Vertical federated learning, by contrast, is used when clients have the same data samples but different features (Qun et al., 2023). Consider predicting whether someone is at risk for type-2 diabetes as an example. Past research has shown that high blood pressure, obesity, lack of exercise, and high caloric intake are all factors that may contribute to type-2 diabetes. An accurate model would likely incorporate these features, yet it is unlikely that a single client will have access to all of them. A hospital may have access to a person's blood pressure and obesity status, while a smartphone health app may have access to step count and caloric intake. These clients have access to different features about the same person. Vertical federated learning allows clients to pool their data about an individual to train a single model that uses more features, without sharing their proprietary data with one another.
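A vertical partition, by contrast, might look as follows. The data are again hypothetical, and the plain join at the end is only the centralized analogue of what vertical federated learning achieves privately (typically via private set intersection).

```python
import pandas as pd

# Hypothetical vertical partition: the hospital and the health app hold
# different features about the same people.
hospital = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "blood_pressure": [120, 140, 135],
    "obese": [False, True, False],
})
health_app = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "daily_steps": [9000, 3200, 4100],
    "calories": [2100, 2900, 2600],
})

# Vertical federated learning trains on the combined feature set without
# either party revealing its raw columns; the plain join below is only
# the centralized analogue of that privately aligned dataset.
aligned = hospital.merge(health_app, on="patient_id")
```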
Challenges with federated learning
Although an increasingly popular technique (Qinbin et al., 2023), federated learning faces five challenges that make practical implementation difficult (Li et al., 2020b): privacy concerns, communication overhead, system heterogeneity, statistical heterogeneity, and bias in models. In this section, we describe these challenges and introduce related research.
Privacy concerns. While clients do not transmit raw data, research has shown that personal information can be extracted from what they do share, namely model gradients and global model weights (Melis et al., 2018). In response, researchers have emphasized the need for secure computation and differential privacy techniques (Mothukuri et al., 2021).
Secure computation reveals only the end result of a computation while protecting the original inputs and intermediate steps. The two main secure computation techniques—secure multi-party computation (SMPC) and HE—require substantial communication, making them ill-suited to federated learning systems with large models or many clients (Truong et al., 2021). In addition, neither technique is secure when a malicious actor can eavesdrop on client–server communications (Asad et al., 2020; Kairouz et al., 2021). Differential privacy (Dwork et al., 2006) is another prominent privacy-preserving technique. It adds noise to a dataset to mask the attributes of specific individuals, promoting user privacy by making it harder to infer individual-level characteristics (Gosselin et al., 2022). Differential privacy has been adapted to federated learning with some success (Geyer et al., 2018) but may harm overall model accuracy. Even with these state-of-the-art methods implemented properly, an adversary can still exfiltrate user data from a federated learning system if they can either (1) view client–server communications or (2) compromise the central server itself.
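As a minimal sketch of the differential privacy idea in this setting (in the spirit of, though much simpler than, Geyer et al., 2018), a client might clip and noise its model update before sharing it. The clipping norm and noise scale below are illustrative, not calibrated to a formal privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a client's model update and add Gaussian noise before sharing.

    Bounding the update's norm limits any one client's influence, and the
    noise masks individual contributions; more noise gives stronger
    privacy at the cost of model accuracy.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)
```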
A malicious actor could compromise a central server either by posing as one of the clients contributing model updates, in what are known as "backdoor attacks" (Bagdasaryan et al., 2020), or through traditional cyber-attacks. In the first case, researchers have found that adversaries can recover data from a federated learning system so long as they control a single client sharing model updates with the central server (Wang et al., 2020; Xie et al., 2020). Although promising work exists, none of the methods for detecting or defending against these attacks is foolproof (Rieger et al., 2022; Wu et al., 2021; Xie et al., 2020). As such, a single malicious client can make the entire system insecure, creating a very large attack surface for potential adversaries. In the second case, standard cybersecurity defenses can mitigate these attacks, but a malicious actor might own the central server itself, creating a federated learning system expressly to steal data from contributing clients.
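A simplified numerical sketch shows why a single malicious client is so dangerous under plain averaging; this illustrates the model-replacement idea behind backdoor attacks (Bagdasaryan et al., 2020) in a deliberately stripped-down form.

```python
import numpy as np

# If the server averages n equally weighted client models, one malicious
# client can scale its poisoned model so the average lands on its target.
n_clients = 10
honest = [np.full(5, 0.1) for _ in range(n_clients - 1)]  # benign models
target = np.full(5, 9.9)                                  # attacker's goal
malicious = n_clients * target - sum(honest)              # scaled update

new_global = sum(honest + [malicious]) / n_clients
assert np.allclose(new_global, target)  # the attacker sets the global model
```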
Communication overhead. Because federated learning requires clients to repeatedly share information, it has much greater communication requirements than traditional ML. This communication overhead can make the training process slower and more expensive. Wu et al. (2022) propose a more efficient federated learning algorithm called FedKD to reduce the communication cost. Instead of using a single large model, their method uses a small mentee model and a larger mentor model to perform adaptive knowledge distillation. In FedKD, only the small mentee model is passed between the central server and the client. FedKD further reduces communication costs by using a dynamic gradient approximation method based on singular value decomposition (SVD) to compress the small mentee model's gradient updates. Other researchers have proposed reducing communication overhead through optimized updating, which reduces the number of communication rounds (Luping et al., 2019; McMahan et al., 2017; Smith et al., 2018); improved compression, which shrinks the size of each communication (Caldas et al., 2019; Sattler et al., 2019; Yi et al., 2021); and decentralized training, which spreads the communication overhead across the network (Chen et al., 2023; Dai et al., 2022; Liu et al., 2019).
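As an illustration of the compression idea, the sketch below applies truncated SVD to a gradient matrix with a fixed rank. FedKD's actual method chooses the rank dynamically, so this is only a simplified approximation of that approach.

```python
import numpy as np

def compress_gradient(grad, rank):
    """Send only the top-`rank` SVD factors of a gradient matrix."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

def decompress_gradient(U, s, Vt):
    """Reconstruct a low-rank approximation of the original gradient."""
    return U @ np.diag(s) @ Vt

grad = np.random.default_rng(0).normal(size=(256, 64))
factors = compress_gradient(grad, rank=8)
approx = decompress_gradient(*factors)

# The rank-8 factors contain (256 + 1 + 64) * 8 values instead of
# 256 * 64, shrinking this update by roughly a factor of six.
```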
System heterogeneity. The clients contributing to the federated learning process are heterogeneous: their hardware varies in important characteristics such as memory, computational resources, and network bandwidth. Treating devices uniformly allows slower or non-responsive devices to delay the entire training process, while the straightforward solution of removing them can systematically underrepresent those clients. Liu et al. (2022) introduce a method called InclusiveFL, which assigns models to clients in a way that accounts for their different capabilities: larger models are assigned to more powerful clients and smaller ones to weaker clients. These models are then combined into a single model large enough to avoid accuracy drop-offs while still representing all clients. Researchers are also exploring better sampling techniques (Nishio and Yonetani, 2019) and fault tolerance (Li et al., 2020c) as ways to lessen the impact of system heterogeneity on model performance.
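The capability-aware assignment idea can be sketched as follows; the hardware tiers and model depths are hypothetical and do not reflect InclusiveFL's actual policy.

```python
# Hypothetical capability-aware model assignment: weaker hardware gets a
# shallower model so it can still participate in training.
def assign_model_depth(client_memory_mb):
    if client_memory_mb >= 4096:
        return 12  # full-depth model for powerful clients
    if client_memory_mb >= 1024:
        return 6   # mid-sized model
    return 2       # shallow model for constrained devices

clients = {"workstation": 16384, "laptop": 2048, "old_phone": 512}
depths = {name: assign_model_depth(mem) for name, mem in clients.items()}
# {'workstation': 12, 'laptop': 6, 'old_phone': 2}
```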
Statistical heterogeneity. Data distributions may vary between clients. A hospital in a sunny place such as Arizona may have higher skin cancer prevalence rates than one in a cloudy place such as Seattle, so the two hospitals' data distributions would likely differ. For such non-IID data (data that is not independent and identically distributed), overall model accuracy may decrease, and it may also be harder to measure model convergence during training. Ongoing research aims to remedy these problems by training and combining several separate models instead of a single model (Smith et al., 2017).
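Researchers often simulate such non-IID conditions when evaluating federated learning systems. One common recipe, sketched below under the assumption of a labeled classification dataset, partitions samples using a Dirichlet distribution; this is a standard evaluation tool from the broader literature rather than a method from the works cited above.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with per-class proportions drawn
    from a Dirichlet distribution; smaller alpha yields more skewed,
    less IID client datasets.
    """
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.random.default_rng(1).integers(0, 3, size=300)  # 3 classes
partitions = dirichlet_partition(labels, n_clients=4, alpha=0.2)
```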
Bias in models. Federated learning systems may exhibit societal biases. Sensitive attributes, such as race or gender, may contribute to model predictions in an undesirable way (Abay et al., 2022; Shi et al., 2023). Existing ML fairness interventions, which often require access to the full set of training features, may not work for federated learning systems because clients do not share training data with the central server. Qi et al. (2022) propose FairVFL to reduce bias in federated learning models. FairVFL partitions the feature set into features that should (fairness-insensitive) and should not (fairness-sensitive) be used to make predictions. After partitioning the data, FairVFL learns a unified representation that is independent of the fairness-sensitive features (such as gender). The algorithm then performs adversarial learning twice: first to remove any remaining bias and second to remove any individual information. Researchers have also proposed solutions that mitigate bias by modifying the client selection process (Huang et al., 2020; Yang et al., 2020a), reweighting gradient values (Mohri et al., 2019; Wang et al., 2021), and performing algorithmic optimization (Cui et al., 2021; Qi et al., 2022).
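The adversarial learning step can be sketched as follows. This is a minimal single-party illustration of learning a representation that is useful for the task but uninformative about a sensitive attribute; it omits FairVFL's vertical partitioning and its second adversarial stage, and all dimensions and data are hypothetical.

```python
import torch
from torch import nn

# Learn a representation that predicts the task label while an adversary
# trying to recover the sensitive attribute is actively fooled.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # fairness-insensitive inputs
task_head = nn.Linear(32, 2)                           # predicts the label
adversary = nn.Linear(32, 2)                           # predicts e.g. gender

opt_main = torch.optim.Adam([*encoder.parameters(), *task_head.parameters()])
opt_adv = torch.optim.Adam(adversary.parameters())
ce = nn.CrossEntropyLoss()

x = torch.randn(64, 16)         # hypothetical training batch
y = torch.randint(0, 2, (64,))  # task labels
s = torch.randint(0, 2, (64,))  # sensitive attribute

for step in range(200):
    # 1) Train the adversary to recover the sensitive attribute.
    opt_adv.zero_grad()
    ce(adversary(encoder(x).detach()), s).backward()
    opt_adv.step()

    # 2) Train encoder and task head to do the task well while
    #    maximizing the adversary's loss (removing sensitive signal).
    opt_main.zero_grad()
    rep = encoder(x)
    (ce(task_head(rep), y) - ce(adversary(rep), s)).backward()
    opt_main.step()
```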
Societal impacts of federated learning
In this section, we describe how federated learning systems might be used to mitigate AI systems' adverse effects: economic concentration, increased energy emissions, and data privacy violations. These are among the most pressing concerns associated with modern AI development. Economic concentration arises when a small number of tech giants consolidate the data and resources needed for AI, leading to inequitable access to the technology. Increased energy emissions stem from the high computational demands of training large AI models, which contribute to environmental degradation. Data privacy violations follow from AI's vast data requirements, which often lead to unethical data collection and breaches of user privacy. Addressing these issues is crucial for ensuring that AI technologies contribute positively to society.
Equitable decentralized business models
Because modern deep learning systems require extensive data and computational resources, only the largest, most well-capitalized companies can afford to train advanced AI systems. This makes it harder for smaller companies, which may lack access to large datasets or computational resources, to compete, concentrating the benefits of AI systems in large organizations. Federated learning can help alleviate this trend by enabling decentralized training while maintaining data privacy. However, the models themselves may still belong to major companies, which limits the potential to fully decentralize economic power. While federated learning allows clients to maintain greater control over their data, this does not automatically redistribute power between large and small companies. Instead, it helps reduce the imbalance between companies and individual users by giving users more control over how their data is used. Yang et al. (2019) give one example of how federated learning can be combined with profit-sharing rules to give smaller organizations greater leverage and ownership over the ML development life cycle. As a result, new decentralized business models that better incentivize clients to contribute data and resources may emerge: instead of profits accruing to a single data or model owner, they can be distributed more equitably among contributors.
Monitoring improvements versus higher emissions
Federated learning also has the potential to advance environmental sustainability efforts by improving "internet-of-things" devices (Hu et al., 2018; Victor et al., 2022; Zhang et al., 2022). Improved monitoring can help organizations better evaluate energy-saving initiatives (Varlamis et al., 2022). Yet, despite improved monitoring devices, federated learning's high communication overhead may ultimately consume more energy than it saves. Qiu et al. (2021) found that, depending on the exact system design, federated learning systems can emit up to two orders of magnitude more carbon than centralized ones. Research into closing that gap is ongoing but still in its nascent stages (Guler and Yener, 2021; Yang et al., 2020b). In its current form, the incremental energy requirements of federated learning systems may simply be prohibitively costly. Federated learning still needs to improve communication efficiency before it can help reduce the energy emissions of advanced AI systems.
Double-edged impact on data privacy
Federated learning has the potential to greatly improve data privacy and security (Ge et al., 2020). It can improve privacy by limiting how much data clients must share to train an ML model. Unlike traditional ML methods that aggregate data into a single dataset managed by a central party, federated learning ensures no party can access data contributed by another. This practice of data localization also gives clients very granular control over how they share their data. For example, they might opt out of sharing data for specific tasks, models, or steps during training, whether at the device level for individuals or the dataset level for organizations. Data localization also improves security by reducing single-point-of-failure risks: since each client can only access its own data, the fallout from a data breach is limited to the compromised machines. Perfectly implemented, a federated learning system guarantees that participating in the model training process exposes no more data than not participating. In fact, it may be one of the only techniques that can meet strict General Data Protection Regulation (GDPR) privacy requirements (Truong et al., 2021).
These improvements, however, only occur if the system is implemented perfectly. Compared to centralized ML methods, federated learning offers potential adversaries more opportunities to exfiltrate data: the training process involves more rounds of communication and gives more third-party clients access to the system, both of which create far more entry points for malicious actors. The current state of secure computation, differential privacy, and backdoor attack defenses suggests federated learning systems would be ill-equipped to defend so many points of failure.
In the best case, federated learning redistributes privacy risk: whereas a traditional ML system consolidates client data into a single point of failure, a federated learning system keeps that data decentralized. Although this creates more potential vulnerabilities, each breach would leak a much smaller amount of user data. Given the current number of vulnerabilities in federated learning privacy protection techniques, however, achieving this best case is unlikely. More likely, preserving data localization in federated systems will prove difficult, meaning data breaches will leak data from clients beyond the compromised machines. Although federated learning has the potential to redefine how we protect user data, just as with environmental protection, it still needs more time to mature. Even then, federated learning represents a double-edged sword that redistributes privacy risks in a way that may greatly benefit or harm individual users' privacy.
Conclusion
This paper introduced federated learning as a potential remedy for the negative societal impacts of advancements in large AI models, such as economic concentration, environmental damage, and privacy violations. Although federated learning, as a decentralized ML method, shows significant promise in addressing these issues, it must first overcome several technical challenges before achieving widespread adoption. These challenges include privacy concerns related to data leakage, communication overhead, system and statistical heterogeneity, and biases within models. We reviewed past research efforts to address these challenges and found that federated learning has the potential to foster more equitable business models through decentralized incentive structures, enhance sustainability through accurate environmental monitoring, and protect user privacy via data localization. However, as a nascent technology, federated learning still requires considerable technical advancement before large-scale implementation is feasible. With continued research and development, federated learning could play a key role in balancing the goals of creating powerful AI systems while ensuring socially responsible outcomes.
Acknowledgment
Thanks to Fangzhao Wu for his help understanding past research related to federated learning.
Contributorship
Conceptualization: Justin Curl and Xing Xie; Writing the original draft: Justin Curl; Review and editing: Justin Curl and Xing Xie.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
