Abstract
Hierarchical federated learning (HFL) is an effective “cloud-edge-device” distributed model training framework that protects data privacy. During HFL training, poisoning attacks on local data and transmitted models can degrade the accuracy of the global model. Existing defenses against such attacks rely primarily on single-feature anomaly detection, such as computing distances or densities between data or model parameters. These methods fail to integrate multiple characteristics and metrics to capture anomalous model updates, and thus exhibit significant limitations in model robustness and accuracy. We therefore propose a robust hierarchical federated learning method with a dual-layer filtering mechanism (DF-HFL). The method first uses kernel density estimation to infer the approximate data distribution of devices, ensuring minimal differences among devices within each cluster. It then calculates the density weight of each cluster and the local weight of each device, and compares the difference between local and global weights; anomalous weights are filtered out by a threshold during aggregation. DF-HFL amplifies the distance between malicious and normal updates through the dual-layer filtering mechanism, allowing it to identify anomalous weights that do not deviate markedly from the normal weight distribution and thus to detect anomalies accurately. Mean filtering is additionally employed to reduce the impact of anomalous data on the remaining normal gradients, enhancing system robustness. To demonstrate the effectiveness of the proposed method, experiments were conducted on the MNIST, FMNIST, Heart Disease, and Bank Market datasets. Compared with the existing FedAvg, Krum, Random, and Multi-Krum methods, the global model accuracy improved by 5.18%, 38.98%, 29.96%, and 6.44%, respectively.
Introduction
With the development of wireless communication technology and the increasing emphasis on data privacy and security, the hierarchical federated learning (HFL) framework based on “cloud-edge-end” collaboration has received growing attention 1 and is widely used in scenarios such as smart transportation, 2 industrial Internet of Things monitoring, 3 and smart medical care. 4 Under this framework, global model training requires uploading the local model trained on each mobile device (MD) to a nearby edge server (ES) for partial aggregation, and then uploading the aggregated model to the cloud server (CS) for global aggregation, which enables model training and updating while protecting data privacy.5,6
However, the multi-tier architecture of HFL inherently increases its vulnerability to attacks. Attackers can launch poisoning attacks during model updates or transmission at different layers, causing malicious updates to propagate across layers and gradually degrade the global performance and reliability of the model.7,8
The defense mechanisms of HFL are therefore crucial to the accuracy and convergence of the global model. Traditional distance-based or density-based methods remove anomalous clients by directly calculating the variability between the local gradients of different clients, but such single-metric approaches cannot cover all types of anomalies.9,10 For example, distance-based anomaly detection struggles to find a uniform distance metric when data distributions vary greatly, so some benign clients are misjudged as anomalous and excluded from global model training. Density-based anomaly detection mechanisms may misclassify local anomalies as global anomalies, likewise preventing benign clients from participating in training.11,12 In addition, the effectiveness of these methods diminishes when the attacked data or gradients differ little from normal data. For example, gradient tampering attacks can affect the model without significantly deviating from the normal data distribution: small perturbations to the global model gradually steer it away from the correct optimization direction and are difficult to identify with a simple disparity metric.13,14
We propose a robust hierarchical federated learning method with a dual-layer filtering mechanism, named DF-HFL. The method examines two dimensions, data distribution and gradients, and combines distance and density as the anomaly scoring mechanism for data security. On this basis we construct a filtering federated learning algorithm that amplifies the difference between anomalous and normal participants, reduces the influence of anomalous data on the system, and enhances overall robustness. Specifically: to protect the real data of MDs, the kernel density estimation algorithm is applied on the server side to approximate each MD’s data distribution from its uploaded gradient, and the K-means clustering algorithm groups MDs with similar data distributions into the same cluster so that potential malicious updates can be identified even under non-uniform distributions. The KL divergence is then used to measure how much each MD deviates from its local cluster distribution, and this deviation serves as the anomaly scoring criterion. To further enhance anomaly detection, we propose a dual-layer filtering anomaly detection mechanism. The first layer uses a maximum filter to amplify the distance between anomalous and normal updates, ensuring that the system correctly rejects malicious updates. The second layer applies a mean filter to the model gradients to highlight the overall trend of the data and help separate malicious updates from the global update trend. This not only makes model updates smoother and more continuous but also preserves privacy by enhancing gradient robustness while safeguarding model performance. The method implements a multi-dimensional security assessment that accurately detects and discards malicious updates by measuring differences in both data distribution and gradients.
In real-world scenarios, DF-HFL can be applied to various large-scale collaborative training settings. In smart transportation, for example, DF-HFL monitors traffic flow, surveillance video, and other data collected by roadside units and edge devices, analyzing them to identify abnormal data. During collaborative training, it keeps the model moving toward the optimal objective and avoids the impact of abnormal updates on the model.
The main contributions of this article are as follows:
(1) This article proposes a robust hierarchical federated learning method with a dual-layer filtering mechanism. At the edge server (ES), kernel density estimation (KDE) is used to approximate the original data distributions of mobile devices (MDs), and K-means clustering groups the MDs into clusters, minimizing the differences among MDs within each cluster so that anomalies can be identified accurately during training.
(2) The article introduces a dual-layer scoring mechanism based on distance and density to detect anomalies. The density weight of each cluster is obtained through clustering, and each MD’s local weight is derived from KL divergence. The anomaly score of each MD is then computed by multiplying the difference between the MD’s local gradient update and the global gradient by this weight.
(3) A dual-layer filtering mechanism is employed to filter out anomalous data. A gradient amplifier enlarges the distance between normal and anomalous updates, enabling accurate differentiation between them, while mean filtering highlights the overall data trend, reducing the influence of outliers and revealing potential anomalous updates.
(4) The effectiveness of DF-HFL was validated on the MNIST and FMNIST datasets. Experimental results show that the global model accuracy of DF-HFL increased by 23.71% and 47.58% compared to other methods.
Related work
Existing HFL anomaly detection methods typically calculate differences in data distribution or gradient updates using evaluation metrics such as distance or density, then use these differences for anomaly scoring, and finally select normal data distributions or gradient updates for the global model based on the scoring and filtering mechanism.15–17
The basic idea of distance-based detection mechanisms is to use the distance or similarity between samples to determine whether anomalous or malicious behavior is present. Common distance metrics include Euclidean distance, Manhattan distance, and cosine similarity.18,19 Arias et al. 20 used a nearest-neighbor approach to compute anomaly detection scores and proposed a nonparametric detection algorithm that combines distance and isolation metrics to represent the most relevant features of anomalies. Although this method can detect different types of anomalies, the lack of a unified distance metric in non-IID scenarios easily causes misjudgments. Velliangiri et al. 21 therefore proposed a feature-distance-map-based measurement method to identify denial-of-service (DoS) attacks in networks, using min–max normalization to standardize identified features and mark correlations between them, which reduces computational complexity and the system error rate. Meanwhile, many researchers have used triangular region map techniques to model real-world environments and segment them into different regional structures based on extracted features or content, employing support vector machines (SVMs) based on the distance of data samples to cluster centers to enhance detection accuracy. 22 Li et al. 23 proposed a segmented detection method based on a local malicious factor, identifying malicious participants by estimating the distance between data distributions. Building on this, Gong et al. 24 proposed amplifying the most active features in each local update to better distinguish malicious from benign updates, improving overall system efficiency. Although these methods can identify potential malicious attacks by analyzing the characteristics of data and gradients, distance-based detection struggles to accurately measure differences between participants’ data and gradients in high-dimensional spaces. Moreover, if the training data are unevenly distributed or contain significant noise, the distance calculation is affected and detection performance declines.
Reputation-based detection mechanisms identify malicious participants through methods such as reputation score thresholds and reputation reward-punishment schemes,25,26 with a trust-score-weighted mean update incentivizing global model training. However, these methods overlook client uncertainty in the FL training environment: reliable clients may switch from trustworthy to malicious behavior at random during training. Al-Maslamani et al. 27 therefore proposed letting the server select devices for training based on clients’ reputation scores, using a reinforcement learning algorithm to adaptively choose the optimal reputation threshold. Addressing the real-time dynamics of vehicular networks, other researchers proposed a logarithmic normalization scheme based on reputation updates to correctly handle scaled gradients from malicious vehicles, improving system robustness. 28 Although these methods can filter out malicious participants through reputation scoring to keep model training on track, most do not support personalized adaptive adjustment of reputation thresholds, preventing threshold parameters tailored to specific client groups and thus limiting model performance.
Model framework
In this section, we describe the proposed hierarchical collaborative federated learning (FL) model training framework, as shown in Figure 1. First, the central cloud server (CS) in hierarchical FL sends the initialized global model to each participating edge server (ES), and the ES then forwards the initialized global model to each mobile device (MD) participating in local training. Second, the MDs use their own datasets to begin training the model. After a certain number of training rounds, the local model updates are uploaded to the connected ES for aggregation. Finally, the partially aggregated models from the ES are uploaded to the CS for global aggregation. The notation used in this article is explained in Table 1.
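To make this two-level flow concrete, the following is a minimal Python sketch of one communication round, assuming flattened model updates as NumPy vectors and uniform averaging at both tiers; the function names and the uniform weights are illustrative assumptions, not the paper's exact aggregation rule.

```python
import numpy as np

def aggregate(updates, weights=None):
    """FedAvg-style weighted average of flattened model updates."""
    updates = np.stack(updates)                      # shape: (n_clients, dim)
    if weights is None:
        weights = np.full(len(updates), 1.0 / len(updates))
    return np.average(updates, axis=0, weights=weights)

def hfl_round(md_updates_per_es):
    """One round: each ES partially aggregates its MDs, then the CS aggregates the ESs."""
    edge_models = [aggregate(md_updates) for md_updates in md_updates_per_es]
    return aggregate(edge_models)                    # global model update at the CS
```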

Figure 1. HFL modeling framework.
Table 1. Symbol description.
This work considers a hierarchical FL framework for cloud-edge-end collaboration that contains a central cloud server, M edge servers, and the mobile devices associated with each ES.
Robust hierarchical federated learning with dual-layer filtering mechanism
When applying HFL in practical distributed computing environments, the system faces malicious tampering with, or illegal attacks on, device data during training and communication. 28 Attackers can inject new malicious clients or manipulate existing clean clients to generate poisoned local training updates, affecting the model’s training outcome. During local training, attackers may inject incorrect labels or misleading features into real local data, distorting the model’s understanding and learning process and leading to inaccurate predictions. Attackers may also modify or replace gradients during transmission or processing, altering the model’s update direction or causing it to converge to incorrect local optima, so that parameter updates deviate from the correct direction and model performance and accuracy degrade.29,30
While HFL can mitigate illegal attacks by applying data filtering and gradient validation at different levels, threats remain from multi-level, multi-node, and multi-type poisoning attacks, such as data poisoning on end devices or model poisoning at the edge server. To minimize poisoning attacks and the security risks of hierarchical propagation in HFL, and to ensure the security and trustworthiness of the training process and the reliability of the final global model, we propose a robust HFL framework with a dual-layer filtering mechanism. This mechanism employs distance- and density-based anomaly detection, jointly evaluating the security level of training clients from both the data distribution and gradient perspectives, as shown in Figure 2.

Figure 2. Robust hierarchical federated learning method process for the dual-layer filtering mechanism.
The specific training process of this scheme comprises the following steps:
(a) Each local client trains its model and uploads the update to the base station, where kernel density estimation approximates the data distribution of every participating client.
(b) Based on the obtained distributions, the participating MDs are grouped by K-means clustering so that the data within each cluster have maximum similarity.
(c) Cluster weights are assigned according to cluster density, and the global distribution is computed.
(d) The KL divergence between each MD’s distribution and the global distribution serves as its score and is assigned as the MD’s local weight.
(e) The MD’s local weight is added to the weight of its cluster and normalized to obtain the final score weight.
(f) Based on the maximum filter, gradient features are amplified from the perspective of the gradient’s spatial features (enlarging the distance between abnormal and normal data); the difference between the MD parameters and the global model parameters is multiplied by the MD’s score weight, and the MDs with the smallest distances are selected for aggregation.
(g) The model parameters to be aggregated are passed through mean filtering to enhance gradient robustness.
This dual-layer filtering mechanism can make appropriate filtering decisions against multiple types of poisoning attacks according to the poisoning present at different training levels, ensuring the security and accuracy of the global model.
Grouping of MDs under kernel density estimation
Based on the privacy and security properties of federated learning, the ES cannot obtain the real dataset distribution of local clients. Therefore, to obtain the data distribution of local clients without exposing sensitive data directly to an insecure network environment, we analyze the uploaded gradient data on the edge server side to obtain a predicted data distribution. Although the defender does not have access to the true statistical distribution of the participants’ local data, the uploaded gradients implicitly reflect it.
Therefore, we design a density measurement component to complement the conventional distance-based mechanism. Exploiting the property that malicious gradients tend to be sparsely distributed while benign gradients are denser, the density of the neighborhood around each data point (both data distribution and gradient) is evaluated, where the neighborhood is defined as the set of its nearest neighbors.
We consider the case where the defender is unaware of the number of malicious participants present in the HFL system and define the neighborhood of each participant accordingly.
In the process of kernel density estimation of the gradient parameters, we define the set of gradient updates uploaded by all participants as the estimation sample.
For each participant, the kernel density estimate of its gradient is then evaluated over this set.
In this context, the choice of kernel function and bandwidth controls the smoothness of the estimated distribution.
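For illustration, the server-side estimate can be sketched with a standard Gaussian KDE over each MD's flattened gradient, evaluated on a shared grid so the resulting probability vectors are comparable across MDs. The one-dimensional flattening, the grid range, and the default bandwidth are simplifying assumptions, not the paper's Equation (2).

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_distribution(gradient, grid):
    """Kernel density estimate of a flattened gradient update, evaluated on a
    shared grid and normalized into a probability vector."""
    kde = gaussian_kde(np.ravel(gradient))   # bandwidth via Scott's rule by default
    density = kde(grid)
    return density / density.sum()

# Example shared grid over an assumed gradient value range:
grid = np.linspace(-1.0, 1.0, 50)
```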
In the first layer of filtering, DF-HFL approaches the problem from the data dimension, obtaining the estimated client data distributions from the uploaded gradient updates. It then calculates the KL divergence between the different data distributions to identify malicious clients under attack, ensuring the security and reliability of the training process.
In data poisoning attacks, malicious attackers may intentionally tamper with their local data or inject false data to degrade the global model’s performance. When data is poisoned, originally similar distributions may become dissimilar, resulting in non-independent and identically distributed (non-IID) data. DF-HFL groups the participating MDs according to the heterogeneity of the data distributions, making it easier to identify when user data is subject to poisoning attacks.
At the initialization stage of group training, we first treat all devices under each ES as a whole. DF-HFL then iteratively assigns each MD’s estimated distribution to the nearest cluster center.
The cluster centers are then updated after every MD assignment: the centroid of each cluster is recomputed as the mean of the distributions currently assigned to it.
For the resulting clusters, the weight of each cluster is calculated in proportion to its density, as sketched below.
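Here is a minimal sketch of the grouping and cluster weighting, under the assumption that cluster density can be proxied by relative cluster size; Equation (4)'s exact weight definition may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_mds(distributions, n_clusters=3):
    """Group MDs by their estimated distributions with K-means, and weight
    each cluster by its relative density (approximated here by cluster size)."""
    X = np.stack(distributions)                       # (n_mds, n_bins)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    sizes = np.bincount(labels, minlength=n_clusters)
    cluster_weights = sizes / sizes.sum()             # denser clusters weigh more
    return labels, cluster_weights
```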
Distance and density-based filtering mechanism
In the training process, selecting benign participants and eliminating malicious ones requires a suitable scoring mechanism. Traditional distance-based scoring mechanisms require the data distributions of participating MDs to remain uniform and the datasets affected by malicious attacks to differ markedly from benign datasets. As noted in the literature, 31 under independent and identically distributed (IID) conditions, the data distribution learned by a local model subjected to data poisoning no longer aligns with the original IID data, producing non-independent and identically distributed (non-IID) scenarios. However, this is not always the case; in practical environments, most clients already hold non-IID data, and distance-based detection mechanisms may then struggle to identify malicious updates effectively. Density-based detection mechanisms are another mainstream approach: they group participating clients through clustering to identify abnormal updates. In such methods, low-density regions are typically treated as noise or local anomalies, but these noise points are sometimes incorrectly classified as global anomalies.32,33 Additionally, when local density varies significantly within the dataset, regions of lower local density that are not anomalous on a global scale may be mislabeled as anomalies, affecting the final training results. How to jointly apply distance-based and density-based anomaly detection is therefore crucial for accurately identifying malicious participants. 34
We adopt a combined distance and density approach as the data security filtering method in the DF-HFL framework, relying on measuring pairwise distances and density differences between local updates to identify and discard malicious updates. Each ES discards updates that fall outside a specified range, thereby mitigating the impact of malicious data. The specific solution process is shown in Algorithm 1.
Algorithm 1. DF-HFL filtering and aggregation.
1: Initialize the global model and distribute it to the participating MDs
2: Upload gradient updates to the ES after local training on each device
3: The ES approximates the original data distribution from the gradients through Equation (2)
4: Classify the MDs into different clusters with the K-means clustering algorithm through Equation (4)
5: Calculate the cluster weight of each cluster
6: Calculate the differences between the MDs in each cluster and compute anomaly scores using KL divergence through Equation (6)
7: Calculate the local weight of each MD
8: Amplify the distance between normal and abnormal updates with the filter amplifier
9: Select the top ak MDs ranked by anomaly score for aggregation
10: Apply mean filtering to reduce the impact on normal data after removing anomalous data
11: Aggregate the filtered updates to obtain the new global model
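Putting the steps together, here is a condensed Python sketch of Algorithm 1 that composes the component sketches given in the surrounding subsections (estimate_distribution, cluster_mds, score_weights, max_filter, mean_filter); the keep fraction, cluster count, and L2 distance are illustrative assumptions.

```python
import numpy as np

def df_hfl_round(gradients, grid, keep_frac=0.8, n_clusters=3):
    """One DF-HFL filtering round at the server (sketch).
    gradients: list of flattened gradient updates, one per MD."""
    # Steps 3-5: estimate data distributions, cluster the MDs, weight clusters.
    dists = [estimate_distribution(g, grid) for g in gradients]
    labels, cluster_w = cluster_mds(dists, n_clusters)
    global_dist = np.mean(dists, axis=0)

    # Steps 6-7: KL-based local weights combined into final score weights.
    score_w = score_weights(dists, global_dist, labels, cluster_w)

    # Steps 8-9: amplify updates, keep the MDs closest to the global update.
    amplified = np.stack([max_filter(g) for g in gradients])
    global_grad = amplified.mean(axis=0)
    anomaly = score_w * np.linalg.norm(amplified - global_grad, axis=1)
    keep = np.argsort(anomaly)[: int(keep_frac * len(gradients))]

    # Step 10: mean-filter the surviving updates before aggregation.
    return np.mean([mean_filter(gradients[i]) for i in keep], axis=0)
```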
In general, if a participant is subjected to common data poisoning attacks, such as label flipping or malicious data injection, the participant’s data distribution may change, and its KL divergence could increase. KL divergence is a metric that measures the difference between two probability distributions, essentially quantifying the information loss or error introduced when one distribution is used to approximate another. When data distribution changes, the dependencies between variables may increase, causing the KL divergence to grow. The changes introduced by poisoning attacks are malicious and may cause the model to perform worse on affected participants. Therefore, we use KL divergence as a metric to evaluate the distance between participants in order to address attacks at the data poisoning level.
We consider two distributions: the local data distribution of each MD and the global distribution aggregated over all MDs. The KL divergence between them quantifies how far the MD deviates from the population.
By calculating the KL divergence between data distributions, we can identify data points that may differ from others due to malicious attacks, data poisoning, or other abnormal situations. This allows us to promptly detect non-independent and identically distributed (non-IID) data, helping to identify potential malicious or anomalous data, thus improving the model’s robustness and security.
According to the KL divergence, the local weight of each MD is calculated.
The MD’s local weight is added to the cluster weight of its cluster to obtain the final score weight; a sketch of this scoring computation follows.
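The divergence itself is the standard KL formula; the epsilon smoothing and the additive combination shown here follow the prose description above and are otherwise assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), with eps to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def score_weights(dists, global_dist, labels, cluster_weights):
    """Local weight from each MD's KL divergence to the global distribution,
    added to its cluster weight and normalized into the final score weight."""
    local_w = np.array([kl_divergence(d, global_dist) for d in dists])
    w = local_w + cluster_weights[labels]
    return w / w.sum()
```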
Anomaly detection based on dual-layer filtering
DF-HFL uses a dual-layer filtering mechanism to identify abnormal updates and reduce their impact on the original gradients. The first layer employs a maximum filter amplifier to identify active gradients and amplify the difference between abnormal and normal gradients, helping detect abnormal updates. The second layer applies mean filtering to smooth the filtered gradients, further reducing the impact of noise and other malicious attacks and ensuring more continuous gradient updates and smoother, more consistent model updates, as shown in Figure 3.

Figure 3. Abnormal filtering based on dual-layer filtering.
Filter amplifier
The heterogeneous nature of data distribution may cause the uploaded gradient directions to become disordered, making it difficult to filter out abnormal gradients. In DF-HFL, the local updates uploaded by clients are collected at the ES, consisting of gradient updates from each participant’s local model training. These updates may contain both benign gradients and traces of malicious gradients. DF-HFL uploads the aggregated gradients from the ES to the CS, where maximum filtering is applied to the uploaded gradient updates to identify the most active feature gradients. Cosine similarity between the active gradients and all received gradients is used to filter out abnormal gradients, ensuring that the model training remains on the correct trajectory.
Upon receiving the updated gradients, the maximum filter divides each gradient into blocks, takes the maximum value within each block, and reconstructs the gradient, thereby retaining its most active features.
By combining active feature extraction with maximum filtering, this method enhances the model’s sensitivity to gradients. Thus, it improves the model’s ability to detect and resist malicious attacks, ensuring its safety and stability.
For each gradient vector in the amplified gradient set, the cosine similarity with the other gradient vectors is calculated, and a threshold determines whether a vector is abnormal. Gradient vectors whose cosine similarity falls below the threshold are regarded as abnormal and are removed from the gradient set. The cosine similarity between two gradient vectors is their inner product divided by the product of their norms, as sketched below.
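A sketch of the amplifier and the similarity test follows; the sliding-window maximum stands in for the block-max-and-reconstruct operation described above, and the window size and threshold values are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def max_filter(gradient, size=3):
    """Sliding-window maximum over the gradient: keep the largest (most
    active) value in each neighborhood, amplifying the gap between
    abnormal and normal updates."""
    return maximum_filter(np.asarray(gradient, dtype=float), size=size)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def filter_by_cosine(amplified, threshold=0.0):
    """Keep gradients whose mean cosine similarity to the others reaches the
    threshold; the rest are treated as abnormal and discarded."""
    n = len(amplified)
    keep = []
    for i in range(n):
        sims = [cosine_similarity(amplified[i], amplified[j])
                for j in range(n) if j != i]
        if np.mean(sims) >= threshold:
            keep.append(i)
    return keep
```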
Through this hierarchical filtering mechanism, abnormal MD data distributions and gradient updates can be filtered out layer by layer, preventing malicious attackers from providing false data or tampering with real data and gradients, which could affect the accuracy of model training. This ensures the safety and accuracy of the global model.
To measure the difference between each MD’s gradient update and the global model’s gradient update, the difference is multiplied by the MD’s score weight, as in the sketch below.
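A minimal sketch of this weighted difference; the L2 norm is an assumed distance measure.

```python
import numpy as np

def anomaly_scores(amplified, global_grad, score_w):
    """Distance of each amplified MD update from the global update, scaled by
    the MD's score weight; smaller scores indicate more trustworthy MDs."""
    G = np.stack(amplified)
    return score_w * np.linalg.norm(G - global_grad, axis=1)
```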
Mean filter
DF-HFL uses mean filtering to average multiple gradient data points, smoothing each MD’s gradient data and effectively suppressing the influence of noise and outliers. Although DF-HFL identifies abnormal updates in the first layer, so that an MD’s gradient anomaly is already weakened there, some residual anomalous influence may remain. DF-HFL therefore eliminates this long-tail effect through further smoothing in the second layer, ensuring more stable gradient updates and reducing the excessive impact of any single client’s upload.
The original gradient of an MD is represented as a matrix, and DF-HFL defines a convolution-kernel sliding window that performs mean filtering from left to right and top to bottom over the original gradient tensor, replacing each entry with the mean of its window.
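A sketch of the sliding mean filter; SciPy's uniform_filter performs this windowed averaging directly, with the window size as an assumed hyperparameter. It works for a 2-D gradient matrix as well as a flattened 1-D vector.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mean_filter(gradient, size=3):
    """Sliding-window mean filtering over the gradient tensor: each entry is
    replaced by the average of its neighborhood, smoothing residual anomalies."""
    return uniform_filter(np.asarray(gradient, dtype=float), size=size)
```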
The computational burden of DF-HFL lies mainly in the dual-layer filtering mechanism. The first part is the maximum filtering mechanism, which divides the gradient of each MD into blocks, takes the maximum value of each block, and reconstructs the gradient; this requires a single pass over every gradient tensor. The second part is the sliding-window mean filter, which likewise scans each gradient tensor once, so the overall overhead grows roughly linearly with the model size and the number of participating MDs.
Performance analysis
This section presents the experimental results of the proposed method. First, we introduce the simulation environment and hyperparameter settings of the DF-HFL anomaly recognition algorithm. Second, we verify DF-HFL’s ability to recognize malicious updates during FL training by increasing the attack ratio and attack intensity and comparing its performance with other baseline algorithms.
Simulation settings
The experimental platform used in this study is as follows: an Intel(R) Xeon(R) Gold 6330 CPU with a base frequency of 2.00 GHz and 32 GB of memory.
Dataset and parameter settings: To verify the resilience of DF-HFL against different attacks, we conducted experiments on the following four datasets:
MNIST 35: This dataset contains approximately 70,000 handwritten digit image samples, with 60,000 images used for training and 10,000 for testing. All images are grayscale with a size of 28 × 28 pixels.
Fashion-MNIST (FMNIST) 36: This dataset contains grayscale images of various clothing and accessory items in 10 categories, each with 6,000 training images, for a total of 60,000 training images and 10,000 test images. Each image is 28 × 28 pixels, with grayscale levels ranging from 0 to 255.
Heart Disease 37: A classic medical dataset, often used in machine learning and statistical modeling research to predict the presence or risk of heart disease.
Bank Market 38: This dataset is used to predict customer responses in bank marketing activities; it records the bank’s interactions with customers during telephone marketing.
Potential attacks and comparison methods: This article evaluates the detection performance of DF-HFL against malicious attacks during model training from five aspects: the selection of aggregation threshold parameters, the convergence performance of DF-HFL under different malicious attack ranges, ablation experiments on the effectiveness of each module in DF-HFL, comparisons of the time required for each training round across different methods, and computational overhead, as well as the global model accuracy under varying malicious attack ranges and intensities.
We employed Gaussian noise attacks, which can cause anomaly detection models to misjudge, incorrectly flagging normal data as anomalous or failing to detect genuinely anomalous data. We varied the variance of the Gaussian noise, which represents the attack intensity. To validate DF-HFL’s adaptive real-time resistance to attacks of varying intensity during training, we set the malicious attack intensities to 0.3, 0.4, and 0.5, where higher values indicate stronger attacks.
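A sketch of how such an attack can be simulated, assuming the noise is added directly to the gradient update; the additive form is our assumption, while the variance-as-intensity convention follows the text.

```python
import numpy as np

def gaussian_noise_attack(gradient, intensity, rng=None):
    """Poison an update with zero-mean Gaussian noise whose variance equals
    the attack intensity (0.3, 0.4, or 0.5 in the experiments)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(intensity), size=np.shape(gradient))
    return np.asarray(gradient) + noise
```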
We compared DF-HFL with the following four methods:
FedAvg 39: This method averages all received local updates at the cloud server without performing any malicious gradient detection.
Random 39: For the gradient updates received by the cloud server, a random subset is selected for aggregation in each round, with a fresh random selection every round.
Krum 10: This method filters out anomalous models by calculating, via Euclidean distance, the differences between each participant’s model update and the other model updates.
Multi-Krum 40: A robust aggregation algorithm for client gradients. It calculates the Euclidean distance between the gradient update uploaded by each client and those of other clients, and aggregates the clients with the smallest distances.
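For reference, a sketch of the standard Krum/Multi-Krum scoring as described above; f denotes the assumed number of malicious clients, and the parameter m of Multi-Krum is the number of lowest-scoring updates averaged.

```python
import numpy as np

def krum_scores(updates, n_malicious):
    """Krum score: for each update, the sum of squared distances to its
    n - f - 2 nearest neighbors (f = assumed number of malicious clients)."""
    U = np.stack(updates)
    n = len(U)
    d2 = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)  # pairwise dists
    k = n - n_malicious - 2                    # neighbors counted per update
    scores = []
    for i in range(n):
        nearest = np.sort(np.delete(d2[i], i))[:k]
        scores.append(nearest.sum())
    return np.array(scores)

def multi_krum(updates, n_malicious, m):
    """Average the m updates with the lowest Krum scores."""
    scores = krum_scores(updates, n_malicious)
    best = np.argsort(scores)[:m]
    return np.mean(np.stack(updates)[best], axis=0)
```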
Simulation results
In this section, we evaluate the performance of DF-HFL and four other methods in resisting malicious attacks during HFL training.
Threshold parameter selection
DF-HFL selects the top ak MDs ranked by anomaly score for aggregation in each round; we therefore examine how different aggregation thresholds affect the global model accuracy.
Global model accuracy for different aggregation thresholds.
Global model convergence of DF-HFL
We first verify the convergence of DF-HFL on the four datasets MNIST, FMNIST, Heart Disease, and Bank Market. To verify DF-HFL’s ability to withstand different levels of attack, we fix the attack range at 10%, 20%, 30%, and 40% and the attack intensity at 0.3, 0.4, and 0.5. The attack range denotes the ratio of maliciously attacked MDs to all MDs participating in HFL model training, while the attack intensity denotes the variance of the Gaussian noise attack (0.3, 0.4, or 0.5). The convergence results of DF-HFL are plotted in Figure 4; DF-HFL maintains convergence on all four datasets. On the MNIST dataset, DF-HFL converges within 40 iterations when the attack intensity is 0.3. When the attack intensity is 0.4 and the attack range is 10% to 40%, the loss of DF-HFL stays at a low level after 50 iterations; when the attack intensity is 0.5 and the attack range is 40%, DF-HFL fluctuates but always remains at a low level. Thus, DF-HFL consistently reaches convergence within few training rounds under malicious attacks of different strengths and ranges.

Figure 4. Convergence of DF-HFL under different attack intensities. (a) MNIST-0.3 attack intensity; (b) MNIST-0.4 attack intensity; (c) MNIST-0.5 attack intensity; (d) FMNIST-0.3 attack intensity; (e) FMNIST-0.4 attack intensity; (f) FMNIST-0.5 attack intensity; (g) Heart Disease-0.3 attack intensity; (h) Heart Disease-0.4 attack intensity; (i) Heart Disease-0.5 attack intensity; (j) Bank Market-0.3 attack intensity; (k) Bank Market-0.4 attack intensity; (l) Bank Market-0.5 attack intensity.
Accuracy of global models with different malicious attack range
We verified the resistance of DF-HFL compared with FedAvg, Krum, Random, and Multi-Krum under different ranges of malicious attack on the MNIST, FMNIST, Heart Disease, and Bank Market datasets. We fixed the attack intensity at 0.3 and compared the global model accuracy curves over 300 rounds of iterative training under 10%, 20%, 30%, and 40% attack ranges. According to the experimental results in Figure 5, on the MNIST dataset, when the attack range is small, the global models of DF-HFL, FedAvg, and Random quickly converge to high accuracy: FedAvg and DF-HFL maintain an accuracy of around 0.95 after 12 iterations, Random needs 65 iterations to converge, and Multi-Krum needs 95, while Krum needs 280 iterations and converges to lower accuracy. When the attack range is increased to 40%, DF-HFL still converges to higher accuracy within fewer iterations, and the final accuracy of its global model exceeds that of the other methods; Random converges but to lower accuracy, as does Krum. This is because DF-HFL scores the MDs participating in training jointly by distance and density and amplifies the distance between normal and malicious updates through the dual-layer filtering mechanism, which helps identify abnormal updates and ensures the accuracy of the global model.

Figure 5. Accuracy of DF-HFL under different attack ranges. (a) MNIST-10% attack range; (b) MNIST-20% attack range; (c) MNIST-30% attack range; (d) MNIST-40% attack range; (e) FMNIST-10% attack range; (f) FMNIST-20% attack range; (g) FMNIST-30% attack range; (h) FMNIST-40% attack range; (i) Heart Disease-10% attack range; (j) Heart Disease-20% attack range; (k) Heart Disease-30% attack range; (l) Heart Disease-40% attack range; (m) Bank Market-10% attack range; (n) Bank Market-20% attack range; (o) Bank Market-30% attack range; (p) Bank Market-40% attack range.
Comparison of global model accuracy under different attack ranges and attack intensities
To verify DF-HFL’s ability to resist malicious attacks under different attack ranges and intensities, we fixed the attack intensities at 0.3, 0.4, and 0.5 and the attack ranges at 10%, 20%, 30%, and 40% on the MNIST, FMNIST, Heart Disease, and Bank Market datasets, and compared the global model accuracy of DF-HFL with FedAvg, Krum, Random, and Multi-Krum after 300 rounds of iterative training. According to the experimental results in Figure 6, on the MNIST dataset with an attack intensity of 0.3, all methods except Krum reach a global model accuracy above 0.9 after 300 rounds, but as the attack intensity increases, the other methods fall below DF-HFL; for example, at an attack intensity of 0.3 and an attack range of 40%, DF-HFL’s global model accuracy is 5.63%, 16.46%, 23.32%, and 8.59% higher than the other four methods, respectively. In summary, DF-HFL computes anomaly scores for the data and gradient updates of the participating MDs from multiple perspectives and amplifies the gap between anomalous and normal updates through the dual-layer filtering mechanism, while reducing the impact of removing anomalies on correct data, so that it retains as much effective training information as possible and improves the accuracy of the global model while identifying malicious attacks.

Figure 6. Accuracy under different attack ranges and attack intensities. (a) MNIST-0.3 attack intensity; (b) MNIST-0.4 attack intensity; (c) MNIST-0.5 attack intensity; (d) FMNIST-0.3 attack intensity; (e) FMNIST-0.4 attack intensity; (f) FMNIST-0.5 attack intensity; (g) Heart Disease-0.3 attack intensity; (h) Heart Disease-0.4 attack intensity; (i) Heart Disease-0.5 attack intensity; (j) Bank Market-0.3 attack intensity; (k) Bank Market-0.4 attack intensity; (l) Bank Market-0.5 attack intensity.
Computational overhead
We compared the computational overhead of DF-HFL with FedAvg, Krum, Random, and Multi-Krum on the MNIST, FMNIST, Heart Disease, and Bank Market datasets over 300 epochs, with the attack range set to 10% and the attack intensity to 0.3. According to the experimental results in Table 3, the computational overhead of DF-HFL over 300 epochs is slightly higher than that of FedAvg and Random because it must run the dual-layer filtering anomaly detection mechanism, whereas FedAvg trains on all data samples and aggregates all updates without any anomaly detection, and Random minimizes overhead by selecting only one update to aggregate in each round. However, as the experiments above show, the accuracy of FedAvg and Random is much lower than that of DF-HFL. DF-HFL therefore needs only slightly more computational overhead to obtain a global model with much higher accuracy than the other methods.
Table 3. Computational overhead.
Conclusions
In this article, we study the anomaly filtering problem under the HFL framework, which aims to prevent malicious attackers from degrading the accuracy of the global model by providing false data or tampering with normal gradient updates, through identifying illegal attacks during model training and transmission. We propose a robust hierarchical federated learning method with a dual-layer filtering mechanism that overcomes the limitations of earlier methods, which identify malicious attacks using only distance or density gaps in data distributions or gradient updates. The method approximates data distributions from the uploaded gradients via kernel density estimation and groups devices with similar distributions into the same cluster. A dual-layer filtering mechanism then identifies malicious updates while reducing the impact of malicious data on normal gradients: maximum filtering amplifies the distance between malicious and normal updates, stripping malicious updates from the overall gradient trend, and mean filtering makes the remaining model updates smoother and more continuous, enhancing gradient robustness and balancing model performance with privacy protection.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the Joint Key Project of National Natural Science Foundation of China under Grant U2468205, in part by the Researchers Supporting Project number (RSPD2025R681) at King Saud University, Riyadh, Saudi Arabia, in part by the National Natural Science Foundation of China under Grant 62202156, Grant 62472168, Grant 62473146 and Grant 62072056; in part by the Hunan Provincial Key Research and Development Program under Grant 2023GK2001 and Grant 2024AQ2028; in part by the Hunan Provincial Natural Science Foundation of China under Grant 2024JJ6220; in part by the Key Project of Natural Science Foundation of Hunan Province under Grant 2024JJ3017; and in part by the Research Foundation of Education Bureau of Hunan Province under Grant 23B0487.
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
