
Abstract

The federated learning framework enables more applications of deep learning algorithms on existing network architectures, where the model parameters are aggregated in a centralized manner. However, some federated learning participants are often inaccessible, for example, during a power shortage or in a dormant state. This motivates us to explore the possibility of aggregating parameters in an ad hoc manner based on consensus computing. Moreover, since a caching mechanism is indispensable to any federated learning mobile node, it is necessary to investigate its connection with consensus computing. In this article, we first propose a novel federated learning paradigm that supports an ad hoc operation mode for federated learning participants. Second, a discrete-time dynamic equation and its control law are formulated to meet the demands of the federated learning framework, with a quantized caching scheme designed to mask the uncertainties from both asynchronous updates and measurement noises. Then, the consensus conditions and the convergence of the consensus protocol are derived analytically, and a quantized caching strategy that optimizes the convergence speed is provided. Our major contribution is to give the basic theory of the distributed consensus problem for the federated learning framework, and the theoretical results are validated by numerical simulations.

Keywords

Consensus computing, federated learning, ad hoc mode, quantized caching, convergence

Introduction

Federated learning (FL) is an emerging and promising decentralized machine learning approach that performs a collaborative training of models with the local datasets on various mobile devices, with the local model updates being sent to a server to aggregate and the updated global model being fed back for the next round of local training, instead of transmitting the raw data to a data center. Hence, training models are shared and the privacy of local datasets is preserved, while the communication cost can be reduced greatly.¹ Given these compelling benefits, a rapidly increasing research attention has been dedicated to applying FL in the field of wireless communications to support more intelligent, more convenient and applicable applications,² for example, an image classification task for vehicular edge computing,³ a content popularity prediction for augmented reality (AR) applications,⁴ a signal classification or a deep anomaly detection in industrial distributed wireless sensor networks, etc.^5,6

The past 2 years have witnessed an increasing interest in studying how to employ FL on mobile ad hoc networks (MANET) or multi-agent systems (MAS), such as unmanned aerial vehicles (UAV),⁷ and vehicular Internet of Things (IoT).⁸ Nevertheless, the mobile devices acting as FL clients in these studies are designed to communicate directly with an FL server (or a cluster center that acts as a relay), rather than to operate in an ad hoc mode. The same holds when FL is applied to wireless sensor networks.^5,6 The distributed consensus problem is the theoretical basis for supporting FL parameter synchronization (namely, the aggregation of model updates) in the context of the ad hoc operation mode.

The distributed consensus problem (aka consensus computing) means to make the scalar states of a set of nodes (or agents) converge to the same value or the average under local communication constraints. A distributed consensus algorithm or protocol is an interaction rule that specifies the information exchange between an agent and all of its neighbors on the network. The essence of this algorithm is that, in each round, one or more agents can communicate information with its immediate neighbors, and then each agent updates its estimate of a quantity of states by combining the estimate with those of its neighbors.
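The interaction rule described above can be illustrated with a minimal sketch. It assumes an idealized setting (synchronous rounds, no delay, no packet loss) on a hypothetical five-node undirected path graph; the function name `consensus_round` and the step size 0.3 are illustrative choices and not part of the protocol studied later in this article.

```python
def consensus_round(states, neighbors, step):
    """One synchronous round: every node moves toward its neighbors' states."""
    return [
        x_i + step * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]


# Hypothetical 5-node undirected path graph: 0 - 1 - 2 - 3 - 4
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
states = [0.0, 1.0, 2.0, 3.0, 4.0]
average = sum(states) / len(states)

# Each node only ever reads the states of its immediate neighbors.
for _ in range(200):
    states = consensus_round(states, neighbors, step=0.3)

print("final states:", states)
print("initial average:", average)
```

Because the links in this sketch are symmetric, the sum of the states is preserved in every round, so the common limit is the initial average.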

What are the benefits brought by the ad hoc operation mode for FL framework? (1) Either FL server or cluster center may be inaccessible in some scenarios, such as in a power shortage or device dormant state, (2) when a large number of FL clients send their requests to link an FL server or a cluster center, it may be overwhelmed due to its limited capability, (3) this operation mode enables short-range communications such as ultra-wideband by which a large channel capacity can be obtained for local model updates’ transmissions that are capacity-consuming, (4) in this mode, FL parameter synchronization is enabled by mobile devices in a coordinated manner in the absence of FL server or cluster center, and (5) the asynchronous transmissions of local model updates are allowed to a considerable extent in this operation mode.

For the above reasons, a novel FL paradigm is proposed at the beginning of this article, as shown in Figure 1. The most salient challenge for this paradigm is the communication overhead of parameter synchronization. Thanks to sparsification,⁹ combined with quantization, compression, or selective communication technologies,² the size of model or gradient updates can be reduced tremendously. Our scheme is therefore practicable, particularly if ultra-wideband communication is employed in this paradigm as well.

Figure 1.

The proposed FL paradigm supports mobile devices in an ad hoc operation mode.

In short, to the best of our knowledge, the distributed consensus problem over an FL framework has not yet been studied. This motivates us to explore the issue.

Related work

As mentioned above, we have not yet seen an FL framework whose mobile clients operate in an ad hoc manner. The existing studies of FL over MANET (or MAS) concentrate on communication cost reduction, FL client selection, data privacy and security, and so on.

Zhang and Hanzo⁷ proposed an FL-aided multi-UAV system to conduct classification tasks for exploration scenarios, where each UAV is coordinated by a ground fusion center acting as the FL server to form a cooperative network. A weighted zero-forcing precoding algorithm is used by each UAV to mitigate the interference to the FL server. Bao et al.⁸ proposed an edge computing-based joint client selection and networking scheme for vehicular IoT, where some vehicles are assigned to act as both edge nodes (aka cluster centers) and FL clients via a distributed approach. The selected clients act as forwarders between common vehicles and the FL server. Lu et al.¹⁰ employed an FL architecture empowered by blockchain to address data privacy concerns on the Internet of Vehicles, where the security of shared data is guaranteed by integrating learning parameters into a blockchain. Regarding the aforementioned sparsification technology, Sun et al.⁹ presented a general gradient sparsification framework as another way to reduce the communication cost of FL parameter synchronization on IoT, where top-1 accuracy on validation data sets is maintained when 99.9% of gradients are sparsified. As for FL applications in distributed sensor networks, Liu et al.⁵ used an FL paradigm to fuse the learning process and recognition results of each sensor node for the modulation recognition of wireless signals. Liu et al.⁶ proposed an on-device FL-based deep anomaly detection framework for sensing time-series data. Both of these FL frameworks require parameter aggregators as FL servers, instead of an ad hoc operation mode.

To date, the distributed consensus problem for idealized models has reached a reasonable degree of maturity.¹¹ Such models assume that each agent (or node) obtains its neighbors' information timely and precisely, with instantaneous transmissions, perfect clock synchronization, concurrent updates, identical agent dynamics, and even fixed network topologies. Nonetheless, wireless networked systems in practical applications often operate in uncertain communication environments and are inevitably subject to communication latency, asynchronous clocks and updates, agent heterogeneity, topological dynamics, and measurement noises (including additive and multiplicative noises). What, then, are the specific demands on consensus computing for an FL framework over MANET (or MAS)? (1) Due to the intermittent and unreliable communications arising from topological dynamics and channel noise, measurement noises that may be treated as time delay (i.e. additive noise) and packet loss (i.e. multiplicative noise) should be taken into account. Furthermore, the time delay on each link (including channel and queuing delay) should be allowed to be non-identical and time-varying, and the packet loss rate on each link (including channel and queuing packet loss) may be non-identical and time-varying as well, (2) given that some nodes are sometimes inaccessible (e.g. in a dormant state), FL parameter aggregation must be achieved in an asynchronous manner, (3) in consideration of node heterogeneity, caching on each node ought to be non-identical, and (4) consensus computing should make good use of the broadcast nature of wireless communications, since doing so can accelerate its convergence tremendously.¹²

Although the existing distributed consensus algorithms explore some of the four aspects mentioned above, to the best of our knowledge, no algorithm that fulfills all of these requirements has been reported yet. Moreover, the caching issue in connection with consensus computing has not received any attention so far. Olfati-Saber and Murray¹³ provided consensus protocols and their convergence analysis for directed balanced networks with constant time-delays by introducing disagreement functions, and established a direct connection between the algebraic connectivity of a graph and the convergence of a linear consensus protocol. Savino et al.¹⁴ contributed a sufficient condition for consensus in discrete-time switching networks, based on linear matrix inequalities that consider the joint effect of time-varying delays and topological uncertainty. Under the assumption that the delay is time-varying and the undirected network is connected, Wang et al.¹⁵ derived conditions that guarantee consensus for continuous-time multi-agent systems. For the consensus problem of a switched multi-agent system composed of continuous-time and discrete-time subsystems, Zheng and Wang¹⁶ proposed a linear consensus protocol and proved that this consensus problem is solvable under arbitrary switching with undirected connected graphs, directed graphs, and switching topologies, respectively. Kar and Moura¹⁷ studied distributed average consensus with intermittent topologies and noisy channels in sensor networks, which leads to a bias-variance dilemma, that is, running consensus for a long time reduces the bias of the final average estimate but increases its variance, and presented two versions of consensus that compromise on this tradeoff. Zong et al.¹⁸ investigated the stochastic consensus conditions of linear MAS with fixed time-delays and stochastic multiplicative noises. First, the stochastic stability of stochastic differential delay equations driven by multiplicative noises is examined. Then, sufficient conditions are deduced for mean-square and almost sure (a.s.) consensus. Zheng et al.¹⁹ studied the mean-square consensus problem of discrete-time linear MAS over directed networks with constant delay and non-identical packet dropouts. Sufficient consensus conditions are obtained in terms of delay, packet dropout rates, network topology and agent dynamics. On the basis of a first-order average-consensus protocol with switching networks and additive noises, Chen et al.²⁰ gave a quantitative description of the relation between convergence speed and the connectivity of topologies by using stochastic approximation methods and establishing a critical consensus condition for network topologies.

In short, inspired by the fact that caching plays a critical role in FL operations,^21,22 we will investigate the connection between caching and consensus computing, while discussing the condition for reaching consensus.

Problem formulation

Algebraic graph theory

A MANET (or MAS) is described as a sequence of weighted digraphs $G (k) = \{V, E (k)\}$ , with graph index $k \in \{1, 2, \dots, \infty\}$ , where V is the set of N nodes (or agents) and $E (k)$ is the set of all existing wireless links. Let $A (k) = [a_{ij}] \in R^{N \times N}$ be the adjacency matrix of $G (k)$ , with elements below

a_{ij} = \begin{cases} w_{ij}, & (i, j) \in E (k) \\ 0, & (i, j) \notin E (k) \end{cases}

(1)

where $w_{ij}$ is the weight of link $(i, j)$ .

Suppose that the probability that there exists a link $(i, j)$ is $p_{ij}$ and the packet loss rate over the entire network is $p$ . Let $L (k) = [l_{ij}] \in R^{N \times N}$ be the Laplacian matrix of $G (k)$ , with elements below

l_{ij} (k) = \begin{cases} - P_{ij} (k) ε_{ij} (k), & i \neq j \\ \sum_{j = 1, j \neq i}^{N} P_{ij} (k) ε_{ij} (k), & i = j \end{cases}

(2)

where $P_{ij} (k)$ indicates whether there exists a link $(i, j)$ , $ε_{ij} (k)$ indicates whether a packet loss occurs over the link $(i, j)$ , and both $P_{ij} (k)$ and $ε_{ij} (k)$ follow a Bernoulli process. As a result, $\Pr (P_{ij} (k) = 1) = p_{ij}$ , $\Pr (P_{ij} (k) = 0) = 1 - p_{ij}$ , $\Pr (ε_{ij} (k) = 0) = p$ , and $\Pr (ε_{ij} (k) = 1) = 1 - p$ .

In addition, let ${\bar{λ}}_{i}$ denote the $i$th eigenvalue of the averaged Laplacian $\bar{L} (k)$ .
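As a minimal sketch of equation (2), the snippet below draws one realization of $L (k)$ from the two Bernoulli processes and also builds the entrywise mean of the Laplacian. The helper names `sample_laplacian` and `mean_laplacian`, the link probabilities, and the loss rate are hypothetical values chosen only for illustration, and link existence and packet loss are assumed to be independent.

```python
import random


def sample_laplacian(link_prob, p_loss, rng):
    """Draw one realization of L(k) following equation (2)."""
    n = len(link_prob)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            link_up = 1 if rng.random() < link_prob[i][j] else 0  # P_ij(k)
            delivered = 1 if rng.random() >= p_loss else 0        # eps_ij(k)
            lap[i][j] = -float(link_up * delivered)
        # Diagonal entry: sum of P_ij(k) * eps_ij(k) over j != i
        lap[i][i] = -sum(lap[i][j] for j in range(n) if j != i)
    return lap


def mean_laplacian(link_prob, p_loss):
    """Entrywise expectation of L(k), assuming independent Bernoulli draws."""
    n = len(link_prob)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                lap[i][j] = -link_prob[i][j] * (1.0 - p_loss)
        lap[i][i] = -sum(lap[i][j] for j in range(n) if j != i)
    return lap


# Hypothetical 4-node network in which every link exists with probability 0.5
n_nodes, p_loss = 4, 0.1
link_prob = [[0.0 if i == j else 0.5 for j in range(n_nodes)] for i in range(n_nodes)]
one_draw = sample_laplacian(link_prob, p_loss, random.Random(0))
expected = mean_laplacian(link_prob, p_loss)
print(one_draw)
print(expected)
```

Every row of both matrices sums to zero, which is the property the consensus analysis relies on.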

Distributed consensus problem

Each node (or agent) in a networked system can be described by a discrete-time dynamic equation, that is

x_{i} (k + 1) = A x_{i} (k) + B u_{i} (k), i = 1, 2, \dots, N

(3)

where $x_{i} \in R^{n}$ and $u_{i} \in R^{m}$ are the state and input of node $i$ , respectively. Both A and B are node $i$ ’s coefficient matrices, which are employed to characterize this node.

In view of the results,²³ we can assume that all eigenvalues of A are either on or outside a unit circle, $[A | B]$ is controllable, and the union of a sequence of weighted directed graphs, that is, $G = \cup G (k)$ , contains a directed spanning tree.

Considering both delay and packet loss, $u_{i}$ can be expressed as

u_{i} = K \sum_{j \in N_{i}} P_{ij} (k - τ) ε_{ij} (k - τ) (x_{j} (k - τ) - x_{i} (k - τ))

(4)

where K is the control gain, $τ$ is the delay denoted by an integer, and $N_{i}$ is the neighbor set of node $i$ . Therefore, we can formulate the update equation as

x_{i} (k + 1) = A x_{i} (k) + BK \sum_{j \in N_{i}} P_{ij} (k - τ) ε_{ij} (k - τ) (x_{j} (k - τ) - x_{i} (k - τ))

(5)

This networked system is said to reach mean-square consensus if there exists a control gain K such that

\lim_{k \to \infty} E \{\| x_{i} (k) - x_{j} (k) \|^{2}\} = 0, \forall i, j \in \{1, \dots, N\}

(6)

where $\| \cdot \|$ represents the Euclidean norm of the vector.

Consensus protocol

In this section, we will derive a criterion to evaluate the consensus of the networked system and present the consensus protocol.

Consensus conditions

For a networked system with delay and packet loss, the update equation is

x (k + 1) = Ax (k) + ε (k) BKx (k - τ)

(7)

Lemma 1

For equation (7), if there exists a positive-definite matrix P satisfying inequality (8), then the system reaches mean-square consensus, that is

(μ BK + A - I)^{T} P + (τ + 1) (μ BK + A - I)^{T} P (μ BK + A - I) + P (μ BK + A - I) + τ (μ BK)^{T} P (μ BK) + σ^{2} K^{T} B^{T} PBK < 0

(8)

where $μ = E (ε (k)) = 1 - p$ , $σ^{2}$ is the variance of $ε (k)$ and $σ^{2} = p (1 - p)$ . The proof of this lemma is given in Appendix 1.

We employ the control gain K defined as¹⁹

K := \frac{ω^{*}}{2 τ (1 - p) + 1} (B^{T} QB)^{- 1} B^{T} Q [(τ + 1) A - τ \cdot I]

(9)

ω^{*} := \arg \min_{ω \in R} \max_{i = 2, \dots, N} | 1 - ω {\bar{λ}}_{i} |^{2}

(10)

Q in equation (9) is a positive-definite matrix that meets the Riccati inequality

Q > {\tilde{A}}^{T} Q \tilde{A} - γ {\tilde{A}}^{T} QB (B^{T} QB)^{- 1} B^{T} Q \tilde{A}

(11)

where $\tilde{A} = (τ + 1) A - τ \cdot I$ and $γ = (τ + 1) (1 - p) / (2 τ (1 - p) + 1) (1 - ω^{*})$ .

The consensus error is defined as

δ_{i} (k) = x_{i} (k) - \sum_{j = 1}^{N} r_{j} x_{j} (k)

(12)

where $r_{j}$ is the $j$th element of the vector $r = [r_{1}, r_{2}, \dots, r_{N}]^{T}$ that satisfies $r^{T} \cdot L (k - τ) = 0$ and $r^{T} \cdot 1_{N} = 1$ .

Since G has a directed spanning tree, there exist matrices Y and S such that $ψ = [1_{N} Y]$ , $ψ^{- 1} = [r^{T} S]^{T}$ , and $\tilde{δ} (k) = (ψ^{- 1} \otimes I_{N}) E [δ (k)]$ , then

{\tilde{δ}}_{i} (k + 1) = A {\tilde{δ}}_{i} (k) - ε (k) {\bar{λ}}_{i} BK {\tilde{δ}}_{i} (k - τ), i = 1, 2, \dots, N

(13)

Equation (6) is equivalent to $\lim_{k \to \infty} E \{\| δ_{i} (k) \|^{2}\} = \lim_{k \to \infty} E \{\| {\tilde{δ}}_{i} (k) \|^{2}\} = 0$ .¹⁹ Thus, if this equation holds for all $i \in \{1, 2, \dots, N\}$ , consensus is reached in the mean-square sense.

To further study the topological conditions for consensus, we state Lemma 2. It is difficult to form a balanced digraph (or a balanced joint digraph) in broadcast-based networks. That is why the convergence result is sometimes biased away from its exact value, an issue that has been raised before.²⁴

Lemma 2

If the network is a balanced digraph, its final convergence value is the mean of the initial node states.

The proof is given in Appendix 2.
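Lemma 2 can be checked numerically with a small sketch. It assumes an idealized setting without delay or packet loss, unit link weights, and a hypothetical five-node directed cycle (balanced, since every node has one incoming and one outgoing link). A second graph adds one extra incoming link to node 0, which breaks the balance. The helper `run_consensus` and the step size 0.3 are illustrative assumptions.

```python
def run_consensus(x0, in_neighbors, step, rounds):
    """Iterate x_i <- x_i + step * sum_j (x_j - x_i) over the nodes that i listens to."""
    x = list(x0)
    for _ in range(rounds):
        x = [
            x_i + step * sum(x[j] - x_i for j in in_neighbors[i])
            for i, x_i in enumerate(x)
        ]
    return x


x0 = [0.0, 1.0, 2.0, 3.0, 4.0]
mean_x0 = sum(x0) / len(x0)

# Balanced digraph: node i listens only to node i + 1 (directed cycle).
cycle = {i: [(i + 1) % 5] for i in range(5)}

# Unbalanced digraph: node 0 additionally listens to node 2.
skewed = dict(cycle)
skewed[0] = [1, 2]

balanced_final = run_consensus(x0, cycle, step=0.3, rounds=500)
skewed_final = run_consensus(x0, skewed, step=0.3, rounds=500)

print("mean of initial states:", mean_x0)
print("balanced digraph limit:", balanced_final[0])
print("unbalanced digraph limit:", skewed_final[0])
```

In the unbalanced case the limit is the weighted average $r^{T} x (0)$ with $r$ the normalized left null vector of the Laplacian, here $r = [1, 1, 2, 2, 2]^{T} / 8$, which gives 2.375 instead of the mean 2. This is the kind of bias discussed before Lemma 2.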

Quantized caching

Because messages are delayed when sent between nodes, asynchronous communication is inevitable and must be considered. We use a quantized caching mechanism to mask the uncertainties from asynchronous updates and varying delays. As shown in Figure 2, node $i$ receives messages with different delays (from $t - τ Δ t$ to $t$ ) and caches each of them. Then, node $i$ processes them at the moment $t$ . The messages are sent by the neighbor node $j$ at the moment $t - τ Δ t$ and may arrive at $t - (τ - 1) Δ t$ or $t - (τ - 2) Δ t$ due to the delay. The quantized caching mechanism caches these messages after they arrive and processes them together at moment $t$ .

Figure 2.

Illustration of quantized caching on the FL participants.
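The caching behavior in Figure 2 can be sketched as a small data structure. This is only an illustration under our own assumptions: time is counted in integer slots of length $Δ t$ , a message that arrives more than $τ$ slots after it was sent is treated as lost, and the class name `QuantizedCache` and its methods are hypothetical.

```python
class QuantizedCache:
    """Per-node cache that holds neighbor messages for tau slots before use."""

    def __init__(self, tau):
        self.tau = tau
        self.slots = {}  # sent_slot -> {sender: value}

    def receive(self, sender, sent_slot, arrival_slot, value):
        """Store a message; reject it if it arrives after the cache window."""
        if arrival_slot - sent_slot > self.tau:
            return False  # too late for the update at sent_slot + tau
        self.slots.setdefault(sent_slot, {})[sender] = value
        return True

    def release(self, now_slot):
        """Hand over all cached messages that were sent tau slots ago."""
        return self.slots.pop(now_slot - self.tau, {})


cache = QuantizedCache(tau=3)
accepted_a = cache.receive(sender=1, sent_slot=0, arrival_slot=1, value=0.5)
accepted_b = cache.receive(sender=2, sent_slot=0, arrival_slot=3, value=0.7)
accepted_c = cache.receive(sender=3, sent_slot=0, arrival_slot=5, value=0.9)

# At slot 3 the node processes everything that was sent at slot 0.
batch = cache.release(now_slot=3)
print(accepted_a, accepted_b, accepted_c, batch)
```

Messages with different individual delays are thus aligned to a single update instant, which is how the cache hides the asynchrony from the update rule.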

The algorithm given in Table 1 is used to simulate the consensus process of the networked system.

Table 1.

Consensus algorithm.

Input: Initial states $x_{1} (0), x_{2} (0), \dots, x_{N} (0)$

Output: Alignment value $x$

1: Generate the initial state of N nodes $x_{1} (0), x_{2} (0), \dots, x_{N} (0)$

2: N nodes start their broadcasting

3: For $i$ = 1 to $k$

4: Node $i$ receives messages from its neighbors

5: Node $i$ updates its state after $τ$ based on $x_{i} (k + 1) = A x_{i} (k) + B u_{i} (k)$

6: Node $i$ broadcasts its new state

7: If $x_{i} (k) = x_{j} (k)$ for $\forall i, j$ , the alignment is reached at round $k$
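The steps of Table 1 can be sketched for scalar states with $A = 1$ and $B = 1$ as follows. This is a simplified illustration rather than the simulator used in this article: the topology is a fixed hypothetical six-node ring instead of a mobile network, packet loss is drawn independently per link and per round, and the gain follows the scalar form $ω^{*} / (2 τ (1 - p) + 1)$ with $ω^{*} = 2 / ({\bar{λ}}_{2} + {\bar{λ}}_{max})$ computed from the lossless ring Laplacian, whose nonzero eigenvalues lie between 1 and 4.

```python
import random


def simulate_consensus(x0, neighbors, gain, tau, p_loss, rounds, seed=0):
    """Delayed consensus update with per-link Bernoulli packet loss."""
    rng = random.Random(seed)
    # history[0] is x(k - tau) and history[-1] is x(k); start from x0 everywhere.
    history = [list(x0) for _ in range(tau + 1)]
    for _ in range(rounds):
        current, delayed = history[-1], history[0]
        nxt = []
        for i, x_i in enumerate(current):
            u_i = 0.0
            for j in neighbors[i]:
                if rng.random() >= p_loss:  # packet from j was delivered
                    u_i += delayed[j] - delayed[i]
            nxt.append(x_i + gain * u_i)
        history = history[1:] + [nxt]
    return history[-1]


n_nodes, tau, p_loss = 6, 2, 0.2
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

omega_star = 2.0 / (1.0 + 4.0)  # assumes ring eigenvalues lambda_2 = 1, lambda_max = 4
gain = omega_star / (2 * tau * (1 - p_loss) + 1)

init_rng = random.Random(1)
x0 = [init_rng.random() for _ in range(n_nodes)]  # initial states in [0, 1)
final_states = simulate_consensus(x0, neighbors, gain, tau, p_loss, rounds=2000)
spread = max(final_states) - min(final_states)
print("final states:", final_states)
print("spread:", spread)
```

Because losses are drawn independently for each direction of a link, the realized graphs are generally not balanced, so the common value the nodes agree on need not equal the initial average.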

We give the complexity of message overhead of this consensus algorithm prior to its convergence analysis. Assume that the number of messages sent by all nodes over an entire network is d in a certain period of time $τ_{max}$ that denotes the allowed maximum (without packet loss) of time delay $τ$ . Thus, the message overhead in a round of communication is $τ (d / τ_{max})$ over the entire network. When considering the number of iterated rounds k, the complexity of message overhead should be $k τ (d / τ_{max})$ .

Convergence analysis

In this section, we will discuss the impact of time-delay and packet loss on the convergence speed of the network.

Convergence speed

Inequality (66) in Appendix 1 indicates that the convergence speed is determined by $e^{γ}$ . The larger the value of $γ$ is, the faster the algorithm converges. Now, let $s^{*} : = e^{γ}, s^{*} \in (1, s_{0})$ . If $s_{0}$ becomes larger, $s^{*}$ can also become larger. Here $s_{0}$ is the solution of $H (s) = 0$ for $s > 1$ , as shown in Appendix 1. Therefore, the convergence speed of the algorithm is governed by the zero point of the function $H (s)$ in the region $s > 1$ . The larger this zero point is, the faster the algorithm converges.

Based on equation (13), we have

E [P_{1}] = (μ {\bar{λ}}_{i} BK) P [- 2 + (2 τ μ + 1) {\bar{λ}}_{i} BK],

(14)

where $μ = 1 - p$ .

According to equation (63) in Appendix 1, we get

H (s) = C_{2} (s) (1 - s^{- 1}) - λ_{min} (- E [P_{1}])

(15)

where

C_{2} (s) = 2 ‖ P ‖ + 3 τ^{2} ‖ P ‖ \cdot E [‖ ε BK ‖] \cdot s^{τ} \geq 0

(16)

Lemma 1 indicates that it is necessary to satisfy $E [P_{1}] < 0$ to ensure the mean-square consensus of the networked system.

By introducing the control gain K, we get

BK = \frac{ω^{*}}{2 τ (1 - p) + 1} \cdot I

(17)

and

E [P_{1}] = (μ {\bar{λ}}_{i} BK) P ({\bar{λ}}_{i} ω^{*} - 2)

(18)

Since $ω^{*} = 2 / ({\bar{λ}}_{2} + {\bar{λ}}_{max})$ and ${\bar{λ}}_{i} ω^{*} = 2 {\bar{λ}}_{i} / ({\bar{λ}}_{2} + {\bar{λ}}_{max}) < 2$ ,¹⁹ as long as P is a positive-definite matrix, $E [P_{1}] < 0$ holds. We also assume that $A = 1$ , $B = b \in R^{1 \times 1}$ , $K = k \in R^{1 \times 1}$ , and all of the digraphs are balanced.
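The chain from equation (10) to equation (18) can be checked numerically in the scalar case. The sketch below assumes $P = 1$ and a hypothetical eigenvalue list (that of a six-node ring); the helper names are ours. It compares the closed form $ω^{*} = 2 / ({\bar{λ}}_{2} + {\bar{λ}}_{max})$ with a brute-force search over the min-max objective, and then evaluates $E [P_{1}]$ from equation (14) with the gain of equation (17).

```python
def omega_star(eigs):
    """Closed form 2 / (lambda_2 + lambda_max) over the nonzero eigenvalues."""
    nonzero = sorted(lam for lam in eigs if lam > 1e-12)
    return 2.0 / (nonzero[0] + nonzero[-1])


def worst_case(omega, eigs):
    """Objective of equation (10): max_i |1 - omega * lambda_i|^2 for i >= 2."""
    return max((1.0 - omega * lam) ** 2 for lam in eigs if lam > 1e-12)


def expected_p1(lam, bk, tau, p, big_p=1.0):
    """Scalar form of equation (14)."""
    mu = 1.0 - p
    return mu * lam * bk * big_p * (-2.0 + (2 * tau * mu + 1.0) * lam * bk)


eigs = [0.0, 1.0, 1.0, 3.0, 3.0, 4.0]  # assumed example spectrum
tau, p = 2, 0.2

w_star = omega_star(eigs)
grid_best = min((i / 1000 for i in range(1, 1000)), key=lambda w: worst_case(w, eigs))
bk = w_star / (2 * tau * (1 - p) + 1)  # equation (17), scalar case
p1_values = [expected_p1(lam, bk, tau, p) for lam in eigs if lam > 0]

print("closed-form omega*:", w_star, "grid search:", grid_best)
print("E[P1] per nonzero eigenvalue:", p1_values)
```

With this gain the factor $(2 τ μ + 1) {\bar{λ}}_{i} BK$ collapses to ${\bar{λ}}_{i} ω^{*}$ , so every value printed in the last line is negative, as required by Lemma 1.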

When each node state is a one-dimensional vector, $H (s)$ may be expressed as

H (s) = s^{- 1} [(2 + g) s + f \cdot s^{τ} (s - 1) - 2]

(19)

where

\begin{cases} g = \frac{1 - p}{2 τ (1 - p) + 1} ({\bar{λ}}_{i} ω^{*}) ({\bar{λ}}_{i} ω^{*} - 2) \\ f = 3 τ^{2} \frac{1 - p}{2 τ (1 - p) + 1} {\bar{λ}}_{i} ω^{*} \end{cases}

(20)

Let

h (s) : = (2 + g) s + f \cdot s^{τ} (s - 1) - 2

(21)

The zero point of $H (s)$ is also the zero point of $h (s)$ when $s > 1$ . Since we have previously proved that $H (s)$ has a unique zero in the region of $s > 1$ , $h (s)$ also has a unique zero.

Combining these with the analysis of $H (s)$ , we can conclude that the larger the zero point of $h (s)$ in the region of $s > 1$ is, the faster the algorithm converges.

Impact of time delay and packet loss rate

From equation (21), we know that $h (s)$ is related to packet loss rate $p$ and delay $τ$ . In this subsection, we will analyze the effects of these parameters on the convergence speed of the networked system.

We first investigate the impact of packet loss rate.

Let $ω : = {\bar{λ}}_{i} ω^{*}$ and $y : = s$ . The zero point of $h (y)$ is an implicit function of $p$ , $τ$ and $ω$ . To study the impact of packet loss rate on the convergence speed, let $τ$ and $ω$ be constants. The relationship between $y$ and $p$ can then be analyzed using the implicit function theorem, thus

y' (p) = - \frac{h_{p} (p, y)}{h_{y} (p, y)}

(22)

where

h_{y} (p, y) = 2 + g + f [τ \cdot y^{τ - 1} (y - 1) + y^{τ}]

(23)

Since $ω \in (0, 2)$ , we have $ω (ω - 2) \in [- 1, 0)$ and $((1 - p) / (2 τ (1 - p) + 1)) \in (0, 1 - p]$ , so that $g \in [p - 1, 0)$ and $2 + g > 0$ . Besides, $f > 0$ and $y > 1$ give $f [τ \cdot y^{τ - 1} (y - 1) + y^{τ}] > 0$ , and therefore $h_{y} (p, y) > 0$ .

The partial derivative of $h (p, y)$ w.r.t. the packet loss rate is

h_{p} (p, y) = - \frac{1}{[2 τ (1 - p) + 1]^{2}} [y ω (ω - 2) + 3 τ^{2} ω y^{τ} (y - 1)]

(24)

We also have

h (p, y) = [y ω (ω - 2) + 3 τ^{2} ω y^{τ} (y - 1)] \frac{1 - p}{2 τ (1 - p) + 1} + 2 y - 2 = 0

(25)

Since $y > 1$ , we have $2 y - 2 > 0$ , and equation (25) then implies

y ω (ω - 2) + 3 τ^{2} ω y^{τ} (y - 1) < 0

(26)

which means that $h_{p} (p, y) > 0$ holds.

Considering these, we can prove that $y' (p) < 0$ , that is, the zero point of $h (y)$ decreases with the increase of packet loss rate.

The impact of delay is analyzed in the same way as that of the packet loss rate. We first determine the sign of $y' (τ)$ , where

y' (τ) = - \frac{h_{τ} (τ, y)}{h_{y} (τ, y)}

(27)

where the partial derivative of $h (τ, y)$ w.r.t. $τ$ is

h_{τ} (τ, y) = y \frac{dg}{d τ} + f (y - 1) y^{τ} \ln y + \frac{df}{d τ} (y - 1) y^{τ}

(28)

and based on equation (20), we can get

\begin{cases} \frac{dg}{d τ} = ω (ω - 2) \frac{- 2 (1 - p)^{2}}{[2 τ (1 - p) + 1]^{2}} > 0 \\ \frac{df}{d τ} = \frac{6 τ^{2} ω (1 - p)^{2} + 6 τ \cdot ω (1 - p)}{[2 τ (1 - p) + 1]^{2}} > 0 \end{cases}

(29)

As a result, we have $h_{τ} (τ, y) > 0$ and $y' (τ) < 0$ . The two results are summarized as follows.

Lemma 3

The convergence speed of the consensus algorithm decreases with the increase of time delay or packet loss rate.
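Lemma 3 can be illustrated by locating the zero point of $h (s)$ from equations (20) and (21) numerically. The sketch below uses bisection, which relies on $h (1) = g < 0$ and on $h$ being increasing for $s > 1$ . The function names and the sample values of $p$ , $τ$ and $ω$ are assumptions made only for this illustration.

```python
def g_and_f(p, tau, omega):
    """Coefficients g and f of equation (20), with omega = lambda_i * omega*."""
    c = (1.0 - p) / (2.0 * tau * (1.0 - p) + 1.0)
    return c * omega * (omega - 2.0), 3.0 * tau ** 2 * c * omega


def h(s, p, tau, omega):
    """Function h(s) of equation (21)."""
    g, f = g_and_f(p, tau, omega)
    return (2.0 + g) * s + f * s ** tau * (s - 1.0) - 2.0


def zero_of_h(p, tau, omega, tol=1e-12):
    """Bisection for the unique zero of h in the region s > 1."""
    lo, hi = 1.0, 2.0
    while h(hi, p, tau, omega) <= 0.0:  # grow the bracket until h(hi) > 0
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, p, tau, omega) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


omega = 1.0
zeros_vs_loss = [zero_of_h(p, 3, omega) for p in (0.0, 0.1, 0.3, 0.5)]
zeros_vs_delay = [zero_of_h(0.1, tau, omega) for tau in (1, 2, 5, 10)]
print("zero point for increasing p:", zeros_vs_loss)
print("zero point for increasing tau:", zeros_vs_delay)
```

Both printed sequences decrease, which matches the signs $y' (p) < 0$ and $y' (τ) < 0$ derived above. In the special case $τ = 0$ , $p = 0$ and $ω = 1$ the function reduces to $h (s) = s - 2$ , whose zero is 2.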

Convergence optimization

Assume that the maximum time delay is $τ_{max}$ , the data arrival in each round of communication follows a Poisson distribution, and the neighbor nodes send $d$ messages on average within $τ_{max}$ . Therefore, the average number of messages received in a round of communication is $d / τ_{max}$ , and the parameter of the Poisson distribution is $λ = (d / τ_{max})$ . When the cache length is set to $τ$ , the number of cached messages follows the probability distribution

P [N (τ) = n] = \frac{{(λ τ)}^{n} e^{- λ τ}}{n!},

(30)

and the expected number of received messages is $τ (d / τ_{max})$ . Thus, the packet loss rate caused by queuing or caching is

p_{qu} = \frac{1}{d} (d - τ \frac{d}{τ_{max}}) = 1 - \frac{τ}{τ_{max}}

(31)

and the total packet loss rate is

p = 1 - (1 - p_{ch}) (1 - p_{qu}) = 1 - (1 - p_{ch}) \frac{τ}{τ_{max}}

(32)

where $p_{ch}$ is the packet loss rate caused by channel noise.
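Equations (30) to (32) are simple enough to evaluate directly. The following sketch does so with hypothetical helper names. The values $τ_{max} = 400$ and $p_{ch} = 0.01$ are taken from Table 2, while the Poisson parameters in the last lines are arbitrary illustrative numbers.

```python
import math


def poisson_pmf(n, lam, tau):
    """Equation (30): probability of caching n messages within tau rounds."""
    rate = lam * tau
    return rate ** n * math.exp(-rate) / math.factorial(n)


def queuing_loss(tau, tau_max):
    """Equation (31): share of messages that miss a cache of length tau."""
    return 1.0 - tau / tau_max


def total_loss(tau, tau_max, p_ch):
    """Equation (32): channel loss and queuing loss combined."""
    return 1.0 - (1.0 - p_ch) * (1.0 - queuing_loss(tau, tau_max))


tau_max, p_ch = 400, 0.01
for tau in (0, 4, 200, 400):
    print(tau, queuing_loss(tau, tau_max), total_loss(tau, tau_max, p_ch))

# Expected number of cached messages for lam = 0.5 and tau = 10 is lam * tau = 5.
mean_cached = sum(n * poisson_pmf(n, 0.5, 10) for n in range(60))
print("mean number of cached messages:", mean_cached)
```

The two boundary cases behave as the text describes: a cache of length 0 loses every message, and a cache of length $τ_{max}$ leaves only the channel loss $p_{ch}$ .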

Introducing $p$ into $h (y)$ , we can get

h (y) = (2 + g) y + f \cdot y^{τ} (y - 1) - 2

(33)

where

\begin{cases} g = \frac{\frac{τ}{τ_{max}} (1 - p_{ch})}{\frac{2 τ^{2}}{τ_{max}} (1 - p_{ch}) + 1} ω (ω - 2) \\ f = \frac{3 τ^{3} ω}{τ_{max}} \frac{(1 - p_{ch})}{\frac{2 τ^{2}}{τ_{max}} (1 - p_{ch}) + 1} \end{cases}

(34)

We again need the sign of $y' (τ)$ , which follows from the implicit function theorem as

y' (τ) = - \frac{h_{τ} (τ, y)}{h_{y} (τ, y)}

(35)

By calculating $h_{y} (τ, y)$ and $h_{τ} (τ, y)$ , respectively, we can get

h_{y} (τ, y) = 2 + g + f [τ y^{τ - 1} (y - 1) + y^{τ}] > 0

(36)

and

h_{τ} (τ, y) = y \frac{dg}{d τ} + f y^{τ} (y - 1) \ln y + \frac{df}{d τ} y^{τ} (y - 1)

(37)

Once $p$ depends on $τ$ through equation (32), the sign of $y' (τ)$ can no longer be determined analytically, because $τ$ acts in two opposing ways. When $τ = 0$ , the packet loss rate is 1 and the convergence speed is 0. As $τ$ increases from 0, the packet loss rate caused by caching (or queuing) falls, which improves the convergence speed. When $τ \to τ_{max}$ , this additional packet loss rate approaches 0, and the remaining effect of a larger $τ$ is the delay itself, which lowers the convergence speed according to Lemma 3.

The extreme point of $y (τ)$ can be obtained by solving the following equations

\begin{cases} (2 + g) y + f \cdot y^{τ} (y - 1) - 2 = 0 \\ y \frac{dg}{d τ} + f \cdot y^{τ} (y - 1) \ln y + \frac{df}{d τ} y^{τ} (y - 1) = 0 \end{cases}

(38)

and the solution is

τ = \frac{τ_{max}}{2} (p_{ch} + \sqrt{{p_{ch}}^{2} + \frac{4 a}{τ_{max}}})

(39)

where

a = 24 ω + 2 ω^{2} (ω - 2) + 2 ω \sqrt{384 ω^{7} + 2304 ω^{6} + 5328 ω^{5} + 6081 ω^{4} + 3292 ω^{3} - 548 ω^{2} + 464 ω + 144}

(40)

Simulation results

The random waypoint (RWP) mobility model is used to simulate the movement trajectories and information exchanges of nodes moving within a circular area with a radius of 10 m. Assume that the initial state values $x_{i} \in R^{1 \times 1}$ on each node $i$ are uniformly distributed between 0 and 1. Table 2 provides the simulation parameters.

Table 2.

Simulation scenario settings.

Simulation parameters	Values
Mobile network model	RWP
Node communication radius	2 m
Number of nodes	40
Moving speed	[0 m/s, 10 m/s]
Halt time	[2 s, 6 s]
Control gain coefficient $ω$	1
Maximum delay $τ_{max}$	400
Packet loss by channel noises	0.01

Figure 3(a)–(d) illustrates the variations of node states for $τ = 4$ , $τ = 200$ , $τ = 400$ and $τ = 310$ , respectively. The x-axis represents the communication rounds, and the y-axis represents the state values of each node. It can be seen from Figure 3(a) that our algorithm converges just after 800 rounds, which is faster than the 1000 rounds in Figure 3(b) and the more than 1000 rounds in Figure 3(c). This shows that the convergence speed of the consensus algorithm declines as the time delay increases.

Figure 3.

The comparison of convergence speeds w.r.t. various delay settings: (a) delay 4, (b) delay 200, (c) delay 400, and (d) delay 310.

Using equation (39), we can calculate that the optimal delay setting should be 310, as shown in Figure 3(d). It can be clearly observed from this subfigure that the convergence process starts at 400 rounds, which is much faster than both of the results above when $τ < 300$ and $τ > 300$ , respectively.

Figure 4 reflects the impact of packet loss rate on the convergence speed. Therein, the x-axis represents the communication rounds, and the y-axis represents the state values of each node. Figure 4(a)–(c) illustrates the variations of node states for $p_{ch} = 0$ , $p_{ch} = 0.1$ and $p_{ch} = 0.3$ when $τ = 310$ , respectively. It can be observed from Figure 4(a) that our algorithm converges just after 200 rounds, which is much faster than that in Figure 4(b) and (c) in the case of the same delay. This indicates that the convergence speed of consensus algorithm would decline as packet loss rate increases. Therefore, packet loss rate plays a more significant role in convergence speed in comparison with time delay.

Figure 4.

The comparison of convergence speeds w.r.t. various channel packet loss rate settings: (a) $p_{ch} = 0$ , (b) $p_{ch} = 0.1$ , and (c) $p_{ch} = 0.3$ .

Conclusion

Under our control law, both the time delay and the packet loss rate on each link are allowed to be non-identical and even time-varying, because different quantized caching policies can be employed for different nodes. FL parameter aggregation can also be achieved in an asynchronous manner by caching messages on a node for a period of time before the update. Besides, the exchange of messages between neighbor nodes proceeds by broadcasting under our control law. However, broadcasting will probably bias the convergence result away from its exact value, which we leave for future work. The consensus conditions derived analytically reveal that neither time delay nor packet loss rate affects whether the consensus protocol converges; they affect only its convergence speed. Nevertheless, the union of the sequence of directed network graphs is required to contain a directed spanning tree. As a result, caching on mobile devices plays a critical role in consensus computing. We conclude that FL participants can operate in an ad hoc manner, although the centralized operation mode cannot be replaced completely.

Footnotes

Appendix 1

Appendix 2

Handling Editor: Peio Lopez Iturri

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (grant no. 61771354).

ORCID iD

Xin Yan

References

1. Wahab, Mourad, Otrok, et al. Federated machine learning: survey, multi-level classification, desirable criteria and future directions in communication and networking systems. IEEE Commun Surv Tutor 2021; 23(2): 1342–1397.

2. Lim WYB, Luong, Hoang, et al. Federated learning in mobile edge networks: a comprehensive survey. IEEE Commun Surv Tutor 2020; 22(3): 2031–2063.

3. Pan, et al. Federated learning in vehicular edge computing: a selective model aggregation approach. IEEE Access 2020; 8: 23920–23935.

4. Niknam, Dhillon, Reed JH. Federated learning for wireless communications: motivation, opportunities, and challenges. IEEE Commun Mag 2020; 58(6): 46–51.

5. Liu, Yang, Zhao, et al. Intelligent signal classification in industrial distributed wireless sensor networks based industrial internet of things. IEEE Trans Ind Inform 2021; 17(7): 4946–4956.

6. Liu, Garg, Nie, et al. Deep anomaly detection for time-series data in industrial IoT: a communication-efficient on-device federated learning approach. IEEE Intern Things J 2021; 8(8): 6348–6358.

7. Zhang, Hanzo. Federated learning assisted multi-UAV networks. IEEE Trans Veh Technol 2020; 69(11): 14104–14109.

8. Bao, Guleng, et al. Edge computing-based joint client selection and networking scheme for federated learning in vehicular IoT. Chin Commun 2021; 18(6): 39–52.

9. Sun, et al. Toward communication-efficient federated learning in the internet of things with edge computing. IEEE Intern Things J 2020; 7(11): 11053–11067.

10. Huang, Zhang, et al. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles. IEEE Trans Veh Technol 2020; 69(4): 4298–4311.

11. Ren, Cao. Distributed coordination of multi-agent networks: emergent problems, models, and issues. London: Springer-Verlag, 2011.

12. Vecchio, Amendola, Ducange. An integrated topology control framework to accelerate consensus in broadcast wireless sensor networks. IEEE Trans Wirel Commun 2018; 17(11): 7472–7485.

13. Olfati-Saber, Murray RM. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans Autom Contr 2004; 49(9): 1520–1533.

14. Savino, dos Santos CRP, Souza, et al. Conditions for consensus of multi-agent systems with time-delays and uncertain switching topology. IEEE Trans Ind Electron 2016; 63(2): 1258–1267.

15. Wang, Song, et al. Consensus conditions for multi-agent systems under delayed information. IEEE Trans Circ Syst II Exp Brief 2018; 65(11): 1773–1777.

16. Zheng, Wang. Consensus of switched multiagent systems. IEEE Trans Circ Syst II Exp Brief 2016; 63(3): 314–318.

17. Kar, Moura JMF. Distributed consensus algorithms in sensor networks with imperfect communication: link failures and channel noise. IEEE Trans Signal Process 2009; 57(1): 355–369.

18. Zong, Yin, et al. Stochastic consentability of linear systems with time delays and multiplicative noises. IEEE Trans Autom Contr 2018; 63(4): 1059–1074.

19. Zheng, Xie, et al. Consensusability of discrete-time multiagent systems with communication delay and packet dropouts. IEEE Trans Autom Contr 2019; 64(3): 1185–1192.

20. Chen, Wang, Chen, et al. Critical connectivity and fastest convergence rates of distributed consensus with switching topologies and additive noises. IEEE Trans Autom Contr 2017; 62(12): 6152–6167.

21. Wang, et al. Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching. IEEE J Select Area Commun 2021; 39(1): 154–169.

22. Min, et al. Mobility-aware proactive edge caching for connected vehicles using federated learning. IEEE Trans Intell Transp Syst 2021; 22(8): 5341–5351.

23. Zheng, Xiao, et al. Mean square consensus of multi-agent systems over fading networks with directed graphs. Automatica 2018; 95: 503–510.

24. Yang, Blum RS. Broadcast-based consensus with non-zero-mean stochastic perturbations. IEEE Trans Inform Theory 2013; 59(6): 3971–3989.

25. Zong, Zhang. Consensus control of discrete-time multiagent systems with time delays and multiplicative measurement noises. Sci Sin Math 2016; 46(10): 1617–1636.

26. Mao, Kloeden PE. Discrete Razumikhin-type technique and stability of the Euler–Maruyama method to stochastic functional differential equations. Discr Cont Dyn Syst 2013; 33(2): 885–903.

Distributed consensus problem with caching on federated learning framework