Abstract
Federated learning framework facilitates more applications of deep learning algorithms on the existing network architectures, where the model parameters are aggregated in a centralized manner. However, some of federated learning participants are often inaccessible, such as in a power shortage or dormant state. That will force us to explore the possibility that the parameter aggregation is operated in an ad hoc manner, which is based on consensus computing. On the contrary, since caching mechanism is indispensable to any federated learning mobile node, it is necessary to investigate the connection between it and consensus computing. In this article, we first propose a novel federated learning paradigm, which supports an ad hoc operation mode for federated learning participants. Second, a discrete-time dynamic equation and its control law are formulated to satisfy the demands from federated learning framework, with a quantized caching scheme designed to mask the uncertainties from both asynchronous updates and measurement noises. Then, the consensus conditions and the convergence of the consensus protocol are deduced analytically, and a quantized caching strategy to optimize the convergence speed is provided. Our major contribution is to give the basic theories of distributed consensus problem for federated learning framework, and the theoretical results are validated by numerical simulations.
Introduction
Federated learning (FL) is an emerging and promising decentralized machine learning approach that performs a collaborative training of models with the local datasets on various mobile devices, with the local model updates being sent to a server to aggregate and the updated global model being fed back for the next round of local training, instead of transmitting the raw data to a data center. Hence, training models are shared and the privacy of local datasets is preserved, while the communication cost can be reduced greatly. 1 Given these compelling benefits, a rapidly increasing research attention has been dedicated to applying FL in the field of wireless communications to support more intelligent, more convenient and applicable applications, 2 for example, an image classification task for vehicular edge computing, 3 a content popularity prediction for augmented reality (AR) applications, 4 a signal classification or a deep anomaly detection in industrial distributed wireless sensor networks, etc.5,6
Recent 2 years have begun to witness an increasing interest in studying how to employ FL on mobile ad hoc networks (MANET) or multi-agent systems (MAS), such as unmanned aerial vehicles (UAV), 7 and vehicular Internet of Things (IoT). 8 Nevertheless, these mobile devices acting as FL clients are designed to directly communicate with an FL server (or a cluster center that plays a role in relaying), rather than an ad hoc operation mode. This situation is also present when FL is applied to wireless sensor networks.5,6 The distributed consensus problem is the theoretical basis for supporting FL parameter synchronization (namely, model updates’ aggregation) in the context of ad hoc operation mode.
The distributed consensus problem (aka
What are the benefits brought by the ad hoc operation mode for FL framework? (1) Either FL server or cluster center may be inaccessible in some scenarios, such as in a power shortage or device dormant state, (2) when a large number of FL clients send their requests to link an FL server or a cluster center, it may be overwhelmed due to its limited capability, (3) this operation mode enables short-range communications such as ultra-wideband by which a large channel capacity can be obtained for local model updates’ transmissions that are capacity-consuming, (4) in this mode, FL parameter synchronization is enabled by mobile devices in a coordinated manner in the absence of FL server or cluster center, and (5) the asynchronous transmissions of local model updates are allowed to a considerable extent in this operation mode.
In view of the above reasons, a novel FL paradigm is proposed at the beginning of this article, as shown in Figure 1. It is obvious that the most salient challenge for this paradigm should be the communication overhead for parameter synchronization. Thanks to the

The proposed FL paradigm supports mobile devices in an ad hoc operation mode.
In a word, to the best of our knowledge, as of now the study of the distributed consensus problem over FL framework has not been found yet. That is our motivation of exploring this issue.
Related work
As mentioned above, until now we have not seen an FL framework whose mobile clients operate in an ad hoc manner yet. The existing studies of FL over MANET (or MAS) are concentrated on communication cost reduction, FL client selection, data privacy and security, so on.
Zhang and Hanzo 7 proposed an FL-aided multi-UAV system to conduct classification tasks for exploration scenarios, where each of UAV is coordinated by a ground fusion center as FL server to form a cooperative network. An algorithm of weighted zero-forcing precoding is used by each of UAV to mitigate the interference to the FL server. Bao et al. 8 proposed an edge computing-based joint client selection and networking scheme for vehicular IoT, where some of vehicles are assigned to act as both edge nodes (aka cluster centers) and FL clients via a distributed approach. The selected clients play a role of forwarders between common vehicles and FL server. Lu et al. 10 employed an FL architecture empowered by blockchain to address data privacy concerns on Internet of Vehicles, where the security of shared data is guaranteed by integrating learning parameters into a blockchain. Regarding the aforementioned sparsification technology, Sun et al. 9 presented a general gradient sparsification framework as another way to reduce communication cost for FL parameter synchronization on IoT, where validation data sets are maintained with top-1 accuracy when 99.9% gradients are sparsified. As for the FL applications in distributed sensor networks, Liu et al. 5 used an FL paradigm to fuse the learning process and recognition results of each sensor node for the modulation recognition of wireless signals. Liu et al. 6 proposed an on-device FL-based deep anomaly detection framework for sensing time-series data. Both of the FL frameworks require parameter aggregators as FL servers, instead of an ad hoc operation mode.
To date, the distributed consensus problem of perfect models, which are assumed that each agent (or node) can obtain its neighbor information timely and precisely, instantaneous transmissions, perfect clock synchronization, concurrent updates, identical agent dynamics, even fixed network topologies, has reached a reasonable degree of maturity.
11
Nonetheless, wireless networked systems in practical applications often operate in uncertain communication environments and are inevitably subjected to communication latency, asynchronous clock and updates, agents’ heterogeneity, topological dynamics, as well as measurement noises (including additive and multiplicative noises). Then what are the
Although the existing distributed consensus algorithms explore some of four aspects mentioned above, to the best of our knowledge, an algorithm that can fulfill all of requests has not been seen yet. Moreover, the caching issue along with consensus computing has not been received any concerns up to now. Olfati-Saber and Murray 13 provided the consensus protocols and their convergence analysis for directed balanced networks with constant time-delays by introducing disagreement functions, while a direct connection between the algebraic connectivity of a graph and the convergence of a linear consensus protocol is established. Savino et al. 14 contributed a sufficient condition of consensus for discrete-time switching networks, based on linear matrix inequalities that consider the joint effect of time-varying delays and topological uncertainty. Under an assumption that delay is time-varying and undirected network is connected, Wang et al. 15 derived the conditions to guarantee consensus for continuous-time multi-agent systems. For the consensus problem of a switched multi-agent system composed of continuous-time and discrete-time subsystems, Zheng and Wang 16 proposed a linear consensus protocol and proved that this consensus problem is solvable under arbitrary switching with undirected connected graph, directed graph, and switching topologies, respectively. Kar and Moura 17 studies the distributed average consensus with intermittent topologies and noisy channels in sensor networks, which leads to a bias-variance dilemma, that is, running consensus for long reduces the bias of the final average estimate but increases its variance, and presented two versions of consensus compromise to this tradeoff. Zong et al. 18 investigated the stochastic consensus conditions of linear MAS with fixed time-delays and stochastic multiplicative noises. First, the stochastic stability for stochastic differential delay equations driven by multiplicative noises is examined. Then, sufficient conditions are deduced for the mean-square and a. s. consensus. Zheng et al. 19 studied the mean-square consensus problem of discrete-time linear MAS over directed networks with constant delay and non-identical packet dropouts. Sufficient consensus conditions are obtained in terms of delay, packet dropout rates, network topology and agent dynamics. On the basis of a first-order average-consensus protocol with switching networks and additive noises, Chen et al. 20 gave a quantitative description of relation between convergence speed and connectivity of topologies by using stochastic approximation methods and establishing a critical consensus condition for network topologies.
In short, inspired by the fact that caching plays a critical role in FL operations,21,22 we will investigate the connection between caching and consensus computing, while discussing the condition for reaching consensus.
Problem formulation
Algebraic graph theory
A MANET (or MAS) is described as a sequence of weighted digraphs
where
Suppose that the probability that there exists a link
where
In addition, let
Distributed consensus problem
Each node (or agent) in a networked system can be described by a discrete-time dynamic equation, that is
where
In view of the results,
23
we can assume that all eigenvalues of
Considering both delay and packet loss,
where
It is said this networked system can reach a mean-square consensus if there exists a control gain
where
Consensus protocol
In this section, we will derive a criterion to evaluate the consensus of the networked system and present the consensus protocol.
Consensus conditions
For a networked system with delay and packet loss, the update equation is
Lemma 1
For equation (7), as long as there exists a positive-definite matrix
where
We employ the control gain
where
The consensus error is defined as
where
Since
We can see that equation (6) is also equal
To get the further study of the topological consensus conditions, we can define Lemma 2. It is difficult to form a balanced digraph (or a balanced joint digraph) for broadcast-based networks. That is why a bias of convergence result from its accurate value occurs sometimes, which is questioned. 24
Lemma 2
If the network is a balanced digraph, the final convergence value of it is the mean value of its initial value.
The proof is given in Appendix 2.
Quantized caching
Due to the delay occurring when sending messages between nodes, the asynchronous problem is inevitable. Hence, it is necessary to consider the asynchronous problem of communications. We use a quantized caching mechanism to mask the uncertainties from asynchronous updates and varying delays. As shown in Figure 2, node

Illustration of quantized caching on the FL participants.
The algorithm given in Table 1 is used to simulate the consensus process of the networked system.
Consensus algorithm.
We give the complexity of message overhead of this consensus algorithm prior to its convergence analysis. Assume that the number of messages sent by all nodes over an entire network is
Convergence analysis
In this section, we will discuss the impact of time-delay and packet loss on the convergence speed of the network.
Convergence speed
Inequality equation (66) in Appendix 1 indicates that the convergence speed is determined by
Based on equation (13), we have
where
According to equation (63) in Appendix 1, we get
where
Lemma 1 indicates that it is necessary to satisfy
By introducing the control gain
and
Since
When each node state is a one-dimensional vector,
where
Let
The zero point of
Combining these with the analysis of
Impact of time delay and packet loss rate
From equation (21), we know that
We first investigate the impact of packet loss rate.
Let
where
Since
We can get the partial derivative of
We also have
Since
which means that
Considering these, we can prove that
The impact of delay analysis is the same as the packet loss rate. To do that, we first need to analyze the positive and negative of
where the partial derivative of
and based on equation (20), we can get
As a result, we have
Lemma 3
The convergence speed of the consensus algorithm decreases with the increase of time delay or packet loss rate.
Convergence optimization
Assume that the maximum time delay is
and the expected number of received messages is
and the total packet loss rate is
where
Introducing
where
It is necessary to analyze the positive and negative of
By calculating
and
When
The extreme point of
and the solution is
where
Simulation results
RWP (Random Waypoint) mobility model is selected to simulate the movement trajectories and information exchanges of nodes moving within a circular area with a radius of 10 m. Assume that the initial sate values
Simulation scenario settings.
Figure 3(a)–(d) illustrates the variations of node states for

The comparison of different converge speeds w.r.t. various delay settings: (a) delay 4, (b) delay 200, (c) delay 400, and (d) delay 310.
Using equation (39), we can calculate that the optimal delay setting should be 310, as shown in Figure 3(d). It can be observed evidently from this subfigure that the convergence process starts with 400 rounds, which is much faster than both of the results above when
Figure 4 reflects the impact of packet loss rate on the convergence speed. Therein, the

The comparison between different converge speeds w.r.t. various channel packet loss rate settings: (a)
Conclusion
It can be seen that both time delay and packet loss rate on each link are allowed to be non-identical even time-varying under our control law by employing different quantized caching policies for different nodes. FL parameter aggregation can also be achieved in an asynchronous manner by caching some messages on a node for a period of time prior to being updated. Besides, the exchange of messages between neighbor nodes proceeds by broadcasting under our control law. However, that will probably lead to a bias of convergence result from its accurate value, which is our future work. The consensus conditions deduced analytically revealed that neither time delay nor packet loss rate affect the convergence of the consensus protocol, except its convergence speed. Nevertheless, the union of a sequence of directed network graphs is requested to be able to contain a directed spanning tree. As a result, it is observed that the caching on mobile devices actually plays a critical role in consensus computing. It can be concluded that it is possible to operate in an ad hoc manner for FL participants, although the centralized operation mode cannot be replaced completely.
Footnotes
Appendix 1
Appendix 2
Handling Editor: Peio Lopez Iturri
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (grant no. 61771354).
