Abstract
In social learning models, truth-seeking agents learn both individually from direct evidence and socially by pooling beliefs with others. That learning can be undermined by two types of unreliable agents: zealots, who do not learn and promote the same fixed opinion, and free riders, who lack access to evidence yet still influence others. In this paper, we explore how learning rules that incorporate memory can mitigate the effects of unreliable agents. To do so, we construct an agent-based model of social learning in which agents apply a probabilistic bounded confidence (BC) model that evaluates the similarity between themselves and others based on samples of recent beliefs rather than current beliefs only. When compared to a memoryless BC benchmark, BC with memory proves significantly less sensitive to the choice of similarity threshold governing agent interactions, to the extent that a fixed threshold is effective for avoiding all types of zealots. It is also less susceptible to high levels of distrust in evidence. The BC with memory model is then extended to social learning about multiple hypotheses, and we show that the robustness results generalise to the case in which beliefs are multi-dimensional probability distributions.
Introduction and background
Large interconnected populations can exploit social learning to share limited information, to compensate for noise, and to reach consensus about the state of the world (Heyes, 1994). For example, in certain species of social insects, multiple signals from individuals are aggregated to inform collective decisions about potential nest sites or food sources (Galef and Laland, 2005; Leadbeater and Chittka, 2007). Social learning is also a fundamental part of all human societies and is a key driver of cultural evolution (Boyd et al., 2011). Furthermore, there is growing interest in its application to the design of swarm robotics and multi-agent systems (Crosscombe et al., 2017; Valentini et al., 2017; Ndousse et al., 2021).
Social learning is often studied in terms of the interaction between two processes: individual learning in the form of updating from direct evidence, and social interactions in which beliefs are fused or pooled. This dual process model has been investigated in a variety of different settings including economics (Jadbabaie et al., 2012), social epistemology (Douven and Kelp, 2011), and multi-agent AI systems (Crosscombe and Lawry, 2016; Lee et al., 2018). Ideally, the complementarity of these two processes facilitates the collection and efficient propagation of information, driving population-wide consensus, which in turn can provide a mechanism for error correction in the presence of evidential noise. Here, we focus on a model in which agents' beliefs are probability distributions and where both processes have natural probabilistic interpretations.
Social learning often leads to useful emergent population level effects such as consensus formation, which can be exploited to provide solutions that are robust to environmental noise by harnessing multiple, locally efficient, computations and interactions. However, this can also make populations vulnerable to the presence of certain kinds of unreliable individuals, since erroneous information can spread throughout the population resulting in convergence to false beliefs or fragmentation of the population into polarised groups (O'Connor and Weatherall, 2019). For example, recent work on opinion dynamics and swarm robotics has explored the effects of zealots on social learning populations (Karan et al., 2018; Mobilia, 2003). These are stubborn individuals who consistently promote a fixed biased opinion, neither learning from evidence nor adapting to the beliefs of their peers. A number of ways of mitigating the effects of such agents in collective systems have been investigated. These include constraining the structure of the underlying communication network between agents, introducing heterogeneity into the population, and cross-inhibiting highly conflicting opinions (Antonic et al., 2024; Njougouo et al., 2024; Reina et al., 2023). In this paper, we will focus on another well studied approach referred to as bounded confidence (BC), according to which individual agents judge the reliability of opinions expressed by others in the population based on how similar those opinions are to their own.
The overarching research aim here is to identify local learning and updating rules that can facilitate effective social learning across a population of agents, where some individuals are unreliable. In a strong sense such rules are normative and the research has a range of engineering applications in the areas of multi-agent and multi-robot systems. In this context, social learning typically informs collective decision making, where artificial agents or robots explore an environment whilst receiving sensory data and also pooling information received when communicating with their peers. Based on these two sources of information, the aim of social learning is to enable an accurate consensus to emerge about the true state of the world. For example, best-of-n problems (Parker and Zhang, 2009; Valentini et al., 2017) are a widely studied form of collective decision making inspired by the foraging and nest site hunting behaviour of social insects (Franks et al., 2006). Specifically, in best-of-n there are n distinct options of varying but measurable quality. The social learning task is then for the population to reach consensus about which of these options has the highest quality. This challenge has also been extended to that of collectively ranking the n options by quality rather than just identifying the best (Crosscombe and Lawry, 2021; Lawry, 2024a). Other applications of social learning in robotics include anomaly detection (Madin et al., 2024) and collective mapping and classification tasks in which agents must classify different regions of the environment based on local sensing and pooled information (Shan and Mostaghim, 2024).
This normative perspective may also help to shed light on social learning as it occurs in biological systems and human societies. In particular, the advent of social media and the subsequent emergence of large and highly connected social networks has emphasised the problems of misinformation and polarisation in complex systems of this kind (Azzimonti and Fernandes, 2023; Denniss and Lindberg, 2025; Douven and Hegselmann, 2021; O'Connor and Weatherall, 2019). These problems have been widely studied in opinion dynamics where, for example, bounded confidence has been proposed as a driving mechanism that can result in the population converging to multiple different opinions (Bernardo et al., 2024). In light of this, one might ask why bounded confidence would ever arise as a collective behaviour in the first place. A possible reason is that restricting interactions to others with similar beliefs can help an agent to avoid being misled by unreliable agents of various kinds (Douven and Hegselmann, 2021; Hegselmann and Krause, 2015; Karan et al., 2018; Lawry, 2024b; Mobilia, 2003). However, while bounded confidence can be an effective tool in this context, its efficacy can be very sensitive to parameter choice when evaluating similarities (Douven and Hegselmann, 2021) and it can be adversely affected by other factors such as a high level of scepticism in evidence (Lawry, 2024). In this paper we will argue that many of these limitations can be avoided if similarity judgements incorporate memory. Indeed, there are a number of studies that emphasise the role of memory when evaluating trustworthiness in human social interactions. For example, Chang et al. (2010) show that in iterated trust games people update their trust decisions based on past interactions with a partner. In addition, Lee et al. (2016) show that adolescents use social information to keep track of others' reliable or unreliable behaviour over repeated interactions, adjusting trust decisions accordingly.
In this paper, we extend the model proposed in Lawry (2024), so that agents base their confidence judgements on a sample of recent beliefs rather than just on the current beliefs of each agent with whom they are in contact. This limited exploitation of memory will be shown to greatly enhance the robustness of bounded confidence as a tool for social learning in the presence of unreliable agents. In particular, we will show that the memory-based BC model is much less sensitive to the value of the similarity threshold, and also much less influenced by the levels of evidential mistrust in the population, than memoryless BC. This can help to explain why bounded confidence might be adopted as a robust heuristic to mitigate the effect of unreliable agents in scenarios when individuals can to some extent track beliefs over time. As such it has potential applications in collective decision making where artificial agents are susceptible to malfunction or even malicious outside influence. Furthermore, perhaps emphasising that our reliability judgements about others should be well-informed and made over a period of time, rather than being snap judgements based on latest pronouncements, would go some way to reducing our vulnerability to fake news or to unreliable actors promoting a particular agenda.
As noted above, the original motivation for bounded confidence in opinion dynamics was as part of an explanatory model of how polarised opinions can naturally emerge in a population as individuals repeatedly aggregate their beliefs (Deffuant et al., 2000; Hegselmann and Krause, 2002). In these models, beliefs are represented by real numbers and aggregation is linear. In this setting, bounded confidence is implemented so that each agent evaluates the absolute difference between their belief and that of any one of their peers, judging them to be reliable if that difference is less than a given fixed threshold. In the absence of any external evidence, and assuming random initial beliefs, then for particular threshold values the population converges to a number of different belief values, that is, a form of polarisation. Furthermore, the incorporation of truth-seeking behaviours, where agents also receive direct evidence about the state of the world, captures a form of social learning (Hegselmann and Krause, 2006). In this case, for suitable threshold values, BC improves learning in the presence of zealots and also free riders, these being agents who, while learning from others, do not themselves receive evidence (Douven and Hegselmann, 2021). In contrast to these models where both aggregation and evidential updating are linear, Lawry (2024) introduces an explicitly probabilistic bounded confidence model in which evidential updating is Bayesian and probabilities are aggregated using a log-linear pooling operator. In this setting, bounded confidence can mitigate the effect of zealots, but its effectiveness is reduced in scenarios where agents are highly sceptical about the evidence they receive.
The majority of BC models are memoryless in the sense that confidence judgements are always made only on the basis of agents' current beliefs. This means, however, that information about the way in which an agent's beliefs have changed over time is ignored. Such information is nonetheless likely to be highly relevant when attempting to identify certain kinds of unreliable agents. For example, reliable learning agents should be regularly updating their beliefs as new evidence becomes available, in contrast to stubborn zealots who will rarely if ever deviate from their established opinions. Existing studies have tended to focus on incorporating memory into BC models by making current judgements of trustworthiness to some extent contingent on similar past judgements (Mariano et al., 2020; Zhang et al., 2018). For example, Giola et al. (2020) propose an opinion dynamics model which takes into account the recent history of agreement or disagreement between individuals, such that agents who have agreed in the past will be more likely to agree in the future. From a slightly different perspective, Jędrzejewski and Sznajd-Weron (2018) model the effect of memory on the choice between conformity and independence in opinion formation. In particular, they consider an adaptation of the q-voter model in which conformists choose their opinion so as to conform with a set of q social neighbours if they are unanimous, while independent voters choose their opinion independently of others. The probability of an agent acting as a conformist then depends on the maximum utility they have received in the past while acting as a conformist, relative to the maximum utility they have received while acting independently. Furthermore, in terms of performance, Becchetti et al. (2023) show that memoryless opinion dynamics have strict constraints on convergence times, but then present empirical results which suggest that even limited memory can significantly improve convergence time.
There are also opinion dynamics models that implicitly assume a type of memory in their formulation. The voter model is a popular binary opinion model in which agents choose whether or not to switch between opinions based on the votes of their social network neighbours. A variant of the voter model requiring a simple form of memory has been proposed by Stark et al. (2008) in which the longer an agent has been in one belief state, the slower it is to transition to the other state. Somewhat counterintuitively, this gradual reduction in transition rates at the agent level can result in faster convergence at the population level. The Deffuant-Weisbuch (DW) model is another well studied model of opinion dynamics where beliefs are real numbers and aggregation is linear. In Lorenz and Urbig (2007), different communication strategies in the DW model are investigated. For example, a balancing strategy involves agents who have recently communicated with individuals with high belief values actively seeking out individuals with low values. Alternatively, curious agents try to identify other individuals whose beliefs are changing in the same direction as their own. Both strategies involve an element of memory since agents must record trajectories of recent beliefs in order to implement them.
The research in this paper differs from that overviewed above in several clear respects. Primarily, while a variety of models of memory in opinion dynamics have been described, none of them are explicitly and directly linked to the similarity measurements underpinning BC. Here, we propose a model in which memory is inherent to such judgements, and this will turn out to fundamentally improve the robustness and hence applicability of BC. Furthermore, in most BC models, agents' beliefs are either represented as binary states, for example in the voter model, or as generic real numbers, for example in the HK or DW models. Instead, here we frame social learning in terms of subjective probabilities, applying what are, in this setting, established updating and pooling operations to provide a natural treatment of truth-seeking agents engaged in collective behaviour.
The overarching contribution of the paper is to demonstrate that incorporating memory into a bounded confidence model of social learning can significantly improve its effectiveness as a tool for avoiding unreliable agents of at least two different types. In particular, we introduce a probabilistic model of bounded confidence social learning in which agents evaluate their similarity to others based on samples of recent beliefs rather than just by measuring the similarity between current beliefs. We show that using memory in this way significantly increases the robustness of bounded confidence to the extent that a single value of the similarity threshold constraining agent interactions is effective for all types of zealots. It also makes social learning of this kind much less susceptible to high levels of population wide distrust in evidence than memoryless BC. These factors mean that integrating memory into BC increases its potential applicability in a variety of collective decision making scenarios.
More specifically, the paper makes the following technical contributions: (1) we introduce a measure of similarity between sets of beliefs exploiting an established statistical test; (2) we present an agent-based model of probabilistic social learning in the presence of zealots incorporating memory and compare it to a memoryless benchmark; (3) we extend this binary model to multiple hypotheses by proposing a mapping from high-dimensional beliefs to real values and show that BC with memory continues to outperform the benchmark; (4) we introduce free rider agents into the population, which further differentiates between the memory-based and memoryless approaches.
The next section introduces the core components of a probabilistic model of social learning based on the two processes of log-linear probability pooling and Bayesian evidential updating. We then investigate a binary (two hypotheses) agent-based model of BC with memory, exploring social learning performance and robustness with regard to the similarity threshold, agent connectivity, and varying levels of evidential distrust and noise, as compared to the memoryless benchmark described in Lawry (2024). Building on this model we generalise BC with memory to the multi-hypotheses case by introducing a mapping from the multi-dimensional probability simplex to the real numbers and then evaluating the similarity between these projected values. We then extend the BC with memory model in another direction so as to include free rider agents. Finally, we present some discussion and conclusions as well as identifying possible directions for future research.
Probabilistic social learning
In many BC models, agents' beliefs are simply taken to be real numbers without any underlying interpretation being specified (Bernardo et al., 2024; Douven and Kelp, 2011; Hegselmann and Krause, 2002, 2006). Whilst having the advantage of generality, this makes it difficult to motivate updating and pooling rules as lying within an established theory of reasoning for boundedly rational agents. Bayesian epistemology is a prominent framework capturing rational uncertain reasoning and subjective judgements where beliefs are represented as probabilities. The use of probability to quantify credence in this way also has an established operational justification in terms of betting behaviour (De Finetti, 1937; Ramsey, 1931), and there is a clear procedure for evidential updating based on Bayes theorem. Furthermore, there is a developed literature on probability pooling that can inform formalisations of social learning behaviour. From this perspective, we can gain insight from developing models of social learning and opinion dynamics within a probabilistic setting.
We consider a simple social learning problem, and apply a probabilistic approach along similar lines to that described in Lawry and Lee (2020) and Lee et al. (2018, 2021). More specifically, let H1, …, Hm denote m mutually exclusive and exhaustive hypotheses, so that the belief of an agent A_i can be characterised by a probability distribution in the form of a vector P_i = (P_i(H1), …, P_i(Hm)).
Definition 1 (The Log-Linear Pooling Operator)
Given a pool of n agents with probability distributions P_1, …, P_n over the hypotheses, the pooled probability of H_j for j = 1, …, m is given by:
$$P_{pool}(H_j) = \frac{\prod_{i=1}^{n} P_i(H_j)}{\sum_{k=1}^{m} \prod_{i=1}^{n} P_i(H_k)}$$
The use of the log-linear operator for pooling can be justified in a number of different ways. For example, the supra-Bayesian interpretation imagines an oracle who updates their probability distribution over the hypotheses based only on the probability distributions of the agents in the pool (Genest and Zidek, 1986; Morris, 1974). If the oracle assumes independence between agents, that likelihoods take the form of Dirichlet distributions, and that they have a uniform prior on hypotheses, then Bayesian updating results in a posterior distribution consistent with the log-linear operator (Lee et al., 2018). An alternative justification is in terms of information loss resulting from pooling. From this perspective it can be shown that the log-linear pooled distribution minimises the average Kullback-Leibler divergence between itself and the distributions of the agents in the pool.
The log-linear operator also has properties which impact on the way in which agents are able to dynamically learn from others so as to reach a consensus. Most notably, it tends to amplify strongly held beliefs which are shared (Lee et al., 2018). More generally, the operator has the property that disagreement between agents increases uncertainty across the pool while agreement decreases it. While the first property is shared with linear pooling, the second is not. In particular, in contrast to linear pooling, log-linear pooling can result in a decrease in average entropy across the pool of agents, ensuring that under certain conditions social learning can lead to an increase in information.
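As a concrete illustration, the following is a minimal Python sketch of the pooling operation in Definition 1, exhibiting the amplification of shared strong beliefs noted above; the function name and example values are illustrative rather than taken from the paper.
```python
import numpy as np

def log_linear_pool(beliefs):
    # Each row of `beliefs` is one agent's distribution over the m hypotheses.
    beliefs = np.asarray(beliefs, dtype=float)
    pooled = np.prod(beliefs, axis=0)   # elementwise product over the pool
    return pooled / pooled.sum()        # renormalise over the hypotheses

# Shared strong beliefs are amplified: two agents who each assign 0.8 to H1
# pool to a distribution assigning roughly 0.94 to H1.
print(log_linear_pool([[0.8, 0.2], [0.8, 0.2]]))  # -> [0.941 0.059]
```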
In this model of social learning, agents also learn individually on the basis of evidence they receive directly and which reports one of the hypotheses as being true. On receiving such a piece of evidence E an agent updates their probability distribution according to Bayes theorem as follows (Lee et al., 2018):
Definition 2 (Bayesian Evidential Updating)
For E ∈ {H1, …, Hm} we have that:
$$P_A(H_j \mid E) = \frac{P_A(E \mid H_j)\, P_A(H_j)}{\sum_{k=1}^{m} P_A(E \mid H_k)\, P_A(H_k)}$$
In this case, the likelihood P_A(E|H_j) captures agent A's judgement about the reliability of the evidence E in the case that H_j is the true hypothesis.
To implement Bayesian updating in this context each agent must quantify the reliability of the evidence received so as to evaluate P_A(E|H_j). To make distinct judgements in each case would be challenging and hence we propose a simplified version of Definition 2 where the likelihood function has the following form:
$$P_A(E = H_i \mid H_j) = \begin{cases} 1 - \delta & \text{if } i = j \\ \frac{\delta}{m - 1} & \text{if } i \neq j \end{cases}$$
Here, δ ∈ (0, 0.5] is a parameter quantifying the agent's general level of distrust of evidence, taken in our model to be constant across all hypotheses and agents. In other words, all agents model the reliability of all evidence sources as follows: in the case that any given hypothesis H_j is true, the probability that a source will accurately report this is 1 − δ, while there is probability δ that the source will erroneously report the wrong hypothesis, divided uniformly between those H_i where i ≠ j. For the two hypotheses case, Figure 1 illustrates the effect of evidential updating for different levels of distrust δ, by showing an agent's posterior probability in H1 after updating for different prior probability values where the evidence supports H1, that is, E = H1. Notice that as δ → 0.5 updating has a decreasing effect on the prior, that is, P_A(H1|E) → P_A(H1). In contrast, as δ → 0 updating results in total certainty in the supported hypothesis, that is, if E = H1 then P_A(H1|E) → 1, except in the case that P_A(H1) = 0, when the updated probability is undefined in the limit.
Figure 1: Updated probability of H1 given evidence E = H1 plotted against prior probability of H1 for different distrust values δ.
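The behaviour shown in Figure 1 can be reproduced with a short sketch of the update in Definition 2 under the simplified likelihood; the function name and example priors here are illustrative.
```python
import numpy as np

def bayes_update(prior, evidence_idx, delta, m=2):
    # Simplified likelihood: P(E|Hj) = 1 - delta if E reports Hj,
    # and delta / (m - 1) otherwise.
    likelihood = np.full(m, delta / (m - 1))
    likelihood[evidence_idx] = 1.0 - delta
    posterior = likelihood * np.asarray(prior, dtype=float)
    return posterior / posterior.sum()

# As delta approaches 0.5 the update barely moves the prior; for small
# delta the posterior concentrates on the supported hypothesis.
print(bayes_update([0.5, 0.5], 0, delta=0.45))  # -> [0.55 0.45]
print(bayes_update([0.5, 0.5], 0, delta=0.05))  # -> [0.95 0.05]
```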
Bounded confidence with memory
Bounded confidence corresponds to the heuristic that an agent judges the reliability of its peers based on how similar their beliefs are to its own. In most BC models agents evaluate similarity at each time t by comparing their own current belief with the current beliefs of other agents to whom they are connected in their communication network. In probabilistic social learning we can formalise this idea by assuming that agents have a similarity measure S between probability distributions, together with a threshold τ, such that an agent only pools with those neighbours whose beliefs have similarity at least τ to its own.
Definition 3 (Statistical Distance)
The statistical distance (sometimes called total variation distance) between two probability distributions P and Q over the hypotheses H1, …, Hm is given by:
$$d(P, Q) = \frac{1}{2} \sum_{j=1}^{m} |P(H_j) - Q(H_j)|$$
In this case, we take S(P_i, P_k) = 1 − d(P_i, P_k) as the similarity between the beliefs of agents A_i and A_k.
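Under this reading, the memoryless similarity is a one-line computation; the following sketch assumes the S = 1 − d form given above, with illustrative names and values.
```python
import numpy as np

def tv_similarity(p, q):
    # Similarity as 1 minus the statistical (total variation) distance.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 - 0.5 * np.abs(p - q).sum()

print(tv_similarity([0.9, 0.1], [0.6, 0.4]))  # -> 0.7
```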
Clearly basing similarity only on agents' current belief values has significant limitations since during social learning such beliefs are inherently dynamic over time. An alternative would be for agents to record a sample of recent beliefs for both themselves and their communication network neighbours. Specifically, let B_i^t denote the sample of agent A_i's belief values recorded over a window of the most recent ω time steps up to and including time t. Similarity between two such samples can then be evaluated using the Kolmogorov-Smirnov (KS) test statistic.
Definition 4 (KS Test Statistic)
Suppose B_1 and B_2 are two samples of real values with empirical cumulative distribution functions F_1 and F_2, respectively. Then the KS test statistic is given by:
$$D(B_1, B_2) = \sup_x |F_1(x) - F_2(x)|$$
Now let D be a random variable corresponding to the KS test statistic on randomly sampled sets of values, with the same number of elements as B_1 and B_2, drawn from a common distribution. We then take the similarity between the two samples to be S(B_1, B_2) = P(D ≥ D(B_1, B_2)). This corresponds to the p-value of the two sample Kolmogorov-Smirnov significance test, which can be determined numerically (Viehmann, 2021) or using limit distributions in the case of large samples (Feller, 1948). In this case the null hypothesis is that the two samples are drawn from the same distribution, and S(B_1, B_2) is the probability under this null hypothesis of obtaining a KS statistic at least as large as that observed; high values therefore indicate that the two belief trajectories are consistent with one another.
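This similarity can be computed directly with SciPy's two-sample KS test; in the following sketch the belief samples are invented for illustration.
```python
from scipy.stats import ks_2samp

def ks_similarity(sample_a, sample_b):
    # Similarity as the p-value of the two-sample KS test: high values
    # mean the two belief trajectories are plausibly drawn from the
    # same distribution.
    return ks_2samp(sample_a, sample_b).pvalue

# A truth-seeking learner's drifting beliefs versus a zealot's constant signal.
learner = [0.52, 0.58, 0.61, 0.65, 0.70, 0.74, 0.79, 0.83, 0.86, 0.90]
zealot = [0.10] * 10
print(ks_similarity(learner, zealot))  # tiny p-value: excluded for tau = 0.01
```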
Since the Kolmogorov-Smirnov test is only applicable to real-valued data, the above approach to bounded confidence cannot be applied directly in multi-hypotheses BC models where m > 2, since in such cases agents' beliefs correspond to probability distributions in the m − 1 dimensional simplex. Instead, a function g is first applied to map each distribution to a real value, and the KS-based similarity is then evaluated between samples of these projected values. In this paper we consider the case where g is the expected value of the hypothesis index i, so that for probability distribution P, g(P) = Σ_{i=1}^m i · P(H_i). In this case, g is a projection from elements of the probability simplex onto the real interval [1, m].
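A minimal sketch of this projection (the function name is ours):
```python
import numpy as np

def project(p):
    # g(P) = sum_i i * P(Hi): the expected hypothesis index, in [1, m].
    p = np.asarray(p, dtype=float)
    return float(np.dot(np.arange(1, len(p) + 1), p))

# A belief over m = 3 hypotheses maps to a real value to which the
# KS-based similarity above can then be applied.
print(project([0.7, 0.2, 0.1]))  # -> 1.4
```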
Here, we investigate time-stepped agent-based simulation models of social learning in the presence of zealots. We assume there are two types of agent in the population: learning agents, who learn individually from evidence and socially by pooling with the beliefs of their neighbours in a communication network; and zealots, who do not learn but consistently transmit the same fixed belief at every time step. In the binary case a zealot is characterised by the constant probability z that it allocates to the true hypothesis H1.
All agents are part of a communication network in the form of a graph in which nodes are agents and an edge between two agents indicates that there is a direct channel of communication between them. For social learning, each learning agent keeps a record of its beliefs during a window of ω time steps, so that at time step t this is given by the sample B_i^t defined above. The operations performed by a learning agent in one time step are summarised in Figure 2.
Figure 2: Flow diagram showing the operations performed by a learning agent in one time step of the agent-based model.
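Reading Figure 2 into code, one time step of a learning agent might look roughly as follows. This sketch reuses the bayes_update and log_linear_pool helpers from the earlier snippets; the parameter names follow the paper's symbols, but the default values and the exact ordering of stages are our assumptions rather than the paper's implementation.
```python
import numpy as np
from scipy.stats import ks_2samp

def learning_agent_step(belief, window, neighbours, true_idx, m=2,
                        delta=0.1, eps=0.1, tau=0.01, omega=10,
                        rng=np.random.default_rng()):
    # 1. Receive noisy evidence: the true hypothesis with probability
    #    1 - eps, otherwise a uniformly chosen false hypothesis.
    if rng.random() > eps:
        evidence = true_idx
    else:
        evidence = rng.choice([j for j in range(m) if j != true_idx])
    # 2. Individual learning: Bayesian update with distrust delta.
    belief = bayes_update(belief, evidence, delta, m)
    # 3. Record the belief in H1 in a memory window of length omega.
    window = (window + [belief[0]])[-omega:]
    # 4. Social learning: log-linear pooling with those neighbours whose
    #    recent belief trajectories pass the KS similarity threshold tau.
    pool = [belief] + [nb_belief for nb_belief, nb_window in neighbours
                       if ks_2samp(window, nb_window).pvalue >= tau]
    return log_linear_pool(pool), window
```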
In this model it is assumed that communication network neighbours are able to transmit their full probability distributions over the m hypotheses during each time step. In the case of artificial agents this assumption is relatively unproblematic. For example, even for quite large values of m, sharing m − 1 dimensional vectors of real numbers should be easily within the memory and communication capabilities of most simple robots, although there are exceptions where operating conditions are particularly challenging, for example, underwater robots. On the other hand, as a model of human behaviour it is clearly unrealistic in at least two respects. (1) Humans do not tend to communicate their beliefs in terms of real numbers, but rather use natural language. In this regard, all real-valued opinion dynamics models are unrealistic. (2) Humans do not tend to communicate the entirety of their beliefs across a whole set of hypotheses in one step, but instead only partially reveal their beliefs over time. Nonetheless, if viewed as a simplified model of collective learning, our approach can still offer some insight into human behaviour. For instance, probability values could be viewed as translations of natural language belief descriptions into a formal framework. In addition, a time step could be thought of as comprising multiple interactions during which an agent gradually reveals their full beliefs.
As a benchmark we will use the memoryless BC model described in Lawry (2024), in which an agent, at time t, judges the reliability of its social network neighbours based only on the similarity of their current beliefs to its own, as measured using the statistical distance of Definition 3.
Binary bounded confidence
Table 1: Summary of parameters in agent-based simulation experiments.
Successful social learning in this model will result in the average probability allocated to the true hypothesis across agents approaching a value close to 1 over time. For reasonable levels of noise, this will also occur in the case of individual learning where agents only learn directly from evidence, since over time the evidential signal will, on average, reinforce belief in the true hypothesis. Hence, an important potential benefit of social learning in this context is to increase the rate of learning so that agents converge on a high probability for the true hypothesis more quickly. In other words, if social learning, as specified by a certain set of parameters, does not converge more quickly than individual learning for the same evidential noise levels, then arguably there is no benefit from social learning in that case. As we will see, the presence of certain types of zealots can sometimes remove the utility of social learning in this sense. Therefore, as a performance metric we will primarily adopt convergence time, here defined to be the shortest time at which the average probability of the true hypothesis H1 across non-zealot agents has exceeded 0.95 for 100 consecutive time steps; that is, the number of time steps required for the average probability of the true hypothesis to be consistently within 0.05 of 1. The number of time steps is capped at 500, this being the value returned if the convergence criterion is not met. Results are averaged over 50 independent experimental runs at the relevant parameter settings and error bars show 90% percentiles across the runs. The only exceptions are heat maps, where the results are averaged over 10 runs. The communication network is randomly generated for each run of the simulation as a Watts-Strogatz (small-world) graph. This form of random graph has been proposed as a possible model of certain kinds of human social networks (Newman et al., 2002) as well as of communication constrained interactions between individual robots in multi-robot systems or swarms (Crosscombe and Lawry, 2022). A Watts-Strogatz graph is defined by two parameters: the number of neighbours of each agent (node) and a (typically small) probability of rewiring an existing edge from a neighbour to a different, randomly chosen agent.
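Both the metric and the network are straightforward to set up in code. In the following sketch, the convergence criterion is our reading of the definition above, networkx supplies the Watts-Strogatz generator, and the population size of 100 is an illustrative assumption.
```python
import networkx as nx

def convergence_time(avg_true_prob, threshold=0.95, hold=100, cap=500):
    # Shortest time at which the average probability of the true hypothesis
    # has exceeded `threshold` for `hold` consecutive steps; `cap` is
    # returned if the criterion is never met.
    run = 0
    for t, p in enumerate(avg_true_prob):
        run = run + 1 if p > threshold else 0
        if run == hold:
            return t + 1
    return cap

# Watts-Strogatz communication network with 10 neighbours per agent and
# rewiring probability 0.1, as in the experiments; the population size
# of 100 is an illustrative choice.
network = nx.watts_strogatz_graph(n=100, k=10, p=0.1)
```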
We now present a series of simulation experiments to investigate the following research questions: (1) Is memory-based BC less sensitive to the choice of similarity threshold than the memoryless benchmark for different types of zealot? The relevant model parameters to be investigated are τ and z. (2) Is memory-based BC less affected by high levels of doubt in evidence compared to the memoryless benchmark? For this question the relevant model parameter is δ. (3) Is memory-based BC able to cope with high levels of evidential noise? Here, the relevant model parameter to investigate is ϵ. (4) Is memory-based BC effective if agents are only able to keep track of a small number of neighbours? For this question the relevant model parameter is the number of Watts-Strogatz neighbours.
Initially, we investigate the robustness of both the memory-based and the memoryless model to the choice of the similarity threshold τ. As discussed in Lawry (2024), the memoryless bounded confidence model is sensitive to the choice of τ, and of particular concern is that the effective mitigation of different types of zealots requires different τ values. For example, Figure 3(c) shows average convergence time for memoryless BC plotted against τ for three types of zealots: zealots promoting the false hypothesis with z = 0.1 (black line); zealots promoting uncertainty with z = 0.5 (red line); zealots promoting the true hypothesis with z = 0.9 (blue line). Notice that while convergence time is minimised with τ ≈ 0.7 in the z = 0.1 and z = 0.5 cases, convergence time is optimal for z = 0.9 when τ = 1. Indeed, for this latter case τ = 1 is the only threshold value for which the convergence requirement is met; it corresponds to evidence-only individual learning in which agents consistently forgo the option of learning from their peers. The sensitivity of the memoryless model to τ means that it cannot achieve good performance with a fixed threshold value for a variety of zealot types. This is illustrated in Figure 4(a), where the blue line shows average convergence time for the memoryless model plotted against z when τ = 0.7 (as is optimal for z = 0.1). Although performance is good for z between 0.1 and 0.5, convergence times increase significantly for z > 0.5 and for high values the convergence threshold is not met. One might not intuitively expect that the presence of zealots promoting the true hypothesis to some degree could be so disruptive to the memoryless approach. Some additional insight can be gained from Figures 5(a) and (b), showing trajectories of the probability of the true hypothesis for individual agents against time for z = 0.6 and z = 0.9, respectively. Even though both types of zealots to some degree promote the true hypothesis, it is nonetheless clear from the trajectories that by constantly promoting the same opinion in the memoryless model they disrupt consensus formation amongst learning agents around a strong belief in H1.
Figure 3: Social learning performance for varying similarity thresholds τ: z = 0.1 (black line), z = 0.5 (red line), and z = 0.9 (blue line).
Figure 4: Social learning convergence time plotted against varying parameters modelling types of zealots, evidential distrust, social network connectivity, and evidential noise. Black lines are for BC incorporating memory with τ = 0.01, blue lines are for memoryless BC with τ = 0.7, and green lines are for individual learning only. (a) Convergence time versus zealots, (b) convergence time versus distrust, (c) convergence time versus neighbours, and (d) convergence time versus noise.
Figure 5: Probability of the true hypothesis plotted against time for populations with zealots (red lines) z = 0.6 and z = 0.9. Figures 5(a) and (b) are for the memoryless benchmark, while Figures 5(c) and (d) show equivalent results for BC incorporating memory. (a) Memoryless bounded confidence with z = 0.6, (b) memoryless bounded confidence with z = 0.9, (c) bounded confidence with memory and z = 0.6, and (d) bounded confidence with memory and z = 0.9.
BC incorporating memory tends to be much less sensitive to the choice of value for τ, as illustrated in Figures 3(a) and (b) showing convergence time plotted against τ. Figure 3(a) shows the results for τ values ranging between 0 and 1, where the two extreme cases correspond, respectively, to no constraints on social learning and individual learning only. Values of τ between 0 and 1 then result in better performance than these two limiting cases, but with a gradual decline in performance for increasing τ > 0. Indeed, very low values of τ appear to be optimal, as shown in Figure 3(b) which plots convergence times for τ between 0.01 and 0.1 in steps of 0.01. Furthermore, notice that in contrast to the memoryless model shown in Figure 3(c), in Figures 3(a) and (b) the plots are almost identical for the three zealot types z = 0.1, z = 0.5, and z = 0.9. This is important since it raises the possibility of agents adopting a single fixed similarity threshold allowing for good performance across a range of scenarios. For example, in Figure 4(a), the black line shows the convergence time for the memory-based model for varying z with τ = 0.01, indicating consistently good performance across zealot types. Consequently, for the remainder of the paper we fix τ = 0.01 for bounded confidence with memory in all subsequent experiments and make no attempt to adapt the similarity threshold in order to optimise performance in specific learning scenarios. In contrast, in light of the sensitivity of the memoryless model to τ, and in order for it to provide a suitable benchmark, we attempt to select a threshold value providing close to optimal performance in different experimental scenarios.
Figure 4 illustrates the sensitivity of both BC with memory and the memoryless benchmark to a number of model parameters. For the latter, we have used the similarity threshold τ = 0.7, which is optimal for mitigating the effect of zealots that strongly promote the false hypothesis, as discussed above and shown in Figure 3(c).
High levels of distrust in evidence can often occur across a population during collective learning and decision making. Agents may be aware that certain evidence is unreliable, for example due to noise resulting from inaccurate measurements taken by imprecise or even malfunctioning sensors. For example, the robots used in swarms tend to be relatively low cost with limited low resolution sensors. Furthermore, in applications such as search and rescue these robots may be required to operate under difficult conditions, for example, in poor weather or in smoke filled buildings (Kegeleirs and Birattari, 2025). These factors mean that the accuracy of sensor data is often inherently uncertain. In addition, belief perseverance is a common psychological phenomenon according to which individuals are reluctant to update ingrained beliefs in the face of evidence that may contradict them (Guenther and Alicke, 2008). In our probabilistic model, evidential distrust is quantified by the parameter δ, corresponding to an agent's belief that evidence will erroneously support the false hypothesis. If δ is high, that is, close to 0.5, then the evidential signal driving agents to update their beliefs towards a higher probability in the true hypothesis is much reduced (see Figure 1). This can make it harder for agents to detect zealots since, in some circumstances, a zealot's trajectory of unchanging beliefs could resemble their own. Results presented in Lawry (2024) suggest that the effectiveness of memoryless BC is significantly reduced as learning agents' level of distrust in evidence increases. This is also shown here in Figure 4(b), which plots convergence time against δ. For δ > 0.15 the convergence time for memoryless BC (blue line) exceeds that for individual learning only (green line) in the presence of zealots with z = 0.1. In other words, since in such cases belief pooling in the presence of zealots results in slower convergence than individual learning from direct evidence only, there is no benefit from social learning. On the other hand, while the convergence time for BC with memory (black line) also increases with the level of distrust, it is nonetheless always less than that for individual learning. Consequently, even for high levels of distrust resulting in a weak evidential signal, as shown in Figure 1, BC with memory is still able to mitigate the presence of zealots, albeit with a learning time that lengthens as the level of distrust increases.
Another potentially important factor in social learning is agent connectivity, as captured in our model by the number of neighbours specified for the Watts-Strogatz graph. Figure 4(c) shows convergence times for Watts-Strogatz graphs with a fixed rewiring probability of 0.1 but varying numbers of neighbours. Note that for BC both with and without memory, increasing connectivity reduces convergence time in the presence of zealots with z = 0.1. BC with memory is slightly slower for low levels of connectivity, but the two models are broadly comparable as the number of neighbours increases. Unless otherwise stated we will assume a relatively low connectivity level, as represented by 10 neighbours in the Watts-Strogatz model. This result is also important for practical applications of memory-based BC on social networks. More specifically, the proposed use of memory to evaluate similarity between agents requires that each agent keeps track of its network neighbours, recording their beliefs over a period of time. Even if that period is relatively short, such data collection has an inevitable cost. For example, in physical environments where agents move across a spatial region, an agent must track the position of its peers, linking, as it does so, different agents in different locations to fixed identities as positions change dynamically. Alternatively, online agents must track their network neighbours across different platforms, matching online identities to particular individuals. Hence, in practice agents may need to limit the number of individuals that they track, and consequently, if a version of memory-based BC is to be applied in practice, then we need to ensure its effectiveness on communication networks with limited connectivity, as represented by agents having relatively few neighbours. Figure 4(c) provides some evidence that this is indeed the case.
In almost all social learning contexts, there is an element of unreliability associated with evidence obtained from the environment. For instance, sensors tend to have limited resolution and may be imperfectly calibrated, difficult environmental conditions can make accurate measurement impossible, and environmental complexity can mean that only a small proportion of features can be captured. Here, we consider the effect of evidential noise as modelled by the parameter ϵ, corresponding to the probability that evidence received asserts the false hypothesis. Figure 4(d) shows convergence time plotted against ϵ in the presence of zealots with z = 0.1. For all values of ϵ between 0 and 0.45, both BC models converge to a high average belief in the true hypothesis faster than individual learning with the same level of evidential noise (green line), meaning that the benefits of social learning are preserved even if noise is very high. However, for very high levels of noise where ϵ ≥ 0.4, BC with memory (black line) is significantly slower to converge than memoryless BC (blue line).
In this section, we have compared BC models with and without memory for probabilistic social learning with two hypotheses, where beliefs can be represented by single probability values. For learning problems of this kind, results suggest that BC with memory is more robust at mitigating the presence of zealots in the population than the memoryless benchmark. In particular, BC with memory is much less sensitive to the choice of similarity threshold τ, and furthermore the same value of τ = 0.01 is effective for all zealot types with z between 0.1 and 0.9. This is important for the efficacy of the approach since, unlike memoryless BC, no prior information is required concerning the type of zealots present in the population. Furthermore, BC with memory is significantly more robust to high levels of evidential distrust than memoryless BC. Indeed, even for relatively low levels of distrust, for example δ ≈ 0.15, the memoryless model is slower to converge than individual learning, while for δ ≥ 0.2 the convergence requirement is not met within the allocated 500 time steps. However, there are some circumstances in which BC with memory, although mitigating the effect of zealots, is significantly slower to converge than the memoryless benchmark with optimal τ. This includes scenarios where there is very high noise or very low connectivity. In the next section, we investigate the effectiveness of BC with memory for multiple hypotheses where m > 2.
A multi-hypotheses model
Here, we extend the BC with memory model to scenarios in which there are multiple competing hypotheses by mapping any given probability distribution over the hypotheses to a real value using the projection g defined above, and then applying the KS-based similarity measure to samples of these projected values.
Using this approach and the agent-based simulation model shown in Figure 2, we initially consider the case where m = 3 as this still allows for relatively straightforward visualisation of results. Figure 6(a) shows a ternary heat map of convergence times of BC with memory for a population with 20% zealots promoting varying probability distributions across the three hypotheses.
Figure 6: Social learning performance in the case of multiple hypotheses for different types of zealots. Figures 6(a) and (b) are ternary heat maps of convergence time for zealots promoting probability distributions across the three hypotheses.
Figure 7: Convergence time against τ for the three hypotheses (black line) and five hypotheses cases in the presence of zealots strongly promoting H2.
To further investigate the scalability of BC with memory, we now consider the case where there are m = 5 hypotheses. For this, we consider zealots promoting probability distributions lying on two lines in the probability simplex. The left hand side of Figure 8 shows convergence times in the presence of zealots promoting H2 to different degrees, while the right hand side is where zealots are promoting H1 to different degrees. For multi-hypotheses social learning we have adopted a memory window length of 20 time steps, that is, ω = 20.
Figure 8: Convergence time where there are five hypotheses for varying zealot probability distributions of two types; BC with memory (black line), memoryless benchmark with τ = 0.6 (blue line).
In summary, the proposed model of BC with memory generalises well to social learning with multiple hypotheses, showing robust performance in the presence of most types of zealots. In contrast, memoryless BC, when optimised to mitigate zealots strongly promoting a false hypothesis, is then susceptible to disruption from zealots who are either stubbornly uncertain or promote the true hypothesis to some limited extent. Indeed, for large numbers of hypotheses memoryless BC is slower to converge than BC with memory even for zealots promoting a false hypothesis to a weaker extent (see Figure 8). In the next section we introduce an additional type of agent that, while able to learn socially from its peers, has no direct access to evidence.
Agents without access to evidence
It is often the case in social learning that only a minority of agents have any direct access to evidence. Unlike zealots, such agents are willing and able to learn from their neighbours, some of whom may have access to evidence, and they can therefore potentially learn the true hypothesis. However, they are likely to be more susceptible than learning agents to the influence of zealots and to emerging erroneous beliefs in general. In this section, we will investigate social learning by populations consisting of three types of agents, the first two of which have already been the focus of previous sections: zealots, who constantly promote the same belief; learning agents, who learn socially but also have access to evidence and hence learn individually from the evidence they receive; and free riders, who learn socially from their neighbours but do not receive evidence. Populations consisting of these three agent types were studied for the HK bounded confidence model in Douven and Hegselmann (2022), with results suggesting that for a suitable choice of learning parameters social learning can be effective provided the proportion of free riders in the population is not too high relative to the proportion of zealots, but is significantly impeded if both proportions are too high. The HK model is memoryless and in this section we will consider the performance of BC with memory for these kinds of mixed populations. Specifically, we extend the agent-based model defined above to include free riders, where the operations performed by such an agent in a time step are as shown by the flow diagram in Figure 9 (and sketched in code below), and then investigate social learning performance in the binary hypotheses case for varying proportions of agent types.
Figure 9: Flow diagram showing the operations performed by a free rider agent in one time step of the agent-based model.
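In code, a free rider's time step is simply the learning agent's step from the earlier sketch with the evidence stage removed; the following is a sketch of our reading of Figure 9, again reusing the log_linear_pool helper from above.
```python
from scipy.stats import ks_2samp

def free_rider_step(belief, window, neighbours, tau=0.01, omega=10):
    # Identical to the learning agent's step but with the evidence-update
    # stage omitted: record the current belief, then pool with neighbours
    # whose recent trajectories pass the KS similarity threshold.
    window = (window + [belief[0]])[-omega:]
    pool = [belief] + [nb_belief for nb_belief, nb_window in neighbours
                       if ks_2samp(window, nb_window).pvalue >= tau]
    return log_linear_pool(pool), window
```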
Figure 10(a) is a ternary heat map for BC with memory showing the average probability of the true hypothesis at convergence across non-zealots, that is, learning agents and free riders, for varying proportions of the three agent types, and where zealots promote z = 0.1. BC with memory converges (Figure 10(a)) to a high average probability in the true hypothesis across non-zealots for a broad range of population types with varying proportions of the three types of agents. This includes where there is a relatively low proportion of learning agents compared to zealots and free riders. In contrast, for memoryless BC (Figure 10(b)) performance in the presence of zealots is degraded if there is also a significant proportion of free riders in the population. For these benchmark results the similarity threshold was set to τ = 0.8 on the basis of simulation experiments shown in Figure 11. This shows average probability of H1 across non-zealots for three different populations, all with 20% zealots with z = 0.1, but with varying proportions of free riders and learning agents. Threshold value τ = 0.8 gives optimal or close to optimal performance in all three cases.
Figure 10: Comparison of BC incorporating memory with the memoryless benchmark for different mixtures of three agent types: zealots (z = 0.1), free riders, and learning agents. Figures 10(a) and (b) are ternary heat maps showing average probability of the true hypothesis across non-zealots at convergence for varying proportions of the three types. (a) BC with memory and mixtures of agents, τ = 0.01; (b) memoryless BC and mixtures of agents, τ = 0.8.
Figure 11: Memoryless BC and mixtures of agents: average probability of the true hypothesis across non-zealots plotted against τ for three different mixtures of agent types; 20% zealots (z = 0.1), 70% free riders, 10% learning agents (black line); 20% zealots (z = 0.1), 40% free riders, 40% learning agents (red line); 20% zealots (z = 0.1), 10% free riders, 70% learning agents (blue line).
Figure 12 shows trajectories of average probability of the true hypothesis against time for the three types of agents in a population in which there are 40% learning agents, 40% free riders, and 20% zealots. Average probabilities are shown in black for learning agents, blue for free riders, and red for zealots. Figures 12(a)–(c) show the trajectories for memoryless BC for zealots with z = 0.1, z = 0.5, and z = 0.9, respectively. Notice that for all three types of zealots, a significant proportion of free riders do not learn effectively. Some free riders converge to very low probability values in H1, and this happens even in the cases in which the zealots to an extent promote H1. In these latter cases, this may result from free riders being misled by learning agents who have received erroneous evidence. Furthermore, some free riders converge on an intermediate probability value expressing ongoing uncertainty resulting from the direct and indirect influence of zealots. For BC with memory, as shown in Figures 12(d)–(f), the vast majority of free riders converge to a high probability in the true hypothesis in the presence of all three types of zealots. A relatively small number of free riders still do not learn effectively, but these converge to a range of different and widely distributed probability values.
Figure 12: Probability of the true hypothesis plotted against time for populations with 20% zealots (red lines), 40% free riders (blue lines), and 40% learning agents (black lines), and where the zealots promote z = 0.1, z = 0.5, and z = 0.9. Figures 12(a)–(c) show results for memoryless bounded confidence, while Figures 12(d)–(f) show results for bounded confidence incorporating memory. (a) Memoryless BC with z = 0.1, (b) memoryless BC with z = 0.5, (c) memoryless BC with z = 0.9, (d) BC with memory and z = 0.1, (e) BC with memory and z = 0.5, and (f) BC with memory and z = 0.9.
Conclusions and future work
We have proposed a probabilistic model of social learning that incorporates bounded confidence with memory. This uses the Kolmogorov-Smirnov test to measure the similarity between sets of probability values, and hence allows for the application of bounded confidence to trajectories of recent probability values over a time window rather than just current values. Agent-based simulation experiments suggest that BC with memory is less sensitive to the choice of similarity threshold than the memoryless probabilistic benchmark model. Furthermore, learning performance is much less susceptible to high levels of distrust in evidence across agents. These results then generalise from the two to the multi-hypotheses case, in which bounded confidence is evaluated based on sets of real values in the form of projections of multi-dimensional probability distributions. Finally, we have shown that incorporating memory into bounded confidence allows for effective social learning in a broad range of mixed populations comprising free rider agents as well as zealots and learning agents.
The power of social learning comes from its capacity to exploit emergent effects driven by the two local processes of evidential updating and opinion pooling to reach a population level consensus about the true state of the world. However, this can also make it vulnerable to the influence of unreliable agents, since these same underlying mechanisms can also facilitate the spread of false beliefs. Bounded confidence provides a potential heuristic strategy that agents can deploy to avoid unreliable individuals during learning. The intuition behind BC in this setting is that as an honest truth-seeking agent your belief will evolve in a particular way over time, and the beliefs of other reliable agents will follow a similar trajectory. This contrasts with unreliable, or at least less reliable, agents whose beliefs will tend to change in a different way, if at all. However, if the comparison between agents is based only on current beliefs then relevant information is lost about the evolution of opinions. This can result in the effectiveness of BC as a detection tool being highly contingent on choosing the right similarity threshold, where that value depends on the type of unreliable agents present in the population. This significantly reduces the applicability of BC since prior knowledge about agent types is rarely available in practice. The experiments in this paper show that incorporating memory into BC, by making similarity judgements based on samples of recent beliefs, greatly increases its robustness by making it much less sensitive to the choice of parameter values, and hence reduces the need for prior knowledge about the makeup of the agent population. This opens up the possibility of collective decision making applications in multi-agent and multi-robot systems where there is an inherent vulnerability to malfunctioning or malicious interventions causing cascading failures. It also serves to emphasise the importance of basing reliability assessments on a sufficient amount of information gathered over repeated interactions, and this may serve to guide us as we navigate social media and other online platforms.
The agent-based models investigated in this paper have a number of limitations which point to possible directions for future research. For example, modelling zealots as stubborn individuals who constantly signal the same probability value or distribution is simplistic. At any time such agents are likely to be much more reactive to the current spread of opinions across individuals with whom they are in communication, so as to best influence the emerging consensus whilst also avoiding being classified as unreliable. This can perhaps be formulated as a multi-objective optimisation problem in which a zealot chooses to transmit a trajectory of distributions that is sufficiently close to those of its neighbours to avoid exclusion, but whose inclusion in pooling will tend, over time, to draw those neighbours closer to its target opinion. As well as more sophisticated models of unreliable agents, it would be interesting to increase the heterogeneity of the agent population. For instance, we might model learning agents who obtain evidence at varying rates, of varying reliability, and with different levels of distrust.
The current model includes only a simple static model of agent communications in the form of a randomly generated small-world network. Of course, communications could be structured differently, and it would be interesting to consider other types of graph, including some with much sparser or more heterogeneous connectivity. Furthermore, in practice communication between agents may change dynamically, for example as agents move around a physical environment and therefore drop in and out of communication range with their peers. In this context it may be useful to study multi-agent models which include a dynamic physical element in which agents change their relative positions over time. Such models may also incorporate communication errors or noise of different types.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially funded by the INFORMED-AI project EP/Y028732/1.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Statement of significance
Social learning allows individuals to collectively learn from each other as well as from direct evidence. This has many advantages, including enabling agents to compare beliefs so as to correct errors and misconceptions, and providing an efficient mechanism to disseminate knowledge across a population. However, these strengths can also make social learning vulnerable to the presence of unreliable agents who may distort opinions, resulting in the spread of false information. There is a simple local heuristic called bounded confidence that can help filter out unreliable opinions, and which requires that agents only pay attention to opinions that are sufficiently similar to their own. The effectiveness of this rule as a filtering mechanism is known to be very sensitive to exactly how agents determine sufficient similarity. This paper introduces a model of social learning with bounded confidence, but with the additional feature that similarity judgements involve an element of memory and are made based on trajectories of recent opinions as they have evolved over time, rather than just on current opinions. Within this model it is shown that incorporating memory into bounded confidence makes it much more robust and less sensitive to the choice of the filtering threshold. This research has the potential to help improve the design of multi-agent and multi-robot systems performing collective decision making tasks by providing an effective local norm for reliability detection. It can also help us to understand the role that bounded confidence could play in human collective behaviour.
