
Abstract

Research on online news consumption has long focused on improving people’s ability to select accurate news sources. However, people who consume accurate information do not necessarily make an accurate collective. Reliable information aggregation and collective decision-making also benefit from independence between judgments. We tested whether collective incentives can improve collective accuracy by reducing judgment correlation, even when average individual accuracy remains unchanged. We define collective incentives as reward structures that tie an individual’s payoff to the accuracy of their group’s aggregated decision rather than their own personal accuracy. In an online experiment (N = 232), participants made forecasts after viewing realistic tweets from correlated or independent news sources. Shared tweets among group members were more accurate but fully correlated across group members; local tweets were less accurate but independent between participants. When rewarded for collective accuracy, participants used more independent local tweets, improving collective but not individual accuracy. Generalizing our findings, an exploratory analysis of public news datasets found a greater content correlation among larger publishers, modulated by accuracy. These findings highlight the overlooked importance of judgment independence alongside accuracy in creating thriving information ecologies online. Collective incentives may complement accuracy-based interventions by nudging news consumption towards diverse, local, and independent sources. We discuss how future research should implement collective incentives to improve collective decisions in real-world settings.

Keywords

News consumption, group incentives, collective learning, judgment aggregation, group size, forecasting

Introduction

This paper examines how reward structures influence collective accuracy in online news consumption. We introduce and test the concept of collective incentives—rewarding individuals based on the accuracy of their group’s aggregated decisions rather than their personal performance. Using an online forecasting experiment (N = 232) with realistic news stimuli, we show that collective incentives prompt individuals to rely on more independent, less correlated information sources, thereby improving collective accuracy without changing individual accuracy. We further corroborate these findings with exploratory analyses of public news datasets, revealing systematic patterns of content correlation in mainstream and local publishers. Together, these results highlight the overlooked role of judgment independence in information aggregation and suggest new strategies for designing healthier online information ecosystems.

How people consume information has dramatically changed over the past decade. Local journalism is facing an alarming decline, especially at the community level. In the United States, more than one-third of all newspapers have vanished since 2005 (Metzger, 2024). In 2024, 127 local papers shut down—roughly 2–3 closures every week—contributing to the rise of “news deserts” with no local coverage. Over half of U.S. counties now have only one or no local news outlet, leaving an estimated 55 million Americans with little to no access to vital local news. Mainstream media has gained trust at the expense of independent news outlets (Cellan-Jones, 2019; Fletcher and Park, 2017; Martin and McCrain, 2019; Newman et al., 2019; Park et al., 2020). This crisis is not confined to the U.S. In the United Kingdom, for example, more than 320 local newspapers closed between 2009 and 2019 as advertising revenues for regional media plummeted by about 70%. As a result, the share of UK adults who read a local or regional newspaper each week fell from 22% in 2015 to just 12% by 2023. Similar patterns of decline are evident across much of the world as audiences migrate online, where news spread is better predicted by partisanship and emotional content than factual accuracy (Lazer et al., 2018; Vosoughi et al., 2018). How these changes will affect collective information dynamics is still under debate (Bak-Coleman et al., 2021).

Information accuracy and independence are two pillars of accurate collective information aggregation and decision-making. Previous studies on online information environments have largely focused on improving news accuracy, for example, improving the detection of inaccurate news and motivating people to disseminate accurate information (Pennycook et al., 2021; Pennycook and Rand, 2021), enhancing content moderation (Garrett and Poulsen, 2019; Gillespie, 2018), innovating fact-checking approaches (Pennycook and Rand, 2019), emphasizing a publisher’s reliability (Dias et al., 2020), and predicting the spread of rumors and misinformation (Resnick, 2015; Ciampaglia et al., 2015; Bessi et al., 2016; Del Vicario et al., 2016; Bessi et al., 2015; Oh et al., 2010; Roozenbeek et al., 2020).

However, focusing solely on information accuracy overlooks the importance of information independence in settings requiring collective information aggregation and decision-making, such as forecasting, risk assessments, or public consultations on policy. Although mainstream sources often maintain high accuracy due to greater public scrutiny, they can also foster homogeneous opinions among readers. This can influence the quality of collective decisions by making the group vulnerable when the news source is wrong. Independent information sources can improve collective accuracy by reducing correlated errors (Becker et al., 2017; Bernstein et al., 2018; Dalkey and Helmer, 1963; Mann and Helbing, 2017). Smaller groups benefit from prioritizing an accurate but correlated source, whereas larger groups benefit from incorporating less accurate but more diverse sources (Lamberson and Page, 2012).

Collective accuracy improves as group size increases, assuming that individuals have a probability of being correct greater than chance $(\bar{p} > 0.5)$ and their judgments are not strongly correlated. As inter-judge correlation increases, increasing group size enhances group performance at a diminishing rate (Condorcet, 1785; Marshall et al., 2019; Ladha, 1992). Theoretically, as group size increases, group predictive accuracy is limited by the ratio of mean accuracy $\bar{p}$ to the square root of the mean inter-correlation of group members’ opinions $\bar{\rho}$ (Hogarth, 1978). This relationship highlights two strategies collectives can use to improve collective accuracy: raising the mean individual accuracy $(\bar{p})$ or reducing the mean interdependence $(\bar{\rho})$, representing the average pairwise correlation between individuals’ judgments $\rho_{ij}$.

Low correlation (and increased group accuracy) can stem from increased variance of group members’ estimates (Page, 2007) or from the degree to which individuals’ errors are dependent on one another across repeated decisions. In our study, we use a repeated-measure approach, controlling inter-judge correlation via repeated forecasting tasks. Additionally, we extend this framework beyond individual judgments to examine the correlation of news sources themselves—whether relying on a single highly accurate but widely shared news source introduces dependencies that affect group decision-making.

Building on this theoretical framework, we hypothesize that under specific conditions, listening to multiple, somewhat less accurate but independent local news sources—we call this a sleuth strategy—can be more beneficial for collective decision-making than relying on highly accurate but correlated mainstream sources—a consumer strategy (Figure 1(a)).

Figure 1.

(a) Individuals select one news source to gather more information to make a binary forecast about a future event. The global news source on the left is more accurate, on average over time, than local news sources on the right. However, when the majority of newsreaders consume it, it induces highly correlated judgments across the population. When individual judgments are aggregated (e.g., via majority rule), aggregating local sources generates higher collective accuracy (although not necessarily higher individual accuracy). (b) An accurate graphic representation of the experiment’s interface. Participants were shown a sequence of forecasting problems involving realistic events (e.g., “Will there be a case of Zika virus before August 2018?”). One correlated and one uncorrelated information source provided relevant information to make an accurate forecast. Each source either supported a positive outcome (e.g., source on the right) or opposed it (e.g., source on the left). (c) Average individual and group accuracy (left and right panel, respectively) for each incentive condition (color) and group size (x-axis). Group sizes are displayed as three tertiles (small, medium, and large groups) based on the average number of active users. See also Supplemental Table S1 for experimentally assigned group sizes.

Recent behavioral evidence and theoretical work show that incentives can influence individuals’ choice of information sources. Individual incentives tend to produce judgment correlation and herding behavior, as individuals tend to copy the single most accurate available source of information to maximize their performance (Bazazi et al., 2019; Mann and Helbing, 2017). By contrast, other incentive structures, including rewarding individuals for their collective accuracy, reduce herding and nudge individuals to make better use of all available information, including lower-accuracy uncorrelated sources (Bazazi et al., 2019; Hong et al., 2012; Kao et al., 2014; Stasser and Titus, 2003). Whether these findings can be applied to nudge newsreaders’ preferences and improve group decisions is currently unknown.

We tested these hypotheses using human volunteer groups in sequential forecasting judgments, studying the effects of incentives (individual vs. collective) and group size on accuracy and news source preference. On each round, participants (N = 232) were shown two news sources: a high-accuracy source shared with others in their group (the “Global” source), and a lower-accuracy but independent source unique to them (the “Local” source). Both sources aimed for truth, with accuracy rates above chance (70% for the global and 65% for the local source). Sources provided information as fictional tweets, based on which participants made realistic geopolitical judgments on a probability scale (Figure 1(b)). The fictional forecasting approach ensured that accuracy was dictated by the sources’ accuracy and not by participants’ prior knowledge.

When rewarded for their collective accuracy (as opposed to their individual accuracy), people displayed a preference for less accurate, but independent news sources over shared, correlated ones, despite the latter’s higher accuracy. We also observed an interaction between group size and incentive type, demonstrating that collective incentives amplify the effect of group size on collective accuracy. This interaction highlights how incentive structures can shape the effectiveness of group decision-making as group size increases.

Additionally, we conducted an exploratory analysis based on publicly available news datasets. We find a negative correlation between two measures of media locality and content similarity, which was modulated by news source accuracy. Despite its many limitations, this result hints at the relevance of our findings for the real online information environment.

Our findings draw attention to a previously unexplored area of intervention for building healthy online information environments. We conclude by discussing possible applications and limitations.

Method

Procedure

The study was approved by the Massachusetts Institute of Technology Institutional Review Board. The complete procedure is reported in Supplemental Information. Participants (N = 232) were recruited on Amazon Mechanical Turk and were compensated for their time ($9.50 per hour) and performance ($0.05 for every correct answer). They gave informed consent before starting the game. The game was implemented using the Empirica platform for group experimentation, developed at the MIT Media Lab, and publicly available at empirica.ly (Almaatouq et al., 2021). The experiment consisted of a series of realistic binary forecasting problems regarding geopolitical events (Figure 1(b)). Forecasts were initially collected on a 0–100 probability scale (0: impossible event, 100: certain event) to elicit both a participant’s choice (the event is more likely vs unlikely to happen) and confidence in the choice (confident vs uncertain). This allowed us to analyze first-order and second-order judgments separately (Carlebach and Yeung, 2023; Fleming and Daw, 2017; Fleming et al., 2014). Questions were independently generated by the IARPA Hybrid Forecasting Competition, a national forecasting competition aiming at improving the prediction accuracy of global events of geopolitical relevance. Outcomes were unknown at the time of the experiment. The computer simulated each event’s outcome and calculated performance based on the participants’ predictions. In other words, prior knowledge could not improve accuracy, and only news sources reliably predicted event outcomes. Participants were informed that the correct outcome of each forecast event was randomly generated by the computer and had no connection with real events. They were not explicitly informed about the differences between local and global sources or their accuracies; they had to infer this through trial and error. They were, however, told that the tweets, although fictional, contained key information needed to forecast the fictional events accurately. They were explicitly instructed to avoid using any personal knowledge on the topic and use only the information available in the experiment (i.e., the local and global sources). After every forecasting problem (e.g., “Will there be a locally-transmitted case of the Zika virus in Singapore between 1 Jul 2018 and 31 Aug 2018?”), participants in the individual incentive condition received text feedback on their individual decision (e.g., “You predicted: Yes”) and individual accuracy (e.g., “Your prediction was correct”). Participants in the collective incentive condition received feedback on the group majority’s decision (e.g., “Your group predicted: Yes”) and group accuracy (e.g., “Your group prediction was correct”).

Experimental conditions

Participants were randomized into two incentive conditions (Incentive: collective vs individual) and four different group sizes (Group Size: 1, 3, 7, or 15). In the collective incentive condition, participants were rewarded for their group accuracy. In the individual incentive condition, they were rewarded for their individual accuracy. Group accuracy was calculated round-by-round based on the majority of the assigned group size.

Participants waited in an online waiting room to allow everyone to log in and read the instructions. If the room did not reach the assigned group size by the end of 15 minutes, the experiment started with the participants available. Larger groups were more likely to start with fewer participants than assigned or to include idle members, introducing a selection bias that disproportionately affected larger groups. For example, as group accuracy was computed based on the assigned group size, groups that started below capacity had a harder time reaching a correct majority. To address these limitations, we present our effects using an intention-to-treat analysis, which treats group size as randomly assigned, and an as-treated analysis, which treats group size observationally as the average number of active users in a group. To recover the causal effect of group size (rather than the causal effect of being assigned to a given group size), we provide an additional instrumental variable analysis and bootstrapped coefficient estimates in the Supplemental Information. The single-player condition (group size = 1) acted as a control. A no-reward condition was not used because it would have incentivized poor performance and idle participants.
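One reading of the scoring rule described above can be sketched in Python: a round counts as correct for the group only when the correct votes exceed half of the assigned group size, so absent or idle members effectively count against the majority. The function name and the strict-majority threshold are illustrative assumptions, not the authors' implementation.

```python
def group_round_correct(votes_correct, assigned_size):
    """Return True if correct votes form a strict majority of the ASSIGNED size.

    votes_correct: list of booleans, one per participant who actually voted.
    assigned_size: group size set by the experimenter (1, 3, 7, or 15 in the text).
    Assumption: members who did not vote simply contribute no correct vote.
    """
    if len(votes_correct) > assigned_size:
        raise ValueError("more votes than assigned members")
    return sum(votes_correct) > assigned_size / 2


# A group assigned 15 members that starts with only 10 still needs 8 correct votes.
print(group_round_correct([True] * 7 + [False] * 3, assigned_size=15))  # False
print(group_round_correct([True] * 8 + [False] * 2, assigned_size=15))  # True
```

Under this reading, an under-filled group needs a larger share of its active members to be correct, which is the disadvantage the intention-to-treat and as-treated analyses are meant to address.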

Local and global information sources

The generative model producing the event outcomes is described in the Supplemental Information. On every round, participants were presented with one uncorrelated (local) and one correlated (global) information source, each of which provided evidence for or against the forecast event. Evidence was presented as a mock-up tweet. Figure 1(b) represents the actual user interface.

Local and global news sources differed (a) in their error rate and (b) in their correlation among group members (Figure 1(a) and Supplemental Figure S1). A source was incorrect when it provided supporting evidence for an event that did not happen or refuting evidence for an event that happened. Error rates were 35% and 30% for the local and global sources, respectively. The global source provided the same information to every participant. The local source provided information independently for each participant. This design created a limiting case where inter-judge correlation is $\bar{\rho} = 1$ if participants adopt a pure consumer strategy and $\bar{\rho} = 0$ if they follow a pure sleuth strategy. Each news source’s logo, name, and on-screen position (left/right) were randomized. The logos and names of all news sources, including those associated with the global sources, were unrecognizable to most participants (97%), which removed prior preferences or biases due to brand recognition. The number of account followers displayed was kept constant across participants. At the end of the experiment, a survey asked participants about their perception of their group’s size, the news source they relied on the most, and each source’s perceived accuracy rate.
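The two limiting strategies can be illustrated with a small Monte Carlo sketch in Python. It uses the accuracy rates stated above (70% global, 65% local) and majority voting. The function name, the number of simulated rounds, and the assumption that every member follows the same pure strategy are illustrative choices, not part of the experimental code.

```python
import random

GLOBAL_ACC = 0.70  # accuracy of the shared (global) source, from the text
LOCAL_ACC = 0.65   # accuracy of each independent (local) source, from the text


def simulate_group_accuracy(n_members, strategy, n_rounds=20000, seed=0):
    """Estimate majority-vote accuracy when all members follow one pure strategy.

    strategy: "consumer" (everyone copies the shared global source) or
              "sleuth" (everyone copies their own independent local source).
    """
    rng = random.Random(seed)
    correct_majorities = 0
    for _ in range(n_rounds):
        if strategy == "consumer":
            # One shared draw: all judgments are identical (correlation of 1).
            n_correct = n_members if rng.random() < GLOBAL_ACC else 0
        elif strategy == "sleuth":
            # One independent draw per member (correlation of 0).
            n_correct = sum(rng.random() < LOCAL_ACC for _ in range(n_members))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if n_correct > n_members / 2:
            correct_majorities += 1
    return correct_majorities / n_rounds


for size in (1, 3, 7, 15):
    consumer = simulate_group_accuracy(size, "consumer")
    sleuth = simulate_group_accuracy(size, "sleuth")
    print(f"n={size:2d}  consumer={consumer:.3f}  sleuth={sleuth:.3f}")
```

In this sketch the consumer strategy stays near the global source's accuracy whatever the group size, because a shared error is never outvoted, while the sleuth strategy gains from aggregation as the group grows.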

Results

Collective rewards favor large groups

Although strongly correlated (r (73) = 0.77, p < .001), experimentally assigned group size differed from the actual number of active participants in a group. Individual and group accuracy are shown in Figure 1(c) and Supplemental Table S1. Table 1 shows an intention-to-treat binomial regression model on group accuracy, with main effects for assigned group size and incentive (baseline: individual), and an interaction between group size and incentive. The model performed better than an alternative model without an interaction term (χ² (1) = 8.23, p = .004). Results showed a negative effect of group size on group performance in our baseline condition (β = −0.19, SE = 0.05, z = −3.89, p < .001) and a significant interaction between group size and incentive (β = 0.22, SE = 0.07, z = 2.86, p < .01). This suggests that the effect of assigned group size on group accuracy differed significantly between the two incentive conditions.

Table 1.

Intention-to-treat analysis on group accuracy. A binomial regression model on group accuracy with coefficients for assigned group size, incentive condition (baseline: individual incentive) and the interaction between group size and incentive. Formula: group accuracy ∼ size + incentive + size x incentive. Significance thresholds: . = p < .1, * = p < .05, ** = p < .01, *** = p < .001. p-value thresholds represent two-sided hypothesis testing.

Effects	Estimate	SE	z-value	p-value
(Intercept)	−0.13	0.04	−3.17	0.001 **
Size	−0.19	0.05	−3.89	<.001 ***
incentive_collective	0.08	0.08	1.01	0.30
size:incentive_collective	0.22	0.07	2.86	0.004 **

The intention-to-treat analysis in Table 1 estimates the effect of being assigned to a large (or small) group, rather than being in one. We thus replicated our results using an as-treated approach, where we treat the independent variable group size observationally, namely, as the actual number of active users in the group (Supplemental Table S2). The model is also shown in Figure 2(b). The dashed lines represent the accuracy of the local source (65%, in red) and the global source (70%, in blue). We can use the binomial distribution to estimate the expected group accuracy when using a pure sleuth strategy (dotted line). This expected accuracy is the probability that a binomial distribution with n = 15 (our largest group size) and p = .65 (the local news source accuracy rate) returns a majority of eight or more correct votes (89%, in purple). The solid black line represents the model fit. A positive slope representing the effect of group size is observed in the collective incentive condition but not in the individual incentive condition. Group incentives begin to outperform individual incentives with a group size of about three (vertical grey line). This group size is the smallest value of n in a binomial distribution with p = .65 at which the probability of a correct majority exceeds the global source’s accuracy (70%).
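The two binomial benchmarks mentioned above can be checked with a short Python sketch. The function names are illustrative. The calculation assumes fully independent votes, each correct with probability p, and a strict majority rule.

```python
from math import comb


def majority_accuracy(n, p):
    """Probability that more than half of n independent voters are correct,
    when each voter is correct with probability p."""
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def smallest_group_beating(p_local, p_global, sizes):
    """First group size in `sizes` whose local-source majority beats p_global."""
    for n in sizes:
        if majority_accuracy(n, p_local) > p_global:
            return n
    return None


print(round(majority_accuracy(15, 0.65), 3))   # about 0.887, the 89% benchmark
print(round(majority_accuracy(3, 0.65), 3))    # about 0.718, just above 70%
print(smallest_group_beating(0.65, 0.70, [1, 3, 7, 15]))  # 3
```

With three members, two or more correct votes occur with probability 3(0.65)²(0.35) + (0.65)³ ≈ 0.718, which is why a pure sleuth strategy overtakes the 70% global source at about that group size.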

Figure 2.

(a) Difference in group accuracy between the last and first decile of trials (y-axis) as a function of as-treated group size (x-axis) and incentive (represented by color). Marker size represents experimentally assigned group size (intention-to-treat group size). Positive values, implying learning, represent increases in group accuracy, while negative values represent decreases in group accuracy. (b) Logistic model quantifying the effect of group size (x-axis) and incentive condition (left-right panel) on collective accuracy (y-axis): group accuracy ∼ size + incentive + size x incentive (single participants removed). Solid black lines represent model fit. (c) Difference in agreement rate with local source between the last and first decile of trials (y-axis) as a function of as-treated group size (x-axis) and incentive (color). Marker color and size follow the convention described in Figure 2(a). Positive values represent increased preference for the local source. (d) Participants’ preference for the local sources when sources disagree with each other as a function of group size and incentive. Preference is defined as the match between participants’ forecasts and a source’s provided information. In Figure 1(b), for instance, the participant’s forecast (70%) matches the source on the right, but not the source on the left.

Assigned group size can be used as an instrumental variable to recover the causal effect of actual group size (rather than assigned group size). We ran a bootstrap analysis using a logistic two-stage residual inclusion (2SRI) model on group accuracy (see Supplemental Information). Results, reported in Supplemental Table S3 and Figure S2, broadly confirm our findings and replicate the interaction between group size and incentive type. We expected these effects to be the result of learning and thus emerge gradually over time (Figure 2(a)). However, adding time (measured as normalized round number) to our model did not significantly improve model fit (χ² (4) = 4.29, p > .3).

The same models were fitted on individual rather than group accuracy, to test whether the observed improvement in group accuracy was simply the product of improved individual accuracy (Figure 1(c)). The model included group size, incentive and their interaction (Supplemental Table S4). We found no effect of group size (β = 0.004, SE = 0.007, z = 0.57, p > .5), no effect of collective incentive (β = 0.04, SE = 0.06, z = 0.67, p = .5), and no significant interaction between group size and incentive (β = 0.004, SE = 0.01, z = 0.45, p > .6). Overall, the findings suggest that group accuracy improved even though average individual accuracy was unaffected by our manipulation. A linear model fitted to Brier scores, a quadratic prediction error measure used to assess forecast calibration (Tetlock, 2006), revealed a significant advantage of large groups (β = −0.002, SE = 0.001, t = −4.039, p < .001) but no interaction with incentive type (see Supplemental Information, Supplemental Table S6).
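For reference, the Brier score for binary events is the mean squared difference between probability forecasts and outcomes. The Python sketch below assumes that forecasts given on the 0–100 scale are first rescaled to 0–1. How forecasts were aggregated within groups before scoring is not specified here, so the function scores a plain list of forecasts.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts (0-1) and outcomes (0 or 1).

    Lower is better: 0 is a perfect forecast, 0.25 corresponds to always
    answering 0.5, and 1 is a confident forecast that is always wrong.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical responses on the 0-100 scale, rescaled to probabilities.
raw_forecasts = [70, 20, 90]
outcomes = [1, 0, 0]
probabilities = [x / 100 for x in raw_forecasts]
print(brier_score(probabilities, outcomes))
```

Because lower scores are better, a negative coefficient for group size in a model of Brier scores corresponds to an advantage for larger groups.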

Collectively rewarded individuals prefer local news

To understand the mechanisms underlying the observed advantage of large, collectively rewarded groups, we analyzed individual-level decisions. Collectively rewarded individuals should rely more on private information (local news source) than when individually rewarded (Kao et al., 2014). We found that incentives affected individual preferences for the local news source when news sources disagreed (Figure 2(c)–(d)). Given that round-level preferences for the global vs. local source were not directly observable, we inferred them from the participants’ behavior. We operationalized individual preferences for the local source as the agreement between the participant’s binary choice—namely, which event the participant believed was more likely to occur—and the local source’s binary evidence—namely, which outcome the local source suggested was more likely to occur. If a participant trusted the local news source, their forecasts should match the source. We ran a binomial regression on individual preference for the local source with fixed effects for group size, incentive (baseline: individual), and their interaction, focusing on rounds when news sources disagreed. Results revealed a negative effect of group size (β = −0.02, SE = 0.008, z = −2.37, p = .01) and a preference for local sources over global sources in collectively rewarded groups when news sources were in disagreement—namely, when participants had to decide which one to listen to (β = 0.03, SE = 0.01, z = 2.63, p < .01) (Table 2, Figure 2(d)).
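The preference measure described above can be sketched in Python as an agreement rate computed only on rounds where the two sources disagree. The field names, the 50-point cut-off used to turn a 0–100 forecast into a binary choice, and the exclusion of forecasts of exactly 50 are illustrative assumptions.

```python
def local_agreement_rate(rounds):
    """Share of disagreement rounds in which the binary choice matches the local source.

    rounds: iterable of dicts with keys
        'forecast'        - participant's answer on the 0-100 scale
        'local_says_yes'  - True if the local source supported the event
        'global_says_yes' - True if the global source supported the event
    Returns None when no usable round is available.
    """
    # Keep only rounds in which the participant had to side with one source.
    disagreements = [r for r in rounds if r["local_says_yes"] != r["global_says_yes"]]
    # Assumption: a forecast of exactly 50 expresses no binary choice.
    usable = [r for r in disagreements if r["forecast"] != 50]
    if not usable:
        return None
    matches = sum((r["forecast"] > 50) == r["local_says_yes"] for r in usable)
    return matches / len(usable)


example_rounds = [
    {"forecast": 70, "local_says_yes": True, "global_says_yes": False},   # sides with local
    {"forecast": 30, "local_says_yes": True, "global_says_yes": False},   # sides with global
    {"forecast": 20, "local_says_yes": False, "global_says_yes": True},   # sides with local
    {"forecast": 90, "local_says_yes": True, "global_says_yes": True},    # sources agree: skipped
]
print(local_agreement_rate(example_rounds))
```

The per-round match indicator computed here is the kind of binary outcome a binomial regression on group size and incentive would take as its dependent variable.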

Table 2.

Intention-to-treat analysis on the preference for the local news source when sources are in disagreement. Coefficients of the model on individual preference for local sources. The model’s predictors are assigned group size and incentive condition (baseline: individual incentive). Model formula: agreement with local source ∼ size + incentive + size × incentive. Significance thresholds: . = p < .1, * = p < .05, ** = p < .01, *** = p < .001. P-value thresholds represent two-sided hypothesis testing.

Effects	Estimate	SE	z-value	p-value
(Intercept)	−0.268	0.061	−4.41	<.001 ***
incentive_collective	−0.067	0.102	−0.66	.5
Size	−0.02	0.008	−2.373	.01 *
size:incentive_collective	0.032	0.012	2.631	<.01 **

At the end of the experiment, we asked participants to explicitly report what they believed each news source’s accuracy was and which news source they mostly relied on. The proportion of participants reporting having relied more on the local source was smaller in collectively rewarded than individually rewarded three-person groups (Figure 3(a)). In larger groups, the same proportion was larger in collectively rewarded groups than in individually rewarded groups. A two-way analysis of variance revealed a significant effect of group size (F (2, 41) = 7.17, p = .002). When asked to express their perception of each source’s percentage accuracy, participants in the individual reward condition perceived the global source to be more accurate, on average, than the local sources (Figure 3(b)). The reverse was true for collectively rewarded participants, although the effect was not significant. No significant interaction was found between group size and incentive when we analyzed ratings of subjective decision confidence (Supplemental Information). Overall, these findings suggest that higher collective accuracy was achieved even though people could not precisely tell (or only a fraction of them could) which of the two news sources was more accurate.

Figure 3.

Exit survey results. (a) Proportion of participants (y-axis) who reported having relied more on the local information source, divided by experimental condition. (b) Reported accuracy difference between local and global information source in the two incentive conditions. Error bars represent standard errors of the mean.

Locality, correlation, and accuracy in digital news media

How relevant is our controlled experiment to the real-world complexity of online news? Arguably, reliable mainstream publishers covering the same story may provide less social value than many smaller publishers covering independent stories (Martin and McCrain, 2019; Usher and Ng, 2020; Downing and Schlesinger, 2004). We ran an exploratory analysis using publicly available datasets. We explored the relationship between news media locality (mainstream vs local reach), news media accuracy, and correlation between news sources. The following results are preliminary and analyses were not preregistered.

We used the News Aggregator Dataset, freely available at Kaggle.com, from the UCI Machine Learning Repository (Lichman, 2013). This dataset contains headlines, URLs, and categories for 422,937 news stories collected by a web aggregator between March 10th, 2014 and August 10th, 2014, from 10,986 unique publishers, ranging from mainstream ones (e.g., Reuters and The Guardian) to small local ones. The News Aggregator Dataset uniquely identifies news articles that refer to the same news story (e.g., several articles about recently released employment statistics). There are 7,230 unique stories covered in this dataset. This feature allowed us to ask whether larger publishers, compared to smaller ones, tend to be more correlated both in the stories they cover and in the similarity of their content when they cover the same story.

First, we defined a coarse measure of publisher reach (degree of locality). As information about the size or popularity of the publisher was not available, we used how prolific the publisher was in the period covered (number of entries in the dataset) as a proxy of locality, under the assumption that larger newspapers would be more prolific and thus appear more often in aggregator searches. This measure turned out to be a surprisingly good approximation of a publisher’s size and notoriety. According to this measure, the most prolific publisher was Reuters, followed by several other well-known publishers such as Bloomberg. At the bottom of the distribution (with just one entry in the dataset) were several smaller publishers (e.g., Yale Daily News or The Cameron Herald). The measure showed a negative but not significant association with an independent measure of newspapers’ monthly reach in the United Kingdom (Pearson’s r (10) = −0.31), as surveyed by the Publishers Audience Measurement Company from April 2019 to March 2020. More precise measures (for example, readership volume, social media followers, or geographic indicators) should be investigated in the future.

Second, we defined two measures of correlation among news media, namely, Jaccard’s similarity in the stories covered and headlines’ linguistic similarity. Alternative correlation measures could be defined, like political slant, overlap of stories covered or language used. We first looked at the similarity in the set of stories covered. Arguably, two news media outlets covering the same set of stories provide more correlated information than if they covered different stories. We defined story similarity as the Jaccard index between the sets of stories covered by two publishers. Figure 4(a) shows story similarity between pairs of publishers, among the first 100 publishers ranked by output (prolific or not). A gradient can be observed from the top-left corner to the bottom-right corner, suggesting that the larger the publishers the larger the overlap of the set of stories they covered. We then correlated the difference in locality (i.e., difference in output) and story similarity. We used Reuters, the most prolific news source, as our reference point and calculated these measures for each publisher in the dataset with respect to it. We found a strong negative correlation between story similarity and publisher’s locality (ρ = −0.84, p < .001), confirming our earlier observation that large publishers seem to cover more overlapping stories than smaller ones.
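The story-similarity measure can be sketched in Python as the Jaccard index between sets of story identifiers, computed for each publisher against a reference publisher. The toy publisher names and story IDs below are invented for illustration and do not come from the News Aggregator Dataset.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def story_similarity_to_reference(stories_by_publisher, reference):
    """Jaccard similarity between each publisher's story set and the reference's."""
    ref_stories = stories_by_publisher[reference]
    return {
        publisher: jaccard(stories, ref_stories)
        for publisher, stories in stories_by_publisher.items()
        if publisher != reference
    }


# Hypothetical data: publisher -> set of story IDs it covered.
toy = {
    "BigWire": {"s1", "s2", "s3", "s4"},
    "NationalDaily": {"s1", "s2", "s3"},
    "TownHerald": {"s4", "s9"},
}
print(story_similarity_to_reference(toy, reference="BigWire"))
```

In this toy example the larger outlet shares three of four stories with the reference (0.75), while the small outlet shares one of five (0.2), which mirrors the direction of the pattern reported above.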

Figure 4.

(a) Story similarity between pairs of news publishers in the News Aggregator dataset. Story similarity was computed as the Jaccard similarity of the set of stories covered by pairs of publishers. The figure shows the first hundred publishers by output in the dataset. Greater story similarity is shown in the top-left part of the matrix. Data was interpolated with a Gaussian filter (σ = 2). (b) Story similarity as a function of publisher size (i.e., output difference with respect to Reuters) and publisher accuracy (i.e., factual reporting score calculated by Media Bias Fact Check). Publisher size was calculated with respect to Reuters, the largest publisher in the News Aggregator dataset. Story similarity was computed as the Jaccard similarity of the set of stories covered by each publisher with respect to the set of stories covered by Reuters. The dataset is publicly available at Kaggle.com (Lichman, 2013).

We then looked at story headline similarity for newspapers that covered the same story. This measure is better aligned with our experimental results. Our experimental study considered news sources with independent angles on the same story rather than similarity in stories covered. Headline similarity was computed as pairwise cosine similarity of term frequency–inverse document frequency (tf-idf) vectors. A significant negative correlation existed between a publisher’s difference in output with respect to Reuters (i.e., their locality) and the average similarity of the headlines, for each story covered by both publishers (ρ = −0.062, p < .001). In other words, the larger the publisher the more similar its story headlines were to Reuters’ headlines (Supplemental Figure S5). Notice that, as expected, publishers with high degrees of locality (rightmost data points in Figure 4(a)) also show higher variability in their headline similarity. Arguably, smaller outlets may cover more niche topics or serve smaller geographies than global newspapers. Overall, these findings support the notion that smaller news sources tend to be more diverse and independent in their content than larger global news sources.
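The headline similarity computation can be sketched in a few lines. The example below is a simplified, standard-library version that assumes whitespace tokenization and a smoothed idf weighting; the text does not specify the tokenizer or the exact tf-idf variant used in the analysis, so both are assumptions. The headlines are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(headlines):
    """Return one sparse tf-idf vector (dict: term -> weight) per headline."""
    tokenized = [h.lower().split() for h in headlines]
    n_docs = len(tokenized)
    # Document frequency: number of headlines containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption; other weighting schemes are possible).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical headlines: the first two are identical, the third shares no terms.
headlines = [
    "central bank raises interest rates",
    "central bank raises interest rates",
    "local school wins science fair",
]
vecs = tfidf_vectors(headlines)
sim_same = cosine(vecs[0], vecs[1])   # identical wording
sim_diff = cosine(vecs[0], vecs[2])   # no shared terms
```

Identical headlines yield a similarity of one and headlines with no shared terms yield zero; real headline pairs about the same story fall in between.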

Finally, we defined a coarse measure of publisher accuracy. Although a definition of media sources’ accuracy is not straightforward, some news watchdogs provide excellent reliability rating scores. Here, we used the factual reporting score calculated by the Media Bias Fact Check website (MBFC) (Gruppi et al., 2021; Media Bias Fact Check, 2018; Patricia Aires et al., 2019). MBFC is a volunteer-run fact-checking site that assesses the political slant and the credibility of factual reporting of a large number of news websites. Ratings are subjective but are based on a structured rubric and numerical scoring system to assign labels. We used MBFC scores for the category “Factual/Sourcing”. Data contained Factual scores for a total of 1570 unique news publishers, divided into three categorical labels for source reliability (“Mixed,” “High,” and “Very High”). We aggregated the High and Very High categories as the latter contained disproportionately fewer elements than the other two categories. We retained only publishers present in both the MBFC and the News Aggregator datasets (N = 989). We then tested whether the association between story similarity and publisher locality was modulated by the publisher’s reliability using a regression model. We found that MBFC publisher reliability modulated the association between story similarity and publisher size (Figure 4(b)). Local news media scored lower on story similarity with respect to Reuters (β = −0.29, SE = 0.09, t = −3.14, p = .002). This effect was stronger for accurate than inaccurate sources, suggesting that accurate mainstream media in our dataset tended to cover more similar stories (β = −0.30, SE = 0.12, t = −2.52, p = .014).

Discussion

This study examined how collective incentives and independent news sources might improve collective accuracy in judgments based on news content. The key findings demonstrate that collective incentives nudged participants to use more independent yet less accurate news sources. This “sleuth strategy” enhanced collective but not individual accuracy, especially for larger groups. An analysis of news datasets revealed that mainstream outlets showed greater content correlation, which was influenced by accuracy.

When incentivized for individual accuracy, participants relied more on the single most accurate news source (the “consumer strategy”), even if the source was correlated with other people. Although beneficial for individuals and expected according to reinforcement learning mechanisms (Behrens et al., 2008; Sutton and Barto, 1998), the consumer strategy made groups sensitive to correlated mistakes (Mann and Helbing, 2017). Wrong majority decisions are expected every time the global news source is wrong. Individual incentives are common in many decisions, from markets to daily life choices. Contrary to earlier studies, recent studies found that individual incentives can produce herding and reduce the diversity of opinions in the population (Bazazi et al., 2019; Gürçay et al., 2015; Mann and Helbing, 2017). Many tasks are characterized by optimally aggregating shared and private information (Stasser and Titus, 2003). Collective rewards may help populations better leverage distributed knowledge.
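The cost of correlated mistakes can be illustrated with a stylized calculation. The sketch below compares two pure strategies under majority voting: every member following one shared source (the group is right only when that source is right) versus every member following an independent source. The accuracy values 0.7 and 0.6 are hypothetical, and this idealized comparison is not the model of Kao et al. (2014) or the analysis reported in this paper.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


p_global = 0.7  # hypothetical accuracy of the shared (fully correlated) source
p_local = 0.6   # hypothetical accuracy of each independent local source
group_sizes = [1, 5, 15]

# Pure consumer strategy: all members copy the same source, so the majority
# is correct exactly when that source is correct, regardless of group size.
consumer = {n: p_global for n in group_sizes}

# Pure sleuth strategy: members rely on independent sources, so errors can
# cancel out and majority accuracy grows with group size.
sleuth = {n: majority_accuracy(p_local, n) for n in group_sizes}
```

Under these assumed values the independent sources are worse for an individual and for a group of five, but a group of fifteen relying on them outperforms the same group relying on the single shared source.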

Group decision-making models and reinforcement learning models in animal groups provide a mechanistic description of this phenomenon (Kao et al., 2014; Mann and Helbing, 2017). Although the importance of information independence in judgment aggregation has long been known, researchers have debated how to achieve it (Asch, 1956; Bernstein et al., 2018; Dalkey and Helmer, 1963; Navajas et al., 2018). Our findings show that promoting shared incentives can increase judgment diversity and reduce herding. The greater attention paid to local news sources was likely the result of learning by association (i.e., by trial and error) which news source was predictive of a positive outcome. The lack of a statistically significant effect of time in our model likely reflects either a lack of statistical power or preferences for a particular source developing early in the experiment and then remaining relatively stable.

Our second contribution lies in applying these mechanisms to news media. Our results highlight the overlooked value of judgment independence alongside accuracy when designing interventions to improve the online information ecosystem. While prior work focused on improving the accuracy of news and judgments (Guess et al., 2020; Pennycook et al., 2020, 2021; Van der Linden et al., 2020), our findings suggest that the use of collective incentives is an underexplored avenue for behavioral interventions. Gains in news judgment accuracy must be weighed against any accompanying increase in average judgment correlation (Hahn et al., 2019).

In this study, we made three notable design choices. The first is the use of a forecasting task. Arguably, accuracy in news reporting and accuracy in predicting the future are very different things. However, the phenomena highlighted in this paper generalize to all domains where a ground truth is defined and fact-checking is possible, for example, public consultations on some policies or risk assessments.

The second is the use of truth-seeking sources instead of less benign news types (such as fake news). Understanding fake news is undeniably important but outside the scope of this paper. Here, we focus on more common truth-seeking scenarios (Allen et al., 2020; Guess et al., 2019). According to our framework, fake news represents the worst of two worlds because it lowers mean accuracy (spreading misinformation) and increases mean correlation (spreading information widely) (Gabbatt, 2019; Mahone and Napoli, 2020; Nyhan, 2019).

The third design choice was to present both information sources at the same time and infer participants’ preferences from their responses. This design was chosen to closely match the model of Kao et al. (2014). In the context of online information, individuals are exposed to multiple information sources when scrolling through posts or notifications. Based on these simultaneous stimuli, they choose which piece of information to pay attention to and what news to rely on. An alternative design would force participants to choose between news sources and reveal the news only after a choice is made. This promising method could increase statistical power and give researchers a clearer measure of subjective preferences (Table 2).

From our preliminary data analysis on the News Aggregator and MBFC datasets, news media accuracy predicted greater correlation among mainstream media. Our results suggest that documented “winner-take-all” dynamics in news production and ownership can damage collective decisions (Cellan-Jones, 2019; Downing and Philip Schlesinger, 2004; Libert and Binns, 2019; Martin and McCrain, 2019; Newman et al., 2019; Usher and Ng, 2020).

Mixed strategies are better than pure consumer or sleuth strategies (Figure 1(C) in Kao et al., 2014). Participants in our experiment showed evidence of using mixed strategies, alternating between local and global sources even in later stages of the experiment (see Supplemental Material) (Lo, 2013; Newell and Schulze, 2016; Schulz et al., 2015) and did not seem to follow simpler heuristics (Supplemental Information, Figure S4). As participants in the experiment did not follow pure consumer or pure sleuth strategies, we expected some benefit of group size in both conditions, albeit more pronounced in the collective incentive condition (Hogarth, 1978). Thus, the lack of an effect of group size in individually rewarded groups was surprising (Figure 1(c)). A lack of statistical power and the issues around group size manipulation described above might help explain this result.

Limitations

We acknowledge several limitations. First, managing group size proved challenging due to attrition and non-compliance, which disproportionately affected larger groups. Larger groups filled more slowly and often hit the 15-min waiting room cap before reaching the target size, leading to games starting with fewer participants than assigned. These groups were also more likely to include idle participants who did not make forecasts, which impacted majority decisions. The experiment relied on the assigned group size for majority calculations (e.g., > 7 for a 15-person group) rather than the number of active users. This disadvantaged larger groups that had idle members or started below capacity, as they were less likely to meet the majority threshold. These effects worked against our effects of interest and thus cannot explain our findings. For completeness, we reported analyses based on the experimentally assigned group size (intention to treat) and, observationally, on the number of active users in the group (as-treated). We also replicate the main results in Supplemental Information using an instrumental variable approach to recover the causal effect of actual group size.
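To make the majority threshold issue concrete, the sketch below contrasts a majority computed against the assigned group size with one computed against the number of active users. The function name and the counts of active users and correct forecasts are hypothetical; only the 15-person example with a threshold of more than 7 comes from the text.

```python
def majority_correct(n_correct, group_size):
    """True if strictly more than half of group_size forecasts were correct."""
    return 2 * n_correct > group_size


assigned_size = 15  # group size assigned by the experiment
active_users = 10   # hypothetical: members who actually submitted a forecast
n_correct = 6       # hypothetical: correct forecasts among active members

# Threshold based on assigned size: more than 7 correct forecasts are needed.
by_assigned = majority_correct(n_correct, assigned_size)

# Threshold based on active users: more than 5 correct forecasts are needed.
by_active = majority_correct(n_correct, active_users)
```

Here six correct forecasts out of ten active members would count as a correct majority among active users but not against the assigned size of fifteen, which is the sense in which idle members penalized larger groups.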

Individuals in a group never interacted with one another, limiting generalizability to real-world media. Peer-to-peer communication and homophilous selection have been shown to negatively affect aggregate accuracy due to increased herding (Alipourfard et al., 2020; Hahn et al., 2019; Lorenz et al., 2011). News sources may have much less to do with one’s desire for accuracy than with one’s choice of friends. Another difference with real-world settings is the presence of content recommendation algorithms, which were entirely absent from our experiment (Ciampaglia et al., 2018; Pescetelli et al., 2022; Shao et al., 2018; Stella et al., 2018). Arguably, these unique online media features tend to further reduce independence by exploiting correlation patterns between users.

Another limitation is that precisely distinguishing mainstream and independent sources remains difficult outside controlled conditions. Online, news providers are likely to rely on third-party information sources, like AP and Reuters, within a complex web of dependencies that are difficult to untangle (Libert and Binns, 2019). Although the news ecosystem is more complicated than our two-source toy world, we can still design interventions that don’t rely on an a priori definition of local and global sources. Participants in our study did not know in advance which news sources were independent or accurate. Collective incentives turned out to be an effective and agnostic behavioral intervention, shifting people’s preferences to the collective advantage.

Applications

An obvious application of our results is to combat misinformation online. We call for scholars to develop better measures of the quality of online information ecologies that include not only the accuracy of online news but also their correlation. Improved measures of correlation between news media will help us track patterns of mutual influence and assess the health of the digital news ecosystem, providing valuable insights for policymakers. Regulation could require online platforms to provide these measures and penalize algorithms proportionally to the average information correlation of recommended content. Testing these interventions is not straightforward and will require cooperation of news providers, online platforms, policymakers, and scholars (King et al., 2017).

Beyond the online information domain, these results shed light on the role of incentives in collective accuracy, a role that is likely to generalize more broadly. For example, individual investors making private financial decisions are better off following the most accurate source, as this accrues greater individual benefit. However, an investment company might be better off using collective incentives to nudge people to rely on their private, independent sources.

To conclude, prior literature on online information has overlooked the role of news correlation. Accurate but mainstream news content may be problematic when individual preferences are aggregated. Collective incentives may help drive consumption of more independent news sources.

Supplemental Material

Supplemental Material for Collective incentives improve group accuracy by reducing reliance on shared news sources by Niccolò Pescetelli, Alex Rutherford, Iyad Rahwan, Albert Kao in Collective Intelligence

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2017-17061500006. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

Open science statement

Reproducibility

Data and analysis are made available online via OSF.

Significance statement

This paper demonstrates the overlooked value of independent news sources for improving collective decisions, even if less individually accurate. The findings show collective incentives can nudge consumption towards diverse sources and enhance aggregation resilience. This paper connects research on online news with collective intelligence and reinforcement learning models of animal groups. The results have significance for scholarship on epistemic threats and real-world applications to design thriving online information ecosystems.

ORCID iDs

Niccolò Pescetelli

Iyad Rahwan

Albert Kao

References

Alipourfard, Nettasinghe, Abeliuk, et al. (2020) Friendship paradox biases perceptions in directed networks. Nature Communications 11(1): 707.

Allen, Howland, Mobius, et al. (2020) Evaluating the fake news problem at the scale of the information ecosystem. Science Advances 6(14): eaay3539.

Almaatouq, Becker, Houghton, et al. (2021) Empirica: a virtual lab for high-throughput macro-level experiments. Behavior Research Methods 53(5): 2158–2171.

Asch (1956) Studies of independence and conformity: a minority of one against a unanimous majority. Psychological Monographs: General and Applied 70(9): 1–70.

Bak-Coleman, Alfano, Barfuss, et al. (2021) Stewardship of global collective behavior. Proceedings of the National Academy of Sciences of the United States of America 118(27): e2025764118.

Bazazi, von Zimmermann, Bahrami, et al. (2019) Self-serving incentives impair collective decisions by increasing conformity. PLoS One 14(11): e0224725.

Becker, Brackbill, Centola (2017) Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America 114: 201615978.

Behrens TEJ, Hunt, Woolrich, et al. (2008) Associative learning of social value. Nature 456(7219): 245–249.

Bernstein, Shore, Lazer (2018) How intermittent breaks in interaction improve collective intelligence. Proceedings of the National Academy of Sciences of the United States of America 115(35): 8734–8739.

Bessi, Coletto, Davidescu, et al. (2015) Science vs conspiracy: collective narratives in the age of misinformation. PLoS One 10: e0118093.

Bessi, Zollo, Del Vicario, et al. (2016) Users polarization on Facebook and Youtube. PLoS One 11(8): e0159641.

Carlebach, Yeung (2023) Flexible use of confidence to guide advice requests. Cognition 230: 105264.

Cellan-Jones (2019) Who will pay for trusted news? BBC News.

Ciampaglia, Shiralkar, Rocha, et al. (2015) Computational fact checking from knowledge networks. PLoS One 10: e0128193.

Ciampaglia, Nematzadeh, Menczer, et al. (2018) How algorithmic popularity bias hinders or promotes quality. Scientific Reports 8(1): 15951.

Condorcet (1785) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: de l’Imprimerie Royale.

Dalkey, Helmer (1963) An experimental application of the DELPHI method to the use of experts. Management Science 9(3): 458–467.

Del Vicario, Vivaldo, Bessi, et al. (2016) Echo chambers: emotional contagion and group polarization on Facebook. Scientific Reports 6(1): 37825.

Dias, Pennycook, Rand (2020) Emphasizing publishers does not effectively reduce susceptibility to misinformation on social media. Harvard Kennedy School Misinformation Review.

Downing JDH, Philip Schlesinger DMQEW (2004) The Sage Handbook of Media Studies. Sage Publications.

Fleming, Daw (2017) Self-evaluation of decision performance: a general Bayesian framework for metacognitive computation. Psychological Review 124(1): 1–59.

Fleming, Maniscalco, et al. (2014) Action-specific disruption of perceptual confidence. Psychological Science 26(1): 89–98.

Fletcher, Park (2017) The impact of trust in the news media on online news consumption and participation. Digital Journalism 5(10): 1281–1299.

Gabbatt (2019) How local ’fake news’ websites spread ’conservative propaganda’ in the US.

Garrett, Poulsen (2019) Flagging Facebook falsehoods: self-identified humor warnings outperform fact checker and peer warnings. Journal of Computer-Mediated Communication 24(5): 240–258.

Gillespie (2018) Custodians of the Internet. New Haven & London: Yale University Press. ISBN 9780300173130.

Gruppi, Horne, Adalı (2021) NELA-GT-2020: a large multi-labelled news dataset for the study of misinformation in news articles. ArXiv.

Guess, Nagler, Tucker (2019) Less than you think: prevalence and predictors of fake news dissemination on Facebook. Science Advances 5(1): eaau4586.

Guess, Lerner, Lyons, et al. (2020) A digital media literacy intervention increases discernment between mainstream and false news in the United States and India. Proceedings of the National Academy of Sciences of the United States of America 117(27): 15536–15545.

Gürçay, Mellers, Baron (2015) The power of social influence on estimation accuracy. Journal of Behavioral Decision Making 28(3): 250–261.

Hahn, von Sydow, Merdes (2019) How communication can make voters choose less well. Topics in Cognitive Science 11(1): 194–206.

Hogarth (1978) A note on aggregating opinions. Organizational Behavior and Human Performance 21(1): 40–46.

Hong, Page, Riolo (2012) Incentives, information, and emergent collective accuracy. Managerial and Decision Economics 33(5-6): 323–334.

Kao, Miller, Torney, et al. (2014) Collective learning and optimal consensus decisions in social animal groups. PLoS Computational Biology 10(8): e1003762.

King, Schneer, White (2017) How the news media activate public expression and influence national agendas. Science 358(6364): 776–780.

Ladha (1992) The Condorcet jury theorem, free speech, and correlated votes. American Journal of Political Science 36(3): 617.

Lamberson, Page (2012) Optimal forecasting groups. Management Science 58(4): 805–810.

Lazer DMJ, Baum, Benkler, et al. (2018) The science of fake news. Science 359(6380): 1094–1096.

Libert, Binns (2019) Good news for people who love bad news. In: Proceedings of the 10th ACM Conference on Web Science - Websci ’19. New York, NY: ACM Press, pp. 155–164.

Lichman (2013) UCI machine learning repository.

Lo (2013) The adaptive market hypothesis.

Lorenz, Rauhut, Schweitzer, et al. (2011) How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences of the United States of America 108(22): 9020–9025.

Mahone, Napoli (2020) Hundreds of hyperpartisan sites are masquerading as local news. This map shows if there’s one near you.

Mann, Helbing (2017) Optimal incentives for collective intelligence. Proceedings of the National Academy of Sciences of the United States of America 114(20): 5077–5082.

Marshall, Kurvers, Krause, et al. (2019) Quorums enable optimal pooling of independent judgements in biological systems. eLife 8: e40368.

Martin, McCrain (2019) Local news and national politics. American Political Science Review 113(2): 372–384.

Media Bias Fact Check (2018) Media bias fact check.

Metzger (2024) The State of Local News. Northwestern University, Medill. Technical report.

Navajas, Niella, Garbulsky, et al. (2018) Aggregated knowledge from a small number of debates outperforms the wisdom of large crowds. Nature Human Behaviour 2(2): 126–132.

Newell, Schulze (2016) Probability matching. In: Cognitive Illusions: Intriguing Phenomena in Judgement, Thinking and Memory. Abingdon: Psychology Press, 504. chapter 3.

Newman, Fletcher, Kalogeropoulos, et al. (2019) Reuters Institute Digital News Report 2019. University of Oxford. Technical report.

Nyhan (2019) Americans trust local news. That Belief Is Being Exploited. https://www.nytimes.com/2019/10/31/upshot/fake-local-news.html

Kwon, Rao (2010) An exploration of social media in extreme events: rumor theory and Twitter during the Haiti earthquake 2010. ICIS 2010 Proceedings. https://aisel.aisnet.org/icis2010_submissions/231

P Resnick, QMZZ (2015) Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web (ACM), 1395–1405.

Page (2007) The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press. ISBN 9780691138541.

Park, Fisher, Flew, et al. (2020) Global mistrust in news: the impact of social media on trust. International Journal on Media Management 22(2): 83–96.

Patricia Aires, G Nakamura, F Nakamura (2019) A Link-based approach to detect media bias in news websites. In: Companion Proceedings of the 2019 World Wide Web Conference. New York, NY, USA: ACM, 742–745. doi:10.1145/3308560.3316460.

Pennycook, Rand (2019) Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences of the United States of America 116(7): 2521–2526.

Pennycook, Rand (2021) The psychology of fake news. Trends in Cognitive Sciences 25(5): 388–402.

Pennycook, McPhetres, Zhang, et al. (2020) Fighting COVID-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychological Science 31(7): 770–780.

Pennycook, Epstein, Mosleh, et al. (2021) Shifting attention to accuracy can reduce misinformation online. Nature 592: 590–595.

Pescetelli, Barkoczi, Cebrian (2022) Bots influence opinion dynamics without direct human-bot interaction: the mediating role of recommender systems. Applied Network Science 7(1): 46.

Roozenbeek, Schneider, Dryhurst, et al. (2020) Susceptibility to misinformation about COVID-19 around the world. Royal Society Open Science 7(10): 201199.

Schulz, Konstantinidis, Speekenbrink (2015) Exploration-exploitation in a contextual multi-armed bandit task. In: Proceedings of the 13th International Conference on Cognitive Modeling. Groningen, NL, 118–123.

Shao, Ciampaglia, Varol, et al. (2018) The spread of low-credibility content by social bots. Nature Communications 9(1): 4787.

Stasser, Titus (2003) Hidden profiles: a brief history. Psychological Inquiry 14(3): 304–313.

Stella, Ferrara, De Domenico (2018) Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences of the United States of America 115(49): 12435–12440.

Sutton, Barto (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Tetlock (2006) Expert Political Judgment: How Good is it? How can we Know? Princeton: Princeton University Press.

Usher, Ng YMM (2020) Sharing knowledge and “microbubbles”: epistemic communities and insularity in US political journalism. Social Media + Society 6(2): 205630512092663.

van der Linden, Roozenbeek, Compton (2020) Inoculating against fake news about COVID-19. Frontiers in Psychology 11: 566790.

Vosoughi, Roy, Aral (2018) The spread of true and false news online. Science 359(6380): 1146–1151.
