Abstract
Recent rises in political polarization across the globe are often ascribed to algorithmic content filtering on social media, news platforms, or search engines. The widespread usage of news recommendation systems (NRS) is theorized to drive users into homogeneous information environments and, thereby, drive affective, ideological, and perceived polarization. To test this assumption, we conducted an online experiment (n = 750) with running algorithms that enrich content-based NRS with negative or neutral sentiment. Our experiment finds only limited evidence for polarization effects of content-based NRS. Nevertheless, the time spent with an NRS and its recommended articles seems to play a crucial role as a moderator of polarization. The longer participants used an NRS enriched with negative sentiment, the more affectively polarized they became, whereas participants using an NRS incorporating balanced sentiment ideologically depolarized over time. Implications for future research are discussed.
Introduction
Every day, Internet users interact with personalized and customizable recommendation systems, receiving tailored content based on individual profile information and their past preferences (Pariser, 2011). This is, for instance, the case whenever users receive news through social media platforms such as Facebook, Twitter, or Instagram, but also when using other news intermediaries such as search engines (e.g., Google News). While news recommendation systems (NRS) can help users with information overload, the possibility of customizing news content allows readers to selectively filter out news that seems irrelevant or counter-attitudinal, possibly leaving them with only one ideological perspective (Sunstein, 2001). The spread of these personalized NRS and their growing importance in exposing people to political information and shaping their political viewpoints concerns theorists and public opinion makers alike (e.g., Dahlberg, 2007; Papacharissi, 2002; Bakshy et al., 2015).
This is due to the idea that discussions and information exposure about politics might be taking place in insulated groups, separated along party or ideological lines, with little or no contact between the groups (Bright, 2018), implying that people are captured in self-selected “echo chambers” (Sunstein, 2001) or algorithmically induced “filter bubbles” (Pariser, 2011), only communicating with those who have similar ideological viewpoints. One element seemingly affecting this process is the sentiment of the news coverage. NRS favor strong sentiment, as content containing high amounts of sentiment was found to make news articles more popular (Hsu et al., 2019) and to be shared more on social media platforms (Stieglitz & Dang-Xuan, 2013). Often, however, strong negative sentiment in news is connected to opinionated or partisan media coverage (Sheafer, 2007), possibly furthering the separation along party or ideological lines. As deliberative democracy implies the need for exposure to a range of diverse viewpoints in order to make well-informed decisions (Gentzkow & Shapiro, 2010), a possible consequence of this development is growing political polarization, as an individually tailored news diet might amplify the growing distance between political parties, their supporters, and ideologies (Warner, 2010). Empirical research has begun to investigate (a) to what extent filter bubbles actually occur in the context of NRS and (b) which consequences these algorithmic information environments have on the user level (for an overview, see Ludwig & Müller, 2022; Rau & Stier, 2019). Yet, evidence on both questions is mixed at best, and many blind spots remain. For instance, most studies rely on survey data or experiments that only mimic news recommendations using screenshots as stimuli. These are important studies, which have, in many different ways, informed the design of the present experiment and provide the groundwork for all current NRS research.
Nevertheless, they are not able to differentiate between possible NRS effects and other polarizing mechanisms such as selective exposure (e.g., Stroud, 2010) or users’ custom filtering criteria in the platforms’ settings (Dylko et al., 2017). Furthermore, they do not fully capture actually implemented NRS on news platforms and social media and, therefore, lack external validity. So far, there are only a few studies testing the effects of actually running NRS (e.g., Möller et al., 2018; Neumann et al., 2021). With this study, we want to address these shortcomings and contribute to the research findings on political polarization effects of running NRS. Another research gap concerns the time spent with the NRS. Different types of news reading habits have been distinguished, such as systematic, long-lasting, in-depth information processing in contrast to more heuristic news consumption habits such as “snacking,” which aim at getting a basic overview of current news events and are rather short in duration (Bohner et al., 1995; Costera Meijer & Groot Kormelink, 2015). Research has yet to find out how these different consumption habits, and different amounts of time spent with recommendation systems, affect dimensions of political polarization, especially in the context of algorithmically curated environments. This study will, therefore, test in an online experiment different versions of content-based NRS, partly enriched with sentiment, based on the news coverage of refugees and migration, to examine their impact on affective, ideological, and perceived polarization, while controlling for the time spent with the NRS.
Polarization and Recommendation Systems
Many Western cultures, especially the U.S., have witnessed a rise in political polarization in recent decades. Political polarization is described as “a process whereby the normal multiplicity of differences in a society increasingly align along a single dimension and people increasingly perceive and describe politics and society in terms of ‘Us’ versus ‘Them’” (McCoy et al., 2018). Political polarization is seen as one of the major factors influencing societal and political processes in recent decades (e.g., Fiorina & Abrams, 2008; Baldassarri & Gelman, 2008). This applies to, for example, growing animosities between counter-partisans (Abramowitz & Saunders, 2008), growing opinion radicalization (Baldassarri & Gelman, 2008), or even political violence (Jensen et al., 2012). Polarization is not a one-dimensional phenomenon; it can occur in different forms. Three main types of political polarization need to be differentiated: affective, ideological, and perceived polarization. These types of polarization can be simultaneously present and are frequently interlocked, affecting each other. The importance of differentiating these types of polarization lies in their varied individual and socio-political outcomes described below.
Affective and ideological polarization are both characterized by a separation of individuals of different political camps, typically from the ideological left and right, over policy differences (Webster & Abramowitz, 2017). In the case of affective polarization, this manifests in a strong liking of one’s own party, and a close attachment to it, accompanied by the simultaneous dislike of the opposing party and a wish for distance toward it or its members (Iyengar et al., 2012). Ideological polarization is similar to affective polarization but based on the distance between rejection and support of issue stances or attitudes toward political topics (DiMaggio et al., 1996). Ideological polarization is, therefore, frequently, but not necessarily always, connected to partisan identification. Especially in countries with multi-party systems, as is the case with our study context Germany, ideological polarization might be a more useful indicator of societal segregation than affective polarization, with the latter being more relevant in two-party systems. Perceived polarization, in turn, is how much a person perceives the opinion climate in society to be polarized along party lines or ideologies (e.g., Yang et al., 2016). Perceived polarization, therefore, is not polarization as such, but rather its individual assessment in a given society. A heightened perceived polarization might, nevertheless, also increase the other types of polarization mentioned above, as the belief in a strongly polarized society might lead to more extreme positions and a wish for distance from one’s opponents.
The rise of digital media, especially of algorithmically curated or individually customizable environments such as social media platforms, news outlets, and search engines, is held at least partly responsible for the creation of “filter bubbles” and the rise of all three forms of political polarization in many Western cultures (Pariser, 2011). The “filter bubble” hypothesis states that individualized algorithm-driven NRS favor news items that match users’ prior political attitudes. This is assumed to create gated information environments that reinforce users’ existing attitudes, ultimately leading to societal polarization (van Aelst et al., 2017).
NRS can be generally described as algorithmic tools for filtering and suggesting news items that might be of interest to a news reader, with the underlying goal for the news outlets to maximize the time news consumers spend reading content on their webpage. Commonly used NRS employ content-based (CB), collaborative filtering (CF), demographic filtering (DF), and knowledge-based systems (KB), often combined into hybrid forms. CF algorithms learn users’ preferences from their past actions as well as other users’ past actions. CB is the most used NRS (Raza & Ding, 2021), calculating similarities between items based on their feature vectors. For a news article, this vector could, for example, contain the topics of the article and the news outlet. In contrast to CF, CB only uses the active user’s past ratings, not other users’ ratings, for suggesting new items. The recommendation list consists of items that are similar to the ones the user has liked in the past (de Gemmis et al., 2015). DF is based on the assumption that demographically similar users have similar interests. The similarity between users is calculated based on profile data such as age, sex, or place of residence to recommend items that are popular in the user’s demographic neighborhood (Pazzani, 1999). Finally, KB uses explicit user requirements and knowledge about the domain to generate recommendations for the user. They compute suggestions based on a rich database of ratings (Felfernig & Burke, 2008). Combinations of different algorithms, the so-called hybrid NRS, are often used to overcome weaknesses of single approaches, for example, the cold-start problem (i.e., computing recommendations for new users or items) or the over-specialization problem (i.e., the lack of diversity in recommendations) (Burke, 2002).
The few studies that have been conducted mostly analyze the effects of algorithmic curation systems on content diversity (Möller et al., 2018; Bakshy et al., 2015; Levy, 2021; Dylko et al., 2017; Yang et al., 2016) and selective exposure of users (Nguyen et al., 2014; Beam & Kosicki, 2014; Ohme, 2021). So far, however, only a very limited number of studies have incorporated actually running algorithmic recommendation systems (Ludwig & Müller, 2022). Most of this research relies upon experimental mock setups or survey designs, which are sometimes combined with behavioral web-tracking data. A limitation that all of these different methodological approaches have in common is that they cannot analyze the effects of an NRS in isolation. Either they only simulate the NRS, or they study the effects of platform-based news exposure more generally, which means that possible NRS effects are necessarily intertwined with the effects of other features and affordances of platform use. The latter is the case because many of these studies use survey designs and derive conclusions about algorithmic content selection mechanisms from the participants’ answers. While we acknowledge the importance of these research approaches, they are not able to disentangle NRS effects from other features and affordances of platform use, such as users’ custom filtering criteria in the platform settings (Dylko et al., 2017) or selective exposure mechanisms, which have been found to affectively polarize users on social media (e.g., Stroud, 2010). The studies relying on experimental mock setups, for example, comprising small samples of screenshots of news articles, lack external validity. This is because these small, pre-selected samples, which try to mimic NRS, are biased by researcher decisions and do not reflect actually implemented NRS on news platforms and social media.
When turning to effect experiments that actually use running NRS, evidence is even more scarce—and also inconclusive. Neumann et al. (2021) show that ideological and affective polarization are not heavily influenced by the exposure to pro- or counter-attitudinal news about COVID-19 suggested through collaborative filtering. On the other hand, Cho et al. (2020) found YouTube’s algorithmic recommendations of political videos, based on the participants’ search preferences, to affectively polarize users.
Content-based NRS, which are used in this study, base their future recommendations on content similarities to the articles previously read by the user, following the paradigm of “more of the same” (Karimi et al., 2018, p. 3). We decided to use CB-NRS as (a) they are the most widely used NRS, and (b) the participants in our experiment are anonymous and therefore no demographic data or other prior knowledge about the participants is known, which makes content-based filtering the only reasonable choice in this setup (Raza & Ding, 2021). These content-based recommendations are expected to create content homogeneity, which, in turn, is said to be a major contributor to political polarization (Pariser, 2011). Hilbert et al. (2018), for example, find that “algorithm-based recommender systems seem to function as a structural factor promoting polarization by providing confirmatory information and thus reinforcing prior predispositions” (p. 12). Other studies, nevertheless, point in the opposite direction, finding that content diversity does not necessarily have depolarizing effects on the attitude of the user (Flaxman et al., 2016; Lee et al., 2014), questioning the possible negative influence of content-based recommendation systems. Therefore we ask:
In which ways does the incorporation of content-based NRS (as compared to a random selection of news items) influence a) affective, b) ideological, and c) perceived polarization?
Sentiment in News Recommendation Systems
Sentiment is the emotional valence of a text, that is, whether a text has a negative or a positive polarity or is rather neutral in language. It was found that sentiment influences content distribution on social media, with emotionally charged Tweets being retweeted more (Stieglitz & Dang-Xuan, 2013), and news content with high sentiment being more popular and receiving more attention (Hsu et al., 2019; Chang & Tseng, 2020). It was also shown that incorporating sentiment into recommendation systems improves the quality of recommendations by making them more individually tailored to the interests of the participant (e.g., Osman & Noah, 2018; Dang, Moreno-García & De la Prieta, 2021). The dominance of negative sentiment in a collection of news articles, nevertheless, might narrow down the chance of encountering diverse viewpoints and receiving a balanced media diet (Wu et al., 2020), as high amounts of negative emotion point to a bias of the news article (Rozado & Musa al-Gharbi, 2021). Likewise, negative sentiment in news is often connected to opinionated coverage (Sheafer, 2007). Feldman (2011) finds effects of direct persuasion through opinionated news coverage, with attitude change aligning in direction with the attitude portrayed in the news. Although finding this direct effect, Feldman (2011) notes the influence of partisanship: “partisans were most critical of a news source when it conflicted with their predispositions and least critical when it agreed with their predispositions,” showing signs of polarization (p. 176). Likewise, Del Vicario et al. (2016) argue in the context of social media usage that “it is highly likely that the greater the emotional distance between the same concept in two echo chambers, the greater the polarization of users involved in the discussion” (p. 3).
This should, nevertheless, not only pertain to ideological and affective polarization but also to perceived polarization, because people who are exposed to extremely negative exemplars of the news coverage (which we try to evoke by incorporating negative sentiment) might perceive the coverage as more extreme than it really is, resulting in perceived polarization (Yang et al., 2016). This was shown by Banks et al. (2021), who found that individuals exposed to negative tweets perceived greater ideological distance between presidential candidates and the respective political parties. Therefore, we argue based on the findings above that
incorporating negative sentiment into a content-based NRS will lead to stronger effects on a) ideological, b) affective, and c) perceived polarization.
A central norm for journalists across the (Western) world is to report objectively and without political bias (e.g., Hallin & Mancini, 2012). Contrary to strong negative sentiment, which is mostly connected to opinionated news coverage (Sheafer, 2007), neutral sentiment, or a balanced mixture of positive and negative sentiment, should pertain to this more objective and neutral, less opinionated and less extreme type of news coverage, showing “both sides of the story”. Hopmann et al. (2012) argue in their literature review about politically balanced news coverage that “the absence of balance implies a bias” (p. 243). Conversely, the presence of balance implies unbiased content. This entails that balanced sentiment in news coverage could also provide a more emotionally balanced picture of news events and portrayed persons. As political polarization stems from strong opposing opinions, a balanced media portrayal could help to counteract this process by providing a balanced view, which should not be polarizing and should come closer to the ideal of a diverse and balanced media diet with respect to sentiment. Therefore, we assume that incorporating balanced (i.e., neutral) sentiment into the NRS will have a depolarizing effect on all three forms of polarization.
Incorporating balanced sentiment into a content-based NRS will reduce a) ideological, b) affective, and c) perceived polarization.
The Moderating Impact of Processing Depth
Technological developments have increasingly changed news consumption habits (Costera Meijer & Groot Kormelink, 2015), with changing devices, such as mobile phones, and changing times for news consumption, which heavily “determine the amount of information received by the users from the news” (Dunaway et al., 2018; Molyneux, 2019; Makhortykh et al., 2021, p. 2775). Also, Costera Meijer and Groot Kormelink (2015) carved out recent news consumption habits coined as “checking,” “scanning,” and “snacking,” which have become predominant patterns nowadays. What these consumption patterns have in common is a rather superficial interaction with the content, to get “a basic overview,” in case of snacking, “finding out if something new or interesting has happened,” in case of checking, or “seeing whether there are any new developments within a specific domain,” in case of scanning (Costera Meijer & Groot Kormelink, 2015, p. 669–671). This implies a less in-depth interest, and faster news reading (“scanning”) habit, especially in the case of snacking which “is not about pursuing in-depth knowledge or develop[ing] opinions” (Costera Meijer & Groot Kormelink, 2015, p. 670). The distinction in different modes of information processing is also described by the Heuristic Systematic Model pointing out two different modes of information processing: heuristic and systematic (Bohner et al., 1995). In case of low motivation and processing capabilities, information is processed heuristically, which does not require many resources and is based on simplifications (Bohner et al., 1995). In case of high motivation and processing capabilities, nevertheless, information is processed systematically, which results in more detailed and critical information processing when forming judgments (Bohner et al., 1995). 
These different modes of news consumption should have profound effects on the information intake, and thus also on the possible effects evoked by the NRS on the dimensions of political polarization of users. This was demonstrated by Banks et al. (2020) who showed that polarization effects of news feed consumption increase with processing time. As we assume the aforementioned news consumption habits to be rather based on heuristic processing and to be less time-consuming than systematic in-depth news reading, we incorporate the time spent with the recommendation system as a moderator variable. Therefore, we ask:
How does the time spent with an NRS affect the effects hypothesized above?
Method
To test our research questions and hypotheses, we conducted a two-factorial online experiment with an incomplete design (factor 1: content-based NRS vs. no NRS; factor 2: balanced sentiment in NRS vs. negative sentiment in NRS vs. no sentiment in NRS) resulting in four experimental conditions. We compared three different types of content-based NRS, partly enriched with different sentiment polarities, and a control condition with random article selection, and analyzed their impact on the three dimensions of political polarization described above. The topic chosen for this study is the refugee and migration discourse in German media. We scraped a corpus of real news articles from a large variety of German legacy and alternative media that was used to conduct the experiment. The study interface was programmed using oTree (Chen et al., 2016), an open-source platform for online experiments that offers seamless integration with Python. A pretest was conducted prior to the experiment.
Sample and Procedure
The sample consists of 750 participants who were selected using a quota procedure to match German-speaking Internet users living in Germany aged 18 to 74 with regard to age (M = 46.56; SD = 15.68), gender (47.7% female), and level of education (38.5% with Abitur [high school degree] or a higher qualification). The participants were recruited through the online-access panel of respondi AG, an ISO-certified German social research company often contracted for academic purposes. The participants were equally distributed across the four experimental conditions, and randomization checks showed no significant differences concerning age (F(1, 747) = 0.48, p = .488), gender (χ²(3, N = 750) = 6.09, p = .1075), and education (χ²(3, N = 750) = 7.45, p = .05896) between the treatment groups. During the experiment, participants were randomly assigned to one of the four experimental groups and were then asked to choose a news article from a displayed collection of six randomly generated recommendations. The selection interface featured the headlines as well as the first 50 words of each of the six articles (compare Figure 1). After reading the respective article, participants were shown another six recommendations which were generated either randomly (control condition) or by one of the three NRS versions, depending on the experimental condition a participant was assigned to. This process was repeated four times, with recommendations becoming gradually more tailored to the participants’ selection. Confrontation with this interactive stimulus was embedded in an online questionnaire. Before stimulus exposure, participants were asked to give information on general political variables such as political orientation, political interest, and topic interest. After the stimulus, the dependent variables, as well as sociodemographic variables, were assessed.
Figure 1. Example of the experimental user interface showing recommended articles.
Prior to conducting the analyses, the sample was cleaned up by removing participants who dealt with the NRS and the presented news articles for less than 120 seconds in total (i.e., 30 seconds per NRS iteration), or more than 2000 seconds. This is because we assume that participants dealing with the NRS for less than 120 seconds did not thoroughly process the news articles, but rather clicked through them quickly, whereas participants dealing for a very long time with the NRS might have been distracted or have performed other tasks while participating in the online experiment. Due to the display of the experimental conditions, only participants with desktop-based devices were included.
Corpus
The news corpus consisted of 3827 articles on immigration and refugee news coverage from 39 news outlets, covering the time span from January 1st 2019 to October 20th 2020. The list of outlets included high-quality national news outlets (e.g., Süddeutsche Zeitung), tabloid-style outlets (e.g., Bild) as well as left- (e.g., Junge Welt) and right-wing alternative media (e.g., PI-News). A full list of outlets can be found in the appendix. The articles were scraped with keyword searches concerning the topic of immigration and refugees (e.g., “refugee,” “asylum,” and “immigrant”).
To ensure a strong response to the stimulus and to limit reading time to a reasonable duration, only those scraped articles were included in the experiment that ranged between a minimum length of 150 words and a maximum length of 1500 words. We also excluded articles purely consisting of live tickers, video descriptions, or letters to the editor. After deleting page elements, such as advertisements, texts were uniformly prepared. To guarantee no primary bias toward one of the following elements, outlet and author names as well as images and logos were removed.
Versions of News Recommendation Systems
For the implementation of content-based NRS, we decided to rely on the term frequency-inverse document frequency (tf-idf) since this is one of the most widely used text preprocessing methods for content-based recommendation systems (Beel et al., 2016). By applying this method, we converted the articles’ texts into a numeric representation. We then used the cosine similarity to obtain a measurement of how similar two texts are. Using the tf-idf approach, the cosine similarity ranges from 0 to 1, where 0 implies no similarity at all and 1 implies identical texts. With this information, all texts can be ranked according to their similarity to the previous input. When the user had already read multiple articles, the average similarity of each unseen article to the articles read was calculated.
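The ranking step described above can be illustrated with a minimal sketch. It is not the study's implementation: the whitespace tokenizer, the smoothed idf variant, the toy headlines, and the function names `tfidf_vectors`, `cosine`, and `rank_unseen` are all assumptions made for illustration, since the text does not specify these details.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed idf; one common variant, chosen here as an assumption.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_unseen(vectors, read_ids):
    """Rank unseen articles by their average similarity to the articles read."""
    scores = {}
    for i, vec in enumerate(vectors):
        if i in read_ids:
            continue
        sims = [cosine(vec, vectors[r]) for r in read_ids]
        scores[i] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus (invented headlines, not taken from the study's corpus).
docs = [
    "asylum seekers arrive at the border",
    "border police register asylum seekers",
    "football club wins the cup final",
    "new asylum law debated in parliament",
]
vectors = tfidf_vectors(docs)
ranking = rank_unseen(vectors, read_ids={0})
```

Because tf-idf weights are non-negative, the cosine similarity stays within the 0 to 1 range mentioned in the text; the six recommendations shown to a participant would correspond to the first six entries of such a ranking.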
In order to compare different news sentiment scores regarding their suitability for the present corpus, we did a manual gold standard coding of news articles (n = 42) into positive, negative, and balanced sentiment and selected from three candidate sentiment models the implementation which fitted the gold standard best. The chosen implementation uses a pretrained sentiment model of German language texts (Guhr et al., 2020).
The model returns a probability estimate, which classifies each document as positive, neutral, or negative. Each of these estimates ranges from 0 to 1, and the total sum of them equals 1 for each document. To transform these probabilities into a polarity score in the range of −1 to 1, we take the negative sentiment probability and deduct it from the positive sentiment probability. More formally, this is expressed in the following equation:
\[ \text{polarity}(d) = p_{\text{pos}}(d) - p_{\text{neg}}(d) \]
where $p_{\text{pos}}(d)$ and $p_{\text{neg}}(d)$ denote the estimated probabilities that document $d$ carries positive and negative sentiment, respectively.
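As a brief illustration of this transformation, the following sketch computes the polarity score from three class probabilities. The function name `polarity` and the input check are assumptions for illustration; the probabilities themselves would come from the pretrained sentiment model, which is not reproduced here.

```python
def polarity(p_positive, p_neutral, p_negative):
    """Map class probabilities (summing to 1) to a polarity score in [-1, 1]."""
    total = p_positive + p_neutral + p_negative
    if abs(total - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    # The neutral probability only enters through the sum-to-one constraint.
    return p_positive - p_negative


mostly_negative = polarity(0.1, 0.2, 0.7)
fully_neutral = polarity(0.0, 1.0, 0.0)
```

A document classified as negative with high confidence receives a score close to −1, while a document that is either clearly neutral or an even mix of positive and negative receives a score close to 0.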
The average sentiment of our corpus is slightly negative (M = −0.21; SD = 0.23), ranging from −0.91 as the most negative to 0.2 as the most positive sentiment value for a given news article. Overall, we did not find many articles with positive sentiment. This was, nevertheless, to be expected given the negatively connoted topic of the news corpus.
Given these sentiment scores, we enriched the ordinary tf-idf recommender system with (a) negative and (b) balanced sentiment. The tf-idf recommender incorporating the negative sentiment assigns articles that have a negative sentiment a higher score than articles with a balanced sentiment. To achieve this effect, we multiplied the sentiment score by −1 to reverse its meaning and afterward normalized it between 0 and 1. We then multiplied the modified sentiment score by the cosine similarity score and ranked each article according to the resulting score. The recommender favoring neutral texts preprocessed the sentiment differently before multiplying it by the cosine similarity score: the absolute value of the sentiment was subtracted from 1, so that neutral sentiment ranked higher than negative sentiment.
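The two sentiment-enriched scoring rules can be sketched as follows. This is an illustration under stated assumptions: the text does not say how the reversed score was normalized, so the linear mapping from [−1, 1] to [0, 1] used in `negative_weight` is a guess, and the function names and candidate tuples are hypothetical.

```python
def negative_weight(sentiment):
    """Reverse the polarity and rescale it to [0, 1].

    Assumption: a linear mapping of [-1, 1] onto [0, 1]; the normalization
    actually used in the study is not specified.
    """
    return (-sentiment + 1) / 2


def balanced_weight(sentiment):
    """Favor neutral texts: 1 minus the absolute polarity."""
    return 1 - abs(sentiment)


def rank(candidates, weight_fn):
    """Rank (article_id, cosine_similarity, sentiment) tuples by weighted score."""
    scored = [(aid, sim * weight_fn(sent)) for aid, sim, sent in candidates]
    return [aid for aid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]


# Hypothetical candidates: (id, cosine similarity to read articles, polarity).
candidates = [("a", 0.8, -0.9), ("b", 0.8, 0.0), ("c", 0.4, -0.9)]
negative_ranking = rank(candidates, negative_weight)
balanced_ranking = rank(candidates, balanced_weight)
```

With equal content similarity, the negative variant places the strongly negative article "a" ahead of the neutral article "b", whereas the balanced variant reverses that order. Because the weight multiplies the similarity, a highly similar article can still outrank a less similar one with a more favorable sentiment.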
Finally, we also included a condition that displayed randomly selected articles from the news corpus, serving as a control condition.
In order to ensure the validity of the sentiment implementation, we calculated a sentiment score based on the article previews which were recommended as well as a score concerning the articles actually read by the participant. ANOVA results reveal that the sentiment implementation worked out as intended (compare Figures 2 & 3): the stimulus groups significantly differed in terms of the sentiment score of the recommended articles (F(3,746) = 604.2, p < .001, η2 = .708), as well as the articles that were selected to read by the participants (F(3,746) = 286, p < .001, η2 = .535). As visible, the spread of the tf-idf with negative sentiment scores is much larger than the spread of the tf-idf with balanced sentiment. This is because the NRS version emphasizing balanced sentiment was able to draw from a large portion of texts with very similar sentiment values (as neutral reporting is a journalistic norm that many texts still adhere to in the German news media, resulting in a lot of reports with little or balanced sentiment). The algorithm version emphasizing negative sentiment drew from a group of texts in which sentiment scores had more variation, since the degree of negativity naturally varies among those texts that feature negative sentiment at all.
Figure 2. Boxplot for the average sentiment score of recommended articles per participant, grouped by experimental condition.
Figure 3. Boxplot for the average sentiment score of articles selected to read by a participant, grouped by experimental condition.

Measures
To assess affective polarization, we asked respondents to indicate how positively or negatively they evaluate each of the six political parties currently represented as groups in the German parliament on the commonly used and well-tested feeling-thermometer scale, ranging from 0 to 100, whereby lower values indicate less warmth toward the respective party (Stroud, 2010). To calculate an index of affective polarization, we followed Wagner’s (2020) suggestion for multi-party systems, calculating the “average absolute party like-dislike difference relative to each respondent’s average party like-dislike score” (p. 10f). The resulting scale ranges between zero and one hundred, whereby the latter indicates a maximum amount of affective polarization (M = 49.01; SD = 19). Participants with an affective polarization value of zero, as well as a maximum party rating of 50, were excluded (n = 15), due to the high likelihood that these participants left the scale untouched and did not respond thoroughly.
To measure ideological polarization, participants were asked to indicate on a 7-point scale the extent to which they agree or disagree with 12 items on the topic of immigration and refugees, with contrasting viewpoints, based on strong right–left political stances (α = .95). Items included topics such as the intake of refugees in Germany as well as opposing items on economic, cultural, and criminal threats to, or enrichment of, German society. The scores of the 12 items were folded at the mid-score, so that the midpoint represents the low end of the polarization scale while the two ends represent the high end. This resulted in an index of ideological polarization on a 4-point scale (M = 1.84; SD = 0.66).
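The folding procedure can be sketched as follows. Two details are assumptions, as the text does not state them: that the 7-point items are coded 1 to 7 (giving a midpoint of 4 and folded values from 0 to 3), and that the index is the mean of the folded item scores. The function names are hypothetical.

```python
def fold(score, midpoint=4):
    """Fold a rating at the scale midpoint: distance from the neutral answer."""
    return abs(score - midpoint)


def ideological_polarization(item_scores, midpoint=4):
    """Mean folded score across items (assumed aggregation)."""
    folded = [fold(s, midpoint) for s in item_scores]
    return sum(folded) / len(folded)


# Hypothetical respondents answering the 12 items on a scale coded 1-7.
moderate = ideological_polarization([4] * 12)      # always the midpoint
extreme = ideological_polarization([1, 7] * 6)     # always a scale endpoint
mixed = ideological_polarization([2, 6, 4, 5] * 3)
```

Since folding discards the direction of an answer, items with opposing wording do not need to be reverse-coded before folding. Under the same coding assumption, calling `fold` with a different `midpoint` would handle scales of other lengths, such as an 11-point left–right scale.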
Perceived polarization was measured directly by 6 items on a 7-point agreement/disagreement scale, three items targeting perceived affective polarization, with items such as “The supporters of different political parties in Germany are more and more hostile towards each other,” and three items aiming at perceived ideological polarization, with items such as “The opinions about immigration in the German population are drifting more and more apart.” Separate mean indices were calculated each for perceived affective polarization (α = .72; M = 4.67; SD = 1.20) and perceived ideological polarization (α = .85; M = 5.43; SD = 1.22).
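For readers less familiar with the reliability coefficient reported for these indices, Cronbach's alpha can be computed from the item variances and the variance of the summed scale. The sketch below uses the standard formula with sample variances; the toy answers are invented for illustration and are not study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's answer to item i."""
    k = len(items)
    # Sum score per respondent across all items.
    totals = [sum(answers) for answers in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def mean_index(answers: Sequence[float]) -> float:
    """Mean index of one respondent's answers to the items of a scale."""
    return sum(answers) / len(answers)


# Three hypothetical items answered by five hypothetical respondents.
toy_items = [
    [5, 6, 3, 7, 4],
    [5, 7, 2, 6, 4],
    [4, 6, 3, 7, 5],
]
print(round(cronbach_alpha(toy_items), 2))
print(mean_index([5, 5, 4]))
```

Alpha approaches 1 when respondents answer all items of a scale consistently, which is the justification for collapsing the items into a single mean index.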
Additionally, control variables such as age, gender, education, the parents’ place of birth (Germany/non-Germany), political orientation, strength of partisanship, contact with immigrants, and political and topical interest were included, and the time spent with the recommendation system was included as a moderator variable. Strength of partisanship was determined based on the strength of political orientation on an 11-point political left–right scale. To assess the extremity of partisanship the scale was folded at the mid-score, resulting in a 6-point scale, with the midpoint corresponding to moderate partisans (M = 1.34; SD = 1.44). Contact with immigrants was assessed by creating a mean index of two items questioning the amount of contact with immigrants in personal and professional life (M = 3.47; SD = 1.8). The mean index of political interest consists of five items (α = .94; M = 4.57; SD = 1.59), and the mean index of topical interest consists of three items (α = .94; M = 4.25; SD = 1.72). All of the previously mentioned variables were assessed on a 7-point scale. Users were free to spend as much time as they wanted reading the news articles as well as answering the questionnaire, and the time which users spent dealing with the NRS was captured by automatic logs.
Results
Table 1. Linear Regression Results for Affective Polarization.
Note. Values are standardized linear regression coefficients. * p ≤ .05; ** p ≤ .01; *** p ≤ .001.
Table 2. Linear Regression Results for Ideological Polarization.
Note. Values are standardized linear regression coefficients. * p ≤ .05; ** p ≤ .01; *** p ≤ .001.
Table 3. Linear Regression Results for Perceived Affective Polarization.
Note. Values are standardized linear regression coefficients. * p ≤ .05; ** p ≤ .01; *** p ≤ .001.
Table 4. Linear Regression Results for Perceived Ideological Polarization.
Note. Values are standardized linear regression coefficients. * p ≤ .05; ** p ≤ .01; *** p ≤ .001.
Regarding RQ1, the data indicate no significant differences on any of the three dimensions of polarization between a content-based NRS and random news recommendations. Thus, a content-based NRS (without considering article sentiment) did not influence political polarization when controlling for the abovementioned variables. Likewise, we do not find significant effects of the NRS containing negative sentiment on any of the dimensions of polarization in the Model 1 versions in Tables 1, 2, 3, and 4, which do not include the interaction term. Thus, H1a–c must be rejected. The same applies to H2a–c; the NRS containing balanced sentiment did not influence any of the dimensions of polarization in the respective Model 1 versions.
Nevertheless, when turning to RQ2 and incorporating the time spent with the recommended articles as a moderator variable, the coefficient for the negative sentiment NRS version in the model for affective polarization turns significant. In an interaction analysis with one dummy and one continuous variable, however, this does not indicate an actual main effect. Rather, it means that if the continuous moderator (in our case, time spent with the recommended articles) were zero, participants in the negative sentiment condition would show significantly lower affective polarization than those in the control condition with random article selection. Since this scenario is purely hypothetical, the interaction effect (which is significant as well) should be interpreted instead. For this purpose, we plotted the significant interaction (see Figure 4). The plot reveals that while participants with shorter reading time tended to exhibit lower affective polarization when confronted with a content-based NRS emphasizing negative sentiment than participants reading a random article selection, this pattern was reversed for participants who spent more time on the recommended articles. For ideological polarization and the two dimensions of perceived polarization, no similar patterns were observed (compare Model 2, Tables 1, 2, 3, and 4).
Figure 4. Plot of the interaction between negative sentiment NRS and time spent on articles on affective polarization.
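The reading of a dummy coefficient in a model with an interaction term can be made concrete with a small sketch. In a model of the form polarization = b0 + b1 * condition + b2 * time + b3 * condition * time, the effect of the condition at a given reading time is b1 + b3 * time. The coefficients below are hypothetical, unstandardized values chosen only to reproduce the qualitative pattern described above; they are not taken from the regression tables, and measuring time in minutes is likewise an assumption.

```python
def conditional_effect(b_condition: float, b_interaction: float, time_spent: float) -> float:
    """Effect of the treatment dummy at a given value of the moderator."""
    return b_condition + b_interaction * time_spent


def crossover_time(b_condition: float, b_interaction: float) -> float:
    """Moderator value at which the treatment effect changes sign."""
    return -b_condition / b_interaction


# Hypothetical coefficients, NOT taken from the paper's tables.
B_NEGATIVE = -6.0     # dummy coefficient: effect only when time_spent == 0
B_INTERACTION = 1.5   # change in that effect per additional minute of reading

for minutes in (0, 2, 4, 6, 8):
    print(minutes, conditional_effect(B_NEGATIVE, B_INTERACTION, minutes))

print("crossover:", crossover_time(B_NEGATIVE, B_INTERACTION))  # crossover: 4.0
```

With these illustrative values, the condition lowers polarization for short reading times and raises it beyond the crossover point, which is why the dummy coefficient alone should not be read as a main effect.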
Likewise, the data indicate one significant interaction effect for exposure to the balanced sentiment NRS condition. The longer participants used this NRS version, the more ideological polarization was reduced (as compared to the random article selection control condition, see Figure 5). For affective polarization as well as the two dimensions of perceived polarization, no effects were found.
Figure 5. Plot of the interaction between balanced sentiment NRS and time spent on articles on ideological polarization.
Discussion
In this experiment, we could show that the use of a plain content-based NRS does not yield any effects on the political polarization of the participants as compared to being exposed to a random selection of articles on a specific topic. This means that content-based recommendations following a “more of the same” logic in news coverage do not necessarily have polarizing effects on their readers. This finding speaks against the notion of homogenous content exposure necessarily leading to the infamous “filter bubble” or “echo chamber” effects.
While this finding is not statistically significant, we observe some interesting tendencies for affective polarization: surprisingly, the control group, with random article recommendations, shows the highest amount of affective polarization, particularly among individuals who do not spend much time on reading the recommended articles. This could be due to the fact that the incorporation of any one of the NRS creates a more homogenous opinion climate without contradictory “irritations” on the content level, which might have a calming effect on users in terms of their attitude toward political parties. Nevertheless, if more time was spent with the random suggestions, affective polarization was reduced again, suggesting that more in-depth reading of a diverse range of news articles might mitigate the effects described above, and possibly opening up the perception to the plurality of opinions and party ideologies. Moreover, the negative sentiment NRS condition seems to provide even stronger grounds for irritations and subsequent aversive feelings toward opposing political camps. When incorporating the time spent with the NRS and its recommended articles as a moderator, we find, as hypothesized, that the NRS version which was designed to accentuate negative sentiment in addition to a content-based recommendation logic influences affective polarization. Especially for those participants who dealt with the negative sentiment NRS for a longer period, and thus were more likely to process the news items more thoroughly, affective polarization was significantly heightened. This suggests a problematic trait of algorithmic filtering on social media platforms, which (at least partly) employ content-based filtering while favoring negative sentiment in texts. However, this effect is problematic mainly for individuals who engage in more intense processing of information. 
Brief reading, as associated with typical social media usage patterns such as news “snacking” or “scanning” (Costera Meijer & Groot Kormelink, 2015), in contrast seems to slightly buffer affective polarization if individuals are confronted with a content-based NRS that puts emphasis on negative sentiment. One reason for this seemingly counterintuitive finding might be that our operationalization of sentiment considered the whole text of the news articles that were part of this study. This sentiment might only be detectable for readers who spend a certain amount of time with these texts. Thus, a negative-sentiment NRS version that considered merely the sentiment of headlines and intro text might have led to increased affective polarization among quick readers as well.
Turning to ideological polarization we find, as hypothesized, a balanced sentiment in news articles recommended by content-based NRS to have depolarizing effects. Again, however, this effect is moderated by the time spent with the NRS and the recommended articles. The longer participants were using the NRS, the more ideological polarization was reduced. The usage of balanced sentiment recommenders, therefore, seems to provide a fruitful path for future research interested in constructing NRS which are intended to reduce ideological polarization by design.
It is interesting to note that ideological polarization was heightened the more time was spent with the random suggestions, whereas, as described above, affective polarization was reduced the more time was spent with them. Somewhat contradictorily, the random suggestions therefore seem to have opposing effects on affective and ideological polarization over time. This could support the notion of a backfire effect on ideological polarization (Bail et al., 2018; Lee et al., 2014). Such an effect occurs when people are confronted with counter-attitudinal information that they try to counter with motivated reasoning. This, in turn, strengthens preexisting views (Taber & Lodge, 2006). This effect might have been amplified by the inclusion of alternative media in our experiment, which tend to publish very strong and extreme views (Holt et al., 2019; Müller & Freudenthaler, 2022; Authors, XXXX).
In total, none of the NRS versions tested in this experiment affected perceived polarization significantly. This might be explained by the high average values of the two dimensions of perceived polarization in our sample (perceived ideological polarization: M = 5.43, SD = 1.22; perceived affective polarization: M = 4.67, SD = 1.2; 7-point scale). Since our study featured no measurement of polarization before stimulus exposure, this result could, on the one hand, point to a high preexisting perceived polarization among participants, which may have led to a ceiling effect and thus remained unaffected by the experimental stimuli. On the other hand, the high levels of perceived polarization could imply that exposure to any kind of news article on refugees and migration strengthens perceived polarization, as the topic is still quite prevalent in the media and is being intensely discussed in German public debate. This argument is supported by the (non-significant) observation that the more time was spent with any version of the NRS, the stronger perceived polarization seemed to be.
For the interpretation of all previously discussed effects of the content-based news recommendation in this study, it is important to emphasize that the effect sizes of all analyzed treatment effects were very small. This is even more remarkable in an experimental setting, in which effect sizes are typically bigger than in out-of-the-lab research. The observed patterns should therefore not be overinterpreted. Rather, this outcome suggests that content-based news recommendation does not have zero, but only small effects on individuals’ political polarization. Also, these effects only occur in relationship with the time a user spends with the recommended articles and only for NRS algorithms that take into account article sentiment in addition to the mere content-based recommendation. Possibly, we might have found bigger effect sizes and more significant results had we included the political orientation of the participants as a moderator variable. Political orientation was found to be a crucial factor when it comes to message effects on political polarization (e.g., Bail et al., 2018). However, we decided to focus on engagement time and against adding further moderators, as this would have led to models including three-way interactions, which would have made the already complex design even more complex, if not overloaded. Nevertheless, political orientation needs to be considered in future NRS effects research, especially when collaborative and demographic filtering are considered as well.
Limitations and Future Research
This study naturally does not come without limitations. First, we did not measure participants’ pre-existing views and political polarization before the stimulus exposure, which might have provided additional insight and could have answered the question of whether perceived polarization (and the other dimensions of polarization) was already strong from the outset.
Furthermore, there might be limitations to the stimulus itself: We only analyzed content-based NRS, which does not fully mimic the real-world usage of NRS implemented on news pages, which often are hybrid NRS. Therefore, future research should conduct experiments with running algorithms incorporating other forms of NRS, such as collaborative and demographic filtering or hybrid forms. Moreover, the topic of refugee and migration coverage is already quite polarized from the outset, so the short stimulus exposure period might not be sufficient to yield big effects. This implies another limitation to the stimulus: the reading of four news articles in a constrained experimental setting is not really comparable with a real-world online user experience. Taken together, these factors might have contributed to the small or null effects our study revealed. Future research will have to explore whether a more externally valid experimental setup and the selection of a less polarized news topic might lead to stronger effects. As our results also point to a significant influence of the time spent with the NRS, even in this short experimental setting, we see the need for analyzing the long-term effects of NRS on political polarization, for example, in a panel design incorporating web-browsing histories or tracking devices, also in order to gain insight into real-world algorithmically filtered news consumption behavior.
Moreover, it would also be interesting to further explore the effects of random news suggestions, as these seem to point in opposite directions for affective versus ideological polarization.
Furthermore, we could not include an NRS version enriched with positive sentiment, as our news corpus, due to its topical composition, shows a negativity bias, with almost no positive news articles present. We opted against oversampling articles with a positive sentiment (potentially extending the corpus period), as we wanted to offer the experiment’s participants a realistic and recent cross-section of immigration-related news coverage from their country. As positive news coverage is nevertheless said to have only limited effects on attitude formation (Jacobs & van der Linden, 2018; Grabe et al., 2003; Soroka & McAdams, 2015), we deem the comparison of negative and balanced NRS a reasonable first step. Still, it would be interesting for future research to analyze the effects of NRS that specifically select items with a positive sentiment.
Conclusion
To sum up, this study adds to previous research indicating that the “filter-bubble” effect of content-based NRS seems overstated (e.g., Ludwig & Müller, 2022; Möller et al., 2018; Neumann et al., 2021). We only found small effects of the content-based NRS, enriched with sentiment, on the three dimensions of political polarization analyzed. For perceived polarization, we did not find any effects of the NRS, hinting at the possibly different nature of perceived polarization compared to actually polarized attitudes. More research is needed to analyze the relationship between perceived polarization and actually polarized attitudes. Further research should also focus on the time that participants spend reading the articles recommended by an NRS, as it seems to have a profound influence on political polarization processes.
Translating the findings from our experiment to real-world scenarios suggests that people who use NRS favoring negative news coverage and who intensively read the recommended news items might become affectively polarized, or might experience ideologically depolarizing effects if the sentiment of the content is balanced. However, a number of arguments can be made for why real-world NRS algorithms might favor articles with negative sentiment: First, news media generally show a bias toward negative content (Soroka et al., 2019). Second, many NRS do not only consider content-based recommendation but also take into account collaborative filtering that draws from other users’ engagement with the content (Van Dijck & Poell, 2013). In such a setting, negative content could be amplified since a negative tonality has been shown to increase user engagement (Heiss et al., 2019, p. 1497; Bene, 2017). Therefore, our implementation of the negative-sentiment NRS seems to be closest to real-world NRS used, for instance, on social media platforms. Our results, thus, imply that extensive news readers in algorithmically curated news architectures might be more likely to be polarized by the content they encounter than the so-called news “snackers” or “scanners” (Costera Meijer & Groot Kormelink, 2015), who spend less time reading the news. Therefore, social media platforms, search engines, and other algorithmically curated news providers should try to avoid the negativity bias by calling into question the prioritization of negative content, at least for heavy news users. Of equal importance should be research on (and the implementation of) balanced sentiment NRS versions. Our study indicated that they seem to provide a fruitful avenue for platforms and news providers to reduce ideological polarization, especially for heavy news users.
That being said, the main finding of the present study should be seen in the fact that any observable polarization effects of content-based news recommendation remained at a very low overall level. This calls into question claims of algorithmic news aggregators being heavily responsible for growing political polarization in many societies. While social media platforms might still offer various other ways in which polarization might flourish (for instance, by conveying flawed impressions of the size of extreme political camps within a society, or by offering good opportunity structures for self-radicalization processes within an individual), the isolated effects of recommendation algorithms on polarization seem to be rather limited.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant from Baden-Württemberg Stiftung within the research program Responsible Artificial Intelligence.
Appendix
List of Outlets Included in the News Corpus.
Outlet Name | Outlet Domain
Die Welt | welt.de
Der Spiegel | spiegel.de
Süddeutsche Zeitung | suddeutsche.de
Tagesschau | tagesschau.de
Tagesspiegel | tagesspiegel.de
TAZ | taz.de
Cicero | cicero.de
Frankfurter Allgemeine Zeitung | faz.net
Der Freitag | freitag.de
Bild-Zeitung | bild.de
Focus | focus.de
Stern | stern.de
ntv | ntv.de
T-Online | t-online.de
Merkur | Merkur.de
Campact Blog | blog.campact.de
Junge Welt | jungewelt.de
Jungle World | jungle.world
NachDenkSeiten | nachdenkseiten.de
Neues Deutschland | neues-deutschland.de
Klasse Gegen Klasse | klassegegenklasse.org
Politplatschquatsch | politplatschquatsch.com
Rationalgalerie | rationalgalerie.de
Achse des Guten | achgut.com
Compact Magazin | compact-online.de
RT Deutsch | deutsch.rt.com
Junge Freiheit | jungefreiheit.de
PI-News | pi-news.net
Tichys Einblick | tichyseinblick.de
Sputnik News | de.sputniknews.com
Journalistenwatch | journalistenwatch.com
EF-Magazin | ef-magazin.de
Contra Magazin | contra-magazin.com
Rubikon | rubikon.news
Man Tau | man-tau.com
Opposition 24 | opposition24.com
Vice Magazine | vice.com/de
Anti-Spiegel | anti-spiegel.ru
Kritisches Netzwerk | kritisches-netzwerk.de
