Abstract
In an era of intense partisanship, there is widespread concern that people are self-sorting into separate online communities which are detached from one another. Referred to as echo chambers, the phenomenon is sometimes attributed to the new media landscape and internet ecosystem. Of particular concern is the idea that communication between disparate groups is breaking down due to a lack of a shared reality. In this article, we look to evaluate these assumptions. Applying text and semantic network analyses, we study the language of users who represent distinct partisan political ideologies on Reddit and their discussions in light of the January 6, 2021, Capitol Riots. By analyzing over 58k posts and 3.4 million comments across three subreddits, r/politics, r/democrats, and r/Republican, we explore how these distinct groups discuss political events to understand the possibility of bridging across echo chambers. The findings of this research study provide insight into how members of distinct online groups interpret major political events.
Introduction
Discussion of echo chambers in social media and online news has been a major subject of research and of broader public interest (Flaxman et al., 2016; Garimella et al., 2018; Garrett, 2009). At a time when people around the world, including majorities in the United States, cite concerns about democracy (Newall et al., 2022), social media and open forums can play a vital role in creating healthy public discussion. Moreover, online participation and discussion can lead to spillover effects which increase offline involvement in participatory democracy (Loader & Mercea, 2011; Sung & Jang, 2020). This suggests that echo chambers within online communities may have significant ramifications for democracy within a society and should be thoughtfully considered and evaluated.
Echo chambers, in principle, suggest that a person is not being exposed to views that differ from their own (Garimella et al., 2018). Due to the perceived prevalence of echo chambers, it is suggested that the shared reality between different political groups is fraying. However, there is an open question about how this splintering of reality manifests itself. For example, it may be that the same topics are discussed differently between differing groups, or conversely, differing ideological groups may be discussing different topics entirely. With this study, we examine the extent to which different communities with distinct political orientations are themselves echo chambers. In this article, we first look to evaluate what is being discussed before moving to how it is being discussed. In so doing, we assess polarized communication from two angles to gain a nuanced view of the similarities and differences that exist between partisan groups.
Our work seeks to contribute to the growing research literature surrounding echo chambers. We apply text analysis techniques and semantic network analysis to measure echo chambers in online communities and to explore the level of commonality in communication between partisan communities. The network approach offers a relational perspective that maps out the structures and themes in online community discussions, which helps us better understand how disparate communities communicate online.
We analyze 58k online political submissions on Reddit from January 2021—the month in which the Capitol Riots occurred—and 3.4 million associated comments. We choose to focus on this time frame because the Capitol Riots were essentially an embodiment of partisan worldviews and an example of the threats that come when those views go unchallenged. In our work, we find that despite differences between the communities around a highly polarized political event, there is substantial overlap between the discussions in each community. Our findings indicate that differences between partisan groups may be bridgeable. Communicating across viewpoints and ideologies is important for a well-functioning democratic system. Our research goal is to develop a better understanding of online partisan communication, which can be useful for researchers and practitioners alike who want to support communication between groups.
Echo Chambers
Online echo chambers are a major topic of research interest that has become more pressing in light of concerns about the spread of misinformation and increasing political polarization. The discussions of echo chambers have extended beyond the research community to the broader public, with journalists, politicians, and even a former president lamenting the impact of information bubbles (Bruns, 2019).
Echo chambers most often refer to communities into which people self-select and which leave little room for disagreement with an established orthodoxy. Nguyen (2020) defines an echo chamber as “a social epistemic structure in which other relevant voices have been actively discredited.” Echo chambers offer selective exposure, where individuals are exposed only to information that conforms to their prior beliefs. Speaking of fears around echo chambers, Nguyen (2018) suggests that different communities may “no longer share basic foundational beliefs,” which may make communication between disparate groups increasingly difficult.
While the potential danger of echo chambers is concerning, research in this area paints a picture that is, at times, contradictory. Studying the effects of media consumption in the United Kingdom, Dubois and Blank (2018) argue that the effects of echo chambers are overstated and that a very small slice of the population finds itself in an echo chamber. Guess et al. (2018) find that Americans are less walled-off from diverse political perspectives than expected. Among the most noteworthy findings of Guess et al. (2018) is that “vulnerability to echo chambers may instead be greatest in offline social networks, where exposure to diverse views is often more rare,” which runs contrary to much of the discussion of online echo chambers.
Other researchers find more mixed results. Cinelli et al. (2021) quantified the echo chamber effect across four online social media sites by examining the degree of ideological homophily in user interaction networks for several topics. The authors found evidence for the echo chamber effect across the sites but with considerable variation between them. Gab and Reddit had single homogeneous interaction networks, while user groups were more split on Twitter and Facebook. Flaxman et al. (2016) found evidence that online social media contributes to ideological segregation while also finding that social media users are more likely to encounter opposing viewpoints than individuals who are directly browsing through online news. Finally, studying the networks of Democrats and Republicans on Twitter, Colleoni et al. (2014) found evidence for the idea that Twitter can function both as a public sphere and as an echo chamber, depending on the specific subgroup being studied and the relationships that are focused on. These works demonstrate the complex nature of people’s networks and the open question of whether social media contributes to the echo chamber effect or helps to reduce its impact.
While prior studies about echo chambers have predominantly focused attention on Twitter and Facebook, several others have centered their research on Reddit specifically. De Francisci Morales et al. (2021) examined Reddit discussions within and between supporters of Hillary Clinton and Donald Trump using the geo-location of Reddit users as a proxy for their socio-demographic environment. The authors found that geographical homophily significantly impacted cross-cutting conversations. For example, cross-cutting conversations were more common among users from states with lower voting rates and were less common when users were from the same state. Studying a mixture of political posts across 101 popular subreddits, Bond and Sweitzer (2022) found that while users are more likely to interact with others who share a similar ideological perspective, this varied considerably depending on the levels of political interest at a given time, and whether a subreddit’s focus was political. These studies primarily focused on the number of interactions across different groups and tried to understand the environmental factors associated with increased cross-ideological discussions. Our study approaches the echo chamber phenomenon with a different question in mind: how similar are discussions between disparate political groups, in terms of both their topics and their perspective on those topics? In other words, we address what topics are being discussed and how they are discussed in groups with different political ideologies. These questions are important because regardless of whether partisans are interacting or not, it may be the case that they are not able to have any meaningful dialogue since their concerns and views are so far apart. Therefore, our questions complement those of other researchers who have assessed the occurrence of cross-ideological conversations on social media.
Our work seeks to address the open questions around the use of social media and echo chambers. In particular, we analyze the level of shared reality between groups on these forums to understand to what extent these online discussion forums can bring people together or may only serve to deepen partisan divisions. Note that when referring to shared reality, we are referring to the type of “foundational shared beliefs” that are a precondition for discussion (Nguyen, 2018). We study this shared reality both through evaluating the similarity of words used between subreddits and the topics that are shared among them.
In the formulation of our questions, we consider both the topics of discussion (i.e., what is discussed) and how these topics are discussed between disparate groups. We seek to measure the commonality between the subreddit discussions, where we define commonality as the extent to which topics, keywords, and sentiments are shared between each of the subreddits.
First, what gets discussed is of central importance in the formation and propagation of echo chambers. With content recommendation algorithms and online curation, users may find themselves in a filter bubble, where they are not exposed to certain important or newsworthy topics (Pariser, 2011), perhaps because these topics fall outside of the user’s usual interests or perspective (Feltwell et al., 2020). How invested an individual is in learning about a topic (i.e., topic involvement) can also influence a user’s information-seeking behavior about that topic. For example, users may be more likely to exercise selective exposure and to echo the opinions of peers on topics they are less familiar with and less invested in (Liao & Fu, 2013; Mankoff et al., 2011). The topics of discussion also affect the likelihood of being in an echo chamber. Garimella et al. (2018) showed that Twitter users are likely to encounter an echo chamber on politically contentious topics but not on non-contentious topics. This is consistent with the work of Barberá et al. (2015), who found that on Twitter, nonpolitical events are often discussed as part of a national conversation between users of different political persuasions, but that discussions of political topics lead to conversations more in line with the idea of an echo chamber.
Given the importance of the topics discussed, we propose our first research question as follows:
RQ1. What topics are discussed by different online partisan groups during political events, and how much is shared between these discussions?
The echo chamber phenomenon cannot be solely confined to the topics which are discussed but instead relies on the echo of polarized and self-reinforcing opinions about a topic (i.e., how a topic is discussed) (Liao & Fu, 2014; Van Alstyne & Brynjolfsson, 1996). Even when discussing the same topic, people, and by extension groups, can have widely different views on the topic and its significance. For example, Rho et al. (2018) analyzed how users across the political spectrum discussed the #MeToo movement, and the authors found considerable variation between communities representing the alt-right perspective (e.g., Breitbart), the far-left view (e.g., Democracy Now), and the mainstream view (e.g., the New York Times). Despite some shared agreement over the importance of the #MeToo movement, Rho et al. (2018) found substantial differences in how each group discussed and viewed the movement. Similarly, Stewart et al. (2017) found considerable differences in how the #BlackLivesMatter movement was discussed by right-leaning and left-leaning users across Twitter. Certain topics were recognized as important across the political spectrum in these examples, but there was substantial disagreement between partisan and ideological groups. These examples suggest that it is important to consider how topics are discussed within different groups. As such, we propose our second research question:
RQ2. How are topics discussed by online partisan groups in political events, and how much is shared between these discussions?
In the following section, we discuss semantic networks and how they can help us answer our questions and improve our understanding of partisan political echo chambers.
Semantic Networks
Sometimes referred to as keyword networks, semantic networks have been described as representing the collective cognition of the networked communities that they model (Danowski, 1993). In a semantic network, each word is a node, and an edge between two nodes represents the co-occurrence of the corresponding words. Semantic analysis has been applied in various contexts to measure how disparate communities discuss different subjects.
Semantic Political Networks
Semantic networks have been applied as a way to better understand political advocacy, discussion, and communities. Semantic networks provide a mechanism for understanding “emerging concepts” in networked communication and the context in which these concepts arise due to their networked, relational structure (Eddington, 2018; Kostygina et al., 2021).
Because semantic networks can represent the emerging collective consciousness of a community, they are valuable for understanding how different groups in a political context think and communicate. For example, Eddington (2018) used semantic network analysis to investigate the connection between the #MakeAmericaGreatAgain hashtag, related to former President Donald Trump’s campaign slogan, and extremist groups. The article concludes that the asynchronicity of social media affords supporters of extreme, conspiratorial, and prejudiced visions an opportunity to organize around and converse with one another about similar topic areas. Badawy et al. (2018) developed a novel way to characterize users’ political identities using label propagation of shared news sources on Twitter. Recently, Abraham et al. (2021) studied how politicians speak on Twitter. The authors found that a small number of topical communities form within politicians’ Twitter posts consistently over time. Several other studies have used semantic networks built with Twitter data to analyze political discourse, specifically assessing attitudes on certain issues. Jiang and Xu (2021) analyzed the word networks of different types of Twitter users and their discourse around the trade war/conflict between the United States and China, finding that influencers of different levels took a negative stance on President Trump while regular (non-influencer) users had more polarized responses. Kwon et al. (2016) studied rumors propagated by Twitter users during a time of conflict between South Korea and North Korea (what the authors refer to as Korean saber rattling), finding that discussions of rumors were a gauge of public opinion during a time of uncertainty. Radicioni et al. (2021) analyzed Italian discourse on migration during a time of tension in 2019, finding five distinct partisan groups who primarily amplified the views of their members via retweeting while using mentions to interface with the opinions of other partisan groups indirectly. Extending this stream of research, we apply semantic network analysis to examine echo chambers in political discussions.
Our work builds on the theories and methodologies of prior research and applies them to the study of partisan echo chambers. Communication in a homogeneous network, such as an echo chamber, can be self-reinforcing in that tone, communication style, and viewpoints go unchallenged (Grevet et al., 2014; Wollebæk et al., 2019). Semantic networks, with their relational representation of communication, offer a valuable means by which to understand the communication of groups at an organizational and conceptual level (Eddington, 2018; Kostygina et al., 2021). Furthermore, because social influence contributes to the echo chamber phenomenon, studying partisan communication through a relational perspective provides further insight into how communication in a partisan context occurs (Dubois et al., 2020; Sasahara et al., 2021).
Methodology
Data
To address our research questions, we focused on Reddit. Reddit is a discussion forum and social networking website. Reddit brings people with shared interests together through a type of user self-sorting into subreddits. Subreddits are discussion forums where all members share a particular interest for discussion. Subreddits span a wide variety of subject matters, ranging from dogs (r/dogs) for canine enthusiasts to personal finance (r/personalfinance) for individuals who want to learn more about personal money management. Of particular relevance to our questions are political subreddits. Reddit users’ ability to self-classify along partisan and ideological lines provides a useful means for distinguishing between members of different communities. Prior research has also demonstrated that the pseudonymity afforded by Reddit provides room for users to discuss sensitive topics (Ammari et al., 2018; Gauthier et al., 2022), which may further apply to the expression of political views.
We collected 58k Reddit submissions from January 2021 and 3.4 million associated comments posted through March 2021 from Pushshift. Pushshift is a website and social media data repository that has been storing Reddit data since 2015 (Baumgartner et al., 2020). The Pushshift Reddit dataset is updated in real time and includes monthly data dumps. Pushshift provides a means by which to access Reddit activity over time. We collected all January 2021 posts from three major political subreddits: r/politics, r/Republican, and r/democrats, using the Python package PSAW. 1
We focused on January 2021 because it was an especially active and turbulent time in US partisan politics. The month of January 2021 included: two Georgia special elections which would determine control of the US Senate, the inauguration of newly elected President Joe Biden, the beginning of the second impeachment trial for US President Donald Trump, and most glaringly, the attack on the US Capitol (hereafter referred to as the Capitol Riots) (Frias, 2021; Phillips, 2020). Each of the aforementioned events occurred in the context of a historically partisan environment (Amlani & Algara, 2021), but the Capitol Riots specifically were an unprecedented act of partisanship expressed through violence (Kydd, 2021). The Capitol Riots were rooted in unfounded beliefs about a “stolen election,” and accordingly, the events at the Capitol are a distillation of the dangers of self-reinforcing and unchallenged beliefs and their catastrophic real-world implications (Hinsz & Jackson, 2022).
We collected data from the three subreddits because, first, r/politics is the most well-established and largest political subreddit, with over 7.3 million members as of January 31st, 2021. For a point of comparison, we decided to focus on partisan networks. Thus, we chose to focus on two subreddits that explicitly tie themselves to the two major parties in the United States: r/Republican and r/democrats. These subreddits describe themselves as ideological homes for members of the Republican and Democratic parties, respectively. Although these two subreddits are not the only partisan political subreddits on Reddit, they are the largest discussion forums that explicitly align themselves with the two major parties, both in terms of their names and their subreddit descriptions.
Our decision process for choosing r/Republican and r/democrats was based on several considerations. Our top consideration was finding two equivalent partisan groups for an apples-to-apples comparison. r/democrats and r/Republican are relatively similar in terms of membership size (207k vs 165k), while ideological subreddits such as r/liberal and r/conservative are more imbalanced (as of this writing, r/liberal has 114k members compared with r/conservative’s one million members). This asymmetry could be partially attributable to the idea that “liberal” is sometimes considered a pejorative term in political discourse, while “conservative” doesn’t necessarily carry the same negative connotation (Neiheisel, 2016). Left-leaning political discussions may also separate in ways that right-leaning political discussions don’t. For instance, as of this writing, r/socialism has more members than r/liberal, while the membership of r/progressive isn’t far behind r/liberal. These ideological distinctions frustrate direct comparison between liberal and conservative subreddits. In addition, we considered alignment with a political party to be an important feature of our analysis because, in the context of the United States, political actions such as voting are primarily expressed through a binary decision (i.e., a voter aligns with either the Republican party or the Democratic party). Our approach is similar to that of De Francisci Morales et al. (2021), except that instead of focusing on specific political candidates (e.g., Donald Trump and Hillary Clinton), we chose to examine the two major political parties. Finally, we considered the relative popularity within the scientific literature of different subreddits. While many prior studies have evaluated discussions on conservative subreddits such as r/The_Donald (Rieger et al., 2021; Trujillo & Cresci, 2022), few have evaluated subreddits that explicitly tie themselves to the two major political parties. The distinction may produce differences; for example, party membership may change the perception of the “in-group” toward an “out-group.”
While r/politics is not strictly politically neutral, it nonetheless does not explicitly tie itself to a specific political party or ideology. The description of r/politics included in Figure 1a says that “[it] is for news and discussion about U.S. politics,” while the descriptions for r/Republican and r/democrats (seen in Figure 1b and c) explicitly tie themselves to their eponymous parties.

Figure 1. Descriptions of each subreddit. (a) The r/politics subreddit description. (b) The r/Republican subreddit description. (c) The r/democrats subreddit description.
By studying r/politics, we can distinguish between the communication in a popular and important subreddit—which is not tied to any party—and explicitly partisan subreddits. In addition, r/politics has a demonstrated ability to serve as a bridge between partisan subreddits, making its study an interesting point of comparison. We find that over 50% of unique comment authors in r/Republican and r/democrats also posted a comment in r/politics. In this respect, r/politics may facilitate and encourage conversation among people with disparate views.
Our analysis of each subreddit is two-pronged. We start by analyzing the general themes and topics discussed on Reddit before exploring how each topic is discussed. We start with Reddit posts, referred to as submissions on Reddit, before moving to the study of the text comments associated with these submissions. In our formulation, submissions signify what is being discussed (i.e., the general parameters of the discussion), while comments represent the discussion itself (i.e., how the topic is being discussed).
Within each submission, we focus exclusively on the title and not the body of the submission. We chose to do this because the majority of submissions that we are examining do not contain additional text, and therefore incorporating any extra text could potentially skew the dataset. Depending on the subreddit, between 80% and 93% of the submissions contain no text beyond the title. This is likely because Reddit submissions often have lengthy titles, and in the case of political posts, they often reference specific news stories or political articles. See Figure 2 for an example of a Reddit submission. However, comments often represent the majority of dialogue in a subreddit. Accordingly, we have included all comments that were posted through the end of March 2021 on the January 2021 submissions. While it is possible that a small number of comments were made after March on January submissions, they are unlikely to alter our experimental results substantially.

Figure 2. A sample Reddit submission.
The statistics for each dataset can be seen in Table 1. After removing comments from automated and moderator accounts, we construct semantic word networks for each of the three subreddits (r/politics, r/Republican, and r/democrats). For each subreddit network, two words have an edge between them if they are used in the same post at least once. The edges are weighted based on the number of co-occurrences of the two words, and the graph is undirected.
Table 1. Reddit Communities (as of 31 January 2021).
Analysis and Metrics
For our analysis, we started by performing several natural language processing tasks, including removing stop words, lemmatization, making words lowercase, and removing punctuation. We primarily relied on the Natural Language Toolkit (NLTK) package (Bird et al., 2009) in Python to perform these tasks. Since the NLTK has a relatively limited number of stop words, we chose to rely instead on the more extensive stop word list provided by tidytext (Silge & Robinson, 2016), which is a composite of several stop word lists2,3 (Lewis et al., 2004).
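The cleaning steps described above can be illustrated with a minimal standard-library sketch. It is not the pipeline used in the study: the stop word set below is a tiny hypothetical stand-in for the tidytext list, the example title is invented, and lemmatization is omitted because it depends on NLTK corpus data (NLTK's `WordNetLemmatizer` would be one natural choice for that step).

```python
import string

# Hypothetical, deliberately tiny stop word set; the study relied on the
# much larger composite list distributed with tidytext.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on"}


def preprocess(title, stop_words=STOP_WORDS):
    """Lowercase a title, strip ASCII punctuation, and drop stop words."""
    lowered = title.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in stop_words]


# Invented example title, for illustration only.
tokens = preprocess("Biden Sworn In: The Inauguration of a President!")
print(tokens)
```

Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes in real titles would need additional handling.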
We then selected the top 100 words by calculating their frequency in each dataset (i.e., the total number of times a word appears in the dataset). The top words were limited to 100 to ensure that we kept only the most meaningful words, excluding words with more limited usage. This is particularly important for the less active subreddits such as r/Republican where words outside of the top 100 are used only a handful of times. We found that the top 100 words in each subreddit’s submission titles are collectively used more times than the following 400 words combined. The use of the top 100 most frequently used words is consistent with other research studies examining language usage (Ibrahim et al., 2017; Lyddy et al., 2014).
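As a rough illustration of this selection step, the sketch below ranks words by total frequency and keeps the top k. The function name `top_words` and the toy titles are assumptions made for the example; the study's own code is not reproduced here.

```python
from collections import Counter


def top_words(tokenized_titles, k=100):
    """Return the k most frequent words across all tokenized titles."""
    counts = Counter(tok for title in tokenized_titles for tok in title)
    return [word for word, _ in counts.most_common(k)]


# Toy, invented data standing in for preprocessed submission titles.
titles = [
    ["trump", "impeachment"],
    ["trump", "capitol", "riot"],
    ["biden", "inauguration"],
    ["trump", "biden"],
]
top = top_words(titles, k=2)
print(top)
```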
For the top 100 words from each dataset, we generated 100 × 100 adjacency matrices where the values in the matrices correspond to the number of co-occurrences for each pair of words. We set the diagonal values to 0 to avoid counting words as co-occurring with themselves. Using the constructed adjacency matrices, we created semantic network representations for each of the subreddits. We used the Louvain community detection algorithm (Blondel et al., 2008) in igraph (Csardi & Nepusz, 2006) to detect communities within each network. The networks were visualized using R (R Core Team, 2013), Gephi (Bastian et al., 2009), and the Fruchterman–Reingold algorithm (Fruchterman & Reingold, 1991).
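A minimal sketch of the matrix construction follows, under one explicit assumption: a pair of words is counted once per post in which both appear, which is one reasonable reading of "number of co-occurrences". The vocabulary and posts are invented, and community detection itself (for example, Louvain in igraph) is outside the scope of this sketch.

```python
from itertools import combinations


def cooccurrence_matrix(tokenized_posts, vocab):
    """Build a symmetric co-occurrence matrix over the words in vocab.

    Assumption: each pair is counted once per post containing both words.
    The diagonal stays 0 because only distinct word pairs are counted.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for post in tokenized_posts:
        present = sorted({index[tok] for tok in post if tok in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # undirected network, so keep it symmetric
    return matrix


# Invented toy vocabulary and posts; "riot" is outside the vocabulary.
vocab = ["trump", "biden", "capitol"]
posts = [
    ["trump", "capitol", "riot"],
    ["trump", "biden"],
    ["trump", "capitol", "capitol"],
]
matrix = cooccurrence_matrix(posts, vocab)
print(matrix)
```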
Combining the Jaccard index, which compares the sets of top-ranked words by usage across subreddits, and eigenvector centrality, which measures a word’s connections to other influential words, provides a multifaceted understanding of the words used within each network.
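To make the second measure concrete, the sketch below estimates eigenvector centrality by power iteration on a small weighted adjacency matrix. It is an illustration rather than the routine used in the study; the matrix is an invented three-word network, and multiplying by (A + I) instead of A is a convergence aid assumed here, which leaves the leading eigenvector unchanged.

```python
def eigenvector_centrality(matrix, iterations=200, tol=1e-9):
    """Estimate eigenvector centrality with power iteration.

    Each step multiplies the score vector by (A + I) and rescales it to
    unit Euclidean length; the identity shift avoids oscillation on
    bipartite-like networks.
    """
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            scores[i] + sum(matrix[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in updated) ** 0.5
        updated = [v / norm for v in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Invented network: "trump" (index 0) co-occurs with "biden" (1) once and
# with "capitol" (2) twice; "biden" and "capitol" never co-occur.
matrix = [[0, 1, 2], [1, 0, 0], [2, 0, 0]]
centrality = eigenvector_centrality(matrix)
print(centrality)
```

In this toy network the hub word receives the highest score, and the word tied to it by the heavier edge ranks second.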
In the following sections, we start with an analysis of January 2021 submissions in each subreddit to address RQ1 and gain an understanding of what the topics being discussed are and how these vary between each of the subreddits. We then move to answer RQ2, or how each topic is being discussed in each subreddit, and the level of commonality between them based on submission comments. We conclude the following results sections by zooming in with a case study where we analyze two discussions about the Capitol Riots, one occurring in r/Republican and another in r/democrats.
Results
General Themes and Topics
Our results reveal notable overlap among the topics discussed across political subreddits. Table 2 showcases the top 10 most commonly used words in each of the semantic networks. Table 3 highlights the Jaccard index between each of the keyword networks. As these tables highlight, the top 10 words as well as the top 100 words are reasonably similar between each of the networks. For example, despite the ideological segregation of the two partisan subreddits, they nonetheless have a Jaccard index of over 0.4 when analyzing the top 10 words. The findings for the top 10 words hold for the top 100 words, where r/Republican and r/democrats have a relatively similar Jaccard index value. This indicates that the network similarity extends beyond the most commonly discussed topics.
Table 2. Ten Most Common Words: Submission Title.
Table 3. Jaccard Index of Top Keywords: Submission Title.
Perhaps counter-intuitively, the two partisan networks actually have more in common with each other based on the Jaccard index than r/Republican has with r/politics, which could be an indication of the similarity between political discussions by partisans.
To ground the Jaccard values, we also calculated the number of shared words between each subreddit’s top 100 (in submission titles). The lowest number of shared words is between r/Republican and r/politics, with 51 out of 100 words shared, while the highest is between r/democrats and r/politics, with 71. r/Republican and r/democrats share 53 words out of the top 100. This indicates that though the networks have notable differences, even the least similar networks share more words in common than they do not.
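The link between shared-word counts and the Jaccard index can be made explicit. For two word lists of equal size n that share s words, the union contains 2n − s words, so J = s / (2n − s). The sketch below applies this identity to the shared counts reported above; the helper names are assumptions, and the printed values are simply what those counts imply.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def jaccard_from_shared(shared, n=100):
    """Jaccard index of two size-n sets that have `shared` items in common."""
    return shared / (2 * n - shared)


# Shared-word counts for the top 100 title words, taken from the text above.
pairs = {
    "r/Republican vs r/politics": 51,
    "r/Republican vs r/democrats": 53,
    "r/democrats vs r/politics": 71,
}
implied = {name: round(jaccard_from_shared(s), 2) for name, s in pairs.items()}
print(implied)
```

Because the union shrinks as the overlap grows, a Jaccard value below 0.5 is still compatible with two lists sharing more than half of their words.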
Table 4 represents the top 10 words ranked by eigenvector centrality in each subreddit’s submissions. The top words are generally similar to the top words by usage, although their placement sometimes differs. Certain words, including “trump,” “biden,” “election,” “republican,” and “president,” are widely shared across each of the subreddits and are often at the center of discussions.
Table 4. Top 10 Words Ranked by Eigenvector Centrality for Each Subreddit: Submission Title.
Community detection in the three semantic networks also shows substantial similarities in themes across subreddits. Communities in semantic networks identify tightly connected words, thus indicating major topics in the discussion. Tables 5 to 7 represent the results of the Louvain community detection algorithm over each of the networks, where each line represents the top words of a different community, along with the size of the community (out of the 100 words in each network). The top words in each community are identified by their usage (occurrence) within each subreddit. The modularity for each of the networks is 0.24, 0.20, and 0.24 for r/democrats, r/politics, and r/Republican, respectively. The relatively low modularity scores may result from the fact that many of the top 100 words are frequently used together, even across communities. Compared with several other algorithms, the Louvain algorithm performed better. As a point of comparison, the greedy modularity optimization algorithm proposed by Clauset et al. (2004) yielded modularity scores of 0.24, 0.19, and 0.23 for r/democrats, r/politics, and r/Republican respectively, while the Walktrap community finding algorithm (Pons & Latapy, 2005) produced modularity scores of 0.20, 0.14, and 0.19 for the same networks. While this does not preclude other community detection algorithms from outperforming the Louvain detection algorithm, it does indicate that it performed reasonably well given the networks.
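For readers less familiar with the modularity score, the sketch below evaluates the standard weighted definition, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), for a given partition. It only scores a partition; it does not implement Louvain or any other detection algorithm, and the six-node network is an invented example rather than data from the study.

```python
def modularity(matrix, communities):
    """Modularity of a partition of an undirected, weighted network.

    matrix is a symmetric adjacency matrix with a zero diagonal, and
    communities[i] is the community label assigned to node i.
    """
    n = len(matrix)
    degree = [sum(row) for row in matrix]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += matrix[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Invented example: two triangles joined by a single bridging edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
matrix = [[0] * 6 for _ in range(6)]
for i, j in edges:
    matrix[i][j] = matrix[j][i] = 1

q_split = modularity(matrix, [0, 0, 0, 1, 1, 1])
q_single = modularity(matrix, [0, 0, 0, 0, 0, 0])
print(q_split, q_single)
```

Placing every node in one community always yields a score of zero, which gives a baseline for reading scores such as those reported above.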
Most Commonly Used Words by Community: r/politics.
Most Commonly Used Words by Community: r/democrats.
Most Commonly Used Words by Community: r/Republican.
Analyzing the community detection results, we see that the networks have a similar number of communities: r/democrats and r/politics each have four, and r/Republican has five. Figure 3 provides a visualization of each network, with the color denoting the community to which each word belongs.

Semantic networks for each subreddit—submission title. Color represents community membership, and the relative size of the node label is based on eigenvector centrality. (a) r/Republican. (b) r/democrats. (c) r/politics.
The community detection results yield a consistent and small number of communities. Topically, the communities show notable similarities as well as differences across the networks. For instance, the four major themes covered in r/politics are the Georgia Election, President Biden’s Inauguration, the Capitol Riots, and former President Trump’s Impeachment hearings. r/democrats largely shares the same communities as r/politics, though the impeachment and inauguration topics form only a single community. r/Republican also has communities for the Capitol Riots and the Georgia Election, in addition to one community discussing tensions between the United States and China, while the other two communities appear to discuss electoral certification of the 2020 election and President Joe Biden broadly. While it was not one of the top words, r/Republican also discussed former President Trump’s impeachment hearings as part of community five. Nonetheless, impeachment was likely discussed less by r/Republican, such that no community was focused on this topic. The word “inauguration,” however, was not in the top 100 words for r/Republican.
How are These Themes Discussed?
Whereas users in disparate political subreddits may discuss similar themes, they might respond to the topics differently. To investigate this further, we examine the themes and sentiments in the comments on the submissions. Table 8 shows the Jaccard index of the top words used in the comments of each subreddit. Notice that, compared with Table 3, which looked only at the submission titles, the overlap in top words is substantially higher. In the top 100 words, r/politics and r/democrats share the most in common, with a Jaccard index value of 0.72, while r/politics and r/Republican have the least in common, with a value of 0.55. As was the case with the submission titles, r/democrats and r/Republican share slightly more in common than r/Republican does with r/politics. Overall, the results suggest high overlap in the comments across the subreddits. See Table A1 in the Appendix for the most common words used in the comments for each subreddit.
Jaccard Index of Top Keywords by Usage—Comments.
Next, we compare the Jaccard index to the fraction of shared top 100 words, this time only considering submission comments. The lowest number of shared words is between r/Republican and r/politics with 71 (out of 100), while the highest is between r/democrats and r/politics with 84. r/Republican and r/democrats share 73 out of 100 words. This indicates that despite some differences in the words used, there is still a substantial overlap.
Sentiments toward shared topics also demonstrated considerable similarity across subreddits. Table 9 displays the Jaccard index of the top sentiment words used for each topic, based on the frequency of use. The topics chosen represent many of the most significant political actors and events of January 2021, including the former president (e.g., “trump”), the incoming president (e.g., “biden”), the Capitol Riots, the inauguration of the newly elected president, and the Georgia special elections. While not universally popular across each subreddit, the chosen topics represent major areas of debate and discussion. As Table 9 shows, the overlap in sentiment words is surprisingly high for certain contentious topics while remaining much lower for others. Former President Trump and President Biden are two of the most talked about subjects in all subreddits, and these topics also share surprisingly similar sentiment words. For example, r/democrats and r/Republican have a Jaccard index of 0.40845 when discussing Trump, and all three subreddits use the words “attack,” “fraud,” “protest,” “racist,” and “terrorist” in their discussions.
Jaccard Index of Top-50 Sentiment Words by Usage in Comments: By Topic.
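The per-topic comparison can be pictured as a small pipeline: select the comments that mention a topic keyword, count the sentiment words they contain, keep the most frequent ones, and compare the resulting sets across subreddits. The sketch below follows those steps under several assumptions that are ours rather than the study's: whitespace tokenization, a comment counting as "about" a topic when it contains the keyword, and a tiny hypothetical sentiment lexicon.

```python
from collections import Counter


def top_sentiment_words(comments, topic, lexicon, k=50):
    """Return the k most frequent lexicon words in comments that mention `topic`."""
    counts = Counter()
    for text in comments:
        tokens = text.lower().split()  # simplistic tokenizer, for illustration
        if topic in tokens:
            counts.update(t for t in tokens if t in lexicon)
    return {word for word, _ in counts.most_common(k)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical lexicon and comments; none of this is data from the study.
lexicon = {"attack", "fraud", "protest", "racist", "terrorist", "great", "bad"}
subreddit_a = [
    "trump called it fraud",
    "the attack on the capitol was bad",
    "trump protest turned into an attack",
]
subreddit_b = [
    "trump fraud claims are bad",
    "biden is great",
    "trump supporters protest",
]

words_a = top_sentiment_words(subreddit_a, "trump", lexicon)
words_b = top_sentiment_words(subreddit_b, "trump", lexicon)
print(sorted(words_a & words_b), jaccard(words_a, words_b))
```

A set-based comparison of this kind captures whether the same sentiment vocabulary appears in both subreddits. It does not capture whom those words are directed at, which is one reason the qualitative case study is a useful complement.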
Discussions about (President Joe Biden’s) “inauguration” and the “riot” (at the Capitol) yield the least similarity in sentiment words. This is likely a result of differences in the popularity of these subjects across the subreddits, as well as partisan differences in how these events are perceived. Discussions in r/Republican may be more likely to downplay the severity of the Capitol Riots, while discussions across r/politics and r/democrats may express more concern. The topic of the “inauguration” shows little similarity across the subreddits, which could be explained by excitement toward the incoming president from users in r/democrats and apathy or perhaps antipathy from users in r/Republican.
Discussions about the “capitol” and “georgia” land somewhere between the other contentious topics in terms of their similarity. Notably, “georgia” is the only topic in which the discussions of r/Republican and r/politics are substantially more similar than those of r/politics and r/democrats.
For most of the topics, r/politics shares more commonality with the partisan subreddits than they do with each other.
A Case Study of the Capitol Riots
Our analysis has primarily focused on quantitative comparisons thus far. However, it is worth considering an example of how different discussions unfold across the subreddits. Here we focus on two discussions about the Capitol Riots, one occurring in r/Republican and another in r/democrats. Both posts have had their authors’ pseudonymous usernames removed for the sake of privacy.
Figure 4a shows a discussion of the Capitol Riots from r/Republican. The poster tacitly acknowledges the Capitol Riots and attempts to establish equivalency with the protesters of Brett Kavanaugh’s Supreme Court confirmation, while the comment reply from another user suggests that this equivalency is false. While not necessarily representative of all dialogue on r/Republican, this discussion is notable in that both Redditors acknowledge that the Capitol Riots are a real event and implicitly that the Riots were negative. The apparent disagreement between the two Redditors also shows that substantive debate and disagreement can be held even in a partisan forum such as r/Republican. Figure 4b shows a discussion of the Capitol Riots from r/democrats. The initial comment suggests that former President Trump needed to be impeached for his role in the Capitol Riots, and another (different) Redditor agrees and offers reasons for impeachment. Here, as in r/Republican, both the commenters acknowledge the danger of the Capitol Riots, though they go further in their indictment of the event and the former President’s role in it.

Example discussion of the Capitol Riots. (a) Comment thread from r/Republican. (b) Comment thread from r/democrats.
The difference between the two discussion threads primarily comes down to how culpable each respective group holds former President Trump for the Capitol Riots. The users in r/Republican acknowledge that the Capitol Riots were negative, but one of them argues that any legal case against former President Trump is “anemic.” However, the discussion in r/democrats rests on the assumption that former President Trump played an instrumental role in the Riots, and both Redditors suggest that the (then) President needed to be impeached.
What stands out in both of these discussions is that they are lengthy and substantive, and that they leave room for disagreement and debate. Both sets of Redditors agree that the Riots were substantively negative but disagree about just how unprecedented the Riots were and how responsible former President Trump was. While these discussions cannot represent the totality of discussion on each of the respective subreddits, they do paint a picture that shows some common understanding of the events, despite differences in partisan perception.
Discussion
Summary of Findings
Our work provides a relational means by which to understand communication on online forums and polarized communication in online communities.
Our analysis raises two main points. First, what is discussed in each subreddit appears relatively similar. All subreddits discuss the Capitol Riots, the Georgia Election, and aspects of President Biden’s inauguration and President Trump’s impeachment. Certain topics, however, remain more partisan. For instance, discussions of US–China relations and the Electoral Certification process were much more extensive in the r/Republican subreddit than in r/democrats or r/politics. Likewise, while President Biden’s inauguration and President Trump’s impeachment are mentioned in all three subreddits, only r/politics and r/democrats have full topical communities about these subjects. Despite these differences, however, there is still substantial similarity in what is being discussed across all three subreddits.
Considerable similarities were also identified when we analyzed how each topic is discussed, despite variances across topics. On one hand, the words used in comments are surprisingly similar across the subreddits, with r/Republican and r/democrats sharing a Jaccard index of over 0.57 when the top 100 words are evaluated. This indicates that the language used in comments is relatively similar across partisan subreddits and perhaps across political subreddits in general since r/politics also shares common language with the two partisan subreddits. On the other hand, the sentiment results reveal some differences. Interestingly, topics such as “trump” and “biden” yield modestly similar results, suggesting shared sentiments toward the topics across subreddits. This indicates that while opinions are likely to differ between subreddits, there may still be a common base from which they are arguing, even regarding seemingly controversial topics. This is further substantiated by the high proportion of users who comment not exclusively in partisan subreddits but also in r/politics.
Takeaways and Implications
What should concerned researchers and the broader public make of our results? The results demonstrate that although there are differences in perception of important events and people across partisan communities, the scale of the difference may be less than one might expect. Our results suggest a basis for cross-partisan discussion via “shared foundational beliefs.” Bridging these discussions takes intention and thoughtful design, but r/politics may be one such bridging space.
Our findings provide room for cautious optimism. Despite concerns about increasing polarization in US politics, our findings suggest that online partisan discussion forums are not necessarily unrecognizable to those of different political persuasions, particularly when the platform is relatively mainstream such as Reddit. While ideological differences exist between the partisan subreddits we have studied, these gaps may be bridgeable. The strength of democracy in an open society depends at least partially on a public square of discussion where divergent views can be civilly debated and discussed. Reddit may be able to provide a forum for such discussions.
Designing such bridging spaces will continue to be a major challenge for researchers and practitioners alike, but prior research suggests a way forward. Online discussions are more likely to be healthy and well-functioning when norms of respectful conversation are established (Grönlund et al., 2015). Having room for disagreement and debate, without concern of being ostracized or shunned, could be vital to establishing healthy discussions (Coscia & Rossi, 2022; Grönlund et al., 2015; Nelimarkka et al., 2018). If one has a controversial opinion, it may be easier to express it to other sympathetic voices, and furthermore, there is a risk that users who perceive threats may exit the broader discussion altogether. As demonstrated by Nelimarkka et al. (2018), exposure to different points of view alone does not necessarily alleviate group polarization, and other interventions may be necessary.
In addition, we note that while focused on partisan political communication, the approach we have described in our article is not limited only to echo chambers. Our approach can also be leveraged by other members of the research community to better understand the level of commonality between communications within distinct and disparate communities. The semantic network based approach could be useful in understanding a wide range of disparate communications and the similarities that are shared across them, for example, how men and women discuss sensitive topics about health and wellness, how millennials view capitalism compared with other generations, and how pet owners conceptualize their pets relative to how parents discuss their children. These topics are considerably different; however, in each case, the relational semantic network view provides a window into how different groups conceptualize topics and the commonalities that are shared. This approach can prove helpful for better understanding people and their different perspectives at a group level.
Furthermore, the incorporation of transformer models is an area of future work and exploration. Several studies have proposed transformer-based echo chamber detection on online discussions and social media with a particular emphasis on analyzing users and their interaction networks (Jiang et al., 2021; Morini et al., 2021; Han et al., 2019). An extension of our current work could use these types of transformer models to embed user-generated content (e.g., submissions and comments) in vector space and then measure the semantic similarity of discussions across subreddits. The advantage of such methods is that they go beyond keyword comparisons and take into account broader context. However, there are drawbacks regarding the consistency of their performance and interpretation (Khattak et al., 2019). This area of study requires further investigation.
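To make the proposed extension concrete, the comparison step can be sketched once embeddings are available. The example below assumes that each comment has already been mapped to a vector by some transformer encoder, which is not shown and not specified here. It then compares two subreddits by the cosine similarity of their mean vectors. The vectors, the use of a simple mean, and the function names are all hypothetical choices for illustration.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder embeddings standing in for encoder output (hypothetical values).
embeddings_a = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]]
embeddings_b = [[0.6, 0.4, 0.2], [0.8, 0.1, 0.3]]

similarity = cosine_similarity(centroid(embeddings_a), centroid(embeddings_b))
print(round(similarity, 3))
```

Averaging discards variation within a subreddit, so a fuller analysis would likely also compare the distributions of pairwise similarities rather than a single centroid score.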
Limitations
We start by acknowledging that the subreddits analyzed may not be representative of all political discussions that occur on social media or Reddit. We have chosen two subreddits with explicit partisan leanings and one without for the sake of evaluating partisan online communication (i.e., r/Republican and r/democrats) and how it compares with less explicitly partisan online communication (i.e., r/politics). Other subreddits that are tied to a particular ideology (e.g., conservatism or liberalism) but not directly to a particular political party may communicate differently and could be an area of further study. For example, it may be the case that r/Republican and r/democrats represent more mainstream positions which are aligned with the two major political parties, while more extreme positions may be more commonly shared in subreddits that are organized around left-wing or right-wing ideologies, such as r/socialism and r/conservative. Self-described “moderates” may, for instance, help to moderate the positions of the two major parties (Fowler et al., 2022).
Moreover, the members of Reddit may not be representative of the general population. While precise demographic user data are generally unavailable, 2016 polling by the Pew Research Center found that Reddit users were more likely to be under the age of 30, college educated, and to identify as white, male, and liberal than the general population (Barthel et al., 2016). Furthermore, Reddit is, by most measures, a commonly visited and arguably mainstream platform that may be less likely to traffic in extremist or conspiratorial content than other more fringe sites. This could be partly attributed to Reddit users posting under pseudonyms, while users of sites like 4chan and 8chan (now 8kun) post anonymously. In an investigation of hate speech, Rieger et al. (2021) showed that while hate speech existed in Reddit’s since-banned r/The_Donald subreddit, it was less prevalent than in comparable discussion forums on 4chan and 8chan. What’s more, we focused on one particular time period (i.e., January 2021), which contained a discrete political event (i.e., the Capitol Riots). Future research should investigate political discussions in different time periods to see if the findings hold.
Conclusion
In this research study, we have sought to address the extent to which echo chambers exist in partisan forums. To answer this, we have examined three political discussion forums (subreddits) on Reddit in the context of the January 2021 Capitol Riots: r/politics, r/Republican, and r/democrats. By looking at both what topics are discussed and how these topics are talked about, we sought to analyze what level of shared reality (i.e., basic foundational beliefs) exists between disparate groups to assess whether online discussion forums can bring people together and bridge political differences.
Utilizing text and semantic network analyses, we have found considerable commonalities between the words used and the topics discussed in each of the subreddits that we have studied. Our work suggests that online discussion forums can potentially bridge political differences, particularly when the platform in question is one representing relatively mainstream views. Our results suggest that users in r/Republican and r/democrats share significant views about what events and politicians are worth discussing. Many users across both partisan subreddits communicate on the less explicitly partisan r/politics, and both partisan subreddits share commonalities with r/politics as well.
One of the most fundamental open questions in the research literature is the extent to which echo chambers exist and are exacerbated by online social media websites. Our work adds to this body of research by exploring the level of shared reality that exists between partisan forums. This research study suggests that despite many differences between partisans, online social media discussions may not be as walled off from a shared vision of reality as some have feared. The text analysis and semantic network analysis that we have used in this work can be leveraged by social science researchers and practitioners to understand online communication spaces better and to explore the possibility of bridging between disparate groups.
Appendix
Most Common Words by Subreddit—Comments.
| r/politics | r/Republican | r/democrats |
|---|---|---|
| trump | people | trump |
| people | trump | people |
| time | republican | republican |
| republican | vote | time |
| vote | democrat | party |
| president | election | democrat |
| biden | time | vote |
| party | biden | biden |
| sh*t | party | president |
| day | fraud | day |
| election | tax | election |
| country | evidence | sh*t |
| f*ck | country | country |
| senate | president | gop |
| democrat | government | f*ck |
| guy | medium | guy |
| f*cking | voter | senate |
| capitol | day | lot |
| lot | american | house |
| american | left | life |
| house | ballot | american |
| gt | bad | white |
| government | job | f*cking |
| gop | sh*t | bad |
| white | right | money |
| law | lol | yeah |
| office | agree | office |
| power | money | capitol |
| yeah | court | conservative |
| life | conservative | wage |
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
