Abstract
There is a growing body of health communication literature addressing health-related discourse across user-generated platforms. Specifically,
Keywords
The prevalence of health communication scholarship in the context of social media is not new. As social media platforms have emerged as a more prevalent source of health information and insight, an expanding body of research has emerged. This includes
Researchers have suggested that Reddit may be viewed as a social phenomenon itself in that what health topics are
Scholars have noted an uptick in research using computationally driven analyses built on Reddit data (Proferes et al., 2021) and directives for scholarship as a whole addressing health communication (Rains, 2020). While any social media platform may be ephemeral, this research is important because it provides directions for mediated communication research, a context for investigating online interaction, and a way to examine masspersonal communication (O’Sullivan & Carr, 2018). For the past several decades, research has considered numerous health challenges, such as the complexity of individual experiences with illness (Brashers & Babrow, 1996) as well as interpersonal interactions online that might facilitate discourse to encourage or hinder advocacy behavior (such as serving as a donor, e.g., in the context of bone marrow; O’Donnell & Guidry, 2020). Previous research has examined the visible nature of Reddit to make sense of health testimonials (O’Donnell & Guidry, 2020), personal disclosures surrounding oral health care (B. C. Britt et al., 2020), interactions regarding childlessness and medical procedures (Moore, 2020), and the degree to which health crises are covered and shared (Kilgo et al., 2019), among many others. As Lampinen (2016) noted regarding the concentration of research on mainstream social media sites such as Facebook and Twitter, there is a need to examine other platforms, as they also facilitate the spread of health testimonials and public opinions of health behaviors (Park et al., 2018; Valiavska & Smith-Frigerio, 2023). Burgeoning micro-communities within Twitter interactions formed from short text, as well as the pervasive value placed on communicating within one’s existing interpersonal network rather than establishing new connections on Facebook, have led researchers to seek to understand communication across other sites.
Observations must be aggregated across platforms with contrasting digital affordances and demographics, representing a diverse range of social media and health discourse. Reddit likewise provides a fertile ground to expand beyond the well-studied Facebook and Twitter (which have been given attention in quantitative reviews and meta-analyses; see Boulianne, 2015; Chung & Lim, 2022; Marino et al., 2018). Emerging research has also explored TikTok (Zenone et al., 2021), WhatsApp, and Weibo (Yeung et al., 2022), among others, with regard to health challenges.
In articulating goals for contemporary health communication scholarship using novel theoretical and methodological approaches, Rains (2020) wrote, “understanding trends in existing research is critical to ensure the robustness of future scholarship” (p. 26). Reddit is an important platform for communication and information, yet there is no synthesis of what the current body of scholarship entails; thus, the present study is particularly valuable for identifying the common topics of research. In the present study, we quantitatively examine the literature surrounding health studies on Reddit with a communication lens, identifying gaps in the literature that are priority items for future research. Health communication research must move toward a deeper examination of how pseudonymous public discourse constructs and makes sense of complex phenomena to aid in public and behavioral outcomes.
Contextualizing health communication and the social nature of Reddit
Throughout the history of health communication research, scholars have been committed to producing a body of knowledge reflective of trends, themes, and national and global priorities (Beck et al., 2004). In recent years, this research has increasingly examined communication practices on user-generated platforms like Reddit. As Beck and colleagues reflected on the state of health communication research in 2004, scholars increasingly gave their attention to issues that spanned beyond the academy and disciplinary boundaries due to the complexities of contemporary health challenges. Kreps (2001) further contended that health communication scholarship is a broad area, spanning numerous social contexts.
We considered the definition of
Notably, individual members comprise groups on the platform using pseudonyms, and communication takes place on a dominantly visible platform (the interested publics; Burns et al., 2003), so we suggest that the role of the public must be considered in research directions. In considering the often-public role of health communication on Reddit, prior studies have examined discourse among healthcare workers on Reddit (Hintz et al., 2023), patient-provider communication in navigating personal health experiences, interpersonal discourse surrounding bone marrow donation (O’Donnell & Guidry, 2020), and more recently, the volatile nature of communication surrounding the COVID-19 pandemic (Britt et al., 2023), among numerous other studies. All these emphasize the necessity of the dynamic role that Reddit has held as a platform often in the public sphere, as scholars have produced a body of knowledge surrounding dialogue within the platform.
On the Reddit website and mobile app, users can make posts that contain links to external sites, images, videos, or text. Users can leave comments on these posts, which is where the main activity of the platform happens (Reddit.com, 2020). Reddit is grouped into over 138,000 topical sub-forums called subreddits (Marotti, 2018), which vary in topical theme, size, activity, and nature, and are often seen as self-selecting sampling frames (Adams et al., 2019). Reddit has several unique features compared to other social media sites, such as Facebook or Twitter. It is primarily pseudonymous in nature: individuals create a unique username in order to participate. According to the website administrators, 58% of users are located in the United States and are 18–34 years of age; 56% are male, 44% are female; and 37% of visits are through desktop and 73% through mobile devices (Redditinc.com, 2021). Some users have been known to use “throwaway” accounts, which are single-topic and/or disposable accounts used to discuss stigmatized or personal topics (Leavitt, 2015). For example, Andalibi et al. (2016) examined the use of throwaway accounts for online disclosure of experienced sexual abuse. In such cases, throwaway accounts give individuals with privacy concerns an opportunity to share personal experiences and still exchange supportive messages.
The affordances of the platform are important as platforms offer distinguishing features. As Lee (2016) wrote on social media on opinion polarization, digital media advancements have led to certain “high-choice media environments” (p. 57), which can make it easier for users to engage in selective exposure. The nature of subreddits as self-selective groups may be considered spaces where users do not have the same restrictions that they do on platforms like Twitter or Facebook in terms of text limitations. Moreover, as previous scoping reviews of social media research argued (Boulianne, 2015; which included Facebook, Twitter, YouTube, Google+, and MySpace; Marino et al., 2018, on a meta-analysis of Facebook), the majority of users on sites like Facebook tend to participate within their existing networks, while those on Twitter develop their own communities. In recent years, opinions on Twitter have shown how highly valenced arguments surrounding vaccine opinions have resulted in self-identified communities and sentiments (Bello-Orgaz et al., 2017). These platforms have their merits for the dissemination of and engagement in health discourse. Reddit has been absent in these studies, despite a growing abundance of scholarship and encouragement from scholars (Record et al., 2018).
Reddit allowed free access to its API for public use of the data before a public announcement made on April 18, 2023, which will limit third-party access. However, developers and researchers will still be able to collect data, although the details of this access have not yet been made public as of May 5, 2023. Other big-data storage and analysis projects like Pushshift.io (Baumgartner, 2021) allowed anyone to access the platform to download large amounts of web data through its own API. In recent years, Reddit has been used by scholars for artificial intelligence (AI)/machine learning (ML) approaches, content analyses, topic models, and often qualitative methods. As we have reviewed, many studies published within the area of health communication using Reddit have been concerned with
The present study considered the nature of human behavior as largely social (e.g., Klingle, 1993), and that the nature of Reddit is a public platform with masspersonal communication elements that help to form interaction (O’Sullivan & Carr, 2018). Thus, the study design questioned how research might advance theoretically and methodologically. When examining the unique features of the platform as we focus on social factors as paramount, we have the opportunity to understand how human action is understood on platforms like Reddit that prioritize pseudonymity, long-form text interactions (De Choudhury & De, 2014), and understanding the networks and large data sets that can emerge from longitudinal analyses (Rains, 2020). Theory can be cultivated, developed, and analyzed using a variety of methods, so we sought to identify and isolate theories and seek the consistencies and inconsistencies in which theory was used among health communication studies on Reddit. In addition, we strove for a reduction in biases against studies published within and outside the social sciences, particularly as health communication scholars often collaborate with partners in other disciplines (e.g., B. C. Britt et al., 2020; Park & Conway, 2017; Rains, 2020).
Moreover, health communication scholars have noted that there is a need to embrace methodological opportunity using a range of techniques (Rains, 2020), which can likewise assist in the extension and building of new theories, as researchers ask important questions to conduct research using increased volumes of data, which might require theoretical extensions, multiple theories, or bridging paradigms. As such, the following research questions are offered:
Method
Data collection and procedure
Studies were obtained in several ways. First, electronic databases with an available API (Elsevier, arXiv, PubMed) were queried using Python 3.6, followed by systematic searches for studies, including published articles, proceedings, and conference papers (Google Scholar, JSTOR, ProQuest, Royal Society of Chemistry, IEEE, ACM, Taylor & Francis, Sage Journals, Scopus, Web of Science). The search terms and their derivations used for the API queries and database searches included “Reddit and health,” “Reddit and health communication,” and related queries. Studies that were not related to health communication and Reddit were excluded from the data set.
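The API portion of this search can be sketched as follows. This is a minimal illustration only, not the authors' script: it builds query URLs for the public arXiv API and the PubMed E-utilities search endpoint, and the function names, result limits, and the way terms are combined with AND are assumptions. The Elsevier API is omitted because it requires an API key, and no request is actually sent.

```python
from urllib.parse import urlencode

# Search terms named in the text; the "related queries" are not enumerated there.
SEARCH_TERMS = ["Reddit health", "Reddit health communication"]

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
PUBMED_ENDPOINT = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def arxiv_query_url(term, start=0, max_results=100):
    """Build an arXiv API URL requiring every word of `term` in any field.

    The AND combination and the result limit are illustrative assumptions.
    """
    clause = " AND ".join(f"all:{word}" for word in term.split())
    params = {"search_query": clause, "start": start, "max_results": max_results}
    return f"{ARXIV_ENDPOINT}?{urlencode(params)}"


def pubmed_query_url(term, retmax=100):
    """Build a PubMed E-utilities esearch URL for the same kind of query."""
    params = {
        "db": "pubmed",
        "term": " AND ".join(term.split()),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{PUBMED_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    for term in SEARCH_TERMS:
        print(arxiv_query_url(term))
        print(pubmed_query_url(term))
```

Each URL would then be fetched and the returned records screened against the inclusion criteria; that step depends on each service's response format and is not shown here.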
The initial data set included 2232 potential articles, which was narrowed down to 333 after they were read in full and irretrievable articles were accounted for. There were 67 articles that did not meet all inclusion criteria. Three coders trained on a subset of these studies and a satisfactory level of agreement was reached (

ROSES flowchart detailing search, screening, and coding process.
We coded the following information: where the article was gathered from; the document type (journal article, proceeding, article preprint); publication year; whether or not theory was used and, if so, the type of theory used; methods used; any additional methodological details; the article title; the journal title; authors; the digital object identifier (DOI); the health topic studied; the number of participants if the study employed Reddit as a recruitment source; how Reddit was utilized; the primary health communication topic studied on Reddit; additional information about the topic; how many subreddits were examined on Reddit, if stated; the primary subreddit studied on Reddit, if it was listed; and additional notes made by the researchers regarding the study.
Selection of studies
Parameters that affected coding health communication topics were straightforward, as many studies utilizing Reddit often examine a subreddit(s), use Reddit as a recruitment source, or for alternative purposes. As part of the inclusion criteria, a study must have included Reddit as a data source (examining the public nature of a subreddit), a recruitment source for participants in a study, or in another manner (e.g., as a stimulus). Referencing Reddit alone disqualified the study from inclusion. Parameters of inclusion criteria do mean that the available studies for inclusion are smaller as a result, though we looked to prior quantitative literature reviews, content analyses, and meta-analyses that examined computer-mediated communication (Walther et al., 1994), noting how those investigated trends in reported data (Baltes et al., 2002).
Coding
Prior to coding, the authors reviewed the frontend, methods, and results of each article to ensure that the articles included were part of the study in a meaningful manner. For the purposes of the study, in coding article topics, we followed protocols in line with Beck et al. (2004), where an article had to refer to health communication, with topics such as general health, disease, illness, social support, and communication, or related issues. The article topics were coded into six primary categories, which were identified based on the specific health phenomena examined: mental, sexual, physical, drugs, illness, and health discourse. For example, a study that examined Reddit posts about suicide ideation was categorized as a mental health study. It is important to note that this categorical coding system did yield some instances of crossover. To reduce coding crossover discrepancies, authors coded primary health topics by secondary topics, including notes for the studies. Supplementary Appendix A provides a complete list of the coding scheme and criteria.
Following this, the authors coded the use of Reddit in the study either as a data source, recruitment source, or
Coding theory
Data were coded for the presence or absence of theory. If a study referenced the use of a theory or model as the guiding “theoretical framework,” we coded that as the theory used. For instance, in Squirrell’s (2019) examination of nootropics, the concepts of affordances and online communities were used as the framework. In that instance, this was coded as
Coding methods
As one representative example of coding methods, natural language processing was classified under the larger umbrella of computational science as the method, though some analyses employed additional approaches. Naderi and colleagues (2019), for instance, employed the scikit-learn package in Python to train on the data set prior to modeling it using an ML approach, but the overarching method was classified as computational in nature. Haritosh et al. (2019) proposed an improved method to estimate the weight and body mass index of a person by evaluating a Reddit data set using a convolutional neural network, which was classified as an ML approach and ultimately coded as a computational method. We coded six variables for the methods employed: quantitative, qualitative, mixed-methods, computational science, rhetorical, or none.
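The coding scheme described in this section can be represented as a small data structure. The sketch below is only an illustration of how the six topic categories and six method categories might be encoded; the class and field names are hypothetical and do not come from the authors' codebook (Supplementary Appendix A), and the example record is invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HealthTopic(Enum):
    """The six primary topic categories named in the Coding section."""
    MENTAL = "mental"
    SEXUAL = "sexual"
    PHYSICAL = "physical"
    DRUGS = "drugs"
    ILLNESS = "illness"
    HEALTH_DISCOURSE = "health discourse"


class Method(Enum):
    """The six method categories named in the Coding methods section."""
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"
    MIXED_METHODS = "mixed-methods"
    COMPUTATIONAL = "computational science"
    RHETORICAL = "rhetorical"
    NONE = "none"


@dataclass
class CodedStudy:
    """One coded article; field names are hypothetical."""
    title: str
    publication_year: int
    topic: HealthTopic
    method: Method
    theory: Optional[str] = None  # None when no guiding theory was reported

    @property
    def uses_theory(self) -> bool:
        return self.theory is not None


# Hypothetical record: a machine-learning study of suicide ideation posts
# would be coded as mental health (topic) and computational science (method).
example = CodedStudy(
    title="(hypothetical) ML analysis of suicide ideation posts",
    publication_year=2019,
    topic=HealthTopic.MENTAL,
    method=Method.COMPUTATIONAL,
)
```

Restricting each field to an enumerated set makes crossover cases explicit: a study has to be assigned one primary topic, with secondary topics recorded in the notes.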
Best practices in quantitative reviews like this one often call for estimating effect sizes (Lipsey & Wilson, 2001). While similar studies commonly report effect sizes, the present study did not, because many of the included studies used computational or qualitative methods and simply did not report them; at best, effect sizes were inconsistently reported among the studies that did use quantitative methods. Notably, other studies have also taken the approach of not reporting effect size (Snyder et al., 2004). We further outline this in the discussion section.
Measures and analysis
The assessment of RQ1 and RQ2 included frequency distributions. We conducted a cross-tabulation and a chi-square test of independence for trends in publications and the theories and methods employed. The analyses for RQ3 were performed using linear regression in which the dependent variable (DV) was the number of publications and the independent variable (IV) was the interaction term between the publication year and the health topic. This was tested via generalized linear models using the glm function in R 3.6.2 with α = .05 and assumptions of normality.
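As an illustration of the chi-square test of independence mentioned above, the following sketch computes the test statistic and degrees of freedom for a cross-tabulation in plain Python. It is not the authors' analysis: the function name is an assumption, and the counts in the usage example are hypothetical rather than taken from the study. The critical values listed are the standard chi-square cutoffs at α = .05.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows could be
    "theory used / not used" and columns could be method categories.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Standard critical values of the chi-square distribution at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

if __name__ == "__main__":
    hypothetical_counts = [[10, 20], [20, 10]]  # invented for illustration
    stat, df = chi_square_independence(hypothetical_counts)
    print(f"chi2 = {stat:.3f}, df = {df}, reject H0: {stat > CRITICAL_05[df]}")
```

A statistic above the critical value for its degrees of freedom indicates that the two coded variables are unlikely to be independent at the chosen α.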
Results
Topic prevalence of health communication studies in the context of Reddit
The first research question asked about the dominant health topics examined in the context of Reddit. Overall, studies examining mental health on Reddit were among the most dominant topics (
Table 1 reports the subreddits that were examined in multiple studies. Out of 164 studies that reported a dominant subreddit, 19 studies reported use of the same subreddit. The remaining 145 subreddits were each listed in a single study. Notably, the dominant subreddits reported among multiple studies included
List of subreddits reported in multiple studies. All other subreddits reported
Theories and methods
The second research question asked what theories and methods are employed among health studies in the context of Reddit. First, we assessed whether theory was used as a framework for the study design. Of the 292 studies included in the analysis, 66 (22.6%) used a theory as the framework for the study, while 226 (77.4%) did not. Table 2 shows the breakdown of the theories employed in health studies. Notably, the dominant theories or frameworks employed included online communities (8 articles, 12.1%), social support frameworks (5 articles, 7.6%), followed by social cognitive and objectification theories (3 articles each, 4.5%); these were broad in nature, with specific theories comprising only individual studies.
Theories in health communication studies in the context of Reddit and their prevalence.
Among the methods employed, 117 studies used computational science approaches (40.1%), followed by qualitative methods (78 studies, 26.7%), quantitative methods (70 studies, 24.0%), mixed methods (21 studies, 7.2%), rhetorical approaches (4 studies, 1.4%), and no methods employed (2 studies, 0.7%).
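The frequency distribution above can be reproduced from the reported counts. The sketch below uses the method counts stated in the text; the variable and function names are illustrative.

```python
# Method counts as reported in the text (N = 292 studies).
METHOD_COUNTS = {
    "computational science": 117,
    "qualitative": 78,
    "quantitative": 70,
    "mixed methods": 21,
    "rhetorical": 4,
    "none": 2,
}


def frequency_table(counts):
    """Map each category to (count, percentage of total rounded to 1 decimal)."""
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


if __name__ == "__main__":
    for label, (n, pct) in frequency_table(METHOD_COUNTS).items():
        print(f"{label:<22} {n:>4} {pct:>5.1f}%")
```

The counts sum to 292, and the rounded percentages match those reported (40.1%, 26.7%, 24.0%, 7.2%, 1.4%, and 0.7%).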
Publications and health topics
The histogram in Figure 2 shows trends in publication data over time with an upward trend in publications that examine health topics on Reddit. According to the descriptive nature of the data, in 2014, 3 studies (1.0%) were published; in 2015, 7 (2.4%) and likewise in 2016, 7 (2.4%) with a surge in publications in 2017 of 34 publications (11.6%) to 48 in 2018 (16.4%), and 91 studies in 2019 (31.2%) with the largest increase in 2020 of 102 studies (34.9%); the discussion section elaborates on the upward trajectory of growth in studies.

Trends in publications of health topics examined in the context of Reddit.
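To make the upward trend concrete, the yearly counts reported above can be summarized with an ordinary least-squares line. This is a descriptive sketch only, under the assumption of a simple linear trend; it is not the generalized linear model with year-by-topic interactions used for RQ3, and the function name is illustrative.

```python
# Publications per year as reported in the text (total N = 292).
YEARLY_COUNTS = {2014: 3, 2015: 7, 2016: 7, 2017: 34, 2018: 48, 2019: 91, 2020: 102}


def least_squares_line(points):
    """Fit y = intercept + slope * x by ordinary least squares."""
    xs = list(points.keys())
    ys = list(points.values())
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = least_squares_line(YEARLY_COUNTS)
    print(f"average change: {slope:.2f} additional studies per year")
```

On these counts the fitted slope is about 18 additional studies per year, although the yearly figures suggest that growth accelerated after 2016 rather than following a straight line.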
Among all 292 studies, The
Health topic changes over time
Table 3 lists the results of the generalized linear model used to carry out the regression for RQ3, which asked about the topical change that occurred over time in the studies. The model included the following variables represented as topics: Year (of Publication), Drugs, Illness, Mental Health, Physical Health, Sexual Health, and the following interaction terms: Year × Drugs, Year × Illness, Year × Mental Health, Year × Physical Health, Year × Sexual Health. From 2014 to 2020, there was a significant increase in studies associated with mental health and Reddit (
Regression of variables predicting topical change from 2014 to 2020.

Assumptions of normality among data fit in glm predicting topical change.
Discussion
Health communication scholarship has in recent years emphasized how novel methods facilitate greater theory building (Rains, 2020) through understanding interpersonal and organizational practices in human communication on platforms like Reddit (O’Donnell & Guidry, 2020; Rhidenour et al., 2022). The overall increase in published health communication studies on Reddit is perhaps inevitable, given that Reddit’s own monthly views, comments, and daily active users continue to rise concurrently with the site’s valuation (Curry, 2023). Accordingly, these platforms do not solely exist in a virtual world, because participants engage in activities both on and offline (e.g., in O’Donnell & Guidry, 2020, on promoting bone marrow transplantation). Reddit participation, in several cases, reveals the pronounced connections between online and offline lives, echoing a long-standing argument that technology can foster or inhibit social interactions and connections in “When we say society, we usually mean citizens in entities known as nations. We take those categories for granted. [. . .] All the questions about community in cyberspace point to a similar kind of transition that might be taking place now, for which we have no technical names” (pp. 53–54).
Arguments referencing the ephemerality of platforms (Flanagin, 2020) like Reddit are not lost on us, though we offer in this discussion section insights for the future of health communication scholarship, platforms, and the nature of public communication.
Theory, methods, and directions for public communication
The significance of applying and extending theories of human communication cannot be overstated, with one caveat. It is important to note that in this study, our objective is not to state that Reddit is an optimal study site or methodological toolkit as a priority for health communication research. Many health communication studies were published widely, from the social sciences to preprints in computer science and an array of domains. In line with this, the limited presence of theory revealed by the results, in 22.6% of the 292 studies included in the analysis, was not wholly surprising. The caveat is that because this was a comprehensive data scrape where we attempted to reduce our respective biases toward communication and the social sciences, we included studies from any available academic databases. We particularly note the diversity in topics and disciplines represented, spanning social and natural sciences. As Eger and Jaworowski (2001) wrote in a letter to
Computational methods dominated the research findings, followed by qualitative methods. This is not to say that we should avoid using any specific methodological approach or that theory does not “fit” the model or method, especially since computational methods have previously been employed in exploratory research. Rather, this means that there is an opportunity to utilize novel theories and methods to lay a foundation for knowledge building, a strength in public communication research, as we engage in masspersonal communication (O’Sullivan & Carr, 2018) to unpack meaningful interactions and supportive inquiry (Wright & Bell, 2003). As Rains (2020) noted, a primary benefit of computational methods is that they allow researchers to gain insight into public responses to health topics and events, model the nature of those discussions or networks, and extend theory. Some computational methods include the use of Python and R to scrape data and conduct a latent Dirichlet allocation to develop topic models (e.g., Britt et al., 2023, who analyzed r/covid19 to make sense of users’ health discussions, applying Crisis and Emergency Risk Communication (CERC) theory). Moreover, we stated earlier that the social phenomenon of Reddit as a platform, its user engagement, and the subjects explored therein might very well merit additional study to observe the natural phenomena that occur there. As subreddits emerge and form, the potential for user-to-user relationships and the manner in which reality is constructed constitute the construction of scientific practice (e.g., Giorgi, 2010).
Such approaches (e.g., computational, further scientific inquiry of natural phenomena) require nuanced interpretation from researchers, so integrating theory to help frame these approaches would aid in how theory can help to make sense of larger social phenomena that occur on sites like Reddit (Rains, 2020). Theories organize scholarly ideas to build and test prior knowledge, and in the context of new platforms, allow us to make sense of human communication (social effects and the role of cues; Walther & Parks, 2002).
Health topics evolve in public communication in a changing media environment
Mental health topics were the most prominent among health communication studies in the context of Reddit, comprising 30.8% of the data set. Over the past few decades, there has been a noticeable rise in the prevalence of mental health issues in research studies, as evidenced by those like Karim et al. (2020). More importantly, they were the only health topics to be statistically significantly associated with an increase over the 7-year period of published studies. Several dominant subreddits were also composed of mental health topics or those that were comprised of stigmatized health issues; of the top subreddits that were reported in studies, the top 3 included
Although many studies examined topics associated with weight loss or mental health (e.g., Sowles et al., 2018), studies associated with weight loss did not emerge as their own classification; nonetheless, we question whether the study of individual subreddits would lead to oversaturation or staticity. Moderation and rules change among active subreddits, and communities evolve. Notably, few studies used Reddit as a recruitment site, notwithstanding some scholars asserting that it is more appealing for recruiting participants than MTurk (Shatz, 2017). Moreover, we must consider ethically sensitive topics, how those are studied, and the data that are gathered by researchers. It was beyond the scope of this study to identify how researchers specifically dealt with ethical issues, yet great care should be taken when investigating these topics.
Limitations and research directions
Like all studies, there were several limitations and strengths to the present research. First, many studies were from scientific domains outside of communication, not only because the platform might lend itself toward those types of studies (and thus, such studies get published because small sample sizes might be inefficient) but because publication in English was an inclusion criterion. This could have led to an underrepresentation of studies from countries where publication in English was not standard. Moreover, we did not have a clear way of knowing whether the research design for studies was accurate in their design, use of theory, methods, and use of Reddit, especially as methodological guidelines for computational research are not standardized. In addition, even now there are emerging studies on health communication and Reddit, though such articles are preprints or ahead-of-print publications. The current research holds a complete database with the first known study included to the latest for a full 7 years. Notably, in the majority of studies, because computational methods were dominantly reported in studies (often employing ML, convolutional neural networks, linear discriminant analysis (LDA), and so forth; see Supplementary Appendix A), followed by qualitative methods, estimates of effect sizes could not be reported in a treatment.
A fruitful avenue for future research may be to investigate the specific nuances of computational methods examined by platform-based research, comprising different forms of social media such as Reddit, Facebook, and Twitter, and how those are used in public health communication contexts, particularly given the growth in many ways such methods are employed. Moreover, while existing research has explored many health outcomes associated with mental health, research often examines the role of undiagnosed depression, anxiety, stress, and similar outcomes. Future research may extend that research by looking at the prevalence of experiences, diagnoses, and the evolution of conversations surrounding the domains of mental health conversations on various platforms, for which both computational and qualitative methods may pair well in future scholarship. Given these directions, there are numerous avenues—both in extending this work and conducting critically important research in public health communication—that researchers can take up that address the role of platform features, such as those on Reddit, the role of health-related behaviors, and online discourse.
Supplemental Material
sj-docx-1-ctp-10.1177_20570473231209075 – Supplemental material for Trends and challenges within Reddit and health communication research: A systematic review
Supplemental material, sj-docx-1-ctp-10.1177_20570473231209075 for Trends and challenges within Reddit and health communication research: A systematic review by Rebecca K Britt, Courtny L Franco and Naiyan Jones in Communication and the Public
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.