Abstract
Background:
Twitter is one of the most popular social media platforms. The growing use of Twitter by health-care consumers creates a novel venue to understand patient experiences. To understand the potential for this platform to be utilized in patient- and family-oriented health research, this study reviewed published literature on the use of Twitter in health research.
Methods:
In collaboration with the research team, a research librarian designed and implemented a search strategy in eight databases. Primary and secondary screenings were conducted using predetermined criteria by one reviewer. A second reviewer verified screening decisions in 10% of the studies. Evidence tables were created to synthesize across the following study elements: research design, data collection techniques, analytic approaches, and author’s insights on Twitter as a data collection method. Descriptive narrative analysis was used to synthesize data.
Results:
The search strategy captured 618 articles; 233 were eliminated in primary screening and 366 articles were eliminated during secondary screening. Verification by the second reviewer resulted in very good agreement (κ = .980). Seventeen articles were included in the final data set. Synthesis across the studies demonstrated that Twitter is currently used to search and mine research data, while active recruitment strategies on Twitter are just beginning to emerge.
Conclusion:
The novelty of Twitter for study recruitment and data collection with health-care consumers presents advantages and challenges that differ from traditional methods of data collection.
What Is Already Known?
Twitter, a popular and growing social media platform, has been increasingly utilized in health contexts, including raising public awareness of natural disasters, conducting syndromic surveillance, assessing public misunderstanding surrounding antibiotic use, and serving as a cost-effective means of research recruitment. The use of 140-character microblogs known as “tweets” and their rapid exchange of information are only a few characteristics of Twitter that may provide novel methods for data collection, which have yet to be fully explored.
What This Paper Adds?
This manuscript is the first review of published research on the use of Twitter to collect health research data with health consumers. From this study, we concluded that the use of Twitter in health research, including participant recruitment, data collection methods, and type of data collected, is similar across different qualitative and quantitative study designs. Nevertheless, Twitter is ideal for qualitative research, as data are presented as unbiased text giving insight into the underlying reasons for and opinions of a phenomenon. In addition, Twitter deviates from traditional methods of data collection (e.g., in-person focus groups, interviews) in terms of how data are collected, how data are verified, ethical considerations, and accessibility of research. Thus, this manuscript provides new considerations for health research and researchers using Twitter and other social media platforms.
Background
As the use of social media proliferates, researchers have begun to explore these platforms as tools to efficiently disseminate information to vast and diverse audiences in a cost-effective manner (Robillard, Johnson, Hennessey, Beattie, & Illes, 2013). Although social media is approximately 15 years old, these platforms have already changed approaches to health communication (Stump, Zilch, & Coustasse, 2012). In addition to information dissemination, social media has influenced the collection of health information (Wittmeier et al., 2014), and these platforms have been increasingly explored as mechanisms to increase engagement in health education and to conduct health research (Finfgeld-Connett, 2015). The strength of social media lies in its ability to develop relationships and communities around shared interests (e.g., rare diseases and chronic illnesses), even when groups are geographically dispersed, and to develop these communities asynchronously (Wittmeier et al., 2014). There is increasing interest in understanding the full impact of social media platforms on health research (Finfgeld-Connett, 2015).
Twitter, one social media platform, is a form of microblogging with brief 140-character “tweets” that may consist of images, text, and links (O’Connor, Jackson, Goldsmith, & Skirton, 2014). Registered users follow other accounts to quickly exchange information and see updates; unregistered users may only read tweets (Stump et al., 2012). Established in 2006, Twitter reports 1.3 billion registered users, 313 million active registered users, and 500 million daily tweets (Finfgeld-Connett, 2015). Twitter is used by 23% of online adults, with 37% of users between 19 and 29 years old and 25% between the ages of 30 and 49 years (Finfgeld-Connett, 2015). The growth rate of Twitter has been considerable, with no evidence of slowing down (Stump et al., 2012).
Twitter has been increasingly utilized in health (Finfgeld-Connett, 2015), in contexts ranging from spreading awareness during natural disasters to tweeting real time in hospitals (Holt, 2011; Parker-Pope, 2009). “TwitterCare” includes using the platform for diabetes management, blood glucose monitoring, drug safety alerts, chronic condition self-management, diagnostic brainstorming, infant care tips, and post discharge follow-up care (Fisher & Clayton, 2012).
Twitter has also played various roles in health research initiatives (Robillard et al., 2013), including tracking flu epidemics by conducting syndromic surveillance (Eysenbach, 2009) and assessing public misunderstandings surrounding antibiotic use (O’Connor et al., 2014). As a tool for research recruitment, Twitter is a cost-effective platform with the potential to engage hard-to-reach populations (Clinical Digest, 2014). Although tweets are limited to 140 characters, they include useful data and metadata for researchers, such as the account holder’s language, geolocation, their handles (account names), and the number and handles of followers as well (Finfgeld-Connett, 2015), thus maximizing useful data with minimal effort.
As one of the most popular social media platforms and a popular tool for health-care communication, Twitter presents a possible new venue to engage health-care consumers in health research. The ability of Twitter to connect individuals around shared interests and the unique qualities of this particular social media platform warrant the need for further exploration.
In particular, Twitter is ideal for qualitative research because it is one of the most well-established platforms used to discuss a variety of topics (Salzmann-Erikson, 2017). Twitter content is also already in text form, making it increasingly valuable to health researchers (Kilaru et al., 2016) seeking insight into the underlying reasons for and opinions of a phenomenon. The tweets themselves are naturally occurring contextual responses that can be collected with or without prompts from researchers and encompass cross-cultural and international views of the general public (Hilton, 2017). To understand and design health interventions, it is imperative to broaden the range of populations and audiences examined (Lyles, Lopez, Pasick, & Sarkar, 2013), and Twitter provides a well-established medium for this type of investigation. Hence, the purpose of this scoping review was to systematically identify and describe published health research evaluating the use of Twitter as a data collection method with health consumers.
Method
A scoping review is a knowledge synthesis methodology that is intended to review literature addressing broad topics inclusive of many different study designs (Arksey & O’Malley, 2005). Scoping reviews are used to assess emerging and established fields by evaluating the landscape of a body of knowledge and identifying gaps in existing literature (Arksey & O’Malley, 2005; Colquhoun et al., 2014). The goal of a scoping review is to provide a descriptive overview, and often strict limitations are not placed on search terms and critical appraisal is not used with identified literature (Arksey & O’Malley, 2005; Pham et al., 2014). Scoping reviews are a relatively new methodology and are becoming increasingly popular (Arksey & O’Malley, 2005). This scoping review sought to provide an overview of previously published literature describing Twitter as a data collection method with health-care consumers and provide researchers with considerations when potentially using this data collection approach.
Search Strategy
The research team and a health research librarian developed and implemented search strategies in eight electronic databases: OvidEpub, Ovid MEDLINE(R) Daily and Ovid MEDLINE(R), Ovid Embase, Ovid PsycINFO, Ovid EBM, EBSCO CINAHL, Elsevier Scopus, and Thomson Reuters Web of Science Core Collection, using language (English) and date restrictions (2010–2015; Appendix A). The decision to restrict studies to English was informed by recent systematic research suggesting there is no empirical evidence of bias if papers written in languages other than English are excluded (Morrison et al., 2012). The date restrictions reflect the recent increase in popularity of Twitter within health care (Stump et al., 2012).
Study Inclusion Criteria
Studies were included if they met the criteria outlined in Table 1. Studies were not excluded based upon research design. Primary research studies were included if they used Twitter to collect research data with health consumers (Appendix C). Studies were excluded if they assessed other social media platforms (e.g., Facebook, Instagram), focused on research dissemination, or included health-care providers. The large, heterogeneous population of health-care consumers was selected to assess the state of the science. Even with this broad target population, only 17 studies were included in the final set; therefore, analysis of subpopulations was not warranted.
Table 1. Inclusion Criteria.
Study Selection
Primary screening of titles and abstracts was completed by one reviewer (A.Z.). Each article was rated as include, exclude, or unclear using a primary screening tool (Appendix C). Full-text articles were retrieved for secondary screening if they were classified as include or unclear in primary screening. Full articles were assessed using standard forms and predetermined inclusion criteria (Table 1). To be included during secondary screening, articles had to discuss only Twitter, focus only on data collection, and target health-care consumers (see Appendix C). A second independent reviewer (X.W.) screened a random 10% of all primary and secondary screened articles, using the secondary screening tool (Appendix C). All disagreements were resolved by discussion or third-party adjudication (L.A.).
Data Extraction
Data were extracted by one reviewer (A.Z.) for the following variables: research design, data collection techniques, analytic approaches, tools used to analyze and collect data, and the authors’ conclusions on Twitter as a health research method. These variables were selected to explore whether a predominant research design is used, to identify the least and most common methodologies being employed, and to draw attention to any conclusions or insights authors noted after using this method. By analyzing these variables, we aimed to provide an overview of how Twitter is being used to reach this population for research.
Data Analysis
Descriptive narrative analysis was used to identify potential patterns (e.g., similarities, anomalies) in data collection techniques using Twitter. Evidence tables were built to facilitate synthesis across the studies. The descriptive analysis allowed us to (1) understand the current extent to which Twitter is used as a data collection method with health-care consumers, (2) evaluate the advantages and disadvantages of this method, and (3) compare and contrast our own experience using Twitter as a data collection method to help develop this body of literature.
Results
Seventeen studies met our inclusion criteria (Figure 1). The interrater reliability of screening decisions between the two reviewers was “very good,” with a κ coefficient of .980, standard error of 0.020, and 95% confidence interval [0.940, 1.000] (“GraphPad QuickCalcs: Quantify interrater agreement with kappa,” 2016). Table B1 (Appendix B) summarizes the 17 studies, including important study elements, and is organized alphabetically.
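The agreement statistic reported above can be illustrated with a short sketch. The code below is a minimal illustration, not the calculator the authors used: it computes Cohen's κ from a square agreement table and a normal-approximation 95% confidence interval from κ and its standard error. The function names and the counts in `hypothetical_table` are assumptions introduced for illustration; the review does not report the underlying agreement table. Applied to the reported values (κ = .980, SE = 0.020), the interval comes out at roughly 0.941 to 1.000 after capping at 1, which matches the reported interval up to rounding of the inputs.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    Rows are reviewer 1's decisions, columns are reviewer 2's decisions.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of items on the diagonal.
    p_observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance, from the marginal totals.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_confidence_interval(kappa, standard_error, z=1.96):
    """Normal-approximation interval, capped to the valid range of kappa."""
    lower = max(-1.0, kappa - z * standard_error)
    upper = min(1.0, kappa + z * standard_error)
    return lower, upper


# Hypothetical counts (include/exclude decisions), NOT data from the review.
hypothetical_table = [[20, 1],
                      [0, 41]]
example_kappa = cohens_kappa(hypothetical_table)

# Values reported in the text: kappa = .980, standard error = 0.020.
reported_interval = kappa_confidence_interval(0.980, 0.020)
```

Capping the upper bound at 1 reflects that κ cannot exceed perfect agreement, which is why the reported interval ends at 1.000.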

Figure 1. PRISMA diagram.
Quantitative (n = 2), qualitative (n = 7), and mixed methods (n = 8) research designs were represented in Table B1 (Appendix B). A wide range of health topics and research questions were explored including health challenges such as pain, migraines, and cancer (Ahlwardt et al., 2014; Nascimento et al., 2014; Parsons, Breckons, & Durham, 2015), social discourse of conditions like perceptions of portrayal of seizures (McNeil, Brna, & Gordon, 2011), and cyberspace in comparison with real-world phenomena (Nagel et al., 2013).
Mixed-Methods Studies
Mixed-methods studies (n = 8) were published within the last 3 years, with the exception of two articles published in 2011 (McNeil et al., 2011) and 2012 (Marton, 2012). Articles were all published in Western countries including Australia (n = 2; Beykikhoshk, Arandjelović, Phung, Venkatesh, & Caelli, 2014, 2015), United Kingdom (n = 3; Greaves et al., 2014; Parsons et al., 2015; Ramagopalan, Wasiak, & Cox, 2014), Canada (n = 2; Marton, 2012; McNeil et al., 2011), and the United States (n = 1; Ahlwardt et al., 2014). Mixed-methods studies focused on pain (toothache, backache, earache, and headache; Ahlwardt et al., 2014); multiple sclerosis (Ramagopalan et al., 2014); autism spectrum disorder (Beykikhoshk et al., 2014, 2015); epilepsy (McNeil et al., 2011); pain tracking (Parsons et al., 2015); cancer patient tweets from the Canadian Cancer Society (Marton, 2012); and tweets to hospitals in the English National Health Service about quality of care (Greaves et al., 2014).
Data collection methods using Twitter
Three studies used retrospective data collection to examine past tweets (Beykikhoshk et al., 2014, 2015; Ramagopalan et al., 2014). Six studies collected data prospectively by setting a future date to begin collecting tweets as data (Ahlwardt et al., 2014; Greaves et al., 2014; Marton, 2012; McNeil et al., 2011; Nakhasi et al., 2016; Parsons et al., 2015). Seven of the eight studies used various tools and programs to search for and collect data in addition to tweets, including key words, phrases, or geolocations attached to the Twitter post or profile (Ahlwardt et al., 2014; Beykikhoshk et al., 2014, 2015; Greaves et al., 2014; Marton, 2012; McNeil et al., 2011; Ramagopalan et al., 2014). Only one study, among both the eight mixed-methods studies and the 17 studies in the final set, used active recruitment to identify participants on Twitter and collect data with predetermined individuals at a predetermined time about a specific topic (Parsons et al., 2015). In contrast, all other studies collected tweets passively from any user without actively engaging them.
Qualitative Studies
Qualitative studies (n = 7) were published within the last 3 years, with the exception of one study in 2012 (Sugawara et al., 2012). One article was from Australia (Hewis, 2015), two from Japan (Sugawara et al., 2012; Tsuya, Sugawara, Tanaka, & Narimatsu, 2014), three from the United States (Nakhasi et al., 2016; Nascimento et al., 2014; Shive, Bhatt, Cantino, Kvedar, & Jethwani, 2013), and one from New Zealand (Henzell, Knight, Morgaine, Antoun, & Farella, 2014). Qualitative studies examined health focuses including patient safety (Nakhasi et al., 2016); infodemiology of self-reported migraine headache suffering (Nascimento et al., 2014); acne (Shive et al., 2013); orthodontics (Henzell et al., 2014); cancer (breast, leukemia, colon cancer, rectal cancer, colorectal, uterine, stomach, lung, and ovarian; Tsuya et al., 2014); patients’ perspectives on magnetic resonance imaging (Hewis, 2015); and Twitter usage and its role in “wired” cancer patients (Sugawara et al., 2012).
Data collection methods using Twitter
Two studies used retrospective data collection (Hewis, 2015; Tsuya et al., 2014). Four studies collected data prospectively (Henzell et al., 2014; Nascimento et al., 2014; Shive et al., 2013; Sugawara et al., 2012). All seven qualitative studies also used various tools and programs to search for and collect data in addition to tweets including key words, phrases, or geolocations attached to the Twitter post or profile (Henzell et al., 2014; Hewis, 2015; Nakhasi et al., 2016; Nascimento et al., 2014; Shive et al., 2013; Sugawara et al., 2012; Tsuya et al., 2014).
Quantitative Studies
Quantitative studies (n = 2) were published within the last 3 years, and both were published in the United States (Love, Himelboim, Holton, & Stewart, 2013; Nagel et al., 2013). One study examined tweets about vaccinations (Love et al., 2013). The second study examined interaction between cyberspace messages and real-world occurrences of influenza and pertussis (Nagel et al., 2013).
One study used retrospective data collection and one study used prospective data collection, respectively (Love et al., 2013; Nagel et al., 2013). Both quantitative studies used various tools and programs to search for and collect data in addition to tweets including key words, phrases, or geolocations attached to the Twitter post or profile (Love et al., 2013; Nagel et al., 2013).
Analytic approaches used for data collected via Twitter
The collected data were used for qualitative content and discourse analysis examining text, hashtags, and word frequency (n = 7). Quantitative statistical analysis was also used for linguistic analysis and applied to calculate sentiment, word frequency, and make comparisons (e.g., between tweets and disease; n = 2).
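As a rough illustration of the hashtag and word-frequency counting mentioned above, the following sketch tallies both from a list of tweet texts. It is a minimal example under stated assumptions, not a reconstruction of any included study's pipeline: the function names, the stopword set, and the two sample tweets are hypothetical, and real analyses would typically also handle URLs, mentions, and non-English text.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"[a-z']+")


def hashtag_counts(tweets):
    """Count hashtags across tweets, ignoring letter case."""
    return Counter(tag.lower() for text in tweets for tag in HASHTAG.findall(text))


def word_counts(tweets, stopwords=frozenset()):
    """Count words across tweets after removing hashtags and stopwords."""
    counts = Counter()
    for text in tweets:
        # Strip hashtags first so they are not double counted as words.
        without_tags = HASHTAG.sub(" ", text.lower())
        counts.update(w for w in WORD.findall(without_tags) if w not in stopwords)
    return counts


# Hypothetical tweets written for this example, not data from any study.
sample_tweets = [
    "Back pain again today #pain",
    "Pain is the worst #Pain #migraine",
]
tags = hashtag_counts(sample_tweets)
words = word_counts(sample_tweets, stopwords={"is", "the"})
```

Keeping hashtags and ordinary words in separate counters lets a researcher report topical markers chosen by users separately from the vocabulary of the tweet body.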
Discussion
This study highlights an emerging body of health research that is using Twitter, especially within the last 3 years. While qualitative, quantitative, and mixed-methods study designs were represented in this review, our findings demonstrate that the way Twitter is used within health research, including participant recruitment, data collection methods, and type of data collected, is similar across study designs. For example, the shared reliance on passive recruitment, data mining techniques, and collection of tweets and user profiles represents only one approach to collecting data. We explore these findings further in the following paragraphs, but we argue that they highlight the novelty of Twitter as a data collection method. Many alternate approaches to using Twitter as a data collection method, such as active recruitment, holding virtual focus groups and interviews, and collecting different data from tweets, have yet to be utilized. Likewise, Twitter deviates from traditional methods of data collection (e.g., in-person focus groups, interviews), particularly in the areas of participant recruitment, how data are collected, how data are verified, ethical considerations, and accessibility of research, which results in new considerations for health research and for researchers interested in using social media to enhance their work.
Participant Recruitment
Data collection using Twitter mostly focuses on passively collecting tweets from conversations, without active participant recruitment (i.e., informed consent) into a study. Of the included articles, 16 studies conducted data mining, in which previously hidden information is sought from large, changing data sets, through Twitter searches (Barbier & Lui, 2011). Only one study actively recruited and consented participants and collected data through a preorganized, planned Twitter conversation (or chat; Parsons et al., 2015). This demonstrates that health research via Twitter does not typically involve active recruitment, and the typical recruitment technique involves tweeting to large audiences unaware that they are participating in research, termed covert research. Arguments for covert research focus on concerns of bias when participants are made aware of the researcher’s professional identity, which affects the validity of observations and data (Parker & Crabtree, 2014). Opponents of covert research argue that it is a form of deception that eliminates participants’ choice and that lack of transparency can lead to less than ideal research circumstances, which may cause stress and harm to participants and decrease the credibility of future researchers (Parker & Crabtree, 2014). Active or overt participant recruitment on Twitter may involve contacting participants individually and conducting the informed consent process in line with research ethics approval, which takes valuable time and resources; however, it is argued that trustworthy data can only be created by establishing a trusting relationship between the researcher and participants (Pitts & Miller-Day, 2007).
Data Collection Methods
Sixteen of the included studies employed data mining techniques using Twitter’s advanced search and stream functions and Twitter’s application programming interface. These functions posed significant challenges that resulted in missing and/or excluding important data from their respective studies. For example, key words searched were not inclusive of all related word variations like alternate names and slang terms (Ramagopalan et al., 2014). Search strategies did not filter or properly eliminate nonhealth-related tweets, advertisements, and repetitive words, which required manual screening by research teams (Henzell et al., 2014).
Strategies to address data collection challenges and increase rigor were also identified. Searches were sometimes conducted by individuals with expertise in searching and extracting tweets (e.g., software developer) to increase speed and accuracy. Users’ profiles were also examined for context (Hewis, 2015) to clarify tweet content and increase accuracy. Studies also used predetermined criteria to aid decision-making when deciding to include or exclude tweets as part of the data (Ahlwardt et al., 2014; Sugawara et al., 2012). While these novel solutions overcame some challenges, there were difficulties scaling these solutions to larger data sets using Twitter’s search and stream functions. Studies with larger data sets that used search and stream functions relied on other strategies, including having two reviewers and hosting multiperson coding sessions to ensure interrater reliability (Ahlwardt et al., 2014; Nascimento et al., 2014).
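The search challenges described above (missed word variants, advertisements, and repeated content) can be sketched as a simple post-collection filter. This is an illustrative sketch only, assuming tweets have already been collected as plain text; it does not call Twitter's search, stream, or application programming interface, and the function names, term lists, and sample tweets are hypothetical rather than drawn from the included studies.

```python
import re


def build_pattern(terms):
    """Compile a case-insensitive whole-word pattern matching any term."""
    # Longer terms first so multiword phrases win over their substrings.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_tweets(tweets, include_terms, exclude_terms=()):
    """Keep unique tweets that match an include term and no exclude term."""
    include = build_pattern(include_terms)
    exclude = build_pattern(exclude_terms) if exclude_terms else None
    seen = set()
    kept = []
    for text in tweets:
        # Normalize case and whitespace to catch near-verbatim repeats.
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            continue
        seen.add(normalized)
        if include.search(text) and not (exclude and exclude.search(text)):
            kept.append(text)
    return kept


# Hypothetical term lists and tweets, written for this example only.
include_terms = ["migraine", "migraines", "headache"]  # variants of one concept
exclude_terms = ["promo code", "buy now"]              # crude advertisement cues
sample_tweets = [
    "Third migraine this week, nothing helps",
    "third  migraine this week, nothing helps",
    "Buy now: migraine relief pills, promo code inside",
    "Great weather today",
    "This HEADACHE will not go away",
]
kept = filter_tweets(sample_tweets, include_terms, exclude_terms)
```

A filter like this only narrows the set that needs manual screening; slang, sarcasm, and off-topic uses of a health term would still require human review, as the included studies reported.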
Verifying Information
Using Twitter in health research, particularly in light of the passive and covert research methods typically used, means that researchers must trust that information provided in profiles and contained in tweets is accurate, since research was not the intended use of such information. For example, in some studies, key participant demographic data (e.g., age, gender, geography) could not be confirmed (Nakhasi et al., 2016; Nascimento et al., 2014); therefore, researchers defined their population of interest by health concerns/interests (e.g., mining for key words like pain or cancer). Studies may also enforce rigor by contacting participants or reviewing their Twitter feed to verify inclusion or exclusion criteria (Hewis, 2015). However, this may be time-consuming and raise ethical concerns. Additional strategies like triangulation, thick description, member checks, and development of a coding system and interrater reliability can be used; however, Morse (2015) cautions that these may only be used with particular methods. An important limitation of collecting data covertly on Twitter is that data are difficult to verify, and methods should be built into the research design to overcome this challenge.
Ethical Reconsiderations
Twitter is a public forum and all studies included in this review utilized public tweets; however, access to these data for research purposes is not well-defined in Twitter’s user agreement (Hewis, 2015) and it is unclear whether Twitter users would actively consent and participate in the same research if they knew tweets were collected as research data. Because of this, increased transparency is warranted.
Accessibility
Although using Twitter in the context of health research poses several challenges, it also has the unique advantage of accessibility. Groups of individuals can quickly form conversations and discussions around shared interests and specific topics. Geographically dispersed participants can connect and jump in and out of a virtual space, which creates flexibility that is hard to achieve with in-person data collection. Twitter has 313 million active users (Statista, 2017), which means that this method of data collection could reduce barriers to research participation based on the geographical location of researchers and research resources. It can also maximize resources, including time, effort, and convenience.
Finally, Twitter is an emerging venue of qualitative data for researchers. Fifteen of the 17 included articles contained qualitative elements (i.e., qualitative or mixed-methods designs). The dominant use of Twitter for qualitative research demonstrates the potential of Twitter to provide researchers with access to and valuable insights about health-care consumers in 140-character tweets. Participants are able to overcome geographical and physical barriers, and researchers can obtain a comprehensive and representative group of participants for their studies. This alone makes Twitter a valuable resource for qualitative health researchers; however, caution must be taken in the design and implementation of health research studies to ensure transparency, trustworthy data, and authentic health-care consumer engagement.
Conclusion
Utilizing Twitter as a recruitment and data collection tool in health research remains largely unexplored. Currently, Twitter is predominantly used for passive and covert data collection, but there is potential to gather data through more active and overt methods with some safeguards in place. Given Twitter’s unique accessibility advantages, future research should explore these active and overt participant recruitment and data collection methods to determine best practices. Further methodological refinement is required to fully realize the potential of Twitter in health research.
Appendix A
Appendix B
Appendix C
Authors’ Note
Further research materials related to this article, for example, data, samples, or models, are accessible by contacting the Evidence in Child Health to enhance Outcomes (ECHO) Research Team.
Acknowledgments
The authors wish to acknowledge Dr. Jude Spiers’ reflections on an early draft of this study, and the University of Alberta, Faculty of Nursing Undergraduate Student Summer Research Program.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
