
Abstract

Background:

Data representing people’s behaviour, attitudes, feelings and relationships are increasingly being harvested from social media platforms and re-used for research purposes. This can be ethically problematic, even where such data exist in the public domain. We set out to explore how the academic community is addressing these challenges by analysing a national corpus of research ethics guidelines and published studies in one interdisciplinary research area.

Methods:

Ethics guidelines published by Research Councils UK (RCUK), its seven member councils and guidelines cited within these were reviewed. Guidelines referring to social media were classified according to published typologies of social media research uses and ethical considerations for social media mining. Using health research as an exemplar, PubMed was searched to identify studies using social media data, which were assessed according to their coverage of ethical considerations and guidelines.

Results:

Of the 13 guidelines published or recommended by RCUK, only those from the Economic and Social Research Council, the British Psychological Society, the International Association of Internet Researchers and the National Institute for Health Research explicitly mentioned the use of social media. Regarding data re-use, all four mentioned privacy issues but varied with respect to other ethical considerations. The PubMed search revealed 156 health-related studies involving social media data, only 50 of which mentioned ethical concepts, in most cases simply stating that they had obtained ethical approval or that no consent was required. Of the nine studies originating from UK institutions, only two referred to RCUK ethics guidelines or guidelines cited within these.

Conclusions:

Our findings point to a deficit in ethical guidance for research involving data extracted from social media. Given the growth of studies using these new forms of data, there is a pressing need to raise awareness of their ethical challenges and provide actionable recommendations for ethical research practice.

Keywords

Social media, Internet, ethics, guidelines, data science, digital research

Introduction

Social media in research

Technological advances over the past decade have enabled widespread access to the Internet in most countries and the number of social media users has grown to around 2.8 billion people worldwide (Kemp, 2017). Social media are online, often mobile, platforms that support the creation and exchange of user-generated content (Kaplan and Haenlein, 2010), a phenomenon sometimes referred to by the terms Web 2.0 or the Social Web. They include generic platforms for networking, information sharing and content curation, such as Facebook,¹ Twitter,² YouTube³ and LinkedIn⁴; online forums aimed at specific communities, such as PatientsLikeMe,⁵ Mumsnet⁶ and BaristaExchange⁷; some private collaborative work tools such as Trello⁸ and Yammer⁹; and crowdsourcing platforms such as Ushahidi¹⁰ and Zooniverse¹¹, although opinions vary as to what precisely does or does not qualify.

Several uses of social media in research have been described in the literature. These include the deployment of social media platforms for the conduct of research, such as for gathering opinions (Hilyard et al., 2015), recruiting study participants (Pedersen and Kurz, 2016), undertaking participative ‘citizen science’ (Trisha, 2013) or fostering stakeholder involvement (Russell et al., 2016). People’s online activity in social media is also increasingly being used as a source of data for research (Wilson et al., 2012). Such ‘secondary uses’ include studies seeking to profile or understand users’ behaviours, demographics, interactions and networks, or to assess their responses or sentiments towards particular topics, products or policies (Anstead and O’Loughlin, 2015; Murphy et al., 2014). One of the most significant trends, from both a scientific and societal perspective, is the application of automated tools for mining and analysing social media as a means of revealing new associations or predicting future behaviours or outcomes. Increasingly this is taking place alongside data mining from institutional or business repositories, to link historical and real-time information (Smith, 2014). While the business sector has been using social media data for some time, for example to monitor brand reputation, their value for academic research is gradually being realised. In the United Kingdom (UK), considerable government funding has been invested in a network of major ‘big data’ research centres. Although these are mainly concerned with public sector administrative data (including health, housing and tax records, amongst others), recent investments include research centres focused on social media (Cardiff University, 2012).

A number of potential benefits of using social media in research have been described in the literature, including the ability to reach larger numbers of participants than might otherwise be possible (Moorhead et al., 2013), being able to analyse trends and associations within large corpuses of open-access data (Paul and Dredze, 2011), reducing the costs of conducting research in large populations (Munson et al., 2013), greater opportunities for interaction across extended time periods, as may be required in longitudinal or post-market studies (Hokby et al., 2016), providing a channel for social research that is less prone to bias than approaches involving direct contact between researchers and participants (McKee, 2013), involving citizens in the research process (INVOLVE, 2014), being able to curate and enrich biomedical knowledge (Good et al., 2012) and generating new channels for research dissemination (Balm, 2014).

Methodological and ethical challenges

Despite these advantages, the complexity of interactions between individuals, groups and technical systems in these online spaces presents a number of challenges for academics wishing to use social media data in research (Munson et al., 2013). These include the self-selecting nature of social media users, inequalities in access to social media platforms and data, the difficulty of obtaining meaning from heterogeneous data of variable quality and provenance, and a dependence on observing and interpreting what is ‘out there’ in a way that differs from traditional sampling approaches. Arguably, however, the greatest challenges for researchers in this area are ethical ones (David, 2004; Eysenbach and Till, 2001), such as variable perceptions of and unclear boundaries between ‘public’ and ‘private’ spaces, as well as the difficulty of ensuring anonymity and preserving the privacy of data subjects, whose identities may not be disguised or may be easily deduced from their postings and affiliations. Related issues of ownership and intellectual property are also poorly defined and consent to the use of social media data in research is rarely obtained through informed choice, but rather assumed on the basis that users have chosen to place it in the public domain (Koene and Adolphs, 2015; McKee, 2013; Munson et al., 2013; Nunan and Yenicioglu, 2013; Orton-Johnson, 2010; Vayena et al., 2012). Awareness of the potential privacy implications of sharing personal information on social media is growing, driven by newsworthy cases such as Facebook’s experiments in emotion manipulation (Jouhki et al., 2016) and its identification of ‘vulnerable’ teenagers for advertisers (Pells, 2017), or the use of social media by data analytics companies seeking insights into citizens’ political attitudes and networks, to influence voter behaviour (Fromm, 2016; Arthur, 2010). 
In this environment, pinning down the ethical guidance for researchers is now more critical than ever, with a requirement for any guidance to be responsive and adaptable to the changes invoked by the rapid evolution of social media platforms and data science.

Most research institutions, irrespective of academic discipline, publish or adhere to some form of research ethics guidelines or standard operating procedures, as a means of ensuring the appropriate governance of studies undertaken by their staff and collaborators. While these vary in structure, content and application, they are all intended to ensure responsible and trustworthy research practice and ‘to protect all groups involved in research: participants, institutions, funders and researchers throughout the lifetime of the research and into the dissemination process’ (ESRC, 2010: 2). Social media research is still a relatively new and changing field and commentators have pointed to the destabilisation of traditional ethics and an unsettling of ethical expectations and assumptions for both researchers and Internet users (Whiteman, 2012). This has been compounded by a lack of relevant ethics guidance and poses particular challenges for research involving ‘sensitive’ data, such as information about people’s health conditions, political affiliations or religious beliefs (see https://www.gov.uk/data-protection/the-data-protection-act).

Scope of ethics guidelines considered in this study

Given the growth of research using social media platforms, and its potential implications for information privacy, confidentiality and ownership, it is timely to examine the extent to which existing research ethics guidelines take such uses into account and what additions may be warranted. Social media research is taking place across multiple academic disciplines and applications for research ethics approval may thus defer to a range of different bodies. This presents challenges for the effective oversight of such research where, it has been claimed, ‘no official guidance or answers regarding internet research ethics have been adopted at any national or international level’ (AoIR, 2012). Mindful of the need for a cross-disciplinary perspective, we chose to study one identifiable national corpus of multidisciplinary research ethics guidelines, represented by Research Councils United Kingdom (RCUK).

RCUK is a strategic partnership between the UK’s seven research councils, which according to its homepage, ‘has invested around £3 billion in research covering the full spectrum of academic disciplines from the medical and biological sciences to astronomy, physics, chemistry and engineering, social sciences, economics, environmental sciences and the arts and humanities’ (see http://www.rcuk.ac.uk). They share an aim to ‘advance knowledge and generate new ideas which lead to a productive economy, healthy society and contribute to a sustainable world’. While RCUK itself has published a set of general research ethics guidelines, each of the seven disciplinary bodies in the RCUK family of research councils (see Table 1) provides its own form of ethical advice, either through developing bespoke guidelines or deferring to other relevant guidelines in the literature. For the purposes of our study, the corpus of RCUK ethics guidelines and external guidelines recommended within these was felt to be an appropriate sample to enable a meaningful analysis of the guidance available for academic researchers in the UK.

Table 1.

RCUK umbrella organisation and the seven UK Research Councils.

• Research Councils United Kingdom (http://www.rcuk.ac.uk)
• Arts and Humanities Research Council (http://www.ahrc.ac.uk/Pages/Home.aspx)
• Biotechnology and Biological Sciences Research Council (http://www.bbsrc.ac.uk/home/home.aspx)
• Engineering and Physical Sciences Research Council (http://www.epsrc.ac.uk)
• Economic and Social Research Council (http://www.esrc.ac.uk)
• Medical Research Council (http://www.mrc.ac.uk)
• Natural Environment Research Council (http://www.nerc.ac.uk)
• Science and Technology Facilities Council (https://www.stfc.ac.uk/home.aspx)

Aims

We set out to examine how RCUK and affiliated research ethics guidelines acknowledge and deal with research involving social media overall, and specifically research involving data extracted from social media platforms (which we refer to using the generic term ‘mining’). We also wanted to understand how researchers using these new forms of data in their studies are responding to the ethical challenges this presents, by examining how ethical concepts or guidelines are referred to in published research articles. We chose health research as an exemplar area, since it is highly interdisciplinary (transecting the social, medical and computational sciences, amongst others) and because study results based on social media are being used in this field to inform scientific knowledge and theory, public services and policies, business practices and methodological innovations (e.g. Pagliari and Vijaykumar, 2016; Tursunbayeva et al., 2017).

We are not aware of any previously published studies that have analysed the extent to which the RCUK guidelines address the use of social media data for research purposes, or how ethical concepts and guidelines are being referred to by researchers undertaking relevant projects. Our research therefore sought to answer the following two broad questions.

RQ1: How do RCUK ethics guidelines address the use of social media in research overall and specifically research using data harvested from social media?

RQ2: How are ethical issues and guidelines described in published health research using social media data?

Methods

Theoretical frameworks

To aid our analysis we drew on two ethical frameworks which, although developed in the context of social media research for health, are sufficiently generic to be applied to any field of research involving the use of social media.

The first is Bjerglund-Andersen and Söderqvist’s (2012) typology of social media uses in research, which delineates five broad categories:

1. research dissemination;

2. scientific discussion and networking;

3. engaging the public;

4. academic teaching;

5. research and data collection.

For the reasons already described, we divided the last of these into two qualitatively different categories: first, using social media platforms to enable the conduct of research; and, second, using social media as a source of data for research.

Conway (2014) has gone further by suggesting a taxonomy of ethical considerations specifically relevant to the secondary use of social media data. Although this was developed in the context of Twitter mining for public health surveillance and research, it is applicable to many types of research involving data harvested from social media. This includes 10 specific considerations:

1. privacy;

2. informed consent;

3. ethical theory;

4. institutional review board (IRB)/regulation;

5. traditional research versus social media (e.g. Twitter) research;

6. geographical information;

7. researcher lurking;

8. economic value of personal information;

9. medical exceptionalism;

10. benefit of identifying socially harmful medical conditions.

While considerations 9 and 10 refer to medical issues, they can also be applied to other topics which are also uniquely sensitive (e.g. research on political attitudes) or are aimed at preventing harm (e.g. analysing extremist discourse), respectively. For the broader purposes of our study we therefore re-labelled them as ‘exceptionalism’ and ‘benefit of identifying potential harms’.

RQ1. How do RCUK ethics guidelines address the use of social media in research overall and specifically research using data harvested from social media?

To identify the corpus of ethics guidelines represented by RCUK, the websites of RCUK itself and the seven UK Research Councils were first identified via Google. The websites were then searched by entering the key words ‘ethics’, ‘guidelines’, ‘funding applications’ and variants of these, into their respective search boxes, and the outputs sifted manually. Searches were undertaken by the first author in February 2017.

Where a research agency was found to have more than one current ethics guideline, each of these was included, and in cases where the RCUK guidelines explicitly referred to external guidelines, the relevant source documents were also obtained for further analysis. Individual research councils were also contacted via email, asking them to state whether their organisation had developed or specifically recommended any ethics guidelines concerning the use of social media in research. Responses were received from six out of eight agencies, the non-respondents being the BBSRC and the ESRC.

The following information was extracted from each identified guideline: the name of the originating organisation, the title of the guideline, the date of the most recent version and whether the guideline explicitly referred to the use of social media or related concepts such as online or internet research.

The four guidelines referring to social media were scrutinised, to determine how they corresponded with the (adapted) typology of social media uses in research outlined by Bjerglund-Andersen and Söderqvist (2012). They were further appraised in terms of their reference to Conway’s (2014) list of 10 ethical considerations for research involving social media data. The guideline search and appraisal process is summarised in Figure 1.

Figure 1.

Summary of the guideline search and appraisal process.

RQ2: How are ethical issues and guidelines described in published health research using social media data?

For the reasons already described, we chose the example of health research using social media data, to explore how relevant ethical considerations and recommendations are being addressed in practice. The online database PubMed was searched up to 28 February 2017, using the structured query shown in Box 1.

Box 1.

The search query applied to PubMed.

((“health 2.0” or “web 2.0” or “social media” or “social network” or “blog” or “wiki” or “virtual world” or “discussion forum” or “online forum” or “chat room” or “facebook” or “twitter” or “patientslikeme” or “youtube” or “instagram”) AND (“surveillance” or “infoveillance” or “mining” or “netnography” or “listening”) AND (“health” or “disease”) NOT “animal”)
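The three-part structure of the Box 1 query (platform terms, method terms and topic terms, with one exclusion) can be made explicit with a minimal sketch. The term lists are copied from Box 1. The function and variable names are hypothetical, and writing the Boolean operators in upper case with straight quotation marks is an assumption made for readability, not a record of how the search was entered. The sketch only assembles the query string; it does not contact PubMed.

```python
# Term lists copied from Box 1; all names below are illustrative only.
PLATFORM_TERMS = [
    "health 2.0", "web 2.0", "social media", "social network", "blog",
    "wiki", "virtual world", "discussion forum", "online forum",
    "chat room", "facebook", "twitter", "patientslikeme", "youtube",
    "instagram",
]
METHOD_TERMS = ["surveillance", "infoveillance", "mining", "netnography", "listening"]
TOPIC_TERMS = ["health", "disease"]
EXCLUDED_TERM = "animal"


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query():
    """Combine the three OR-groups with AND and append the NOT clause."""
    groups = [or_group(PLATFORM_TERMS), or_group(METHOD_TERMS), or_group(TOPIC_TERMS)]
    return "(" + " AND ".join(groups) + f' NOT "{EXCLUDED_TERM}")'


query = build_query()
print(query)
```

Keeping each concept in its own list makes it easier to see which facet a given term belongs to when the search is reviewed or updated.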

The inclusion criteria encompassed peer-reviewed journal articles and conference papers describing empirical research using data from social media platforms such as Twitter or Facebook, whether extracted or studied in situ, using either manual or automated methods. Studies not in English, dissertations/theses, reports or abstracts, letters to the editor, feature articles and articles intended as marketing or advertising material were excluded. No publication timeframe was applied. See Box 2 for the inclusion and exclusion criteria.

Box 2.

Article inclusion and exclusion criteria.

Inclusion criteria
Types of publication: Peer-reviewed research articles. Full conference papers
Language: English
Publication timeframe: None
Types of research: Empirical studies using health-related data from social media platforms, extracted or studied in situ, using both manual and automated methods.

Exclusion criteria
Types of publication: Dissertations/theses; Reports or abstracts only; Letters to the editor; Marketing or advertising material; Reviews or editorials
Language: Not English
Types of research: Studies based on data from online sources other than social media (e.g. internet search histories, online news reports). Commercial research aimed at obtaining market intelligence or informing product promotion. Studies examining social media platforms, rather than using them as a source of data. Studies describing social media as a communication or broadcasting channel (e.g. for public health promotion).

Search results were imported into the reference management software, EndNote. Abstracts and titles were initially screened for eligibility and full-text articles were obtained for those considered potentially relevant. Articles found to meet the inclusion criteria were summarised according to author name, author affiliation, publication title, publication year and abstract. Each article was also hand searched, to determine whether the authors referred to ethical considerations or guidelines when describing their study design or analysis. Where this was the case, the relevant text was extracted, tabulated and classified using Conway’s taxonomy.

The components of the study, at each stage, are briefly summarised in Figure 2.

Figure 2.

Focus, objectives and methods at each stage of the study.

Results

RQ1. How do RCUK ethics guidelines address the use of social media in research overall and specifically research using data harvested from social media?

A total of 13 separate ethics guidelines were identified, including 10 produced by RCUK itself or the individual UK research councils, 2 external guidelines recommended within these (BPS, 2012; AoIR, 2012) and one recommended by MRC during the email verification phase (INVOLVE, 2014). Of these, only four guidelines (ESRC, BPS, AoIR, NIHR) mentioned the use of social media in research.

The 13 guidelines are listed in Table 2, which also illustrates the co-referencing of guidelines within the RCUK family; for example, AHRC’s guideline defers to the ESRC’s guideline which, in turn, cites guidelines from the BPS and AoIR. Highlighted in bold are the four guidelines found to include guidance and recommendations specifically relating to the use of social media in research: ESRC, BPS, AoIR and NIHR.

Table 2.

Ethics guidelines screened for references to social media uses in research.

Research Council (date)	Guideline title	Includes	Refers to social media
RCUK (2013)	Policy and Guidelines on Governance of Good Research Conduct
AHRC (2016)	Research Funding Guide	RCUK, ESRC
BBSRC (2017)	BBSRC Research Grants The Guide
EPSRC (2013)	Framework for Responsible Innovation	RCUK
MRC (2012b)	Policy and Guidance on Sharing of Research Data from Population and Patient Studies
MRC (2000)	Personal Information in Medical Research
MRC (2012a)	Good research practice
NERC (2015)	Ethics Policy
STFC (2013)	Public Engagement with Science and Technology	MRC, RCUK
ESRC (2015)	Framework for research ethics	BPS, AoIR, RCUK	x
BPS (2012), in ESRC	Guidelines for ethical practice in psychological research online		x
AoIR (2012), in ESRC	Ethical Decision Making and Internet Research		x
NIHR (2014), recommended by MRC	Guidance on the use of social media to actively involve people in research		x
TOTAL			4

Table 3 illustrates a further level of analysis, focused on the four guidelines that encompassed social media. Based on the adapted version of Bjerglund-Andersen and Söderqvist’s (2012) taxonomy, all four referred to social media as a research tool, three as a source of research data, two each as a medium for scientific discussion, networking or public engagement and none for research dissemination or academic teaching. According to Conway’s (2014) list of ethical considerations in social media research all four of these guidelines referred to privacy and the difference between traditional and social media research, three referred to informed consent and the use of IRBs, two referred to researcher lurking and one to ethical theory. None considered geographical information, the economic value of personal information, exceptionalism or the benefit of identifying sources of potential harm.
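The tallies in the paragraph above can be checked against the Table 3 coding matrix with a short sketch. The cell positions are transcribed from Table 3 as it appears in this text, which is an assumption about how the flattened rows align with the column headings; the names `CODES`, `COLUMNS` and `column_totals` are hypothetical.

```python
from collections import Counter

# Marks per guideline, transcribed from Table 3 (an assumption about cell alignment).
# 1-6: adapted classes of social media use; A-J: Conway's ethical considerations.
CODES = {
    "ESRC": {"5", "6", "A", "B", "C", "D", "E"},
    "BPS": {"3", "5", "6", "A", "B", "E", "G"},
    "AoIR": {"2", "5", "6", "A", "B", "D", "E", "G"},
    "NIHR": {"2", "3", "5", "A", "D", "E"},
}
COLUMNS = ["1", "2", "3", "4", "5", "6"] + list("ABCDEFGHIJ")


def column_totals(codes):
    """Count how many guidelines are marked in each column, including zeros."""
    counts = Counter(mark for marks in codes.values() for mark in marks)
    return {column: counts.get(column, 0) for column in COLUMNS}


totals = column_totals(CODES)
for column in COLUMNS:
    print(column, totals[column])
```

Under this transcription the computed totals agree with the Total row of Table 3, which is a useful consistency check when a coding matrix is maintained by hand.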

Table 3.

Types of research use and ethical considerations for data re-use.

Research council	Guideline title (date)	Types of social media use in research (RQ1)						Ethical considerations for the use of social media data in research (RQ2)
Research council	Guideline title (date)	1	2	3	4	5	6	A	B	C	D	E	F	G	H	I	J
ESRC	Framework for research ethics (ESRC, 2015)					x	x	x	x	x	x	x
BPS(In ESRC)	Guidelines for ethical practice in psychological research online (BPS, 2012)			x		x	x	x	x			x		x
AoIR(In ESRC)	Ethical Decision Making and Internet Research (AoIR, 2012)		x			x	x	x	x		x	x		x
NIHR (Recomm. MRC)	Guidance on the use of social media to actively involve people in research (INVOLVE, 2014)		x	x		x		x			x	x
Total		0	2	2	0	4	3	4	3	1	3	4	0	2	0	0	0

Bjerglund-Andersen and Söderqvist’s classes of social media use (adapted): 1, Research Dissemination; 2, Scientific discussion/networking; 3, Engaging the public; 4, Academic teaching; 5, Social media as a research tool; 6, Social media as a source of research data.

Conway’s ethical considerations for social media data use: A, Privacy; B, Informed consent; C, Ethical theory; D, IRB approval/regulations; E, Traditional vs social media research; F, Geographical information; G, Research lurking; H, Economic value of personal information; I, Exceptionalism; J, Benefit of identifying sources of potential harm.

RQ2: How are ethical issues and guidelines described in published health research using social media data?

The structured search of PubMed yielded 469 potentially relevant studies, of which 156 remained after screening against the inclusion and exclusion criteria. These studies had a variety of aims, including assessing public reactions to health reforms, identifying health behaviours such as medication compliance, understanding health attitudes and sentiments, undertaking post-market surveillance, exploring social networks relevant to health, searching for indicators of infectious and non-communicable disease trends and comparing the value of different social media platforms or tools for analysing health-related events or patterns. Only 50 articles referred to one or more of the ethical concepts, procedures or approval processes specified in Conway’s taxonomy (Figure 3). However, while most of these mentioned IRB approval, only 13 referred to other relevant ethical considerations and five of the ethical considerations in Conway’s taxonomy were not mentioned at all. In order of frequency, the breakdown of ethical considerations was as follows: Research Ethics IRB Approval/Regulation (43), Privacy (26), Informed Consent (16), Ethical Theory (7), Traditional Research vs Social Media Research (3), Researcher Lurking (3), Identifying Potential Harms (2), Geographical Information (0), Economic Value of Personal Information (0) and Medical Exceptionalism (0).
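To put these counts in proportion, the following sketch converts them to percentages. The four counts are taken from the paragraph above; the percentages are derived here and are not figures reported by the authors, and all names are hypothetical.

```python
# Counts as reported in the text; names are illustrative only.
RETRIEVED = 469        # records returned by the PubMed search
INCLUDED = 156         # studies remaining after screening
MENTION_ETHICS = 50    # included studies referring to any ethical concept
MENTION_IRB = 43       # of those, studies mentioning IRB approval/regulation


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "included of retrieved": percent(INCLUDED, RETRIEVED),
    "mention ethics of included": percent(MENTION_ETHICS, INCLUDED),
    "mention IRB of those mentioning ethics": percent(MENTION_IRB, MENTION_ETHICS),
}
for label, value in summary.items():
    print(f"{label}: {value}%")
```

On these figures roughly a third of the included studies mentioned any ethical concept, and most of those did so by referring to IRB approval or regulation.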

Figure 3.

Number of studies included at each stage of the screening process.

Nine of the studies we identified using PubMed were affiliated with UK-based organisations, and their consideration of ethical concepts is further described in Table 4. In short, they described:

a study using data from Facebook and Twitter to examine the usefulness of social media for post-market drug safety surveillance (Powell et al., 2016),

a content analysis of social media data posted on two web forums to monitor the misuse and non-medical use of the antidepressant and smoking cessation drug bupropion (Anderson et al., 2017),

an analysis of the online response to a case of a breastfeeding mother being ejected from a UK retail premises (Grant et al., 2016),

a case study into the difficulties, challenges and rewards of using social media by student nurses through analysing data from a Twitter chat (Sinclair et al., 2015),

a netnographic study of user decision-making, home preparation and consumptive patterns of laudanum (Van Hout and Hearne, 2015),

a study investigating the feasibility of developing predictive models that identify potential superusers of online healthcare support groups (van Mierlo et al., 2017),

a qualitative study into how young people used a youth-orientated, moderated, online, eating disorders discussion forum, run by an eating disorders charity (Kendal et al., 2017),

a thematic analysis of readers’ comments to UK online news reports on the acceptability of financial incentives for breastfeeding (Giles et al., 2015),

a qualitative and quantitative summary of online reaction to media reports to the UK government strategy on childhood obesity in England (Gregg et al., 2017).

Table 4.

Ethical guidelines and concepts referred to in studies using social media data for health-related research found in PubMed.

#	Article title	Categories of ethical consideration, from Conway’s taxonomy										Relevant text
#	Article title	A	B	C	D	E	F	G	H	I	J
1	Campaigns and counter campaigns: reactions on Twitter to e-cigarette education (Allem et al., 2016)	x			x							‘Ethics approval The University of Southern California Institutional Review Board approved all procedures.’ (p. 229)
2	Using Social Listening Data to Monitor Misuse and Nonmedical Use of Bupropion: A Content Analysis (Anderson et al., 2017)	x	x	x	x							‘…Content that is deemed sensitive and is in the public domain sits in a gray zone from an ethical perspective, and the extent of protection for the individuals who write the content and the communities that host the content should be assessed on a case-by-case basis. The community discussions demonstrate that contributors are aware of the public nature of the content that they post, and almost all contributors utilize pseudonyms to mask their identities. Although the subject matter may be seen as sensitive, these elements led the research authors to determine that consent from individual contributors was not necessary to conduct the research. It was also important to maintain any particular contributor’s anonymity, as the extent to which their pseudonym may reveal identifying information about them is unknown to the researchers…. Because our research did not involve intervention or interaction with the individuals, nor is the information individually identifiable, our study did not meet the criteria of the Office for Human Research Protections (OHRP) framework that guides institutional review board (IRB) status. As such, IRB approval was not pursued. Some researchers anonymize the names of the Web forums that they utilize as data in order to further assure confidentiality of the individual contributors or because the group had neither been actively involved in the research nor given consent to be involved …We reviewed the site’s privacy notice and user agreement and determined that gathering data for research purposes was within the scope of permitted uses…. We contacted a third potential data source, Erowid, to request consent and terms of access for gathering samples from their database of user-reported experiences with drugs…. their usage agreement explicitly prohibited data gathering or publishing of analyses without prior permission…Erowid was excluded as a data source for this study.’ (p. 5)
3	Characterizing the followers and tweets of a marijuana-focused Twitter handle (Cavazos-Rehg et al., 2014)				x							‘The Twitter data in the current study is public. The Washington University Institutional Review Board reviewed our study protocol and our research was deemed exempt from human subjects review.’ (p. 3)
4	What Online Communities Can Tell Us About Electronic Cigarettes and Hookah Use: A Study Using Text Mining and Visualization Techniques (Chen et al., 2015)	x			x							‘Research Ethics Statement: Publicly available social media content can be an invaluable complement to data provided by study participants in more explicit research contexts because it is a rich source of information on how behaviors with health impacts may naturally occur in the real world. In order to protect the identities of forum users, we have not provided explicit quotations, but instead described the content in as much detail as possible, both quantitatively and qualitatively, in line with ethical guidelines [44,45]. The work reported in this paper has been certified as exempt from review under 45 CFR 46.101(b), category 4 by the University of California San Diego Institutional Review board (Project #140844X).’ (p. 5)
5	Impact of Twitter intensity, time, and location on message lapse of bluebird’s pursuit of fleas in Madagascar (Da’ar et al., 2016)				x						x	‘Ethical approval: Not required.’ (p. 6)(N.B. The study concerns a bubonic plague outbreak.)
6	When ‘Bad’ is ‘Good’: Identifying Personal Communication and Sentiment in Drug-Related Tweets (Daniulaityte et al., 2016)	x			x							‘The Wright State University institutional review board reviewed the protocol and determined that the study meets the criteria for Human Subjects Research exemption 4 because it is limited to publicly available tweets. Tweets used as examples were modified slightly to ensure the anonymity of Twitter users who had posted them.’ (p. 3)
7	Surveillance Tools Emerging From Search Engines and Social Media Data for Determining Eye Disease Patterns (Deiner et al., 2016)				x							‘With approval from the University of California San Francisco (UCSF) Institutional Review Board, we obtained total weekly counts of all encounters with diagnosis names containing the string “conjunctivi”’ (p. 1025)
8	How to exploit twitter for public health monitoring? (Denecke et al., 2013)			x								‘we did not yet consider the legal and ethical issues related to the use of data from TV/radio and social-media for public health surveillance as well as the reliability of the collected information’ (p. 339)
9	Computer-assisted update of a consumer health vocabulary through mining of social network data (Doing-Harris and Zeng-Treitler, 2011)				x							‘These records were obtained by another group in our department with internal review board (IRB) approval. IRB approval was given for a member of that group to compare terms to this database for us, returning a yes/no answer.’ (p. 4)
10	#discrimination: The Online Response to a Case of a Breastfeeding Mother Being Ejected from a UK Retail Premises (Grant, 2016)				x							‘As the data were hosted on a website intended for public consumption that did not require membership to access content, ethical approval was not [required]. This position was confirmed by the chair of the Cardiff University School of Medicine Research Ethics Committee.’ (p. 143)
11	Importance of Internet surveillance in public health emergency control and prevention: evidence from a digital epidemiologic study during avian influenza A H7N9 outbreaks (Gu et al., 2014)	x			x							‘All the information obtained online was in simplified Chinese language and released publicly by the websites, but no personal identification information, such as name or email address, was collected. This study was approved by the Institutional Review Board in the Zhejiang Provincial Centers for Disease Control and Prevention.’ (p. 3)
12	Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model (Guan et al., 2015)		x		x							‘Ethical considerations of the study have been reviewed and granted by the Review Board of the Institute of Psychology, Chinese Academy of Sciences’ (p. 2)
13	Public health surveillance of dental pain via Twitter (Heaivilin et al., 2011)	x										‘Username removed to maintain privacy of Twitter user.’ (p. 1049)
14	A cross-sectional examination of marketing of electronic cigarettes on Twitter (Huang et al., 2014)		x		x							‘Ethics approval This study is cleared for ethics by Research Ethics Boards or International Review Boards at the University of Illinois at Chicago (USA)’ (p. 30)
15	Text classification for assisting moderators in online health communities (Huh et al., 2013)				x							‘Thread conversations in WebMD online communities are publicly available. We applied for approval from the University of Washington’s Institutional Review Board (IRB) and received a letter stating that our project is unregulated by IRB because the data is publicly available.’ (p. 999)
16	The Use of Social Media by State Health Departments in the US: Analyzing Health Communication Through Facebook (Jha et al., 2016)				x							‘Compliance with Ethical Standards’ (p. 178)
17	Cadec: A corpus of adverse drug event annotations (Karimi et al., 2015)				x							‘Ethics approval for this project was obtained from the CSIRO ethics committee which classified the work as low risk’ (p. 80)
18	Blogging for weight loss: personal accountability, writing selves, and the weight-loss blogosphere (Leggatt-Cook and Chamberlain, 2012)	x	x	x	x							‘Researching in cyberspace presents challenges. One concerns ethical practice in using material posted in public forums, where a major contested concern is the boundary between public and private, and related issues of consent. In our use of blog content, we concurred…that archived blog content, unprotected by subscription or password access, is public, is intended to be so by its authors, and so does not require consent… In reporting blog content, we exercised judgement to ensure that bloggers were represented sensitively. Reporting also raises ethical concerns around verbatim citation of blog content, since this can facilitate searches to identify individuals …we considered that the overtly public content … did not warrant such protection and, moreover, that blog writers require credit for their published work online. These various arguments were used in gaining institutional ethical consent for the research.’ (p. 965)
19	Using Real-Time Social Media Technologies to Monitor Levels of Perceived Stress and Emotional State in College Students: A Web-Based Questionnaire Study (Liu et al., 2017)		x		x							‘Ethics approval was obtained from the UCLA Research Ethics Board.’ (p. 2)
20	Participatory surveillance of diabetes device safety: a social media-based complement to traditional FDA reporting (Mandl et al., 2014)		x		x							‘Online consent for use of member-provided data for research is obtained when a member first accesses the app.’ (p. 688) ‘Ethics approval Boston Children’s Hospital Institutional Review Board.’ (p. 691)
21	Symptom clusters in women with breast cancer: an analysis of data from social media and a research study (Marshall et al., 2016)		x	x	x							‘Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the research study of breast cancer survivors. Permission to extract data from the forum MedHelp.com was granted by site managers, and all data from this source were collected anonymously.’ (p. 547)
22	Reaction on Twitter to a Cluster of Perinatal Deaths: A Mixed Method Study (Meaney et al., 2016)	x			x							‘Only data which were publically [sic] available were collected and no attempts were made to contact any individual; therefore, no ethical approval was sought for this study. Despite these data being available to public, there is still an onus to ensure that ethical standards are met. Therefore, in line with other similar studies … personal identity information, including individuals’ Twitter usernames, have been removed from the example tweets presented below.’ (p. 3)
23	What can we learn about the Ebola outbreak from tweets? (Odlum and Yoon, 2015)	x			x						x	‘Ethical approval This study used publically [sic] available data, and analyses meet the criterion for exemption §46.101(b)4 Research, involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available, or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.’ (p. 563)
24	Towards early discovery of salient health threats: A social media emotion classification technique (Ofoghi et al., 2016)	x										‘All URLs, email addresses, mentions (i.e., @replies and @usernames), and hash tags were replaced by “url”, “emailAddress”, “atSign”, and “hashTag”, respectively.’ (p. 507)
25	Discovering health topics in social media using topic models (Paul and Dredze, 2014)				x							‘Ethics Statement: The work described in this paper was reviewed by the Homewood Institutional Review Board at Johns Hopkins University and received an exemption since all data is publicly available.’ (p. 2)
26	Social Media Listening for Routine Post-Marketing Safety Surveillance (Powell et al., 2016)	x			x							‘Much of the data posted by these patients are publicly available on the Internet, depending on the individual’s use of privacy settings when posting.’ (p. 444) ‘For the purpose of this research project, the following additional steps were taken to protect privacy: • Once the data had been de-identified by the vendor, no attempt was made to re-identify the person making the post. As a result, no attempt was made to obtain follow-up information about potential AEs. • Posts from the same person were not linked.’ (p. 446)
27	YouTube: a promotional vehicle for little cigars and cigarillos? (Richardson and Vallone, 2014)				x							‘This study did not require institutional review board approval given that it used freely accessible media.’ (p. 2)
28	SentiHealth-Cancer: A sentiment analysis tool to help detecting mood of patients in online social networks (Rodrigues et al., 2016)	x			x							‘This work is part of a project evaluated and approved by the Ethics Committee under the number 31191214.7.0000.5083/UFG. Moreover, all data collected from online social network were published by the user as a public text in a public group.’ (p. 82)
29	Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter (Sarker et al., 2016)	x			x							‘Ethical approval: Not applicable. Informed consent: Not applicable’ (p. 239)
30	Supplementing Public Health Inspection via Social Media (Schomberg et al., 2016)				x							‘The University of California Irvine Internal Review Board granted non-human subjects exemption to this study. This classification exempted this study from further University of California Irvine Internal Review Board review.’ (p. 4)
31	Tanning bed burns reported on Twitter: over 15,000 in 2013 (Seidenberg et al., 2016)				x							‘Ethics approval: This article does not contain any studies with human participants performed by any of the authors.’ (p. 275)
32	Data Mining of Web-Based Documents on Social Networking Sites That Included Suicide-Related Words Among Korean Adolescents (Song et al., 2016)	x			x							‘The Institutional Review Board of the authors’ institutions approved the study protocol. No personally identifiable information was available in the collected data.’ (p. 670)
33	Social Big Data Analysis of Information Spread and Perceived Infection Risk During the 2015 Middle East Respiratory Syndrome Outbreak in South Korea (Song et al., 2017)				x							‘The Institutional Review Board at the authors’ institution approved the study protocol of this study.’ (p. 27)
34	Sources of information and behavioral patterns in online health forums: observational study (Sudau et al., 2014)	x			x							‘The Ethics Committee of the University Medical Center Göttingen confirmed (ref 11/5/13) that ethical approval was not necessary due to the nature of the data (secondary data analysis of anonymized data).’ (p. 12)
35	Do cancer patients tweet? Examining the twitter use of cancer patients in Japan (Tsuya et al., 2014)				x							‘This study was approved by the Institutional Review Board at Yamagata University Faculty of Medicine (H24-133).’ (p. 2)
36	Sharing data for public health research by members of an international online diabetes social network (Weitzman et al., 2011a)	x	x		x							‘All study activities were reviewed and approved by the Children’s Hospital Boston Committee on Clinical Investigation under a model of implied consent that was based on the pre-existing norms for sharing in the community and in alignment with the published privacy policy and terms of use of the site that clearly inform the community about conditions for sharing data and privacy protections.’ (p. 4)
37	Surveillance of an Online Social Network to Assess Population-level Diabetes Health Status and Healthcare Quality (Weitzman et al., 2011b)		x		x							‘Objective: Test a novel health monitoring approach by engaging an international online diabetes social network (SN) in consented health surveillance.’ (p. 1) ‘Study activities were reviewed and approved by the Children’s Hospital Boston Institutional Review Board.’ (p. 2)
38	Participatory surveillance of hypoglycemia and harms in an online social network (Weitzman et al., 2013)		x		x							‘Consent is obtained within the app for use of member data. Study activities were approved by the Boston Children’s Hospital institutional review board’ (p. 346)
39	Public Trauma after the Sewol Ferry Disaster: The Role of Social Media in Understanding the Public Mood (Woo et al., 2015)	x										‘There is no information that could potentially reveal the identity of social media user, namely user confidentiality is maintained.’ (p. 10977)
40	‘Trip-Sitting’ in the Black Hole: A Netnographic Study of Dissociation and Indigenous Harm Reduction (Hearne and Van Hout, 2016)	x				x		x				‘Data collection and analysis were regarded as observations of online public behavior, in accordance with the ethical protocols and recommendations of SACHRP (Secretary’s Advisory Committee for Human Research Protections 2013). As per the ethical and methodological protocols of passive netnography, the researcher did not make contact with fora users and upheld observational status at all times. Anonymity of the fora and its members was safeguarded by removal of user pseudonyms and URLs to the site, and illustrative quotations were paraphrased by the team with consensus in interpretation to avoid backtracking of these through Internet search engines’ (p. 234–235)
41*	Demographic and Indication-Specific Characteristics Have Limited Association With Social Network Engagement: Evidence From 24,954 Members of Four Health Care Support Groups (van Mierlo et al., 2017)	x			x							‘All data collection policies and procedures adhered to international privacy guidelines and were in accordance with the Helsinki Declaration of 1975, as revised in 2008. The study was consistent with the University Research Ethics Committee procedures at Henley Business School, University of Reading, and was exempt from full review.’ (p. 5)
42*	Public reaction to the UK government strategy on childhood obesity in England: A qualitative and quantitative summary of online reaction to media reports (Gregg et al., 2017)	x	x	x	x	x						‘Ethical approval was awarded by the Manchester Metropolitan University ethics committee prior to data collection. We have followed the guidance for internet-mediated research from the British Psychological Society and adhered to copyright laws in conducting this work. Direct consent could not be attained because of the nature of the data collection however; implicit consent was deemed to have been given by virtue of posting in an open forum. No directly identifiable data were collected, comments were disassociated from usernames prior to analyses.’ (p. 451)
												‘The ethics of methods that make use of readily available online data are less developed than an “off line” project which relies on the informed consent of its participants for its moral sufficiency. The participants in this study could not have anticipated that their contributions would be used for this research therefore; they have not provided their consent in the usual sense. However, participants have contributed to public online fora with the intention of having their views heard and by doing so they have sought to influence the debate surrounding the government’s childhood obesity strategy. Notwithstanding that, the participants have not provided their express consent, in our view and as agreed by the ethics review board, it is reasonable to infer that by their actions they implied consent to the use of their contributions in subsequent debate. While data collection without prior notice raises ethical issues, it has the advantage of allowing a level of candour that may not be forthcoming where a researcher declares their presence and seeks explicit consent. Some of the comments were inappropriate but have the advantage that they were open and reflect the reality of citizens’ experience. It is conceivable that we may have taken steps to notify participants of our intention to use material for research. For example by seeking to amend the terms of the forum with the cooperation of the site owners, or simply by joining the discussions and announcing our presence and thereby imputing acquiescence from continued participation rather than express consent. On balance, we took the view that such action might influence discussions and undermine the authenticity of the debate and, was disproportionate to the risk of any harm.’ (pp. 456–457)
43	Doing Recovery Online (Mudry and Strong, 2013)	x	x		x							‘We obtained ethical approval from our university research ethics board, and the first author gained informed consent from the participants involved. Participation was voluntary, anonymous, and confidential. We did not collect any personally identifying information and all participants remained anonymous. We extracted and present verbatim quotes (punctuation corrected, grammar uncorrected) as exemplars.’ (p. 315)
44	Patterns of Treatment Switching in Multiple Sclerosis Therapies in US Patients Active on Social Media: Application of Social Media Content Analysis to Health Outcomes Research (Risson et al., 2016)	x			x							‘… The secondary data source used for the analysis meets all the US Health Insurance Portability and Accountability Act (HIPAA) compliance standards, ensuring patient anonymity. As such, approval from an Institutional Review Board was not necessary.’ (p. 3)
45*	How a moderated online discussion forum facilitates support for young people with eating disorders (Kendal et al., 2017)	x	x		x							‘Online communities for young people are recognized as a valid focus for research, and the forum was in the public domain. However, there is also a debate on the ethical issues around using publicly accessible online discussion for research. The identities of the forum users were unknown so we were not in a position to obtain individual consent from them. Instead, the charity gave us proxy consent to access and use the posts. To enhance transparency, we advertised and explained our research on the charity’s website and Twitter feed before commencing the study. We protected the privacy of the forum users by removing terms and phrases that could identify them, including the name of the charity. We obtained ethical approval for this study in March 2012 from the UK NHS National Research Ethics Service and the University of Manchester Research Ethics Committee.’ (p. 102)
46*	To Twitter to Woo: Harnessing the power of social media (SoMe) in nurse education to enhance the student’s experience (Sinclair et al., 2015)	x	x	x								‘As all Twitter chat transcripts are made freely available to anyone as a record afterwards at www.wenurses.co.uk. it wasn’t felt there were any consent issues involved for the participants. Indeed, the case study was constructed around an analysis of the participant’s tweets, all of which are in the public domain and open to scrutiny.’ (p. 509) ‘The NMC (2012) state that it should be assumed everything posted online is public, permanent and shared, even with the strictest privacy settings and we also needed to get this message across to students effectively. Understanding this can prevent unintentional postings that may cross professional boundaries. The “privacy illusion” is discussed by Aylott (2011) suggesting that nurses often believe that their strict privacy settings will protect their posts, when in fact this is purely an illusion….’ (p. 511)
47	Analysis of Patient Narratives in Disease Blogs on the Internet: An Exploratory Study of Social Pharmacovigilance (Matsuda et al., 2017)		x		x							‘The study protocol was reviewed and approved by the nonprofit MINS Institutional Review Board. The board waived informed consent because the data source did not contain personal information. In addition, we presented the data at the group level rather than at the individual level.’ (p. 6)
48	Health Risk Information Engagement and Amplification on Social Media: News About an Emerging Pandemic on Facebook (Strekalova, 2017)				x							‘An institutional review board approval was obtained to ensure that data collection and analysis were compliant with ethical standards for behavioral research.’ (p. 333)
49*	Acceptability of financial incentives for breastfeeding: thematic analysis of readers’ comments to UK online news reports (Giles et al., 2015)	x	x	x	x	x		x				‘The chair of the Newcastle University Faculty of Medical Sciences research ethics committee confirmed that ethical approval was not required for this study. However, we did consider, in detail, numerous ethical issues arising from this research and report on these in the discussion section.’ (p. 5) ‘Netnography is still a relatively new approach to data collection and analysis and there is limited guidance on the ethics of using this approach. As such, we feel it is appropriate to explicitly discuss the main ethical issues raised in some detail. The comments analysed here were not provided for research purposes, and commenters are not aware that we have used them for this. They have, therefore, not provided informed consent to take part in the research…We sought permission from the websites involved to use their content and adhered to copyright guidelines throughout…We did not identify ourselves as researchers and observers to the online communities. This was primarily because we chose, as others have done, not to interfere with comments and discussions as they developed. This meant that as researchers we did not influence the data included in the research – as might have been the case in more traditional interviews or focus groups…To preserve the anonymity of commenters, we have been careful not to include any details in quotations that could have identified the commenter. We also followed best practice guidance provided by the British Psychological Society.’ (pp. 11–12)
50*	‘Vintage Meds’: A Netnographic Study of User Decision-Making, Home Preparation, and Consumptive Patterns of Laudanum (Van Hout and Hearne, 2015)	x						x				‘In compliance with unobtrusive and naturalistic features of netnographic research, researchers acknowledged the dynamics of the public online “drug user” environment, maintained observational status, and respected the inherent flexibility and openness of the approach. Confidentiality measures applied to the dataset included storage in an online, password-protected computer and removal of screen pseudonyms, URLs, country and city identifiers’ (p. 600)
	Total	26	16	7	43	3	0	3	0	0	2

Key to Conway’s categories of ethical consideration: A, Privacy; B, Informed consent; C, Ethical theory; D, IRB/regulation; E, Traditional research vs social media research; F, Geographical information; G, Researcher lurking; H, Economic value of personal information; I, Medical exceptionalism; J, Benefit of identifying socially harmful medical conditions.
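
Several of the studies quoted in Table 4 describe masking identifiers before analysis; Ofoghi et al. (2016), for example, report replacing URLs, email addresses, mentions and hashtags with fixed tokens. The following is a minimal sketch of that kind of substitution, assuming Python. The four placeholder tokens are those quoted in the table, but the regular expressions and the function name `mask_identifiers` are our own illustrative assumptions, not the patterns used in that study. As other quotations in the table note, masking of this kind does not by itself prevent a verbatim post from being traced through a search engine.

```python
import re

# Placeholder tokens are those quoted from Ofoghi et al. (2016) in Table 4.
# The regular expressions below are illustrative assumptions only; they are
# not the patterns used by the original authors.
# Order matters: URLs and email addresses are masked before mentions so that
# the "@" inside an email address is not mistaken for a username.
_PATTERNS = [
    (re.compile(r"https?://\S+|www\.\S+"), "url"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "emailAddress"),
    (re.compile(r"@\w+"), "atSign"),
    (re.compile(r"#\w+"), "hashTag"),
]


def mask_identifiers(text: str) -> str:
    """Replace URLs, email addresses, mentions and hashtags with fixed tokens."""
    for pattern, token in _PATTERNS:
        text = pattern.sub(token, text)
    return text


if __name__ == "__main__":
    sample = "@jane see https://example.org/x or mail a.b@example.com #flu"
    print(mask_identifiers(sample))
    # prints: atSign see url or mail emailAddress hashTag
```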

The first two of these were authored by researchers from the pharmaceutical sector while the remaining seven were from UK universities. Of these nine UK studies, two (Giles et al., 2015; Gregg et al., 2017) referenced the research ethics guidelines produced by the BPS, as identified in part 1 of our study, while none referred to the guidelines developed by RCUK or its member councils.

The number of papers identified at each stage of the search process is shown in Figure 3, while Table 4 provides a breakdown of the ethical considerations represented in each of the included articles, along with illustrative quotations.

Discussion

Our analysis indicates significant gaps in the ethical governance of research using data mined from social media, illustrated by the incompleteness and inconsistency of current guidelines and an absence of ethical discourse in published research articles.

Status of RCUK ethics guidelines on social media and social media data

Of the seven multi-disciplinary ethics guidelines published by RCUK, only one (ESRC) specifically considered the use of social media in research, despite such research now straddling the remits of many national funding agencies. Two research councils (ESRC, MRC) nevertheless recommended guidelines from other bodies (AoIR, BPS, NIHR/INVOLVE), generating a corpus of four social media-relevant guidelines for UK researchers. These referred to social media as a set of research tools (4/4), as a source of data (3/4), as a means of public engagement (2/4) and as a channel for scientific discussion and networking (2/4), but did not mention their use for research dissemination or teaching, which also appear in our adapted version of Bjerglund-Andersen and Söderqvist’s (2012) taxonomy. With specific reference to the mining and re-use of social media data, these guidelines prioritised privacy (4/4), differences between digital and conventional research (4/4), informed consent (3/4), IRB approval/regulation (3/4) and researcher lurking (2/4), although none of the remaining ethical considerations in Conway’s (2014) framework was covered. Although MRC was the source of three research ethics guidelines, none referred to the use of social media, in contrast to their detailed consideration of ethical issues surrounding the re-use of institutional and scientific datasets, where most of the UK’s ‘big data’ investments are taking place. ESRC provided the most comprehensive overview of social media ethics, also deferring to the external AoIR and BPS guidelines, likely reflecting the importance of digital social research within ESRC’s portfolio. While these differences between research councils are to some extent understandable, they indicate a segmentation of data ethics along disciplinary lines, which is unhelpful in an environment where interdisciplinary projects are the norm, rather than the exception, underscoring the need for collaboration and agreement on universal principles.

Our focused analysis of articles indexed in PubMed also indicates a widespread neglect of ethical issues amongst research practitioners using social media data in health-related studies. Where ethical issues were discussed, this tended to centre on the procedures and requirements necessary to obtain IRB approval, such as demonstrating an awareness of privacy risks and determining whether consent was necessary, rather than showing a deeper concern with the moral or societal implications of repurposing information that people have shared for reasons other than research. Indeed, many published studies either did not mention ethical issues at all or simply stated that the data were available in the public domain and consent was therefore not required. While articles containing more comprehensive and thoughtful ethical discussion were found (Anderson et al., 2017; Gregg et al., 2017; Leggatt-Cook and Chamberlain, 2012), few studies using social media data considered the full range of ethical issues articulated in Conway’s taxonomy. These 50 studies prioritised IRB approval/regulation (43), privacy (26), informed consent (16), ethical theory (7), traditional vs social media research (3), researcher lurking (3) and the benefit of identifying potential harms (2). None of the other three considerations in Conway’s framework was covered. Significantly, of the nine eligible articles originating from UK institutions, only two referred to guidelines within the RCUK corpus (in both cases those of the BPS) and none cited the guidelines of RCUK or its member councils directly, suggesting either a lack of awareness or a strategic neglect, both of which indicate the need for better communication and training.
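
The category counts cited above can be cross-checked against the Total row of Table 4. The short sketch below, assuming Python, ranks Conway’s ten categories by the number of the 50 studies that mentioned them and lists those that no study covered. The counts and category names are taken from the table and its key; the percentages are computed here for illustration and are not reported in the table, and the function name `summarise` is hypothetical.

```python
# Counts per category from the Total row of Table 4 (50 included studies).
TOTALS = {
    "A Privacy": 26,
    "B Informed consent": 16,
    "C Ethical theory": 7,
    "D IRB/regulation": 43,
    "E Traditional research vs social media research": 3,
    "F Geographical information": 0,
    "G Researcher lurking": 3,
    "H Economic value of personal information": 0,
    "I Medical exceptionalism": 0,
    "J Benefit of identifying socially harmful medical conditions": 2,
}
N_STUDIES = 50


def summarise(totals, n_studies):
    """Return (covered, uncovered): covered is ranked by count, descending."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Percentages are derived here for illustration; they are not in Table 4.
    covered = [
        (name, count, round(100 * count / n_studies))
        for name, count in ranked
        if count > 0
    ]
    uncovered = [name for name, count in totals.items() if count == 0]
    return covered, uncovered


if __name__ == "__main__":
    covered, uncovered = summarise(TOTALS, N_STUDIES)
    for name, count, percent in covered:
        print(f"{name}: {count}/{N_STUDIES} ({percent}%)")
    print("Not covered by any study:", ", ".join(uncovered))
```

Run as a script, this lists seven covered categories, headed by IRB/regulation (43 of 50), and three uncovered ones (F, H and I), in agreement with the figures given in the text.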

As already noted, ESRC was the only UK research council whose own ethics guidelines explicitly considered the use of social media in research. Their Framework for Ethics (2015) includes a detailed overview of relevant issues, along with examples, and illustrates the potential for ethics guidelines to evolve in response to emerging innovations. While the earlier version of this framework (ESRC, 2010) advised that research involving respondents through the Internet may ‘involve more than minimal risk’, no specific examples of risk were provided to guide researchers in this assessment. This lack of specific guidance was also reflected in the ‘frequently asked questions’ section dealing with Internet searches, where it was simply noted that the rapidly evolving nature of the field and the use of web pages and instant messaging for research purposes ‘pose new ethical dilemmas’ that need to be addressed. In contrast, the guidelines published in January 2015 refer explicitly to ethical considerations associated with the use of social media as a research tool and as a source of research data. These include uncertainties over how to apply ethical concepts such as ‘privacy’ and ‘anonymity’, which may be interpreted differently by social media users and researchers, and the potential sensitivity of topics discussed in these settings, such as health issues. They caution that, while information intentionally published on the Internet is ‘in the public domain’, the identity of individuals should be protected unless it is critical to the research, such as in studies analysing statements by public officials. ESRC’s 2015 guidelines also advise researchers to abide by the regulations and permissions set by the data holders (e.g. Twitter, Facebook), particularly when these are required for compliance with data protection legislation, bearing in mind that such research may cross legislative jurisdictions. The framework also benefits from deferring to two internet-specific research ethics guidelines developed by the BPS and the AoIR.

While the BPS guidelines do not explicitly refer to social media, this is implied in the term ‘internet-mediated research’, which encompasses the use of online platforms as a means of engaging the public, as a set of research tools and as a source of data for secondary uses, consistent with our taxonomy. A total of 10 ethical considerations are highlighted, which overlap with but are somewhat different from those provided by Conway. These include verifying identity, private versus public space, informed consent, levels of control, withdrawal, debriefing, deception, monitoring, protection of participants and researchers, and data protection. These are grouped into four sectors of a grid, relating to whether participants are actively recruited or are unaware of their involvement in the study, as well as whether they are identified or anonymous. Although the BPS guidelines go some way towards providing actionable recommendations for researchers, they should not be considered exhaustive, given that only four of the 10 ethical concepts identified by Conway (privacy, informed consent, IRBs and researcher lurking) are addressed. A newer BPS guideline, currently under beta-testing, has extended the 2012 framework but, as yet, does not refer to social media specifically (BPS, 2017). Based on our study, we recommend that it do so.

The AoIR is a widely recognised international academic association dedicated to the advancement of the cross-disciplinary field of Internet studies. The AoIR ethics guideline referred to by the ESRC (AoIR, 2012) outlines several high-level themes, including the difficulty of understanding whether such research involves ‘human subjects’ for the purposes of ethics approval, differentiating ‘public from private’, conceptualising data or text as an extension of ‘persons’, and reconciling ‘top down versus bottom-up approaches’ for managing potential harms and benefits of research. The document includes an extensive list of considerations, such as understanding the context of the research, the primary objective of the research, how the data will be accessed, stored and disseminated, and the rights of participants, who may be unaware that their data are being used. Unlike the BPS guideline, the AoIR guideline explicitly mentions social media, and gives examples of social media data uses that present ethical challenges.

Given the potential sensitivity of medical information available online, it is somewhat surprising that the MRC does not provide specific guidance for researchers conducting studies using social media data. Nevertheless, in their email verifying this, the MRC recommended that we review the guidance provided by the NIHR as part of the INVOLVE advisory group. INVOLVE was established by NIHR in 1996 to support active public involvement in NHS, public health and social care research. In 2014, they published ethics guidelines on using social media to engage citizens in public debate and research, as a forum for scientific discussion and networking, and as a tool for undertaking research and consultation. They list the types of social media platforms available, provide case studies of their use, outline the benefits and challenges, consider how to manage risk, and offer tips based on researcher experience. Applying Conway’s taxonomy to the NIHR guidance, however, indicates that only three of the 10 ethical concepts are addressed, namely privacy, the use of IRBs and the difference between traditional and social media-based research. These reflect the public-engagement remit of INVOLVE, which may explain why the secondary use of social media data for research is not discussed explicitly.

The absence of any reference to research using social media in the remaining RCUK guidelines is noteworthy. Whilst in some cases this is entirely understandable, for example the STFC focuses primarily on particle and nuclear physics and science infrastructure, in others it would seem appropriate to include these new forms of data. For example, one EPSRC project in which the second author is involved specifically focuses on the use of social media, crowdsourcing and citizen science, albeit driven by computer scientists (SOCIAM; see http://sociam.org). This project includes themes in health and social science, illustrating how social media research transects disciplinary boundaries and may potentially fall within the scope of several ethics bodies.

The following quotation from the AoIR (2012) guideline neatly illustrates the need for this trans-disciplinary thinking.

‘Manipulation and close study of information generated by social media networks certainly constitutes a different research environment than sticking a needle into a volunteering person in a medical laboratory. On the other hand, entire communities have felt harm from use of their DNA data more than a decade after it was collected and anonymously aggregated’ (AoIR, 2012: p. 13)

Ethical maturity of health research using social media data

The paucity of ethical considerations in the health-related research identified via PubMed is noteworthy; indeed, very few relevant studies went further than acknowledging consultation with their IRB, which is primarily undertaken for instrumental reasons. Those that did originated predominantly from the sub-field of primary care research or from researchers based in pharmaceutical companies routinely subjected to ethical oversight. Although very few studies were affiliated with UK research organisations, it is troubling to see that only two of the nine we identified referred to the RCUK or associated ethics guidelines.

The dominance of instrumental over moral considerations seen in the scientific papers we reviewed suggests that researchers using these methods are heavily dependent on IRBs and journal editors to play the role of their ethical conscience. It is therefore essential that ethics committees and editors evaluating research using social media data are aware of the range of platforms available and how they work, and can draw on the latest interdisciplinary guidelines to inform their decision-making. We recommend that editors and peer-reviewers seek authors’ explanations of the ethical challenges they faced and how these were managed during the conduct of their studies, thereby enabling greater transparency and encouraging knowledge sharing within the research community.

Policy implications

Despite their use now being common, the emergence of social media and other online platforms has taken traditionally slow-moving governments and academic institutions somewhat off-guard. Uncertainties about what is appropriate, acceptable, legal and responsible in these new virtual spaces, and for different forms of digital personal information, have also fuelled broader debates. These include debates around the need for ‘net neutrality’ or equal access to online content and services amongst all users (McKee, 2011), how to maintain control of key Internet domain names in the global public interest (Mackey et al., 2014) and calls for a ‘Magna Carta for Data’ (Kiss, 2014; O’Sullivan, 2017). Moreover, it is contributing to the dilemma of governments seeking to generate economic, scientific and societal value from existing data assets whilst also protecting citizens from unwanted surveillance and intrusion. Health research is one area in which this discussion has been particularly acute, due to the traditionally stringent ethical demands placed on the protection of confidentiality. In the UK, the growing use of health records for research (Knapton, 2014), coupled with public disquiet over controversial programmes such as Care.Data (Boseley, 2016) and Google DeepMind’s Streams project (Wakefield, 2017), has focused considerable policy attention on the need for ethical and robust governance when it comes to the use of patient information (e.g. Richards et al., 2015; National Data Guardian, 2017). In this context, it is noteworthy that, by comparison, the ethics of using social media data in health research has been somewhat neglected, although such data are seldom managed by the state or by healthcare institutions with a duty to protect them. It is nevertheless arguable that the same principles of respect, confidentiality and protection from harm or embarrassment should be followed as would be expected in any other form of bona fide research.

Caveats and opportunities for further research and development

Our review of ethics guidelines was limited to those provided or recommended by RCUK and its seven UK Research Councils and we are aware of other relevant guidelines developed by UK-based researchers (Convery and Cox, 2012) and organisations beyond the scope of this study (e.g. NCCPE; see http://www.publicengagement.ac.uk/work-with-us/completed-projects/ethics-cbpr/resources/ethical-guidelines-web-resources). We recommend further research involving a wider corpus of research ethics guidelines, to test the generalisability of our results in the UK, and as a means of catalysing the development of internationally applicable ethics guidelines for research involving social media platforms and data.

The variable coherence, consistency and navigability of the RCUK websites presented a challenge for identifying relevant ethics guidelines, particularly in the case of MRC and EPSRC. For MRC, this was mainly due to its diverse portfolio of specialised guidelines, covering topics from clinical trial management through to the use of human tissue samples. For EPSRC, the distribution of ethical information presented greater difficulties, with a list of high-level ethical considerations accompanied by hyperlinks to the RCUK framework and a variety of external sources, many with little or no annotation. One exception is the ‘Framework for Ethical and Responsible Innovation’, which arose from an EPSRC-funded research project and is referenced repeatedly on the website, although its full text is only accessible via a hyperlink to the authors’ journal proof. We recommend action to improve consistency amongst RCUK members in their presentation of ethical guidance, including appropriate content tagging, to avoid confusion and facilitate access to relevant advice for researchers using social media in their studies.

The multiplicity of departmental and institutional ethics committees operating within UK universities and research organisations adds further complexity to this landscape. New empirical studies are needed to shed light on the ways in which such committees are addressing approval requests for studies involving the reuse of data from social media, including which published guidelines they refer to, whether they have their own written policies for this type of research, and whether disciplinary affiliation affects decision making.

Our review of relevant health-related research indexed in one database was intended as an exploratory scoping exercise and should be regarded as indicative rather than exhaustive. We are currently undertaking a comprehensive, rigorous, multi-database, systematic review of data mining research in health, which will inevitably yield further studies. Nonetheless our current results provide valuable insights into the ethical maturity of research involving social media mining and echo the gaps seen in the guidelines we reviewed. We recommend similar analyses of ethical considerations in published articles from other disciplines where social media data are being mined for research, including computer science, the social sciences, economics, business studies, political science and criminology, to name but a few. Given the growing research activities of major social media providers and businesses, research indexed in the scientific literature may represent only the tip of the iceberg, and finding new ways of obtaining access to commercial research would also be worthwhile, although the monetisation of data insights and intellectual property restrictions will inevitably present barriers.

The scope of our analysis did not extend to legal or regulatory aspects of information governance in the context of social media data, which are designed to control or limit certain forms of research. In contrast, ethical guidelines aim to ensure research integrity, discourage irresponsible or socially unacceptable research conduct and support the prioritisation of studies likely to benefit rather than harm society. Likewise, we did not seek to compare methodological innovations such as automated data mining, social network analysis, machine learning or ‘black box’ algorithms, which also present challenges around consumer choice, control and privacy (Pasquale, 2015). Comparable analyses conducted from each of these perspectives are warranted.

Conclusions and recommendations

Beyond statements about IRB approval, the generally poor integration of ethical concepts and guidelines within the corpus of published articles we have reviewed suggests low levels of awareness amongst researchers using social media mining in their studies, echoing observations from other areas of ‘big data’ research (e.g. Metcalf et al., 2017). This is consistent with the wide variability we have observed in the research ethics guidance offered by RCUK members in relation to uses of social media platforms and the data derived from them. Our finding that only one RCUK council (ESRC) directly refers to social media research in its ethical guidance is a cause for concern, given the highly interdisciplinary nature of studies in this area, as illustrated by our analysis of relevant health-related publications.

We recommend further cross-council collaboration to develop shared, interdisciplinary guidelines for the ethical use of social media in research, and specifically research involving the harvesting and reuse of social media data.

In the shorter term, effort should be invested to improve consistency in the presentation, accessibility and comprehensiveness of existing ethical guidance available on the various RCUK websites. For example, we observed that some websites are difficult to navigate and contain highly distributed and poorly connected information on ethics, approval processes and regulation. Adequate literature review to ensure the timely inclusion of relevant guidance from other sources is also required; for example, we came across a guide to ethics in social media research which had emerged from a project part-funded by ESRC and EPSRC but was not mentioned on either of their websites (Evans et al., 2015).

Future RCUK ethics guidelines would also benefit from including a broader range of social media uses, clear criteria for judging projects against a variety of ethical considerations, and pragmatic recommendations for researchers planning to undertake studies involving social media.

Until such meta-guidelines are available, we recommend that UK researchers prioritise the existing guidelines produced by the ESRC, BPS, AoIR and NIHR, alongside the ethical taxonomies we have adapted for this study. We also encourage researchers to explore the wider universe of ethical frameworks emerging nationally and internationally in relation to new forms of data, including those from the OECD (2016), the US Council for Big Data Ethics and Society (Metcalf et al., 2017) and the UK Data Service (Bishop, 2017) as well as emerging initiatives such as the UK Society for Data Miners’ plans to develop ethical principles (SocDM, 2017) and primary research exploring the boundaries of public acceptability in the reuse of digital personal data (e.g. Aitken et al., 2016; Williams et al., 2017).

We recommend that UK researchers applying for project funding or permission to undertake studies using social media data should explicitly state which ethics guidelines they have consulted, and we call upon IRBs to integrate this requirement into their approvals documentation. We also call upon authors and editors to ensure that publications describing studies involving social media data clearly state the ethical issues that have been considered during the research and specify the guidelines consulted.

Given the substantial investments made in digital research and data science by the UK government and research councils over the last 5 years, coupled with increased policy attention on responsible research and innovation (European Commission, 2013) and the protection of personal data (European Parliament, 2016), ensuring the robust design and implementation of ethical guidelines for social media research is essential.

We hope that the results of this scoping study will inform the future development of such guidelines in the UK and elsewhere, and catalyse a broader interdisciplinary discussion amongst research councils, institutional ethics boards and researchers themselves.

Footnotes

Acknowledgements

None.

Abbreviations

AoIR: Association of Internet Researchers

AHRC: Arts and Humanities Research Council

BBSRC: Biotechnology and Biological Sciences Research Council

BPS: The British Psychological Society

EPSRC: Engineering and Physical Sciences Research Council

ESRC: Economic and Social Research Council

IRB: Institutional Review Board

MRC: Medical Research Council

NERC: Natural Environment Research Council

NHS: National Health Service (UK)

NIHR: National Institute for Health Research

RCUK: Research Councils United Kingdom

STFC: Science and Technology Facilities Council

Competing Interests

This study was conducted as part of JT’s self-funded PhD research project, supervised by CP. JT is also an employee of Ernst and Young Ltd and CP is an RCUK grant holder. Neither organization was involved in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Contributorship

Both authors contributed to study conception, planning, analysis and manuscript writing. JT designed and undertook the searches and the email verification exercise, screened the outputs, extracted the data and classified these according to the specified taxonomies, with input from and cross-checking by CP.

Ethical approval

This study adheres to the Research Ethics Policy of the University of Edinburgh Medical School and was approved by its Internal Review Board.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: CP is an RCUK grant holder and collaborator on the ESRC Administrative Data Research Centre for Scotland (grant number ES/L007487/1), the MRC Farr Institute for Health Informatics Research UK (grant number MR/K007017/1) and the EPSRC Science and Practice of Social Machines (grant number EP/J017728/1).

Guarantor

Not Applicable.

References

AHRC (2016) Research Funding Guide. Available at: http://www.ahrc.ac.uk/documents/guides/research-funding-guide/ (accessed 16 October 2017).

Aitken, de St Jorre, Pagliari, et al. (2016) Public responses to the sharing and linkage of health data for research purposes: a systematic review and thematic synthesis of qualitative studies. BMC Med Ethics 17: 73.

Allem, Escobedo, Chu, et al. (2016) Campaigns and counter campaigns: reactions on Twitter to e-cigarette education. Tob Control 26: 226–229.

Anderson, Bell, Gilbert, et al. (2017) Using social listening data to monitor misuse and nonmedical use of bupropion: a content analysis. JMIR Public Health Surveill 3: e6.

Anstead, O’Loughlin (2015) Social media analysis and public opinion: The 2010 UK General Election. J Comput-Mediated Commun 20: 204–220.

AoIR (2012) Ethical Decision-Making and Internet Research. Report from the AoIR Ethics Working Committee (compiled by Markham and Buchanan). Available at: http://www.aoir.org/reports/ethics2.pdf (accessed 16 October 2017).

Arthur (2010) 2010: The first social media election. Available at: https://www.theguardian.com/media/2010/apr/30/social-media-election-2010 (accessed 16 October 2017).

Balm (2014) Open access and social media: helping science move forwards. Available at: http://www.evidentlycochrane.net/open-access-social-media-can-help-science-move-forwards/ (accessed 16 October 2017).

BBSRC (2017) BBSRC Research Grants The Guide. Available at: http://www.bbsrc.ac.uk/documents/grants-guide/ (accessed 16 October 2017).

Bishop (2017) Big data and data sharing: Ethical issues. Available at: https://www.ukdataservice.ac.uk/media/604711/big-data-and-data-sharing_ethical-issues.pdf (accessed 16 October 2017).

Bjerglund Andersen, Söderqvist (2012) Social Media and Public Health Research. Working Paper/Technical Report. University of Copenhagen. Available at: https://bjerglund.files.wordpress.com/2012/11/final-social-media-and-public-health-research1.pdf (accessed 16 October 2017).

Boseley (2016) NHS to scrap single database of patients’ medical details. Available at: https://www.theguardian.com/technology/2016/jul/06/nhs-to-scrap-single-database-of-patients-medical-details (accessed 16 October 2017).

BPS (2012) Conducting Research on the Internet - Guidelines for ethical practice in psychological research online. Available at: http://www.bps.org.uk/sites/default/files/documents/conducting_research_on_the_internet-guidelines_for_ethical_practice_in_psychological_research_online.pdf (accessed 16 October 2017).

Cardiff University (2012) Collaborative Online Social Media Observatory (COSMOS). Available at: http://www.cardiff.ac.uk/research/explore/research-units/collaborative-online-social-media-observatory (accessed 16 October 2017).

Cavazos-Rehg, Krauss, Grucza, et al. (2014) Characterizing the followers and tweets of a marijuana-focused Twitter handle. J Med Internet Res 16: e157.

Chen, Zhu, Conway (2015) What online communities can tell us about electronic cigarettes and hookah use: a study using text mining and visualization techniques. J Med Internet Res 17: e220.

Convery, Cox (2012) A review of research ethics in internet-based research. Practit Res Higher Educ 6: 1.

Conway (2014) Ethical issues in using twitter for public health surveillance and research: developing a taxonomy of ethical concepts from the research literature. J Med Internet Res 16: e290.

Da’ar, Yunus, Md Hossain, et al. (2016) Impact of Twitter intensity, time, and location on message lapse of bluebird’s pursuit of fleas in Madagascar. J Infect Public Health 10: 396–402.

Daniulaityte, Chen, Lamy, et al. (2016) “When ‘bad’ is ‘good’”: identifying personal communication and sentiment in drug-related tweets. JMIR Public Health Surveill 2: e162.

David (2004) Internet research: privacy, ethics and alienation: an open source approach. Internet Res 14: 323–332.

Deiner, Lietman, McLeod, et al. (2016) Surveillance tools emerging from search engines and social media data for determining eye disease patterns. JAMA Ophthalmol 134: 1024–1030.

Denecke, Krieck, Otrusina, et al. (2013) How to exploit twitter for public health monitoring? Methods Inf Med 52: 326–339.

Doing-Harris, Zeng-Treitler (2011) Computer-assisted update of a consumer health vocabulary through mining of social network data. J Med Internet Res 13: e37.

EPSRC (2013) Framework for Responsible Innovation. Available at: http://www.epsrc.ac.uk/research/framework/

ESRC (2010) ESRC Framework for Research Ethics. Available at: http://www.esrc.ac.uk/files/funding/guidance-for-applicants/esrc-framework-for-research-ethics-2010/ (accessed 16 October 2017).

ESRC (2015) Framework for research ethics. Available at: http://www.esrc.ac.uk/files/funding/guidance-for-applicants/esrc-framework-for-research-ethics-2015/ (accessed 16 October 2017).

European Commission (2013) Options for Strengthening Responsible Research and Innovation - Report of the Expert Group on the State of Art in Europe on Responsible Research and Innovation. Available at: http://ec.europa.eu/research/science-society/document_library/pdf_06/options-for-strengthening_en.pdf (accessed 16 October 2017).

European Parliament (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).

Evans, Ginnis, Bartlett (2015) #SocialEthics a guide to embedding ethics in social media research. Available at: https://www.ipsos.com/sites/default/files/migrations/en-uk/files/Assets/Docs/Publications/im-demos-social-ethics-in-social-media-research-summary.pdf (accessed 16 October 2017).

Eysenbach, Till (2001) Ethical issues in qualitative research on internet communities. BMJ 323: 1103.

Fromm (2016) New Study Finds Social Media Shapes Millennial Political Involvement and Engagement. Forbes, June 22nd. Available at: https://www.forbes.com/sites/jefffromm/2016/06/22/new-study-finds-social-media-shapes-millennial-political-involvement-and-engagement/#35cf98e22618 (accessed 16 October 2017).

Giles, Holmes, McColl, et al. (2015) Acceptability of financial incentives for breastfeeding: thematic analysis of readers’ comments to UK online news reports. BMC Pregnancy Childbirth 15: 116.

Good, Clarke, Loguercio, et al. (2012) Building a biomedical semantic network in Wikipedia with Semantic Wiki Links. Database (Oxford) 2012: bar060.

Grant (2016) “#discrimination”: the online response to a case of a breastfeeding mother being ejected from a UK retail premises. J Hum Lact 32: 141–151.

Gregg, Patel, et al. (2017) Public reaction to the UK government strategy on childhood obesity in England: A qualitative and quantitative summary of online reaction to media reports. Health Policy 121: 450–457.

Chen, Zhu, et al. (2014) Importance of Internet surveillance in public health emergency control and prevention: evidence from a digital epidemiologic study during avian influenza A H7N9 outbreaks. J Med Internet Res 16: e20.

Guan, Hao, Cheng, et al. (2015) Identifying Chinese microblog users with high suicide probability using internet-based profile and linguistic features: classification model. JMIR Ment Health 2: e17.

Heaivilin, Gerbert, Page, et al. (2011) Public health surveillance of dental pain via Twitter. J Dent Res 90: 1047–1051.

Hearne, Van Hout (2016) “Trip-Sitting” in the black hole: a netnographic study of dissociation and indigenous harm reduction. J Psychoactive Drugs 48: 233–242.

Hilyard, Broniatowski, Dredze (2015) How Far Can Twitter Reach in Good Survey Research? Available at: http://www.socialsciencespace.com/2015/04/how-far-can-twitter-reach-in-good-survey-research/ (accessed 16 October 2017).

Hokby, Hadlaczky, Westerlund, et al. (2016) Are mental health effects of internet use attributable to the web-based content or perceived consequences of usage? A longitudinal study of European adolescents. JMIR Ment Health 3: e31.

Kemp (2017) Digital in 2017: Global Overview. We Are Social and Hootsuite. Available at: https://wearesocial.com/special-reports/digital-in-2017-global-overview (accessed 16 October 2017).

Huang, Kornfield, Szczypka, et al. (2014) A cross-sectional examination of marketing of electronic cigarettes on Twitter. Tob Control 23(Suppl. 3): iii26–30.

Huh, Yetisgen-Yildiz, Pratt (2013) Text classification for assisting moderators in online health communities. J Biomed Inform 46: 998–1005.

INVOLVE (2014) Guidance on the use of social media to actively involve people in research. Available at: http://www.invo.org.uk/wp-content/uploads/2014/11/9982-Social-Media-Guide-WEB.pdf (accessed 16 October 2017).

Jha, Lin, Savoia (2016) The use of social media by state health departments in the US: analyzing health communication through Facebook. J Community Health 41: 174–179.

Jouhki, Lauk, Penttinen, Sormanen, Uskali (2016) Facebook’s Emotional Contagion Experiment as a Challenge to Research Ethics. Media and Communication 4(4): 75–85. doi:10.17645/mac.v4i4.579

Kaplan, Haenlein (2010) Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons 53: 59–68.

Karimi, Metke-Jimenez, Kemp, et al. (2015) Cadec: A corpus of adverse drug event annotations. J Biomed Inform 55: 73–81.

Kendal, Kirk, Elvey, et al. (2017) How a moderated online discussion forum facilitates support for young people with eating disorders. Health Expect 20: 98–111.

Kiss (2014) An online Magna Carta: Berners-Lee calls for bill of rights for web. Available at: http://www.theguardian.com/technology/2014/mar/12/online-magna-carta-berners-lee-web (accessed 16 October 2017).

Knapton (2014) Health records of every NHS patient to be shared in vast database. Available at: http://www.telegraph.co.uk/news/10565160/Health-records-of-every-NHS-patient-to-be-shared-in-vast-database.html (accessed 16 October 2017).

Koene, Adolphs (2015) Ethics considerations for Corpus Linguistic studies using internet resources. Available at: http://casma.wp.horizon.ac.uk/wp-content/uploads/2015/04/CL2015-CorpusLinguisticsEthics_KoeneAdolphs.pdf (accessed 16 October 2017).

Leggatt-Cook, Chamberlain (2012) Blogging for weight loss: personal accountability, writing selves, and the weight-loss blogosphere. Sociol Health Illn 34: 963–977.

Liu, Zhu, et al. (2017) Using real-time social media technologies to monitor levels of perceived stress and emotional state in college students: a web-based questionnaire study. JMIR Ment Health 4: e2.

Mackey, Eysenbach, Liang, et al. (2014) A call for a moratorium on the .health generic top-level domain: preventing the commercialization and exclusive control of online health information. Globaliz Health 10: 62.

Mandl, McNabb, Marks, et al. (2014) Participatory surveillance of diabetes device safety: a social media-based complement to traditional FDA reporting. J Am Med Inform Assoc 21: 687–691.

Marshall, Yang, Ping, et al. (2016) Symptom clusters in women with breast cancer: an analysis of data from social media and a research study. Qual Life Res 25: 547–557.

Matsuda, Aoki, Tomizawa, et al. (2017) Analysis of patient narratives in disease blogs on the internet: an exploratory study of social pharmacovigilance. JMIR Public Health Surveill 3: e10.

McKee (2011) Policy matters now and in the future: net neutrality, corporate data mining, and government surveillance. Computers and Composition 28: 276–291.

McKee (2013) Ethical issues in using social media for health and health care research. Health Policy 110: 298–301.

Meaney, Cussen, Greene, et al. (2016) Reaction on Twitter to a cluster of perinatal deaths: a mixed method study. JMIR Public Health Surveill 2: e36.

Metcalf, Keller, Boyd (2017) Perspectives on Big Data, Ethics, and Society. Available at: http://bdes.datasociety.net/council-output/perspectives-on-big-data-ethics-and-society/ (accessed 16 October 2017).

Moorhead, Hazlett, Harrison, et al. (2013) A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 15: e85.

MRC (2000) Personal Information in Medical Research. Available at: http://www.mrc.ac.uk/documents/pdf/personal-information-in-medical-research/ (accessed 16 October 2017).

MRC (2012a) Good research practice. Available at: http://www.mrc.ac.uk/news-events/publications/good-research-practice-principles-and-guidelines/ (accessed 16 October 2017).

MRC (2012b) MRC Policy and Guidance on Sharing of Research Data from Population and Patient Studies. Available at: https://www.mrc.ac.uk/publications/browse/mrc-policy-and-guidance-on-sharing-of-research-data-from-population-and-patient-studies/ (accessed 16 October 2017).

Mudry, Strong (2013) Doing recovery online. Qual Health Res 23: 313–325.

Munson, Cavusoglu, Frisch, et al. (2013) Sociotechnical challenges and progress in using social media for health. J Med Internet Res 15: e226.

Murphy, Link, Hunter Childs, et al. (2014) Social Media in Public Opinion Research: Report of the AAPOR Task Force on Emerging Technologies in Public Opinion Research. Available at: https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/AAPOR_Social_Media_Report_FNL.pdf (accessed 16 October 2017).

National Data Guardian (2017) National Data Guardian (NDG) statement on government response to the NDG Review. Available at: https://www.gov.uk/government/news/national-data-guardian-ndg-statement-on-government-response-to-the-ndg-review (accessed 16 October 2017).

NERC (2015) Ethics Policy. Available at: http://www.nerc.ac.uk/about/policy/policies/nerc-ethics-policy.pdf (accessed 16 October 2017).

Nunan, Yenicioglu (2013) Informed, uninformed and participative consent in social media research. Int J Market Res 55: 791.

O’Sullivan (2017) Towards a Magna Carta for Data. Dublin: Royal Irish Academy.

Odlum, Yoon (2015) What can we learn about the Ebola outbreak from tweets? Am J Infect Control 43: 563–571.

OECD (2016) Research Ethics and New Forms of Data for Social and Economic Research. OECD Publishing. Available at: http://www.oecd-ilibrary.org/science-and-technology/research-ethics-and-new-forms-of-data-for-social-and-economic-research_5jln7vnpxs32-en (accessed 16 October 2017).

Ofoghi, Mann, Verspoor (2016) Towards early discovery of salient health threats: a social media emotion classification technique. Pac Symp Biocomput 21: 504–515.

Orton-Johnson (2010) Ethics in online research; evaluating the ESRC framework for research ethics categorisation of risk. Sociol Res Online 15: 13.

Pagliari, Vijaykumar (2016) Digital Participatory Surveillance and the Zika Crisis: Opportunities and Caveats. PLoS Negl Trop Dis 10(6): e0004795.

Pasquale (2015) The Black Box Society: The Secret Algorithms That Control Money and Information. Cambridge: Harvard University Press.

Paul, Dredze (2011) You Are What You Tweet: Analyzing Twitter for Public Health. Available at: http://www.cs.jhu.edu/~mpaul/files/2011.icwsm.twitter_health.pdf (accessed 16 October 2017).

Paul, Dredze (2014) Discovering health topics in social media using topic models. PLoS One 9: e103408.

Pedersen, Kurz (2016) Using Facebook for health-related research study recruitment and program delivery. Curr Opin Psychol 9: 38–43.

Pells (2017) Facebook research targeted insecure youth, leaked documents show. Available at: http://www.independent.co.uk/news/media/facebook-leaked-documents-research-targeted-insecure-youth-teenagers-vulnerable-moods-advertising-a7711551.html (accessed 16 October 2017).

Powell, Seifert, Reblin, et al. (2016) Social media listening for routine post-marketing safety surveillance. Drug Saf 39: 443–454.

RCUK (2013) RCUK Policy and Guidelines on Governance of Good Research Conduct. Available at: http://www.rcuk.ac.uk/documents/reviews/grc/rcukpolicyguidelinesgovernancegoodresearchconduct-pdf/ (accessed 16 October 2017).

Richardson, Vallone (2014) YouTube: a promotional vehicle for little cigars and cigarillos? Tob Control 23: 21–26.

Risson, Saini, Bonzani, et al. (2016) Patterns of treatment switching in multiple sclerosis therapies in US patients active on social media: application of social media content analysis to health outcomes research. J Med Internet Res 18: e62.

90.

Rodrigues

das Dores

Camilo-Junior

et al. (2016) SentiHealth-Cancer: a sentiment analysis tool to help detecting mood of patients in online social networks. Int J Med Inform 85: 80–95.

91.

Russell

Sprung

McCauley

et al. (2016) Knowledge exchange and discovery in the age of social media: the journey from inception to establishment of a parent-led web-based research advisory community for childhood disability. J Med Internet Res 18: e293.

92.

Sarker

O’Connor

Ginn

et al. (2016) Social media mining for toxicovigilance: automatic monitoring of prescription medication abuse from Twitter. Drug Saf 39: 231–240.

93.

Schomberg

Haimson

Hayes

et al. (2016) Supplementing public health inspection via social media. PLoS One 11: e0152117.

94.

Seidenberg

Pagoto

Vickey

et al. (2016) Tanning bed burns reported on Twitter: over 15,000 in 2013. Transl Behav Med 6: 271–276.

95.

Sinclair

McLoughlin

Warne

(2015) To Twitter to Woo: harnessing the power of social media (SoMe) in nurse education to enhance the student’s experience. Nurse Educ Pract 15: 507–511.

96.

Smith

(2014) Social Networks Are Only Just Getting Started In Mining User Data. Available at: http://www.businessinsider.com/social-medias-big-data-future-2014-2?IR=T (accessed 16 October 2017).

97.

SocDM (2017) Society of Data Miners: Towards a Professional Code of Ethics for Data Mining. Available at: https://www.turing.ac.uk/events/socdm-towards-professional-code-ethics-data-mining-part-2/ (accessed 16 October 2017).

98.

Song

Seo

et al. (2017) Social big data analysis of information spread and perceived infection risk during the 2015 Middle East respiratory syndrome outbreak in South Korea. Cyberpsychol Behav Soc Netw 20: 22–29.

99.

Song

Seo

et al. (2016) Data mining of web-based documents on social networking sites that included suicide-related words among Korean adolescents. J Adolesc Health 59: 668–673.

100.

STFC (2013) Public Engagement with Science and Technology Strategic Plan 2013-2016. Available at: http://webarchive.nationalarchives.gov.uk/20150106104755/https://www.stfc.ac.uk/1780.aspx (accessed 16 October 2017).

101.

Strekalova

(2017) Health risk information engagement and amplification on social media. Health Educ Behav 44: 332–339.

102.

Sudau

Friede

Grabowski

et al. (2014) Sources of information and behavioral patterns in online health forums: observational study. J Med Internet Res 16: e10.

103.

Trisha

(2013) Citizen science: amateur experts. Nature 496: 259.

104.

Tsuya

Sugawara

Tanaka

et al. (2014) Do cancer patients tweet? Examining the twitter use of cancer patients in Japan. J Med Internet Res 16: e137.

105.

Tursunbayeva

Franco

Pagliari

(2017) Use of social media for e-Government in the public health sector: A systematic review of published studies. Government Information Quarterly. DOI: 10.1016/j.giq.2017.04.001.

106.

Van Hout

Hearne

(2015) “Vintage meds”: a netnographic study of user decision-making, home preparation, and consumptive patterns of laudanum. Subst Use Misuse 50: 598–608.

107.

van Mierlo

Hyatt

et al. (2017) Demographic and indication-specific characteristics have limited association with social network engagement: evidence from 24,954 members of four health care support groups. J Med Internet Res 19: e40.

108.

Vayena

Mastroianni

Kahn

(2012) Ethical issues in health research with novel online sources. Am J Public Health 102: 2225–2230.

109.

Wakefield

(2017) Google DeepMind’s NHS deal under scrutiny. Available at: http://www.bbc.com/news/technology-39301901 (accessed 16 October 2017).

110.

Weitzman

Adida

Kelemen

et al. (2011a) Sharing data for public health research by members of an international online diabetes social network. PLoS One 6: e19256.

111.

Weitzman

Kelemen

Mandl

(2011b) Surveillance of an online social network to assess population-level diabetes health status and healthcare quality. Online J Public Health Inform 3(3): ojphi.v3i3.3797.

112.

Weitzman

Kelemen

Quinn

et al. (2013) Participatory surveillance of hypoglycemia and harms in an online social network. JAMA Intern Med 173: 345–351.

113.

Whiteman

(2012) Undoing Ethics: Rethinking Practice in Online Research. New York: Springer.

114.

Williams

Burnap

Sloan

(2017) Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation. Sociology. DOI: 10.1177/0038038517708140.

115.

Wilson

Gosling

Graham

(2012) A review of Facebook research in the social sciences. Perspect Psychol Sci 7: 203–220.

116.

Woo

Cho

Shim

et al. (2015) Public trauma after the Sewol ferry disaster: the role of social media in understanding the public mood. Int J Environ Res Public Health 12: 10974–10983.
