Abstract
Background:
Data representing people’s behaviour, attitudes, feelings and relationships are increasingly being harvested from social media platforms and re-used for research purposes. This can be ethically problematic, even where such data exist in the public domain. We set out to explore how the academic community is addressing these challenges by analysing a national corpus of research ethics guidelines and published studies in one interdisciplinary research area.
Methods:
Ethics guidelines published by Research Councils UK (RCUK), its seven member councils and guidelines cited within these were reviewed. Guidelines referring to social media were classified according to published typologies of social media research uses and ethical considerations for social media mining. Using health research as an exemplar, PubMed was searched to identify studies using social media data, which were assessed according to their coverage of ethical considerations and guidelines.
Results:
Of the 13 guidelines published or recommended by RCUK, only those from the Economic and Social Research Council, the British Psychological Society, the International Association of Internet Researchers and the National Institute for Health Research explicitly mentioned the use of social media. Regarding data re-use, all four mentioned privacy issues but varied with respect to other ethical considerations. The PubMed search revealed 156 health-related studies involving social media data, only 50 of which mentioned ethical concepts, in most cases simply stating that they had obtained ethical approval or that no consent was required. Of the nine studies originating from UK institutions, only two referred to RCUK ethics guidelines or guidelines cited within these.
Conclusions:
Our findings point to a deficit in ethical guidance for research involving data extracted from social media. Given the growth of studies using these new forms of data, there is a pressing need to raise awareness of their ethical challenges and provide actionable recommendations for ethical research practice.
Introduction
Social media in research
Technological advances over the past decade have enabled widespread access to the Internet in most countries and the number of social media users has grown to around 2.8 billion people worldwide (Kemp, 2017). Social media are online, often mobile, platforms that support the creation and exchange of user-generated content (Kaplan and Haenlein, 2010), a phenomenon sometimes referred to by the terms Web 2.0 or the Social Web. They include generic platforms for networking, information sharing and content curation, such as Facebook, Twitter, YouTube and LinkedIn; online forums aimed at specific communities, such as PatientsLikeMe, Mumsnet and BaristaExchange; some private collaborative work tools such as Trello and Yammer; and crowdsourcing platforms such as Ushahidi and Zooniverse, although opinions vary as to what precisely does or does not qualify.
Several uses of social media in research have been described in the literature. These include the deployment of social media platforms for the
A number of potential benefits of using social media in research have been described in the literature, including the ability to reach larger numbers of participants than might otherwise be possible (Moorhead et al., 2013), being able to analyse trends and associations within large corpuses of open-access data (Paul and Dredze, 2011), reducing the costs of conducting research in large populations (Munson et al., 2013), greater opportunities for interaction across extended time periods, as may be required in longitudinal or post-market studies (Hokby et al., 2016), providing a channel for social research that is less prone to bias than approaches involving direct contact between researchers and participants (McKee, 2013), involving citizens in the research process (INVOLVE, 2014), being able to curate and enrich biomedical knowledge (Good et al., 2012) and generating new channels for research dissemination (Balm, 2014).
Methodological and ethical challenges
Despite these advantages, the complexity of interactions between individuals, groups and technical systems in these online spaces presents a number of challenges for academics wishing to use social media data in research (Munson et al., 2013). These include the self-selecting nature of social media users, inequalities in access to social media platforms and data, the difficulty of obtaining meaning from heterogeneous data of variable quality and provenance, and a dependence on observing and interpreting what is ‘out there’ in a way that differs from traditional sampling approaches. Arguably, however, the greatest challenges for researchers in this area are ethical ones (David, 2004; Eysenbach and Till, 2001), such as variable perceptions of and unclear boundaries between ‘public’ and ‘private’ spaces, as well as the difficulty of ensuring anonymity and preserving the privacy of data subjects, whose identities may not be disguised or may be easily deduced from their postings and affiliations. Related issues of ownership and intellectual property are also poorly defined and consent to the use of social media data in research is rarely obtained through informed choice, but rather assumed on the basis that users have chosen to place it in the public domain (Koene and Adolphs, 2015; McKee, 2013; Munson et al., 2013; Nunan and Yenicioglu, 2013; Orton-Johnson, 2010; Vayena et al., 2012). Awareness of the potential privacy implications of sharing personal information on social media is growing, driven by newsworthy cases such as Facebook’s experiments in emotion manipulation (Jouhki et al., 2016) and its identification of ‘vulnerable’ teenagers for advertisers (Pells, 2017), or the use of social media by data analytics companies seeking insights into citizens’ political attitudes and networks, to influence voter behaviour (Fromm, 2016; Arthur, 2010). 
In this environment, pinning down the ethical guidance for researchers is now more critical than ever, with a requirement for any guidance to be responsive and adaptable to the changes invoked by the rapid evolution of social media platforms and data science.
Most research institutions, irrespective of academic discipline, publish or adhere to some form of research ethics guidelines or standard operating procedures, as a means of ensuring the appropriate governance of studies undertaken by their staff and collaborators. While these vary in structure, content and application, they are all intended to ensure responsible and trustworthy research practice and ‘to protect all groups involved in research: participants, institutions, funders and researchers throughout the lifetime of the research and into the dissemination process’ (ESRC, 2010: 2). Social media research is still a relatively new and changing field and commentators have pointed to the destabilisation of traditional ethics and an unsettling of ethical expectations and assumptions for both researchers and Internet users (Whiteman, 2012). This has been compounded by a lack of relevant ethics guidance and poses particular challenges for research involving ‘sensitive’ data, such as information about people’s health conditions, political affiliations or religious beliefs (see https://www.gov.uk/data-protection/the-data-protection-act).
Scope of ethics guidelines considered in this study
Given the growth of research using social media platforms, and its potential implications for information privacy, confidentiality and ownership, it is timely to examine the extent to which existing research ethics guidelines take such uses into account and what additions may be warranted. Social media research is taking place across multiple academic disciplines and applications for research ethics approval may thus defer to a range of different bodies. This presents challenges for the effective oversight of such research where, it has been claimed, ‘no official guidance or answers regarding internet research ethics have been adopted at any national or international level’ (AoIR, 2012). Mindful of the need for a cross-disciplinary perspective, we chose to study one identifiable national corpus of multidisciplinary research ethics guidelines, represented by Research Councils United Kingdom (RCUK).
RCUK is a strategic partnership between the UK’s seven research councils, which according to its homepage, ‘has invested around £3 billion in research covering the full spectrum of academic disciplines from the medical and biological sciences to astronomy, physics, chemistry and engineering, social sciences, economics, environmental sciences and the arts and humanities’ (see http://www.rcuk.ac.uk). They share an aim to ‘advance knowledge and generate new ideas which lead to a productive economy, healthy society and contribute to a sustainable world’. While RCUK itself has published a set of general research ethics guidelines, each of the seven disciplinary bodies in the RCUK family of research councils (see Table 1) provides its own form of ethical advice, either through developing bespoke guidelines or deferring to other relevant guidelines in the literature. For the purposes of our study, the corpus of RCUK ethics guidelines and external guidelines recommended within these was felt to be an appropriate sample to enable a meaningful analysis of the guidance available for academic researchers in the UK.
Table 1. RCUK umbrella organisation and the seven UK Research Councils.
Aims
We set out to examine how RCUK and affiliated research ethics guidelines acknowledge and deal with research involving social media overall, and specifically research involving data extracted from social media platforms (which we refer to using the generic term ‘mining’). We also wanted to understand how researchers using these new forms of data in their studies are responding to the ethical challenges this presents, by examining how ethical concepts or guidelines are referred to in published research articles. We chose health research as an exemplar area, since it is highly interdisciplinary (transecting the social, medical and computational sciences, amongst others) and is one in which study results based on social media are being used to inform scientific knowledge and theory, public services and policies, business practices and methodological innovations (e.g. Pagliari and Vijaykumar, 2016; Tursunbayeva et al., 2017).
We are not aware of any previously published studies to have analysed the extent to which the RCUK guidelines address the use of social media data for research purposes, or how ethical concepts and guidelines are being referred to by researchers undertaking relevant projects. Our research therefore sought to answer the following two broad questions.
Methods
Theoretical frameworks
To aid our analysis we drew on two ethical frameworks which, although developed in the context of social media research for health, are sufficiently generic to be applied to any field of research involving the use of social media.
The first is Bjerglund-Andersen and Söderqvist’s (2012) typology of social media uses in research, which delineates five broad categories:
research dissemination;
scientific discussion and networking;
engaging the public;
academic teaching;
research and data collection.
For the reasons already described, we divided the last of these into two qualitatively different categories: first, using social media platforms to
Conway (2014) has gone further by suggesting a taxonomy of ethical considerations specifically relevant to the secondary use of social media data. Although this was developed in the context of Twitter mining for public health surveillance and research, it is applicable to many types of research involving data harvested from social media. This includes 10 specific considerations:
1. privacy;
2. informed consent;
3. ethical theory;
4. institutional review board (IRB)/regulation;
5. traditional research versus social media (e.g. Twitter) research;
6. geographical information;
7. researcher lurking;
8. economic value of personal information;
9. medical exceptionalism;
10. benefit of identifying socially harmful medical conditions.
While considerations 9 and 10 refer to medical issues, they can also be applied to other topics which are also uniquely sensitive (e.g. research on political attitudes) or are aimed at preventing harm (e.g. analysing extremist discourse), respectively. For the broader purposes of our study we therefore re-labelled them as ‘exceptionalism’ and ‘benefit of identifying potential harms’.
To identify the corpus of ethics guidelines represented by RCUK, the websites of RCUK itself and the seven UK Research Councils were first identified via Google. The websites were then searched by entering the key words ‘ethics’, ‘guidelines’, ‘funding applications’ and variants of these, into their respective search boxes, and the outputs sifted manually. Searches were undertaken by the first author in February 2017.
Where a research agency was found to have more than one current ethics guideline, each of these was included, and in cases where the RCUK guidelines explicitly referred to external guidelines, the relevant source documents were also obtained for further analysis. Individual research councils were also contacted via email, asking them to state whether their organisation had developed or specifically recommended any ethics guidelines concerning the use of social media in research. Responses were received from six out of eight agencies, the non-respondents being the BBSRC and the ESRC.
The following information was extracted from each identified guideline: the name of the originating organisation, the title of the guideline, the date of the most recent version and whether the guideline explicitly referred to the use of social media or related concepts such as online or internet research.
The four guidelines referring to social media were scrutinised, to determine how they corresponded with the (adapted) typology of social media uses in research outlined by Bjerglund-Andersen and Söderqvist (2012). They were further appraised in terms of their reference to Conway’s (2014) list of 10 ethical considerations for research involving social media data. The guideline search and appraisal process is summarised in Figure 1.

Figure 1. Summary of the guideline search and appraisal process.
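The appraisal step described above amounts to building a guideline-by-consideration coverage matrix and tallying it. The following is a minimal Python sketch of that bookkeeping, offered as an illustration rather than as the procedure actually used in the study. Only the 10 consideration labels (with the two relabelled categories) come from the text; the guideline names (`Guideline X`, `Guideline Y`, `Guideline Z`) and their coverage sets are placeholder values invented for the example.

```python
# Conway's (2014) considerations, using the relabelled names for items 9 and 10.
CONSIDERATIONS = [
    "privacy",
    "informed consent",
    "ethical theory",
    "IRB/regulation",
    "traditional vs social media research",
    "geographical information",
    "researcher lurking",
    "economic value of personal information",
    "exceptionalism",
    "benefit of identifying potential harms",
]

# PLACEHOLDER data: hypothetical guidelines and the considerations each mentions.
coverage = {
    "Guideline X": {"privacy", "informed consent", "IRB/regulation"},
    "Guideline Y": {"privacy", "researcher lurking"},
    "Guideline Z": {"privacy", "informed consent"},
}


def tally(coverage, considerations):
    """Count how many guidelines mention each consideration."""
    # Guard against labels that are not part of the taxonomy (e.g. typos).
    unknown = set().union(*coverage.values()) - set(considerations)
    if unknown:
        raise ValueError(f"labels outside the taxonomy: {sorted(unknown)}")
    return {
        c: sum(c in mentioned for mentioned in coverage.values())
        for c in considerations
    }


counts = tally(coverage, CONSIDERATIONS)
uncovered = [c for c, n in counts.items() if n == 0]

for c in CONSIDERATIONS:
    print(f"{counts[c]}/{len(coverage)}  {c}")
```

Keeping the taxonomy as a fixed list and validating every label against it makes it harder for a misspelt category to inflate or hide a count when several coders contribute.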
For the reasons already described, we chose the example of health research using social media data, to explore how relevant ethical considerations and recommendations are being addressed in practice. The online database PubMed was searched up to 28 February 2017, using the structured query shown in Box 1.
Box 1. The search query applied to PubMed.
The inclusion criteria encompassed peer-reviewed journal articles and conference papers describing empirical research using data from social media platforms such as Twitter or Facebook, whether extracted or studied in situ, using either manual or automated methods. Studies not in English, dissertations/theses, reports or abstracts, letters to the editor, feature articles and articles intended as marketing or advertising material were excluded. No publication timeframe was applied. See Box 2 for the inclusion and exclusion criteria.
Box 2. Article inclusion and exclusion criteria.
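As an illustration of how the stated criteria could be applied consistently during screening, the sketch below encodes them as a simple eligibility check in Python. It is a hypothetical aid, not a description of the screening actually performed, which was done manually. The `Record` fields and the example records are assumptions made for the example; the criteria themselves follow the text above.

```python
from dataclasses import dataclass

# Publication types admitted by the inclusion criteria; every other type
# (theses, reports, abstracts, letters, feature articles, marketing material)
# is excluded simply by not appearing here.
INCLUDED_TYPES = {"journal article", "conference paper"}


@dataclass
class Record:
    """Hypothetical screening record; field names are illustrative only."""
    title: str
    pub_type: str
    language: str
    empirical: bool
    uses_social_media_data: bool


def is_eligible(record: Record) -> bool:
    """Apply the inclusion and exclusion criteria; no date limit is applied."""
    return (
        record.language == "English"
        and record.pub_type in INCLUDED_TYPES
        and record.empirical
        and record.uses_social_media_data
    )


# Invented example records, for demonstration only.
records = [
    Record("Tweets about vaccination", "journal article", "English", True, True),
    Record("Forum use among carers", "thesis", "English", True, True),
    Record("Opinion: social media in medicine", "letter", "English", False, False),
]
eligible = [r.title for r in records if is_eligible(r)]
print(eligible)
```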
Search results were imported into the reference management software, EndNote. Abstracts and titles were initially screened for eligibility and full-text articles were obtained for those considered potentially relevant. Articles found to meet the inclusion criteria were summarised according to author name, author affiliation, publication title, publication year and abstract. Each article was also hand searched, to determine whether the authors referred to ethical considerations or guidelines when describing their study design or analysis. Where this was the case, the relevant text was extracted, tabulated and classified using Conway’s taxonomy.
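The classification of extracted text was carried out by hand. For readers who wish to replicate it at larger scale, a keyword pre-screen can suggest candidate categories for a human coder to confirm. The Python sketch below shows one way this might look. It is not the method used in this study, and the keyword patterns are assumptions chosen for illustration, covering only four of Conway's categories.

```python
import re

# ASSUMED keyword patterns per category; a real codebook would need piloting.
KEYWORDS = {
    "privacy": [r"\bprivacy\b", r"\banonymi[sz]"],
    "informed consent": [r"\bconsent\b"],
    "IRB/regulation": [
        r"\bethic(?:al|s) approval\b",
        r"\bethics committee\b",
        r"\binstitutional review board\b",
        r"\bIRB\b",
    ],
    "researcher lurking": [r"\blurk"],
}


def suggest_categories(text):
    """Return the categories whose patterns occur in the text, for manual review."""
    return [
        category
        for category, patterns in KEYWORDS.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    ]


sample = (
    "Ethical approval was granted by the institutional review board; "
    "informed consent was not required as posts were public."
)
print(suggest_categories(sample))
```

A pre-screen of this kind can only flag passages for attention. Deciding whether a passage genuinely engages with an ethical consideration, rather than merely naming it, still requires a human reader.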
The components of the study, at each stage, are briefly summarised in Figure 2.

Figure 2. Focus, objectives and methods at each stage of the study.
Results
A total of 13 separate ethics guidelines were identified, including 10 produced by RCUK itself or the individual UK research councils, two external guidelines recommended within these (BPS, 2012; AoIR, 2012) and one recommended by MRC during the email verification phase (INVOLVE, 2014). Of these, only four guidelines (ESRC, BPS, AoIR, NIHR) mentioned the use of social media in research.
The 13 guidelines are listed in Table 2, which also illustrates the co-referencing of guidelines within the RCUK family; for example, AHRC’s guideline defers to the ESRC’s guideline which, in turn, cites guidelines from the BPS and AoIR. Highlighted in bold are the four guidelines found to include guidance and recommendations specifically relating to the use of social media in research: ESRC, BPS, AoIR and NIHR.
Table 2. Ethics guidelines screened for references to social media uses in research.
Table 3 illustrates a further level of analysis, focused on the four guidelines that encompassed social media. Based on the adapted version of Bjerglund-Andersen and Söderqvist’s (2012) taxonomy, all four referred to social media as a research tool, three as a source of research data, two each as a medium for scientific discussion and networking and for public engagement, and none for research dissemination or academic teaching. According to Conway’s (2014) list of ethical considerations in social media research, all four of these guidelines referred to privacy and the difference between traditional and social media research, three referred to informed consent and the use of IRBs, two referred to researcher lurking and one to ethical theory. None considered geographical information, the economic value of personal information, exceptionalism or the benefit of identifying sources of potential harm.
Table 3. Types of research use and ethical considerations for data re-use.
Bjerglund-Andersen and Söderqvist’s classes of social media use (adapted): 1, Research Dissemination; 2, Scientific discussion/networking; 3, Engaging the public; 4, Academic teaching; 5, Social media as a research tool; 6, Social media as a source of research data.
Conway’s ethical considerations for social media data use: A, Privacy; B, Informed consent; C, Ethical theory; D, IRB approval/regulations; E, Traditional vs social media research; F, Geographical information; G, Researcher lurking; H, Economic value of personal information; I, Exceptionalism; J, Benefit of identifying sources of potential harm.
The structured search of PubMed yielded 469 potentially relevant studies, of which 156 remained after screening against the inclusion and exclusion criteria. These studies had a variety of aims, including assessing public reactions to health reforms, identifying health behaviours such as medication compliance, understanding health attitudes and sentiments, undertaking post-market surveillance, exploring social networks relevant to health, searching for indicators of infectious and non-communicable disease trends and comparing the value of different social media platforms or tools for analysing health-related events or patterns. Only 50 articles referred to one or more of the ethical concepts, procedures or approval processes specified in Conway’s taxonomy (Figure 3). However, while most of these mentioned IRB approval, only 13 referred to other relevant ethical considerations and five of the ethical considerations in Conway’s taxonomy were not mentioned at all. In order of frequency, the breakdown of ethical considerations was as follows: Research Ethics IRB Approval/Regulation (43), Privacy (26), Informed Consent (16), Ethical Theory (7), Traditional Research vs Social Media Research (3), Researcher Lurking (3), Identifying Potential Harms (2), Geographical Information (0), Economic Value of Personal Information (0) and Medical Exceptionalism (0).

Figure 3. Number of studies included at each stage of the screening process.
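To put the reported counts in proportion, the short Python sketch below converts them into percentages. All figures are taken directly from the text above; the variable names and the choice of rounding to one decimal place are ours. Because a single article can mention several considerations, the per-category counts are not expected to sum to the 50 articles that mentioned any ethical concept.

```python
# Figures as reported in the text.
retrieved = 469          # records returned by the PubMed search
included = 156           # studies remaining after screening
mentioning_ethics = 50   # included studies referring to any ethical concept

mentions = {
    "IRB approval/regulation": 43,
    "Privacy": 26,
    "Informed consent": 16,
    "Ethical theory": 7,
    "Traditional vs social media research": 3,
    "Researcher lurking": 3,
    "Identifying potential harms": 2,
    "Geographical information": 0,
    "Economic value of personal information": 0,
    "Medical exceptionalism": 0,
}


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"Included after screening: {pct(included, retrieved)}% of retrieved")
print(f"Mentioning ethics: {pct(mentioning_ethics, included)}% of included")

# Share of the ethics-mentioning articles that raised each consideration.
shares = {name: pct(n, mentioning_ethics) for name, n in mentions.items()}
for name, share in shares.items():
    print(f"{share:5.1f}%  {name}")
```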
Nine of the studies we identified using PubMed were affiliated with UK-based organisations and their consideration of ethical concepts is further described in Table 4. In short, they described:
a study using data from Facebook and Twitter to examine the usefulness of social media for post-market drug safety surveillance (Powell et al., 2016),
a content analysis of social media data posted on two web forums to monitor the misuse and non-medical use of the antidepressant and smoking cessation drug bupropion (Anderson et al., 2017),
an analysis of the online response to a case of a breastfeeding mother being ejected from a UK retail premises (Grant et al., 2016),
a case study into the difficulties, challenges and rewards of using social media by student nurses through analysing data from a Twitter chat (Sinclair et al., 2015),
a netnographic study of user decision-making, home preparation and consumptive patterns of laudanum (Van Hout and Hearne, 2015),
a study investigating the feasibility of developing predictive models that identify potential superusers of online healthcare support groups (van Mierlo et al., 2017),
a qualitative study into how young people used a youth-orientated, moderated, online, eating disorders discussion forum, run by an eating disorders charity (Kendal et al., 2017),
a thematic analysis of readers’ comments to UK online news reports on the acceptability of financial incentives for breastfeeding (Giles et al., 2015),
a qualitative and quantitative summary of online reaction to media reports to the UK government strategy on childhood obesity in England (Gregg et al., 2017).
Table 4. Ethical guidelines and concepts referred to in studies using social media data for health-related research found in PubMed.
Key to Conway’s categories of ethical consideration: A, Privacy; B, Informed consent; C, Ethical theory; D, IRB/regulation; E, Traditional research vs social media research; F, Geographical information; G, Researcher lurking; H, Economic value of personal information; I, Medical exceptionalism; J, Benefit of identifying socially harmful medical conditions.
The first two of these were authored by researchers from the pharmaceutical sector while the remaining seven were from UK universities. Of these nine UK studies, two (Giles et al., 2015; Gregg et al., 2017) referenced the research ethics guidelines produced by the BPS, as identified in part 1 of our study, while none referred to the guidelines developed by RCUK or its member councils.
The number of papers identified at each stage of the search process is shown in Figure 3, while Table 4 provides a breakdown of the ethical considerations represented in each of the included articles, along with illustrative quotations.
Discussion
Our analysis indicates significant gaps in the ethical governance of research using data mined from social media, illustrated by the incompleteness and inconsistency of current guidelines and an absence of ethical discourse in published research articles.
Status of RCUK ethics guidelines on social media and social media data
Of the seven multi-disciplinary ethics guidelines published by RCUK, only one (ESRC) specifically considered the use of social media in research, despite such research now straddling the remits of many national funding agencies. Two research councils (ESRC, MRC) nevertheless recommended guidelines from other bodies (AoIR, BPS, NIHR/INVOLVE), generating a corpus of four social-media relevant guidelines for UK researchers. These referred to social media as a set of research tools (4/4), as a source of data (3/4), as a means of public engagement (2/4) and as a channel for scientific discussion and networking (2/4), but did not mention their use for research dissemination or teaching, which also appear in our adapted version of Bjerglund-Andersen and Söderqvist’s (2012) taxonomy. With specific reference to the mining and re-use of social media data, these guidelines prioritised privacy (4/4), differences between digital and conventional research (4/4), informed consent (3/4), IRB approval/regulation (3/4) and researcher lurking (2/4), although none of the other four ethical considerations in Conway’s (2014) framework were covered. Although MRC was the source of three research ethics guidelines, none referred to the use of social media, in contrast to their detailed consideration of ethical issues surrounding the re-use of institutional and scientific datasets, where most of the UK’s ‘big data’ investments are taking place. ESRC provided the most comprehensive overview of social media ethics, also deferring to the external AoIR and BPS guidelines, likely reflecting the importance of digital social research within ESRC’s portfolio. While these differences between research councils are to some extent understandable, they indicate a segmentation of data ethics along disciplinary lines, which is unhelpful in an environment where interdisciplinary projects are the norm, rather than the exception, underscoring the need for collaboration and agreement on universal principles.
Our focused analysis of articles indexed in PubMed also indicates a widespread neglect of ethical issues amongst research practitioners using social media data in health-related studies. Where ethical issues were discussed, this tended to centre on the
As already noted, ESRC was the only UK research council whose own ethics guidelines explicitly considered the use of social media in research. Their
While the BPS guidelines do not explicitly refer to social media, this is implied in the term ‘internet-mediated research’, which encompasses the use of online platforms as a means of engaging the public, as a set of research tools and as a source of data for secondary uses, consistent with our taxonomy. A total of 10 ethical considerations are highlighted, which overlap with but are somewhat different from those provided by Conway. These include verifying identity, private versus public space, informed consent, levels of control, withdrawal, debriefing, deception, monitoring, protection of participants and researchers, and data protection. These are grouped into four sectors of a grid, relating to whether participants are actively recruited or are unaware of their involvement in the study, as well as whether they are identified or anonymous. Although the BPS guidelines go some way towards providing actionable recommendations for researchers, they should not be considered exhaustive, given that only four of the 10 ethical concepts identified by Conway (privacy, informed consent, IRBs and researcher lurking) are addressed. A newer BPS guideline, currently under beta-testing, has extended the 2012 framework but, as yet, does not refer to social media specifically (BPS, 2017). Based on our study, we recommend including this.
The AoIR is a widely recognised international academic association dedicated to the advancement of the cross-disciplinary field of Internet studies. The AoIR ethics guideline referred to by the ESRC (AoIR, 2012), outlines several high-level themes, including the difficulty of understanding whether such research involves ‘human subjects’ for the purposes of ethics approval, differentiating ‘public from private’, conceptualising data or text as an extension of ‘persons’, and reconciling ‘top down versus bottom-up approaches’ for managing potential harms and benefits of research. The document includes an extensive list of considerations, such as understanding the context of the research, the primary objective of the research, how the data will be accessed, stored and disseminated, and the rights of participants, who may be unaware that their data are being used. Unlike the BPS guideline, the AoIR guideline explicitly mentions social media, and gives examples of social media data uses that present ethical challenges.
Given the potential sensitivity of medical information available online, it is somewhat surprising that the MRC does not provide specific guidance for researchers conducting studies using social media data. Nevertheless, in their email verifying this, the MRC recommended that we review the guidance provided by the NIHR as part of the INVOLVE advisory group. INVOLVE was established by NIHR in 1996 to support active public involvement in NHS, public health and social care research. In 2014, they published ethics guidelines on using social media to engage citizens in public debate and research, as a forum for scientific discussion and networking, and as a tool for undertaking research and consultation. They list the types of social media platforms available, provide case studies of their use, outline the benefits and challenges, consider how to manage risk, and offer tips based on researcher experience. Applying Conway’s taxonomy to the NIHR guidance, however, indicates that only three of the 10 ethical concepts are addressed, namely privacy, the use of IRBs and the difference between traditional and social media-based research. These reflect the public-engagement remit of INVOLVE, which may explain why the secondary use of social media data for research is not discussed explicitly.
The absence of any reference to research using social media in the remaining RCUK guidelines is noteworthy. Whilst in some cases this is entirely understandable, for example the STFC focuses primarily on particle and nuclear physics and science infrastructure, in others it would seem appropriate to include these new forms of data. For example, one EPSRC project in which the second author is involved specifically focuses on the use of social media, crowdsourcing and citizen science, albeit driven by computer scientists (SOCIAM; see http://sociam.org). This project includes themes in health and social science, illustrating how social media research transects disciplinary boundaries and may potentially fall within the scope of several ethics bodies.
The following quotation from the AoIR (2012) guideline neatly illustrates the need for this trans-disciplinary thinking.
Ethical maturity of health research using social media data
The paucity of ethical considerations in the health-related research identified via PubMed is noteworthy; indeed, very few relevant studies went further than acknowledging consultation with their IRB, which is primarily undertaken for instrumental reasons. Those that did originated predominantly from the sub-field of primary care research or from researchers based in pharmaceutical companies routinely subjected to ethical oversight. Although very few studies were affiliated with UK research organisations, it is troubling to see that only two of the nine we identified referred to the RCUK or associated ethics guidelines.
The dominance of instrumental over moral considerations seen in the scientific papers we reviewed suggests that researchers using these methods are heavily dependent on IRBs and journal editors to play the role of their ethical conscience. It is therefore essential that ethics committees and editors evaluating research using social media data are aware of the range of platforms available and how they work, and can draw on the latest interdisciplinary guidelines to inform their decision-making. We recommend that editors and peer-reviewers seek authors’ explanations of the ethical challenges they faced and how these were managed during the conduct of their studies, thereby enabling greater transparency and encouraging knowledge sharing within the research community.
Policy implications
Despite their use now being common, the emergence of social media and other online platforms has taken traditionally slow-moving governments and academic institutions somewhat off-guard. Uncertainties about what is appropriate, acceptable, legal and responsible in these new virtual spaces, and for different forms of digital personal information, have also fuelled broader debates. These include debates around the need for ‘net neutrality’ or equal access to online content and services amongst all users (McKee, 2011), how to maintain control of key Internet domain names in the global public interest (Mackey et al., 2014) and calls for a ‘Magna Carta for Data’ (Kiss, 2014; O’Sullivan, 2017). Moreover, it is contributing to the dilemma of governments seeking to generate economic, scientific and societal value from existing data assets whilst also protecting citizens from unwanted surveillance and intrusion. Health research is one area in which this discussion has been particularly acute, due to the traditionally stringent ethical demands placed on the protection of confidentiality. In the UK, the growing use of health records for research (Knapton, 2014), coupled with public disquiet over controversial programmes such as Care.Data (Boseley, 2016) and Google DeepMind’s Streams project (Wakefield, 2017), has focused considerable policy attention on the need for ethical and robust governance when it comes to the use of patient information (e.g. Richards et al., 2015; National Data Guardian, 2017). In this context, it is noteworthy that, by comparison, the ethics of using social media data in health research has been somewhat neglected, although such data are seldom managed by the state or by healthcare institutions with a duty to protect them. It is nevertheless arguable that the same principles of respect, confidentiality and protection from harm or embarrassment should be followed as would be expected in any other form of bona fide research.
Caveats and opportunities for further research and development
Our review of ethics guidelines was limited to those provided or recommended by RCUK and its seven UK Research Councils, and we are aware of other relevant guidelines developed by UK-based researchers (Convery and Cox, 2012) and organisations beyond the scope of this study (e.g. NCCPE; see http://www.publicengagement.ac.uk/work-with-us/completed-projects/ethics-cbpr/resources/ethical-guidelines-web-resources). We recommend further research involving a wider corpus of research ethics guidelines, to test the generalisability of our results in the UK, and as a means of catalysing the development of internationally applicable ethics guidelines for research involving social media platforms and data.
The variable coherence, consistency and navigability of the RCUK websites presented a challenge for identifying relevant ethics guidelines, particularly in the case of MRC and EPSRC. For MRC, this was mainly due to its diverse portfolio of specialised guidelines, covering topics from clinical trial management through to the use of human tissue samples. For EPSRC, the distribution of ethical information presented greater difficulties, with a list of high-level ethical considerations accompanied by hyperlinks to the RCUK framework and a variety of external sources, many with little or no annotation. One exception is the ‘
The multiplicity of departmental and institutional ethics committees operating within UK universities and research organisations adds further complexity to this landscape. New empirical studies are needed, to shed light on the ways in which such committees are addressing approval requests for studies involving the reuse of data from social media, including which published guidelines they refer to, whether they have their own written policies for this type of research, and whether disciplinary affiliation affects decision making.
Our review of relevant health-related research indexed in one database was intended as an exploratory scoping exercise and should be regarded as indicative rather than exhaustive. We are currently undertaking a comprehensive, rigorous, multi-database, systematic review of data mining research in health, which will inevitably yield further studies. Nonetheless, our current results provide valuable insights into the ethical maturity of research involving social media mining and echo the gaps seen in the guidelines we reviewed. We recommend similar analyses of ethical considerations in published articles from other disciplines where social media data are being mined for research, including computer science, the social sciences, economics, business studies, political science and criminology, to name but a few. Given the growing research activities of major social media providers and businesses, research indexed in the scientific literature may represent only the tip of the iceberg. Finding new ways of obtaining access to commercial research would also be worthwhile, although the monetisation of data insights and intellectual property restrictions will inevitably present barriers.
The scope of our analysis did not extend to legal or regulatory aspects of information governance in the context of social media data, which are designed to control or limit certain forms of research. In contrast, ethical guidelines aim to ensure research integrity, discourage irresponsible or socially unacceptable research conduct and support the prioritisation of studies likely to benefit rather than harm society. Likewise, we did not seek to compare methodological innovations such as automated data mining, social network analysis, machine learning or ‘black box’ algorithms, which also present challenges around consumer choice, control and privacy (Pasquale, 2015). Comparable analyses conducted from each of these perspectives are warranted.
Conclusions and recommendations
Beyond statements about IRB approval, the generally poor integration of ethical concepts and guidelines within the corpus of published articles we have reviewed suggests low levels of awareness amongst researchers using social media mining in their studies, echoing observations from other areas of ‘big data’ research (e.g. Metcalf et al., 2017). This is consistent with the wide variability we have observed in the research ethics guidance offered by RCUK members in relation to uses of social media platforms and the data derived from them. Our finding that only one RCUK council (ESRC) directly refers to social media research in its ethical guidance is a cause for concern, given the highly interdisciplinary nature of studies in this area, as illustrated by our analysis of relevant health-related publications.
We recommend further cross-council collaboration to develop shared, interdisciplinary guidelines for the ethical use of social media in research, and specifically research involving the harvesting and reuse of social media data.
In the shorter term, effort should be invested to improve consistency in the presentation, accessibility and comprehensiveness of existing ethical guidance available on the various RCUK websites. For example, we observed that some websites are difficult to navigate and contain highly distributed and poorly connected information on ethics, approval processes and regulation. Adequate literature review to ensure the timely inclusion of relevant guidance from other sources is also required; for example, we came across a guide to ethics in social media research which had emerged from a project part-funded by ESRC and EPSRC but was not mentioned on either of their websites (Evans et al., 2015).
Future RCUK ethics guidelines would also benefit from including a broader range of social media uses, clear criteria for judging projects against a variety of ethical considerations, and pragmatic recommendations for researchers planning to undertake studies involving social media.
Until such meta-guidelines are available, we recommend that UK researchers prioritise the existing guidelines produced by the ESRC, BPS, AoIR and NIHR, alongside the ethical taxonomies we have adapted for this study. We also encourage researchers to explore the wider universe of ethical frameworks emerging nationally and internationally in relation to new forms of data, including those from the OECD (2016), the US Council for Big Data Ethics and Society (Metcalf et al., 2017) and the UK Data Service (Bishop, 2017) as well as emerging initiatives such as the UK Society for Data Miners’ plans to develop ethical principles (SocDM, 2017) and primary research exploring the boundaries of public acceptability in the reuse of digital personal data (e.g. Aitken et al., 2016; Williams et al., 2017).
We recommend that UK researchers applying for project funding or permission to undertake studies using social media data should explicitly state which ethics guidelines they have consulted, and we call upon IRBs to integrate this requirement into their approvals documentation. We also call upon authors and editors to ensure that publications describing studies involving social media data clearly state the ethical issues that have been considered during the research and specify the guidelines consulted.
Given the substantial investments made in digital research and data science by the UK government and research councils over the last 5 years, coupled with increased policy attention on responsible research and innovation (European Commission, 2013) and the protection of personal data (European Parliament, 2016), ensuring the robust design and implementation of ethical guidelines for social media research is essential.
We hope that the results of this scoping study will inform the future development of such guidelines in the UK and elsewhere, and catalyse a broader interdisciplinary discussion amongst research councils, institutional ethics boards and researchers themselves.
Footnotes
Acknowledgements
None.
Abbreviations
AoIR: Association of Internet Researchers
AHRC: Arts and Humanities Research Council
BBSRC: Biotechnology and Biological Sciences Research Council
BPS: The British Psychological Society
EPSRC: Engineering and Physical Sciences Research Council
ESRC: Economic and Social Research Council
IRB: Institutional Review Board
MRC: Medical Research Council
NERC: Natural Environment Research Council
NHS: National Health Service (UK)
NIHR: National Institute for Health Research
RCUK: Research Councils United Kingdom
STFC: Science and Technology Facilities Council
Competing Interests
This study was conducted as part of JT’s self-funded PhD research project, supervised by CP. JT is also an employee of Ernst and Young Ltd and CP is an RCUK grant holder. Neither organisation was involved in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
Contributorship
Both authors contributed to study conception, planning, analysis and manuscript writing. JT designed and undertook the searches and the email verification exercise, screened the outputs, extracted the data and classified these according to the specified taxonomies, with input from and cross-checking by CP.
Ethical approval
This study adheres to the Research Ethics Policy of the University of Edinburgh Medical School and was approved by its Internal Review Board.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: CP is an RCUK grant holder and collaborator on the ESRC Administrative Data Research Centre for Scotland (grant number ES/L007487/1), the MRC Farr Institute for Health Informatics Research UK (grant number MR/K007017/1) and the EPSRC Science and Practice of Social Machines (grant number EP/J017728/1).
Guarantor
Not Applicable.
