Abstract
Internet users’ comments in online spaces have attracted researchers’ attention in recent years. Although this data is typically publicly available, its use requires careful consideration so as to not cause harm to the users, while complying with the terms and conditions (Ts & Cs) of the online spaces. However, the Ts & Cs and researchers’ ethical considerations may sometimes be in conflict. I faced such a conflict when I conducted discourse analysis of online discussions that were sourced from a public online learning platform owned by a private company. In this article, I reflect on how I navigated the Ts & Cs and copyright law, taking users’ likely expectations into consideration when deciding whether to seek informed consent and anonymize content. I employed an ‘attribution with anonymization’ method to acknowledge users for their comments while safeguarding their confidentiality. Given the variety of online spaces and research methods, ethical decision-making must be a contextualized process that requires researchers to consider the nature of the online platform and the potential experience of the users, rather than simply following guidelines or Ts & Cs.
Introduction
Internet users’ comments in various digital spaces, such as blogs, Twitter, Facebook, forums, commenting sections of news websites and YouTube, have provided researchers with new sources of data with which to explore people’s opinions, discursive construction of social issues and language practices online. Although this data is seemingly easy to access, researchers face new challenges in ensuring ethical usage of such data, especially when the analysis involves reading users’ comments in-depth and reproducing them in research publications to illustrate findings (Eysenbach and Till, 2001; Nissenbaum, 2004; Spilioti and Tagg, 2017; Sugiura et al., 2017).
One such method of analysis is discourse analysis, where researchers examine words or rhetorical strategies (e.g. derogatory terms and metaphors) employed by internet users in their comments or responses to each other. Discourse analysis has been used to show how a person’s or a community’s language use can reveal their attitudes. For example, tabloid newspapers have represented refugees negatively through use of metaphors such as flooding and pouring (Gabrielatos and Baker, 2008). Although discourse analysis does not normally involve evaluation of single specific persons who produce the language samples, an analysis of language use can, nevertheless, expose identity; when attitudes towards sensitive issues are revealed, this can infringe upon confidentiality. This exposure becomes more salient when one’s online comments are reproduced in publications to illustrate research findings. At the same time, it is important for researchers to present the exact words or language structure adopted by users to illustrate the implications of language use because rephrasing might not reveal nuances.
The need to analyse and reproduce internet users’ comments in publications, while protecting their well-being, can create ethical challenges for researchers. These challenges include: (1) whether and how to seek informed consent and anonymize data, (2) how to protect users’ confidentiality when the quotes can easily be traced to the original website where users can be identified (Markham, 2012) and (3) how to comply with the terms and conditions (Ts & Cs) of the online platforms while upholding accepted ethical principles (Vaccaro et al., 2015). These challenges arise from the fact that the data (users’ comments) is not originally intended for research purposes, unlike data gathered from interviews or surveys. Yet, this data is accessible by anyone, including researchers, through public digital spaces.
The nuances of challenges facing researchers vary because each research project and online space has its unique conditions. For example, MacKenzie (2017) was able to approach users on Mumsnet via private messaging to seek informed consent for quoting with anonymization, although not every user agreed. In contrast, Giaxoglou (2017) found that private messages to members of a public group on Facebook went into spam as she was not ‘friends’ with the users. Sugiura et al. (2017) could only seek consent by posting in the online forums as there was no private messaging function.
Besides these differences in technological affordances for approaching users, the nature of the online discussions must also be taken into consideration. In a study on ‘pick-up artists’ (PUAs), an online community that promotes manipulative strategies to sexually conquer women, Rüdiger and Dayter (2017) decided not to seek informed consent because users might be hostile to researchers. This is based on their observation of similar forums that contain comments regarding researchers ‘as prejudiced and [are] therefore absolutely incapable of seeing the positive sides [of the movement] and doing proper research’ (Rüdiger and Dayter, 2017: 260). This kind of derogatory comment towards research(ers) has also been found in other forums (Sugiura et al., 2017).
Users’ expectations of the public/private nature of online spaces also vary between individuals within the same online space. This may complicate researchers’ ethical considerations. For example, following the advice of their institutional ethics committee, Sugiura et al. (2017) approached all users to seek informed consent by regularly announcing via posts entitled ‘Researcher using and requesting information on this forum’ (p. 190), providing contact details for users to write to them if they would like to opt out. However, this post generated varied responses from users and administrators of the six forums they planned to examine. Some users suggested there was no need to ask for their consent as the space is publicly accessible, while some considered this post as spam and the researcher as intruding upon their discussions. Some administrators simply removed this post while others asked for monetary reimbursements. In two forums, neither users nor administrators responded to the researchers’ posts. In light of these responses, Sugiura et al. (2017) took a pragmatic approach and only used the data from the forums that permitted use of data without the need for informed consent. However, this choice might exclude important data and may not be available to all researchers (Giaxoglou, 2017). A systematic review by Golder et al. (2017) found that social media users usually agreed to use of aggregated and anonymized data, and appreciated efforts to seek informed consent. However, there were times when researchers were kicked out of the online spaces; some users believed their privacy was more important than research, and researchers’ postings were viewed as spamming.
Another complication arises from the terms and conditions (Ts & Cs) established by the companies who own the online spaces. As pointed out by Vaccaro et al. (2015), the Ts & Cs are generally written to protect the companies from liabilities and can conflict with researchers’ work. They presented examples in which Ts & Cs do not allow procedures necessary for an investigation of the impact of algorithms on racial discrimination. Although these examples are not in the field of discourse analysis, they show that Ts & Cs may hinder research that is potentially beneficial to society and put researchers in strained relationships with the companies. In such cases, many researchers choose to ignore the Ts & Cs but risk facing legal action from the companies (Vaccaro et al., 2015).
These examples show that ethical considerations depend on the nature of the research and the online spaces. As such, many researchers and associations have iterated that ethical decision-making is a contextualized process rather than a one-size-fits-all solution (Franzke et al., 2020; Markham and Buchanan, 2015; Spilioti and Tagg, 2017; Sugiura et al., 2017). It has been suggested that ethical guidelines, as well as the Ts & Cs of the online platforms, should be regarded as a reference rather than a deterministic code (Ess and Hård Af Segerstad, 2019; Vaccaro et al., 2015).
Despite the complications associated with using online data, Stommel and de Rijk (2021) found that researchers seldom detail their ethical considerations in their research publications, perhaps due to issues such as the word count limitation in journal articles, and called for ‘more overt attention to ethical issues in discourse analytic publications’ (p. 18). Nonetheless, there have been efforts by some in the research community to discuss the associated ethical concerns in detail (e.g. Spilioti and Tagg, 2017; Sugiura et al., 2017), as well as a call by the ethics working committee of the Association of Internet Researchers for researchers to join the debate and deliberation (Franzke et al., 2020).
Following these initiatives, in this article I reflect upon my decision-making process for the ethical use of online users’ comments in my PhD research which involved discourse analysis. The online space I investigated is a public online educational platform offering massive open online courses (MOOCs). This is rather different from social media and other public forums that have been studied before. The discussion space on the platform is meant for learning purposes and monitored by facilitators assigned by the course designers. It is only active for a limited period of time when the course is running. The discussion is not in a centralized forum, but is distributed across different learning units similar to the commenting section underneath YouTube or an online news story. There are more than 50 learning units in each MOOC.
My study
During my study (Chua, 2021), I explored the language practices that users employ to engage with each other in online discussions, including how they write their comments to attract replies, how they respond to each other when disagreement arises, and how they respond to URLs posted by others. To achieve this research goal, I collected 221,823 comments contributed by 22,970 users who participated in 12 MOOCs. I employed corpus linguistic methods to conduct quantitative analysis of all users’ comments and concordance reading of a large number of comments. More relevant to the current discussion, I conducted discourse analysis that required me to read through hundreds of discussion threads, and identify who addresses whom in the discussion threads. In my thesis, 132 comments (new posts and replies) were reproduced separately from the threads they were from, and portions of 50 threads were presented in context to illustrate my findings.
The online discussions I examined are in the commenting sections of the learning units on a platform that hosts MOOCs. In order to comment in the online discussions, users must hold an account with the platform, and also register as a learner on a particular MOOC. Users are encouraged to use their real name on the platform. It is free of charge for users to access a MOOC and its online discussions during the course period, usually for 2–8 weeks. However, users will normally lose access once the MOOC finishes, unless they pay for a premium subscription. The premium subscription was not launched for the MOOCs I examined. Hence, the users who registered with these MOOCs have unlimited access to the online discussions, although they could no longer add comments in the online discussions after the MOOC ended and no new users could access them. In order to examine the online discussions, I registered myself as a learner for 12 MOOCs but did not comment in any of the discussions.
While a user’s comment will only be seen by other registered users of the same MOOC, comments might still be read by thousands of users. For example, more than 12,000 users signed up to one of the MOOCs in my study. Given the need to register for access, these online discussions are not completely public, although users may not know the other users in the same MOOC. The online discussions can be considered as semi-public (Markham and Buchanan, 2015), as the MOOCs are freely available to anyone who has signed up for the course, yet they are not immediately available to all internet users. Markham and Buchanan (2015) argue that the public/private nature of online spaces is largely about users’ perceived experience rather than any objective criteria and can only be gleaned through situated analysis. Therefore, users’ likely perception of the public/private nature of the online learning platform is taken into account in this analysis.
Challenges arising from the conflict between research ethics and Ts & Cs
My primary ethical concern was how to reproduce users’ comments in my thesis and future publications, such that their confidentiality is safeguarded and they are not put at unnecessary risk. This involved decision-making about whether to anonymize users’ comments and/or seek their consent for quotations and analysis, while complying with the Ts & Cs of the platform. According to the staged approach suggested by the AoIR, my ethical dilemma mainly concerned the dissemination stage of my project, although it was interwoven with the analysis stage. In this article, I focus mainly on reproduction of users’ comments in publications because this involves presenting users’ comments in a context other than the original one (publications vs the online space) and the presentation in publications remains permanent for others to see (Nissenbaum, 2004). Nonetheless, analysis of the comments also requires researchers to protect users’ well-being. Amongst other things, this includes protection of data with secured computing systems. Additionally, researchers might read the users’ comments in context in the online spaces such that users’ profiles are identifiable to the researchers. Sometimes researchers need to identify internet users for analysis of their collective contributions. In this scenario, researchers need to be careful not to use users’ profiles for other purposes, including discussing their analysis with colleagues.
Although the platform owners welcomed research, the Ts & Cs have at times complicated my ethical considerations as well as the conduct of my research. Here I give an overview of the relevant Ts & Cs. It should be noted that the Ts & Cs have been constantly updated, and therefore, I might not address all the changes in Ts & Cs over the years. The fact that Ts & Cs keep changing can make them seem arbitrary and less comprehensible to both researchers and users (Vaccaro et al., 2015).
One of the Ts & Cs regarding research is that users are informed that their activities are monitored for research purposes when they sign up for the MOOC, so an opt-in consent is not needed, although users can opt out by unregistering from the site. Opting in means users actively give consent for their participation in the research, while opting out means that users actively express that they do not want to participate in the research. Although users are informed of the possibility of research, they might want to be informed of the nature of a particular study and be given the chance to opt out. Also, these Ts & Cs seemingly contradict the requirement of the European General Data Protection Regulation (GDPR) that users must actively opt in, although the GDPR was introduced after my data collection ended. Nevertheless, this situation raises another challenge for researchers, who normally work on a set of data over a long period of time during which new regulations or Ts & Cs can be introduced.
At the same time, the Ts & Cs caution that users have rights to anonymity so researchers should work with anonymized data and not associate users’ identity (name and profile) with their comments and activities on the platform. This is applicable to analysis of aggregated data, but not achievable in discourse analysis which requires in-depth reading of threads where multiple users respond to each other. In a recent update of the Ts & Cs (after I completed my analysis and write-up), there seems to be an exception to anonymization; that is, researchers can now identify users for the purpose of obtaining permission to quote users’ comments. However, the Ts & Cs still state that association of datasets with the user account is not permitted. Therefore, it remains unclear how researchers should seek users’ consent, and when researchers could associate users’ comments with their identity.
Besides the Ts & Cs regarding research, the Ts & Cs also state that users’ comments are to be treated as intellectual property and are subject to a Creative Commons Licence (Attribution-NonCommercial-NoDerivs; BY-NC-ND). Users can report any copyright infringement. In this scenario, users should be attributed for their comments or their permission sought for quotations (Pihlaja, 2017). This copyright rule is not compatible with the anonymization rule in the Ts & Cs. These contradictory Ts & Cs complicate considerations about anonymization and informed consent. The inconsistency of Ts & Cs can arise because companies need to protect themselves and may not always take into account the potential benefits of research or the well-being of researchers and users (Vaccaro et al., 2015). In the following, I comment on the conflict between my ethical decisions and the Ts & Cs in terms of anonymization and informed consent, and describe my solution to the conflict.
Anonymization
I decided to anonymize users’ identity when I reproduce their comments in research publications. This is to protect their confidentiality and privacy, not least because some of the comments touch on sensitive issues (e.g. whether climate change is anthropogenic), political stances (e.g. governments should not pay out childcare benefits), personal experience (e.g. a homoeopathy practitioner, an employee who thinks their company’s vision is useless) or health issues (e.g. a stepmother seeking advice for her stepchildren’s diet). Revealing users’ identity in these instances may pose unforeseen threats to the users (Eysenbach and Till, 2001). Furthermore, according to MacKenzie (2017), users value anonymity highly and prefer not being identifiable in a forum that is publicly accessible.
Although the copyright regulation as specified by the Ts & Cs of the platform means that I should attribute quoted comments, it is worth mentioning that in research, especially in the field of discourse analysis, users’ comments are treated as data, rather than content or ideas that can be publicized, sold, copied or referenced. The object of inquiry is the language use in users’ comments, rather than their ideas or creative work. Admittedly, I stand to gain through use of the users’ data for publications from which I may benefit academically, but this is far from using their ideas for commercial gain. Additionally, for data gathered by way of other research methods such as interviews, participants’ responses are normally quoted and anonymized. Therefore, I prioritize protecting users’ confidentiality through anonymization over recognizing their copyright through attribution.
From the perspective of users’ expectations regarding the public/private nature of the online discussions, the use of quotations with users’ names in settings other than the original setting assumes that contributions to the online discussions are publicly available and that users are aware of potential implications. However, as discussed earlier, the comments are only visible to those registering with the MOOC. It is possible that the contributing users may only intend their comments to be read by fellow users in the course, rather than by researchers or readers of a research publication (MacKenzie, 2017; Nissenbaum, 2004). This also contrasts with other online users such as bloggers, YouTubers and activists who intend their content to be disseminated in public (Pihlaja, 2017). Therefore, attributing users’ comments in research publications may be contrary to users’ expectations that they are not in a public online space.
Informed consent
In most other forms of research, participants’ data is gathered after they give their informed consent for involvement in the research. However, for discourse analysis of online discussions, especially a retrospective one such as my research, the online discussions are no longer active, and the data exists before informed consent can be sought. As suggested by the recently updated Ts & Cs, researchers should seek users’ permission to reproduce their comments for publications. However, the means of contacting users are at the discretion of the researchers, who do not have any access to users’ contact details. When I first started my research, that is, before the updated Ts & Cs, I asked the platform about seeking users’ consent via email. The response was that they would not assist in contacting users, and they suggested I seek users’ permission by posting replies to individual users on the online discussions. This suggestion may not be feasible for four practical reasons.
First, by doing so, I would have to associate users’ comments with their identity, which is not allowed according to the Ts & Cs.
Second, users may not necessarily see my posts and I would not be able to add comments after the MOOCs end. For studies where the online discussions are still on-going and where there is the possibility of adding to a centralized discussion forum (MacKenzie, 2017; Sugiura et al., 2017) this might work. However, in my study, aside from being retrospective, there are more than 50 discussion spaces, one for each learning unit, and I would potentially need to post in every learning unit. The number of users’ postings in each learning unit is also overwhelmingly large (up to 5514 comments), such that my posting could be buried within them.
Third, for discussion threads that involve multiple users, some may not permit analysis and reproduction of comments while others do. This renders the discussion threads ineligible for analysis and presentation. It could be challenging to find suitable discussion threads that are illustrative for research findings and for which all the contributing users agree to be quoted. This can potentially result in a biased and limited sample, thus compromising the research findings.
Fourth, approaching users for consent to use direct quotes also raises the question of whether to inform them of the exact context in which their comment will be used in the research publications. Even if users do not require the exact context, they may want to know the precise aims and objectives for the research in which their quotes are used (Markham, 2012). This could be impractical for research like mine which is data-driven in nature. I only finalized two of the four main foci, that is how users disagree and how they use URLs to counter-argue, during my analysis and writing.
Admittedly, in all these scenarios, I might have prioritized research findings over users’ consent. However, I do not undermine their well-being because I protect their privacy through anonymization.
Posting a reply in a discussion thread to seek users’ permission also poses ethical challenges in terms of users’ expectations of the online spaces they inhabit (Nissenbaum, 2004). First, this posting is not relevant to users’ learning, which is their expectation when joining an online course. At worst, it can be considered as spamming and might compromise their interactions with others as well as their learning. Second, if only certain users are approached for consent in the online discussions, other unaddressed users might be left wondering why they have not been chosen, and vice-versa for those who are approached. This might affect their perceptions of their own and others’ comments, which is not expected in a learning environment. Furthermore, seeking users’ individual informed consent in the discussion spaces rather than in private does not respect privacy. Third, as argued by Sugiura et al. (2017), the explicit presence of researchers, or rather the posts made by them, in the online space may inadvertently turn the perceived private space into a public space. This change in perception regarding the public/private nature of discussion in the online learning platform may deter some users from engaging with the discussions, thus compromising their learning experience.
Solutions to the conflict
My considerations led to the decision to quote users’ comments but to anonymize them in my thesis and future publications, without actively seeking their consent. In order to navigate discrepancies between the Ts & Cs of the platform and my own ethical judgement, I applied ‘attribution with anonymization’ by providing the URL linked to the platform where the comment and the contributing user’s name can be found. This way, readers of my publications will not know who wrote the comments. While the origin of the comments is acknowledged, the URLs can only be accessed by people who registered for the same course. This also aligns with users’ likely expectation that only fellow users can read their comments and identify them (MacKenzie, 2017; Nissenbaum, 2004). Attribution with anonymization is possible in this case because the online discussions on the platform are not retrievable through a search engine, or accessible to the public, unlike other public forums or social media sites.
Remaining concerns and conclusions
Although my solution takes into account the Ts & Cs of the platform, users’ well-being and likely expectations, there may still be risks associated with not gaining informed consent for reproducing users’ comments.
First, it remains unknown whether attribution with anonymization complies with copyright law and whether I could be held responsible if one of the users takes legal action against me. Ideally, I would have run through my final decision-making with the personnel of the platform. However, my early queries and proposals via email to company personnel, including one of their legal counsels, led to rather closed conversations as they did not advise individual researchers, and I was often referred to the Ts & Cs. During my write-up, I submitted this query to the support site of the company, along with a sample of threads and writing to illustrate attribution with anonymization. I was again referred to the Ts & Cs. I could have pursued this further and made more effort to establish a conversation with the company. However, I decided to stop because I was worried about my research progress and the potential repercussions of ‘pestering’ the company.
Second, the public education site involved is accessed by users across the world; it is possible that copyright for their discussion contributions falls under the legislation of the individual countries. Regardless, as stated in the Ts & Cs, users’ contributions are subject to the Creative Commons Licence (Attribution-NonCommercial-NoDerivs; BY-NC-ND), which is an international licence. The licence requires me to ‘give appropriate credit, provide a link to the licence, . . . but not in any way that suggests the licensor endorses [me] or [my] use’. Therefore, my attribution with anonymization, through provision of URLs linked to the comments, can be considered as fulfilling this requirement. Nonetheless, it should be noted that the Creative Commons Licence is imposed by the Ts & Cs of the platform, rather than actively chosen by individual users.
Third, the decision not to seek users’ informed consent helped to ensure the smooth delivery of my research, given that I foresaw a backlash similar to that experienced or anticipated by Sugiura et al. (2017) and Rüdiger and Dayter (2017). Undeniably, my actions were based primarily upon what other researchers and I foresaw, rather than users’ preferences. Some users may be happy for me to use their comments for research and publications (Golder et al., 2017). However, the act of seeking informed consent from more than 20,000 users would have precluded my research. Even for smaller numbers, for instance 200 users, locating them to obtain their consent after the MOOCs and online discussions have ended would be a mammoth task for a lone PhD student. Smaller datasets might make the process of seeking informed consent more practicable. Even so, for online discussions that are only active for a limited period of time (fewer than 8 weeks), it could be a stressful task for researchers to undertake analysis and decide on whom to approach for their consent.
I should emphasize that I am not criticizing the platform for their Ts & Cs at all. Rather, sharing the views of other researchers (Giaxoglou, 2017; Pihlaja, 2017; Sugiura et al., 2017), I have found guidelines, Ts & Cs and copyright rules can be ambiguous and may not be applicable to individual research projects. Therefore, researchers undertaking similar studies should note that they might also have to navigate the complexity of their own ethical considerations and the Ts & Cs of online platforms. For example, given the nature of the platform I accessed, the Ts & Cs mainly addressed educational research with use of interviews, surveys and quantitative analysis of aggregated data; they do not mention discourse analysis.
Importantly, researchers who apply discourse analysis to online communications might do more to advocate for the value of investigation into language and communication methods in online discussion spaces. As is widely known, online spaces can be fraught with hate speech, polarization and aggression, and internet users can employ various discourse practices to enact social relationships with others (Jones and Hafner, 2012). It is important to investigate users’ language use on various online platforms to reveal discourse practices that are facilitative of online discussions. For example, according to my analysis of the MOOC discussions (Chua, 2021), to increase the chance of receiving replies from others, users can express uncertainty and tentativeness in their claims, such that others will be more willing to fill in the gap by replying. This finding will be particularly useful for informing users about how to post in the online discussions, thus improving their learning experience on the platform. My findings, and those from other discourse studies, speak to the potential benefits of investigating language communication on online platforms and the need for private companies to be aware of the potential benefits of engaging with relevant research.
In this article I have highlighted the tensions that can exist between the policies of online platforms and researchers regarding copyright and confidentiality. I believe it demonstrates the need for further conversation regarding ethics between the private (online) sector and academia. This is vital for ethical and effective collaboration between academic researchers and technology companies to investigate users’ contributions and behaviour online.
Footnotes
Funding
All articles in Research Ethics are published as open access. There are no submission charges and no Article Processing Charges as these are fully funded by institutions through Knowledge Unlatched, resulting in no direct charge to authors.
