Abstract
This paper considers some of the ethical issues surrounding the study of malicious activity in social networks, specifically the use of a technique known as social honeypots combined with deception. Deception is a potentially touchy element of study design that is common in social and behavioral research, where it is well understood to fall within the boundaries of human subjects research regulated in the United States and reviewed by institutional review boards; it is far less well understood by computer security researchers and those in the private sector. The firestorm of controversy over the 2014 “emotional contagion” study of Facebook users shows that learning one has been deceived may itself be the harm, both to the users involved and to their trust in researchers and research in general. Should researchers have an obligation to seek research methods that achieve the same results without deception?
Introduction
Users of the internet are constantly under attack by malicious actors intent on stealing login credentials, bank account numbers, credit card numbers, and the like (Schwartz, 2013). The most blatant tactic is known broadly as social engineering: deceiving users by pretending to be someone the victim trusts and asking them to give up sensitive information. Victims are often approached through email messages, direct messages, tweets, and status updates, a practice known as phishing (Jakobsson et al., 2008). A classic phishing message goes something like this:
This is the Security Department. We have noticed abnormal activity with your account. Please validate your identity immediately by giving us your name, address, Social Security number, and date of birth. Failure to validate your account in the next 48 hours will result in IMMEDIATE TERMINATION of your account! Thank you for your cooperation.
A subtler method of targeting victims is to simply send what looks like a link to a video or post about a current topic. After an earthquake or tsunami, it may be a video of an amazing rescue or a horrific accident. The link sends the victim to a compromised web server hosting a browser exploit kit (Bueno, 2011) that identifies one of dozens of vulnerabilities on the victim’s computer and forces it to load malicious software. The victim may have no idea their computer was just compromised, yet the next time they log in to their bank, the “bad guys” can capture their account access credentials.
Social networks, such as Twitter and Facebook, are now used by over a billion people around the globe and are becoming influential in social movements, such as pre-election “get out the vote” campaigns and the Arab Spring protests. They are prime targets for phishing attacks: URLs sent in tweets, status updates, and direct messages from already infected accounts spread the attack further. A malicious program that the computer security industry named Koobface (Villeneuve, 2010; Thomas and Nicol, 2010) is one threat to Facebook users. Koobface sends a message to Facebook users from a fake “friend,” purporting to be a link to a video of the recipient caught naked on a hidden web cam. When the user clicks on the link, they are infected with Koobface, which in turn creates fake clicks on ads, stealing from pay-per-click advertisers who think the user viewed an ad, while simultaneously continuing to propagate through the social network. The Koobface operators were able to garner US$2 million per year in aggregate by stealing fractions of a penny (US$0.01) per infected victim in ad “referral” revenue. While stealing less than a penny seems like nothing to worry about, consider what the Koobface operators could do if they leveraged their control of tens of millions of users’ accounts for far more nefarious purposes.
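To put that revenue model in perspective, a rough back-of-envelope calculation helps (this assumes, for simplicity, the full US$0.01 per fraudulent click; actual per-click amounts were often smaller, implying even more clicks):

```latex
\frac{\$2{,}000{,}000\ \text{per year}}{\$0.01\ \text{per click}}
  = 2 \times 10^{8}\ \text{clicks per year}
  \approx 550{,}000\ \text{clicks per day}
```

Only an infrastructure controlling an enormous number of compromised accounts can sustain fraud at that volume, which is precisely what makes such infrastructure dangerous if redirected to worse ends.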
Privacy of personal data
Social networks serve many purposes, primarily focused on both public and private communication. They link friends and family into groups, or provide a simple means of speaking out on a topic in a way that makes it easy for others to follow. They are growing in popularity as a means of keeping in touch with others around the globe and are used by some people on a nearly constant basis via smart phones and other personal electronic devices.
Users of social networks who restrict access to their posts to only “friends” have made a decision to limit what posts are made public and thus have an expectation that some communications and personal data they store in the social network platform are private. Privacy of personal data is a serious subject, and the Federal Trade Commission (FTC) in the United States has brought a number of lawsuits against corporations for privacy rights violations. In one case, Google agreed to a record US$22.5 million settlement for violating its users’ privacy rights (Valentino-Devries, 2012), and Facebook settled FTC charges over failing to honor privacy promises in its user agreement (Federal Trade Commission, 2011). A third lawsuit, filed against a company named Compete, charged that its consumer tracking software surreptitiously collected credit card and Social Security numbers from consumers who downloaded it, and that the company failed to properly protect the sensitive data it had collected (Unspecified, 2012). Individuals whose private data is collected by corporations without permission, or used in ways that go beyond agreed upon Terms of Service, 1 can get very angry when they learn what is happening to them or their data. How would you feel if you believed someone had taken away your autonomy in deciding how to protect your personal data, or not given you a way to opt out of your sensitive data being used in ways you do not approve of? Many people were furious, and the FTC has responded to some of their complaints.
The idea of honeypots – computers placed on the internet solely to allow someone to compromise them in order to learn how people break into them and to learn how to prevent or respond to those break-ins – was popularized in the early 2000s by the Honeynet Project (https://www.honeynet.org/). The classic honeypots, by definition, had no real “users,” and thus contained no personal communications or personally identifiable information (PII). There was, therefore, no need to get anyone’s permission to collect, use, or disclose the data collected.
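As an illustration of how little a classic honeypot needs to touch personal data, consider the minimal low-interaction sketch below (in Python; the port, banner, and logging choices are invented for illustration and do not depict any particular Honeynet Project tool). The machine offers a bait service and records connection attempts; because no real users exist on the system, everything captured is, by construction, attacker activity:

```python
# Minimal low-interaction honeypot sketch. There are no real users on
# this host, so nothing logged here is personal communication or PII.
import socket
import datetime

LISTEN_PORT = 2323  # hypothetical port standing in for a real service

def run_honeypot(port: int = LISTEN_PORT) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("0.0.0.0", port))
        server.listen()
        while True:
            conn, (addr, src_port) = server.accept()
            with conn:
                conn.sendall(b"login: ")   # bait prompt
                data = conn.recv(1024)     # whatever the visitor sends
                print(f"{datetime.datetime.utcnow().isoformat()} "
                      f"attempt from {addr}:{src_port} sent {data!r}")

if __name__ == "__main__":
    run_honeypot()
```

Every connection to such a machine is, by definition, unsolicited, which is what removed any need for consent to collect, use, or disclose the resulting data.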
Social honeypots may be filled with potentially private communications and PII. This research is done in situ in a live production social network where Terms of Service describe privacy controls and appropriate data use. This distinction is sometimes lost on researchers who wish to perform social honeypot research, focusing on the honeypot and not the context in which it exists and how it will be used for research.
Use of deception in research
In addition to the issue of the collection, use and disclosure of PII, there is a somewhat orthogonal, but still important, issue: the use of deception in research studies. The recent “emotional contagion” study which manipulated Facebook users’ news feeds without their knowledge shows how deception itself may be the harm (as manifested by the controversy that arose upon release of the study results) in what is otherwise a minimal-risk setting.
Deception is often necessary in social and behavioral studies of conscious or subconscious decision-making. If you informed research subjects that you were providing them with false information, that knowledge could change their choices in the study and ruin the scientific validity of any observations.
Deception sometimes is used in computer security research, such as in the study of social engineering. It is unlikely a user will be tricked into giving their password to someone pretending to be an authority figure if they must first sign a consent form that explains, “In this study, we will send you fake email messages in which we pretend to be your bank and ask you to give us your account number and password in order to learn how easy it is to trick you.”
Researchers within the Honeynet Project have long discussed applying the concept of honeypots in the social networking realm, using social honeypots to detect attacks such as Koobface. Other researchers in academia have picked up on this idea and are pursuing this research (Lee et al., 2010; Zhu et al., 2012).
After the publication of the results of an “emotional contagion” study of Facebook users (Kramer et al., 2014), a firestorm of controversy arose (Meyer, 2014). While much of the debate centered around whether Facebook’s policies amounted to “informed consent” or whether the university researchers were engaged in “human subjects research” through collection of identifiable data, the resulting controversy from this study – and other subsequent disclosures, such as OKCupid’s admission of manipulating match ratings (Carmody, 2014) – shows that the deception itself, and how the users involved feel about such research studies, could be the real harms at issue.
This paper examines the ethical issues and challenges of researching threats to social networks using social honeypots, with two audiences in mind. First, it is hoped that future researchers will benefit from understanding these issues and how to address them. Second, it is hoped that those who review research studies will be in a better position to understand the technical methods of social network research (especially research involving the use of deception) and be better equipped to evaluate and promote the use of research protocols that protect the interests of the humans using those social networks.
Social network defense research case studies
Social networks are designed both to link users together in a fabric of friend and/or follower relationships and to facilitate communication between these friends/followers as part of daily life. A friend in a social network is usually someone with whom you engage in dialogue, while a follower is often someone who wishes to passively receive status updates (e.g. sales at a retail store, dates when a favored musical artist or actor will be performing, news stories released by a well-known reporter or news organization). Either way, friends/followers are a source of links to web sites intended for the recipient to visit. As such, friend/follower relationships are a target for malicious actors who wish to exploit weaknesses in the social network to send malicious messages designed to cause harm while coming in the guise of normal friendly communication, known as social network spam. Lee et al. (2010) and Grier et al. (2010) address the issue of social network spam detection using the honeypot concept of passively monitoring for posted messages and examining them in order to identify malicious communications; a minimal sketch of this passive-monitoring pattern follows.
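The sketch below (Python; the heuristics, field names, and lure phrases are hypothetical and are not drawn from either paper) shows the general shape of the approach: honeypot accounts passively receive posts, and a simple filter flags candidates for analyst review:

```python
# Illustrative passive social-honeypot monitoring; heuristics and
# names are invented, not taken from Lee et al. or Grier et al.
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")

@dataclass
class Post:
    author: str
    text: str

def looks_like_spam(post: Post, known_bad_domains: set) -> bool:
    """Flag a post received by a honeypot account as candidate spam."""
    urls = URL_RE.findall(post.text)
    if not urls:
        return False
    # Heuristic 1: a link points at a domain already seen in spam campaigns.
    if any(domain in url for url in urls for domain in known_bad_domains):
        return True
    # Heuristic 2: classic lure phrasing accompanying a link.
    lures = ("you won", "video of you", "verify your account")
    return any(phrase in post.text.lower() for phrase in lures)

# Usage: every post the honeypot profiles receive goes through the
# filter; flagged posts land in an analyst queue for deeper inspection.
incoming = [Post("acct1", "OMG video of you http://evil.example/x"),
            Post("acct2", "lunch was great, thanks!")]
queue = [p for p in incoming if looks_like_spam(p, {"evil.example"})]
print([p.text for p in queue])
```

The ethical weight of such a system rests entirely on what the honeypot accounts are allowed to receive: public posts only, or private communications as well.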
The Lee and Grier studies collect data only from public (i.e. non-private) communications, which carry no expectation of privacy that could subject them to IRB review. Neither paper clearly mentions this fact, however, and the words privacy and ethics do not occur at all (except in the titles of cited references). Anyone who reads these papers without already having a strong understanding of the nuances of using public data in research might build on this work and extend it to non-public posts without even knowing these ethical issues exist.
A more recent study by Zhu et al. (2012) cites both Lee and Grier as the basis for its proposed experiment. This study does not restrict itself to public accounts: it takes the social honeypot concept and explicitly adds deception in order to build a social network of 10^6 (one million) users. The study proposes to use a large number of fake identities and send out invitations to a million users in order to establish a large social network. All messages sent to these fake identities are examined and analyzed to identify malicious spam messages intended to infect users’ computers or steal information from them. From the results of this analysis, defensive mechanisms can be deployed to protect the users of the social network.
Observations
In all three of these papers, the lack of any statements about issues of privacy, ethics, IRB review, etc., leaves readers (most critically, other researchers) without guidance. As Miller states, “Discussing key ethical issues in published articles promotes public moral accountability, just as discussing research methods promotes scientific accountability” (Miller and Rosenstein, 2002). The Menlo Report (Dittrich and Kenneally, 2013) includes a separate principle – Transparency and Accountability – that echoes Miller’s position. Deibert and Crete-Nishihata (2011) stress the importance of conducting research into widespread criminal activity on the internet, including the “need for explicit research rationales and the use of research warrants.” They suggest that a form of pre-research transparency is needed, both to help guide ethical research design and to inform future researchers of how to take the right path: “The best way to ensure that [ethical] standards are met is through careful, clear, and explicit documentation of research methods and justification of the choices that are made along each step of a particular project. Doing so can build up reference points and a knowledge base for future research.” The method they suggest, a research warrant, is “written by the principal investigator, [outlining] the nature and justification for all aspects of the research, which [is] then incorporated into the text of the published reports.” One of the main benefits of the research warrant is that it is prepared well in advance of the research even starting, in case the researcher is later called to defend their research activities.
In discussing the 1975 modification to the Declaration of Helsinki calling for unethical research to not be published, Miller and Rosenstein (2002) state:
The debate over publication of unethical research suggests a false dichotomy. On one hand, if research is unethical, its results should not be published, or published only if accompanied by an editorial condemnation. On the other hand, if research is ethical, then there is no need to discuss ethical considerations. This dichotomy ignores the fact that research might have morally controversial features without necessarily being unethical.
The purpose of this paper is not to judge the ethics of these studies. Rather, it is to urge researchers to be open and transparent about issues that may be morally controversial and to, as called for in the Menlo Report, “enable ICT researchers and oversight entities to appropriately and consistently assess and render ethically defensible research” (Dittrich and Kenneally, 2013).
Consider what would happen if these papers were submitted for grading, or submitted for publication to a conference such as the Symposium On Usable Privacy and Security (SOUPS). In the same way that answering a complicated mathematical proof in a homework assignment or exam requires transparency about how the answer was derived, such papers could be marked down for not illuminating the thought process that would have been necessary had these studies been submitted to an IRB for review. Researchers may be well versed in ethics and have every base covered, or they may not have considered the ethical issues involved at all. A reader has no way of knowing, because of the silence on the topic. If the papers were submitted to SOUPS, the authors would be asked to:
…follow the basic principles of ethical research, e.g., beneficence (maximizing the benefits to an individual or to society while minimizing harm to the individual), minimal risk (appropriateness of the risk versus benefit ratio), voluntary consent, respect for privacy, and limited deception. Authors may be asked to include explanation of how ethical principles were followed in their final papers should questions arise during the review process. 2
Based on the SOUPS requirement, each set of authors might have been asked to explain the ethical issues involved and how they were addressed – using precious page space, or adding a link to more detailed documentation – or, at worst, might have had the paper rejected on ethical grounds. The studies involving public data are much easier to address than a study involving collection of private communications and/or the use of deception. At minimum, the issue could be raised and clarified with a statement as simple as, “This study involves only data made public by those from whom it was collected, which avoids the necessity of review on ethical or privacy grounds.” Waiting to consider and justify the ethical issues in a study until results are submitted for publication, however, is itself a risky practice that the research community has a responsibility to address (Dittrich et al., 2011). If any harm does manifest from conducting experiments without properly addressing the ethical issues in advance, the users involved will have suffered that harm long before the publication was submitted and rejected.
It is unclear which would be considered the tougher ethical call: to allow deception of 100 research subjects who acknowledged the risks and benefits of the research in advance by reading and signing a consent form, or to grant a waiver of informed consent under an “impracticability” assertion and allow deception of a million users of a social network whose communications are mined for malicious content. Does the number of people involved increase the risk, or the burden on a researcher to mitigate potential harms? These are good questions for researchers to ask of an IRB administrator or committee member, preferably prior to engaging in potentially controversial research. (To promote the discussion of how this type of research could be presented to an IRB, a mock IRB application is presented in Appendix 1.)
Addressing the ethical issues
This paper now looks at the ethical issues involved, and explains how they might be addressed within the IRB regime in academic institutions in the United States. Researchers in other countries or professional circumstances 3 may have different regulations, or different review bodies, under which they operate (Dittrich et al., 2011). As such, each situation must be evaluated within the context of the applicable ethical review regime, and researchers and/or reviewers must act in a manner appropriate to the circumstances.
While this discussion focuses on researchers in the United States under the IRB review regime, that does not mean that non-academic researchers (e.g. private individuals, researchers in the private sector, or employees of governmental or non-governmental organizations) can do whatever they want. They are still subject to the Terms of Service of the platforms on which they perform their research studies, and will still be called to answer for actions that are perceived as harmful by those who believe their privacy has been violated (as seen in the FTC lawsuits mentioned in the Privacy of Personal Data section). Nor does it mean that an academic researcher can simply partner with someone in the private sector in order to do their research without having to go to an IRB, unless they are doing their research under the auspices of the service provider and conforming to that service provider’s agreements with its users or customers.
Stakeholder analysis
As an initial step in understanding the risks and benefits, and how to appropriately balance the two, a comprehensive stakeholder analysis is helpful. Stakeholders are divided into three categories, each with both positive (innocent) and negative (malicious) aspects: key stakeholders are those with a direct impact on delivering benefits or harms; primary stakeholders are the principal entities receiving benefits or harms; secondary stakeholders are intermediaries in the delivery of products or services related to the subject of study, such as service providers or vendors.
Key stakeholders
Researchers (positive). Researchers gain knowledge from the study of malicious attack mechanisms, and benefit from the development of defensive mechanisms through the development of generalizable knowledge, the generation of intellectual property (technology) that can be transitioned into products or services to benefit the general public, and through publicity and further funding resulting from publication of research findings. They are also potentially harmed through negative publicity, having to respond to complaints from primary or secondary stakeholders who are harmed, and even by attacks from negatively inclined key stakeholders (Gaudin, 2007). In some cases, researchers’ actions may affect the results of other researchers’ experiments (Enright et al., 2008) as these studies are being performed in a live crime scene, not an isolated laboratory.
Malicious actors (negative). This stakeholder group is implicitly cited in the justification for the use of deception in social network attack studies: if these stakeholders become aware of the detection and defense mechanisms, they can take action to evade them, including attacking the researchers or the accounts they are using for their experiments (Gaudin, 2007).
Primary stakeholders
Users of the social network platform (positive). The benefit to users from developing a detective and protective mechanism to defend against malicious spamming is an increase in security. The primary risk to users of the platform is loss of privacy; adding deception into the mix raises the potential for harm as noted in the Ethical Issues Surrounding Deception section.
Law enforcement (positive). As this research is focused on criminal activity, there is the distinct possibility that agents of law enforcement, who have the authority and responsibility for investigating and prosecuting criminal activity, may have an active investigation ongoing at the same time that researchers are interacting with the botnet. If researcher actions manipulate the crime scene and inject false information that misleads or hampers a criminal investigation, this can harm the efforts of law enforcement to do their duty and, in turn, hurt the very users the researcher claims to be helping defend. The situation is entirely different if researchers perform experiments in an isolated offline environment, where there can be no conflict; performing research using active and clandestine methods that alter a crime scene incurs real risk of harm. While human subjects review does not consider impact on law enforcement, researchers should still consider how their actions affect this stakeholder group, in terms of the reputational harm to themselves and their institution should a serious conflict occur.
Secondary stakeholders
Social network platform owners (positive). The case studies just reviewed involve the Facebook and Twitter platforms. There may be a benefit to these service providers if an effective detective and defensive mechanism can be developed to address attacks on their user base.
Less obvious is the potential for harm to these service providers. The most direct potential harm relates to the Terms of Service, which can give users expectations about how the system is to be used by other users (which includes the researchers’ fake accounts). If users feel their rights have been violated, they may file complaints with the service provider, who must now expend resources to address the complaints/issues. If a researcher is performing an experiment without the knowledge and involvement of the service provider, and the researchers’ actions (even final publication of results) make users aware of those actions and they file complaints about them, the service provider may suffer what they consider to be harm and could reasonably take action. This could include filing criminal or civil charges against the researcher. Deception, if not properly addressed, can add fuel to the fire of anger from users who believe their autonomy has not been respected.
Lastly, the problem of conflicts with law enforcement or with other researchers could both be mitigated through researchers engaging with social network platform owners and working with them, rather than simply performing potentially harmful research independently and in isolation.
Spammers and criminals (negative). Negatively inclined stakeholders who profit from delivery of spam, theft of login or financial account credentials, etc., benefit when users’ defenses are low, when researchers publish information about vulnerabilities without also publishing mitigation information, or when researchers fail to follow through on delivering defensive countermeasures once the general public knows of their findings.
When does research become “human subjects” research?
Researchers around the world have various restrictions and requirements for ethical review of their research when it poses harm to humans. In the United States, these restrictions and requirements are spelled out in the Code of Federal Regulations, 45 CFR 46 (United States Executive Branch, n.d.), also known as the Common Rule as it applies the same way to research funded by any federal agency. Ethical review in the US is performed by bodies known as institutional review boards (IRBs).
The most obvious activity that poses harm to humans is direct interaction during experimental medical procedures, but another risk of harm to humans involved in research comes from collecting and using “identifiable private information.” The Common Rule includes in its definition of identifiable private information, “information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public.” This may include electronic communications such as postings in a social network that are restricted by privacy settings, geographic/geo-spatial coordinates associated with a mobile device, or keyboard activity captured by malicious actors recording online financial transactions.
There is a distinction that can be made between studying humans who are using a social network (which could be seen as human subjects research), and studying malicious software attacking users of the social network. In the latter case, researchers might argue they are studying malicious software, not humans; therefore, their research is not subject to IRB review. This may be a distinction without a difference, when it comes to the potential harm to humans. This point is made by the Menlo working group members in the Menlo Report (Dittrich and Kenneally, 2013), by the Association of Internet Researchers (AoIR) in their ethics guide (Markham and Buchanan, 2012), and in guidance provided by SACHRP (Buchanan and Gallant, 2012).
Collecting those communications thus falls under the “identifiable private information” category and likely triggers the requirement for IRB review. Even when an IRB approves a study, there can be problems. IRBs have limitations in their technical capabilities, and their members may not fully understand the methods used to anonymize data, nor the limits of those methods (Ohm, 2009). One example where IRB members failed to understand the methods and risks is a 2008 study of students’ taste preferences based on their Facebook profile data, examined by Zimmer (2010). Anonymization of the data failed, allowing linkage of the data back to students at the specific university; moreover, the data had been collected by research assistants who were members of a group restricted to students at that university. Those students believed their profile data, being part of a closed group, was not publicly available. Release of data identifying students in the United States, without their permission, is regulated by federal law and is spelled out as such in many universities’ policy statements (Harvard University Information Security, 2010). The violation of this expectation of privacy was a key source of controversy with this study.
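That failure mode is easy to reproduce in miniature. The hypothetical sketch below (Python; all names, fields, and records are invented and bear no relation to the dataset Zimmer examined) shows how records with names removed can still be re-identified by joining quasi-identifiers against a public source such as a student directory:

```python
# Hypothetical illustration of a linkage attack: removing names does not
# anonymize records when quasi-identifiers (major, hometown, ...) can be
# joined against public data. All records below are invented.

anonymized_profiles = [
    {"id": "p1", "major": "philosophy", "hometown": "Smalltown, ND",
     "tastes": ["jazz", "french cinema"]},
]

public_directory = [  # e.g. a student directory or news article
    {"name": "Alice Example", "major": "philosophy",
     "hometown": "Smalltown, ND"},
]

def reidentify(profiles, directory):
    """Join 'anonymized' records to named ones on quasi-identifiers."""
    matches = []
    for p in profiles:
        candidates = [d for d in directory
                      if d["major"] == p["major"]
                      and d["hometown"] == p["hometown"]]
        if len(candidates) == 1:  # unique combination => re-identified
            matches.append((candidates[0]["name"], p["tastes"]))
    return matches

print(reidentify(anonymized_profiles, public_directory))
# -> [('Alice Example', ['jazz', 'french cinema'])]
```

An IRB that evaluates “anonymization” as a checkbox, without asking which quasi-identifiers survive in the released data, cannot catch this class of failure.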
Ethical issues surrounding deception
On top of the privacy issue is the issue of the use of deception. Deception in research studies involves “deliberately misleading communication about purpose of research and/or procedures employed” (Miller, 2010). By definition, when a person whose data is collected is deceived in some way, they are deprived of being fully informed about the nature of a research study, its benefits and risks, and are not allowed to make an autonomous decision about whether they want to be involved in the research or not. Deception raises issues of: violating respect for persons by manipulating people to do something that they otherwise might not want to do; violating the right to choose what to do based on relevant information; possibly causing distress when it is later discovered that one was deceived. “If use is not disclosed in advance, consent to research is not valid” (Miller, 2010).
The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research discussed deception prior to writing the Belmont Report. Appendix Vol. 2 (National Commission, 1979) contains a paper by Dr Diana Baumrind (1979: 23–22, PDF p. 379), including a story of her former secretary, who had been part of a research study on the effects of false praise and dispraise on creative production in college. The former secretary accidentally learned of the deception, and there was no debriefing. Years later, this woman felt shame about the study and distrust of her own self-confidence, and felt the study “confirmed [in] me … my lack of success.” Some may argue this story is a single data point, and that harm to one individual may be minimal in the grand scheme of things. Even so, it raises the issue of autonomy, and of whether the “[tricking] and [duping]” this subject describes, and the harm it caused her in later years, were adequately considered in the way the research was evaluated and performed.
The Menlo Report (Dittrich and Kenneally, 2013) explicitly covers the topic of deception:
Sometimes informing stakeholders about the research procedure, purpose, risk-benefit analysis, and withdrawal opportunities impacts the scientific integrity of research results. Informing research subjects that some web sites are fake during a research experiment on phishing vulnerabilities could negatively impact the research validity by altering the subjects’ behavior. Appropriate Respect for Persons in such deception research can typically be achieved by debriefing the subjects after the research is completed. Debriefing is typically required when deception is used in order to mitigate harm resulting from loss of trust in researchers by those subjects who were deceived.
Waiver or alteration of the informed consent requirement
Obtaining informed consent is not always strictly necessary. 45 CFR 46.116(d) (United States Executive Branch, n.d.) states:
An IRB may approve a consent procedure which does not include, or which alters some of the elements of informed consent [provided] the IRB finds and documents that: (1) the research involves no more than minimal risk to the subjects; (2) the waiver or alteration will not adversely affect the rights and welfare of the subjects; (3) the research could not practicably be carried out without the waiver or alteration; and (4) whenever appropriate, the subjects will be provided with additional pertinent information after participation.
The Menlo Report touches on the issue of impracticability. “In many cases, it is impracticable to notify all affected individuals, but it may be feasible to notify service providers or other entities who have the authority and capability – derived from their relationship with the affected stakeholders – to mitigate harm” (Dittrich and Kenneally, 2013). The alteration or waiver of informed consent can be a complicated and difficult determination by an IRB committee, and cannot be adequately addressed here.
If any personal data was collected, “debriefing should include an offer to withdraw data” (Miller, 2010). Researchers can partially mitigate the fear of subjects withdrawing their personal data by not retaining any such data in the first place: keeping only the malicious software artifacts whose collection is the fundamental reason for monitoring social network posts using social honeypots. Whether researchers will retain personal data should be clearly stated in the debriefing statement in order to minimize harm and respect persons.
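A minimal sketch of such a retention policy (Python; the field names and record format are invented for illustration) keeps only the suspected-malicious artifact and coarse metadata, discarding the author and message text at ingest so there is no personal data left to withdraw:

```python
# Hypothetical data-minimization filter: retain only the artifact needed
# for defensive analysis; drop authorship and message content at ingest.

def minimal_record(post: dict) -> dict | None:
    """Return a PII-free record of a suspicious post, or None."""
    url = post.get("suspicious_url")
    if url is None:
        return None  # nothing malicious observed: retain nothing at all
    return {
        "artifact_url": url,               # the malicious artifact itself
        "observed_at": post["timestamp"],  # coarse metadata only
        # deliberately omitted: post["author"], post["text"]
    }

raw = {"author": "@realuser",
       "text": "check this out http://evil.example/a",
       "suspicious_url": "http://evil.example/a",
       "timestamp": "2014-07-01"}
print(minimal_record(raw))  # contains no author handle or message text
```

Discarding at ingest, rather than after analysis, is the design choice that makes the “we hold no personal data” claim in a debriefing statement straightforwardly true.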
Debriefing statement or request for waivers
Requests for alteration or elimination of the requirement for obtaining informed consent from an IRB committee are not trivial matters. Fully examining this issue goes well beyond the scope of this paper. Researchers must not only be able to explain to an IRB why their research cannot proceed without the waiver or alteration, but also how they will protect the humans involved in the study. Researchers may be asked to debrief deceived persons after the study is over. It seems logical to anticipate both of these situations and prepare in advance the kind of language that would go into a consent form or debriefing statement and modify it slightly for each specific use.
An example of a debriefing statement could read like this:
We wish to inform you that your account was involved in a research study. The purpose of the study was to understand and counter the threats posed to users of social networks by people with malicious intent who send false messages intended to infect your computer or steal your login credentials, bank account numbers, or other very private personal identifiers. You should be aware that the investigators in this study used fake identities and invited you to connect them into your social network. This was necessary in order to receive actual malicious communications sent through the social network. In doing so, the investigators intentionally withheld any mention that these identities do not represent real people. There was no intention of collecting any personal communications or private information about you or others in your social network. You are not identifiable as an individual in any research results. Efforts have been made to avoid collecting any such information, or to destroy it as soon as it was determined not to be hostile. The use of deception was necessary to prevent malicious actors from readily spotting the false identities and avoiding them, which would have prevented the researchers from observing how these malicious actors are preying on you and other innocent users of the social network. If you have any questions or wish to confirm removal of any/all identifiable data, please contact the investigators at email.
Of course there is risk (i.e. a burden) to the researchers themselves in having to communicate with even a small percentage of a million users of a social network around the world. Researchers must acknowledge this burden and not push it aside as an inconvenience. The challenge of how to address issues of informed consent, debriefing, etc. at this scale is one that the research community, the ICT community, policy makers, and research ethics boards must also embrace.
Conclusions
New discoveries in ongoing fields of study that have existed for thousands of years, such as biomedicine, are often incremental changes in a well-understood field. Research into emerging technologies that have only existed for a handful of years, where everything is new and rapidly changing, means there may be little or no precedent on which to build. There may not yet be any lessons learned – be they lessons of success, or of problems encountered – to help researchers in finding their path. To make matters worse, publishers (e.g. journals or conferences) often impose page limits, necessitating leaving out material (such as ethical considerations and how to deal with them) in favor of detailing the science. Such omissions may leave unstated some very important issues regarding risks to avoid, harms that researchers mitigated while performing a study, or procedures for responsible conduct of novel research that those who are new to a field would benefit from knowing.
This paper briefly examined the subject of social honeypots and the risks involved in performing research on communications data in large social networks without the consent of the users of those networks. It also examined the use of deception in such research and its implications. This examination takes the perspective of academic research in the United States, which is (whether researchers know it, like it, or neither) regulated by federal code and subject to institutional review board oversight. Researchers outside this sub-population may operate under other ethical review regimes.
We all want to help the victims whom malicious actors prey upon through the internet via social media. At the same time, we need to respect those we are trying to protect. We can do this by learning the ethical issues involved with our proposed research methods and by communicating to those we are trying to protect that we respect their autonomy and are doing everything in our power to minimize the harm they experience. In doing so, senior researchers will also foster a culture of ethical and responsible conduct of research among the junior researchers who follow.
Footnotes
Appendix 1
Declaration of conflicting interest
The author declares that there is no conflict of interest.
Funding
This content is based in part on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Homeland Security Advanced Research Projects Agency, Cyber Security Division (DHS S&T/HSARPA/CSD), BAA 11-02 and Air Force Research Laboratory, Information Directorate under agreement number FA8750-12-2-0329.
