Abstract
The speed of development in Big Data and associated phenomena, such as social media, has surpassed the capacity of the average consumer to understand his or her actions and their knock-on effects. We are moving towards changes in how ethics has to be perceived: away from individual decisions with specific and knowable outcomes, towards actions by many who are unaware that what they do may have unintended consequences for anyone. Responses will require a rethinking of ethical choices, or the lack thereof, and of how this will guide scientists, governments, and corporate agencies in handling Big Data. This essay elaborates on the ways Big Data affects ethical conceptions.
On 21 September 2012, a crowd of 3000 rioting people visited a 16-year-old girl’s party at home in the little village of Haren, the Netherlands, after she had mistakenly posted a birthday party invite publicly on Facebook (BBC, 2012). Some might think that the biggest ethical and educational challenge that modern technology is posing concerns mainly children. It seems, however, that particularly with the emergence of Big Data, ethicists have to reconsider some traditional ethical conceptions.
Since the onset of modern ethics in the late 18th century with Hume, Kant, Bentham, and Mill, we have taken premises such as individual moral responsibility for granted. Today, however, it seems Big Data requires ethics to rethink some of its assumptions, particularly about individual moral agency. The novelty of Big Data poses ethical difficulties (such as for privacy), which are not per se new. These ethical questions, which are commonly known and understood, are also widely discussed in the media. For example, they resurface in the context of the Snowden revelations and the respective investigations by
This essay aims to underline how certain principles of our contemporary philosophy of ethics might be changing and might require a rethinking in philosophy, professional ethics, policy-making, and research. First, it will briefly outline the traditional ethical principles with regard to moral responsibility. Thereafter, it will summarize four qualities of Big Data with ethical relevance. The third section delves deeper into the idea of the changing nature of power and the emergence of hyper-networked ethics; and the fourth section illustrates which ethical problems might emerge in society, politics and research due to these changes.
Traditional ethics
Since the Enlightenment, traditional deontological and utilitarian ethics have placed a strong emphasis on the moral responsibility of the individual, often also called moral agency (MacIntyre, 1998). This idea of moral agency stems largely from almost religiously followed assumptions about individualism and free will. Both of these assumptions are challenged by the advancement of modern technology, particularly Big Data. The degree to which an entity possesses moral agency determines the responsibility of that entity. Moral responsibility, in combination with extraneous and intrinsic factors that escape the will of the entity, defines the culpability of this entity. In general, moral agency is determined by several conditions innate to the entity, three of which are commonly agreed upon (Noorman, 2012):
- Causality: An agent can be held responsible if the ethically relevant result is an outcome of its actions.
- Knowledge: An agent can be blamed for the result of its actions if it had (or should have had) knowledge of the consequences of its actions.
- Choice: An agent can be blamed for the result if it had the liberty to choose an alternative without greater harm for itself.
Implicitly, observers tend to exculpate agents if they did not possess full moral agency, i.e. when at least one of the three criteria is absent. There are, however, lines of reasoning that consider morally relevant outcomes independently of the existence of a moral agency, at least in the sense that negative consequences establish moral obligations (Leibniz and Farrer, 2005; Pogge, 2002). New advances in ethics have been made in network ethics (Floridi, 2009), the ethics of social networking (Vallor, 2012), distributed and corporate moral responsibility (Erskine, 2004), as well as computer and information ethics (Bynum, 2011). Still, Big Data has introduced further changes, such as the philosophical problem of ‘many hands’, i.e. the effect of many actors contributing to an action in the form of distributed morality (Floridi, 2013; Noorman, 2012), which need to be raised.
Four qualities of Big Data
When recapitulating the core criteria of Big Data, it will become clear that the ethics of Big Data moves away from a personal moral agency in some instances. In other cases, it increases moral culpability of those that have control over Big Data. In general, however, the trend is towards an impersonal ethics based on consequences for others. Therefore, the key qualities of Big Data, as relevant for our ethical considerations, shall be briefly examined. At the heart of Big Data are four ethically relevant qualities:
(1) There is more data than ever in the history of data (Smolan and Erwitt, 2012):
    - Beginning of recorded history till 2003: 5 billion gigabytes
    - 2011: 5 billion gigabytes every two days
    - 2013: 5 billion gigabytes every 10 min
    - 2015: 5 billion gigabytes every 10 s
(2) Big Data is organic: although this comes with messiness, by collecting everything that is digitally available, Big Data represents reality digitally much more naturally than statistical data; in this sense it is much more organic. This messiness of Big Data is (among others, e.g. format inconsistencies and measurement artifacts) the result of a representation of the messiness of reality. It does allow us to get closer to a digital representation of reality.
(3) Big Data is potentially global: not only is the representation of reality organic, with truly huge Big Data sets (like Google's) the reach becomes global.
(4) Correlations versus causation: Big Data analyses emphasize correlations over causation.
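The growth figures quoted in the first quality can be turned into rates with simple arithmetic. The following is a minimal sketch that only restates the numbers given above; the function and variable names are illustrative and not part of the cited source.

```python
# Figures as quoted in the list above (Smolan and Erwitt, 2012): the time it
# took, in each year, to generate 5 billion gigabytes of data.
SECONDS_PER_DAY = 24 * 60 * 60
VOLUME_GB = 5e9

seconds_per_volume = {
    2011: 2 * SECONDS_PER_DAY,  # every two days
    2013: 10 * 60,              # every 10 min
    2015: 10,                   # every 10 s
}


def rate_gb_per_second(year):
    """Average generation rate implied by the quoted figure for `year`."""
    return VOLUME_GB / seconds_per_volume[year]


def speedup(from_year, to_year):
    """Factor by which the generation rate grew between two quoted years."""
    return seconds_per_volume[from_year] / seconds_per_volume[to_year]


if __name__ == "__main__":
    for year in sorted(seconds_per_volume):
        print(year, f"{rate_gb_per_second(year):,.0f} GB/s")
    print("2011 -> 2013:", speedup(2011, 2013))
    print("2013 -> 2015:", speedup(2013, 2015))
```

On these figures, the implied rate grows 288-fold between 2011 and 2013 and a further 60-fold between 2013 and 2015.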
Certainly, not all data potentially falling into the category of Big Data is generated by humans or concerns human interaction. The Sloan Digital Sky Survey in New Mexico generated 140 terabytes of data between 2000 and 2010. Its successor, the Large Synoptic Survey Telescope in Chile, when starting its work in 2016, will collect as much within five days (Mayer-Schönberger and Cukier, 2013). There is, however, also a large spectrum of data that relates to people and their interaction directly or indirectly: social network data, the growing field of health tracking data, emails, text messaging, the mere use of the Google search engine, etc. This latter kind of data, even if it does not constitute the majority of Big Data, can be ethically very problematic.
New power distributions
Ethicists constantly try to catch up with modern-day problems (drones, genetics, etc.) in order to keep ethics up-to-date. Many books on computer ethics and cyber ethics have been written in the past three decades since, among others, Johnson (1985) and Moor (1985) established the field. For Johnson, computer ethics “pose new versions of standard moral problems and moral dilemmas, exacerbating the old problems, and forcing us to apply ordinary moral norms in uncharted realms” (Johnson, 1985: 1). This changes to some degree with Big Data, as moral agency is being challenged on certain fundamental premises that most of the advancements in computer ethics took and still take for granted, namely free will and individualism. Moreover, in a hyperconnected era, the concept of power, which is so crucial for ethics and moral responsibility, is taking on a more networked form. Retaining the individual’s agency, i.e. knowledge and ability to act, is one of the main challenges for the governance of socio-technical epistemic systems, as Simon (2013) concludes.
There are three categories of Big Data stakeholders: Big Data collectors, Big Data utilizers, and Big Data generators. Between the three, power is inherently relational in the sense of a network definition of power (Hanneman and Riddle, 2005). In general, actor A’s power over B is the degree to which B is dependent on A or, alternatively, to which A can influence B. That means that A’s power vis-à-vis C may differ from A’s power vis-à-vis B. The more connections A has, the more power he or she can exert. This is referred to as micro-level power and is understood through the concept of centrality (Bonacich, 1987). On the macro-level, the whole network (of all actors A–B–C–D…) has an overall inherent power, which depends on the density of the network, i.e. the number of edges between the nodes. In terms of Big Data stakeholders, this could mean that we find these new stakeholders wielding a lot of power:
(a) Big Data collectors determine which data is collected, which is stored and for how long. They govern the collection, and implicitly the utility, of Big Data.
(b) Big Data utilizers: They are on the utility production side. While (a) might collect data with or without a certain purpose, (b) (re-)defines the purpose for which data is used, for example regarding:
    - Determining behavior by imposing new rules on audiences or manipulating social processes;
    - Creating innovation and knowledge through bringing together new datasets, thereby achieving a competitive advantage.
(c) Big Data generators:
    - Natural actors that by input or any recording voluntarily, involuntarily, knowingly, or unknowingly generate massive amounts of data.
    - Artificial actors that create data as a direct or indirect result of their task or functioning.
    - Physical phenomena, which generate massive amounts of data by their nature or which are measured in such detail that it amounts to massive data flows.
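The two network notions used above, micro-level power as centrality and macro-level power as density, can be illustrated on a toy network. The sketch below is a simplification under stated assumptions: it uses plain degree centrality as a stand-in, which is simpler than the measure of Bonacich (1987), and the actors A to D and their ties are hypothetical.

```python
# Hypothetical undirected network of four actors; each pair is one tie (edge).
edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")}


def nodes_of(edges):
    """All actors that appear in at least one tie."""
    return sorted({node for edge in edges for node in edge})


def degree_centrality(edges):
    """Share of the other actors each actor is directly tied to.

    A simple stand-in for micro-level power: more connections, more power.
    """
    nodes = nodes_of(edges)
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_ties = len(nodes) - 1
    return {node: degree[node] / max_ties for node in nodes}


def density(edges):
    """Existing ties divided by all possible ties (macro-level measure)."""
    n = len(nodes_of(edges))
    possible_ties = n * (n - 1) / 2
    return len(edges) / possible_ties


if __name__ == "__main__":
    print(degree_centrality(edges))
    print(density(edges))
```

In this toy network, A is tied to every other actor and therefore has the highest centrality, while D depends on A as its only connection.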
The interaction between these three stakeholders illustrates power relationships and already gives us an entirely different view on individual agency, namely an agency that is, for its capability of morally relevant action, entirely dependent on other actors. One could call this agency ‘dependent agency’, for its capability to act depends on other actors. Floridi refers to these moral enablers, which hinder or facilitate moral action, as infraethics (Floridi, 2013). The network nature of society, however, means that this dependent agency is always a factor when judging the moral responsibility of the agent. In contrast to traditional ethics, where knock-on effects (that is, effects on third, mostly unrelated parties, as for example in collateral damage scenarios) in a social or cause–effect network play only a minor role, Big Data-induced hyper-networked ethics exacerbate network knock-on effects. In other words, the nature of hyper-networked societies exacerbates the collateral damage caused by actions within this network. This changes foundational assumptions about ethical responsibility by changing what power is and the extent to which we can talk of free will, reducing the knowable outcomes of actions while increasing unintended consequences.
Some ethical Big Data challenges
When going through the four ethical qualities of Big Data above, the ethical challenges become increasingly clear. Ads (1) and (2): just as global warming is an effect of the emissions of many individuals and companies, Big Data is the effect of individual actions, sensory data, and other real-world measurements creating a digital image of our reality. Cukier (2013) calls this “datafication”. Already, the mere absence of knowledge about which data is in fact collected or what it can be used for puts the “data generator” (e.g. online consumers, cellphone owners, etc.) at an ethical disadvantage qua knowledge and free will. The “internet of things” further contributes to the distance between one actor’s knowledge and will and the other actor’s source of information and power. Ad (3): global data leads to a power imbalance between different stakeholders, benefitting mostly corporate agencies with the necessary know-how to generate intelligence and knowledge from information. Ad (4): like a true Delphian oracle, Big Data correlations suggest causations where there might be none. We become more vulnerable to having to believe what we see without knowing the underlying whys.
Privacy
The more our lives become mirrored in a cyber reality and recorded, the more our present and past become almost completely transparent for actors with the right skills and access (Beeger, 2013).
Group privacy
Data analysts are using Big Data to find out our shopping preferences, health status, sleep cycles, moving patterns, online consumption, friendships, etc. In only a few cases, and mostly in intelligence circles, this information is individualized. De-individualization (i.e. removing elements that allow data to be connected to one specific person) is, however, just one aspect of anonymization. Location, gender, age, and other information that indicates belonging to a group, and is thus valuable for statistical analysis, relate to the issue of group privacy. Anonymization of data is, thus, a matter of degree: how many and which group attributes remain in the data set. To strip data of all elements pertaining to any sort of group belonging would mean to strip it of its content. In consequence, despite the data being anonymous in the sense of being de-individualized, groups are becoming ever more transparent. This issue was already raised by Dalenius (1977) for statistical databases and later by Dwork (2006), namely that “nothing about an individual should be learnable from the database that cannot be learned without access to the database”. This information gathered from statistical data and increasingly from Big Data can be used in a targeted way to get people to consume or to behave in a certain way, e.g. through targeted marketing. Furthermore, if different aspects about the preferences and conditions of a specific group are known, these can be used to employ incentives to encourage or discourage a certain behavior. For example, knowing that group A has a preference α (e.g. ice cream) and a majority of the same group has a condition β (e.g. being undecided about which party to vote for), one can provide α to get this group to behave in the domain of β in a specific way by creating a conditionality (e.g. if one votes for party B one gets ice cream).
This is standard party politics; however, with Big Data the ability to discover hidden correlations increases, which in turn increases the ability to create incentives whose purposes are less transparent.
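The point that de-individualized data still makes groups transparent can be shown with a small sketch. Everything in it is hypothetical: the records, the attribute names and the helper functions are invented for illustration and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical records; "name" is the only directly identifying element.
records = [
    {"name": "person_1", "region": "north", "age_band": "18-29", "undecided": True},
    {"name": "person_2", "region": "north", "age_band": "18-29", "undecided": True},
    {"name": "person_3", "region": "north", "age_band": "18-29", "undecided": False},
    {"name": "person_4", "region": "south", "age_band": "30-49", "undecided": False},
    {"name": "person_5", "region": "south", "age_band": "30-49", "undecided": False},
    {"name": "person_6", "region": "south", "age_band": "30-49", "undecided": True},
]


def de_individualize(records, identifiers=("name",)):
    """Drop the fields that tie a record to one specific person."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


def group_share(records, group_keys, attribute):
    """Share of each group (defined by group_keys) for which attribute is true."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in group_keys)
        totals[group] += 1
        hits[group] += bool(record[attribute])
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    anonymous = de_individualize(records)
    print(group_share(anonymous, ("region", "age_band"), "undecided"))
```

Although no name survives de-individualization, the remaining group attributes still show which group is mostly undecided, and that is exactly the information an incentive scheme of the kind described above would need.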
Conversely, hyper-connectivity also allows for other strategies. Bots that infiltrate Twitter (the so-called Twitter bombs), for example, are meant to create fake debates about, say, a political party, which human audiences falsely perceive as legitimate grassroots debates. This practice is called “astroturfing” and is prohibited by Twitter policies, which, however, do not prevent political campaigners from doing it. The contest between Coakley and Brown in the 2010 special election in Massachusetts to fill the Senate seat formerly held by Ted Kennedy (won by the Republican Brown) might have been decided by exactly such a bot, which created a Twitter smear campaign in the form of a fake public debate (Ehrenberg, 2012). A 2013 report showed that 61.5% of website visitors were bots (with an increasing tendency). Roughly half of this traffic consisted of “good bots” necessary for search engines and other services; the other half consisted of malicious bot types such as scrapers (5%), hacking tools (4.5%), spammers (0.5%), and impersonators (20.5%) for the purpose of market intelligence and manipulation (Zeifman, 2013).
Propensity
The movie
Research ethics
Ethical codes and standards with regard to research ethics lag behind this development. While in many instances research ethics concerns the question of privacy, the use of social media such as Twitter and Facebook for research purposes, even in anonymous form, remains an open question. On the one hand, Facebook is the usual suspect to be mentioned when it comes to questions of privacy. On the other hand, this discussion hides the fact that a lot of non-personal information can also reveal much about very specific groups in very specific geographical relations. In other words, individual information might be interesting for investigative purposes of intelligence agencies, but the actually valuable information for companies does not require the individual tag. This is again a problem of group privacy. The same is true for research ethics. Many ethical research codes do not yet consider the non-privacy-related ethical effect (see, for example, BD&S’ own statement “preserving the integrity and privacy of subjects participating in research”). Research findings that reveal uncomfortable information about groups will become the next hot topic in research ethics, e.g. researchers who use Twitter are able to tell uncomfortable truths about specific groups of people, potentially with negative effects on the researched group. Another problem is informed consent: although the data is already public, no one really expects to suddenly become the subject of research in Twitter or Facebook studies. However, in order to represent and analyze pertinent social phenomena, some researchers collect data from social media without considering that the lack of informed consent would in any other form of research (think of psychological or medical research) constitute a major breach of research ethics.
Conclusions
Does Big Data change everything, as Cukier and Mayer-Schönberger have proclaimed? This essay tried to indicate that Big Data might induce certain changes to traditional assumptions of ethics regarding individuality, free will, and power. This might have consequences in many areas that we have taken for granted for so long.
In the sphere of education, children, adolescents, and grown-ups still need to be educated about the unintended consequences of their digital footprints (beyond digital literacy). Social science research might have to consider this educational gap and draw its conclusions about the ethical implications of using anonymous, social Big Data, which nonetheless reveals much about groups. In the area of law and politics, I see three likely developments:
- political campaign observers, think tank researchers, and other investigators will increasingly become specialized data forensic scientists in order to investigate new kinds of digital manipulation of public opinion;
- law enforcement and social services as much as lawyers and legal researchers will necessarily need to re-conceptualize individual guilt, probability and crime prevention; and
- states will progressively redesign the way they develop their global strategies based on global data and algorithms rather than regional experts and judgment calls.
When it comes to Big Data ethics, it seems not to be an overstatement to say that Big Data does have strong effects on assumptions about individual responsibility and power distributions. Eventually, ethicists will have to continue to discuss how we can and how we want to live in a
Acknowledgement
The author wishes to thank Barteld Braaksma, Anno Bunnik and Lawrence Kettle for their help and feedback as well as the editors and the anonymous reviewers for their invaluable insights and comments.
Declaration of conflicting interest
The author declares that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
