Abstract
In a moment of heightened ethical questioning concerning data-intensive analytics, “data ethics” has become a site of dispute over its very definition in teaching, research, and practice. In this paper, we contextualize this dispute by drawing on our experience of teaching data ethics. We describe how the field of computer ethics has historically informed the training of computer experts and how, in recent years, scholarship in science and technology studies has created opportunities for transforming the way we teach through the inclusion of critical scholarship on relational ethics and sociotechnical systems. The emergent literature on “critical data ethics” has created a space for interdisciplinary collaboration that integrates technical and social science research to examine digital systems in their design, implementation, and use through a hands-on approach. As a contribution to the recent efforts to reimagine and transform the field of data science, we conclude with a discussion of the approach we devised to bridge technology/society divides and engage students with questions of social justice, accountability, and openness in their data practices.
Introduction
We are experiencing generative but troubling times for computing as it becomes increasingly relevant to the ways we conduct, experience, and plan our collective lives. As educators, researchers, and practitioners in critical data studies, we have found it necessary to engage its promises in order to work against the machine uses and abuses of human beings. What we mean by engagement here can be better understood as a critical pedagogy that is committed to the interdisciplinary work of translation across domains of research and practice.
In a moment of heightened ethical questioning, “data ethics” has become a new site of dispute over the definition of the term in teaching, research, and practice. In this context, “ethics” has served as a nebulous guise for public debates concerning the promises and perils of ubiquitous and data-intensive computing. Curiously, this newfound interest has sidelined social, political, and philosophical inquiry, rendering the identifier a mere placeholder for normative agendas that shape-shift in service of industry demands (Metcalf et al., 2019). Think of “tethics,” a humorous take on the industry's obsession with “tech ethics” from the TV show “Silicon Valley,” to picture how applied ethics has been resignified to circulate widely without much substance.
Technology ethics was built historically upon applied professional ethics, particularly deontological engineering ethics (Hoffmann and Cross, 2021; Finn and Dupont, 1981; Johnson, 2020; Wylie, 2020). Contrary to public perception, Fiesler et al. (2020) have demonstrated through a content analysis of 115 “tech ethics” syllabi that “ethics” is, in fact, very much part of accredited computer science education, despite being mostly taught in stand-alone seminars that are not integrated with other courses. The authors also show that one-fourth of the publications on “ethics” in SIGCSE have appeared in the past 6 years. Not surprisingly, they have also found that most courses are taught by computer scientists and philosophers, with artificial intelligence (AI) as the most popular specific research topic and “law and policy” as the most common general one. What is of particular interest to an examination of data ethics instruction, however, is their finding that the most common learning outcome is “critique,” described as a cycle of “recognize, critique, reason” (Fiesler et al., 2020). In comparison, Metcalf and colleagues (2015) found that the majority of the data ethics courses they surveyed integrate ethics with technical work through design projects. They speculate that data ethics may be more conducive to integration than engineering ethics because it is still an emerging field (p. 4). These promising results pale, though, when contrasted with Oliver and McNeil's (2021) finding that undergraduate data science curricula contain very little ethics orientation. Hence the need for integrative courses in critical data engineering and data science.
Practice-based “responsible data science” and “critical data studies” projects have recently been elaborated and proposed (Neff et al., 2017; Stoyanovich and Lewis, 2019; Beaulieu and Leonelli, 2021; Dumit, 2018), but a comparison of pedagogical goals and outcomes is still missing from the literature (Stahl et al., 2016). Calls for the integration of social sciences literature with technical hands-on exercises in data analytics have been, unfortunately, limited to what Selbst et al. (2019) have called the “algorithmic” and “data” frames, losing sight of the infrastructural aspects of sociotechnical systems that data engineers and scientists interface with on a daily basis. Critical data studies have, conversely, demonstrated their relevance with sociohistorical analyses of data-driven applications but have otherwise missed an important opportunity for direct engagement with technical practices (Iliadis and Russo, 2016; Eubanks, 2020). As Stoyanovich and Lewis (2019) have observed in their call for balancing engineering and social science methodologies in data science instruction, we still have to build the pedagogical resources to teach and practice data analytics in more integrative ways. This observation is supported by the curriculum review performed by Atenas et al. (2023) as a contribution to the debate on “data literacy” in higher education. The authors examined 250 syllabi from 80 graduate programs in data science in Euro-American countries (USA, UK, Spain, France, Germany, Brazil, Portugal, and Italy). They report that existing data ethics courses not only provide little guidance in matters of data governance, stewardship, and sovereignty but also possess a “strong technocentric nature at the cost of humanistic approaches to data and data-driven technologies” (op. cit., p. 8).
At the University of Virginia (UVA) School of Data Science, we experienced these impasses as instructors in critical data ethics for a new data science graduate program. In response, our approach has changed radically over time, with a trajectory that speaks to different (and somewhat distant) genealogies in computer and data ethics. Our program has, for example, integrated debates and perspectives from computer and information ethics (henceforth CE and IE), relational ethics, and science and technology studies (STS), allied with an empirical research agenda for the critical study of computing (Johnson, 1985; Floridi, 2008; Amrute, 2019; Vertesi et al., 2019; Zigon, 2019). The course's development is also a result of the staffing challenges typical of an emerging program in data ethics, such that five instructors taught the course over 6 years (2015–2021). Each instructor drew from their areas of research to prepare the students to respond to the challenges of critical data science. Each iteration of the course nonetheless incorporated aspects of the previous instructor's approach, creating an integrated version that continues to change along with broader debates on the urgency of transforming the field of data science.
Early versions of our course combined the extensive experience of Deborah Johnson, one of the founders of the field of computer ethics, and Phil Bourne, a founding figure in bioinformatics who brought his practical experience with open science to the classroom. In its second iteration, our course was redesigned around anthropologist Jarrett Zigon's phenomenological work on relational ethics and taught by anthropologist Samuel Lengen. More recently, our course was once again transformed by Luis Felipe R Murillo, who incorporated literature on open technologies and collaborative methodologies to facilitate direct engagement with sociotechnical systems, and by Caitlin Wylie, who engaged the tradition of engineering studies. In this article, we drew from our own and our colleagues’ syllabi, course reviews, and course evaluations to propose an assessment, a reflection, and a set of recommendations for curricular activities. We have learned along the way that inhabiting the complexity of philosophical, computational, and socioeconomic issues requires an instructor with training in the critical traditions of the social sciences, as well as working knowledge of data science and engineering. Our role as “data ethics” instructors, we discovered, has become that of translators and meta-researchers in and of the field of data science.
In this article we will present, first, the differences between three non-rival approaches that we have engaged in data ethics education: (1) computer ethics, (2) relational ethics, and (3) sociotechnical studies of data science. We will then proceed to examine (a) course designs, (b) the experiences of our instructors, and (c) the feedback from students and colleagues to provide concrete recommendations for data ethics instructors. We will conclude with a discussion of the importance of reorienting technology ethics away from the usual one-methodology-fits-all and one-ontology-fits-all approaches and toward critical data science pedagogies that demonstrate contextual and translational relevance across expert domains through active, collaborative, and critical engagement. Our institutional trajectory is of particular interest to the critical data studies community for providing examples of distinct pedagogies, furnishing a case study to compare and contrast with other experiences in data science. We suggest that the three iterations examined here reflect much longer and broader genealogies in the study of ethics, morality, and computing that concern the present and future of “data ethics” and “responsible data science,” in connection with the new scholarship advanced by the FAccT community and specialized journals such as Big Data & Society.
Three data ethics approaches
There are striking similarities in the ways in which computer and engineering ethics courses have been justified as necessary additions to technical curricula over the past 40 years. “Ethics” has been identified as a foundation for professional training, and ethical reasoning has been deemed essential for articulating responses to complex ethical dilemmas (Johnson, 1985; Spinello, 2010; Hoffmann and Cross, 2021). The so-called “policy vacuum” is still a pressing matter of concern as globe-spanning monopolies on data collection and processing are literally out of control (Moor, 1985a). In her special issue for the National Academy of Engineering dedicated to the topic, Deborah Johnson (2017) summarized the goals that guide most programs in these terms: “make students aware of what will be expected of them in their work as engineers; sensitize students to ethical issues that might arise in their professional practice; improve students’ overall ethical decision making and judgment; motivate and even inspire students to behave ethically” (p. 61). Despite these similarities, there are fundamental epistemological and pedagogical differences that we must examine in order to develop critical data pedagogies with support from empirical research in science and technology studies.
“Computer ethics” opened the first pedagogical interfaces with engineering and has several possible starting points. Philosophers such as Terrell Bynum and Luciano Floridi situate the field with respect to the early developments in cybernetics around the figure of Norbert Wiener (Bynum, 2001; Floridi, 1999). Deborah Johnson, one of the founders of the field, argues that it was only in the 1970s that ethical concerns started to be articulated with respect to the use of computers, in the work of Joseph Weizenbaum, James Moor, and Abbe Mowshowitz (Weizenbaum, 1976; Mowshowitz, 1976; Moor, 1985b). The 1980s brought a concern with the “computerization” of society, a question concerning the much broader and longer-lasting implications of digital systems (Kling, 1980; Kling and Dunlop, 1991). In this context, Johnson's work was key in delivering a very popular textbook, Computer Ethics (1985), which was subsequently updated, translated, and republished in 1994, 2001, and 2009.
Computer ethics was conceived primarily as a form of applied ethics, grounded in “macro” frameworks of moral inquiry, such as deontology, contractualism, and utilitarianism. Central questions for the establishment of computer ethics as a new branch of ethics included “why is technology of relevance to ethics? What difference can technology make to human action? To human affairs? To moral concepts or theories?” (Johnson, 2008, p. 66). Its pedagogical method was dialogical, structured around exercises of Socratic dialog widely used in ethics courses. Its pedagogical goal was to help students and computer professionals exercise their ethical reasoning by examining dilemmas through controversial cases in computer science. Classroom instruction proceeded with exercises of ethical reflection on existing or fictional scenarios, spanning from professional ethics to codes of online behavior, computer-mediated crime, privacy and transparency, software piracy, and the accountability of digital systems for automated decision-making. Skeptical of contemporary relativist positions, CE privileged a normative rather than a descriptive stance, with the argument that “ethical issues concerning computer and information technologies [are] new species of general moral issues” (Johnson, 1985, p. 14). Codes of ethics, created and disseminated by professional engineering societies such as ACM and IEEE, were an important component of the CE approach. As a philosopher who educated engineers, not philosophers, Johnson describes her teaching approach as moving from elementary concepts to concrete cases in order to help students move away from their “fallback position” of (naive) technological determinism. Ethics constitutes, according to her, a “conversation” in which instructors meet students where they are (that is, taking their first steps in computer science, engineering, and data science).
Over time, CE became very influential and contributed to efforts to operationalize ethics in the process of design and implementation of digital systems (Shilton, 2013; Johnson, 2010).
“Relational ethics” (RE) represents yet another project founded in moral philosophy, but its similarities with CE end there. From the start, it set out to be a reaction against normative frameworks. Grounded in phenomenological anthropology, RE countered “checklist” or “rule-book” approaches to data ethics as technologies that foreclose the very possibility of ethics (Zigon, 2019). In his work, anthropologist Jarrett Zigon defines ethics as a capacity to respond to an “ethical demand” and thereby “attune” oneself to others’ worldviews and needs (Zigon, 2017). Following Emmanuel Levinas, Zigon defined the primary question in ethics to be “How is it between us?” This approach places “us” not only in immediate relation with other human beings and their situations but also in contrast to the macroethics frameworks found in CE, that is, “all those ethical theories that make the ontological assumption that human subjects exist as ethical subjects with some sort of a priori built-in capacity to act ethically—for example, to be rule followers, or moral calculators, or autonomous law givers and adherents” (Zigon, 2019, p. 1005).
Methodologically, RE invested in ethnographic descriptions of “situations” for the identification of ethical demands. Its method is that of critical hermeneutics: first, it demands the examination of a priori assumptions and, second, it points to the social and political possibilities that are opened up by questioning them (say, a particular way to frame the “good,” “fair,” or “just” in a given context). Consider, for example, an ethical demand that is familiar to us in the technology field: the recent reckoning with the symbolic violence of “master/slave” nomenclature in systems design. Waves of anti-racist and abolitionist protests worldwide created an ethical demand to which technologists were called to respond by repairing the harms of these terms in the context of expert communities (Eglash, 2007; McIlwain, 2020). Many experts in engineering and computing did not respond affirmatively; quite the contrary. Some reacted with silence, others with frustration and anger, but they were forced to hear the demand in several social spheres online and offline: from free and open-source projects to IEEE subcommittees, academic departments and ACM interest groups, social media networks, and beyond. Pedagogically, this is prime educational material for foregrounding questions of historical reparation and responsibility to our human and non-human others, our “response-ability” with respect to our social, technical, and ecological entanglements, situated in a specific “ecology of practices” that is everything but straightforward (Haraway, 2016).
What is particularly central for our purposes here is that critical hermeneutics has made important historical contributions to a pedagogical program for data ethics, contributions that surpass and update the early critique of symbolic AI elaborated by Hubert Dreyfus, Fernando Flores, and Terry Winograd (Dreyfus, 1972; Winograd and Flores, 2008). It grounds the teaching and research of the subject in the tradition of hermeneutic exercises and critical reflection on the historical “a prioris of thought,” in order to examine, for example, the concepts that underlie our forms of reasoning concerning the “good, transparent, and fair” in statistical learning and other areas of research for data scientists. A problem still persists in this pedagogical approach, however: its overemphasis on textual critique makes it difficult for students to stay close to other forms of representation and (re)mediation (the programming languages, scientific computing libraries, digital interfaces, and infrastructures that populate the lifeworld of data engineers and scientists).
“Relationality” is also key for sociotechnical approaches to computing, but its unit of analysis is not the relationship between Self and Other, nor the phenomenological tension and indeterminacy between subject and object or tool and tool-user, but a wider range of relational possibilities that includes non-human actors, such as artificial agents and ML-based classification models and decision-making systems. This form of sociotechnical relationality has been instrumental in course preparation for instructors who identify as transdisciplinary STS scholars. In our experience, this is a generative position to take if one is to work across disciplinary boundaries in emergent fields such as data science and data studies.
Although pragmatism is far from a consensual influence in STS, it has indisputably played an important role in our turn to sociotechnical practices (not only systems), reconnecting macro- and micro-analyses that historically developed in distinct (if not distant) traditions of social theory. This influence led to the development of a new pedagogical approach to data ethics. In parallel with phenomenology, pragmatism also played a central role in moving researchers away from a representationalist stance and toward a post-constructivist one. In this process, social scientists turned their attention to the study of science and technology while moving away from textualist approaches to digital phenomena (Table 1).
Students’ countries of origin and “race and ethnicity” categorization.
In their versions of the data ethics course, Luis Felipe R Murillo and Caitlin Wylie drew from a sociotechnical approach to design in-class activities with a pragmatic orientation, i.e., project-oriented and inquiry-based. With students, for instance, Murillo examined debates on data ethics in terms of the “modes of justification” they mobilized in contexts of practice (Boltanski and Thévenot, 1991). Students were encouraged to engage in collaborative projects as a condition for learning, by doing, the possibilities and constraints of participation in the design of digital tools and infrastructures. Based on what Turkle and Papert (1990) have called “epistemological pluralism” in computing, the classroom was conceived primarily as a space of socialization with utmost respect for different forms of (interdisciplinary) learning in data science. While valuing the importance of abstractions in computing, their histories, and their contexts of application, our sociotechnical perspective on data science education was not limited to the role of (professional and lay) representations. To address any antagonism between the humanities and STEM fields, Murillo and Wylie made a special effort to contest the perceived divisions between the social and the technical in class and in curricular activities. This contestation was framed in a non-adversarial, collaborative way: it was through the actual experience of a technical object and its use that students began to engage questions of the historicity and social implications of data work. In what follows, we will first describe our (distinct) experiences of teaching data ethics. We will then proceed to discuss a set of concrete recommendations for curricular activities that bridge classroom activities with broader public and professional concerns.
How were these three pedagogical orientations put into practice in the classroom? What recommendations can be drawn from these pedagogical experiences? (Table 2). The first version of “Ethics of Big Data” was conceived by Deborah Johnson in 2015 and taught until 2017 in various iterations, which included her collaboration with Phil Bourne. Johnson described the course as being “first and foremost an ideas course” that was meant to provide students with a way of discussing emergent issues in computing through ethical concepts and theories. According to this pedagogical orientation, “knowledge (of codes and standards), skill (the ability to identify ethical issues), reasoning (the ability to make moral decisions), and motivation (the will to take action)” should be the aim of our instructional efforts (Johnson, 2017, p. 64). Class exercises involved the construction of scenarios to exemplify ethical dilemmas that students identified. Based on these scenarios, the instructor introduced key concepts in computer ethics and helped students rehearse responses from multiple standpoints. These responses were elicited by normative frameworks for research and professional ethics. The course drew, for example, from the debate on human subject research that ensued from the publication of Kramer et al. (2014) on the psychological manipulation of Internet users to study network effects of “emotional contagion,” (mis)conducted by researchers from Facebook and Cornell University (Selinger and Hartzog, 2016).
Data ethics approaches.
Subsequent versions of the course included guest speakers who were either domain experts or scholars working on a particular topic of data ethics (such as criminal justice, health care, and privacy). Course material included debates about research ethics (with an overview of the Belmont Report of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research and of the ACM Code of Ethics and Professional Conduct), chapters of Deborah Johnson's influential textbook on computer ethics, and a collection of articles on ethical issues by philosophers (Helen Nissenbaum, Deborah Johnson), legal scholars (Tal Zarsky, Neil Richards, Danielle Citron, Frank Pasquale), and media researchers (such as danah boyd, Kate Crawford, Tarleton Gillespie, Siva Vaidhyanathan). Assignments were designed around cases that were examined in the literature. Students were asked to draft a “code of conduct” for data scientists and apply their own guidelines to their “capstone projects,” drawing from the extensive experience of the instructor in this area.
After two years of teaching, the instructor reported having experienced strong resistance from students who disputed the curricular decision to make it a required course. One student doubted in their review whether the instructor “could name a single algorithm or pass an introductory statistics class.” Along the similarly cruel and unwarranted lines that online reviews enable, another student expressed that they “learned neither data science nor ethical concepts” and that the instructor struggled to teach the “intersection of data science and ethics.” This tension surfaced not only in the reviews but also in classroom interactions. The instructor suspected that this resistance had to do with marked gender and age differences. In addition to the challenge of engaging public debates on socioeconomic inequity and injustice in the classroom (a recurrent problem identified by colleagues at other universities), the fact that data science cohorts came from several disciplines (as varied as economics, biology, mechanical engineering, physics, computer science, and literature) made teaching critical data science even more challenging. This particular difficulty has been identified by data science instructors at other schools as well, such as Stoyanovich and Lewis (2019), who proposed a “differentiated pedagogy” to handle the variability in background and preparation for data science training. Johnson's report is consistent with the course reviews. Even though reviews are not a reliable portrait of learning outcomes, they can be used as proxies for the expectations that students bring to the classroom with respect to data science, a burgeoning field that has been justifiably criticized for its inability to redress its own societal effects and impacts (boyd and Crawford, 2012; Mittelstadt et al., 2016).
Overall, this iteration of the data ethics course ranked lower than the other, technical courses in our program: 2.68/5.0 versus an average of 3.36/5.0, on a scale of 1 (poor) to 5 (excellent). It also received mixed feedback from students: some found it satisfactory (32%), while others did not (48%) or expressed a neutral position (20%).
Despite the challenges we faced in the first iteration of our course, we can still derive important lessons and recommendations from it. One of them is to continue to take advantage of the dialogic approach to engage pressing issues of data ethics that challenge students to rethink the assumptions they bring to the field. In our experience, there is at least one major issue per week in the international news that could be brought to the classroom for examination. Interfaces between the classroom and the public sphere can be built more effectively if guest speakers are invited to classroom debates more often (a model that we started in the first iteration of the course and continued to pursue in later ones). Another lesson that we can convert into a concrete recommendation is to identify the tensions between the technical, the philosophical, and the social early on in the course, so that they can be addressed by a combination of educational materials and pedagogical activities (from journalistic and academic texts to hands-on coding exercises annotated in research notebooks, the examination of codes of conduct and the responses to them in the public sphere, and the invitation of researchers and practitioners to speak about these matters to the students from their professional standpoints). These recommendations are meant to account, first and foremost, for the disciplinary and sociocultural differences represented in the classroom (Table 1) and to prevent incidents of discrimination before they create an adversarial environment.
The second iteration of our course was taught between 2017 and 2019 and had a different philosophical and pedagogical orientation. Case studies were replaced with close readings of the literature on the anthropology of morality and ethics, critical data studies, and the journalistic literature on predictive policing, privacy, and algorithmic discrimination. The course was structured as a humanities seminar with weekly readings, presentations, and debates. Students were directed to reflect critically on the “natural attitude” of data scientists with respect to the statistical treatment of bias, “raw data,” and the ethical implications of large-scale data analytics. Each weekly module included a series of articles, book chapters, or blog posts and involved close readings and debates, plus regular exercises in which students were asked to respond to a prompt, such as “What does it mean for data to be ‘human data’?”, “Is there such a thing as ‘raw data’?”, or “Can a regulatory approach solve issues around privacy and surveillance? What might be the limits of a regulatory response?” The instructor's pedagogical goal was to “[collectively] develop an understanding of data ethics that will allow us to reflect critically on and respond to the challenges and opportunities of data science today” (Zigon and Lengen, 2019). The relational approach informed the selection of the materials for class debates and exercises as an exploration of the “broader, cultural, political, economic, and social conditions in which data science takes place.”
It is worth noting that an extra effort was made in this version of the course to connect assignments with capstone projects, in which students formed groups to work on a data science project with a client. Most clients at that point were from the private sector. Over time, however, the school made a concerted effort to bring in community projects and non-profit organizations as capstone sponsors (such as Wikipedia, the Metropolitan Museum, and the Internet Archive), with help from a Wikipedian-in-Residence, Lane Rasberry. This effort has proven very fruitful in connecting in-class debates with technical practices in capstone projects at various levels: from data acquisition, licensing, sharing, and management to questions of design, representation, interpretation, and interpretability in ML-based classification systems. It also had the benefit of helping us socialize students into the practice of critical data science with open technologies in the public interest.
After two years of teaching this version of the course, the instructor reported positive reviews from the students, who found themselves more engaged with the material and the issues they identified in their projects. Students reported being mostly satisfied with the course (68.97%), whereas smaller groups reported being dissatisfied (10.34%) or neutral (20.69%). The overall score also increased relative to the previous iteration: 3.71/5.0, versus an average of 3.86/5.0 for the other, technical courses. Students still identified a clear qualitative gap between the presentation of the debates and their experience of learning technical aspects of the field in statistics, text, and data analytics courses. One student suggested that the data ethics course be integrated with “practice and applications of data science” (a practicum in data science that students take before venturing into more advanced topics of scientific computing and statistical learning). Another student described the experience as “very inspiring,” but noted that “readings for the classes were too lengthy and the nature of the topics (morality and ethics) made them a little hard to read,” even though complex concepts were regularly covered in lectures and engaged in group discussions.
One of the main shortcomings of the course, according to the instructor, was that technical aspects were mostly glossed over, creating little opportunity to integrate a critical examination of the actual practices of data science in its various stages of data acquisition, processing, analysis, and presentation. This disconnect, we found, was fundamentally a curricular one: at that time, our “data ethics” course was on the conventional path to becoming the typical island in a sea of technical concerns that are fundamentally sociohistorical and political, but cryptically so through the usual technoscientific act of ontological division between the “technical” and the “social.” Most of the technical work was offloaded to capstone meetings, with little connection to the course's pedagogical plan other than the preparation of reports on “ethical issues” in students’ projects.
From this second iteration of the course, we identified additional lessons that can be converted into concrete recommendations. One of the major realizations for us was that, despite our efforts to the contrary, the epistemic, technopolitical, and institutional “grand division” between the humanities, social sciences, and engineering disciplines tends, invariably, to reinstate itself if we do not work proactively to establish transdisciplinary interfaces. As a pedagogical orientation, we must start by creating conditions for the division not to be re-enacted, but to be challenged in every pedagogical activity. This cannot be done by one isolated course, we learned; it needs to be integrated with other courses in the curriculum. From this effort follows another concrete recommendation for building interfaces between our classroom debates and readings and the actual practice of data analytics: support our students’ technical cultivation with the ability to see the sociotechnical entanglements and implications of their work, which is to say, to see their tools, infrastructures, and data practices as part of a sociotechnical ensemble with reciprocal relations and dependencies (just like the dependencies we resolve in software and architecture across various levels of computation; just like the obligations we have in our collective lives to give, receive, and reciprocate on and offline).
For the third iteration of our course, we tried to integrate sociotechnical and historical aspects of data science with a hands-on, practical approach. While maintaining contributions from CE and RE, we opted for a closer engagement with technical debates and practices that were accessible to the students. In addition to the course materials included in the previous iterations, we added sources from the media (e.g., examples of data journalism from ProPublica, COVID data visualization dashboards from around the world), ongoing debates on moral values in data science (Meng, 2020), and explainable AI (XAI) frameworks (Bellamy et al., 2018; Ribeiro et al., 2016), as well as pieces on the social history of computing, critical race and law studies, science and technology studies, and privacy and surveillance studies (Benjamin, 2020; Crenshaw, 1989; Eubanks, 2020; Leonelli and Tempini, 2020; Lyon, 2007; Schneier, 2016; Zuboff, 2019). Our primary goal was to balance technical and humanistic references, showing how interconnected they are not only topically but in the very practice of data science.
Given how topically diverse the literature on data ethics is, we maintained a modular structure for the course to function as a survey of the different domains of research and practice (Table 3). When teaching explainability and interpretability, for example, we combined the presentation of XAI frameworks with articles that engage the societal implications of opacity. We reviewed popular implementations, such as AIF360 and LIME, and gave the students the choice of preparing either a notebook mixing code and textual analysis or a regular essay applying concepts from class, such as “opacity,” to the ML application they were developing in their capstone projects. It is worth noting here that our discussion of ML was prefaced by a critical detour through the history of AI to examine different phases in which the research field clashed with philosophical and anthropological critiques (Agre, 1997; Dreyfus, 1972; Forsythe, 2001; Winograd and Flores, 2008). The exercise was meant to help students develop the habit of placing their computational tools and infrastructures in perspective and thinking about them in historical terms. To do so, we combined the debate about AI with discussions of “ontological design” as an important moment in the history of the field (when phenomenology served as a critical influence).
Course modules (2015–2021).
To provide another example of integrative and translational curricular activity, our module on gender, ethnic, and racial inequity examined the technical details of training computer vision applications. The project “Gender Shades” (Buolamwini and Gebru, 2018) was first introduced with a discussion of the sources of the concept of intersectionality in critical legal and race studies (Crenshaw, 1989). By demonstrating how concepts migrate from other fields to be operationalized in computer science, we encouraged students to think about their own experiences of translation in their academic and technical practices. Since many students came from fields far from data science (such as law, economics, management, and literature), we repeated the same exercise of translation in other modules, balancing the sociological and technical materials as much as possible, such as the module on algorithmic decision-making in which we focused on the recidivism risk assessment model COMPAS alongside the literature on the history and sociology of mass incarceration in the USA (Wacquant, 2009a, 2009b; Harcourt, 2007). While we did not work with this sociological literature in our assignments, it substantially oriented our class activities and debates. We asked students, for example, who should be held accountable for the racial and class-based discrimination caused by such systems. Similarly, when we studied the effects of micro-targeting on liberal democracies and the spread of misinformation, we prompted students to reflect on the use of recommendation engines for political manipulation and to suggest alternatives, while considering issues of privacy and consent, transparency, and equity as key elements of technopolitical design. This discussion was folded into the broader debate about the relationship between digital infrastructures and political power in subsequent modules.
In this iteration of our data ethics course, we continued to connect class activities with a speaker series held as part of the research activities of the Center for Data Ethics and Justice, first designed and directed by Jarrett Zigon as a space for interdisciplinary work on relational ethics. Our speaker series created opportunities for students to meet face to face with researchers and educators in the fields of feminist STS, critical data studies, sociology, and computer science, such as Sareeta Amrute, Brandeis Marshall, Fran Berman, Ruha Benjamin, Lauren Klein, Virginia Eubanks, and Desmond U Patton. We introduced students to important venues for critical data ethics, such as the conferences and webinars hosted by AI Now, Women in Data Science, Data and Society, Data4BlackLives, and the Grace Hopper Celebration of Women in Computing. Students were also made aware of the FAccT conferences and encouraged to follow current debates through papers, online resources, and social media channels, which we brought to the classroom. We assigned “think pieces” connected with students’ experiences in these spaces of socialization, creating ample opportunity for them to reflect on the moral and political assumptions embedded in data analytics. One student compared this experience with other courses they took in the Science, Technology, and Society program at the university: “having taken several STS courses in the past, this felt like a perfect continuation of what was taught there, just from the point of view of a data scientist.”
This version of our course was better received by the students. Many of them were particularly enthusiastic about the possibilities we created for branching out of traditional engineering topics. We have received messages from students who reported having turned down offers from companies they identified as contributors to mass surveillance. The overall scores of the course represented a small increase in relation to the previous iterations, and the course ranked slightly higher than other technical courses (4.6/5.0 versus 4.1/5.0, respectively). Student reports were also more positive with respect to the integrative effort: 84.22% declared themselves satisfied with this version of data ethics, whereas 5.26% did not and 10.53% were neutral. One student reported that this iteration was a “rare example of a course that made efforts to adapt to its students, rather than the other way around, and I think every student appreciated this.” They also expressed surprise at encountering an approach that transformed their understanding of the practice of data analytics: “this is one of the courses that will end up being so meaningful in ways I don't yet realize as I begin work in data science.”
Despite the overall positive responses we received, our course continued to struggle with a major shortcoming: we were offering it to a diverse group of students with a literature overwhelmingly concentrated in Euro-American debates, ethical frameworks, and historical experiences. Developments in this direction, fortunately, have started to be published, with empirical studies of computing outside the Euro-American axis (Amrute and Murillo, 2020), calls for “decolonial AI” research and transnational feminism (Mohamed et al., 2020; Tacheva, 2022), and examinations of data ethics that are not grounded in the liberal tradition (Mhlambi, 2020). As a recommendation for future data ethics courses, data science instructors should incorporate these new directions in scholarship to address the transnational nature of our communities, tools, and infrastructures (as well as their distributed and, at times, hardly traceable impact) beyond the Euro-American context.
Finally, we identified another key recommendation for data ethics courses: anchor class activities in the analysis of techniques and technical objects, and from there discuss their associated context, not the other way around. In our experience, this does not mean introducing a particular programming language with simple exercises in descriptive statistics (using R or Python with their popular packages and modules), but expanding the integrative effort to the activities students are engaging in other courses. We found helping students identify the sociotechnical collective behind their technical practices across curricular activities to be not only the most urgent task but also the most rewarding one for them. We made this possible, for example, by opening a laboratory for experimentation with the open technologies we introduced in class. This laboratory (librelab) offered an informal space for hands-on experience with Open Hardware technologies for data acquisition, with a strong emphasis on community science and environmental justice. In the process of learning more about the history of their tools, students learned to see “data” as a diacritic in a process of ontological categorization, collection, pre-processing, transformation, simulation, analysis, and visualization with serious consequences for concrete people in concrete sociotechnical settings. Students were guided to identify the very act of abstraction of human relations in computational orders and, conversely, the social order in their computational interfaces. We suggest that this can be achieved pedagogically through a non-adversarial approach to the technical, the social, and the philosophical, with class activities that bridge data science instruction with social research on the perils of digital automation and mass surveillance.
This is particularly fruitful for helping students resist the trend of data science becoming a (profitable) “solution in search of a problem” and instead redirect its potential toward addressing critical socioeconomic and environmental problems.
In this article, we described three non-rival examples of critical data ethics. We suggested that the cross-examination of our pedagogical practices is key for transforming the present and future of data analytics. Here, we have modestly taken a first step in outlining experiences and recommendations based on a case study in which data ethics has been conceived and taught for a considerable amount of time, given how recent data science programs are worldwide, in order to illustrate broader trends. We hope this will be useful to our colleagues in other institutions doing curricular work, but also to researchers willing to contribute more actively to pedagogical activities within and beyond educational settings. There was no presumption on our part to offer an evolutionary ladder of data ethics education. We see the three approaches we described here as non-adversarial and supplementary.
Among the perspectives we described, there are take-away lessons to highlight. As we discussed, computer ethics pioneered an interface between philosophy, computer science, and engineering through its early contribution to the debate surrounding the importance of codes of conduct. CE was the first attempt to respond to the call for addressing the “policy vacuum” in technology ethics, which, as discussed above, is still very much part of the debate in critical data studies. Relational ethics constituted a reaction to the normative aspects of CE and converted the question of ethics into an empirical inquiry on intersubjectivity, reminding technologists of the importance of attending to the historical a priori of their design choices but also, and more importantly, to the inescapable relationality that shapes their experiences of (and within) sociotechnical systems. It is imperative in this approach for us to identify the human face behind the computational interfaces and infrastructures. Both approaches have clarified, albeit in distinct ways, what “ethics” could mean in normative and experiential terms. And, yet, both have encountered the limits of engagement with students in STEM fields. This is what prompted us to bring the translational interdisciplinarity of STS to the fore in our approach.
In our experience, sociotechnical approaches to data ethics are more suitable for devising integrative, translational, and horizontalist pedagogies for critical data engineering and science. The influence of STS can be found in the distinct programs of CE and RE, but it was not foregrounded as a major approach until more recently. Based on sociotechnical studies of computing, the emergent literature on FAccT constitutes an important source for transformative scholarship in critical data science as well. But it is far from sufficient, given the societal relevance that computing has assumed in mediating social, technical, infrastructural, and epistemic relations. We need STS to be integrated within data science to better address and redress the production and reproduction of socioeconomic inequities that are part of the associated milieu in which digital systems operate. Valuable lessons can be found in STS on how disciplinary boundaries are made and remade and how objects of inquiry circulate to establish affiliations and disputes and, eventually, reorganize epistemic divides (Star, 2010; Edwards, 2010; Ribes, 2019). This is part of a fruitful meta-research tradition that could assist our educational programs as we build interfaces across domains of practice for critical data studies. In the process, we can create conditions for students to learn by doing through the translation of concepts, with a pragmatic take on the ways we conceive of our tools and infrastructures for computational work.
We are just beginning to envision what critical data ethics could become as data science is institutionalized with the creation of research and educational centers. It has become increasingly, albeit counter-intuitively, evident that a vast domain for action research is now opening up to respond to the mechanisms that produce and reproduce inequities through “roboprocesses” that automate bureaucratic engines (Besteman and Gusterson, 2019). This work features scholars from critical legal and race studies, digital media studies, and social informatics alongside more traditional data scientists from STEM disciplines, such as physics and biology postgraduates transitioning to the computing industry, computer scientists seeking statistical training, and professional statisticians who have come to spend more time in software development than they would like to admit. We suggest that it is in this unlikely encounter across disparate disciplines and technopolitical orientations that data science finds an unparalleled opportunity to reinvent itself through critical pedagogical practices. It all starts in the classroom linked with the community workshop, where we get to teach and learn data science because a distributed, international collective exists to create and share its tools and lessons with us in the first place.
Acknowledgments
We would like to thank the editor and our anonymous reviewers for their help in improving this article. We are also grateful to Deborah Johnson for laying the foundations for the work we report here. We first learned about “data studies” through the pedagogical work of Joseph Dumit.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
