The Ethics of Using Digital Trace Data in Education: A Thematic Review of the Research Landscape

Abstract

This article presents the findings of a systematic qualitative analysis of research in the ethics of digital trace data use in learning and education. From the resulting analysis of 77 peer-reviewed studies, we (1) map the characteristics of research by study type, academic community, institutional setting, and national context; (2) identify the primary ethical concerns and related responses; and (3) highlight the research gaps. Four areas of focus are identified in this emerging area: (1) privacy, informed consent, and data ownership; (2) validity and integrity; (3) ethical decision making; and (4) governance and accountability. We highlight the lack of evidence particularly for preschool and school-aged children and the disparate communities working in this domain, and we suggest a more cohesive approach, where the wider learning and educational ecosystem is recognized, explicit engagement with ethical theory is central, and mid- to long-term ethical issues are considered alongside immediate concerns.

Keywords

digital trace data big data ethics education learning analytics

Digital trace data—the data generated when people use digital technologies—are playing a progressively central role across informal and formal education contexts, such as the home, school, museums, and higher education institutions. In an educational system now mediated by digital technologies, data are being generated, recorded, manipulated, and distributed on an unprecedented scale and scope. These kinds of data have been incorporated into a variety of learning scenarios and can be seen, for example, in the creation of real-time learning dashboards, the use of predictive analytics of learner performance, the development of educational apps and games, and data-driven decision making in education institutions. Digital trace data have been heralded as a means to improve the quality, efficiency, and, in some cases, equality of educational systems.

Despite the evidence for these claims remaining a significant focus of debate (e.g., Selwyn, 2015), digital trace data are now a feature of many education systems across the globe. At the same time, there is increasing attention to the ethical considerations surrounding the use of such data for learning and education. Such considerations include privacy and data protection as well as the assumptions that drive algorithms used in data analytics. These ethical challenges must be addressed across the complex networks of actors of the educational landscape—between policy, politics, administration, schools, commercial software providers and third party data “mediators,” who are often responsible for technical data services around software and databases. This context raises important questions about legal regulation, governance, agency, and who (and for what) each actor is accountable when it comes to ethical practice.

Although these issues have attracted debate for some time, answers to these questions are becoming increasingly pressing at the time of writing due to the significant switch to the use of digital technologies to provide access to learning opportunities during the COVID-19 (coronavirus disease 2019) pandemic. Although these moves were made to solve an immediate problem, there are likely to be long-lasting impacts on the use of technology, including the ethical implications of the use of digital trace data in education that require academic scrutiny (Williamson et al., 2020).

While the body of literature addressing and responding to these ethical concerns is certainly growing, research fields in these areas are still emerging. There are few integrated and coherent attempts to map ethical concerns and challenges related to the use of digital trace data in learning and education. Different researchers are exploring the issue within particular academic communities or focusing their analysis within a particular stage of education or stage of development, and through use of particular and varied terminology.

This article details a systematic search of the academic literature relating to digital trace data and ethics in education and learning. The review seeks to identify and analyze all relevant conceptual and empirical work in the field, with a view to identifying the key ethical issues and their social implications, any responses to such ethical issues (including guidance and frameworks), and areas for further research and policy development. Given the emerging nature of this topic, we address three key aims: (1) to map the landscape of research in the ethics of digital trace data use in learning and education (specifically by study type, academic community, institutional setting, and national context); (2) to provide a thematic analysis of the ethical issues and responses that are highlighted in existing research on the ethics of digital trace data use in education and learning; and (3) to identify research gaps and an agenda for future research in this area.

Digital Trace Data in Education

What Is Digital Trace Data?

For decades, education researchers have used technologies to capture data. However, there has been an exponential interest since the early part of this century in the availability and potential to use “digital trace data.” Digital trace data are the data that are generated as people interact with any kind of digital device. For example, when students complete an assessment, read a page, write a response to a forum post, or compose an essay, digital systems capture an array of data about these activities. Such data may include the time taken to carry out the various stages of the task, patterns of engagement, who responds to whom, what was written and edited out, and what images were uploaded and when. These and other digital activities, that in the past were often not directly measurable by researchers, can now all be routinely captured by digital systems—potentially offering insights into how students are thinking or understanding a particular concept (Regan & Jesse, 2019). These forms of data relevant to learning are not only captured by keystrokes but can include other kinds of engagements with digital devices such as movement (e.g., via the growing use of activity trackers), speech (e.g., providing commands to digital “personal assistants”), or eye movements and facial expressions (e.g., captured via webcams or smart glasses), and include the metadata from each interaction (see Boellstorff, 2013). Such data can be generated within a particular system (e.g., maths game, learning management system, search engine, social media site) but this data collection is not confined to one site or one location. The use of cookies, IP addresses, and the availability of GPS tracking enables activities to be further tracked across contexts and time (Mayer-Schönberger & Cukier, 2014; Regan & Jesse, 2019).

The examples summarized above demonstrate that digital data may originate from three principal sources. First, data can be deliberately collected, which Kitchin (2014b) terms directed data, such as the measurement of pupil progress against particular benchmarks or standards, as well as more covert forms of institutional monitoring, surveillance, and evaluation of individuals. In each case, the nature of the data collected has been predetermined by a person or institution. Second, data can be gathered and archived automatically through the ‘routine operations” of digital devices and systems, including, for example, patterns of engagement within learning platforms (Selwyn, 2015). Third, data may also be input by users of digital systems, for example, by providing information as part of personal “profiles” on social media, or participating in discussion forums (Kitchin, 2014b). As technology becomes increasingly embedded in everyday life and learning, there has been an exponential rise in the use of the latter two sources of digital data.

These data, that we and others have called digital trace data, can also be referred to as Big Data, and is viewed as uniquely different to previous forms of data collection. Kitchin (2014a) characterizes such data as huge in volume, consisting of terabytes or petabytes of data; high in velocity, being created in or near real-time; diverse in variety, being structured and unstructured in nature; exhaustive in scope, striving to capture entire populations or systems; fine-grained in resolution; and flexible, so that it may be extended and expanded easily (Kitchin, 2014a; see also boyd & Crawford, 2012).

While the majority of these data will be generated in its “raw” form of alphanumeric symbols, it can then be processed and manipulated through “data work” (Selwyn, 2015), when data are processed into “more socially meaningful ‘data entities’—i.e., representations, models, calculations and visualisations” (Kitchin, 2014a, p. 65). As such, the term Big Data are used to refer not only to the sheer size of datasets themselves but also to their internal complexity and the complexity of associated analytics (Clayton & Halliday, 2017).

Thus, there are a number of properties of digital trace data that are distinctively different from other kinds of data that we typically see in education (Eynon, 2013; Selwyn, 2015; Williamson, 2017b). Such data are particularly valuable in collecting real-time longitudinal data about learning processes that tend to be difficult to capture in other forms of education research. It enables the opportunities to link multiple forms of data together to understand learning and education, and lends itself to more computational approaches to analyses (Pardos, 2017). These characteristics present distinct ethical challenges which we discuss below.

For the purposes of this review, we deliberately use the term digital trace data to reflect these trends instead of the term Big Data that reflects a current “hype cycle” in education and is likely to date quickly.

It should also be made clear that academics from many social spheres are exploring the ethics of systems that use digital trace data. In particular, the Human Computer Interaction (HCI), Computer Supported Cooperative Work (CSCW), and Computer-Supported Collaborative Learning (CSCL) communities are areas where substantial work on the ethics of the use technology has already been explored, particularly in terms of the ways ethics can be incorporated into the design of technological systems (see, e.g., Friedman & Kahn, 2003; Gray & Boling, 2016; Shilton, 2018). In addition, Science and Technology Studies (STS) communities have conducted substantial work on data bias and algorithmic accountability, especially focusing on algorithmic bias, to understand, and ultimately challenge, the kinds of data-driven inequality and discrimination that can play a part in many spheres of social life (see, e.g., Ananny & Crawford, 2018; Martin, 2016). Furthermore, new methods across the social sciences (e.g., “trace ethnography,” which is premised on extracting information about users and their actions, and subsequent analysis and manipulation of these data to understand and describe behaviors more clearly; Geiger & Ribes, 2011) are also encouraging researchers to examine ethical practice in relation to the use of digital trace data.

The purpose of this review is to establish the evidence base and conceptual discussion within the particular fields of learning and education. There are two academic communities engaged in such debates: those from data analytics (learning analytics, educational data mining) and sociological scholars of datafication and digital governance.

Data analytics

The increasing prevalence of digital data work within education is apparent in the growing adoption of data analytics, often termed learning analytics: “The measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs” (Long & Siemens, 2011, p. 34). Broadly speaking, this body of work has originated out of the learning science community, and those particularly focused in understanding online learning in the context of higher education.

Interactive digital educational tools, such as those mentioned above, generate immense quantities of granular information or “digital trails” about students, including view and download commands, start and end time, time on task, and correctness. Although not yet operational in most systems, applications that can monitor bodily movements and indicators such as heart rate, eye movement, and facial expressions already exist and can also provide data concerning students’ physical reactions while performing educational tasks (Effrem, 2016).

Analytics dashboards now appearing in most virtual learning environment/learning management system online learning platforms present a range of graphs, tables, and other visualizations used by learners, educators, administrators, and data analysts. Learners may receive basic analytics such as how they are doing relative to the cohort average (e.g., test scores, forum contributions) as well as summaries of their other online activity, such as library loans, purchases, and participation in social networks (Harel Ben Shahar, 2017).

Learning analytics can thus provide learners with insight into their own learning habits and give recommendations for improvement, or, for example, help identify “at risk” learners. Through learners’ static (e.g., demographics or past attainment) and dynamic (e.g., pattern of online logins; quantity of discussion posts) data, researchers have tried to “classify” learners’ trajectories (e.g., “at risk,” “high achiever,” “social learner”) or predict which students are likely to succeed or drop out of a particular course (Dodman et al., 2019) with a view to offering more timely interventions (e.g., extra social and academic support; more challenging tasks). As such learning analytics is focused both on the processes and outcomes of learning.

Educational data mining (EDM) is an emerging discipline that draw significantly from methods in computer science and engineering (Romero & Ventura, 2010) to optimize learning systems, where optimization typically relates to learning efficiency (Eynon, 2013). Such approaches may not always provide insights into the process of learning (which can often remain as a “black box”) but can nonetheless offer insights into how to improve outcomes. Many of the methods used are similar to those in learning analytics, although EDM historically has a greater focus on automated methods and learning analytics has a greater focus on human interpretation and visualization (Siemens & Baker, 2012).

Indeed, as both communities mature and collaborate, the distinctions between EDM and learning analytics are becoming less clear. As such, the use of data analytics within education can be seen to serve purposes for learners, teachers, administrators, institutions, and policymakers. These include providing feedback to teachers and learners about their understanding of key concepts, likely exam performance, and dropout risk (e.g., Clayton & Halliday, 2017); seeking to improve student success through predictive models and visualizations (Essa & Ayad, 2012); creating a shared understanding of the institutional successes and challenges through more transparent data and analysis (e.g., Dietz-Uhler & Hurn, 2013); finding unexpected patterns in the data, beyond those detectable through human observation (boyd & Crawford, 2012); and improving administrative decision-making and organizational resource allocation (Rios-Aguilar, 2015). The precise ways these goals can be achieved may include learning analytics and/or EDM despite each community having its own distinctive parameters.

Digital education governance

A quite separate academic community are those from the Sociology of Education, who have a relatively long history of discussion and critique of the growing “datafication” of education. Scholars from this community have argued that education has been subjected to a process of “digitally driven datafication,” whereby schools, colleges, universities, and commercial institutions now collate masses of digitized data on a daily basis and function increasingly along “data-driven’ lines. While there has been an intensification of more “traditional” digital data recording (e.g., surveys, exams, collection of performance data, etc.), there has also been a broadening of the use of digital trace data collected by digital devices. For example, Manolev et al. (2019) describe how a process of “datafication” of discipline and student behavior has taken place through ClassDojo, a classroom management and communication platform with a “gamified” behavior shaping function.

An important issue within this area of research is the significant role of commercial stakeholders within this space. Companies such as ClassDojo, along with Google, McGraw Hill, Knewton, and Pearson, are using data-driven methods in the educational products they market to educational institutions or to parents/guardians. In addition, data are at the heart of the business model of many other mainstream platforms that young people may use for learning such as YouTube, Facebook, or Twitter that can be used to facilitate informal learning, or indeed have become reconceptualized as learning spaces and “course management systems” for schools and higher education institutions (Tess, 2013). This is taking place in an environment where digital datasets have become increasingly commercially valuable and appropriated by the digital knowledge economy (Zuboff, 2019).

The growing commodification and commercial value of digital data sets, and their use in these domains, are blurring the boundaries between the roles of the private and the public sector in determining educational policy and practice. Internet empires—such as Google, Apple, Amazon, and Facebook—now have major control over the ways in which knowledge is generated and accessed (Andrejevic, 2013; Kitchin, 2014b; Mayer-Schönberger & Cukier, 2013; Williamson, 2016a). Such observations have important ethical consequences, as the ethical practices of companies tend to be less transparent than public institutions, which, in turn, can have significant implications for public trust. Digital data are not neutral and objective truths, but rather the products of human decision-making and technological design features that are embedded within wider social, cultural, and economic arrangements (e.g., Cheney-Lippold, 2011). There are thus further implications concerning whose judgments are informing and supporting learning processes and educational practices. The ethics of this direction of travel requires significant and careful scrutiny.

The Ethics of Digital Trace Data

There was early acknowledgment from scholars and practitioners of digital data that the use of digital trace data in education and learning posed significant ethical challenges (e.g., Eynon, 2013; Ferguson, 2012). Before examining this work it is useful to consider existing approaches to ethics in education.

Traditional approaches to ethics in education

There are well-established ethical guidelines across the field of education research. These include the guidelines on ethics from British and American Educational Research Associations (BERA and AERA) and the Australian Association for Research in Education (AARE).

However, while there has been some recognition by professional associations in the field of education about the need to examine the ethical implications for research using digital data (e.g., a useful bibliography provided by BERA), to date there have been limited professional resources for researchers in the field of education who are engaged in the collection and use of digital data, especially those focused on the use of digital trace data, to guide their ethical choices (Moore & Ellsworth, 2014).

A new ethics? Returning to ethical theory

West et al. (2016) suggest that engaging in an ethical decision-making process should prompt consideration and acknowledgement of our values (which each individual/institution has) and context (legal/cultural/political/economic) to identify the ethical issues relating to the decision-making process. Ethics as a discipline has deep and contested theoretical groundings. It is thus important to acknowledge the position and value base from which ethical issues arise in order to examine their relationship with data use in education.

There are a number of well-established schools of ethical thought, which serve as “alternate lenses with which to view specific contexts, individuals, and social relationships” (Gray & Boling, 2016, p. 969). These strands include (1) deontological, (2) virtue, (3) consequentialist, and (4) pragmatist ethics (see also Becker & Becker, 2001). Each strand brings with it a rich intellectual tradition, but can be briefly summarized as follows: Deontological ethics are based on the idea of duty, connected to rights, and are derived from the work Kant, Rawls, and Ross. A deontoglogical perspective addresses human action itself, assuming that an act can be seen to be inherently good, insofar as it references a formalized set of rules. This “duty-based” perspective focuses on the motives of the human actor rather than the results of that act. In the Kantian view there is the “idea that there are moral absolutes in the world . . . things which are quite simply wrong, morally, just as there are things which are wrong logically or arithmetically” (Reamer, 1993, p. 30). In the context of data in education, it could be argued that people have the right to data privacy, to know that their data are being collected, and/or that they have the right to access and amend their own data. Yet rights (e.g., the basic human right to privacy) are interpreted differently in different jurisdictions (Reidenberg, 2000), as well as individuals’ expectations and beliefs about these rights, making use of digital data across in education in different countries challenging.

Virtue ethics date back to Aristotle. As Reamer (1993, p. 41) explains, “An action is the right thing to do, not because of some sort of calculation as to its consequences, or because it concurs with a ‘duty’, but because it is consistent with virtue.” As an example, Clayton and Halliday (2017) discuss the way in which digitization threatens social integration (a virtue):

Schooling and university is largely about training for citizenship in ways that require interaction with peers from different social groups. . . . If digitisation threatens the sharing of space in the learning process, it may destroy one of the most valuable means through which societies pursue integration. (p. 299)

Consequentialist ethics can be understood as addressing the consequences of an act as the primary basis for ethical analysis. This perspective, as set out by Jeremy Bentham and John Stuart Mill in the form of utilitarianism, focuses on the effect of an act on society has been, often focusing on the potential of an act to affect the happiness of everyone. Consequentialist ethics thus prioritizes the social group over the individual. From this perspective, the benefits to the various players are considered and action is related to benefit for the largest number of people (Beauchamp & Childress, 1994). For example, if the majority of students benefits from presenting progress data to assist them in their progression, then all students should provide their data for this purpose. Willis (2014) suggests that much of the current state of ethics in learning analytics has a basis in utilitarianism, which says that we ought to do what causes the most good for the most people. Willis (2014) highlights the inescapable connection between utilitarianism and the law, as both attempt to provide a means to take individual information and serve the greater good. Willis (2014) suggests this is why legal frameworks and ethical discussions are generally dealt with together.

However, Willis (2014) suggests that utilitarianism has pragmatic problems, especially where implementation creates conflicts between individual users and larger groups and does not account for difficulties in predicting consequences. Discussions of learning analytics and ethics are problematic, because of their binary approach: actions are deemed either “right” or “wrong,” most often in legalistic terms (see also Kay et al., 2012). Willis (2014) suggests that determining the most good for the most people, a utilitarian perspective, is a responsible way forward, but often difficult to achieve when faced with unintended outcomes. Indeed, scholars have recognized that the ethical domain is one in which power is contested: “Who gets to decide what ethics is will determine much about what kinds of interventions technology can make in all of our lives, including who benefits, who is protected, and who is made vulnerable” (Moss & Metcalf, 2020, n.p.).

As a response to these problems, there has been a rise in what can be grouped as pragmatist approaches to ethical scholarship. The introduction of feminist theory has, for example, strongly influenced the development of care ethics, which prioritizes the interpersonal domain of interaction over notions of consequentialism (Gilligan, 1982). From this perspective, individuals are understood to have varying degrees of dependence and interdependence on one another, and through this ethical lens it is possible to explore the power relationships and the relative vulnerability of actors. Individuals affected by the consequences of one’s choices must be considered in proportion to their vulnerability, and contextual details determine how to safeguard and promote the interests of those involved. This perspective accounts for acts as facilitating the giving or receiving of care, emphasizing the interpersonal and affective nature of ethics in individual contexts (Gray & Boling, 2016). In educational contexts, care ethics help set the limits of what may or may not be harmful for students to know, especially when in the theoretical construct of predictive analytics (Willis & Strunk, 2017).

The particular ethical challenges of digital trace data

There are some characteristics of digital trace data that present particular ethical challenges. As technology evolves as a space for learning, the methods necessary to capture and document such activities are also emergent (Eynon et al., 2017). Indeed, legal frameworks often trail behind technical innovation; thus, the legal advice that serves to inform a utilitarian approach may not be sufficient or appropriate (Markham & Buchanan, 2002, 2012). For example, scholars must consider the relevance of the human subjects model versus a more humanities-based model to guide ethical decision making. In a number of contexts, digital trace data could be viewed as text as opposed to a human interaction or dialogue that requires consent from research subjects (Bassett & O’Riordan, 2002). However, this is not a straightforward decision in many cases. This is further complicated by the fact that often ethical advice on using digital data are not specific to young people (cf. Livingstone et al., 2015) or to the context of learning and education. The specific context for ethical decision making is thus of the utmost importance (West et al., 2016).

As noted above there are a number of properties of digital trace data that are distinctively different from other kinds of data that we typically see in education (Selwyn, 2015; Williamson, 2017b). The most well cited of these are the “three types or 3Vs of ‘volume, velocity, and variety’ (Laney, 2001; Zikopoulos et al., 2012)” (Kitchen, 2014b, p. 67) that are particularly valuable when researchers wish to collect real-time longitudinal data, link multiple forms of data together, and utilize advanced data techniques to capture parts of the learning or education process that have been difficult to measure with “traditional” social science methods.

These characteristics present distinct ethical challenges. The possibilities to aggregate and combine data sources lead to risks of de-anonymizing individuals (Oboler et al., 2012); and often these data are reused and decontextualized to answer new questions, but in ways that divorce the data from an understanding of its original social context, which is sometimes used as a heuristic for developing ethical practices that are socially and culturally appropriate (Eynon et al., 2017). Furthermore, such data enable data scientists to develop proxies for all kinds of categories (such as gender, race, and age) that may not be explicitly collected (or consented to) within the system itself (Regan & Jesse, 2019). Given the multiple ways these data can be used, there are also questions about the extent to which it is reasonable/possible for individuals to fully understand the implications of agreeing to the use of these data (Ananny & Crawford, 2018; Tsai et al., 2019).

To assist in developing ethical arguments that take account of this fluid and highly complex domain, Floridi (2018) describes the important interactions between digital ethics, also known as computer, information or data ethics, digital governance, and digital regulation. Digital ethics, Floridi (2018) suggests, is “the branch of ethics that studies and evaluates moral problems relating to data and information, algorithms and corresponding practices and infrastructures” (pp. 3–4). Floridi (2018) asserts that digital ethics shapes both digital regulation and digital governance through moral evaluation.

Digital governance, Floridi (2018) argues, is the practice of establishing and implementing procedures, policies, and standards for the use and management of data. This might include the processes and methods used to improve the data quality, reliability, access, security, and availability of data services, or setting out accountabilities with respect to data-related processes in a particular institution, often enacted by designated data “stewards” or “custodians.”

Digital governance may include guidelines and recommendations that overlap with, but are not identical to, digital regulation. “A system of rules elaborated and enforced through social or governmental institutions to regulate the behaviour of the relevant agents” (Floridi, 2018, p. 3). The EU’s General Data Protection Regulation (GDPR), introduced in 2018, represents a significant piece of digital regulation that introduces legally enforceable standards relating to privacy and data protection. It imposes legal obligations onto all organizations within its jurisdiction, providing they target or collect data related to people in the European Union. Harsh fines can be levied against those who are not compliant with the GDPR’s strict standards. Floridi (2018) argues that compliance is the crucial relation through which digital regulation shapes digital governance. That is to say, an institution’s digital governance policies and practices determine the extent to which it is compliant with a piece of digital regulation.

However, Floridi (2018) suggests that compliance is necessary but insufficient to steer society in the right direction. Digital regulation indicates the boundary between legal and illegal practice, but it does not cover what might be considered the best moral course of action. This, Floridi suggests, is the task of both ethics, on the side of moral values and preferences, and of digital governance, on the side of best management. This discussion is summarized in Figure 1.

Figure 1.

The interrelationships between digital ethics, governance, and legal regulation, based on Floridi (2018, p. 2).

There is thus significant need to bring together the existing advice on this topic, to understand the ethical issues that are arising as digital data are increasingly central to the educational landscape, to identify the emerging responses to these challenges in relation to both governance and legal regulation, and to question the ethical theories and values upon which they are based.

We therefore ask three questions about the emerging domain of research on the ethics of digital trace data use in education and learning: (1) What are the characteristics of research in this domain by study type, academic community, institutional setting, and national context? (2) What are the primary ethical concerns and related responses in this domain? (3) What research gaps are there in this domain and how might these be addressed?

Method

In this study, due to the new and emerging nature of the research field, and the diverse presentation of research evidence, a meta-analytic systematic review, such as Fong et al. (2017) or Dietrichson et al. (2017), has not been possible. Instead, a systematic qualitative/thematic analysis, similar to the approach employed by Tieken and Auldridge-Reveles (2019) has been conducted. The specific methods employed for this review are set out below.

Searching for Relevant Studies

Searches for relevant studies for this review were conducted using three principal databases: (1) EBSCOHost (a discovery service searching the following databases: British Education Index, Child Development & Adolescent Studies, Education Abstracts [H.W. Wilson], Educational Administration Abstracts, ERIC, Family & Society Studies Worldwide, Library, Information Science & Technology Abstracts, Teacher Reference Center); (2) ProQuest (a specialist index and full-text social sciences databases, incorporating 900 journal titles, covering subject areas including politics, sociology, education, linguistics, and criminal justice); and (3) the advanced search function in Google Scholar.

The searches were conducted using sets of search terms set out in Table 1a and limited to those studies published since 2000. The evolution of digital trace data has a long and complex history. For decades, in subfields such as in the AI and Education (AiED) community, researchers have been able to use the data generated by the systems they were developing to better understand learning processes. However, excitement around the potential for these kinds of data—typically termed “Big Data” accelerated in the early 2000s due to advances in new technologies and the current social, economic, and political landscape (boyd & Crawford, 2012). This was galvanized within the field of education with the introduction of Massive Open Online Courses (MOOCs) designed for tens of thousands of students in 2011, which provided opportunities for many researchers to access digital trace data about learning. Given this complex background, we therefore set our starting point to search for studies to 2000. The terms for Categories A (relating to digital data), B (relating to ethics), and C (relating to education and learning) were compiled iteratively through several trial searches, scanning relevant article abstracts for additional keywords and identifying key terms from reading, and revising the terms accordingly. A particular challenge was to identify the range of appropriate terms that were sufficiently far-reaching so as to identify all relevant studies, but also targeted enough to yield appropriate search results (e.g., a search for “education” AND “data” would yield far too many irrelevant search results). Where search engines would facilitate an advanced search, search terms were applied through groups using Boolean logic. The sensitivity of different combinations of search terms were trialed and reviewed by two researchers. The final list of search terms is set out in Table 1a.

Table 1a

Review search terms

Category	Search terms
A—Relating to digital data	digital trace data OR digital data OR digital footprint OR learning analytics OR education data mining OR big data OR artificial intelligence OR predictive analytics OR adaptive learning OR critical data OR datafication OR education analytics OR educational data OR data science OR data-driven
B—Relating to ethics	ethics OR ethic* OR privacy OR surveillance OR data protection OR data ownership OR dataveillance OR data sharing OR bias OR fairness OR accountability OR agency OR autonomy OR vulnerability OR anonymity OR inequality OR justice OR at risk OR governance OR ownership
C—Relating to education and learning	learn* or educat* or school or university or MOOC or distance learning or pre-school or primary school or prekindergarten or kindergarten or junior school or high school or secondary school or college or student or classroom or education technology or early years or instructional systems

Searching and Screening Studies

All searches were carried out between March and June 2019. Retrieved articles underwent three phases of screening. First, the titles and abstracts of retrieved studies in each database were scanned for relevance to the present review according to the inclusion criteria (Table 1b). The first 500 studies in each set of search results (sorted by relevance) from each of the three principal databases were scanned, after which point it was consistently found the studies retrieved by the databases did not have abstracts or key words that met any of the inclusion criteria for this review. This stage was carried out by a single reviewer, but an additional reviewer was available to discuss titles and abstracts that presented some uncertainty or ambiguity.

Table 1b

Selection criteria for included studies

	Systematic reviews of the ethical issues of the use of digital trace data in educational contexts
OR
	Empirical studies reporting the use of digital trace data in education with discussion of relevant ethical issues
OR
	Conceptual/theoretical/discussion papers discussing the ethics of use of digital trace data in educational contexts
AND
	Recent articles: the searches were limited to articles published between 2000 and 2019
AND
	Articles originating from peer-reviewed journals

In a second phase, the selected articles were reviewed in accordance with the inclusion criteria set out in Table 1b, and this process was aided by using the “search in document” function to locate keywords in passages of text. This stage was conducted by two reviewers, and articles subject to disagreement were discussed against the inclusion criteria until consensus was reached. Finally, the reference sections of included studies were scanned to locate additional studies. It was decided that nine articles, while initially appearing to fit the inclusion criteria, did not detail sufficient discussion or empirical detail that addressed the review questions to contribute to the review. These stages, and the retrieved results, are set out in Figure 2.

Figure 2.

A flow chart illustrating the retrieval and exclusion of articles from electronic databases.

Criteria for Inclusion

Studies were included if they focused on educational contexts or issues and devoted significant attention to ethical issues. Studies were also included if they focused predominantly on fields beyond that of education (e.g., sociology, Big Data, etc.) but made reference to educational contexts and devoted significant attention to ethical issues. Studies were excluded if they focused on ethical issues and digital trace data but did not relate specifically to educational contexts (see, e.g., Herschel & Miori, 2017; Mai, 2016). Studies were excluded if they made only passing reference to ethical issues. For example, a number of excluded studies suggested that ethical principles were important, or listed concerns related to “privacy and ethics” but did not discuss them in detail (see, e.g., Avella et al., 2016; Picciano, 2012; Reyes, 2015). Studies were excluded if there were discussions relating to the ethics of data use in education but the data used were not explicitly “digital trace data,” that is, the data were derived from more traditional means of data collection (see, e.g., Roberts-Holmes, 2015; Roberts-Holmes & Bradbury, 2016). Studies were excluded if the discussion related to the ethics of data collected specifically for research purposes (see, e.g., Metcalf & Crawford, 2016), and included only if the collection, use, and analysis of data was specifically embedded within the business of supporting learning and educational institutions. While there were a number of studies relating to data ethics within the context of libraries (see, e.g., Jones & Salo, 2018), these related less to the support of learning within educational institutions and more to archiving and data management, and so were excluded from this review.

Outcomes of Screening Process

As indicated by Figure 2, a total of 77 published studies were selected following the sorting and screening process. Despite the parameters, all included studies were published between 2012 and 2019. Given our understanding of the field we were unsurprised that the ethical implications of digital trace data were not a focus of papers until the second decade of this century, and began shortly after the introduction of MOOCs.

It is important to note that there was no attempt to evaluate the quality of the selected articles. Despite this being a common approach in systematic reviews, it was not appropriate in this case given the relative newness of the topic. When reviewing emerging research areas, there is a risk of omitting articles, and by extension important topics of research that are developing in this area, if the inclusion criteria are too stringent. We addressed this issue by restricting the review to peer reviewed papers.

We recognize the relatively fast-moving nature of the topic. In some senses, it means that reviews of this kind can become quickly dated. However, our intention is to highlight principles that endure rounds of innovation and provide a foundation for others to build upon. This is particularly important in research that explores aspects of education and new technologies, which typically neglects to learn from history (Selwyn, 2012).

Analysis and Coding of Selected Studies

As noted in the introduction, research exploring the ethics of using digital trace data are still in its infancy, and is relatively dispersed across academic communities, educational contexts, and definitions of ethics. We therefore coded each study according to its characteristics alongside the ethical issues raised to assist in mapping the field and providing insights for future work. Specifically, each selected study was coded according to educational context (early years, school, higher education, etc.); academic community (learning analytics; Big Data); study type; ethical issues raised; responses to ethical issues proposed; and engagement with (ethical) theories.

Papers associated with each code were then subject to a process of thematic analysis, used to draw out ethical issues and social implications that form the basis of the following discussion. This thematic analysis involved rounds of qualitative coding to create categories of papers that fit well together in terms of topic (Dey, 2003; Saldaña, 2015). Next, an open coding process was used to “chunk” the summaries into areas of convergence across the included studies (Corbin & Strauss, 2007). We coded for broad themes and related subthemes, including ethical issues and challenges, as well as proposed responses to these challenges. Just as adopted in the thematic review by Tieken and Auldridge-Reveles (2019), we focused on findings with support from multiple sources, and we also looked for any thematically relevant counterevidence or divergent findings, either within or between contexts. We drafted and revised summative memos in order to develop and refine the synthesis of our findings. The results of this process are presented in the next section.

Results

Study Characteristics

To provide an initial mapping of the landscape of research in the ethics of digital trace data use in learning and education, and address Research Question 1, we examined the articles by study type, academic community, institutional setting, and national context.

Study Type

The type of study gives an important overview of the nature of the research evidence in this particular field. Table 2a presents the included studies according to study type and shows that the majority (n = 53) were conceptual discussion papers, some of which also engaged with a variety of academic theories (see Table 9 for a summary of theoretical engagements). Only one of the included studies was a systematic review (Nunn et al., 2016), which sought to present the methods, benefits, and challenges of learning analytics in higher education. Nearly a fifth of the papers were classified as empirical. Studies were classed as empirical if they included some form of data collection or analysis, or if they included an in-depth case study, rich with empirical detail (n = 14). Table 3 sets out the nature of these empirical studies, and the educational contexts in which they were conducted.

Table 2a

Included study types

Study type	n
Conceptual/discussion paper	53
Empirical/case study	14
Code of practice	11
Systematic review	1

Table 2b

Academic field

Academic field	n
Big Data	19
Learning analytics	37
Education data mining	4
Datafication/governance	21
Law	8

Table 2c

Type of institution

Type of institution	n
Higher education (including MOOCs and professional education)	41
Higher education	36
MOOCs	4
Professional education	1
Schools	17
No context provided	14
Software companies and data mediators	14

Note. MOOC = massive open online courses.

Table 2d

Origin of research

Region	Country	n
Europe	United Kingdom	24
	Netherlands	2
	Germany	3
	Belgium	1
	Norway	3
	Austria	3
	Denmark	1
	Spain	1
	Sweden	1
	Switzerland	1
	Finland	1
North America	United States	25
North America	Canada	1
Southern Hemisphere	Australia	18
	New Zealand	2
	South Africa	4
Middle East	Israel	3

Table 3

Studies reporting empirical work and case studies

Study	Context	Nature of empirical work
Howell et al. (2018)	Higher education	Higher education academics’ knowledge, attitudes, and concerns about using learning analytics
Ifenthaler and Schumacher (2016)	Higher education	University student views about privacy, control over data, and sharing of information
Khalil et al. (2018)	Higher education (MOOCs)	Comparison of issues of consent in four massive open online course (MOOC) providers
Lawson et al. (2016)	Higher education	Ethical implications of a learning analytics implementation at an Australian university
Lindh and Nolin (2016)	Schools	An analysis of Google’s business model within Google Apps for Education (GAFE) using 30 Swedish schools
Manolev et al. (2019)	Software in schools	A case study of ClassDojo and its datafication of student behavior
Ratner et al. (2019)	Third party agencies	A comparison of student data visualizations from official sources and a private consultancy
Roberts et al. (2016)	Higher education	Higher education students’ knowledge, attitudes, and concerns about Big Data and learning analytics
Rodríguez-Triana et al. (2016)	Higher education and a primary school	Ethical issues from teacher-led learning analytics in higher education and primary school contexts
Slade and Prinsloo (2015)	Higher education	Views of students of Open University, UK, relating to increasing uses of their personal data
Watson et al. (2017)	Higher education	An in-depth case study of a single student who failed an online master’s module
West et al. (2016)	Higher education	Institutional leaders and academic staff in Australia views ethical issues in learning analytics
Williamson (2016a)	Third sector agencies	A critical survey of the digital methods used by Pearson to generate educational data
Williamson (2016b)	Third sector agencies	A case study of how software-mediated methods intervene in the way education institutions produce knowledge

The remaining papers were codes of practice, frameworks for understanding ethics, engaging with ethical principles, or carrying out ethical decision making in educational institutions (n = 11). These were almost exclusively situated within the context of higher education institutions and are set out in Table 4.

Table 4

Ethical frameworks and codes of practice

Study	Context	Nature of framework
Cormack (2016)	Higher education	A framework for obtaining consent that separated the processes of analysis and intervention to give protection from inadvertent harm during data analysis
Elouazizi (2014)	Higher education	A framework that sets out necessary requirements for data governance in the application of learning analytics
Greller and Drachsler (2012)	Higher education	A generic design framework to support setting up learning analytics services
Hoel and Chen (2016)	Higher education	A framework for privacy-driven design of learning analytics systems
Khalil and Ebner (2016)	Higher education	An approach that combines anonymization strategies and learning analytics techniques
Prinsloo and Slade (2016)	Higher education	Expanding on earlier work by Prinsloo and Slade (2013), this framework sets out ways to decrease student vulnerability, increase their agency, and empower them as participants in Learning Analytics
Sclater (2016)	Higher education	A model for developing an institutional code of practice according to central principles: responsibility; transparency and consent; privacy; validity; access, enabling positive interventions; minimizing adverse impacts; stewardship of data
Slade and Prinsloo (2013)	Higher education	Proposes principles for higher education institutions to address ethical issues in learning analytics in context-dependent and appropriate ways
Steiner et al. (2016)	Higher education	A comprehensive privacy and data protection framework for the LEA’s BOX project, which ensures ethical treatment of personal data in a learning analytics platform
Willis et al. (2016)	Higher education	A cross-institutional review of the policy frameworks and ethical review processes at three research institutions
West et al. (2016)	Higher education	An ethical decision-making framework to encourage consistent application and documentation of ethical decision-making processes

Academic Communities

It is important to note the broad academic communities from which the included studies arise, so as to identify the most active research groups in the field (and to understand the extent to which they are connected). The data were based on study keywords and bibliographic tags and are set out in Table 2b. It is clear that studies linked to learning analytics were dominant within the included sample (n = 37). As Siemens and Baker (2012) have observed, learning analytics has become a relatively well-defined academic community with its own dedicated journals, conferences, and so on, and given its emphasis on informing institutional practice it is perhaps not surprising that many of the ethical studies emerge from this tradition. A relatively large number of studies (n = 19) were positioned within a Big Data community, encompassing a larger number of studies from outside higher education, although it was possible for a study to engage with both learning analytics and Big Data. Only a few studies (n = 4) referred explicitly to EDM techniques, and this may reflect that a lot of researchers who engage with EDM tend to be from the computer and engineering sciences, where data are likely to be seen in a numerical or textual sense, not as part of a human subject—thus ethics have in the past been seen as less pertinent. However, as the field is developing, this is changing.¹ Indeed, two of the studies that have data mining as keywords also refer to learning analytics, suggesting a blurred distinction between the two approaches to analyzing digital trace data.

The remainder of the included studies were more sociological in focus (n = 21). A significant number of these engaged with the concepts of datafication and digital governance in education, the increasing role of corporate and third party agencies in the production, analysis and visualizations of data across the educational landscape, and the associated ethical challenges. A number of studies (n = 8) engaged extensively with legal regulations and frameworks in their analysis, and in most cases were at least partly written by legal scholars. This indicates the clear interconnections between ethics and legalities. As the discussion below will explore, these legal reflections were particularly linked to privacy and data protection concerns.

It is thus clear that discussions relating to ethics and digital trace data use in education originate from a number of relatively distinct academic communities. In the section of ethical issues below, we look across these communities for themes, commonalities, and gaps.

Institutional Context

Given the wide range of institutions across which education and learning may take place, it is important to identify the context in which the included studies situated their conceptual discussions and empirical work. Table 2c illustrates that discussions of ethical issues for digital data in education are most common within the sphere of higher education (n = 41). The field of learning analytics is most commonly associated with higher education institutions (as well as MOOCs), which explains the similar number of studies in Tables 2b and 2c. There were a substantial number of studies that discussed the ethics of digital trace data in a school context (n = 17). However, although one article (Lupton & Williamson, 2017) discussed ethical issues relating to the use of digital data through different life stages and in different educational contexts, from infancy to adulthood, there was a general absence of research literature addressing the ethical issues associated with the education of younger children, and with games and informal learning outside the classroom. There were a large number of conceptual papers that discussed ethical issues in a broad sense, without engaging with a particular type of educational institution or age group (n = 14). If, as the section on ethics above described, ethical considerations must take into account context, this lack of focus on educational context is problematic and will be discussed later in this article.

The peer-reviewed literature has started to engage with the ethical challenges relating to the increasing role played by software companies and other data mediators (n = 14). This growing field is important, as understanding how companies are making ethical decisions is central to future discussions of the use of digital data for learning and education.

National Context

Ethical practice is inevitably linked to the laws that govern data use, and thus the national and regional jurisdiction within which included studies were carried out has potential importance. Table 2d sets out the national data according to institutional affiliation of authors, showing a dominance of studies from the United States the United Kingdom, and Australia.

Ethical Issues and Responses: A Thematic Analysis

From the thematic analysis of all the papers, we identified four overarching categories of ethical issues. They were privacy, informed consent, and data ownership; validity and integrity of data and algorithms; ethical decision making and the obligation to act; and governance and accountability. In this section, we will address each of these four ethical topics, implications, and author proposals to respond to these ethical issues in turn.

Privacy, Informed Consent, and Data Ownership

Notions of privacy, informed consent, and the extent to which digital data can and/or should be anonymized represent the most commonly and extensively discussed ethical issues across the selected papers. Scholars widely recognized the ethical challenges of informing and obtaining appropriate consent for collecting a range of types of digital data, subjecting it to analytics, and interpreting it. Of the 50 papers that discussed these issues, more than half were from the learning analytics tradition, which tends to be focused on HE institutions. Only four papers talked explicitly about privacy in relation to a school-aged context. The remaining papers, which made reference to Big Data, did not position the discussion within a specific context of education. Rather, they were broad, conceptual papers that discussed the issues of privacy and consent in relation to education and education policy in general. As this review will set out below, it is problematic that these discussions have been taking place without a great deal of attention to specific educational contexts, given the recognition that ethical issues must be considered in a context-specific manner (see, e.g., West et al., 2016).

Conceptualizing privacy

Several studies acknowledged that definitions of privacy, and related theories are contested (e.g., Heath, 2014; Rubel & Jones, 2016) and have a legal basis that varies according to national or regional jurisdiction. Privacy, in this sense, is the right to control and limit the flow of personal information. From this perspective, Rubel and Jones (2016) suggest that privacy is a function of autonomy; that is, to be able to make decisions for oneself, based on one’s own reasons and one’s own values, when one wishes to. Hoel and Chen (2016) emphasize privacy as a sociocultural concept and observe that the boundaries around personal and private data are social, contextualized agreements. As a final point, Regan and Jesse (2018) and Ferguson et al. (2016) note a tendency among scholars and practitioners to characterize ethical concerns under the general rubric of “privacy,” which can serve to oversimplify the concerns and make it too easy for advocates to dismiss them. Indeed, MacCarthy (2014) suggests that privacy disputes are acting as a proxy for more basic clashes in educational values.

Yet researchers such as Cormack (2016) and Ifenthaler and Schumacher (2016) suggest uses of personal data may be increasingly accepted, even expected, both by individual learners and by society. Pardo and Siemens (2014) argue that “society seems to be evolving toward a situation in which the exchange of personal data are normal” (p. 440). Thus, privacy, and how best to conceptualize it, remains open areas of debate by authors within the included studies.

Threats to learner privacy

A significant number of authors in the included studies recognize that privacy of student information is important for education because of the adverse impact that inappropriate uses or disclosures may have on student learning and social development (e.g., Regan & Jesse, 2018; Reidenberg & Schaub, 2018). The mass collection and centralization of student information have been argued to pose significant threats to learner privacy and raise questions about data ownership and consent: who owns the data and who has the right to use and/or profit from it (Cope & Kalantzis, 2016; Cormack, 2016; Jones, 2019; Lynch, 2017; Steiner et al., 2016).

Several studies highlight that the process of data aggregation can mean data are combined from many different sources, is used and/or shared by many actors and for a wide range of purposes (e.g., Wang, 2016). Lawson et al. (2016) cite an example of the use of data in a regional university in Australia in which there was an institutional assumption that student data, consensually gathered at enrolment, could be analyzed beyond the scope of the original consent. Yet, as Cormack (2016) and several others make clear, the potential value of the gathered data becomes apparent only after they are subjected to analysis by computer algorithms, not beforehand. Many scholars recognize that it is hard to predict what correlations will emerge from the data, let alone what their impact on the individual will be (Cormack, 2016). Thus, a number of scholars (e.g., Aguilar, 2017) have emphasized the potential unexpected consequences of data use and the related implications for privacy.

In addition, several studies recognize the possibility that new personal information may be deduced or inferred by employing data analytics on already gathered data. This broadened approach needs to address both whether or not individuals’ consent was secured for data collection in the first place, and privacy issues arising from the development of new information on individuals’ likely behavior through analysis of already collected data. Johnson (2017) argues that this new information can violate privacy but does not call for consent.

Reidenberg and Schaub (2018) suggest, for example, that tools may provide recommendations on which students may need a tutor’s attention or specific interventions. Such early warning systems aim to identify at-risk students with the goal of providing them with needed support and, particularly with respect to higher education, increasing student retention rates. Yet students and parents may not be aware of the existence of learning analytics dashboards or what information is employed within them (Pardo & Siemens, 2014; Reidenberg & Schaub, 2018).

Also of concern within a number of the included studies is the flow of information to private vendors or mediators who combine institutions’ data for analysis, such as the controversial InBloom project in secondary education (Pardo & Siemens, 2014; Polonetsky & Tene, 2014). A number of scholars problematize the sharing of student data with commercial technology vendors, with concerns ranging from inadequate security controls to monetization of student’s information. Indeed, Russell et al. (2018) explore the U.S. marketplace for student data.

Data ownership and legalities

Several studies include specific reference to the challenges relating to the ownership of digital data. Greller and Drachsler (2012) highlight a lack of legal clarity with regard to data ownership. Elouazizi (2014) suggests data ownership is inherently distributed. For example, the instructional designer and/or faculty can own the learning process data, the Learning and Content Management System (LCMS) team (in some institutions) can own the processes and procedures for configuring and collecting the data, and the administrative staff owns part of the learner educational experience data. When it comes to ethical data sharing or analytics, the privacy issues discussed above must be negotiated across this range of actors.

A significant number of studies (n = 8) explore the relationship between digital trace data, ethical issues, and legal regulation. Although Khalil et al. (2018) present the European Union as an example of a fast-evolving legal framework, scholars acknowledge that there is generally a low level of legal “maturity” (Kay et al., 2012, in Pardo & Siemens, 2014), as legal systems are still at the early stages of commenting on privacy, ethics, and data ownership. Making cross-cultural comparisons between the United States and China, Hoel and Chen (2019) note that privacy is handled very differently in different parts of the world. This is a challenge when designing tools and approaches for the use data analytics in a potentially global education market.

Privacy and surveillance

A number of studies linked data privacy to the concept of surveillance, or “dataveillance” (e.g., Hope, 2018). Selwyn (2015) suggests that dataveillance entails the monitoring, mining, and processing of data, which in turn allows for the identification, classification, and representation of social entities in the form of automated data profiles—sometimes described as “data doubles” or “data shadows” (p. 73). Manolev et al. (2019) comment that ClassDojo’s datafying system of school discipline intensifies and normalizes the surveillance of students. Furthermore, it creates a culture of performativity and serves as a mechanism for behavior control. Watson et al. (2017) uses the case of an individual student to problematize dataveillance, suggesting that such practices threatened trust and student autonomy.

Responses

The included studies suggest a number of possible responses to the privacy-related ethical challenges, as set out in Table 5. Some scholars point to specific strategies to digital governance structures and legal regulation, such as privacy review boards, impact assessments, and greater legal safeguards that are better tailored to the particular privacy issues. Others point to better transparency, trust, and learner engagement, including a framework for an ongoing process of consent (Cormack, 2016), ways to encourage active engagement with Terms and Conditions, and ways that learners can have greater control over their own privacy settings. A final group of scholars identify a need to embed privacy issues into system design, including the de-identification of data or using data simulations to preempt and address ethical issues (Hoel & Chen, 2016; Pardo & Siemens, 2014; Reidenberg & Schaub, 2018; Steiner et al., 2016). Despite these numerous helpful responses, there is a very limited focus on what privacy and consent means for younger age groups, or how to involve parents as mediators of consent in the home and school settings, or what privacy means at a household level. Indeed, Rodríguez-Triana et al. (2016) emphasize how such considerations are markedly different and more complex (given interaction between parents, schools, and legal regulations regarding young people) when considering data use with children, compared to students attending a higher education institution.

Table 5

Responses to privacy, consent, and ownership issues

Response	Description
De-identification of data	Slade and Prinsloo (2013), Khalil and Ebner (2016), and Steiner et al. (2016) point to the importance of de-identification and/or encryption of data before the data are made available for institutional use, including the option to retain unique identifiers for individuals.
Privacy by design	Pardo and Siemens (2014), Steiner et al. (2016), and Hoel and Chen (2016) advocate “privacy by design.” Designers are encouraged to include privacy and security issues in the early stages of the design. Reidenberg and Schaub (2018) note that rules for public procurement should specify “privacy by design.”
Consent as an ongoing process	Cormack (2016) has proposed a data protection framework that treats learning analytics as two separate stages: • Discovering significant patterns (“analysis”) • The application of those patterns to meet the needs of particular individuals (“intervention”)
Engage students as collaborators	Slade and Prinsloo (2013), Prinsloo and Slade (2016), Heath (2014), and Roberts et al. (2016) advocate a student-centric approach, suggesting that learning analytics should engage students as collaborators. Ifenthaler and Schumacher (2016) call for more empirical research into the conditions under which students are willing to share relevant data.
Transparency	Slade and Prinsloo (2013), Sclater (2016), and Polonetsky and Tene (2014) call for greater transparency in discussions over privacy and the scope and purpose of data use, both within and between educational institutions and third parties.
Impact assessments	Reidenberg and Schaub (2018) argue for holistic educational impact assessments of learning analytics system to reconcile system design with pedagogical and privacy goals.
Legal safeguards	Reidenberg and Schaub (2018) argue for greater legal safeguards for education privacy that reflect the reality of data-driven education by expanding privacy protections to clearly cover learning analytics.
Privacy review boards	Polonetsky and Tene (2014) call for establishment of privacy ethics boards, modelled on human subject review boards (IRBs) that operate in academic research institutions.
Use of synthetic data	Berg et al. (2016) suggest that using synthetic data, also known as simulation data, during the development and testing of learning analytics systems can help avoid ethical and legal issues.
Greater control over personal privacy settings	Jones (2019) sets out the development of Platform for Privacy Preferences (P3P) technology. Hildebrandt (2017) and Rubel and Jones (2016) suggest students should still be able to object to being profiled as a specific type of learner.
Terms and conditions	Khalil et al. (2018) suggest users do not engage with Terms and Conditions in any context, educational or otherwise. Higher education institutions should find ways to make Terms and Conditions more understandable and accessible.
Fostering trust	Polonetsky and Tene (2014), Steiner et al. (2016), and Russell et al. (2018) suggest fostering learner and parent trust is key. Schools must define their data use philosophy and convey it to stakeholders, including relationships with data brokers.

Validity, Integrity and Interpretation of Digital Data

A significant proportion (n = 35) of included studies identified ethical concerns that relate to the nature of the digital trace data itself and the various assumptions that are intertwined with the processes of data analytics. The included studies address both the validity (i.e., the extent to which the products of data analytics truly represent the phenomenon they claim to measure) and the integrity (i.e., the completeness, accuracy, and consistency) of the digital data within education and learning systems. Similar numbers of these studies were derived from learning analytics, Big Data, and sociological literatures, suggesting that ethical problems with the nature and manipulation of data are receiving attention across the three broad academic communities identified in this review. There were similar numbers of studies addressing these issues in both higher education (15) and schools (10), while three studies made reference to corporate or third party organizations and seven studies did not make reference to a specific educational context.

The reductionist nature of digital trace data

Clayton and Halliday (2017) and Perrotta (2013) highlight that Big Data only tends to include information that is easily measurable and readily quantifiable: test scores, attendance rates, time on task, and completion rates during exercises (cf. Zeide & Nissenbaum, 2018). Slade and Prinsloo (2013) suggest that fixating on data that show what can be measured sometimes leads to a failure to remember that the information is, at best, a partial representation of what one wishes to know about. For example, Veletsianos et al. (2016) highlight three important domains of the experience of MOOC students that are absent from MOOC tracking logs: the practices at learners’ workstations, learners’ activities online but off-platform, and the wider social context of their lives beyond the MOOC. Watson et al. (2017) use the case of an individual student to highlight aspects of a student’s online learning practices that are not reflected in learning analytics data. While the student felt out of his depth and was left “dangling,” these feelings were certainly not reflected in his apparently purposeful learning behaviors. A significant number of the included studies thus discussed the reductionist nature of digital trace data and its analytics (see Selwyn, 2015; Slade & Prinsloo, 2015), providing a snapshot view of a learner at a particular time and context (see Harel Ben Shahar, 2017; Lundie, 2016).

Having clearly highlighted the ethical problems with what is captured by data, and when the data are captured, a significant number of studies note that much of the data require extensive filtering, classification, and standardization to transform the digital trace data into databases, and the ways in which complex data are represented (see, e.g., Edwards, 2015). Similarly, correlations between different variables may be assumed when dealing with missing and incomplete data around usage of, for example, an institution’s learning management system. Such assumptions may be influenced by the analyst’s own perspectives and result in subconsciously biased interpretations. The ethical implications of this are that such approaches risk inadvertently shifting learning and educational processes in ways we do not fully understand (Eynon, 2013). Indeed, Williamson and Piattoeva (2019) problematize the “precarious construction of objectivity” of digital trace data and its analytics and the way it informs “evidence-based” education governance.

Bias and discrimination

The included studies further suggested ways in which the use of digital data in education could serve to increase bias and discrimination toward certain learner groups. For example, predictive tools can produce discriminatory results because they include demographic data that can mirror past discrimination included in historical data. Harel Ben Shahar (2017) suggests algorithms used in educational enrolment can favor recruiting wealthier students over their less affluent peers simply because those are the students the college has always enrolled (Harel Ben Shahar, 2017). Students from a privileged background may also have greater digital skills, which results in more successful engagement in a digital environment. These disparities do not reflect an actual gap in academic ability but could cause the algorithm’s prediction to be biased against young people from low-income families or ethnic minorities (Har Carmel & Harel Ben Shahar, 2017).

Data analytics models that rely on demographic data (or that do not take into account disparate outcomes based on demographics) may also unintentionally entrench disparities in achievement among particular groups. For example, early-alert and program recommender systems can disproportionally flag certain ethnic or economically disadvantaged groups for poor achievement or steer them away from more challenging and/or economically rewarding paths, thus potentially exacerbating their disadvantage (see, e.g., Har Carmel & Harel Ben Shahar, 2017). Stahl and Karger (2016) suggest that a digital record of the student’s negative behaviors or poor academic performance, and being profiled as a particular type of learner, could ultimately interfere with future educational and/or employment opportunities. These concerns may be more pronounced with respect to students with disabilities in light of the potentially sensitive nature of the data being collected. Schouten (2017) emphasizes that data are often used to profile students who are regarded as vulnerable or educationally at risk, thus these particular ethical concerns may disproportionally affect disadvantaged students.

Several scholars also contend that algorithmic decision making may create new groups that are systematically unfairly disadvantaged. For example, if children who are color blind or who engage in after-school sports are less likely to succeed on computerized tasks, the algorithmic predictions are less favorable for them, this may be detrimental to their educational prospects, and they may be discriminated against in ability grouping processes (see, e.g., Har Carmel & Harel Ben Shahar, 2017; Harel Ben Shahar, 2017).

Scholes (2016) argues that at the heart of the analytics is the concept of categorizing students according to the statistical risk that can be attached to them, and subjecting students to different treatment on this basis. Scholes (2016) emphasizes the possible discriminatory effects of such practices. For example, a prediction that leads to a decision about a student’s educational future does more than indicate the student’s ability: it constitutes it (Harel Ben Shahar, 2017). It is extremely hard, therefore, to verify the algorithm’s predictions and to expose any inaccuracies and falsehoods (Clayton & Halliday, 2017).

Responses

The included studies highlighted a number of responses to the ethical challenges of data validity and integrity (see Table 6), which included deeper and more regular interrogation of the data; better training for a wider workforce in data interpretation, so that it may be more effectively interrogated; and adoption of such ethical principles and practices from the outset of any project as part of an “ethics by design” approach. Once again, scholars call for greater transparency in the underlying algorithms in educational software, and better communication surrounding their limitations, in order to improve trust and credibility.

Table 6

Responses to challenges to data validity and integrity

Response	Description
Data monitoring and interrogation	Sclater (2016) and Prinsloo (2017) argue institutions should use qualified staff to monitor the quality, robustness, and validity of their data and analytics processes to instill confidence in learning analytics, keep bias in check, and ensure it is used to the benefit of students. Perrotta and Williamson (2018) call for critical examination of the “algorithmic assemblage” that constitutes learning analytics. Jones and McCoy (2019) propose a documentation studies framework to investigate how individuals are “made into” data points.
Incorporating contextual information	Slade and Prinsloo (2015) suggest including context-rich information in data analytics processes. Scholes (2016) describes how instructional design can help support recognition of students as individual agents by incorporating factors specific to individual students.
Ethics by design	Har Carmel and Harel Ben Shahar (2017) and Harel Ben Shahar (2017) argue that there should be a combination of technological solutions and legal regulation, both of which should begin at the design of algorithms. This includes removing discriminatory attributes. Reidenberg and Schaub (2018) argue that accountability and security need to be integral aspects of learning analytics technology rather than afterthoughts. Rodríguez-Triana et al. (2016) advocate teacher, parent, and student participation to define the educational purposes of the analysis, reflect on the available data sources, improve the validity of the results (by adding new evidence), and identify the limitations of the results.
Transparency	Steiner et al. (2016) suggest that communication about the limitations of algorithms is important to optimize trust and credibility. Regan and Jesse (2018) recommend that algorithms should be made available for examination by educators and researchers, and that a disinterested third party should review software using algorithms prior to adoption.
Data skills and training	Greller and Drachsler (2012), Nunn et al. (2016), Fenwick and Edwards (2016), and Perrotta (2013) call for better training for education professionals on the interpretation of data. Data analytics should not be accepted as black boxes that only computer specialists can understand (Fenwick & Edwards, 2016).
Collaboration across disciplines	Fenwick and Edwards (2016) call for greater cross-disciplinary work, especially between public service professionals and computer scientists, to explore possibilities and limitations of data.
Expiration of data	Data collected through learning analytics should have an agreed-upon lifespan and expiry date, as well as mechanisms for students to request data deletion (Lundie, 2016; Slade & Prinsloo, 2013).

Ethical Decision Making and the Obligation to Act

The obligation to act was discussed most explicitly within the context of learning analytics in higher education (see Daniel, 2019; Greller & Drachsler, 2012; Howell et al., 2018; MacFadyen, 2017; Roberts et al., 2016; Sclater, 2016; Slade & Prinsloo, 2013) (n = 7).

The insights provided by Big Data and data analytics can inform students to adjust their behaviors, and institutions to provide additional support to individuals and groups. More than ever before institutions have rich and detailed, albeit incomplete, pictures of students’ learning journeys. As discussion in this article has set out, there is widespread acknowledgment that having access to data does not, per se, translate into complete or necessarily accurate information or knowledge and the wider educational implications of using this data are still subject to debate. Yet, where such data does flag a likely outcome which might result in student harm, if, for example, a student is at risk of underperforming or dropping out of a course, it does raise the obligation to act, whether on moral and/or legal grounds.

As the above included studies have recognized, this raises a number of important questions: What obligation does the institution have to intervene when there is evidence that a student could benefit from additional support? What obligation do students have when analytics suggest actions to improve their academic progress? How are the appropriate interventions decided upon? While the included studies make clear the challenges relating to obligations to act, there remains a need for greater attention to the ways in which we should respond to insights about our students’ dispositions, learning behaviors, and risk profiles (given the recognized limitations to the uses of digital trace data). These issues are not only relevant to those working in the formal educational sector. Institutions developing analytics for use in the home have responsibilities related to when and how to alert parents and young people to potential learning issues.

A small number of the included studies proposed some possible ways forward (see Table 7), including the careful consideration in advance (and in hindsight) of the circumstances, nature of possible interventions, and their possible impacts. Scholars also identify the importance of taking into account additional qualitative data when considering an individual learner (see Greller & Drachsler 2012). Yet responding to the analyses and insights provided by digital data analytics is often constrained by a range of factors, such as lack of political will, gaps in performance contracts, and/or a lack of financial or human resources.

Table 7

Responses to the obligation to act

Response	Description
Consider circumstances for intervention	Sclater (2016) suggests institutions should clearly specify circumstances under which they should intervene when analytics suggests that a student could benefit from additional support.
Consider nature of possible intervention	Sclater (2016) suggests clearly specifying the type and nature of possible interventions, and retrospectively reviewing appropriateness and effectiveness.
Consider impacts of possible interventions	The impact of possible interventions on staff roles, training requirements, and workload should be carefully considered (Sclater, 2016).
Use additional qualitative data	Greller and Drachsler (2012) emphasize the importance of using additional qualitative data alongside data analytics as part of the decision-making process.

Governance and Accountability Across the Educational Landscape

The studies addressing the phenomena of datafication, digital governance, and accountability (n = 19) were split between those predominantly discussing increasing forms of datafication in schools (n = 5), those looking at institutional governance in learning analytics (usually part of a code of practice or framework; n = 5), and those addressing accountability and governance through examining the role of corporate entities and data mediating agencies (n = 9).

A significant number of scholars (Fenwick & Edwards, 2016; Hartong, 2016; Lupton, 2015; Manolev et al., 2019; Ratner et al., 2019; Selwyn 2015; Williamson, 2015a, 2015b) raise important issues regarding the increasing power of software companies (Williamson 2016b), nonstate actors and centers of data analytics (see Lindh & Nolan, 2016; Ratner et al., 2019) in the sphere of data and education, where there is often “the predictive analytics capacities to anticipate and pre-empt educational futures” (Williamson, 2016a, p. 123).

In this way, Fenwick and Edwards (2016) assert that digital technologies are not simply technical solutions to enhancing the quality, efficiency, and effectiveness of practices but can also be powerful “value-embedded socio-technical interventions” in the attempted shaping of practices, accountabilities, and responsibilities. Scholars report the way in which educational institutions are conducting their work practices in this context of “algorithmic governance” (e.g., Williamson, 2015b), often without fully realizing how the data they generate are taken up and used by other actors and agencies and what the broader implications are of becoming entangled within these digital knowledge systems (Lupton, 2015). Despite the potential problems with validity and integrity of digital trace data for education, such data science practices are at risk of being presented as “objective ways of knowing,” making calculations about intervening in educational practices and learning processes (Perrotta & Williamson, 2018, p. 6). MacGilchrist (2019) notes that

no matter how good the motives, and how pedagogically well-founded the decisions, it is a post-democratic moment when the ability to make these decisions has shifted from publicly accountable government officials, policy-makers or educators, to developers, programmers, designers and other staff in private edtech organisations. (p. 83)

In addition, despite the increasingly complex partnerships with different global and national data service contractors (Lupton, 2015), these have rarely been the subject of in-depth analysis or ethical scrutiny (Hartong, 2016). While codes of practice and ethical frameworks such as Slade and Prinsloo (2013) and Sclater (2016) have highlighted the need for appropriate data management and interpretation skills, this guidance is generally to be implemented at the level of the educational institution. Indeed, all but one of the frameworks identified as part of this review originate from the field of learning analytics (which is generally associated with higher education institutions). Some initial responses to these observations have been suggested in the included studies (see Table 8), including a move away from thinking about from institutions as isolated data spheres but rather a network of publicly accessible data, a greater transparency in the way that these data have been produced and consumed by different actors across this network. Yet very few studies have discussed ways in which ethical practices of institutions should relate to third parties, or governance polices across these actors.

Table 8

Responses to ethical challenges of governance and accountability

Response	Description
Understanding data networks	Fynn (2016) advocates a shift in thinking away from institutions as isolated data spheres into a network of publicly accessible data that is directly related to the goal of student analytics (cf. Edwards, 2015; Hartong 2016).
Examine responsibilities of public and private agencies	Souto-Otero and Beneito-Montagu (2016) suggest examining the requirements that the public sector imposes on private sector organizations during the commissioning of data-work and how transparent the public sector is in the use of these companies.
Transparency of process	Pardo and Siemens (2014) and Edwards (2015) call for greater transparency in the work of software, what is assumed in its development, and what it can and does do. Through such efforts at transparency, those institutions engaged with analytics become more accountable, which is essential to ensure that the use of digital data in education is ethical.

Engagements With Theories of Ethics

As discussed earlier in this review, West et al. (2016) suggest that engaging in an ethical decision-making process should involve consideration and acknowledgement of our values (which each individual has) and context so as to identify the ethical principle(s) that are being applied to the decision-making process. As such, it is essential that such principles are understood and de-constructed to acknowledge the position and value base from which they arise (West et al., 2016). However, as is clear from Table 9, it is relatively rare for authors in this domain to acknowledge these assumptions or to attempt to articulate those assumptions that underpin their ethical decisions and prompt their questions about ethics.

Table 9

Theoretical engagements in included studies

Study	Theory	Summary
Harel Ben Shahar (2017)	Distributive justice	Paper discusses effects of incorporating information and communication technologies in schools in terms of distributive justice
Clayton and Halliday (2017)	Liberalism and individualism	Paper uses conceptions of liberalism and individualism to problematize using Big Data in education
Heath (2014)	Privacy theory	Paper considers the contribution contemporary privacy theories can make to learning analytics
Hildebrandt (2017)	Behaviorism	Paper investigates some of the assumptions of learning analytics related to behaviorism
Hope (2018)	Surveillance creep (Marx)	Using ideas of surveillance, data creep, policy creep and concept creep, this paper explores the digital monitoring of students
Willis and Strunk (2017)	Utilitarianism, relativism, and care ethics	Employing utilitarianism, relativism, and care ethics, the authors discuss the centrality of human agency in educational interaction
Johnson (2017)	Structural justice	Ethical challenges in learning analytics are understood within a framework of structural justice
Lundie (2016)	Hegel	This paper uses Hegel’s ideas in relational pedagogy which problematizes a datafied learning environment reduced to finite measures
Manolev et al. (2019)	Foucault: surveillance	Paper argues ClassDojo’s datafying system of discipline intensifies and normalizes the surveillance of students, serving as a mechanism for behavior control
Perrotta and Williamson (2018)	Material semiotics	Paper draws on material semiotics to examine cluster analysis as a “performative device” that can “create” educational entities it claims to objectively represent
Prinsloo and Slade (2016)	Vulnerability	Paper explores vulnerability in the nexus between realizing the potential of learning analytics; the fiduciary duty of higher education institutions and student agency
Scholes (2016)	Discrimination	Paper uses theories of discrimination to explore how learning analytics can better predict which students are at greater risk of dropping out, and use the statistics to treat “risky” students differently
Selwyn (2015)	Need better use of theory	Paper calls for insightful use of social theory to make better sense of education and digital data
Souto-Otero and Beneito-Montagu (2016)	Governmentality (Foucault)	Article sets out a change of logic toward governmentality, whereby social actors become both objects and subjects of government through the use of digital data
West et al. (2016)	Ethical schools of thought	Higher education representatives were asked about the ethical issues guiding learning analytics use and compared with established ethical theories and principles
Wintrup (2017)	Connectivist learning/the panopticon	Contrasting perspectives are offered by Siemen’s theory of connectivist learning and Foucault’s notion of the panopticon

Across the included studies, the engagement with theoretical literature to discuss ethical concerns was broad, and often drawing on concepts of surveillance, governmentality, structural justice, discrimination, and vulnerability to explore the power imbalances presented by ethical issues. While discussion of these individual theories is beyond the scope of this article, it is clear that scholars have used different theoretical frameworks. However, there was often limited or no attempt to define what scholars understood as ethics, perhaps because it is assumed that everyone has a shared understanding of the term. A minority of studies made explicit the values and/or theories through which they arose. In particular, Har Carmel and Harel Ben Shahar (2017) and Harel Ben Shahar (2017) situate their discussions of Big Data use in education with a framework of distributive justice and employ (and critique) consequential logic to their discussion and problematization of ethical issues in data-driven ability grouping in schools. Slade and Prinsloo (2013) pose a question with a strongly utilitarian emphasis:

At some point, all institutions supporting student learning must decide what their main purpose really is: to maximize the number of students reaching graduation, to improve the completion rates of students who may be regarded as disadvantaged in some way, or perhaps to simply maximize profits. (p. 1514)

Similarly, Cormack’s (2016) “balancing test” to determine the outcome of ethical decisions in learning analytics could be understood as driven by consequentialist ethics, considering all relevant interests for involved parties.

Willis and Skunk (2017) and Prinsloo and Slade (2016) both seek to move beyond utilitarianism toward an ethics of care, which holds that moral action centers on interpersonal relationships and views care or benevolence as a virtue. While consequentialist and deontological ethical theories emphasize generalizable standards and impartiality, an ethics of care emphasizes the importance of response to the individual. The distinction between the general and the individual is reflected in their different moral questions: “What is just?” versus “How to respond?” This links to the discussion earlier in this review relating to the obligation to act.

In practice, the values underpinning ethical practice are rarely set out clearly, justified, or interrogated (Ferguson et al., 2016). Yet values are not consistent from country to country, from institution to institution, or even from classroom to classroom (Ferguson et al., 2016). Rodríguez-Triana et al. (2016) suggest that ethical considerations need to be context specific, noting that different ethical issues arise in higher education and primary school settings.

From these selected papers, there were some specific proposals to address these issues (see Table 10). Rodríguez-Triana et al. (2016) call for greater stakeholder involvement to better incorporate differences in ethical priorities across cultures and contexts. Fynn (2016) suggests the need to negotiate the ideological value base of learning analytics systems, and Ferguson et al. (2016) suggest we should strive for specific ethical goals. Willis and Strunk (2017) argue that the ethics of technology and, more specifically, of learning analytics

ought to be deeply ingrained in innovation, implementation, and assessment. At each critical juncture of development, a set of principled frameworks should be employed to think through possible outcomes, unintended consequences, and how those ethical decisions could affect future inventions. (p. 61)

Table 10

Establishing ethical values and priorities

Response	Descriptions
Identify ethical challenges and set ethical goals	Ferguson et al. (2016) set out ethical challenges and 9 ethical goals:1. Student success2. Trustworthy educational institutions3. Respect for private and group assets4. Respect for property rights5. Educators and educational institutions that safeguard those in their care6. Equal access to education7. Laws that are fair, equally applied, and observed8. Freedom from threat9. Integrity of self
Negotiate ideological value base of analytics system	Fynn (2016) proposes creating a cultural shift by debating, negotiating, and communicating the ideological and value base on which the analytics will be built
Involvement of stakeholders in identifying ethical priorities	Rodríguez-Triana et al. (2016) and Ferguson et al. (2016) suggest the involvement of all stakeholders and the recognition of differences between countries and educational contexts

The papers suggest moves in a positive direction, but without a strong conceptual underpinning and more explicit guidance on how ethical practice might be achieved, it is difficult for a collective response from the education community to be developed.

Discussion

In this article, we have mapped the current landscape of research in the ethics of using digital trace data in learning and education, specifically by study type, academic community, institutional setting, and national context. As would perhaps be expected, many papers are conceptual in orientation and located in specific geographic areas (notably the United States, the United Kingdom, and Australia). This geographical focus partly reflects where a lot of research using digital trace data are carried out, but also reflects the parameters of our search. Activities in other parts of the globe, notably India, China, and parts of South America, will be important to examine in future research.

A very large proportion of the existing research is focused on higher education. There is a distinct lack of attention on ethical requirements in schools, particularly early years and informal contexts of learning. Clayton and Halliday (2017) suggest that some of the most important challenges related to using digital trace data in education relate to preschool and school-aged contexts. Indeed, outside of education there has been a significant discussion about the needs for children’s rights to be firmly integrated into the agendas of global debates about ethics and data science (Berman & Albright, 2017; Livingstone et al., 2015; Lupton & Williamson, 2017). Special legal and cultural considerations need to be in place for children and young people both in school and in informal contexts of learning including the home. From the mapping phase we can see that the two broad academic communities set out at the beginning of the article, that is, those engaged in learning analytics and EDM and those focused on digital education governance, are clearly identifiable in the data. There is limited overlap between them. This is a missed opportunity, as Floridi (2018) notes both would benefit from stronger links in order to understand ethics as part of a wider system of governance and regulation in this constantly shifting landscape (as shown in Figure 1).

The thematic analysis has shown that there are four main areas of focus in this domain. The first theme—privacy, informed consent, and data and ownership—formed a central focus of the majority of papers, followed by issues related to the validity and integrity of data and algorithms, then governance and accountability, with a small number focused on the theme of obligation to act. For each theme, authors proposed multiple ways forward to address the ethical challenges that were highlighted: yet were relatively disparate in approach.

We would like to draw attention here to two broader issues cut across all four thematic themes: transparency and power. Across the included studies, there are discussions relating to the need for greater transparency in data privacy, access, and ownership (i.e., students knowing what data are being collected and having access to their own data), in data interrogation (having access to and understanding the processes and algorithms to which the data are subjected), and in the activities of corporate and third party actors.

Independent reviews of educational software that utilizes algorithms prior to their use in schools may be one way to help address some of these issues (Boninger et al., 2017; Regan & Jesse, 2018). Others, such as Greller and Drachsler (2012), Nunn et al. (2016), Fenwick and Edwards (2016), and Perrotta (2013), call for better training for education professionals on the interpretation of data. However, transparency and associated interventions are not straightforward. In the terms of Barocas and Nissenbaum there is the problem of the “transparency paradox”: simplicity and fidelity cannot both be achieved because the details necessary to convey properly the impact of the information practices in question would be far too complex to be accessible and understood by the majority (Barocas & Nissenbaum, 2014). Furthermore, as Tsai et al. (2019) note, drawing on Ananny and Crawford (2018), total transparency is not only very difficult but also positions individuals as solely responsible for their decision to use a system. However, the inherent imbalance in the power relationships in the various contexts in which data are collected poses questions about the extent to which individuals can truly make informed decisions about the use of their data.

Recognition of issues of power was a consistent theme throughout the articles we have reviewed above. Moss and Metcalf (2020) suggest the ethical domain is one in which power is contested: who gets to decide what ethics is will determine much about what kinds of interventions technology can make in all of our lives, including who benefits, who is protected, and who is made vulnerable. Within the review, a number of theoretical perspectives well known to critical scholars of education were evoked to explore issues of power and vulnerability (e.g., Foucault, Hegel, Marx). Issues raised above included the power institutions yield over learner data, and the means to address power imbalances through increased learner engagement. Others highlighted the power of data systems and algorithms to introduce bias and discrimination in interpretation of digital trace data, the power of software companies and other third party agencies due to their role in data collection, storage and mediation, and the limits of the power educators had to act based upon these data.

In other areas of social life, individuals have some (although often admittedly limited) power, in terms of the decision whether or not to use a certain system or to determine settings that mean less of their data are tracked. Yet the context of learning and education can often be quite different as in many cases students and other stakeholders have far less agency. For example, sharing data can be synonymous with course enrolment, with students being unable to opt-out and still enroll in the class. Parents may be similarly coerced, as they can feel that they may place their child at a disadvantage if they opt out from them using particular education software (Zeide, 2017). Thus, the need to provide specific attention on digital trace data does not just derive from the characteristics of the data, but also the unique context of the field of education itself.

The importance of this issue is further highlighted by another significant finding from this review, that many of these papers did not engage explicitly with theoretical debates in the field of ethics. “Ethics” in many of the papers is presented as a given, as if all authors and readers have a shared understanding, but often scholars do not elaborate on what is meant or intended by this term. Yet, as we have discussed ethics is a rich field, there is a need to make explicit the theoretical/epistemological underpinnings of those studying this question in relation to digital trace data and education to help provide strong foundations to future work in this domain. As a community and as a field, we need to make our values and the basis for those values far more explicit.

The reviewed papers demonstrate an emerging recognition and exploration of the ethical challenges posed by the use of digital trace data and the accompanying renegotiation of legal, accountability, and governance structures in education. In the results sections above relating to our thematic analysis: privacy, informed consent and data and ownership, validity and integrity of data and algorithms, governance and accountability, and the obligation to act we have summarized potential responses to each ethical concern, and we hope that the responses set out in Tables 5 to 9 provide a useful guide for policymakers, practitioners, and researchers seeking to make ethical choices at this time.

However, we also wish to suggest three broad lines of future work that are much needed. At present, the educational literature on this topic is less advanced than in other related fields and intellectual traditions, where there has been a much deeper engagement with ethical concerns, frameworks, and implications for future technology development.

First is the need to increase the awareness of the wider ecosystem of learning with digital technologies that lead to the production and use of digital trace data. Lupton and Williamson (2017) present a means to think across the life course of a learner (from early years onwards), and this ensures a more developmental perspective to such debates; and also to recognize that learning and education of this kind may take place across an array of settings both in formal and informal contexts of learning, across geographical settings with different legal and social rules, norms, and regulations.

As part of this, it is important to attend to the ever-increasing role of corporate and third party actors in the collection and storage and processing of data for learning and education. Education is distinct from other domains in which Big Data are being applied and needs special consideration (Clayton & Halliday, 2017). There needs to be a more holistic view of how and where learning and education takes place, and by whom, to ensure that all institutions engaged in learning (not just formal educational institutions) are accountable, and that the rights, needs, and experiences of all learners are recognized. This includes the incorporation of additional contextual data when making decisions concerning individual learners. Given the parameters of our review, it is likely that the role of the commercial sector and private actors and their practices are even more prevalent than indicated here. Thus, this area is an essential focus of future enquiry.

Next is the need to develop a more explicit engagement with existing theoretical framings that could be used as a lens to explore this specific domain. Without it, there is a risk that the development of a meaningful and coherent body of research that outlines the ethics of the use of digital trace data for learning and education will not be realized due to the lack of theory explicitly used in these studies. At present, many of the studies seem to emerge from a relatively “short term” vision of such debates, where immediate institutional needs (often those of a higher education institution) predominate. As Boden et al. (2009) warned, this can risk an overemphasis on the creation of procedures, rules, and regulations, with philosophical debates on matters pertaining to morals, ethics, standards, and principles having been sidelined. Future papers need to pay attention both to local context and to broader theoretical principles. Relatedly, we would encourage a more long-term vision in many of these debates. There is a lack of attention and exploration of future orientated questions (e.g., issues of measurement and the obligation to act). In this way, ethical debates tend not to anticipate future challenges, and often lag behind what it is technically feasible and tend not to recognize or build on past understandings in other areas of education.

At present, the relationships between ethics, governance, and legal regulation as set out by Floridi (2018) in the introduction to this paper are not synchronized in the domain of digital trace data in learning and education. In order to create greater synchronicity we need stronger explicit discussions of theoretical orientations, to unpack “ethics as a keyword” (Moss & Metcalf, 2020), clearer discussions of contexts and audiences and why these matter, and a strong focus on considering these issues within the specific case of learning and education. These all need to be considered in a holistic way that builds on previous analyses and theoretical arguments where different communities work together.

The synthesis suggests that one way to achieve this in such a rapidly emerging field may be to take an “ethics by design” approach (see, e.g., Har Carmel & Harel Ben Shahar, 2017; Harel Ben Shahar, 2017; Rodríguez-Triana et al., 2016); that chimes with many of the proposed ways forward identified in the thematic analysis above. In other words, to incorporate ethical considerations from the outset of any initiative or intervention that involves the use of digital trace data as part of the research endeavor that recognizes that ethical decisions within educational contexts requires continuous renegotiation. The included studies suggest that this process needs to involve key stakeholders and take contextual factors into account when defining ethical priorities. Although there remains relatively little instruction as to how these laudable goals might be achieved, there is helpful guidance emerging. For example, franzke et al. (2020) set out an enhanced framework to engage in a “processes of deliberation” to empower ethically informed research practices. Researchers could also draw on similar activities and approaches in other fields such as HCI (Gray & Boling, 2016).

Such an approach should take into account, in a research informed way, the context in which ethical decision making occurs, and should incorporate a complex network of actors across the educational landscape. Scholars would be encouraged to make ethical decisions in a more iterative, collaborative, and transparent way that links stakeholders across settings, practices, and policies. In a similar way to participative design, a well-established approach in some areas of education, research can be actively linked to both changing practice and developing theory at the same time. This research domain could then develop in ways that takes into account the wider ecosystem of learning, to set a proactive, forward-looking ethical agenda.

Conclusion

In this article, we have carried out a systematic qualitative analysis of the emerging research in the ethics of digital trace data use in learning and education. From our systematic search of three databases, we identified 77 peer-reviewed studies. In order to address the first research question we examined the articles by study type, academic community, institutional setting, and national context and found a significant overrepresentation of higher education, and two relatively disconnected academic communities working in this space. We then carried out a thematic analysis of the ethical issues and responses that are highlighted in existing research on the ethics of digital trace data use in education and learning; and reported on the findings and arguments from four identified themes of work: privacy, informed consent, and data transparency and ownership; validity and integrity of data and algorithms; governance and accountability; and ethical decision making and the obligation to act. From this process we also found important gaps in the literature. Given the importance of this area, and the likely growth in the use of digital trace data for learning and education, we suggest there is a need for future research to begin to work toward a more cohesive community of thought, where the wider learning and educational ecosystem is recognied, explicit engagement with ethical theory is central, and mid- to long-term issues are also considered alongside immediate concerns. In so doing, the academic community can build a proactive ethical agenda in this developing domain. This review provides an important foundation to these discussions.

Footnotes

ORCID iD

Rebecca Eynon

This work was supported by Ferrero International, Grant Number R52124/CN001.

Notes

Authors

LAURA HAKIMI is a researcher affiliated with the Department of Education and the University of Oxford, Oxford, UK; email: laurajanehakimi@gmail.com . Her research interests include the use of technology across educational contexts, particularly within disadvantaged and marginalized communities.

REBECCA EYNON is a professor of education, the Internet and society at the University of Oxford, Oxford, UK; email: rebecca.eynon@oii.ox.ac.uk , where she holds a joint appointment between the Department of Education and the Oxford Internet Institute. Her research explores the intersections between digital life, education, and inequality.

VICTORIA A. MURPHY is a professor of applied linguistics at the Department of Education, University of Oxford, Oxford, UK; email: victoria.murphy@education.ox.ac.uk . Her research interests include language and literacy development in EAL (ELL/ESL/language minority) children and foreign language learning in primary school.

References

*Aguilar

S. J.

(2017). Learning analytics: At the nexus of big data, digital innovation, and social justice in education. TechTrends, 62(1), 37–45. https://doi.org/10.1007/s11528-017-0226-9

Ananny

Crawford

(2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society, 20(3), 973–989. https://doi.org/10.1177/1461444816676645

Andrejevic

(2013). Infoglut: How too much information is changing the way we think and know. Routledge.

Avella

J. T.

Kebritchi

Nunn

S. G.

Kanai

(2016). Learning analytics methods, benefits, and challenges in higher education: A systematic literature review. Online Learning, 20, 13–29. https://olj.onlinelearningconsortium.org/index.php/olj/article/view/790

Barocas

Nissenbaum

(2014). Big data’s end run around anonymity and consent. In Lane

Stodden

Bender

Nissenbaum

(Eds.), Privacy, big data, and the public good: Frameworks for engagement (Vol. 1, pp. 44–75). Cambridge University Press.

Bassett

O’Riordan

(2002). Ethics of internet research: Contesting the human subjects research model. Ethics and Information Technology, 4(3), 233–247 https://doi.org/10.1023/A:1021319125207

Beauchamp

T. L.

Childress

J. F.

(1994). Principles of biomedical ethics. Oxford University Press.

Becker

L. C.

Becker

C. B.

(Eds.). (2001). Encyclopedia of ethics (2nd ed.). Routledge.

*Ben Porath

Harel Ben Shahar

. (2017). Introduction: Big data and education: Ethical and moral challenges. Theory and Research in Education, 15(3), 243–248. https://doi.org/10.1177/1477878517737201

10.

*Berg

A. M.

Mol

S. T.

Kismihók

Sclater

(2016). The role of a reference synthetic data generator within the field of learning analytics. Journal of Learning Analytics, 3(1), 107–128. https://doi.org/10.18608/jla.2016.31.7

11.

Berman

Albright

(2017). Children and the data cycle: Rights and ethics in a big data world. UNICEF Office of Research-Innocenti. https://www.unicef-irc.org/publications/907-children-and-the-data-cyclerights-and-ethics-in-a-big-data-world.html

12.

Boden

Epstein

Latimer

(2009). Accounting for ethos or programmes for conduct? The brave new world of research ethics committees. Sociological Review, 57(4), 727–749. https://doi.org/10.1111/j.1467-954X.2009.01869.x

13.

Boellstorff

(2013). Making big data, in theory. First Monday, 18(10), Article 4869. https://doi.org/10.5210/fm.v18i10.4869

14.

Boninger

Molnar

Murray

(2017). Asleep at the switch: Schoolhouse commercialism, student privacy, and the failure of policymaking (Report on Schoolhouse Commercialization Trends). National Education Policy Center. https://nepc.colorado.edu/publication/schoolhouse-commercialism-2017

15.

boyd

Crawford

(2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679. https://doi.org/10.1080/1369118X.2012.678878

16.

Cheney-Lippold

(2011). A new algorithmic identity: Soft biopolitics and the modulation of control. Theory, Culture & Society, 28(6), 164–181. https://doi.org/10.1177/0263276411424420

17.

*Clayton

Halliday

(2017). Big data and the liberal conception of education. Theory and Research in Education, 15(3), 290–305. https://doi.org/10.1177/1477878517734450

18.

*Cope

Kalantzis

(2016). Big data comes to school: Implications for learning, assessment, and research. AERA Open, 2(2), 1–19. https://doi.org/10.1177/2332858416641907

19.

Corbin

Strauss

(2007). Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage.

20.

*Cormack

(2016). A data protection framework for learning analytics. Journal of Learning Analytics, 3(1), 91–106. https://doi.org/10.18608/jla.2016.31.6

21.

*Daniel

B. K.

(2019). Big data and data science: A critical review of issues for educational research. British Journal of Educational Technology, 50(1), 101–113. https://doi.org/10.1111/bjet.12595

22.

Dey

(2003). Qualitative data analysis: A user friendly guide for social scientists. Routledge.

23.

Dietrichson

Bøg

Filges

Klint Jørgensen

A.-M.

(2017). Academic interventions for elementary and middle school students with low socioeconomic status: A systematic review and meta-analysis. Review of Educational Research, 87(2), 243–282. https://doi.org/10.3102/0034654316687036

24.

Dietz-Uhler

Hurn

J. E.

(2013). Using learning analytics to predict (and improve) student success: A faculty perspective. Journal of Interactive Online Learning, 12(1), 17–26. https://core.ac.uk/download/pdf/30676491.pdf

25.

Dodman

S. L.

DeMulder

E. K.

View

J. L.

Swalwell

Stribling

Dallman

(2019). Equity audits as a tool of critical data-driven decision making: Preparing teachers to see beyond achievement gaps and bubbles. Action in Teacher Education, 41(1), 4–22. https://doi.org/10.1080/01626620.2018.1536900

26.

*Edwards

(2015). Software and the hidden curriculum in digital education. Pedagogy, Culture & Society, 23(2), 265–279. https://doi.org/10.1080/14681366.2014.977809

27.

Effrem

K. R.

(2016, June 3). The dark side of student data mining. The Pulse. http://thepulse2016.com/karen-r-effrem/2016/06/03/response-to-us-news-educational-data-mining-harms-privacy-without-evidence-of-effectiveness/

28.

*Elouazizi

(2014). Critical factors in data governance for learning analytics. Journal of Learning Analytics, 1(3), 211–222. https://doi.org/10.18608/jla.2014.13.25

29.

Essa

Ayad

(2012). Improving student success using predictive models and data visualisations. Research in Learning Technology, 20(1), 58–70. https://doi.org/10.3402/rlt.v20i0.19191

30.

*Eynon

(2013). The rise of big data: What does it mean for education, technology, and media research? Learning, Media and Technology, 38(3), 237–240. https://doi.org/10.1080/17439884.2013.771783

31.

Eynon

Fry

Schroeder

(2017). The ethics of online research. In Fielding

N. G.

Lee

R. M.

Blank

(Eds.), The SAGE handbook of online research methods (pp. 19–37). SAGE.

32.

*Fenwick

Edwards

(2016). Exploring the impact of digital technologies on professional responsibilities and education. European Educational Research Journal, 15(1), 117–131. https://doi.org/10.1177/1474904115608387

33.

Ferguson

(2012). Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), 304–317. https://doi.org/10.1504/IJTEL.2012.051816

34.

*Ferguson

Hoel

Scheffel

Drachsler

(2016). Guest editorial: Ethics and privacy in learning analytics. Journal of Learning Analytics, 3(1), 5–15. https://doi.org/10.18608/jla.2016.31.2

35.

Floridi

(2018). Soft ethics and the governance of the digital. Philosophy & Technology, 31(1), 1–8. https://doi.org/10.1007/s13347-018-0303-9

36.

Fong

C. J.

Davis

C. W.

Kim

Y. W.

Marriott

Kim

(2017). Psychosocial factors and community college student success: A meta-analytic investigation. Review of Educational Research, 87(2), 388–424. https://doi.org/10.3102/0034654316653479

37.

franzke

a. s.

Bechmann

Zimmer

Ess

C. M.

(2020). Internet research: Ethical Guidelines 3.0. https://aoir.org/reports/ethics3.pdf

38.

Friedman

Kahn

P. H.

Jr. (2003). Human values, ethics, and design. In Jacko

J. A.

Sears

(Eds.), The human-computer interaction handbook (pp. 1177–1201). Lawrence Erlbaum.

39.

*Fynn

(2016). Ethical considerations in the practical application of the Unisa socio-critical model of student success. International Review of Research in Open and Distributed Learning, 17(6), Article 2812. https://doi.org/10.19173/irrodl.v17i6.2812

40.

Geiger

R. S.

Ribes

(2011). Trace ethnography: Following coordination through documentary practices. 2011 44th Hawaii International Conference on System Sciences. https://doi.org/10.1109/HICSS.2011.455

41.

Gilligan

(1982). In a different voice: Psychological theory and women’s development. Harvard University Press.

42.

Gray

C. M.

Boling

(2016). Inscribing ethics and values in designs for learning: A problematic. Educational Technology Research and Development, 64(5), 969–1001. https://doi.org/10.1007/s11423-016-9478-x

43.

*Greller

Drachsler

(2012). Translating learning into numbers: A generic framework for learning analytics. Journal of Educational Technology & Society, 15(3), 42–47. https://eric.ed.gov/?id=EJ992502

44.

*Har Carmel

Harel Ben Shahar

. (2017). Reshaping ability grouping through big data. Vanderbilt Journal of Entertainment & Technology Law, 20(1), 87. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2944743

45.

*Harel Ben Shahar

. (2017). Educational justice and big data. Theory and Research in Education, 15(3), 306–320. https://doi.org/10.1177/1477878517737155

46.

*Hartong

(2016). Between assessments, digital technologies and big data: The growing influence of “hidden” data mediators in education. European Educational Research Journal, 15(5), 523–536. https://doi.org/10.1177/1474904116648966

47.

*Heath

(2014). Contemporary privacy theory contributions to learning analytics. Journal of Learning Analytics, 1(1), 140–149. https://doi.org/10.18608/jla.2014.11.8

48.

Herschel

Miori

V. M.

(2017). Ethics and big data. Technology in Society, 49(1), 31–36. https://doi.org/10.1016/j.techsoc.2017.03.003

49.

*Hildebrandt

(2017). Learning as a machine: Crossovers between humans and machines. Journal of Learning Analytics, 4(1), 6–23. https://doi.org/10.18608/jla.2017.41.3

50.

*Hoel

Chen

(2016). Privacy-driven design of learning analytics applications: Exploring the design space of solutions for data sharing and interoperability. Journal of Learning Analytics, 3(1), 139–158. https://doi.org/10.18608/jla.2016.31.9

51.

*Hoel

Chen

(2019). Privacy engineering for learning analytics in a global market. International Journal of Information and Learning Technology, 36(4), 288–298. https://doi.org/10.1108/IJILT-02-2019-0025

52.

*Hope

(2018). Creep: The growing surveillance of students’ online activities. Education and Society, 36(1), 55–72. https://doi.org/10.7459/es/36.1.05

53.

*Howell

J. A.

Roberts

L. D.

Seaman

Gibson

D. C.

(2018). Are we on our way to becoming a “helicopter university?” Academics’ views on learning analytics. Technology, Knowledge & Learning, 23(1), 1–20. https://doi.org/10.1007/s10758-017-9329-9

54.

*Ifenthaler

Schumacher

(2016). Student perceptions of privacy principles for learning analytics. Educational Technology, Research and Development, 64(5), 923–938. https://doi.org/10.1007/s11423-016-9477-y

55.

*Johnson

J. A.

(2014). The ethics of big data in higher educatio n. International Review of Information Ethics, 18(1), 4–9. http://www.i-r-i-e.net/inhalt/021/IRIE-021-Johnson.pdf

56.

*Johnson

J. A.

(2017). Ethics and justice in learning analytics. New Directions for Higher Education, 2017(179), 77–87. https://doi.org/10.1002/he.20245

57.

*Jones

K. M. L.

(2019). Learning analytics and higher education: A proposed model for establishing informed consent mechanisms to promote student privacy and autonomy. International Journal of Educational Technology in Higher Education; 16(1), 1–22. https://doi.org/10.1186/s41239-019-0155-0

58.

*Jones

K. M. L.

McCoy

(2019). Reconsidering data in learning analytics: Opportunities for critical research using a documentation studies framework. Learning, Media and Technology, 44(1), 52–63. https://doi.org/10.1080/17439884.2018.1556216

59.

Jones

K. M. L.

Salo

(2018). Learning analytics and the academic library: Professional ethics commitments at a crossroads. College & Research Libraries, 79(3), Article 304. https://doi.org/10.5860/crl.79.3.304

60.

Kay

Korn

Oppenheim

(2012). Legal, risk and ethical aspects of analytics in higher education. CETIS, University of Bolton.

61.

*Khalil

Ebner

(2016). De-identification in learning analytics. Journal of Learning Analytics, 3(1), 129–138. https://doi.org/10.18608/jla.2016.31.8

62.

*Khalil

Prinsloo

Slade

(2018). User consent in MOOCs—Micro, meso, and macro perspectives. International Review of Research in Open and Distributed Learning, 19(5), Article 3908. https://doi.org/10.19173/irrodl.v19i5.3908

63.

Kitchin

(2014a). Big data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), Article 8481. https://doi.org/10.1177/2053951714528481

64.

Kitchin

(2014b). The data revolution: Big data, open data, data infrastructures and their consequences. Sage.

65.

Laney

(2001). 3D data management: Controlling data volume, velocity and variety. https://www.bibsonomy.org/bibtex/742811cb00b303261f79a98e9b80bf49

66.

*Lawson

Beer

Rossi

Moore

Fleming

(2016). Identification of “at risk” students using learning analytics: The ethical dilemmas of intervention strategies in a higher education institution. Educational Technology, Research and Development, 64(5), 957–968. https://doi.org/10.1007/s11423-016-9459-0

67.

Livingstone

Carr

Byrne

(2015). One in three: Internet governance and children’s rights. Global Commission on Internet Governance.

68.

*Lindh

Nolin

(2016). Information we collect: Surveillance and privacy in the implementation of Google Apps for education. European Educational Research Journal, 15(6), 644–663. https://doi.org/10.1177/1474904116654917

69.

Long

Siemens

(2011). Penetrating the fog: Analytics in learning and education. EDUCAUSE Review, 46(5), 30–40. https://er.educause.edu/articles/2011/9/penetrating-the-fog-analytics-in-learning-and-education

70.

*Lundie

(2016). Authority, autonomy and automation: The irreducibility of pedagogy to information transactions. Studies in Philosophy and Education, 35(3), 279–291. https://doi.org/10.1007/s11217-016-9517-4

71.

*Lupton

(2015). Data assemblages, sentient schools and digitised health and physical education (response to Gard). Sport, Education & Society, 20(1), 122–132. https://doi.org/10.1080/13573322.2014.962496

72.

*Lupton

Williamson

(2017). The datafied child: The dataveillance of children and implications for their rights. New Media & Society, 19(5), 780–794. https://doi.org/10.1177/1461444816686328

73.

*Lynch

C. F.

(2017). Who prophets from big data in education? New insights and new challenges. Theory and Research in Education, 15(3), 249–271. https://doi.org/10.1177/1477878517738448

74.

*MacCarthy

(2014). Student privacy: Harm and context. International Review of Information Ethics, 21(1), 11–24. https://ssrn.com/abstract=3093299

75.

MacFadyen

L. P.

(2017). Overcoming barriers to educational analytics: How systems thinking and pragmatism can help. Educational Technology, 57(1), 31–39.

76.

*MacGilchrist

(2019). Cruel optimism in edtech: When the digital data practices of educational technology providers inadvertently hinder educational equity. Learning, Media and Technology, 44(1), 77–86. https://doi.org/10.1080/17439884.2018.1556217

77.

Mai

J.-E.

(2016). Big data privacy: The datafication of personal information. Information Society, 32(3), 192–199. http://jenserikmai.info/Papers/2016_BigDataPrivacy.pdf

78.

*Manolev

Sullivan

Slee

(2019). The datafication of discipline: ClassDojo, surveillance and a performative classroom culture. Learning, Media and Technology, 44(1), 36–51. https://doi.org/10.1080/17439884.2018.1558237

79.

Markham

Buchanan

(2002). Ethical decision-making and Internet research. https://aoir.org/reports/ethics2.pdf

80.

Markham

Buchanan

(2012). Ethical decision-making and internet research recommendations from the AoIR (Version 2.0). AoIR Ethics Working Committee.

81.

Mayer-Schönberger

Cukier

(2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.

82.

Metcalf

Crawford

(2016). Where are human subjects in big data research? The emerging ethics divide. Big Data & Society, 3(1), Article 50211. https://doi.org/10.1177/2053951716650211

83.

Moore

Ellsworth

(2014). Ethics of educational technology. In Spector

Merrill

M. D.

Elen

Bishop

M. J.

(Eds.), Handbook of research on educational communications and technology (4th ed., pp. 113–127). Springer.

84.

Moss

Metcalf

(2020, April 29). Too big a word? What does it mean to do ethics in the technology industry? We found four overlapping meanings. Data & Society: Points. https://points.datasociety.net/too-big-a-word-13e66e62a5bf

85.

*Nunn

Avella

J. T.

Kanai

Kebritchi

(2016). Learning analytics methods, benefits, and challenges in higher education: A systematic literature review. Online Learning, 20(2). https://doi.org/10.24059/olj.v20i2.790

86.

Oboler

Welsh

Cruz

(2012). The danger of big data: Social media as computational social science. First Monday, 17(7), Article 3269. https://firstmonday.org/article/view/3993/3269/

87.

*Pardo

Siemens

(2014). Ethical and privacy principles for learning analytics. British Journal of Educational Technology, 45(3), 438–450. https://doi.org/10.1111/bjet.12152

88.

Pardos

Z. A.

(2017). Big data in education and the models that love them. Current Opinion in Behavioral Sciences, 18(December), 107–113. https://doi.org/10.1016/j.cobeha.2017.11.006

89.

*Perrotta

(2013). Assessment, technology and democratic education in the age of data. Learning, Media and Technology, 38(1), 116–122. https://doi.org/10.1080/17439884.2013.752384

90.

*Perrotta

Williamson

(2018). The social life of learning analytics: Cluster analysis and the “performance” of algorithmic education. Learning, Media and Technology, 43(1), 3–16. https://doi.org/10.1080/17439884.2016.1182927

91.

Picciano

A. G.

(2012). The evolution of big data and learning analytics in American higher education. Journal of Asynchronous Learning Networks, 16(1), 9–20. https://doi.org/10.24059/olj.v16i3.267

92.

*Polonetsky

Tene

(2014). The ethics of student privacy: Building trust for ed tech. International Review of Information Ethics, 25. https://fpf.org/wp-content/uploads/2016/06/Ethics-of-Student-Privacy_Polonetsky-Tene.pdf

93.

*Prinsloo

(2017). Fleeing from Frankenstein’s monster and meeting Kafka on the way: Algorithmic decision-making in higher education. E-Learning and Digital Media, 14(3), 138–163. https://doi.org/10.1177/2042753017731355

94.

Prinsloo

Slade

(2013, April). An evaluation of policy frameworks for addressing ethical considerations in learning analytics. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (pp. 240–244). ACM. https://doi.org/10.1145/2460296.2460344

95.

*Prinsloo

Slade

(2016). Student vulnerability, agency, and learning analytics: An exploration. Journal of Learning Analytics, 3(1), 159–182. https://doi.org/10.18608/jla.2016.31.10

96.

*Ratner

Andersen

B. L.

Madsen

S. R.

(2019). Configuring the teacher as data user: Public-private sector mediations of national test data. Learning, Media and Technology, 44(1), 22–35. https://doi.org/10.1080/17439884.2018.1556218

97.

Reamer

F. G.

(1993). Ethical dilemmas in social services: A guide for social workers (3rd ed.). Columbia University Press.

98.

*Regan

P. M.

Jesse

(2018). Ethical challenges of edtech, big data and personalized learning: Twenty-first century student sorting and tracking. Ethics and Information Technology, 21(3), 167–179. https://doi.org/10.1007/s10676-018-9492-2

99.

Reidenberg

J. R.

(2000). Resolving conflicting international data privacy rules in cyberspace. Stanford Law Review, 52(5), 1315–1376. https://doi.org/10.2307/1229516

100.

*Reidenberg

J. R.

Schaub

(2018). Achieving big data privacy in education. Theory and Research in Education, 16(3), 263–279. https://doi.org/10.1177/1477878518805308

101.

Reyes

J. A.

(2015). The skinny on big data in education: Learning analytics simplified. Techtrends, 59(2), 75–80. https://doi.org/10.1007/s11528-015-0842-1

102.

*Rios-Aguilar

(2015). Using big (and critical) data to unmask inequities in community colleges. New Directions for Institutional Research, 2014(163), 43–57. https://doi.org/10.1002/ir.20085

103.

*Roberts

L. D.

Howell

J. A.

Seaman

Gibson

D. C.

(2016). Student attitudes toward learning analytics in higher education: “The fitbit version of the learning world.” Frontiers in Psychology, 7, Article 959. https://doi.org/10.3389/fpsyg.2016.01959

104.

Roberts-Holmes

(2015). The “datafication” of early years pedagogy: “If the teaching is good, the data should be good and if there’s bad teaching, there is bad data.” Journal of Education Policy, 30(3), 302–315. https://doi.org/10.1080/02680939.2014.924561

105.

Roberts-Holmes

Bradbury

(2016). Governance, accountability and the datafication of early years education in England. British Educational Research Journal, 42(4), 600–613. https://doi.org/10.1002/berj.3221

106.

*Rodríguez-Triana

M. J.

Martínez-Monés

Villagrá-Sobrino

(2016). Learning analytics in small-scale teacher-led innovations: Ethical and data privacy issues. Journal of Learning Analytics, 3(1), 43–65. https://doi.org/10.18608/jla.2016.31.4

107.

Romero

Ventura

(2010). Educational data mining: A review of the state of the art. IEE Transactions on Systems, Man, and Cybernetics, Part C: Application and Reviews, 40(6), 601–618. https://doi.org/10.1109/TSMCC.2010.2053532

108.

*Rubel

Jones

K. M. L.

(2016). Student privacy in learning analytics: An information ethics perspective. Information Society, 32(2), 143–159. https://doi.org/10.1080/01972243.2016.1130502

109.

*Russell

N. C.

Reidenberg

J. R.

Martin

Norton

(2018). Transparency and the marketplace for student data. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3191436

110.

Saldaña

(2015). The coding manual for qualitative researchers. Sage.

111.

*Scholes

(2016). The ethics of using learning analytics to categorize students on risk. Educational Technology Research and Development, 64(5), 939–955. https://doi.org/10.1007/s11423-016-9458-1

112.

*Schouten

(2017). On meeting students where they are: Teacher judgment and the use of data in higher education. Theory and Research in Education, 15(3), 321–338. https://doi.org/10.1177/1477878517734452

113.

*Sclater

(2016). Developing a code of practice for learning analytics. Journal of Learning Analytics, 3(1), 16–42. https://doi.org/10.18608/jla.2016.31.3

114.

Selwyn

(2012). Ten suggestions for improving academic research in education and technology. Learning, Media and Technology, 37(3), 213–219. https://doi.org/10.1080/17439884.2012.680213

115.

*Selwyn

(2015). Data entry: Towards the critical study of digital data and education. Learning, Media and Technology, 40(1), 64–82. https://doi.org/10.1080/17439884.2014.921628

116.

Siemens

Baker

(2012). Learning analytics and educational data mining: Towards communication and collaboration. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp 252–254). Association for Computing Machinery.

117.

Shilton

(2018). Values and ethics in human-computer interaction. Foundations and Trends® in Human–Computer Interaction, 12(2), 107–171. http://dx.doi.org/10.1561/1100000073

118.

*Slade

Prinsloo

(2013). Learning analytics: Ethical issues and dilemmas. American Behavioral Scientist, 57(10), 1510–1529. https://doi.org/10.1177/0002764213479366

119.

*Slade

Prinsloo

(2015). Student perspectives on the use of their data: Between intrusion, surveillance and care. European Journal of Open, Distance and E-Learning, 18(1), 291–300. http://oro.open.ac.uk/41229/

120.

*Souto-Otero

Beneito-Montagu

(2016). From governing through data to governmentality through data: Artefacts, strategies and the digital turn. European Educational Research Journal, 15(1), 14–33. https://doi.org/10.1177/1474904115617768

121.

*Stahl

W. M.

Karger

(2016). Student data privacy, digital learning, and special education: Challenges at the intersection of policy and practice. Journal of Special Education Leadership, 29(2), 79–88.

122.

*Steiner

C. M.

Kickmeier-Rust

M. D.

Albert

(2016). LEA in private: A privacy and data protection framework for a learning analytics toolbox. Journal of Learning Analytics, 3(1), 66–90. https://doi.org/10.18608/jla.2016.31.5

123.

Tess

P. A.

(2013). The role of social media in higher education classes (real and virtual): A literature review. Computers in Human Behaviour, 29(5), A60–A68. https://doi.org/10.1016/j.chb.2012.12.032

124.

Tieken

M. C.

Auldridge-Reveles

T. R.

(2019). Rethinking the school closure research: School closure as spatial injustice. Review of Educational Research, 89(6), 917–953. https://doi.org/10.3102/0034654319877151

125.

Tsai

Y. S.

Perrotta

Gašević

(2019). Empowering learners with personalised learning approaches? Agency, equity and transparency in the context of learning analytics. Assessment & Evaluation in Higher Education, 45(4), 554–567. https://doi.org/10.1080/02602938.2019.1676396

126.

*Veletsianos

Reich

Pasquini

L. A.

(2016). The life between big data log events: Learners’ strategies to overcome challenges in MOOCs. AERA Open, 2(3), Article 7002. https://doi.org/10.1177/2332858416657002

127.

*Wang

(2016). Big opportunities and big concerns of big data in education. TechTrends, 60(4), 381–384. https://doi.org/10.1007/s11528-016-0072-1

128.

*Watson

Wilson

Drew

Thompson

T. L.

(2017). Small data, online learning and assessment practices in higher education: a case study of failure? Assessment & Evaluation in Higher Education, 42(7), 1030–1045. https://doi.org/10.1080/02602938.2016.1223834

129.

*West

Huijser

Heath

(2016). Putting an ethical lens on learning analytics. Educational Technology Research and Development, 64(5), 903–922. https://doi.org/10.1007/s11423-016-9464-3

130.

*Williamson

(2015a). Governing methods: Policy innovation labs, design and data science in the digital governance of education. Journal of Educational Administration and History, 47(3), 251–271. https://doi.org/10.1080/00220620.2015.1038693

131.

*Williamson

(2015b). Governing software: Networks, databases and algorithmic power in the digital governance of public education. Learning, Media and Technology, 40(1), 83–105. https://doi.org/10.1080/17439884.2014.924527

132.

*Williamson

(2016a). Digital education governance: Data visualization, predictive analytics, and “real-time” policy instruments. Journal of Education Policy, 31(2), 123–141. https://doi.org/10.1080/02680939.2015.1035758

133.

*Williamson

(2016b). Digital methodologies of education governance: Pearson plc and the remediation of methods. European Educational Research Journal, 15(1), 34–53. https://doi.org/10.1177/1474904115612485

134.

*Williamson

(2017a). Who owns educational theory? Big data, algorithms and the expert power of education data science. E-Learning and Digital Media, 14(3), 105–122. https://doi.org/10.1177/2042753017731238

135.

Williamson

(2017b). Big data in education: The digital future of learning, policy and practice. Sage.

136.

Williamson

Eynon

Potter

(2020). Pandemic politics, pedagogies and practices: Digital technologies and distance education during the coronavirus emergency. Learning, Media and Technology, 45(2), 107–114. https://doi.org/10.1080/17439884.2020.1761641

137.

*Williamson

Piattoeva

(2019). Objectivity as standardization in data-scientific education policy, technology and governance. Learning, Media and Technology, 44(1), 64–76. https://doi.org/10.1080/17439884.2018.1556215

138.

Willis

J. E.

III . (2014, August 25). Learning analytics and ethics: A framework beyond utilitarianism. EDUCAUSE Review. https://er.educause.edu/articles/2014/8/learning-analytics-and-ethics-a-framework-beyond-utilitarianism

139.

*Willis

J. E.

III Strunk

V. A.

(2017). The ethics of machine-based learning: Advancing without losing humanity. International Journal of Sociotechnology and Knowledge Development, 9(1), 53–66. https://doi.org/10.4018/IJSKD.2017010104

140.

*Willis

J. E.

Slade

Prinsloo

(2016). Ethical oversight of student data in learning analytics: A typology derived from a cross-continental, cross-institutional perspective. Educational Technology, Research and Development, 64(5), 881–901. https://doi.org/10.1007/s11423-016-9463-4

141.

*Wintrup

(2017). Higher education’s panopticon? Learning analytics, ethics and student engagement. Higher Education Policy, 30(1), 87–103. https://doi.org/10.1057/s41307-016-0030-8

142.

Zeide

(2017). The structural consequences of big data-driven education. Big Data, 5(2), 164–172.

143.

*Zeide

Nissenbaum

(2018). Learner privacy in MOOCs and virtual education. Theory and Research in Education, 16(3), 280–307. https://doi.org/10.1177/1477878518815340

144.

Zuboff

(2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power. Profile Books.