Abstract
Data-Driven and Artificial Intelligence technologies are rapidly changing the way that health research is conducted and are opening up new research opportunities. Their expansion will inevitably have adverse environmental impacts. These include carbon dioxide emissions linked to the energy required to generate and process large amounts of data; the impact on the material environment (in the form of data centres); the unsustainable extraction of minerals for technological components; and e-waste (discarded electronic appliances) disposal. The growth of Data-Driven and Artificial Intelligence technologies means there is now a compelling need to consider these environmental impacts and develop means to mitigate them. Here, we offer a scoping review of how the environmental impacts of data storage and processing during Data-Driven and Artificial Intelligence health-related research are being discussed in the academic literature. Using the UK as a case study, we also offer a review of policies and initiatives that consider the environmental impacts of data storage and processing during Data-Driven and Artificial Intelligence health-related research in the UK. Our findings suggest little engagement with these issues to date. We discuss the implications of this and suggest ways that the Data-Driven and Artificial Intelligence health research sector needs to move to become more environmentally sustainable.
Introduction
Data-Driven and Artificial Intelligence (DDAI) technologies are rapidly changing the way that health research is conducted. 1 New technological capabilities to store and process vast quantities of clinical data, and new collections of health data from non-traditional sources, 2 have led to new opportunities for large-scale data analytics. These in turn have led to the explosion of health data repositories, containing troves of clinical and genomic data, 1 as well as swaths of self-tracking data from wearables, biosensors and/or environmental data.3,4 Meanwhile, social media and other data are being mined for health research, 5 and machine learning techniques are being used to help predict health conditions. 6 The storage and processing of health-related data are set to become the fastest growing sector in the datasphere. 7
Expansion of DDAI technologies inevitably results in adverse environmental impacts. These include heavy carbon dioxide emissions linked to the energy required to generate and process large amounts of data. Approximately 100 megatonnes of carbon dioxide emissions are produced from the digital sector per year, 8 and the yearly electricity usage of data centres is over 205 terawatt-hours, 9 which already exceeds the consumption of countries such as Ireland and Denmark (see, e.g. 10 ). Furthermore, this figure fails to account for indirect electricity usage (and likely carbon emissions) associated with data centre supply chains. 11
DDAI technologies also have adverse impacts on the material environment (e.g.
While likely improvements in energy efficiency and the move to renewable energy will no doubt relieve at least some of these concerns, 15 the pace of data-driven innovation raises concerns that information and communication technologies could outpace the world's renewable energy sources, leading to increases in carbon emissions when other sectors are decreasing their energy use.11,16 Furthermore, data-driven solutions have rebound effects, meaning that while digital solutions in the near term may appear to offer environmental advantages, in the long run, this may not be the case. For example, the move to centralise health research data, biobanking data, and/or the move to open science will not
There is a compelling need to consider the environmental impacts of DDAI technologies. 17 This includes, for example, who is, and who should be, responsible for these impacts and how these responsibilities should be enacted and distributed. Given that concern about environmental sustainability in the health research sector is relatively recent, 2 this paper provides a scoping review to explore how the academic literature is engaging with these issues. Our research question was: how have the environmental impacts of data storage and processing during DDAI health-related research been discussed in the academic literature? Using the UK as a case study, we also asked: what policies or initiatives consider the environmental impacts of data storage and processing during DDAI health-related research? Our findings suggest very little engagement with these issues to date.
Methods
Literature searches
In October 2021, Web of Science, PubMed and Google Scholar were searched for relevant articles using the keywords in Table 1. Documents (book chapters, preprints, conference proceedings, articles, etc.) were included if they focussed on the negative environmental impacts of data storage and processing during DDAI health-related research and care. Health research and care are becoming increasingly interlinked through learning health care systems (see, e.g. the UK's ‘100k Genomics Project’, and ‘Our Future Health’), and widening the search to health care ensured all pertinent articles were included. Relevant exclusion criteria were applied (Table 3). Reference lists of included articles were checked for further relevant documents/authors via snowballing. Following removal of duplicates, 25 documents remained. Six documents – all identified during snowballing – could not be accessed (either because they could not be found in web searches or because they required a fee to access), and so were excluded. The 19 remaining documents were deductively coded into the following categories: title, date published, type of publication (book chapter, journal, etc.), place of publication (journal name, etc.), first author name and country of institution of the first author. The documents were read thoroughly, and additional inductive codes were added, including the type of environmental impact addressed/mentioned; whether the document was framed in terms of green information technology (IT); use of the term ‘sustainability’ as an overarching concept for discussion; and mention of unaddressed challenges and perceived solutions. Documents were also qualitatively reviewed (read in depth) for relevant concepts.
Search strategy for literature review, including database searched, keystrings used and number of relevant articles retrieved.
Exclusion criteria for literature searches.
DDAI, Data-Driven Artificial Intelligence.
Web searches
In October 2021, Google was searched using the keywords in Table 2 to identify relevant initiatives and/or policies pertaining to the research question: what policies or initiatives consider the environmental impacts of data storage and processing during DDAI health-related research specifically in the UK? Inclusion criteria included UK web pages associated either with sustainability and the health sector (as above, health care was also included to ensure all relevant initiatives were identified); with research laboratory sustainability more generally; or with sustainability, health sector and DDAI technologies specifically. For each keyword search, at least the first five pages of results were checked. If at least one relevant link was identified on page 5, the search was continued until two consecutive pages returned no relevant links (the maximum number of pages reached was 10 pages). Exclusion criteria included published journal articles or links to scholars whose work had been identified in the literature review, and multiple links from the same organisations. For each retrieved link (n = 104), information on the weblink's institutional origins, and a short description of the weblink (purpose of the web page; blog, pdf document, policy statement and other information) were collected. All retrieved links were checked for content pertaining to the unintended environmental impacts of data storage and processing for DDAI-associated health research or care.
List of key strings used to search Google search engine
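The page-screening rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the procedure as it was actually run: the function name, its parameters and the example inputs are hypothetical, and the sketch assumes that screening stops at page 5 when that page contains no relevant link.

```python
def pages_to_check(relevant_per_page, minimum_pages=5, quiet_pages_to_stop=2):
    """Return how many result pages are screened under the stopping rule.

    relevant_per_page[i] is True if result page i + 1 held at least one
    relevant link. All names here are hypothetical, for illustration only.
    """
    checked = 0
    quiet_run = 0  # consecutive pages with no relevant link
    for page_number, has_relevant in enumerate(relevant_per_page, start=1):
        checked = page_number
        quiet_run = 0 if has_relevant else quiet_run + 1
        # Assumption: the first five pages are always screened, and the
        # search only continues if page 5 itself held a relevant link.
        if page_number == minimum_pages and not has_relevant:
            break
        # Beyond page 5, stop after two consecutive pages with nothing relevant.
        if page_number > minimum_pages and quiet_run >= quiet_pages_to_stop:
            break
    return checked


# Hypothetical screening outcomes for three keyword searches.
print(pages_to_check([True, False, False, True, False]))
print(pages_to_check([True, True, True, True, True, False, False, True]))
```

Writing the rule out this way makes the stopping condition explicit, which may help anyone wishing to repeat the web searches.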
Limitations
While literature and Google searches were kept broad to ensure all relevant articles and UK initiatives/policies were identified, key articles and policies may have been missed. This is because practices pertaining to reducing the environmental impacts of data storage and processing may not be written into policies or initiatives – for example, if they are being considered tacitly and from the ground up. Furthermore, DDAI technology-associated environmental impacts may be being considered as part of broader initiatives that were not detected in our searches. For example, a recent UK Medicines and Healthcare products Regulatory Agency (MHRA) consultation document comprised a section on the environmental sustainability of medical devices that was not detected in our searches. Arguably this was not specifically related to DDAI technologies, but the section highlighted how the health technology sector is beginning to engage with these issues. At the same time as noting these limitations, we emphasise that the aim of a scoping review is to offer insight and summary of an emerging body of scholarship and identify gaps, and to this end, the methodology used is appropriate.
Findings
Literature analysis
All analysed documents (n = 19) were published between 2010 and 2021 and included peer-reviewed articles and preprints (n = 7), commentary pieces (n = 3), conference presentations/proceedings (n = 6) and book chapters (n = 3). All conference presentations/proceedings and four articles/commentaries were written for an IT sector audience (n = 10 out of 13, as defined by the journal article or discipline of conference). First authors were affiliated with institutions in India (n = 5), the UK (n = 2), Spain (n = 2), South Africa (n = 2) and the United States (n = 2), as well as with institutions in Canada, France, Pakistan, Botswana, Greece and West Africa (n = 1 each). Around half of the documents (n = 9) were written by three research groups. Most documents (n = 15) placed their focus on addressing or raising awareness about the environmental impacts of e-health, healthcare and/or hospitals rather than health-related research (n = 3) or health apps (n = 2). 3
Types of environmental impact mentioned
All documents highlighted the importance of reducing energy use and/or carbon emissions to address the environmental impact of digital technologies in the health sector. 4 Nearly all documents (n = 14) also stressed the need to attend to the environmental impacts associated with the technology's component materials in terms of used mineral resources and e-waste. Scott and colleagues (2012) categorised these environmental impacts collectively into ‘upstream impacts’ (extraction, processing, or synthesis of raw materials, the manufacture of components and the packaging and distribution of these components), ‘mid-stream impacts’ (design, implementation and use) and ‘down-stream impacts’ (‘end-of-life’ aspects of disposal or recycling). 18
Promoting technological solutions
Most documents were technical in nature, meaning that documents focussed on developing
Beyond technical solutions
Few articles considered the
In fact,
Sustainability as a normative framework for action
Half of the authors (n = 10) explicitly used the concept of
Policies and initiatives on the environmental sustainability of DDAI health research or care
Alongside the academic literature scoping review, web searches were conducted to identify relevant policies or initiatives associated with the unintended environmental impacts of DDAI technologies in the UK health research or care sector. Retrieved web pages focusing on laboratory research sustainability (n = 34) or health sector sustainability initiatives more generally (n = 70) were checked for such policies or initiatives. In the former, many institutions and groups have developed resources to assist researchers’ efforts to maintain environmentally sustainable laboratory practices. These include decreasing the environmental impacts of fume hoods and freezers, reducing water consumption, and reducing emissions and waste (refuse, recycle, repurpose, reuse and reduce). In these resources, the energy consumption of computers is mentioned, though this is limited to statements concerning the need to turn off digital devices regularly. We were unable to find statements addressing the unintended environmental consequences associated specifically with DDAI technologies.
In health sector sustainability initiatives, high-level sustainability principles were often presented, with little information about specific practices, policies or initiatives. Because of this, it was sometimes difficult to ascertain how much DDAI-associated environmental impacts were included in these strategies, or whether specific initiatives or policies existed. Several documents found on the searched web pages considered issues associated with the adverse environmental impacts of digital technologies more generally and focused on better assessments of energy use, as well as various practices that could decrease energy consumption. Similar to the sustainable research setting described above, this usually amounted to calls to switch off computers when not in use, or a better estimation of the energy expenditure from digital (and other) technologies. Very few considered the
Discussion
While the adverse environmental impacts of DDAI-associated technologies are well established, our analysis shows that to date there has been relatively little engagement with them in the academic literature specifically focusing on health, and even less in health research. This might be because addressing the environmental sustainability of DDAI technologies is seen as an issue for the computing sector rather than as a concern for those who use computational and IT technologies. 31 It might also be because of current difficulties with calculating the exact environmental impacts of DDAI technologies. For example, there is disagreement and uncertainty in the digital sector about specific carbon emissions associated with different digital technologies, and many of the assessments include varying criteria (e.g. some include embodied carbon or indirect effects, but most do not 11 ).32,33 Furthermore, private corporations are often unwilling to release data necessary for these assessments, so that, at best, only rough estimations of such environmental effects can be developed. Finally, specific upstream impacts associated with embodied carbon, biodiversity, mining and manufacturing are particularly problematic to calculate because the health research sector only accounts for a small fraction of global use of digital products. Therefore, while steps are being made to assess various environmental impacts of health-related computational research, 10,12 much uncertainty remains about how to do so effectively.
At the same time, although environmental impacts are difficult to measure or compare, inaction is not a solution. We need to ‘
Problematising ever-increasing data collection and analysis practices requires not only being attentive to the potential benefits of these practices, but also reflecting on the economic and political drivers of this ‘datafication’ 38 culture (what are the assumptions that drive us to collect and analyse this data, and where do these assumptions come from), as well as any potential harms that may come from such a culture (including addressing the commonly held misperception that digital technologies do not have an unintended environmental impact). Underlying the ‘more data is better’ culture is the assumption that individuals can be completely knowable if enough data is collected and analysed. 39 In the health research field, this is seen to lie in more accurate and detailed information that can help improve the health of patients and/or individuals (cf. 4,35,40). At the same time, many of these assumptions are tied to economic and political interests: data collection is driven by a cycle of capital accumulation that comes from the expected economic value attributed to the data. 39 One notable example is genomics, in which the value of health-related data goes beyond health benefits in that genomic technologies attract investments as a way to stimulate the economy towards economic growth. 41 14 Coupling genomics and economic growth, while providing much-needed investment in the field, is problematic because it portrays health conditions in technological solutionist frames, amplifies genetic determinism discourses in inappropriate ways, promotes a datafication discourse (if only we had more data, our health issues could be addressed; as such, store now and, if we can think of a use, analyse later), and fuels sociotechnical imaginaries that minimise the complexities of genomic data interpretation.38,42–44 While genomics research is vital for a range of specific genetic conditions, coupling genomics with datafication cultures and economic growth also promotes the continued and ever-increasing use of this technology in potentially environmentally unsustainable ways. Datafication in genomics (and health more broadly) also creates a path dependency for digital infrastructures, as well as the epistemic authority that comes from the elevation of certain logics, techniques and imaginaries associated with digital technologies. A sunk cost mentality, that is, the tendency to continue a process after investment and effort, then sees this infrastructure repeatedly built upon.
This does not mean that we should cease collecting and analysing data for health (including genomic) research – especially when analysis could lead to obvious health benefits. Rather, a nuanced approach is required. There is a need to acknowledge that while many health researchers are trying to help the diagnosis, prognosis and/or care of patients and/or individuals with health-related illnesses, their research is embedded within a sociopolitical climate that unquestioningly views datafication as intrinsically good, with little reflection on what this assumption means in practice. Reflecting on this requires those conducting health research to carefully consider the rationale for data collected (who are we helping, but also who (individuals) and what (the environment) are we
While, as stated earlier, it is difficult to understand which choices will have the largest impact on the environment, there are steps that can be taken. At an institutional level, green IT can drive improvements in energy consumption at data centres, e-waste recycling and computer product design and labelling. 46 At an individual level, several hyperscalers, including Google and AWS (Amazon), now provide carbon emission data to track carbon footprints. 17,18 Other behaviours, such as turning off computers when not in use, can also have an impact on energy use, and various guidelines have been developed to help individuals and institutions align themselves with eco-friendly computing. For example, a University of Cambridge factsheet 19 notes that leaving a computer on overnight for a year creates enough carbon dioxide to fill a double-decker bus, and while such facts are likely based on contested assumptions and predictions, they can still be useful for building awareness and driving change. Other helpful guidelines include https://www.greenimpact.org.uk/GIforHealth and LEAF, which is a standard for sustainable laboratory operations.
Lannelongue and colleagues (2021) have proposed a series of 10 rules for health researchers to make computing more environmentally sustainable. 47 These include calculating the carbon footprint of their research and including this in a cost–benefit analysis; choosing a computing facility and any associated hardware carefully; increasing the efficiency of the code used to analyse the data; and being a frugal analyst during this process (piloting energy-hungry algorithms on smaller data sets first). They also remind researchers to be aware of rebound effects, that is, that increases in IT efficiency can lead to increases, not reductions, in demand for data storage and analyses.48,49 Taking these rules on board takes seriously the concept of ‘digital temperance’ 26 – it forces us to slow down and think carefully about the specific data we need to collect for our research (rather than collecting as much as possible), and give due consideration to how this data is stored and analysed (cf. an ‘ethic of care’ 50,51).
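To make the first of these rules more concrete, the following Python sketch shows one simple way a researcher might roughly estimate the carbon footprint of a single computing job: energy use is approximated from runtime, hardware power draw and data centre overhead, and then multiplied by the carbon intensity of the electricity supply. This is a minimal sketch under stated assumptions, not the calculator used by any particular group; the function names and every numeric input below are hypothetical and would need to be replaced with values for the actual hardware and location.

```python
def energy_kwh(runtime_hours, n_cores, core_power_watts, core_usage,
               memory_gb, memory_power_watts_per_gb, pue):
    """Rough energy estimate (kWh) for one computing job.

    pue is the data centre's power usage effectiveness (overhead factor);
    all parameter names are illustrative assumptions.
    """
    # Power drawn by processors (scaled by utilisation) plus memory, in watts.
    power_watts = (n_cores * core_power_watts * core_usage
                   + memory_gb * memory_power_watts_per_gb)
    # Scale by runtime and facility overhead, then convert Wh to kWh.
    return runtime_hours * power_watts * pue / 1000.0


def carbon_footprint_g(energy, carbon_intensity_g_per_kwh):
    """Carbon footprint in grams of CO2-equivalent for a given energy use."""
    return energy * carbon_intensity_g_per_kwh


# Hypothetical job: 10 hours on 8 fully used cores with 32 GB of memory.
job_energy = energy_kwh(runtime_hours=10, n_cores=8, core_power_watts=12.0,
                        core_usage=1.0, memory_gb=32,
                        memory_power_watts_per_gb=0.4, pue=1.5)
footprint = carbon_footprint_g(job_energy, carbon_intensity_g_per_kwh=250.0)
print(f"{job_energy:.3f} kWh, {footprint:.0f} g CO2e")
```

Even a rough estimate of this kind can be included in a cost–benefit analysis, and comparing estimates across facilities or regions shows how strongly the result depends on where the computation is run.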
Finally, due consideration should be given to more environmentally sustainable ways of achieving health benefits. This is, of course, not always possible, as we have witnessed during the COVID-19 pandemic, during which decision-making about the most appropriate health interventions has benefited from the collection and analysis of large swaths of data. At the same time, we already know that the biggest improvements to health can be made by addressing its social, economic and political determinants – improving people's lives and well-being (education, economic livelihoods, work-associated stress, etc.), re-balancing economic and social inequality, and improving the quality of our living environments (air and water pollution).
In conclusion, health-related DDAI research has tremendous potential to bring health benefits to many, but DDAI technologies also have adverse environmental impacts that must now be addressed. 52 The health sector has engaged little with the issues that emerge from these impacts. As more and more data are collected for health research – and as more and more societal issues fall under the umbrella of being a health concern and therefore as having an intrinsic value that requires attention (‘healthitisation’ 16 cf. medicalisation) – it is crucial that we explore how to negotiate these challenges. Making explicit decisions about how we do so is the first step to developing shared understandings across the entire research ecosystem about the need to consider these issues within our own research practices. Such work can become a part of the wider sustainability agendas of the UK health sector and/or research institutions.
Footnotes
Acknowledgements
There are no acknowledgements for this article.
Contributorship
GS conducted the review of the literature, web searches and wrote the first draft of the manuscript. AL reviewed and edited several follow-up drafts and approved the final version of the manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Not applicable, because this article does not contain any studies with human or animal subjects.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Wellcome (grant number 222180/Z/20/Z).
Guarantor
Dr Gabrielle Samuel.
Informed consent
Not applicable, because this article does not contain any studies with human or animal subjects.
Trial registration
Not applicable, because this article does not contain any clinical trials.
