Abstract
Proponents of precision medicine envision that digital phenotyping can enable more individualized strategies to manage current and future health conditions. We problematize the interpretation of digital phenotypes as straightforward representations of individuals through examples of what we call data inheritance. Rather than being a digital copy of a presumed original, digital phenotypes are shaped by larger data collectives that precede and continuously change how the individual is represented. We contend that looking beyond the individual is crucial for understanding the factors that can ‘bend’ digital mirrors in specific directions. Since algorithms used for digital profiling are based on historical data, their predictions often inherit and reinforce the values and perspectives of past data practices. Moreover, the data legacies we leave behind today may return as so-called ‘data phantoms’ that conflict with the interests of the individual and contest who and what the ‘original’ is.
This article is part of the special theme on Digital Phenotyping. A full list of the articles in this special theme is available at: https://journals.sagepub.com/page/bds/collections/digitalphenotyping
Introduction
The digitalization of health care and the spread of self-tracking technologies have provided new opportunities for the realization of precision medicine. The notion of the ‘digital phenotype’ was introduced by Jain et al. (2015) as a parallel to the notion of the extended phenotype in biology. Dawkins (1982) used this concept to emphasize that the adaptive capacities of genomes are expressed not only through physiological traits, but also through the organism's modifications of its environment to improve its chances of survival (by building caves, dams, mounds, etc.). Similarly, health data from various sources are envisioned to provide a digital expression of the individual's health status – information that can be actively used to modify and control disease risk. Other terms used in the literature include data doubles, data twins and data doppelgängers (Boschert and Rosen, 2016; Haggerty and Ericson, 2000). While these terms usefully capture how data clouds can provide information about – and sometimes substitute for – a physical person, we address some shortcomings of the idea of digital phenotypes as straightforward representations of the individual.
Lupton (2014) describes digital self-tracking as a cyclical move of co-construction and reconfiguration of the individual. Digital mirrors, like optical ones, offer a medium through which we not only discover but also change ourselves. The performativity of self-observation through digital representation is further complicated by the multiplicity of data doubles, or what she calls mirror mazes. Vegter et al. (2021) take the mirror metaphor one step further in describing how personalized health technologies create digital funhouse mirrors. Rather than providing a singular and well-integrated reflection of the individual's health status, digital mirrors present fragmented pieces of information that require interpretative skills to become meaningful. The funhouse mirror nicely captures how digital phenotypes can be skewed, confusing, and often give rise to conflicting pictures of the individual.
We here introduce the notion of data inheritance as a conceptual tool to explore factors that shape digital mirrors, introduce blind spots and make the mirror image unrecognizable to the bodily subject. We have chosen this concept to underscore the relational characteristics of digital phenotypes, as well as the temporal dimension of these relations. Data inheritance highlights how traces of previous generations of data donors hide in the traits of digital mirror reflections. Thus, more is reflected in a digital mirror than the characteristics of the presumed original. In the following, we illustrate how algorithmic profiling is shaped by data sources that precede the individual, as well as how the data traces we leave today take part in the shaping of our futures.
Data inheritance
Proponents of precision medicine often highlight the potential of so-called N-of-1 studies, where each person is considered ‘his or her own control’ (Price et al., 2017; Schork, 2015). However, interpretation of test results and data traces is fundamentally dependent on sampling and comparison of data on larger populations according to predefined statistical norms (Hoeyer, 2019). Health information is therefore inherently comparative and relational, and digital mirrors can be bent or stretched in specific directions, depending on the choice of reference classes. Reliance on comparative data is a general epistemic condition for all types of digital phenotyping, as illustrated in the following examples of genetic risk profiling and other forms of data-intensive classification.
Like magic mirrors in fairy tales, genomic risk profiling allows us to discover ourselves – and our future selves – in new ways. Yet, the fortune-telling capacities of many genetic markers remain controversial and uncertain (Senn, 2016). A speaker at an international conference on personalized medicine in Copenhagen recently explained how he had his genome sequenced five times, and that each time the ‘genetic mirror’ showed him a different picture of his future health status. Different companies use different sequencing platforms that look for different markers, depending on which variants stand out in the preceding comparative studies (Ng et al., 2009). Risk variants are identified on the basis of a statistical comparison of data populations, for example, with and without a specific disease. The ‘individual’ risk scores are thus based on the data inheritance from thousands of other individuals, as well as on the procedures for sampling and comparing data populations in relation to predefined clinical criteria (see also Vegter et al., 2021: 5). Moreover, the status of genetic risk variants often changes over time, as more genomes are sequenced, and new knowledge is gained on the clinical relevance or irrelevance of specific markers (Timmermans et al., 2016). The funhouse mirror maze of genetic variants is dynamic and can be difficult to navigate, even for health professionals.
The insight that knowledge is partial, situated and specific also applies to data (Haraway, 1997), and the possibility of objective or ‘true’ digital phenotypes can be questioned. Still, attention to data inheritance can help us understand why some digital mirrors are more skewed than others. Algorithmic classification depends not only on the availability of high-quality data on the individual, but also on the variation in the data set that precedes the current classification. An illustrative example is that voice recognition software works better for men than for women, because most recordings in training databases are of male voices (Perez, 2019). Similarly, there is a substantial risk of reifying structural inequalities in health care if medical diagnostics and profiling are based on datasets from medical records that are structurally unbalanced (Birk and Samuel, 2020; Pot et al., 2021). Since most genetic studies and clinical tests have been conducted predominantly on White populations, test accuracy – and thus the effectiveness of disease treatment and prevention – can suffer for underrepresented groups (Gianfrancesco et al., 2018; Huey et al., 2019).
Structural biases can occur even when algorithms are designed to avoid partiality in outcomes. An algorithm used for disease-risk screening of millions of American patients was recently criticized for being racially biased (Obermeyer et al., 2019). The purpose of the algorithm was to identify patients who would benefit from inclusion in a disease preventive program, focusing on cardiac disease and type 2 diabetes. The algorithm was trained on historical health data including test results, diagnoses, hospitalizations, emergency room visits and health care expenses, and the developers deliberately left out information on the ethnicity of patients. Nevertheless, Obermeyer et al. (2019) found that Black patients needed to have more health problems than White patients to receive the same risk score (a difference estimated at 26%). Thus, Black patients with the same health care needs were comparatively offered fewer places in the preventive program. The problem occurred because the algorithm relied on health care expenses (insurance claims) in existing data records as a proxy for health care needs. However, since health insurance coverage is unevenly distributed across different ethnic populations, the digital mirror showed a distorted picture of the disease risk of Black and White patients with comparable health status. The prediction, in this case, reflects the inheritance of structural biases in the historical data used to train the algorithm.
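The proxy-label mechanism at work here can be sketched as a deliberately minimal simulation. The group labels, numbers and linear ‘model’ below are our own invented illustrations, not the actual algorithm analysed by Obermeyer et al. (2019); the sketch only shows how a model trained to predict recorded cost inherits an access gap as a gap in ‘risk’:

```python
# Toy simulation of proxy-label bias (all numbers are invented, for illustration only).
# Two groups have the same underlying health need, but group B generates less
# recorded cost for the same need (e.g. due to barriers to health care access).

def recorded_cost(need, group):
    """Cost visible in claims data: group B's costs are deflated by an access factor."""
    access_factor = 1.0 if group == "A" else 0.6
    return need * access_factor

def risk_score(need, group):
    """A model trained to predict cost simply reproduces the cost proxy,
    so the access gap is inherited as a gap in the 'risk' score."""
    return recorded_cost(need, group)

# Equal scores, unequal underlying need: a group-B patient must be sicker
# (need ~8.3) to match the score of a group-A patient with need 5.0.
print(risk_score(5.0, "A"))                     # 5.0
print(round(risk_score(5.0 / 0.6, "B"), 6))     # 5.0
print(risk_score(5.0, "B"))                     # 3.0 – same need, lower score
```

Replacing the cost label with a direct measure of health need (where one exists) removes the gap in this toy setting, which is the intuition behind the label change discussed next.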
Attention to the importance of data inheritance can suggest better uses of health data. Obermeyer et al. (2019) report that training the algorithm to weight actual health conditions more heavily than past insurance claims can reduce the structural bias by 80%. Similarly, scholars have called for strategies to improve the inclusion of underrepresented groups in precision medicine programs (Sabatello and Appelbaum, 2017). This exemplifies how weighting methods and selective sampling, which may be interpreted as the introduction of theoretical biases, can be productively used to counteract existing inequities (Pot et al., 2021). It is increasingly acknowledged that ‘data-driven’ solutions do not necessarily present a more objective way of estimating health care needs, as they are themselves based on significant epistemic assumptions about what data represent.
The view of digital phenotypes as straightforward representations of individuals is not only naïve but also problematic, as it ignores how the features of existing data sources impact future predictions. Factors that bend the digital mirror in specific directions only become visible once we look beyond the individual, and digital mirrors always introduce some blind spots.
The legacy of data phantoms
Another problem with the simple representational view of digital phenotyping is the implicit assumption that digital phenotyping practices are well aligned with the interests of the ‘original’ person. Yet, data legacies can travel to new places and proliferate beyond the control of the individual. Ebeling (2016) uses the notion of the data phantom in her auto-ethnographic analysis of how health data can produce new digital beings. In her case, data traces in a fertility clinic gave birth to a digital ‘marketing baby’. While grieving after her second miscarriage, she was surprised to receive maternity-focused magazines and calls from companies that sell baby products. This made her suspicious of where her personal data had travelled and led her to track the data journeys, from medical health records and credit transfers to data brokers that repackage and sell information for commercial reuse. In her case, a reporting error had mistakenly registered the birth of a baby. Although the baby was never born in flesh and blood, it lived on as a digital commodity for targeted advertising. Ebeling's personal and traumatic story highlights how digital phenotyping can serve purposes far beyond those of the individual.
Data phantoms can also take the shape of frozen mirror images that no longer provide an adequate picture of the individual. For illustration, we include an example we have come across in our social lives. A man with diabetes mellitus had his application for an insulin pump rejected. His electronic health record documented a medical need for subcutaneous treatment, but it also revealed that he had previously been diagnosed with paranoid schizophrenia and had attempted suicide. This psychiatric diagnosis was no longer ‘active’ in his health record, and he no longer received medical treatment for mental health problems. However, the digital mirror still showed a picture of a suicidal man at risk of misusing the insulin pump. He worried that complaints might not be taken seriously, or that they might even be taken to confirm the continued relevance of his previous diagnosis. His worries were confirmed by experiences of being treated differently, depending on whether health professionals had read certain parts of his medical record. The uncontrollable life of this ‘data monster’ – as he and the first author jokingly named it in a conversation – confined him to the disempowering role of a psychiatric patient.
The examples elucidate how the legacy of past data traces can enter our lives as ‘haunting’ data phantoms. Moreover, they underscore how digital phenotyping is not just about discovering ourselves. It also involves the experience of being seen and monitored by others (Vegter et al., 2021). Since we cannot control the data legacies we leave behind, digital phenotypes can at the same time extend and reduce the person. Sharing of digital information can open doors to health care services. However, it can also give rise to new and unwelcome forms of objectification that contest who and what the ‘original’ is.
The looping effects of digital mirrors
The notion of data inheritance underscores that data and digital mirrors can have different temporalities than the person in flesh and blood. As seen in the previous sections, digital mirrors can inherit features of other data donors or reflect an outdated picture of the physical person. However, digital mirrors can also be temporally ahead of the subject, as the constructed digital phenotypes created today can anticipate – and possibly shape – future trajectories of the individual.
Ian Hacking (1995) described how medical classifications ‘make up people’ in certain ways and can introduce ‘looping effects’. However, where Hacking focused on the relation between classification by the medical profession and the self-understanding of individuals, digital phenotyping presents a new situation where individuals are increasingly affected by invisible profiling practices. Digital phenotyping introduces several looping effects, is influenced by multiple drivers, and is a growing part of our everyday life. In the context of health care, one looping effect may appear when digital phenotyping reframes health conditions, including mental disorders, as being primarily a matter of individualized biology and social behaviours (Birk and Samuel, 2020). Another may occur when individuals are redefined as being at risk of future disease as a result of digital profiling (e.g. Price et al., 2017). Although framed as a means to empower individuals to take control over future health conditions, this ‘loop’ also involves vested commercial interests in the expansion and proliferation of risk categories, which may lead to increased medicalization and overdiagnosis of asymptomatic people (Vogt et al., 2019). Moreover, where Hacking took for granted an external position from which to identify and possibly contest the loops, current data collection practices leave no space for the individual – whether social scientist or citizen – to stay outside data profiling.
Digital phenotyping is inescapably tied to a digital economy, characterized by a complex reciprocity of access to services and the commodification of personal data (Fourcade and Kluttz, 2020). Ebeling writes that in her search for the mysterious marketing baby, she realized that she herself is a commodity in the databased society (Ebeling, 2016: 128). Similarly, Zuboff's (2019) seminal book on surveillance capitalism stresses how personal data can be commodified for economic profit and serve purposes of control and steering of behaviours. As illustrated in the documentary The Social Dilemma, digital profiling can affect how we perceive ourselves and the world around us, including what news we read and how we vote. Digital phenotyping can be empowering, but it can also make us prone to manipulation via invisible strings of data traces.
Concluding remarks
We have used the notion of data inheritance to illustrate how ‘personal’ data profiles are based on the legacy of many data sources that precede and go beyond the individual. Attention to how individual profiles are constructed through the inheritance of data on the many can help us understand why genetic test results can differ in their accuracy for specific groups or how structural injustice can become a problem for algorithmic data profiling. Similarly, by tracking the multiple uses of health data, which go far beyond the purposes of self-tracking and health optimization, it becomes possible to explain why digital mirrors cannot easily be controlled or modified by the physical subject. Once constructed, digital phenotypes may enter a multiplicity of alliances that compromise the interests of the individual. These considerations further underscore how digital phenotypes, through complex and often invisible social relations, are performative and co-constitutive of the person in flesh and blood.
Acknowledgments
We thank the editors, an anonymous reviewer and members of the MeInWe research group for useful comments on a previous version of this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Danish Research Council for Independent Research (grant number 0132-00026B) and the Carlsberg Foundation (Semper Ardens grant ‘MeInWe’ CF17-0016).
