Why data work in healthcare?
Healthcare organizations across the globe are currently grappling with how to implement tools and practices to transform data from “refuse to riches,” a movement propelled by mass adoption of electronic health records (EHRs), sensors, and servers that can hold an ever-expanding volume of digital data. 1 Allegedly, “By digitizing, combining and effectively using big data, healthcare organizations ranging from single-physician offices and multi-provider groups to large hospital networks and accountable care organizations stand to realize significant benefits.” 2 The potential is “. . . to improve care, save lives and lower costs.” 2 As a consequence, organizations are struggling under massive institutional pressures to make healthcare “data-driven” against the messy reality of creating, managing, analyzing, and using data for management, decision-making, accountability, and medical research. 3
However, data do not sit in a ready repository, fully formed and easily harvestable. 4 Data must be created through various forms of situated work. Even when data are a byproduct of other processes (“exhaust data”), they have to be filtered, analyzed, and interpreted. 5 While scholars have acknowledged the situated and effortful nature of data production along with the inherent subjectivities of data, these practices have been little investigated.
Drawing attention to the various kinds of data work that take place to generate and make use of data can be seen as a socio-technical intervention to make data work. Just as humanistic scholars are keen to make interpretive interventions arguing against the presumptions that data and pattern analysis can be conducted without attention to meaning and interpretations, 6 we argue that useful data require encounters between people, technologies, and data. These encounters are rooted in particular places and particular times and require (often considerable) effort on the part of the people involved.
Healthcare organizations must re-organize around data production, including developing new technical and human resources for this work. Simultaneously, healthcare practices and research are being re-worked in response to big data and data-driven tools and logics. The mere volume of data is changing the discourse about the phenomenon, for example, from one of quality to one of quantity, emphasizing correlations (gleaned through machine learning) rather than causal relationships, and machine-generated rather than human-generated hypotheses. 7 This development will create a push for new kinds of expertise, not only within machine learning but also in data collection and management, which become crucial. It should also entail acknowledgment of those engaged in this work with regard to promotion and career paths 7 (see also the studies by Hogle 8 and Ruckenstein and Schüll 9 ). Datafication of healthcare will also change existing occupations and professions and generate new ones. 10 Scholarly attention to data work more broadly is imperative in order to elucidate the emerging requirements for management and work system design as well as to develop the workforce and skill-mix necessary to carry out data work (e.g. healthcare data analysts), organizational change (e.g. developing healthcare data business intelligence units), and training of existing staff (working with, interpreting, and critically assessing data). While there is an emerging literature on data work outside of healthcare,11–13 so far little research has been conducted on data work in healthcare.14,15 The rich set of research papers assembled in this special issue makes large strides toward filling that gap, while also providing insights about data work that may be applicable in sectors beyond healthcare.
What is data work?
To aid our exploration of data work, we first offer a definition. “Data” are defined as “. . . facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors.” 16 Christine Borgman, a leading scholar of data studies related to scientific research, refers to four categories of data: observational, computational, experimental, and records. 17 These categories hold true in healthcare as well. The various papers in this special issue demonstrate data work occurring on all of these types of data. However, our goal is to turn the focus to the range of socio-technical practices of producing and using data, rather than describing data itself. Thus, all of the research presented in this issue attends to “data work” in some form. We define data work as any human activity related to creating, collecting, managing, curating, analyzing, interpreting, and communicating data.
Data work has long been part and parcel of healthcare. For example, physicians and nurses perform data work when they collect vital signs, patient histories, and other types of data and enter them in the medical record. 18 However, data work has grown exponentially in the past few decades as healthcare work systems have been digitized, and we have entered the era of “the quantified patient” and “big health data” characterized by drastic increases in the volume, velocity, variety, exhaustiveness, resolution, flexibility, and relationality of data. 19
Workers arrayed in all parts of the healthcare ecosystem are doing more data work. The jobs of many healthcare workers are being reconfigured around working with data: for example, doctors and nurses are charged with collecting data in systematic ways that are far beyond the expectations of the past. Further, the rise in data work has led to the emergence of entirely new occupations, such as medical scribes, who collect data in charts as doctors attend to patients; Clinical Documentation Specialists, who monitor the charting of physicians in real time to make sure their data are accurate; and data scientists, who use machine learning and statistical tools and techniques to analyze data caches. 10 Some existing occupational groups, particularly those who work with information systems groups in healthcare organizations, have expanded to keep step with the data needs of healthcare organizations. Accordingly, in this special issue, we attend to data work practices by existing occupational groups as well as emerging occupations centered on data. In addition, we attend to the many shifts, from subtle to drastic, that are occurring in the data work carried out by healthcare workers predicated on new IT capabilities and expectations about the use of data for a multitude of purposes that were heretofore quite limited or impossible. These include data-driven management, accountability, and governance based on performance measurements, and secondary clinical research. While data infrastructures are built to enable monitoring of populations and early prevention measures, 20 and to monitor the quality of services,21,22 healthcare workers increasingly learn to maneuver playfully within these environments. 23
Thus, the papers collected here discuss the data work of different groups: doctors struggling to clarify data ambiguity and use predictive algorithms for personalized medicine; 24 coming to terms with data variability (varying data on the same phenomena); 25 clinicians’ response to patients’ increasing data literacy; 26 nurses’ interpretation of patient-generated data as a means of inclusion in their own care;27,28 and patients generating data about their health.29,30 Finally, there is work to produce data itself, including skillfully assessing messy charts to create structured datasets, 22 sanitizing and validating data, 31 and building data integrations between various information systems. 32
Relating data work to healthcare research more generally
As elaborated in the previous section, data work describes the actual work of persons engaging with IT systems for the purpose of producing, managing, interpreting, and analyzing data. As reflected in the papers in this special issue, this work typically occurs in a local context where the IT system is an important tool for the healthcare worker. At first glance, the reliance on IT systems for data work signals the importance of the healthcare worker using her tool in a local setting to accomplish her work. However, this interrelationship goes two ways: the tool may also take part in structuring the work through imposing what to do, when to do it, and in which order. 33
A further examination, informed by past research in computer-supported cooperative work (CSCW) on the role of artifacts in healthcare work, may also reveal that data work depends on a range of tools and artifacts such as lists, tables, plans, paper, computer screens, and whiteboards, which together organize work by enabling the handling of the increasingly complex cooperation of distributed actors.34–36 Bossen and Markussen 34 describe this heterogeneous topography as coordinating artifacts: Coordinating artefacts reduce the complexity of articulating and coordinating actions, since interdependencies are already precomputed and tasks stipulated, and therefore “reduce the space of possibilities.” 34
However, the advance of new technologies has opened up possibilities for clinical data—and therefore also data work—to span organizational and professional boundaries. This poses challenges for linking, integrating, and reusing information on a much grander scale. 37
We suggest that one starting point for mapping the interconnectedness of data and data work alike is to consider the “orders” of data present in data reuse. Health data are typically reused multiple times, for multiple purposes, and at each stage of reuse people translate and transform data from one form into another. These data are then transformed into yet another form (and on and on) through data work. A straightforward example is that of quality measurements, which involve reusing billing data, which are themselves a reuse of clinical data recorded in the medical record (see Figure 1). 14 Mapping the orders of data reuse and the data work entailed therein points to the interdependencies between data and attendant data artifacts and data work. Adding data uses creates new interdependencies in the ecosystem of data.

Figure 1. Orders of data production and reuse for quality measurements.
Artifacts in use “. . . are not just individual tools (. . .) they are linked to others so that they together constitute the infrastructure that all (. . .) work depends upon.” 38 Through examining interdependencies between artifacts and practices as data are produced and reused throughout the healthcare information ecosystem, it becomes clear that data work is not limited to smaller scale clinical sites. Throughout the information ecosystem, data work is interdependent with—and has implications for—data work at other sites. 39 Components in an information infrastructure are almost never standalone entities, but are integrated with other information systems and communication technologies, and with non-technical elements. 40 Therefore, analyses of data work in an information infrastructural context need to consider a broad range of socio-technical issues shaping the processes of implementation and use.
Consider, for example, the multiple purposes of data: data are entered in records to enable coordination and continuity of care and treatment of patients across time and space; data serve as documentation in legal disputes; data are used for billing in privatized healthcare systems or reimbursement in welfare state healthcare systems; data can be used to trace quality issues or unequal levels of performance for learning purposes; data can be used to drive efficiency; and data can be used to discover new correlations between diseases, causes, and treatment. These multiple orders of data work involve different occupational groups, which have different demands and ways of ordering data, which mutually shape each order in turn. Importantly, data work is ever changing and transforms on an ongoing basis along with evolving institutional demands, resources, and information infrastructures.
Themes and issues around data work in this special issue
The 13 papers included in this special issue can be summarized into three major, partly overlapping, themes: (a) working with conflicting qualities of data, (b) data work as collaborative endeavor, and (c) relationships between patients and providers.
In the first theme, “working with conflicting qualities of data,” four papers attend to questions of decision-making, veracity, shareability, and reliability. Chorev describes how doctors work to make sense of data for personalized medical decision-making in situations where the context of data is missing and interaction with patients is limited. Veracity (truthfulness and correctness) of data is an important and often overlooked characteristic of data, including so-called big data, and at the center of attention in the case of Mønsted. The veracity of statistical characteristics of patients depends on both the citizens’ ability to report high-quality data and on the ability of the health professionals to interpret the outcome in the context of existing care practices. Cabitza et al. focus on reliability, and the fact that while it is considered natural that multiple observers can interpret a case as it is represented in a medical record in different ways, fewer studies exist on how data producers observing the same phenomenon could report in different and irreconcilable ways. The authors discuss some design ideas to let such a multiplicity be a resource for a more collaborative, learning-oriented, and uncertainty-aware approach to data work. Finally, Vassilakopoulou and Aanestad analyze the work needed to make data shareable and reusable across laboratories distributed all over the world. Data work is, in this case, the transformative process needed to make data produced locally into a communal resource by adding metadata, disambiguating and sanitizing data, and assessing its relevance, validity, and combinability. Overall, these studies make clear that reconciliation, veracity, and reliability of data are emerging products of data work processes.
The second theme, “data work as collaborative endeavor,” contains four papers. Bjørnstad and Ellingsen focus on the activities needed to maintain coherence across a distributed data infrastructure for electronic medication management. Integration and interoperability are not mere technicalities, but require constant local effort and collaboration from doctors, nurses, and computer scientists to ensure the correctness of data. Pine reports an ethnographic study of the data work performed by hospital clerks and medical record coders and the “qualculative practices”—the judgment and decision-making involved in creating data elements—that are part of creating standardized administrative data suitable for maternal care quality measures. Wallenburg and Bal focus on data for quality and accountability purposes and describe how frames of play and reward are applied to the healthcare setting when doctors adapt, ignore, and change measures when improving their work and creating a professional identity. Bonde, Danholt, and Bossen focus on the collaborative data work of clinicians and IT and business intelligence staff that is necessary for the development of quality indicators. This entails multiple “frictions” and is too complex to be handled by one single profession. One overall takeaway from these papers is that collaborative work around data too often remains invisible and, if supported, offers huge potential for learning. 41
The third theme, “relationships between patients and providers,” contains five papers. Langstrup analyzes the activities aimed at promoting and standardizing the collection and use of Patient-Reported Outcome (PRO) data. This article addresses the overwhelming demand for more data from patients and asks what kind of data work, done by whom, makes sense? And, how should doctors and patients be involved in the multi-perspective interpretation of the same data? Islind, Lindroth, Lundin, and Steineck tackle the issue of how the inclusion of Patient-Generated Health Data (PGHD) first requires disentangling data from the context of its production (patients) and then a re-contextualization of data when nurses discuss treatment progress with the very same patients. On a similar note, Hult, Hansson, Svensson, and Gellerstedt reflect on the transformation of the role of patients and providers as patient-centric technologies enter the care scene. They illustrate how patients’ data work requires the development of new professional competences by health care professionals, and the possibility for more cooperative, “flipped” health care. Grisot, Kempton, Hagen, and Aanestad analyze nursing practices for the remote monitoring of patients, and how remote monitoring reshapes nurses’ work from body work to data work. Building a relationship with a remote patient through data requires new skills in both data handling and communication in order to achieve personalized care. Finally, Piras proposes an analysis of some of the labels used in the scholarly debate and policy-making discourse around health data produced and managed by patients: “Patient-Generated Health Data,” “Observations of Daily Living,” “Quantified Self,” and “Personal Health Information Management” are labels for related, but discrete lenses that enable researchers to see how different forms of patient data work sustain different patient-provider relationships.
Implications of healthcare data work
The papers presented in this special issue, coupled with prior research,1,3,4,42 show unequivocally that healthcare data are not simply passively available, even when sensors are installed or digital workflow systems capture data in the course of work. Data are produced through effortful, situated work carried out by people. Further, the number and diversity of individuals and occupational groups involved in healthcare data work is vast and includes clinicians, non-clinical healthcare workers, managers, administrators, patients, caregivers, and external organizations and workers (quality improvement organizations, researchers, IT companies, consultants, and others). For example, Langstrup 30 describes the collection of PROM (Patient-Reported Outcome Measures) data by patients; Hult et al. 26 describe the work of nurses adapting to different patient literacies; Chorev 24 chronicles the work of clinicians to make sense of ambiguous data; and Bjørnstad and Ellingsen 32 call attention to the invisible work of creating meaningful integration of information systems. Widespread adoption of digital information infrastructures for healthcare increases the capacity to produce, store, and analyze data, 43 and widespread availability of data tools means that increasing expectations are developing for the types and depth of biomedical and organizational research that can be done using second-order data. Such managerial and administrative data work presents novel data practices both for professionals and for management and administrative staff. Healthcare organizations are increasingly required to gather and report data about their performance,44,45 and they are struggling to keep up with emerging demands for performance measurement and reporting, re-organize their organizations to collect and manage data, and respond to the results of consequential quality measurements.
Further, the promise of healthcare data (be it generated by individual patients, organizations, or another source) has spawned numerous efforts to generate novel data and to link and integrate existing data streams from the public sphere. Critical accounts raise alarm bells about the increasing focus on data in healthcare, both in terms of the ways in which it refocuses clinicians in moments of care away from patients and toward computer terminals or tablets, and in the big picture away from patient-centered medicine and toward algorithmic accountability measurements.46–49
Collectively, the papers in this special issue address these concerns by providing detailed empirical accounts of how the quest for data has local consequences. The papers also provide policy and design implications focused on how to better support individual and collaborative data work, 25 how to characterize the data activism of patients so they might be more influential, 29 and how to support workers to adapt to data work and find ways to thrive amid changing, data-centric work environments. 23 A critical line of ongoing research should pertain to the consequences of widespread “valorization of data-oriented ways of knowing” 50 in healthcare. Hence, while the studies presented here answer questions related to what is observable in terms of data work, they do not describe what is absent. For instance, we know less about the organizational, managerial, and clinical skills we might be losing by focusing on data-centric ways of knowing. It is our hope that future research will continue the work started in this special issue and pay special attention to this question.
Outlook
What, then, are the prospects for the data work phenomenon? On one hand, ongoing digitization of healthcare and advances in machine learning and artificial intelligence suggest that some data work may be replaced by voice-recognition, digital diagnostics, and data-mining of EHRs and social media. On the other hand, since data work often involves interpretation and assessments based on sense-making, existing and new healthcare occupational groups—and patients—may increasingly be doing such work.
Looking at digitization of jobs and the labor market broadly outside of healthcare, the overall picture points toward a continuation of a decades-long process, where computers and algorithms substitute for jobs in routine-intensive occupations. Both cognitive and manual routine jobs are candidates for substitution by computers, whereas non-routine jobs such as medicine and sales as well as low-income manual occupations that demand a high degree of flexibility and physical adaptability will prevail. The overall effect is predicted to be a polarization where middle-income jobs are diminished, and cognitive high-income and physical adaptability–demanding low-income jobs show the strongest growth. 51 Recently, however, this picture has been complicated by indications that even non-routine tasks may be substitutable by complex algorithms, as exemplified by advances in autonomous cars, handwriting and face recognition, and sorting produce. Even jobs requiring subtle judgment are susceptible to computerization. 52
The implications of digitization within healthcare seem similarly complex: robots can perform surgeries; medical records clerks become obsolete when EHRs are implemented; medical transcriptionists may be replaced by speech recognition; and computers substitute for physicians doing diagnostics based on MR scans, text mining, and test results. Jobs can be ranked according to their susceptibility to computerization. Some jobs are deemed not to be threatened, such as nurses, physicians (overall), database managers, and recreational and occupational therapists, whereas clerks and other administrative jobs usually categorized as “routine” may wither away like telephone operators, bookkeepers, and coopers. 52 However, at the same time new occupations emerge: medical scribes write records on physicians’ behalf to lessen the latter’s increased workload due to the implementation of EHRs, and Clinical Documentation Improvement Specialists grow in numbers to meet increasing demands for precise and fine-grained documentation. 53 Furthermore, as the papers in this special issue show, data work adds an additional layer of tasks to those of existing healthcare occupations. Hence, from the perspective of data work, more jobs and re-skilling rather than computational substitution could be the outcome of healthcare digitization.
However, the implications of datafication of healthcare for data work remain to be further explored and investigated. This is important since governmental education policies, healthcare managements’ skill-mix strategies, and people’s career choices are based on perceived and often-simplified anticipations of future developments. Such anticipations include a long tradition of neglecting the skilled competences involved in clerical and low-paid work as well as work historically conducted by women—for example, medical secretaries are fired in connection with the implementation of EHRs, since their work is wrongly expected to become void. 54 This makes detailed, empirical analyses of data work pertinent, since the priorities and healthcare strategies otherwise may be skewed. Moreover, even the jobs that will prevail or grow in numbers will most likely incorporate more data work and may therefore transform as a part of the process. Hence, we believe that more research into this area is needed, and we hope to have provided inspiration along these lines.
Footnotes
Acknowledgements
The special issue originates from a series of workshops on data work in healthcare in connection with conferences from 2016 to 2018: Computer-Supported Cooperative Work 2016 (CSCW ’16), Design of Cooperative Systems 2016 (COOP ’16), and European Conference on Computer-Supported Cooperative Work 2017 (ECSCW ’17). We would like to thank all participants and the authors of the papers in this issue for their contributions.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
