Abstract
How are data science systems made to work? It may seem that whether a system works is a function of its technical design, but it is also accomplished through ongoing forms of discretionary work by many actors. Based on six months of ethnographic fieldwork with a corporate data science team, we describe how actors involved in a corporate project negotiated what work the system should do, how it should work, and how to assess whether it works. These negotiations laid the foundation for how, why, and to what extent the system ultimately worked. We describe three main findings. First, how already-existing technologies are essential reference points to determine how and whether systems work. Second, how the situated resolution of development challenges continually reshapes the understanding of how and whether systems work. Third, how business goals, and especially their negotiated balance with data science imperatives, affect a system’s working. We conclude with takeaways for critical data studies, orienting researchers to focus on the organizational and cultural aspects of data science, the third-party platforms underlying data science systems, and ways to engage with practitioners’ imagination of how systems can and should work.
Introduction
Data science is the practice of analyzing large-scale data using techniques drawn from domains such as machine learning, artificial intelligence (AI), and natural language processing. Building data science systems is a laborious process, requiring extensive amounts of technical work.
Unsurprisingly, dominant narratives about the working of such systems center on their technical design: systems are thought to work because of their algorithms, models, and data. Yet, as we show, whether and how a system works is also an ongoing accomplishment of discretionary human work.
A good example is the work of problem formulation (Hand, 1994)—translating high-level goals into data-driven problems. During problem formulation, an essential first step, practitioners outline the system’s intended working in the service of given goals. Research has shown how the expected working of a data science system is not given but negotiated in the problem formulation stage through “discretionary judgments of various actors and further affected by choice of methods, instruments, and data” (Passi and Barocas, 2019: 46). “Even the simplest piece of software has embedded within it a series of architectural decisions about what ‘works’ regarding the purposes for which it was created” (Shaw, 2015: 2).
Even after problem formulation, practitioners’ conceptions of how, whether, or in what ways their systems work remain in flux. In this paper, we unpack the ongoing forms of human work involved in a corporate project to show how a data science system’s working is not stable but remains continually negotiated throughout development.
In fact, we show that even determining what aspects of a system work or do not work is not always obvious or numerically determinable. One reason for this is that a system’s working is multifaceted. The system works or does not work in distinct ways for different actors. Data scientists, for instance, often describe a system’s working via performance metrics (Rieder and Simon, 2016). As artifacts of “algorithmic witnessing” (Passi and Jackson, 2018), numbers remain tightly coupled with those aspects of working “that are most readily computationally quantifiable” (Baumer, 2017). Project managers, however, define working through the lens of business use cases, and product managers prioritize compatibility and feasibility as essential aspects of working—articulations embedded in broader organizational imperatives. “Mathiness” (Lipton and Steinhardt, 2018) is but one feature of a data science system whose eventual working is a “practical accomplishment” (Garfinkel, 1967), marked by diverse forms of sociotechnical work.
This paper addresses the question: how are data science systems made to work?
Engaging with the design of data science systems, however, is challenging. Opening the data science black box is neither easy nor straightforward. Analyzing such systems requires multiple forms of knowledge. Access to data science practitioners, especially those working in corporations, remains restricted. Critical data studies scholarship on the working of data science systems thus centers on their use, rather than their design—their impact in the ways they work as opposed to the practices through which they are made to work.
In this paper we address this gap, contributing to a growing body of research on the design practices of data science systems (e.g., Amershi et al., 2019; Muller et al., 2019; Passi and Barocas, 2019; Passi and Jackson, 2018; Saltz and Grady, 2017; Saltz and Shamshurin, 2015). Based on ethnographic fieldwork with a corporate data science team, we describe how actors negotiate central aspects of a system’s working: what work the system should do, how the system should work, and how to assess whether the system works. These negotiations, we show, lay the foundation for how, why, and to what extent a system ultimately works the way it does. We focus on a specific project—a self-help legal chatbot—to highlight these negotiations and the invisible human work that goes into determining whether and how systems work. Through a detailed recounting of corporate data science in action, we develop a more general account of how a system’s working is not only affected by algorithmic and technical design but also continually reworked in light of competing technologies, implementation challenges, and business goals.
In the following sections, we describe our research site and methodology before moving to the empirical case study. We conclude with our findings, describing ways for the field of critical data studies to move forward based on them.
Research site and methods
This paper builds on six months of ethnographic fieldwork with Aurelion, a multi-billion-dollar technology corporation based on the US West Coast. To gain immersive access to ordinary work practices, the first author worked as a data scientist at the corporation during this time, serving as a lead on two business projects (not reported in this paper) and taking part in several others. Aurelion owns several companies across domains such as health and law. There are multiple teams of project managers, business analysts, and software developers at Aurelion and its subsidiaries. Aurelion’s data science team works with product teams to build diverse business applications using data science. During fieldwork, Aurelion’s data science team had 8–11 members (including the first author). Remy—Director of Data Science with 30+ years of experience managing projects in several major technology firms—heads the data science team. Remy and his team report to Chief Technology Officer (CTO) Harper, who has 20+ years of industry experience.
We conducted 52 interviews with project managers, business analysts, data scientists, software engineers, and corporate executives. The collected data also included 426 pages of fieldwork notes and 104 photographs. We used principles of grounded theory analysis to code interview and fieldwork data (Charmaz, 2014; Strauss and Corbin, 1990). Through two rounds of in-vivo and thematic coding, we identified several forms of human work through which actors establish what systems should do, in what ways, and to what extent, the three most salient of which we report below. The first author headed the empirical analysis, including multiple rounds of discussions with four other researchers. We organized our interview and fieldwork data in two ways: (a) categorized by projects (e.g., self-help legal chatbot) and (b) categorized by professional groups (e.g., analysts, scientists, managers, and executives). The former helped to analyze themes within and across projects (e.g., problems specific to certain kinds of projects), and the latter helped to examine similarities and differences between groups (e.g., how different professionals expect a system to work). In this paper, we focus on one project for its salience to the paper’s theme, but we observed similar dynamics across other projects. The chatbot project was in its initial stage when we began fieldwork. The first author was only an observer in this project.
Empirical case study: Self-help legal chatbot
Law&You, an Aurelion subsidiary, offers online marketing services to attorneys and law firms. Clients pay to integrate Law&You’s digital tools into their websites to convert website visitors into paying customers. One such tool is an online chat module that connects users with human chat operators, who guide users toward self-help legal resources or collect information on users who need professional legal services. If a user needs professional help, the operator collects data such as the user’s name, address, and contact information, and forwards it to the client.
In late 2016, Law&You started replacing human chat operators with automated guided chats. A guided chat is a scripted list of options presented to the user one at a time. Depending on the user’s selection, guided chat moves to the next set of options. Guided chat generates its initial options based on, for instance, keyword analysis. If the user’s request is, for example, “I want to file for bankruptcy,” guided chat will identify the keyword “bankruptcy” and present three bankruptcy-related options.
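To make guided chat’s mechanics concrete, the following is a minimal sketch of such scripted, keyword-driven logic; the keywords, option texts, and the `initial_options` function are our illustrative assumptions, not Law&You’s actual implementation.

```python
# Illustrative sketch of guided chat: a scripted decision tree keyed on
# keywords. All keywords and option texts below are hypothetical.
GUIDED_SCRIPT = {
    "bankruptcy": [
        "Learn what filing for bankruptcy involves",
        "Explore bankruptcy self-help resources",
        "Talk to a bankruptcy attorney",
    ],
    "divorce": [
        "Learn about the divorce process",
        "Explore divorce self-help resources",
        "Talk to a divorce attorney",
    ],
}

def initial_options(user_request: str) -> list[str]:
    """Scan the request for a known keyword and return its scripted options."""
    text = user_request.lower()
    for keyword, options in GUIDED_SCRIPT.items():
        if keyword in text:
            return options
    return []  # no keyword matched: the script has no path to offer

print(initial_options("I want to file for bankruptcy"))
```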
Law&You, however, faced two challenges with guided chats. First, guided chats could not reliably determine users’ legal issues from scripted options alone. Second, and consequently, the user data that guided chats collected—the data sold to clients—was unreliable.
In May 2017, Law&You’s director of technology, Paul, approached Aurelion’s director of data science, Remy, with the idea of a smart chatbot. Law&You’s move to chatbots was partly a response to the fact that several of their competitors now provided AI-driven marketing tools. The chatbot, from Paul’s perspective, was the future of digital marketing. The data science team had never built a chatbot before but had experience with natural language processing (NLP) tools. The team had previously developed NLP-based systems using in-house models and third-party services provided by companies such as Amazon, IBM, and Microsoft. At the beginning of the project, as part of requirement gathering, Remy asked Paul for a definition of the chatbot’s use case—a specification of the work the chatbot was supposed to do and for whom.
Guided chat’s failure to perfectly determine users’ legal issues anchored this use case: the chatbot had to do reliably what guided chat could not.
The data science team broke down the chatbot’s functionality into two tasks: (1) “knowing” what users say (identifying what users want to accomplish with the chatbot) and (2) “guiding” conversations (helping users by making the chatbot talk to them in a human-like manner). Initially, data scientists focused on the first task—determining what users say. When they researched existing chatbots, they found that third-party AI platforms powered most chatbots. In earlier projects, the data science team had “tried” most third-party AI platforms; given their experiences, they favored VocabX and considered it “smarter than most systems.”
The data science team believed that building a VocabX-enabled chatbot required several VocabX services working in tandem. In one early test, a team member typed a line mentioning both divorce and bankruptcy; the chatbot identified two topics:

1. business and industrial/company/bankruptcy
2. society/social institution/divorce. (Fieldwork Notes, 31 May 2017)
They considered the line “confusing” because it had keywords for both divorce and bankruptcy. Such attempts to confuse the chatbot were commonplace, considered an informal, yet reasonable, form of testing. The chatbot “knew” what users said if it identified the “correct” topics.
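Because VocabX is a pseudonymous third-party platform, its actual API is not documented here; the sketch below only mimics the shape of such informal “confusion” testing, with a hypothetical `identify_topics` function standing in for the platform’s topic service and a hypothetical test line.

```python
# Hypothetical stand-in for the third-party topic service; "VocabX" is a
# pseudonym, so this only mirrors the hierarchical topic labels recorded in
# our fieldnotes, not any real platform's API.
def identify_topics(text: str) -> list[str]:
    topics = []
    if "bankruptcy" in text.lower():
        topics.append("business and industrial/company/bankruptcy")
    if "divorce" in text.lower():
        topics.append("society/social institution/divorce")
    return topics

# An informal "confusion" test: a line with keywords for two legal issues.
confusing_line = "I want a divorce before my spouse files for bankruptcy"
print(identify_topics(confusing_line))
# The chatbot "knew" what the user said if both expected topics appear.
```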
The more significant challenge was the second task—guiding conversations. When users went off track, the data science team wanted the chatbot to guide the conversation back in a “natural” manner. Data scientists tried but were unsuccessful in making the chatbot hold conversations. The chatbot could identify legal topics, but data scientists could not successfully configure VocabX to hold open-ended, human-like conversations. To work through this problem, the team organized a meeting with VocabX representatives.
Remy asked how the chatbot would work when users went “outside the scope of the conversation.” The data science team had often discussed the two tasks (knowing what users say and guiding conversations) as separable; the meeting revealed how tightly they were coupled.
The meeting surfaced a pertinent challenge. Guiding conversations required knowing what a user says throughout the conversation, including when the user strayed from preconfigured conversational paths.
The situation grew complex when users said things that were not already specified as entities and intents—for example, when users described love for their kids in divorce conversations. Pre-trained on general topics, the chatbot could identify that the user was perhaps on the “family” or “parenting” topic. If the chatbot did not identify that these topics were also related to divorce, it might incorrectly determine that the user was off track. Law&You wanted to avoid such situations, which could make users feel that they were not understood or taken seriously, leading them to drop conversations midway. For the legal use case, the chatbot needed to identify not only preconfigured entities and intents but also the relations between new entities and legal intents.
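A minimal sketch of the challenge as we read it: without explicit relations linking general topics to legal intents, on-track users can look off track. All names and mappings below are our hypothetical assumptions.

```python
# Hypothetical relations between topics/entities and legal intents. General
# topics such as "parenting" carry no legal relation unless someone supplies
# one, so on-track users can be misclassified as off track.
LEGAL_RELATIONS = {
    "child custody": "divorce",
    "chapter 13": "bankruptcy",
    # no entries for "family" or "parenting"
}

def classify_turn(detected_topics: list[str], current_intent: str) -> str:
    """Judge whether a user's turn is on track for the current legal intent."""
    for topic in detected_topics:
        if LEGAL_RELATIONS.get(topic) == current_intent:
            return "on track"
    return "possibly off track"  # or an on-track user "describing things differently"

# A user in a divorce conversation describes love for their kids:
print(classify_turn(["family", "parenting"], current_intent="divorce"))
# -> "possibly off track", even though the user is actually on track
```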
Director of technology Paul and CTO Harper turned down the proposal to manually label such entity–intent relations, arguing that it incurred high financial and personnel costs. They could allocate resources to manually label a small set of entities, but not all of them.
The data science team organized the meeting to know how to “guide” conversations. However, much of the meeting centered on the work of identifying legal issues, i.e. “knowing” what users say, which the data science team believed it had already accomplished. The team realized that the full scope of the task of knowing what users say includes knowing the difference between “really off-track” users and on-track users “describing things differently.”
Our description of the meeting may seem incomplete, consisting only of discussions among business personnel—Remy, Paul, and Harper—and VocabX representatives. Although data scientists were at the meeting, they are absent in our description. During fieldwork, we observed similar dynamics between data scientists and business personnel. Data scientists were vocal in data science team meetings (even in the presence of their director or project manager). However, they were far less vocal in meetings with business personnel, especially senior personnel. For example, in requirement gathering or status update meetings with business teams, Remy and Daniel often spoke on behalf of the team.
The same dynamic was at play in the VocabX meeting. One reason for this, as we learned through discussions with data scientists, lies in their understanding of the goal of such meetings. It may seem that the meeting’s goal was technical—to learn how to configure VocabX to hold open-ended conversations. Data scientists, however, felt differently. For them, the meeting’s goal was primarily business in nature. They recognized that if VocabX could power the chatbot, the company would need to invest a substantial amount of money into VocabX to get access to their cloud infrastructure to handle the thousands of customers that would use the chatbot daily. The meeting’s technical and business goals were intertwined—a common occurrence in the world of corporate data science.
Going back to our story, the meeting surfaced additional problems but resolved some previous ones. Data scientists now better understood what they could accomplish with VocabX (e.g., identify preconfigured intents and entities) and what they could not (e.g., discern whether new entities were part of legal intents)—as discussed in the debrief session after the meeting. They decided that they needed to figure out a way to make the chatbot automatically learn legal intent–entity relations. The team discussed two ways to achieve this, the first of which built on VocabX’s existing services.
The data science team presumed that VocabX could be configured to support this learning.
A few weeks later, the data science team provided an update on the chatbot’s development. Remy, data science project manager Samuel, data scientists Max and Alex, Law&You’s director of technology Paul, and Law&You’s software engineer Richard attended the meeting. Remy told Paul that the chatbot was a work in progress. It performed better than before but was far from ready for deployment.
There were “too many edge cases” in which conversations did not go as planned. Paul, however, was not convinced that the chatbot’s performance was as bad as Remy described. For him, the chatbot just had to be good enough.
For the data science team, the chatbot was better than before but still far from perfect. Paul did not require perfection. The chatbot had business value, even if it worked 80% of the time. Paul differentiated between academic and business perspectives. Edge cases posed exciting data science challenges. Solving them, however, was outside the project’s scope. The business gained value from a good-enough chatbot even if it was not ideal from a computational perspective.
It is not surprising that imperfect, good-enough systems can provide business value. What was surprising was that Paul argued that the chatbot’s failures were, in fact, not failures at all!
For Paul, the primary goal was to collect user data and sell it to clients. If the chatbot did this most of the time, it worked. It was acceptable to fail, and failures did not mean that the chatbot did not work. In such cases, the chatbot could inform users that their legal problem (the problem it failed to identify) was not a self-help problem and required professional help. Users should thus provide their information so that the company could put them in touch with lawyers. There were no failures, only data.
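Read as a fallback policy, Paul’s reframing might be sketched as follows; the function, messages, and flow are our hypothetical rendering, not Law&You’s deployed logic.

```python
from typing import Optional

# Hypothetical fallback policy: when the chatbot cannot identify the user's
# legal issue, treat the failure as a lead-generation opportunity.
def respond(identified_issue: Optional[str]) -> str:
    if identified_issue is not None:
        return f"Here are self-help resources for {identified_issue}."
    # Failure to identify the issue becomes data collection:
    return ("Your issue may require professional help. Please share your "
            "name and contact details so we can connect you with a lawyer.")

print(respond("bankruptcy"))  # the "working" path
print(respond(None))          # the "failure" path still yields business value
```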
Findings
We began the paper with the question: how are data science systems made to work? We now draw together three findings that address it, concerning the role of existing technologies, of emergent challenges, and of the negotiated balance between business and data science considerations.
Existing technologies: The old and the new
The working of data science systems is not just an artifact of their technical features but entangled with existing technologies. In this subsection, we analyze how two existing technologies—one that the chatbot replaced (guided chat) and one that made up the chatbot (VocabX)—shaped actors’ understanding of how the system should and did work.
First, assessments regarding whether a technology other than the chatbot—guided chat—worked shaped actors’ articulations of the chatbot’s intended working. Director of technology Paul initially described the chatbot as different from guided chat. The chatbot was a novel technology, signifying the company’s move toward AI. The chatbot’s initial problem formulation, however, was motivated by the perceived performance of guided chat. The guided chat had succeeded in raising profit margins by automating the work of human chat operators, but it had failed to collect reliable user data.
The company could revert to hiring chat operators, but this was undesirable. Hiring back human operators would incur a high financial cost. The company would also lose market share to competitors who already provided AI-driven digital marketing tools. Reverting to human operators would mean that to remedy guided chat’s failure (unreliable data collection), Law&You would also have to give up on guided chat’s success (higher profit margin). The company instead went for the chatbot, which promised reliable data collection without giving up the financial benefits of automation.
Second, an important technical feature of a technology making up the chatbot shaped actors’ understanding of users’ legal issues, affecting how actors expected the chatbot to work. For our actors, users had legal intents, which were combinations of distinct legal entities. The chatbot needed to determine intent by identifying entities.
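As a data model, the actors’ scheme might be rendered like the following sketch, in which an intent is a combination of distinct entities; the entity lists and matching rule are our assumptions (only “chapter 13” and “child custody” appear in the test keywords described earlier).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LegalIntent:
    name: str
    entities: frozenset  # an intent as a combination of distinct legal entities

# Hypothetical entity sets for illustration.
INTENTS = (
    LegalIntent("bankruptcy", frozenset({"chapter 13", "debt", "creditor"})),
    LegalIntent("divorce", frozenset({"child custody", "alimony", "spouse"})),
)

def determine_intent(found_entities: set) -> str:
    """Pick the intent whose entity set best overlaps the identified entities."""
    best = max(INTENTS, key=lambda intent: len(intent.entities & found_entities))
    return best.name if best.entities & found_entities else "unknown"

print(determine_intent({"chapter 13", "debt"}))  # -> "bankruptcy"
```

On this model, knowing what users say reduces to entity overlap—precisely the reduction that, as shown above, broke down when users’ words fell outside the preconfigured entities.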
Emergent challenges: Situated resolutions and system working
The working of data science systems is shaped not only by problem formulation and algorithmic design but also by situated forms of work to resolve emergent implementation challenges. In this subsection, we examine how the situated resolution of the problem of knowing what users say reshaped the work the chatbot had to do and, with it, what it meant for the chatbot to work.
The data science team initially equated the work of knowing what users say with identifying legal intents. In doing so, the team made two assumptions. First, users described legal issues, i.e. users were on track. This assumption was apparent in that the text used to test the chatbot contained valid accounts of legal issues (e.g., bankruptcy and divorce). Second, users used recognizable legal words in their descriptions. This assumption was evident in the use of specific keywords (e.g., chapter 13, divorce, and child custody) in test cases. The chatbot worked through the correct identification of legal intents of on-track users who described issues using recognizable legal words.
The VocabX meeting, however, surfaced additional challenges. The meeting’s goal was to configure the chatbot to “guide” conversations, yet much of it centered on “knowing” what users say—a task the team believed it had already accomplished.
Actors resolved this challenge by proposing that the chatbot must identify relations between new entities and legal intents, distinguishing “really off-track” users from on-track users “describing things differently.” This situated resolution expanded what it meant for the chatbot to know what users say and, with it, redefined what it meant for the chatbot to work.
Negotiated balance: Business and data science considerations
Whether a data science system works is neither obvious nor given (for a more general account, see Collins, 1985; Rooksby et al., 2009; Suchman et al., 2002); the perceived success and failure of its working depend as much on business goals as on data science imperatives. In this subsection, we unpack how actors evaluated the chatbot in distinct ways, agreeing to assess its working in a practical way founded on a negotiated yet skewed balance between business and data science goals.
Business and data science actors evaluated the chatbot in different, somewhat divergent, ways. The data science team focused on assessing the algorithmic efficacy of the chatbot’s working. From this perspective, the chatbot was far from perfect because of its inability to account for several edge cases. The data science team’s fixation on scoping and resolving edge cases was not an arbitrary choice. An essential part of director of data science Remy’s project goal was that the chatbot should know when users went off track and guide them back. It should not come as a surprise that most, if not all, edge cases were off-track user queries—queries at the heart of the data science team’s assessment criteria.
For the business team, the chatbot’s assessment depended on the practical efficacy of its working. Director of technology Paul argued that solving edge cases was an interesting academic challenge but not the project’s business goal—the chatbot needed to work for most, if not all, users.
Our finding that practitioners often need systems to work in good-enough, not perfect, ways echoes similar findings concerning other systems (Gabrys et al., 2016; Keller, 2000; Lewis et al., 2012). However, what is surprising is how the business team reframed situations that the data science team considered failures as potential sites of success. The chatbot’s computational failures were, for the business team, a result of the complexity of users’ legal issues, not of the chatbot’s technical inadequacies. In such cases, the chatbot could inform users that their legal issue required professional help—turning a computational failure into an opportunity to collect user data.
Discussion
In this paper, we described how actors make data science systems work relative to existing technologies, emergent challenges, and business goals. Enormous amounts of discretionary, often invisible work lay the foundation for how, why, and to what extent systems ultimately work.
One way to explain this is to see the chatbot’s final working as a mundane consequence of data science practice. From this perspective, a plausible narrative of the chatbot project would be that a problem (anticipating conversational pathways) hinders a business goal (data collection). Data scientists break down the problem (identify topics, learn relations among topics) and build a chatbot to solve it (by knowing what users say and guiding them). The chatbot’s working is thus a foregone conclusion—always stable, merely realized through development.
This framing, however, does not account for the choices and decisions that alter the working of systems throughout development—sometimes in ways that are invisible even to practitioners. Building systems—indeed, making systems work—requires “ordering the natural, social, and software worlds at the same time” (Vertesi, 2019: 388). Business goals and existing technologies shape problem formulations. The design of existing technologies configures the work systems must do and assumptions about how users will interact with the systems. Considerations of financial cost, personnel time, and resource allocation lead actors to require systems to do specific work in automated ways. The working of data science systems is not just an account of their technical design but made, and continually remade, through situated human work.
Researchers continue to analyze the implications of data science, recommending how systems should or should not work. Researchers’ ability to effectively address responsible design requires understanding how and why practitioners choose to make systems work in specific ways. Through a detailed description of the negotiated nature of the working of data science systems, our empirical findings call attention to the consequential relationship between the working of data science systems and the everyday practices of building them, highlighting three takeaways for critical data studies.
First, practitioners have different, sometimes divergent, assumptions and expectations of a system’s intended working. We saw how data science and business team members differed in their approach to evaluating the chatbot’s working, highlighting underlying differences in their understanding of how the chatbot was intended to work. Data science is as much the process of managing different expectations, goals, and imperatives as of working with data, algorithms, models, and numbers. Building data science systems requires work by many practitioners, but not all forms of work are considered equal. We saw how business team members had more power than data scientists in the chatbot project (Saltz and Shamshurin (2015) point to similar dynamics at another company). The organizational culture at Aurelion reinforced the notion that the data science team’s job was to “add value” to business products. Data science teams remain one of the most recent additions to corporate organizations. But their everyday work intersects with already-existing project, product, and business practices—with teams that often have more weight in organizational decisions.
In making visible the impact of organizational aspects, we suggest researchers orient toward how differences among practitioners and teams are managed, resolved, or sidelined within projects. Who gets to take part in negotiating a system’s working? Who decides the nature of this participation? Who gets to arbitrate the outcome of negotiations? In studying the implications of data science systems, we also must engage with the culture and structure of organizations that design such systems to understand how specific viewpoints are (de)legitimized by practitioners (Haraway, 1988; Harding, 2001). This engagement can help us better understand the entangled relationship between the organizational context and product development practices of corporate organizations (Boltanski and Thévenot, 2006; Passi and Jackson, 2018; Reynaud, 2005; Stark, 2009).
The second takeaway concerns new empirical sites for analyzing the work of building data science systems. The challenging nature of gaining access to corporate practice continues to limit critical data studies scholarship. Data science systems, however, do not exist in isolation but are embedded in wider sociotechnical ecosystems. Justifications concerning the working of systems may lie, as we have shown, not within but outside them—in the technologies that systems replace or the technologies that make up systems. We must keep in mind that existing technologies that data science systems replace—even those that seem to have nothing to do with data science—also shape how practitioners envision what data science systems can or cannot, and should or should not, do. What existing practices and technologies do data science systems augment or replace? In what ways do existing technologies seem deficient or superior to data science systems? Engaging with these questions is useful, helping us see how data science systems intersect with existing ways of doing and knowing.
As we have shown, third-party AI platforms make up and shape the working of data science systems in important ways. As we continue to work to gain access to corporate data science practices, analyzing these systems promises to provide meaningful insights into how, why, and to what extent systems work the way they do. How do third-party AI services define and solve common problems such as identifying user intent? What are the affordances and limits of different AI platforms? Describing how AI platforms work and what futures they promise to their clients falls within the purview of our efforts to unpack the working and implications of data science systems.
Third, a data science system’s working is impacted as much by how practitioners anticipate the kinds of work the system can perform as by the actual work of practitioners to build the system. Initially, our actors believed that the chatbot could successfully solve guided chat problems. In two other instances, data science team members expected the chatbot to easily recognize legal intent, guide conversations, and learn new information. Beyond the project’s immediate goals, business actors believed that the chatbot was the future of digital marketing. Such forms of “anticipation work” (Steinhardt and Jackson, 2015) shaped how actors imagined viable approaches and solutions to problems at hand, further affecting how and why actors built the chatbot to work the way it did. In our case, however, anticipations often faltered, requiring work by actors to adjust their programs of action. Situated in the present, articulations of plausible futures consequentially shape everyday data science work.
Proclaiming the efficacy of actions before performing them is not the mistake of putting the cart before the horse but, in fact, an essential attribute of professional work. Pinch et al. (1996) describe how professionals use forms of anticipation to decide what is and is not doable, imagining and selecting among viable actions when faced with uncertainty. Examples of such anticipatory skills range from the ability of mechanics to provide accurate time estimates for repairs to the ability of aerospace engineers to assess the safety of a rocket before flight. Similarly, data science practitioners must know not only how their systems work but also how they might be made to work in advance of building them:

We cannot find out which of the alternatives will lead to the desired end without imagining […an] act as already accomplished. […We] have to place ourselves mentally in a future state of affairs which we consider as already realized, though to realize it would be the end of our contemplated action.
For critical data studies researchers, analyzing and participating in the anticipatory work of practitioners is crucial because it is through such forms of work that practitioners imagine possible futures—worlds in which systems can, do, and must work in some ways and not in others. As researchers, our efforts to assist practitioners in designing better systems thus must include the work of fostering alternative, even somewhat radical, imaginations of futures to help practitioners envision new possibilities. We get the systems we imagine, but not necessarily the ones we need. The work of building better systems begins with working with practitioners to imagine better futures.
We realize that such engagements will sometimes frustrate both critical data studies researchers and data science practitioners; the two may, and often do, have different normative goals. Data science practitioners may cater to a different set of ethics, caring more about the clients they are in relationship with than about the concerns espoused by critical data studies. Sometimes critical data studies researchers may argue that it is better not to build a system. Data science practitioners may still go ahead with it because of the perceived business value or because they believe that if they do not build it, someone else will. In such situations, it may seem better to not engage with practitioners at all.
Our aim is not to simplify the lives of practitioners and researchers and create a binary divide between them.
We strongly believe that we need more research on the capitalistic underpinnings and negative implications of data science—certainly, you do not have to look far to find either. Such critique, however, is complementary to, not a substitute for, engaging with practitioners’ imagination of how systems can and should work.
Conclusion
In this paper we have provided a process-centered account of the everyday work of building data science systems, showing how and why the working of a system is neither stable nor given, but a resourceful and improvised artifact that remains continually negotiated—relative to existing technologies, emergent challenges, and business goals—through ongoing, often invisible, human work.
Acknowledgments
We wish to thank Matthew Zook, Age Poom, and three anonymous reviewers for their constructive feedback and help with the review process. We would also like to thank Artificial Intelligence, Policy, and Practice (AIPP) research group members and Technology, Law, and Society (TLS) 2018 Summer Institute participants for their helpful comments on earlier versions of this work, and Ranjit Singh, Priya Gupta, and Utkarsh Srivastava for their help in refining the case study.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: National Science Foundation (NSF) Cyber-Human Systems Grant #1526155.
