Abstract
Elite athletes are constantly tracked, measured, scored, and sorted to improve their performance. Privacy is sacrificed in the name of improvement. Athletes frequently do not know why particular personal data are collected or to what end. Our interview study of 23 elite US college athletes and 26 staff members reveals that their sports play is governed through information asymmetries. These asymmetries look different for different sports with different levels of investment, different racial and gender makeups, and different performance metrics. As large, data-intensive organizations with highly differentiated subgroups, university athletics are an excellent site for theory building in critical data studies, especially given the most consequential data collected from us, with the greatest effect on our lives, is frequently a product of collective engagement with specific organizational contexts like workplaces and schools. Empirical analysis reveals two key tensions in this data regime: Athletes in high-status sports, more likely to be Black men, have relatively less freedom to see or dispute their personal data, while athletes in general are more comfortable sharing personal data with people further away from them. We build from these findings to develop a theory of collective informational harm in bounded institutional settings such as the workplace. The quantified organization, as we term it, is concerned not with monitoring individuals but building data collectives through processes of category creation and managerial data relations of coercion and consent.
Introduction
Young people in elite US university athletics are constantly measured, weighed, pinched, tracked, recorded, monitored, and reviewed. Their bodies, diets, workouts, and sports play are rich with data. But while these data tie teammates together, they do not always mean the same thing to everyone. The organization of campus athletics and the subdivisions within that organization shape what information is available to whom, how they act upon it, and how it acts upon them.
Softball player Jade explained to us that her coach “wants us to be 92% routine on defense.” This goal—the percentage of normal plays completed by the defensive team with ordinary effort—drove their fielding practice at home, their dugout interactions in a game (where an assistant would mark up a chart with X's and O's), and the coach's review of their performance next week. Jade felt both the categorization of routine versus difficult plays and the 92% target were arbitrary, but there was little she could do about it. “She just loves that number for some reason. She's always like, ‘92%!’ I don't really get it, but that's what she likes.” Perhaps the target came from their Sports Information Director, who analyzed everything from the three-dimensional distribution of their hits to their records in different uniforms. Jade gave little thought to the in-game statistics pored over by the public, but these arbitrary data points discussed within the team affected her deeply, especially when they came from a trusted authority like her coach, or during sensitive periods like when she was recovering her swing after an injury. At moments like that, Jade said, the ball coming towards her could shrink down to the point where “it's like you swing and it goes right through your bat.”
Elite US university athletics, then, like many organizations, are built on information asymmetries: Unequal distributions of data not just in terms of who can see what, but who can understand or act on it, and who must face the consequences of data collection and circulation. Such consequences include not just Jade's frustration but high-stakes questions about injuries, academics, disordered eating, and more. In this article, we use interviews with athletes and staff across two US universities to answer the following questions:
- What personal data are collected from elite college athletes, and why are those data of interest to the organization?
- How do elite college athletes perceive this process of data collection and analysis, and how does that perception differ between athletes and sports?
- Finally, what features of university athletics as an institution create these patterns of data collection and reception?
Answering these questions provides insight into the individual and organizational patterns of quantification in this specific, multibillion dollar industry. But this specific domain is also an ideal site for theory building in critical data studies. We draw on our empirical findings to offer a provisional account of the institutional restructuring of social relations through data relations. We call this dynamic the quantified organization, in conversation with but distinct from scholarship on the quantified self. The quantified organization is an institutional structure for data collection and circulation that produces new collectives within its boundaries to reach its goals. This collective subject is formed through vertical relationships with the data collector—often managerial relations of coercion or consent—and horizontal relationships with other data subjects through processes of category construction.
Ours is an ideal site in which to advance such a provisional theory because university athletics brings multiple data-collection regimes, with similar goals but different methods and populations, under a common institutional rubric. The two universities we call State U and U of State are at the top level of US collegiate athletics, the NCAA Football Bowl Subdivision. Their basketball and American football teams are regularly on national TV. Their total annual athletics budgets are around $100 million, and they each employ around one staff member for every two athletes (about 600 athletes at State U and nearly 1000 at U of State), with higher-revenue sports having a lower staff-to-athlete ratio than lower-revenue sports. From the outside, then, these look like mid-sized firms. But the collection of personal data here is more intensive and extensive than that of other organizations. Athletes consent to it because they are competitive people who see themselves as elite performers. As in any large organization, what data is collected from whom and why differs depending on an individual's role or the level of investment into or output expected from a certain subdivision. Intra- and inter-organizational comparisons of data collection and circulation are easy to make here, not because football and baseball do the same things with the same data, but because they are distinct but connected subdivisions of a larger ‘total institution’ that assumes oversight of all aspects of a subject's life (Goffman 1961). We will use these comparisons to advance our account of the collective, if unequal, data relations organizations build within their boundaries to advance their goals.
We argue that the collection and circulation of personal data in college athletics, like most organizations, is characterized by information asymmetries. These asymmetries are institutionalized through a management structure we label the quantified organization. This collective structure of data relations ties data subjects to each other through processes of categorization and ties data subjects to data collectors through processes of coercion and consent. In what follows, we first review the empirical literature on quantified selves and theoretical analyses of informational harms in order to better theorize collective data relations. After reviewing our interview methodology, we describe two primary tensions in college athletics data: The relative unfreedom of athletes in capital-intensive sports and the relative discomfort athletes feel when sharing data with peers, versus coaches or spectators. Finally, we build on these empirical findings to develop the concept of the quantified organization as a framework to describe, explain, and transform collective, institutionalized data relations.
Literature review
Two bodies of literature inform our approach to personal data in college athletics specifically, as well as the more general issue of how best to conceptualize collective informational harms. First, scholarship on self-tracking of the body (e.g., the “quantified self”) provides a rubric with which to describe data relations specific to athletics. Second, privacy studies provide the dominant language of informational harms, not just in critical data studies but in academic, legal, and political discussions more broadly. We build on the dissident margins of privacy scholarship to explain intra-organizational information asymmetries and the collective informational harms that follow from them.
Technology has always been a part of sports. In turn, sports studies have regularly positioned new developments in this domain as synecdoche for larger shifts in capitalist development. Brohm (1978) argues that modern sport is a “technology of the body” that “treats the human organism as a machine” (50). While sports studies are not our focus here, it should come as no surprise that the field has built up a significant literature on quantification, sometimes in parallel to, sometimes intersecting with critical data studies scholarship on the quantified self. This research stresses, as we do, that personal data is always interpersonal: Measurement proceeds through hierarchy and is interpreted through comparison (Lury and Day 2019). For example, through 113 interviews, Luczak et al. (2020) found that strength and conditioning coaches and athletics trainers only consider athletes to be one of the parties involved in tracking athletes’ data—the organization and the analyst were just as important. Fans too are heavily involved in monitoring, analyzing, and circulating player data (Sanderson 2009). Williamson's (2015) description of the ‘algorithmic skin’ makes clear that while self-tracking creates new forms of self-knowledge, it does not do so under conditions of the self-tracker's choosing. Rather, the self-knowledge that emerges from self-tracking is generally that most conducive to circulation as an individual measure of productivity, and it only emerges through an algorithmic connection between the body and the broader data economy.
Within critical data studies proper, Neff and Nafus (2016) review the many different types of self-tracking and the motivations for it within a general cultural moment of biomedicalization. For our purposes, it is important to note their argument that “Wearables acquired through work change what nonwork time is for” (129). Through self-tracking, organizations can approach ‘total institution’ status, creating a closed world that brings more and more of the self being tracked into the space of work. As we show below, the more structured institutional settings we study embrace the use of wearable trackers that industry offers as “digital compasses” to guide users in their life choices (Schüll 2016), but under the assumption that users are largely following directions laid out for them, rather than charting their own (Ball 2010). Lupton (2017), the leading sociologist of self-tracking, shows that users are rarely totally enthusiastic or opposed to self-tracking. Ambivalence reigns, particularly as they learn the ins and outs of a device, or as devices cross contexts or break down. It is unclear, however, whether these attitudes would change if workers could see and act on the massive information asymmetries built into modern organizations. This is one of the motivations behind our research: As Ajunwa, Crawford, and Schultz (2017) make clear, employers’ increased technological capacity to monitor workers has coincided with the growing weakness of both labor unions and the law to contest that surveillance. Institutional channels of control have grown as workers’ leverage within those institutions has decreased.
Certain strands of quantified-self scholarship have shifted their focus to the scale of the organization. In an earlier era, Zuboff's (1988) touchstone ethnography of the computerizing workplace described the process as one of ‘informating’—the creation of data traces for worker actions—rather than the labor substitution of automation. These traces make more of the worker and their behavior visible to management and susceptible to rationalization and reorganization. Today, Phoebe Moore is perhaps the most prominent scholar of what she calls the “quantified self at work” (Moore and Robinson 2016). While not in conversation with Zuboff's earlier research, Moore's writing shows that new requirements to self-track at work, combined with greater managerial technical capacity, have both extended surveillance across more types of work and made surveillance more intensive in the types of data gathered from workers. Like us, Moore (2018) focuses on the tracking not just of work tasks (e.g., filing reports, throwing footballs) but worker attitudes and, through physiognomic monitoring, worker affect (e.g., stamina, anxiety, motivation). More and more of the worker is thus brought under management's purview. Drawing on autonomist Marxism and, especially, the new materialisms, Moore (2017) catalogs management's increased insight into this ‘autonomic self.’
The way we define informational harms—threats to individual or collective livelihood through data collection and circulation—dictates how we discover, describe, and act on them. While quantified self scholarship has described how individuals are enmeshed in data flows, another field provides the most popular rubric for informational harm: privacy studies. This is a broader issue than academic theory, extending into law, advocacy, and public debate. Mass surveillance, for example, appears to Anglo-American political culture as a problem of a great many privacy violations, of unlicensed snooping, rather than one wing of a war on racialized communities at home and abroad. Igo (2018) shows how this legalistic approach to informational harm rose to prominence over the course of the twentieth century, as the (gendered) boundaries between private homes and public work and politics grew firmer but the ability of states and corporations to violate those boundaries increased. The call for privacy became a call for the state, often through plaintiff's appeals to the judiciary, to reinforce these boundaries. Nissenbaum's (2009) influential contextual integrity framework recognizes the increasing porosity of these boundaries in the twenty-first century and redefines privacy harms as the violation of contextual informational norms; e.g., a request for private medical data from one's manager is a violation but the same request from one's doctor is not. But contexts are not stable and, especially in an era of bulk data collection online and through embedded sensors offline, individuals may be unaware of privacy violations as they occur (Marwick and boyd 2014). Information asymmetry is a fact of modern life. Most of us are separated from the means of data production. 
In this environment, the most common Western, legal conceptions of privacy as a negative right—defined by what it prevents—are inadequate to the task of providing ‘breathing room’ not just for the liberal self (Cohen 2013) but for collectives of, e.g., racialized subjects, seeking to claim full citizenship (Bridges 2017).
A thorough accounting of this landscape of social struggle demands a rubric for informational harm distinct from legalistic conceptions of privacy as a negative right, if for no other reason than the fact that surveilling institutions consider their mission to be much broader than taking secrets from their subjects. In order to govern their populations, states have long relied on information asymmetries that create monopolies of knowledge (Innis 2007). The conditions Igo (2018) reviewed accelerated in the late twentieth century into the twenty-first as data-hungry corporations and states increasingly tracked people’s movements between work, home, and public life. This is not just the active observation commonly referred to as surveillance. Agre (1994) argued that contemporary informational harms occur largely through a process of ‘capture’ that collects trace data and forces more and more facets of human activity into market competition with other humans. This is a positive project, the creation of new collectives—like Moore's precarious media workers—whose experience of individualized competition is the collective product of institutions measuring them against one another. For Gandy (2000), it is essential to understand that this process of capture grants tracking institutions a fundamentally different view of social life than that held by tracked individuals. Where individuals form an identity in interaction with other individuals, he argues that data-collecting institutions are concerned instead with identification—categorization for institutional utility. This categorization may be wrong or incomplete; but because these flaws occur behind the backs of the categorized, they are unlikely to realize the mistake. You know who you are, but the bank or police may identify you as a different subject, for the purpose of collateralizing your debt or for calculating your neighborhood's risk for crime.
For our purposes, a tracked team—or firm—is not a series of discrete, quantified selves in a box, but a collective subject, greater than the sum of its parts. For critical data studies in general, describing and explaining collective informational harms demands theoretical alternatives to ‘privacy’ as the property—financial or psychological—of individuals. This search for an alternative rubric drives Viljoen's (2021) account of data as a democratic medium. Viljoen argues that dominant critiques of data governance boil down to either propertarian critiques of data as the private property of data subjects, whose theft by powerful data-collecting institutions demands recompense, or dignitarian critiques of “data as an expression (or extension) of individual selfhood” (41), whose collection violates individual autonomy and violently abstracts our psychic and social lives. Such violations, dignitarians argue, are best responded to with new rights, such as the right to be forgotten. Both propertarian and dignitarian critiques miss the fundamental fact that data are always already relational. Facebook is only interested in an individual's clicks insofar as they can use those clicks to situate them in specific population categories. Neither line of critique adequately grasps the relational harm of data collection: The construction of group categories reorganizes social relations such that certain groups are targeted by the police, charged higher rates for insurance, or presented with tailored misinformation. Your data is never just about you, but about people like you and people with whom you are connected. This relational critique is essential to understanding how institutions create collectives through data, and why personal data is always interpersonal.
Viljoen schematizes these relations along vertical—between data collector and subject—and horizontal lines—between data subjects—but focuses largely on population-level data collected at great scale, rather than the more bounded organizational settings of interest to us.
Viljoen's reframing is thus crucial to our own investigation of organizational data flows in college athletics as a problem beyond privacy. In combination with an empirical account of two quantified organizations—athletics at State U and U of State—this relational perspective helps us describe and explain the informational harms that emerge from organizational tracking of steps, lifts, and heartbeats. We adapt her analysis to advance our own concept of the quantified organization, which describes the institutionalization of informational harms and can thus help clarify collective efforts to change those data and those institutions.
Methodology
This study is based on in-depth, semi-structured interviews with 23 athletes across a range of sports, as well as 26 athletics staff members (e.g., coaches, nutritionists) conducted between May 2019 and July 2020. Interviews were conducted as part of a larger project on student-athlete data literacy in which this paper is situated. Participants are split between two large Division I Football Bowl Subdivision universities on the east coast of the United States: State U and U of State. As with our research sites, students are given pseudonyms. Interviews deliberately sample a range of sports because this allows us to compare student and staff perspectives on data collection and circulation between subdivisions of a larger organization; sampling a range of sports also ensures racial and gender diversity. Interviews sample two universities to increase validity and allow for comparisons between institutional contexts. While staff interviews also inform our analyses as part of the larger data set, this paper centers the voices of student-athletes to better understand their perceptions of collective informational harm and to build theory from the perspective of a particular class of datafied workers.
Initial participants were recruited through interpersonal connections and targeted advertising in spaces and channels (e.g., athletics councils, WhatsApp groups) specific to athletes and staff. Later participants were recruited through snowball sampling. Interviews took about an hour and participants were compensated with a $25 Amazon gift card. This procedure was approved both by our institutions’ Institutional Review Boards and State U and U of State's Athletics Departments. Athlete interviews contained about a dozen questions split into three themes: personal data, relationships with staff, and links between athletics and academics. Staff interviews replaced the latter theme with questions about career arcs. The first ten interviews at State U were grouped together as a pilot study to calibrate the interview protocol and our data analysis (Clegg et al. 2020).
Interview transcripts went through two rounds of coding: thematic and directional. Thematic coding took up the bulk of data analysis. Codes were built through an inductive approach that generated parent themes (Data Analysis Practices, Personal and Team Technologies, Interactions Through Data, and Feelings About Data) through engagement with the transcript and dialogue with collaborators (Charmaz 2006). We understood our relationship to the transcript text through the lens of Critical Discourse Analysis, which grounds the meaning-making of individual speech acts within larger institutional settings that support and circulate, or deny and block, that meaning (Fairclough 2013). Athletic data, like all data, does not speak for itself. Rather, data must be cleaned, processed, analyzed, and circulated to become meaningful. Our Findings and Discussion thus focus on the meaning given to, or denied, different kinds of data, produced through speech acts and actions that function as speech acts (Thornham and Gómez Cruz 2016).
Directional coding was simpler. Here, we were interested in what data was collected from athletes, through what medium, and with whom it was shared. We reviewed each interview for these questions and then merged the findings into a larger spreadsheet with six broad data categories (Academic, Performance, Strength & Conditioning, Strategy, Nutrition & Health, and Team Operations), the data in each category, the means of collection, and the intended audience.
Findings
Distribution of data collection
“Basically everything we do is recorded and watched,” golfer Brian told us. Stories like his were repeated throughout our interviews. But athletes’ experience with surveillance differed depending on their place in the organization and on the intended audience for their data. The information asymmetry at the core of athletics data presented differently for different sports with different levels of investment and different divisions of race and gender: Capital-intensive sports such as basketball and football, where Black men were overrepresented relative to the campus population, collected more and different data and gave players fewer options to avoid, dispute, or understand their data, relative to Olympic sports with lower levels of investment and majority-white teams. We label this trend a high-capital vs low-capital dynamic. When athletes did have a say in the collection and circulation of their data, they were more comfortable sharing these metrics with the public or with staff than they were with the teammates with whom they lived, dined, and practiced. We label this trend close discomfort.
Both close discomfort and the high-capital vs low-capital dynamic show that the relative visibility of athletes qua data is contingent upon their role and status in the organization. This organizational data architecture—who shares what with whom and why—remains largely out of athletes’ view and reach, even if it is at times keenly felt. It is this meaning-making process—how status is conferred on or through different data, what data are prioritized or occluded—that is the focus of our analysis. It is worth pausing to map the context for these sociotechnical relationships: the landscape of data collection and circulation within college athletics.
Collected data fell into the following categories:
- Academic data (e.g., class attendance, study time)
- Competition and performance data (e.g., in-game individual and team metrics)
- Strength and conditioning (e.g., body weight, speed, weights lifted)
- Strategy (e.g., film of individuals, teams, and opponents in games or practice)
- Nutrition and health (e.g., hydration, calories, protein)
- Sport medicine (e.g., body-fat percentage, heart rate)
- Team operations (e.g., messages between players and staff, facilities usage)
We initially suspected that performance data would dominate, but the most frequent types of data collected by far were in strength and conditioning and sport medicine. This makes sense within the organizational context. Staff frequently say they are dedicated to the “health, protection, and welfare” of athletes. But this in loco parentis role wherein university staff, in general, are charged with students’ care is tempered by the specific charge of athletics staff to help student-athletes, representing the university, to win games. That can mean using data not just to change practice or game routines but as a form of motivation and intra-team competition; both volleyball player Janna and wrestler Seth said the centerpiece of their training grounds was a big board displaying every player's statistics.
The team's health, however, is not reducible to that of individual players. Strength and conditioning and sport medicine data are the most frequent types collected in large part because they demonstrate institutional compliance: Regulators within the university, the state, and the NCAA must be able to see that athletes receive the mandated amount of rest, class time, etc. Professional sports, in contrast, have a clearer mission—those athletes are paid to play—and so data collection is more intensive but narrower in scope. For elite college athletics, data collection is as much about signaling to supra-organizational regulators the health of an organization that runs on the labor of a special class of unpaid amateurs, as it is the health of those workers.
For their part, elite players commit to the collection of their data, even when its purpose and destination are opaque, not just because it's required of them to play and keep their scholarships but because they are motivated to maintain peak performance. Soccer player Tanya explained that even during downtime at practice she would juggle the ball and keep track of how many touches she could make in a minute because “I’m always trying to make things a competition by tracking these numbers.” Many players found that this motivation, combined with natural talent, had carried them through high school, but more was required now. “Every girl on our team was the number one girl in their state or was the number one girl in their county, or district. And then you come here and all of a sudden you're the 12th runner at best,” middle-distance runner Laura said. Each calorie, mile, or lift recorded appeared to them as a record of their commitment.
But not all data, sports, and players are treated equally. The capacity of athletes to access, interpret, or resist measurement is determined by their location and status in the organization. This dynamic is most visible in the contrasting experiences of players in high-capital sports with extensive investment, from the university, sponsors, and donors, in technology and personnel, versus those without these resources.
High-capital vs low-capital sports
Football and men's basketball hold more power and prestige on university campuses than other sports, because of their crowds, sponsors, and TV deals. Investment into these teams crystallizes as a rigid set of technologies and staff hierarchies that gives high-status athletes, paradoxically, little freedom to avoid, dispute, or even understand their data. In contrast, athletes in low-status sports with relatively lower levels of investment find more freedom to push back on data collection or offer their own interpretations. In this way, personal data is used to justify and reinforce existing status hierarchies between and within teams.
Viewed as a labor issue, it is unsurprising that as the ratio of fixed to variable capital increases, “the brain moves up the chain” and athletes find that debates about what to measure, how, and why are already settled before they even enter the room, encoded in technical routines (Braverman 1998). It is no coincidence that this relative unfreedom falls more heavily on Black students, who are better represented in capital-intensive, revenue-producing sports than in those lower-status sports that often act as affirmative action programs for wealthier, white students (Hextrum 2019). As activist-scholar-athlete Harry Edwards put it, “Like a piece of equipment, the Black athlete is used” (Edwards 2017, 21).
At the beginning of our research at State U, the football team accounted for one-fifth of the athletics department's overall expenses and one-third of its spending on coaches across 19 teams. Compared to other sports, football and basketball players in our sample were provided more resources by their university for them to use as individuals—personal iPads to review film, dedicated tutors—but the real difference between these capital-intensive sports and their peers emerged in the technologies controlled by staff.
“I'm trying to get my body composition to the goal that we set based on my last Dexa scan,” football player Ben said. Nutritionists assigned to the football team—other sports share a pool of staff—set Ben and each of his teammates a goal for an ideal body-fat percentage based on their diet, size, and position. That goal and progress toward it were based on body composition and bone density scans undertaken in DEXA low-energy x-ray machines, housed in vans ordered to campus quarterly.
Biometrics were far from the only data collected from football players. Linebacker Ekon said “a good 85% of [football] is just scheming.” He learned to position himself and recognize the opposition's tells through hours of film study on his iPad, assigned to each player through the XOS Thundercloud platform. Annual Thundercloud team subscriptions, without any specialized hardware or customized analytics tools, run to $75,000. Ekon took pride in his film study, but at the end of the day he didn’t have much choice in the assignment or the lessons he was supposed to draw from it: “I don’t really scheme, it's the coaches’ scheme.” As a student, Ekon was at university to learn. But as an athlete, what he could learn from film data, how, and when, was constrained by the team's data infrastructure.
In contrast, players in low-capital teams had greater flexibility in how they analyzed film and other data. Softball player Jade said she liked using the team's subscription to Hudl, another video platform, costing around $3000 per year, to improve her batting form. But Jade admitted she often picked and chose what to watch: “If I’m doing well I’m like, ‘I’m doing well. I don’t need to watch that.’ I’ll overthink it. And then if I’m doing bad I’m like, ‘I know I’m doing bad. I don’t need to watch this and overthink it.’” This dynamic ran through the whole data pipeline, all the way to the minutiae of collection. During weight-lifting sessions, football players had their routine assigned to them based on fitness goals set by strength and conditioning coaches. Each rep was counted and recorded by those same coaches. The same went for recording their own bodyweight. In contrast, athletes who ran track or wrestled may have had a routine set for them, but they recorded their own weights and their own weightlifting on small cards later returned to staff.
In certain instances, this flexibility created an opportunity for athletes in low-capital sports to resist data collection—or even set new terms for it. All the middle-distance runners on Laura's team wore GPS tracker watches. While she was recovering from injury, her team began experimenting with yet another data-sharing platform, TrainingPeaks, that pooled data from individual watches so that “Basically your coaches can see everything.” Laura refused, as did other teammates in recovery. “I wanted to come back on my own pace because I knew that if my coaches could see everything I was doing, I would probably take my runs way too fast.” There was no resistance. Laura suspected her coaches knew she knew her body best. After all, Laura and her teammates were in the habit of adjusting their coach's prescribed training pace up or down, faster or slower, to better fit their stamina levels. Football players said they were not even permitted to report their own bodyweight.
Shot putter Byron and his senior teammate went even further than Laura. After an impressive showing at nationals, the team brought in a new trainer, but performance slumped, and they missed nationals the next year. Byron said the trainer refused to take their advice: “You’re giving me body workouts for my quads? It was crazy. I disagreed with his program 90% of the year.” So, the two Olympic hopefuls petitioned their trainer to let them design their own training regimen, whose success would be judged based on their performance in the weight room, and their throws in practice and competition. Their numbers went up, success at nationals followed, and the trainer was fired.
Importantly, Laura and Byron were both highly recruited prior to landing at State U and grew to become leaders on their teams. This status may have enhanced their ability to speak back to their team's data-collection regimes. Other athletes in low-capital teams used a different kind of status—family wealth—to collect their own data outside of team routines. This was most visible in golf, a sport that generally requires expensive equipment and country club membership. Golfers Amanda and David both had personal coaches, often for years, with boutique measurement tools like FlightScope that were more advanced than anything available in their universities. It came up in soccer too, a working-class sport in much of the world, but one which in the US requires playing in expensive travel teams to advance. Oliver had suffered from muscle tears while he played soccer in Spain as a teenager. So, his parents “accessed some pretty high-end sports nutritionists who completely broke down my diet to account for calorific needs and, obviously, the type of food.”
Wealthy golfers or soccer players were the exception that proved the rule. Their family resources allowed them to access high-end tools and intensive personal data, but on their own terms. Capital-intensive sports like football had the best tools and personnel available, but these data-collection regimes calcified team hierarchies; calories, lifts, sprints, bone density, and more were relentlessly collected by staff, with no room for discussion or debate with players. Athletes in low-capital sports, who were more likely to be white, were still objects of constant measurement, but with relatively less investment in technology and staffing compared to football or basketball they had more freedom to disagree with their coaches’ data analyses or conduct their own.
Regardless of the source of the restrictions, with sufficient investment, the information asymmetry between athletes and coaches or staff calcified into a power asymmetry—one built into institutional technologies worth many years of tuition. In Viljoen's (2021) terms, this dynamic demonstrates both horizontal—player to player—and vertical—staff to player—data relations. But the organization bends these axes to fit its goals, distinct from the more dispersed, population-level examples Viljoen draws from in social media or utility regulation, where data subjects may be tied together without ever directly interacting. Within the quantified organization, vertical relations of control are often justified or enacted through horizontal relations of comparison and competition. The greater vertical authority wielded by football and basketball coaches is in part a product of the durable data relations built between players by expensive technology and staff, used to rank, sort, reward, and punish.
Close discomfort
One's relation to the labor process differs considerably based on one's position within it; and this is as much a question of intra-organizational social life as it is of organizational hierarchy. Athletes were uncomfortable sharing their data with those closest to them: The teammates with whom they not only trained and competed but lived, dined, studied, and partied. As their relationship with the potential audience grew thinner, limited increasingly to sports, their comfort level rose. Coaches and athletics staff were also often mentors and educators, if informally, but were primarily managers, and so athletes felt comfortable sharing performance data as part of their work. Athletes paid little mind to the broader public who pored over their statistics, sometimes regretting their politicization in inter-collegiate competition, sometimes relishing the research it allowed them to conduct on competitors, mostly accepting it as a fact of life for people at the top of the game. For these young people, discussions about data served as boundary-drawing exercises: Different data justified different boundaries, but the same data could also mean different things depending on which set of boundaries it was shared within.
To be sure, different data felt different; women runners were especially sensitive to sharing their weights, concerned that doing so would either encourage disordered eating or lead others to suspect them of it. As runner Laura put it, with some resignation, “Body weight is a performance indicator.” But organizational infrastructure for sharing this data only gained meaning through the personal relationships they had with data collectors and data audiences. Even equal exchanges of personal data, where both parties saw everything, could threaten thick bonds of solidarity.
While rowers often come from wealthier families—few US high schools have rowing teams—the rowing teams themselves weren’t well-funded, so Stacy had to record her own workout data in a notebook. That notebook would be reviewed by team captains—leaders, but peers—alongside coaches. “Sometimes I wish that it was only the coaches. I know the team captains, they only see them to make sure that we did the actual workouts, so I don't really think they pay attention. But still— privacy!” This discomfort was heightened when personal data was shared not just with coaches and captains but the entire team. For a group of competitive women, this naturally led to comparisons with teammates, who were also roommates and friends. This was especially true for “erg tests”: an indoor test of race speed. Then, staff “will make an entire spreadsheet and send it out to the entire team, so everybody else can see your score for that one workout.” Asked how she felt about this, Stacy replied, “If I did well, it's great because then I can see where I stand, but then if I don't, it's like, ‘Oh, that sucks.’”
Athletes were more comfortable sharing personal data with coaches and staff. This was in part because relationships with peers crossed multiple contexts, while coaches and staff usually saw distinct data streams, rarely aggregated together. In Nissenbaum's (2009) terms, managerial data relations rarely threatened contextual integrity. Or, at least, players did not usually feel they could dispute the boundaries coaches and staff placed around a particular context. Like many of her peers, pitcher Nicole accepted that, for example, nutritionists were tracking her food purchases through a campus-issued charge card; “They see what you buy, so if you go and say ‘Hi’ she'll be like ‘Lay off the french fries’ or something.” It was routine, just like the markers placed on their water bottles that showed how much to drink each day.
When major decisions were being made, or in moments of crisis, powerful actors—especially head coaches—could force these different streams of data together to create more holistic accounts of their athletes. But this did not mean that each type of data was given equal weight. Indeed, intra-organizational power asymmetries meant that management often decided which type of data to prioritize—generally in the name of higher performance. This was most stark in track, where the performance metric was simple—your time—and health data—bone density, body fat percentage, weight—was complicated. As runner Laura said, “For our coaches, if you're performing well but you clearly have issues with eating, or you’re severely underweight, it's fine. You're performing well. I guess in that sense, there's a positive reinforcement for being unhealthy—as long as you're performing well.”
Athletes felt that this prioritization of performance data over health data was due to the fact that coaches are ranked based on someone else's metrics: their athletes’. Whether a coach is promoted or given a raise depends on their athletes’ numbers. Shot-putter Byron was told by his surgeon to take six months off after tearing a pectoral tendon, but his coach had him back throwing in just three. “It's just a business, they want their return on investment,” Byron said. “Because if that athlete does well at nationals, then the coach gets a raise of pay, and the school looks good.” Coercion, then, does not just impress management's will on discrete individual athletes; it ties them together through specific data points. Vertical data relations flow up and down.
Finally, athletes were most comfortable sharing their performance data with the broader public because the two never met. That TV audiences were poring over their stats was taken as a fact of life. These public data could also grant athletes useful perspective, unmoored as they were from social ties, with a far more open, bilateral view of personal data than anything contained within the university itself. Hammer thrower Raquel regularly complained that her head coach did not understand the progress she was making, because he only looked at distance thrown in practice, where he expected continuous improvement. The more meaningful metric for Raquel was always relative: The distance needed to beat a particular woman at a particular meet. She found this data through the Track & Field Results Reporting System (TFRRS) that scored every single college track & field event. The experience was empowering, helping her make manageable goals instead of reaching for an abstract ‘best.’
Similarly, golfer Amanda pored over the GolfStat website that scored every college team live, researching the competition. She gave little thought to her own public data unless it was wielded by a confidant to pass judgement, chiefly her father, a passionate hobbyist whose daughter was now breaking into the pros. In her hands, GolfStat explained, “what other teams do better, where they’re better, why they’re better.” But when her dad harangued her over the phone, Amanda's perception of those public data shifted: “It's like, you're not there, and numbers don't… it doesn't explain everything that happened today.”
For college athletes, personal data felt personal only when it was embedded in existing social relationships. That meaning changed with the relationship. It was uncomfortable to compare yourself to your teammates—who were also your friends, roommates, and classmates—on the big board in the training room, but easy to do so with the competition on TFRRS or GolfStat. Management was another kind of social relationship, and athletes accepted that they would have different managers of their diet, weight, speed, etc. as a condition of elite competition. Discomfort arose when management collapsed those contexts and forced athletes to prioritize one metric, and thus one part of themselves, over others.
Close discomfort within the quantified organization reveals an intimacy to data relations often absent from critical data studies, including Viljoen's population-level, supra-institutional account. As Cohen (2013) notes, surveillance “facilitates modulation: a set of processes in which the quality and content of surveillant attention are continually modified according to the subject's own behavior; sometimes in response to inputs from the subject but according to logics that ultimately are outside the subject's control” (1915). Such a schema treats personal data impersonally. Within the same organization, both watcher and watched will modulate their behavior based on the data collected from the latter. But this rarely occurs automatically, like a thermostat. Rather, exactly what behavior is modulated and how depends on the relationship between watcher and watched. Any hope for a more democratic data governance, in Viljoen's terms, within quantified organizations like sports teams, or hospitals, or factories, may depend on marshalling these intimate connections with interpersonal data to encourage data subjects to action and redraw the boundaries of data collection and circulation. Athletes, like other workers, do this in small ways all the time. Recall that Byron's discomfort with his trainer's regimen prompted him and his teammate to assert control and reorganize their entire routine.
Discussion
Data collection abounds in university athletics because those institutions are charged with students’ care and tasked with transforming them into elite competitors. The type of data collected from athletes, their control over it, and the consequences of that collection for them varies across the different sections of the organization. Data relations in high-capital sports are concretized into expensive equipment and staff. Athletes have little flexibility to adjust these systems, as compared to colleagues in low-capital sports. But these organizational dynamics are always also social dynamics. Similar kinds of data and analyses of it are experienced differently depending on whether it is being shared among teammates who are also friends and roommates, or managers, or the broader public.
In describing these dynamics as data relations, we highlight the fact that these data, and, we argue, all organizational data, are never only about the individual from whom they are collected. This is true across two dimensions. First, individual records of weightlifting or jump height are only valuable by way of their comparison with teammates, competitors, professionals, a larger population of ‘typical’ athletes at a given stage of development, or past or future selves. No datum is an island; personal data are always interpersonal. Second, the collection, storage, analysis, and dissemination of, say, a record of meals eaten, traces the relationship of the athlete to their nutritionist—or coach, or trainer. Viljoen (2021) labels the first kind of data relation, between similar data subjects, horizontal data relations, and the second data relation, between data collector and data subject, vertical data relations. Here, we adapt Viljoen's framework of population-level vertical and horizontal data relations for the more tightly bounded collectives of quantified organizations such as university athletics.
Accounts such as Moore's (2017) mark the datafication of the workplace as a longstanding capitalist tendency that has nonetheless accelerated following the defeat of the Fordist workers’ movement. This should be understood less as a technical achievement, and more as a moment of strategic retaliation in the longue durée of class struggle. In his 1964 classic Struggle at Fiat, workerist theoretician Romano Alquati (1964) labels the ‘invisible organization’ those informal social ties between workers that undergird seemingly spontaneous wildcat strikes or acts of sabotage: daily gossip, shared eye-rolls, commiseration at day's end. In theorizing capital's increasing capacity to absorb workers’ affects into production, the subsequent generation of Italian Marxists known as autonomists, and especially thinkers like Negri who bridge the two cohorts, were not describing an abstract capacity to dissolve boundaries between commodities and people but a concrete response to unruly workers who only seem disorganized from above (Wright 2017). Those inchoate social ties that threatened production were increasingly captured within the labor process.
The quantified organization, then, is less a technical innovation than a capitalist response to the power workers have historically wielded through inchoate social ties. In our context, it is no coincidence that the weight of data-driven surveillance falls hardest on Black men in revenue-producing sports. A lot of status and money depends on making their bodies and intuitions maximally visible to management. As these men are sorted and ranked, the quantified organization absorbs their invisible collectives. This abstraction of invisible, embodied practices and their transformation into visible, organizational knowledge through the datafication of the labor process is what Zuboff (1988) labeled ‘informating.’ In this broader, historical narrative, we might consider the informating processes reviewed above in comparison with the labor processes of other collectives who experience their work as ‘free agents.’ For example, Levy's (2022) long-haul truckers had their unions broken by the Carter administration. The state re-intervened 40 years later to require onboard electronic monitoring devices, usually bundled within fleet management systems. Now truckers no longer record their own hours, and even lose control over route planning—like players in capital-intensive sports.
Given this historic advance by capital, it is important that Viljoen's relational theory of data governance facilitates not only a more accurate empirical account of collective data relations but a normative account of legitimate collective interests in data collection and circulation. Such interests may not only seek to block data collection but redirect it and reverse information asymmetries, and thus power asymmetries. Violating the property rights or autonomy of banks, armies, or polluters may be a perfectly democratic outcome. For Black male athletes to control their bodies and their work, they will need not just to see the other side of the clipboard but hold it, use it, and change it. An organizational perspective on workplace data is essential here. As we saw, the quantified organization warps the flat plane of Viljoen's population-scale data relations. Rather than orthogonal axes of collectivity (horizontal) or authority (vertical), these relations become, as managers compare and sort subordinates, the dimensions of actually existing organizations. The football coach's vertical authority is enacted through lateral comparisons of players’ body fat and bone density. But workers can reshape their personal data to construct different interpersonal relations, and thus different organizations with different kinds of authority. In this way, researchers and athletes may find inspiration in the various initiatives Gregory (2021) describes as ‘worker data science.’ Such platforms help reduce the information asymmetry between employers and workers to democratize the quantified organization; whether that means helping microworkers to rate their assigners, or delivery riders to track their hours.
While elite players grant consent to data collection, the institutional form of university athletics is unequal, coercive, and far from democratic (Hatton 2020). Significant attempts at redress have recently been made through Name, Image, and Likeness (NIL) policy in the NCAA and complementary state-level legislation. NIL allows players to receive payment for use of their face or jersey in advertisements, videogames, and so on; whether this applies to personal data is a matter of debate. Regardless, there is not equal demand for each player's likeness and so NIL will likely increase the bargaining power of those superstars headed to professional basketball or football but leave out most of their teammates, much less players in low-capital sports. It is a classic propertarian solution, one that does not fit the relational harms of college athletics data. Ben and Ekon will continue to take orders they cannot refuse or debate because fixed capital investments in football solidify its vertical data relations—even if they receive a windfall from NIL. Runners like Laura are unlikely to receive any financial returns, but the horizontal data relations between them mean they will continue to stress over what their weight says about their speed and how both compare to the girl next to them. What would real individual autonomy and collective deliberation look like here? A hint may be found in the NBA and NFL. Collective bargaining there resulted in some of the strongest worker protections against data-based discrimination in any sector (Kresge 2020).
College athletics may seem an idiosyncratic context through which to raise these questions about data and democratic governance. But State U and U of State are large, well-financed organizations with a wide range of waged and unwaged work pursued by specialists with the latest technology. Debates over personal data in this context remain useful as an edge-case for every workplace which measures, monitors, tracks, and counts the people within it—which is increasingly every workplace. In this way, we see Viljoen's relational theory of data governance as the first step in building a theory of the quantified organization.
The quantified organization is a more tightly bounded phenomenon than the population-scale data that is Viljoen's focus. Compared to, say, Facebook, the institutional grouping of university athletics allows for greater application of force in pursuit of organizational goals. Here, vertical data relations overlay managerial relations. A coach can make a slow player run; a nutritionist can make a player change what's on their plate. Similarly, category construction through horizontal data relations directly links data subjects’ livelihoods. Wrestlers and volleyball players are reminded of this every day as they enter the training room and look up at the big board comparing their statistics and bodies. Facebook is rewriting our social contract, but it does not have this coercive power over users. The quantified organization is much more than a set of quantified selves in a box: It arranges data subjects into categories and hierarchies that drive them towards the organization's goals. It should also be noted, however, that the data structuring these intra-organizational relations may themselves exceed the bounds of the organization. Student-athletes often set fitness goals through reference to professionals’ benchmarks, or they may have health and safety data collected by external regulators. In the corporate world, management can use the external customer data collected on software-as-a-service platforms like Salesforce to onboard, train, and discipline internal sales staff.
The quantified organization, then, is a management structure built through information asymmetries. Its objects are collectives, not individuals. It ties data subjects to each other through processes of categorization and ties data subjects to data collectors through processes of coercion and consent. In practice, these vertical and horizontal data relations overlap as organizational authority creates new social groupings in pursuit of collective goals. Their shape and strength will vary across different quantified organizations. This is one reason why elite college athletics makes an excellent case study for theory-building: While grouped under a common governance structure, each team operates in distinct sociotechnical conditions. The low-freedom, high-investment conditions under which football and basketball players labor appear more similar to a highly mechanized warehouse than a low-tech but labor-intensive chain restaurant (Delfanti 2021). Because data are relational, turnover within the organization and members’ social ties outside work may dictate their comfort with and consent to data collection, analysis, and dissemination—as it did for athletes. This need not be limited to coworkers. Consumers, socially and morally elevated above workers, are increasingly enlisted as managers through surveillance systems built into their shopping or ride-hailing experiences (Stark and Levy 2018).
The quantified organization provides a framework to empirically describe, theoretically explain, and normatively critique the role of data in the workplace and in similar coercive institutions. Shifting the focus from the abstraction or theft of personal data to how those data tie workers together shifts our research focus to the collective-making powers of organizations. Here, that collective is a team of unpaid workers. Future work should pursue other collectives in other data-rich domains, from waged workers to policed citizens. Future theory-building should also help clarify why particular data relations appear in particular conditions: why, for example, the platform form has exploded across the world in an era of economic, political, and technological stagnation (Benanav 2020). We hope to clear ground for a positive political project that does not simply decry the impersonal domination of capitalist data regimes but that builds institutions and movements to subordinate these unequal data relations to democratic will. Such subordination could take many forms—abolition, nationalization, counter-surveillance—but any project to attenuate power's effects, much less take power, must begin with an accurate map of power's functions and a plan to redraw that map. Equality must always be organized, if for no other reason than existing inequalities are thoroughly organized. They must be taken apart before something else can take their place.
Conclusion
We have focused on the process whereby elite university athletics makes student bodies visible—to various devices, managers, and institutions, if not the students themselves—so that we in turn can make visible the organizational data relations that tie student-athletes to staff and one another. There are several limitations to this work. As a case study of athletics data, our findings are limited by our domain: Elite college athletes are unlike professional athletes with more resources, or athletes in lower-status universities with fewer resources. As a case study of organizational data, our findings are limited by our participants: Consent to data collection is important in every workplace, but the intrinsic motivation we heard from our interviewees may distinguish them from data subjects just seeking a paycheck.
As a study of data relations, our chief limitation is methodological. Data are not only discourse, but the material infrastructure of institutions like workplaces and schools (Thornham and Gómez Cruz 2016). One-time interviews cannot observe change over time, or those data relations that involve routine interaction with this infrastructure but are too mundane to mention to an outsider. We plan to resolve these limitations in future work, through sustained ethnographic fieldwork with two capital-intensive teams. In part, this is motivated by the positive project of Viljoen's (2021) critique. Athletes are too frequently mute instruments of organizational data. But they have a tremendous amount of folk literacy that would in other domains be recognized as data science expertise. A better understanding of the conditions in which they learn (or not) about their personal data will inform the sort of institutional and process reforms that can better engage athletes in data analysis and the organization of their data relations.
In this way, we seek to model a collective account of data relations. We hope this study and the theory built from it encourages our peers to ask after the ties through which institutional data binds us together as workers, migrants, neighbors, students, patients, and more.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
