Abstract
There has been a surge of recent interest in sociocultural diversity in machine learning research. Currently, however, there is a gap between discussions of measures and benefits of diversity in machine learning, on the one hand, and the broader research on the underlying concepts of diversity and the precise mechanisms of its functional benefits, on the other. This gap is problematic because diversity is not a monolithic concept. Rather, different concepts of diversity are based on distinct rationales that should inform how we measure diversity in a given context. Similarly, the lack of specificity about the precise mechanisms underpinning diversity’s potential benefits can result in uninformative generalities, invalid experimental designs, and illicit interpretations of findings. In this work, we draw on research in philosophy, psychology, and social and organizational sciences to make three contributions: First, we introduce a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct epistemic and political rationales underlying these concepts. Second, we provide an overview of mechanisms by which diversity can benefit group performance. Third, we situate these taxonomies of concepts and mechanisms in the lifecycle of sociotechnical machine learning systems and make a case for their usefulness in fair and accountable machine learning. We do so by illustrating how they clarify the discourse around diversity in the context of machine learning systems, promote the formulation of more precise research questions about diversity’s impact, and provide conceptual tools to further advance research and practice.
Introduction
Sociocultural diversity is a key value in democratic societies both for reasons of justice, fairness, and legitimacy and because of its ramifications for group performance. As a result, researchers in humanities and social sciences have worked to understand diversity’s varied meanings, develop appropriate measures for quantifying diversity (in some sense), and specify pathways by which diversity can be functionally beneficial to groups such as juries, design teams, and scientific communities. Recently, there has also been a surge of interest in sociocultural diversity in machine learning (ML) research. This burgeoning literature on diversity 1 in ML systems can be roughly divided into two general lines. One set of diversity-related considerations pertains to the composition and dynamics of teams and groups whose judgments shape the construction of ML systems—for example, those engaged in problem formulation, data generation, and development. From this perspective, there have been claims about the potential benefits of diversifying these teams as an organizational solution to alleviate biases in ML systems (Jobin et al., 2019; West et al., 2019), with more recent efforts aiming to empirically test these purported benefits (Duan et al., 2020; Cowgill et al., 2020). A second set of issues concerns the composition of items at different stages of the data-processing pipeline, especially in relation to who or what gets represented therein and how it affects individuals and communities that are impacted by the deployment of ML systems. In the context of curating input data, for instance, the lack of geographic diversity in benchmark image datasets has been linked to amerocentric and eurocentric algorithmic biases, such as higher misclassification rates for bridegroom images from Pakistan than from the US (Shankar et al., 2017). Similarly, recent works have highlighted the importance of diversity in relation to algorithmic output across tasks ranging from image search and content recommendation to matchmaking and automated recruiting (Drosou et al., 2017). To incorporate these considerations into the design of ML products, researchers have proposed various measures for quantifying diversity, developed methods for satisfying these measures, and examined the interaction between diversity and other design desiderata such as predictive performance and fairness (Drosou et al., 2017; Celis et al., 2016; Mitchell et al., 2020).
Currently, however, the discourse around diversity in ML systems is hindered by a lack of clarity regarding the underlying concepts of diversity and the precise pathways by which diversity can benefit group performance. This is problematic because diversity is an ambiguous term that admits various, potentially conflicting, conceptions. These diversity concepts differ substantially in their motivations, meanings, and appropriate operationalizations. Without explicitly grounding proposed diversity measures in appropriate underlying concepts, we thus risk a mismatch between professed diversity-related aims and the operationalization of those aims. What is more, assessing whether and when diversity (in some sense) can improve the performance of a collective requires attending to the precise pathways of diversity’s influence, the specific factors that moderate this impact, and the broader enabling conditions that support and sustain these potential positive consequences. The lack of specificity about the precise mechanisms underpinning diversity’s potential benefits can thus result in uninformative or easily falsifiable generalities, invalid experimental designs, and illicit inferences from findings.
In this paper, we bridge the gap that currently exists between discussions of measures and beneficial consequences of diversity in ML, on the one hand, and the humanities and social sciences research on concepts and consequences of diversity, on the other. We begin by articulating current thinking about the concepts and consequences of sociocultural diversity in feminist philosophy, social psychology, and organizational and network sciences. Drawing on this literature, we distinguish between different diversity concepts, and highlight their distinct and potentially conflicting epistemic, ethical, and political rationales. We then discuss mechanisms through which diversity can potentially improve the performance of teams and collectives, with a focus on cognitive and information elaboration pathways. Finally, we situate this understanding within the discussions of diversity in ML and draw out its implications. In mapping the various types of diversity-related considerations that arise throughout the ML lifecycle, we draw attention to the significance of achieving clarity about the conceptual underpinnings of diversity measures and the precise mechanisms mediating diversity’s potential benefits. We show how this can enrich the design and evaluation of ML systems by improving hypothesis formulation, experimental design, and the interpretation of findings.
Before we proceed, a qualification is in order. Diversity is valued on many different moral, political, and epistemic grounds. Discussions of the potential epistemic benefits of team diversity should
The varied concepts of diversity
Diversity implies some sort of difference. Divergent understandings of what this difference consists in lead to varied meanings and perceptions of diversity. Perhaps the most obvious way our assessments of a collective’s diversity can vary depends on what we take to be the set of attributes along which its members exhibit relevant differences. So, the same group can be seen as more or less diverse depending on the attribute (e.g. gender, ethnicity, socioeconomic status) used for characterizing its members. As noted by a number of researchers, however, even abstracting from questions of relevant attributes, the notion of diversity admits multiple understandings that can differ significantly in their underlying rationales and appropriate operationalizations (Harrison and Klein, 2007; Page, 2010; Steel et al., 2018). In particular, Steel et al. (2018) distinguish between
While there are different ways of drawing distinctions between diversity concepts, in what follows we primarily build on the classification developed by Steel et al. (2018), which is purpose-built for understanding
Within-group family of concepts
Suppose there is a focal group of individuals,
As their name suggests, a typical political and ethical rationale for egalitarian conceptions is the ideal of equal participation. In feminist philosophy of science, moreover, this ethical rationale is often tied to proposed epistemic benefits (Longino, 1990; Solomon, 2007; Grasswick, 2018). A core insight of this literature is the recognition of the
Comparative family of concepts
Assessments of diversity via within-group concepts focus almost solely on the properties of a focal group. In many circumstances, however, the narrow focus of egalitarian conceptions fails to capture broader social features that motivate our analyses of sociocultural diversity in the first place (Rushton, 2008). As a simple example, suppose
The relevance of the representative conception can be motivated on different grounds. The obvious ethical and political rationale emerges from the association of this conception with political ideals of
Capturing these sociopolitical and historical dimensions of diversity is at the heart of
From ethical and political perspectives, the motivation behind normic concepts is grounded in ideals of social justice, inclusion, and anti-domination. In addition, a long line of research in standpoint theory has highlighted an epistemic rationale for these conceptions (Harding, 2004; Collins, 2002). Besides recognizing the socially situated nature of knowledge discussed above, standpoint theorists often endorse an
Combine, but don’t conflate
Depending on context, it might be perfectly sensible to create hybrid conceptions. For instance, an egalitarian and normic hybrid concept might incorporate weights, so as to reflect contextually-informed (dis)similarity between various attribute categories. While such combinations can be permissible, it is critical not to
Potential epistemic benefits of diversity
As discussed above, in addition to referring to distinct ethical and political ideals, researchers often motivate different concepts of diversity by highlighting how increasing diversity
We start this section by distinguishing between two types of pathways—

Two types of pathways by which sociocultural diversity
Cognitive pathways
The core idea behind cognitive pathways is that diverse groups can leverage the variety of the cognitive repertoires of their members—their varied knowledge, perspectives, skills, etc.—to more effectively deal with demanding tasks. This idea underpins the epistemic rationale for many of the diversity concepts discussed above.
This general idea can be analyzed in terms of a series of interrelated steps. First, individuals in cognitively homogeneous groups tend to exhibit significant overlap in their relevant strengths and limitations (Bang and Frith, 2017). Insofar as sociocultural differences track differences in cognitive repertoires, socioculturally diverse groups can instead bring complementary resources to bear on a task, with members compensating for one another’s blind spots.
Empirical support for this pathway has been found in relation to different tasks and groups (Roberson, 2019). In organizational contexts, for example, Milliken and Martins review studies demonstrating the beneficial effects of sociocultural diversity in relation to cognitively relevant outcomes such as the number and quality of alternative ideas considered by groups (Milliken and Martins, 1996). Using citation counts as a proxy for research quality, Campbell et al. (2013) found evidence for the positive impact of gender diversity on the quality of work produced by scientific teams. 2 In addition to these empirical studies, researchers in cognitive science (Bang and Frith, 2017), sociology (Page, 2017), and philosophy (Muldoon, 2013) have turned to formal and simulation tools to investigate the impacts of diversity along cognitive pathways. These studies typically center around frameworks (e.g. NK landscapes, multi-armed bandits) that represent tasks that demand exploitation-exploration trade-offs. While these studies often abstract away from the relation between sociocultural and cognitive diversity, they nonetheless offer valuable insights about the conditions—for example, task difficulty, team composition, network structure—under which cognitive diversity can benefit group performance in tasks such as learning and problem-solving.
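To give a concrete flavor of how such simulation studies are set up, the following is a minimal Python sketch of a bandit-style task in which a team pools its observations and trades off exploration against exploitation. The team sizes, the epsilon-greedy rule, and the mapping from "cognitive style" to exploration propensity are our own illustrative assumptions, not the design of any particular study cited above.

```python
# Minimal sketch (illustrative assumptions only): homogeneous vs. cognitively
# diverse teams on a multi-armed bandit task requiring an exploration-
# exploitation trade-off. Agents share one pool of observations and act
# epsilon-greedily; an agent's epsilon stands in for its "search style".
import random

def run_team(epsilons, n_arms=10, horizon=300, seed=0):
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]
    counts = [0] * n_arms      # pooled pull counts
    totals = [0.0] * n_arms    # pooled reward sums
    reward = 0.0
    for _ in range(horizon):
        for eps in epsilons:   # each team member acts once per round
            if rng.random() < eps or not any(counts):
                arm = rng.randrange(n_arms)   # explore
            else:              # exploit the best arm found so far
                arm = max(range(n_arms),
                          key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
            r = true_means[arm] + rng.gauss(0, 0.1)
            counts[arm] += 1
            totals[arm] += r
            reward += r
    return reward

homogeneous = [0.05] * 4              # four members with identical search styles
diverse = [0.01, 0.05, 0.2, 0.4]      # four members with varied exploration propensities
avg_hom = sum(run_team(homogeneous, seed=s) for s in range(20)) / 20
avg_div = sum(run_team(diverse, seed=s) for s in range(20)) / 20
print(f"homogeneous team: {avg_hom:.1f}  diverse team: {avg_div:.1f}")
```

Depending on the horizon, the number of arms, and the noise level, the diverse team may or may not outperform the homogeneous one; this sensitivity is precisely the kind of effect modification the simulation literature examines.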
These lines of work have identified key contextual factors that moderate diversity’s benefits along cognitive pathways, as well as the potential trade-offs involved. As emphasized by both empirical (Van Dijk et al., 2012) and simulation (Hong and Page, 2004) studies, for example, chief among these effect modifiers are
Information elaboration pathways
The collective performance of any task requires the elicitation, examination, and ultimately integration of information that is distributed among individual members. In diversity research, these processes are collectively referred to as
Consider how sociocultural homogeneity can negatively impact information elicitation and sharing. Individuals tend to take perceived similarity in social markers of identity as an indicator of similarity in underlying cognitive repertoire (e.g. assumptions and information) (Phillips, 2017). Those in homogeneous groups may thus
Sociocultural homogeneity can also adversely impact information elaboration because of group-based trust and conformity (Fazelpour and Steel, forthcoming). Individuals tend to perceive in-groups as more reliable sources of information (Turner et al., 1989). Such group-based trust can reduce vigilance, and increase uncritical acceptance of (potentially misleading) information from in-groups, resulting in fast but mistaken consensus in homogeneous groups. Empirical studies have shown that sociocultural diversity can counteract these detrimental influences by enhancing the critical assessment of incoming information (Levine et al., 2014). False consensus can also arise because of group-based conformity pressures that make individuals in homogeneous groups less likely to voice dissenting opinions. By reducing the impact of group-based conformity on the sharing of dissenting views, sociocultural diversity has been shown to improve group performance (Phillips et al., 2009).
In addition to demonstrating the potential benefits of diversity via information elaboration pathways, these studies also highlight potential trade-offs—for example, speed versus reliability of convergence (Fazelpour and Steel, forthcoming)—that affect diversity’s benefits via these pathways as well as unique communicative challenges faced by diverse groups (Phillips, 2017). Incorporating the insights from these works can be fruitful when thinking about the influence of diversity in the context of designing ML systems. This is particularly useful when there is interaction between different individuals (e.g. stakeholders, experts, crowdsource workers, or developers), and where individuals are aware of (or have expectations about) others’ identities.
Diversity’s overall impact and the significance of context
We have highlighted some mechanisms by which diversity
More generally, these mixed findings from meta-analytical studies provide an important occasion for examining enabling institutional and societal conditions that support the functioning of diverse groups. For example, the positive impact of gender diversity is even more pronounced in inclusive settings with egalitarian power distributions (Post and Byron, 2015). Moreover, increased sociocultural diversity is more likely to improve group performance when group members are open to social differences; when groups are less homogeneous to begin with; and when individuals from marginalized demographics find support in an organizational leadership that values and promotes a culture of inclusion (Phillips, 2017; Bear and Woolley, 2011). Absent such enabling conditions, “diversifying” teams by simply introducing individuals from underrepresented backgrounds, and then expecting improvements to team performance, may amount to little more than setting up these individuals for failure (Ray, 2019; Dobbin and Kalev, 2016).
Situating diversity in sociotechnical ML systems
In this section, we situate the discussion of concepts of diversity and mechanisms of its benefits in ML by mapping the various contexts in which sociocultural diversity can be relevant throughout the ML lifecycle. We emphasize that diversity-related considerations are ubiquitous in ML systems. While often neglected, this fact should not come as a surprise; these systems are sociotechnical systems, embedded in decision pipelines that are shaped by, or else implicate, groups and communities. Nonetheless, identifying, understanding, and implementing these diversity-related considerations hinges on being specific about concepts, rationales, and pathways.
The structure of this section is as follows: in each subsection, we focus on one stage of the ML lifecycle. For each stage, we outline the nature and ramifications of the task, and delineate the set of diversity-related considerations that are pertinent to decisions and value judgments therein. Figure 2 provides an overview of these stages, and identifies diversity-related questions relevant to them. 3 We explore potential ethical and political rationales that can support the use of different diversity concepts, and attendant measures, in particular settings. Moreover, when these diversity considerations pertain to (teams of) decision-making agents, we examine relevant cognitive and information elaboration-related factors that can influence their performance as socially situated agents, and identify potential ways in which increasing team diversity can benefit epistemic and ethical performance. Throughout we also highlight the strengths and limitations of existing work as well as promising avenues for future research.

Figure 2. Stages of the lifecycle of a machine learning system at which different varieties of sociocultural diversity may be relevant.
Problem formulation
Prior to the design and development of an ML system, its overarching goal and main function must be defined. This is frequently done by a team that is distinct from the design and development team. For example, state governments tend to outsource the development of risk assessment models for recidivism prediction to external firms (Chouldechova, 2017), while making in-house choices concerning the goal of the predictive model and the nature of its use. Problem formulation includes defining a system’s overarching goal, translating this goal into a prediction problem, defining the space of available prediction-informed decisions, and anticipating the potential impact of these decisions on populations that will be affected by them (Mitchell et al., 2021). Each step in the process may demand distinct types of domain knowledge as well as substantive value judgments. Neglecting or underappreciating considerations of diversity can result in problem formulation groups that advance the interests of only a small subset of stakeholders, exhibit significant overlap in their blind spots—ethical or otherwise—or preclude possibilities for effective participation among team members.
Consider, for example, that the particular formulation of the prediction problem requires careful anticipation and evaluation of its likely impacts on different sections of society (Fazelpour and Danks, 2021). This issue has become painfully salient in a recent example of bias in healthcare algorithms, where using health costs as a proxy for health needs resulted in systematic underestimation of the needs of Black patients, due to racial disparities in access to and spending on care (Obermeyer et al., 2019). As previous research has highlighted, individuals’ ability to anticipate and evaluate relevant hypothetical scenarios is likely to be correlated with their sociocultural identities and assumed social roles (Catellani et al., 2021). In participatory and value-sensitive design approaches, awareness of this type of correlation has traditionally led to an emphasis on diversity in prototype and foresight studies (Friedman and Hendry, 2019). More recently, increased recognition of diversity’s importance in these contexts has also led to proposals for the adoption of participatory and collaborative techniques in ML problem formulation (Martin Jr et al., 2020). Ultimately, however, the appropriate use of these techniques will depend upon attention to the relevant concepts, measures, and pathways.
Details of the task and the social context of operation can inform the choice of relevant diversity concepts. For instance, in many high-stake domains that concern the allocation of public resources, problem formulation often involves deliberative mini-publics and town halls (Pennsylvania Commission on Sentencing, 2020). In such cases, the ideal of democratic participation might support a representative notion of diversity. On the other hand, normic notions of diversity can be relevant if the proposed technology imposes a disproportionate risk on a community that has been historically disadvantaged by, or excluded from, key societal decisions. For example, in deliberations about the potential deployment of recidivism prediction algorithms as part of the US criminal justice system, considerations of social justice and anti-oppression can require that Black, Latinx, and Indigenous communities are represented in a greater proportion than what corresponds to their presence in the country’s population, precisely because these communities have been disproportionately harmed by the criminal justice system.
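As a purely numerical illustration of how these two conceptions can come apart in practice, the short sketch below allocates seats on a hypothetical 20-person deliberative panel. The population shares, risk weights, and panel size are invented for illustration and carry no empirical claim.

```python
# Illustrative only: seat allocation on a hypothetical 20-person deliberative
# panel under a representative vs. a normic conception of diversity.
# Population shares and risk weights below are invented numbers.
population_share = {"group_A": 0.60, "group_B": 0.25, "group_C": 0.15}
panel_size = 20

# Representative conception: mirror population proportions.
representative = {g: round(panel_size * p) for g, p in population_share.items()}

# Normic conception: up-weight groups bearing disproportionate risk from, or
# historically excluded by, the decision at issue.
risk_weight = {"group_A": 1.0, "group_B": 2.0, "group_C": 3.0}
weighted = {g: population_share[g] * risk_weight[g] for g in population_share}
total = sum(weighted.values())
normic = {g: round(panel_size * w / total) for g, w in weighted.items()}

print("representative:", representative)  # {'group_A': 12, 'group_B': 5, 'group_C': 3}
print("normic:", normic)                  # {'group_A': 8, 'group_B': 6, 'group_C': 6}
```

The point of the sketch is only that the two conceptions prescribe different compositions for the same panel, so the choice between them must be argued for, not assumed.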
Insofar as the deliberative interaction among problem formulation teams takes place in institutional and societal settings subject to various group dynamics, information elaboration considerations also matter. A particular challenge here is ensuring the
Design and development
Once a problem has been formulated, a series of design choices ought to be made. These choices concern the data used for training the model, the performance metrics optimized, and the models considered for prediction. Similar to the problem formulation phase, these decisions may be distributed across multiple teams or may all be the responsibility of one team. Unlike many of the decisions involved in problem formulation, however, this stage requires significant machine learning expertise. It is thus useful to be specific about the potential pathways and relevant effect modifiers that can govern diversity’s beneficial impact on performance (broadly understood) in the design and development stage.
Both cognitive and information elaboration pathways may impact algorithmic design. For instance, designers’ identity and lived experiences may inform their ability to anticipate modes of failure. If this is so, then we may expect marginalized standpoints to better identify the inimical impacts of certain choices, such as the training data and the metric used to assess algorithms’ performance. But we should not expect diversity to be a panacea, and the pathways and effect modifiers that mediate diversity’s impact should be attended to.
Consider, for instance, a recent study of whether fairness properties of algorithms are impacted by the demographic attributes of programmers who train them (Cowgill et al., 2020). The authors “found no evidence that female, minority and low-IAT [Implicit Association Test] engineers exhibit lower bias or discrimination in their code” (Cowgill et al., 2020). Instead, biased predictions were found to be “mostly caused by biased training data” (Cowgill et al., 2020). Taken at face value, the findings seem to suggest that we should not expect sociocultural diversity of design teams to result in any epistemic benefits. It is crucial, however, to interpret the findings in relation to the setup of the experiments. Importantly, engineers worked alone within an experimental setting that pre-defined crucial problem formulation and design choices pertaining to the predictive task, the evaluation metric, and the available input. While representative of many real world settings, this setup significantly restricts the diversity-related inferences that can be drawn from the findings. Without any group interaction, for example, information elaboration pathways are absent. With respect to cognitive pathways, moreover, the highly regimented nature of the task is likely to attenuate any potential benefits of diversity. Given this setup, the findings are thus fully consistent with literature showing that the effects of diversity are dependent on the nature and complexity of the task (Van Dijk et al., 2012). Appropriately interpreted, then, the findings highlight the importance of being specific about the relevant pathways when designing diversity initiatives for improving ML products, but should not be misinterpreted to mean that such potentially beneficial impacts do not exist.
Currently, there is a growing number of voices and initiatives calling for a more diverse workforce in AI, in part motivated by the thesis that this would lead to less biased, more ethical algorithms (Jobin et al., 2019). The lack of clarity in many of these calls regarding pathways through which these benefits may be realized is harmful: it lends itself to performative diversity initiatives that have no tangible impact on the technology built, and can also lead to misleading inferences from findings about diversity’s epistemic impacts. Clarity about pathways of diversity’s benefits is thus a first step—both in research and practice—towards designing impactful diversity interventions.
Training data
Data is a fundamental element of machine learning pipelines. In this section, we discuss diversity considerations related to the population represented in the data, as well as to the collective in charge of providing labels for it.
Consider, then, diversity’s potential benefits via the cognitive pathways. The requisite association between labelers’ identities and their task-relevant knowledge and perspectives might emerge for a variety of reasons. In the mid-90s, for example, those arguing that dismantling affirmative action in medicine could imperil access to care for Black, Latinx, and low-income communities emphasized that women, Black, and Latinx physicians are more likely to serve communities that have been historically underserved by the healthcare system (Komaromy et al., 1996; Cantor et al., 1996). In the context of ML, this type of specialization implies that incorporating diversity considerations when learning from experts’ assessments could enable health algorithms to better serve the needs of diverse patient populations by drawing on this variety of expertise.
Diversity’s benefits via cognitive pathways can also be relevant in crowdsourcing (Duan et al., 2020). For example, in hate speech detection, disparities in performance across African American and non-African American English vary across annotators (Keswani et al., 2021). Labelers’ identity could in part explain the gap in their performance across different dialects, as crowdsourced assessments can vary across cultural communities (Sen et al., 2015), and demographic identities (Davani et al., 2021). Thus, being attentive to the diversity of a group of labelers could help address algorithmic bias in tasks such as automated hate speech detection. 4
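One hedged sketch of how such disparities might be surfaced in practice is an audit that cross-tabulates labeling errors by annotator group and text dialect. The column names, the tiny example table, and the availability of adjudicated reference labels below are hypothetical assumptions, not features of any cited dataset.

```python
# Hypothetical audit sketch: labeling error rates broken down by annotator
# group and by the dialect of the annotated text. The data are invented
# purely for illustration.
import pandas as pd

annotations = pd.DataFrame({
    "dialect":         ["AAE", "AAE", "AAE", "SAE", "SAE", "SAE"],
    "annotator_group": ["in_group", "out_group", "out_group", "in_group", "out_group", "in_group"],
    "label":           [0, 1, 1, 0, 0, 1],
    "adjudicated":     [0, 0, 1, 0, 0, 1],
})

error_rates = (
    annotations
    .assign(error=lambda d: (d["label"] != d["adjudicated"]).astype(int))
    .groupby(["dialect", "annotator_group"])["error"]
    .mean()
    .unstack()
)
print(error_rates)  # large dialect-by-group gaps would signal the disparities discussed above
```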
As before, the appropriate conception of diversity depends on the context and the task. A representative concept of diversity could be epistemically motivated, for example, if it is believed that each individual is best positioned to annotate instances written in their own dialect. Consider the task of hate speech annotation in a Spanish corpus, where terms vary widely across countries and regions (Rodriguez-Diaz et al., 2018). Ideally, one may want to match annotators familiar with a local dialect to the instances written in that dialect. In other contexts, however, a different diversity concept might be more appropriate. Consider that hate speech annotation introduces a second dimension: the legacy and pervasiveness of racist and other discriminatory terms, whose derogatory and demeaning nature tends to go unnoticed by the majority in everyday and even professional discourse (Aspinall, 2005). Here, those who are more likely to be victims of hate speech may be better positioned to identify it, which could motivate the use of a normic concept. 5 Such cases can be seen as an instance of standpoint theorists’ epistemic advantage thesis: cases in which those occupying historically marginalized standpoints are better positioned not only to evaluate their own dialects, but also to identify and address the blind spots of the dominant culture.
Finally, when considering expert panels involved in data labeling, in addition to task-relevant cognitive pathways, one must consider information elaboration pathways. Here, socioculturally heterogeneous groups of labelers may benefit from better information sharing, reduced normative conformity pressures, and enhanced critical assessment. An important first step towards understanding the impact of group diversity on the resulting labels is to standardize practices of reporting demographic information for panels of labelers.
Being explicit about the underlying concept of diversity (and its rationales) is important when exploring potential tensions between diversity and other design desiderata. Consider curating a dataset with the aim of maximizing (i) sociocultural diversity of individuals represented therein, and (ii) coverage over the entire feature space to improve generalizability. While some works have suggested a trade-off between these two (Celis et al., 2016), other works claim that they are in alignment (Asudeh et al., 2019). This apparent contradiction can be resolved by acknowledging the distinct concepts of diversity implicit in these works. In particular, representativeness of the entire feature space is not in conflict with diversity in a
A monolithic understanding of the notion of diversity can also result in misinterpreting potential tensions between diversity and fairness-related desiderata. Suppose, for example, that an ethnic group makes up a small percentage of a relevant population. Here, assuming a representative diversity concept—as is often implied in work on diversity of training data (Shankar et al., 2017; Buolamwini and Gebru, 2018)—would mean that this group should be represented in the training data in the same proportion as in the population. Yet, this representatively diverse sampling could lead to disparities in performance as the algorithm may favor accuracy for the majority group, and thus may be undesirable from an algorithmic fairness perspective. This does not mean, however, that there is an inherent trade-off between fairness and diversity per se. Rather, such a trade-off between representative diversity and fairness is relevant only when there is indeed an epistemic or ethical rationale for representative diversity as the contextually appropriate conception of diversity. Acknowledging the variety of diversity concepts and clearly articulating their underlying justifications can thus help avoid confusion.
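The following toy sketch, built entirely on synthetic data, illustrates the tension just described: training on a sample that mirrors population proportions versus a balanced sample, and comparing per-group accuracy. The data-generating process, group sizes, and model choice are illustrative assumptions only; real effects depend heavily on the task and model.

```python
# Toy illustration with synthetic data: representative (proportional) vs.
# balanced sampling of training data, and the resulting per-group accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Each group has its own feature distribution and decision boundary.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n) > 2 * shift).astype(int)
    return X, y

def per_group_accuracy(n_majority, n_minority):
    Xa, ya = make_group(n_majority, shift=0.0)   # majority group
    Xb, yb = make_group(n_minority, shift=1.5)   # minority group, shifted distribution
    model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))
    Xa_test, ya_test = make_group(2000, 0.0)
    Xb_test, yb_test = make_group(2000, 1.5)
    return model.score(Xa_test, ya_test), model.score(Xb_test, yb_test)

print("representative sample (900/100):", per_group_accuracy(900, 100))
print("balanced sample (500/500):", per_group_accuracy(500, 500))
```

In this toy setup, the proportionally sampled model tends to favor the majority group, while balancing narrows the gap; the broader point is that whether such a departure from representative sampling is warranted depends on the contextually appropriate diversity concept and its rationale.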
Human-AI teams
In many use cases, the function of predictive algorithms is to offer informational support to human decision-makers who—either as individuals or teams—make the ultimate decisions. In such settings, diversity-related considerations arise in relation to the characteristics of the human users, the AI tool, and the interaction between the two.
Sociocultural identity may relate to how individuals integrate AI recommendations into their decisions. Research has shown two mechanisms through which this may occur: social context may result in differential rates of adherence to algorithmic recommendations (Albright, 2019), and demographic attributes may be associated with how decision-makers prioritize and integrate different sources of information (Mallari et al., 2020; Peng et al., 2019). In the context of judges’ adherence to risk assessment instruments, Albright (2019) shows that judges are more likely to override algorithmic recommendations in favor of harsher bond conditions in counties with larger Black populations, independent of the defendant’s race. In an MTurk study of recidivism prediction, Mallari et al. (2020) show that decision-makers’ self-identified gender was a significant factor in recidivism predictions, and that this interacted with demographic attributes of the individuals subject to those decisions. The gender of decision-makers was also shown to impact algorithm-informed decisions (and associated biases) in an MTurk study assessing the use of algorithmic hiring tools (Peng et al., 2019). Thus, attending to diversity considerations in relation to those relying on ML recommendations is central to the study of algorithmic adoption, both to ensure the validity and generalizability of empirical studies and to support the effective and responsible deployment of predictive algorithms.
Additional diversity considerations become salient when we examine how the constituents of human-AI teams may
Algorithm-informed decisions and impact
Whether an algorithm is making autonomous decisions, or assisting human decision-makers, the diversity of its output set is also important. This topic has been considered in works on ranking and information retrieval (Singh and Joachims, 2018; Karako and Manggala, 2018), and more broadly on subset selection (Mitchell et al., 2020). In the context of image retrieval, one may care about the diversity of a set of images shown; in targeted advertisement, the diversity of those who will be shown a job ad matters; and if ML is used to pre-select individuals who will be hired or admitted into a program, the diversity of the selected set is of key relevance. As before, the relevant concept of diversity will depend on the task that the algorithm is solving.
Representative notions are frequently invoked when discussing diversity of algorithmic output, and are especially attractive in the context of information retrieval systems (Singh and Joachims, 2018), where the task is often construed as that of “mirroring” the world. That is, if a search engine is believed to be a descriptive tool, an argument for a representative conception of diversity can follow naturally. This argument may falter, however, if demographic information is deemed conceptually irrelevant to a query, in which case an egalitarian concept may be justified. Just as a dictionary does not define “surgeon” as a man, irrespective of the frequency of different genders in that occupation, one may expect a search engine to provide query-relevant results that are unaffected by such contingent frequencies.
Importantly, the societal impact of algorithms goes beyond those directly subjected to algorithmic predictions. For instance, information retrieval and search engines are not purely descriptive tools; instead, they play an active role in shaping beliefs and behaviors. Taking this into account can significantly change which concepts of diversity are appropriate. Thus, normic (and hybrid) notions gain relevance once we consider the active role of such systems in shaping societal beliefs. Indeed, in one of the most in-depth discussions of diversity in algorithmic outputs, and unlike most other approaches, Mitchell et al. (2020) defend a normic notion of diversity. In so doing, they introduce a family of metrics that facilitate the quantification of diversity in algorithmic subset selection from a normic perspective, grounded in considerations of social power differentials.
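To make the contrast between these output-diversity concepts concrete, the sketch below scores the same selected subset against different target compositions, one per concept. The groups, target shares, and the use of total variation distance are our own illustrative choices; they are not the metrics proposed by Mitchell et al. (2020) or the other cited works.

```python
# Illustrative sketch: scoring the composition of one selected subset (e.g.
# retrieved images or a shortlist) against different target distributions,
# one per diversity concept. All numbers are invented for illustration.
def composition(selected, groups):
    return {g: sum(1 for s in selected if s == g) / len(selected) for g in groups}

def total_variation(p, q):
    return 0.5 * sum(abs(p[g] - q[g]) for g in p)

groups = ["group_A", "group_B", "group_C"]
selected = ["group_A"] * 7 + ["group_B"] * 2 + ["group_C"] * 1   # a 10-item output set

targets = {
    "egalitarian (uniform)":       {g: 1 / 3 for g in groups},
    "representative (population)": {"group_A": 0.60, "group_B": 0.25, "group_C": 0.15},
    "normic (up-weighted)":        {"group_A": 0.40, "group_B": 0.30, "group_C": 0.30},
}

observed = composition(selected, groups)
for name, target in targets.items():
    print(f"{name}: divergence from target = {total_variation(observed, target):.2f}")
```

In this example the same output set diverges little from a representative target (0.10) but substantially from egalitarian (0.37) or normic (0.30) targets, underscoring why the choice of concept, and its rationale, matters for evaluating algorithmic output.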
Systems of incentives introduced by the deployment of an algorithm may also have significant impacts on diversity. For instance, algorithmic unfairness may disincentivize investment by individuals who behave rationally (Liu et al., 2020). For example, if the use of standardized tests in college admissions disadvantages a group, this group has less incentive to invest in standardized testing, which would impact the diversity of both the pool of college students and the pool of applicants. Furthermore, in the presence of multiple players—such as multiple companies making hiring decisions—research has shown that partial compliance with (supposedly) fairness-enhancing interventions can result in segregation (Dai et al., 2021). Thus, considering the dynamics of deployment is crucial for understanding the impact of algorithms on the diversity of different groups in society.
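As a minimal toy dynamic, and not a reconstruction of the model in Liu et al. (2020), the sketch below shows how a selection rule that under-rewards one group's investment can erode that group's investment rate over time. All payoffs, costs, and selection probabilities are invented for illustration.

```python
# Minimal toy dynamic (illustrative only): individuals invest in a
# qualification when the expected payoff exceeds its cost, and a selection
# rule that under-rewards one group's investment erodes that group's
# investment rate over repeated rounds.
def update_rate(rate, p_selected_if_invested, payoff=10.0, cost=4.0):
    expected_gain = p_selected_if_invested * payoff - cost
    target = 1.0 if expected_gain > 0 else 0.0   # best-response share
    return 0.8 * rate + 0.2 * target             # gradual adjustment toward the best response

rate_adv, rate_dis = 0.5, 0.5
for _ in range(15):
    # Assumed bias: investment by the disadvantaged group is recognized by the
    # selection rule less reliably (0.4 vs. 0.7 probability of selection).
    rate_adv = update_rate(rate_adv, 0.7)
    rate_dis = update_rate(rate_dis, 0.4)

print(round(rate_adv, 2), round(rate_dis, 2))   # roughly 0.98 vs. 0.02
# The disadvantaged group's investment rate decays, shrinking the diversity
# of the qualified applicant pool over time.
```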
Discussion
The discussion so far has elided a key difficulty arising from value tensions. Consider, for instance, that whether, and to what extent, team diversity can result in epistemic benefits critically depends on how we measure performance. But what if individuals—perhaps in ways that correlate with sociocultural identity—cannot agree on a performance measure? For example, if one is concerned with the impact of teams’ diversity on building fairer algorithms, how to measure “fair” is a crucial matter that may be a subject of disagreement. Similarly, while diverse groups are more likely to come up with better solutions (in variety and quality), they may also take more time to reach consensus or never reach one at all (e.g. due to polarization) (Muldoon, 2018). Yet, these additional considerations are clearly crucial to social and institutional planners working under time and resource constraints. Even more generally, researchers have offered different ethical and epistemic rationales in support of distinct diversity concepts. While in many contexts these rationales work in tandem, it is also possible that they come apart. We offer no context-free solution to these value tensions, but two general points are worth emphasizing.
First, such value tensions are widespread in the context of deliberative and collaborative groups. A useful way to proceed, therefore, is to draw on techniques developed in previous research. For example, researchers in value-sensitive design have developed a range of techniques for addressing value tensions (Friedman and Hendry, 2019). In many contexts, for instance, it might be useful to shift the focus of discussion from underlying values to proposed courses of action, since actions are often over-determined by values—that is, divergent value systems can nonetheless agree on the same action. Similarly, recent works on diversity in political philosophy have sought to devise bargaining techniques for addressing disagreements among individuals with fundamentally different perspectives in ways that are nonetheless acceptable to those involved (Muldoon, 2016).
Second, in thinking about these value tensions, we caution against the trap of myopia. Viewed from a static or short-term perspective, such value tensions might appear inescapable. Things may look different, however, when we broaden our purview. This is familiar from works that examine the situated dynamics of algorithmic decision-making. Fairness measures that appear irreconcilable when considered statically can be jointly satisfied, for instance, when we move beyond the static setting of one-shot classifications to consider strategic plans consisting of multiple interventions over time (Fazelpour et al., 2021). Importantly, as noted above, some of the key enabling conditions for realizing diversity’s
Conclusion
Diversity’s importance at different stages of the ML lifecycle has increasingly been recognized in discussions about the ethics and governance of ML. Meaningful conversations, studies, and interventions hinge on our ability to clearly define and articulate diversity and its implications. This paper provides this clarity by drawing on the broader humanities and social scientific literature on diversity. Building on these works, we identified and explicated diversity-related questions that arise at different stages of the ML pipeline, and demonstrated how clarity about concepts and mechanisms can resolve seemingly contradictory findings in existing work, open up new directions of research, and enable better system design.
By providing a detailed characterization of the various contexts in which diversity can be relevant throughout the ML lifecycle, this paper makes a case for diversity as a design desideratum of teams, data and models. By translating multi-disciplinary literature and bringing it to bear on the study of sociotechnical ML systems, it provides conceptual tools to further advance research and practice grounded in a coherent understanding of sociocultural diversity.
Acknowledgements
We would like to thank the two anonymous reviewers for helpful comments and suggestions. An early version of this article was presented as a tutorial at the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT). We would like to thank the participants for their valuable feedback.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: For parts of the writing of this paper, Sina Fazelpour was a postdoctoral fellow at Carnegie Mellon University, supported in part by funding from the Social Sciences and Humanities Research Council of Canada (No. 756-2019-0289). Maria De-Arteaga was supported in part by a Google AI Award for Inclusion Research and by Good Systems, a research grand challenge at the University of Texas at Austin.
