Abstract
There has been a surge of recent interest in sociocultural diversity in machine learning research. Currently, however, there is a gap between discussions of measures and benefits of diversity in machine learning, on the one hand, and the broader research on the underlying concepts of diversity and the precise mechanisms of its functional benefits, on the other. This gap is problematic because diversity is not a monolithic concept. Rather, different concepts of diversity are based on distinct rationales that should inform how we measure diversity in a given context. Similarly, the lack of specificity about the precise mechanisms underpinning diversity’s potential benefits can result in uninformative generalities, invalid experimental designs, and illicit interpretations of findings. In this work, we draw on research in philosophy, psychology, and social and organizational sciences to make three contributions: First, we introduce a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct epistemic and political rationales underlying these concepts. Second, we provide an overview of mechanisms by which diversity can benefit group performance. Third, we situate these taxonomies of concepts and mechanisms in the lifecycle of sociotechnical machine learning systems and make a case for their usefulness in fair and accountable machine learning. We do so by illustrating how they clarify the discourse around diversity in the context of machine learning systems, promote the formulation of more precise research questions about diversity’s impact, and provide conceptual tools to further advance research and practice.
Introduction
Sociocultural diversity is a key value in democratic societies both for reasons of justice, fairness, and legitimacy and because of its ramifications for group performance. As a result, researchers in humanities and social sciences have worked to understand diversity’s varied meanings, develop appropriate measures for quantifying diversity (in some sense), and specify pathways by which diversity can be functionally beneficial to groups such as juries, design teams, and scientific communities. Recently, there has also been a surge of interest in sociocultural diversity in machine learning (ML) research. This burgeoning literature on diversity 1 in ML systems can be roughly divided into two general lines. One set of diversity-related considerations pertains to the composition and dynamics of teams and groups whose judgments shape the construction of ML systems—for example, those engaged in problem formulation, data generation, and development. From this perspective, there have been claims about the potential benefits of diversifying these teams as an organizational solution to alleviate biases in ML systems (Jobin et al., 2019; West et al., 2019), with more recent efforts aiming to empirically test these purported benefits (Duan et al., 2020; Cowgill et al., 2020). A second set of issues concerns the composition of items at different stages of the data-processing pipeline, especially in relation to who or what gets represented therein and how it affects individuals and communities that are impacted by the deployment of ML systems. In the context of curating input data, for instance, the lack of geographic diversity in benchmark image datasets has been linked to amerocentric and eurocentric algorithmic biases, such as higher misclassification rates for bridegroom images from Pakistan than from the US (Shankar et al., 2017). Similarly, recent works have highlighted the importance of diversity in relation to algorithmic output across tasks ranging from image search and content recommendation to matchmaking and automated recruiting (Drosou et al., 2017). To incorporate these considerations into the design of ML products, researchers have proposed various measures for quantifying diversity, developed methods for satisfying these measures, and examined the interaction between diversity and other design desiderata such as predictive performance and fairness (Drosou et al., 2017; Celis et al., 2016; Mitchell et al., 2020).
Currently, however, the discourse around diversity in ML systems is hindered by a lack of clarity regarding the underlying concepts of diversity and the precise pathways by which diversity can benefit group performance. This is problematic because diversity is an ambiguous term that admits various, potentially conflicting, conceptions. These diversity concepts differ substantially in their motivations, meanings, and appropriate operationalizations. Without explicitly grounding proposed diversity measures in appropriate underlying concepts, we thus risk a mismatch between professed diversity-related aims and the operationalization of those aims. What is more, assessing whether and when diversity (in some sense) can improve the performance of a collective requires attending to the precise pathways of diversity’s influence, the specific factors that moderate this impact, and the broader enabling conditions that support and sustain these potential positive consequences. The lack of specificity about the precise mechanisms underpinning diversity’s potential benefits can thus result in uninformative or easily falsifiable generalities, invalid experimental designs, and illicit inferences from findings.
In this paper, we bridge the gap that currently exists between discussions of measures and beneficial consequences of diversity in ML, on the one hand, and the humanities and social sciences research on concepts and consequences of diversity, on the other. We begin by articulating current thinking about the concepts and consequences of sociocultural diversity in feminist philosophy, social psychology, and organizational and network sciences. Drawing on this literature, we distinguish between different diversity concepts, and highlight their distinct and potentially conflicting epistemic, ethical, and political rationales. We then discuss mechanisms through which diversity can potentially improve the performance of teams and collectives, with a focus on cognitive and information elaboration pathways. Finally, we situate this understanding within the discussions of diversity in ML and draw out its implications. In mapping the various types of diversity-related considerations that arise throughout the ML lifecycle, we draw attention to the significance of achieving clarity about the conceptual underpinnings of diversity measures and the precise mechanisms mediating diversity’s potential benefits. We show how this can enrich the design and evaluation of ML systems by improving hypothesis formulation, experimental design, and the interpretation of findings.
Before we proceed, a qualification is in order. Diversity is valued on many different moral, political, and epistemic grounds. Discussions of the potential epistemic benefits of team diversity should
The varied concepts of diversity
Diversity implies some sort of difference. Divergent understandings of what this difference consists in lead to varied meanings and perceptions of diversity. Perhaps the most obvious way our assessments of a collective’s diversity can vary depends on what we take to be the set of attributes along which its members exhibit relevant differences. So, the same group can be seen as more or less diverse depending on the attribute (e.g. gender, ethnicity, socioeconomic status) used for characterizing its members. As noted by a number of researchers, however, even abstracting from questions of relevant attributes, the notion of diversity admits multiple understandings that can differ significantly in their underlying rationales and appropriate operationalizations (Harrison and Klein, 2007; Page, 2010; Steel et al., 2018). In particular, Steel et al. (2018) distinguish between
While there are different ways of drawing distinctions between diversity concepts, in what follows we primarily build on the classification developed by Steel et al. (2018), which is purpose-built for understanding
Within-group family of concepts
Suppose there is a focal group of individuals,
As their name suggests, a typical political and ethical rationale for egalitarian conceptions is the ideal of equal participation. In feminist philosophy of science, moreover, this ethical rationale is often tied to proposed epistemic benefits (Longino, 1990; Solomon, 2007; Grasswick, 2018). A core insight of this literature is the recognition of the
Comparative family of concepts
Assessments of diversity via within-group concepts focus almost solely on the properties of a focal group. In many circumstances, however, the narrow focus of egalitarian conceptions fails to capture broader social features that motivate our analyses of sociocultural diversity in the first place (Rushton, 2008). As a simple example, suppose
The relevance of the representative conception can be motivated on different grounds. The obvious ethical and political rationale emerges from the association of this conception with political ideals of
Capturing these sociopolitical and historical dimensions of diversity is at the heart of
From ethical and political perspectives, the motivation behind normic concepts is grounded in ideals of social justice, inclusion, and anti-domination. In addition, a long line of research in standpoint theory has highlighted an epistemic rationale for these conceptions (Harding, 2004; Collins, 2002). Besides recognizing the socially situated nature of knowledge discussed above, standpoint theorists often endorse an
Combine, but don’t conflate
Depending on context, it might be perfectly sensible to create hybrid conceptions. For instance, an egalitarian and normic hybrid concept might incorporate weights, so as to reflect contextually-informed (dis)similarity between various attribute categories. While such combinations can be permissible, it is critical not to
Potential epistemic benefits of diversity
As discussed above, in addition to referring to distinct ethical and political ideals, researchers often motivate different concepts of diversity by highlighting how increasing diversity
We start this section by distinguishing between two types of pathways—

Two types of pathways by which sociocultural diversity
Cognitive pathways
The core idea behind cognitive pathways is that diverse groups can leverage the variety of the cognitive repertoires of their members—their varied knowledge, perspectives, skills, etc.—to more effectively deal with demanding tasks. This idea underpins the epistemic rationale for many of the diversity concepts discussed above.
This general idea can be analyzed in terms of a series of interrelated steps. First, individuals in cognitively homogeneous groups tend to exhibit significant overlap in their relevant strengths and limitations (Bang and Frith, 2017). Insofar as sociocultural differences track differences in cognitive repertoires, socioculturally diverse groups can instead bring complementary resources to bear on a task, with members compensating for one another’s blind spots.
Empirical support for this pathway has been found in relation to different tasks and groups (Roberson, 2019). In organizational contexts, for example, Milliken and Martins review studies demonstrating the beneficial effects of sociocultural diversity in relation to cognitively relevant outcomes such as the number and quality of alternative ideas considered by groups (Milliken and Martins, 1996). Using citation counts as a proxy for research quality, Campbell et al. (2013) found evidence for the positive impact of gender diversity on the quality of work produced by scientific teams. 2 In addition to these empirical studies, researchers in cognitive science (Bang and Frith, 2017), sociology (Page, 2017), and philosophy (Muldoon, 2013) have turned to formal and simulation tools to investigate the impacts of diversity along cognitive pathways. These studies typically center around frameworks (e.g. NK landscapes, multi-armed bandits) that represent tasks that demand exploitation-exploration trade-offs. While these studies often abstract away from the relation between sociocultural and cognitive diversity, they nonetheless offer valuable insights about the conditions—for example, task difficulty, team composition, network structure—under which cognitive diversity can benefit group performance in tasks such as learning and problem-solving.
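To give a concrete flavor of how such simulation studies are set up, the following is a minimal Python sketch of a bandit-style task in which a team pools its observations and trades off exploration against exploitation. The team sizes, the epsilon-greedy rule, and the mapping from "cognitive style" to exploration propensity are our own illustrative assumptions, not the design of any particular study cited above.

```python
# Minimal sketch (illustrative assumptions only): homogeneous vs. cognitively
# diverse teams on a multi-armed bandit task requiring an exploration-
# exploitation trade-off. Agents share one pool of observations and act
# epsilon-greedily; an agent's epsilon stands in for its "search style".
import random

def run_team(epsilons, n_arms=10, horizon=300, seed=0):
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]
    counts = [0] * n_arms      # pooled pull counts
    totals = [0.0] * n_arms    # pooled reward sums
    reward = 0.0
    for _ in range(horizon):
        for eps in epsilons:   # each team member acts once per round
            if rng.random() < eps or not any(counts):
                arm = rng.randrange(n_arms)   # explore
            else:              # exploit the best arm found so far
                arm = max(range(n_arms),
                          key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
            r = true_means[arm] + rng.gauss(0, 0.1)
            counts[arm] += 1
            totals[arm] += r
            reward += r
    return reward

homogeneous = [0.05] * 4              # four members with identical search styles
diverse = [0.01, 0.05, 0.2, 0.4]      # four members with varied exploration propensities
avg_hom = sum(run_team(homogeneous, seed=s) for s in range(20)) / 20
avg_div = sum(run_team(diverse, seed=s) for s in range(20)) / 20
print(f"homogeneous team: {avg_hom:.1f}  diverse team: {avg_div:.1f}")
```

Depending on the horizon, the number of arms, and the noise level, the diverse team may or may not outperform the homogeneous one; this sensitivity is precisely the kind of effect modification the simulation literature examines.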
These lines of work have identified key contextual factors that moderate diversity’s benefits along cognitive pathways, as well as the potential trade-offs involved. As emphasized by both empirical (Van Dijk et al., 2012) and simulation (Hong and Page, 2004) studies, for example, chief among these effect modifiers are
Information elaboration pathways
The collective performance of any task requires the elicitation, examination, and ultimately integration of information that is distributed among individual members. In diversity research, these processes are collectively referred to as
Consider how sociocultural homogeneity can negatively impact information elicitation and sharing. Individuals tend to take perceived similarity in social markers of identity as an indicator of similarity in underlying cognitive repertoire (e.g. assumptions and information) (Phillips, 2017). Those in homogeneous groups may thus
Sociocultural homogeneity can also adversely impact information elaboration because of group-based trust and conformity (Fazelpour and Steel, forthcoming). Individuals tend to perceive in-groups as more reliable sources of information (Turner et al., 1989). Such group-based trust can reduce vigilance, and increase uncritical acceptance of (potentially misleading) information from in-groups, resulting in fast but mistaken consensus in homogeneous groups. Empirical studies have shown that sociocultural diversity can counteract these detrimental influences by enhancing the critical assessment of incoming information (Levine et al., 2014). False consensus can also arise because of group-based conformity pressures that make individuals in homogeneous groups less likely to voice dissenting opinions. By reducing the impact of group-based conformity on the sharing of dissenting views, sociocultural diversity has been shown to improve group performance (Phillips et al., 2009).
In addition to demonstrating the potential benefits of diversity via information elaboration pathways, these studies also highlight potential trade-offs—for example, speed versus reliability of convergence (Fazelpour and Steel, forthcoming)—that affect diversity’s benefits via these pathways as well as unique communicative challenges faced by diverse groups (Phillips, 2017). Incorporating the insights from these works can be fruitful when thinking about the influence of diversity in the context of designing ML systems. This is particularly useful when there is interaction between different individuals (e.g. stakeholders, experts, crowdsource workers, or developers), and where individuals are aware of (or have expectations about) others’ identities.
Diversity’s overall impact and the significance of context
We have highlighted some mechanisms by which diversity
More generally, these mixed findings from meta-analytical studies provide an important occasion for examining enabling institutional and societal conditions that support the functioning of diverse groups. For example, the positive impact of gender diversity is even more pronounced in inclusive settings with egalitarian power distributions (Post and Byron, 2015). Moreover, increased sociocultural diversity is more likely to improve group performance when group members are open to social differences; when groups are less homogeneous to begin with; and when individuals from marginalized demographics find support in an organizational leadership that values and promotes a culture of inclusion (Phillips, 2017; Bear and Woolley, 2011). Absent such enabling conditions, “diversifying” teams by simply introducing individuals from underrepresented backgrounds, and then expecting improvements to team performance, may amount to little more than setting up these individuals for failure (Ray, 2019; Dobbin and Kalev, 2016).
Situating diversity in sociotechnical ML systems
In this section, we situate the discussion of concepts of diversity and mechanisms of its benefits in ML by mapping the various contexts in which sociocultural diversity can be relevant throughout the ML lifecycle. We emphasize that diversity-related considerations are ubiquitous in ML systems. While often neglected, this fact should not come as a surprise; these systems are sociotechnical systems, embedded in decision pipelines that are shaped by, or else implicate, groups and communities. Nonetheless, identifying, understanding, and implementing these diversity-related considerations hinges on being specific about concepts, rationales, and pathways.
The structure of this section is as follows: in each subsection, we focus on one stage of the ML lifecycle. For each stage, we outline the nature and ramifications of the task, and delineate the set of diversity-related considerations that are pertinent to decisions and value judgments therein. Figure 2 provides an overview of these stages, and identifies diversity-related questions relevant to them. 3 We explore potential ethical and political rationales that can support the use of different diversity concepts, and attendant measures, in particular settings. Moreover, when these diversity considerations pertain to (teams of) decision-making agents, we examine relevant cognitive and information elaboration-related factors that can influence their performance as socially situated agents, and identify potential ways in which increasing team diversity can benefit epistemic and ethical performance. Throughout we also highlight the strengths and limitations of existing work as well as promising avenues for future research.

Figure 2. Stages of the lifecycle of a machine learning system at which different varieties of sociocultural diversity may be relevant.
Problem formulation
Prior to the design and development of an ML system, its overarching goal and main function must be defined. This is frequently done by a team that is distinct from the design and development team. For example, state governments tend to outsource the development of risk assessment models for recidivism prediction to external firms (Chouldechova, 2017), while making in-house choices concerning the goal of the predictive model and the nature of its use. Problem formulation includes defining a system’s overarching goal, translating this goal into a prediction problem, defining the space of available prediction-informed decisions, and anticipating the potential impact of these decisions on populations that will be affected by them (Mitchell et al., 2021). Each step in the process may demand distinct types of domain knowledge as well as substantive value judgments. Neglecting or underappreciating considerations of diversity can result in problem formulation groups that advance the interests of only a small subset of stakeholders, exhibit significant overlap in their blind spots—ethical or otherwise—or preclude possibilities for effective participation among team members.
Consider, for example, that the particular formulation of the prediction problem requires careful anticipation and evaluation of its likely impacts on different sections of society (Fazelpour and Danks, 2021). This issue has become painfully salient in a recent example of bias in healthcare algorithms, where using health costs as a proxy for health needs resulted in systematic underestimation of the needs of Black patients, due to racial disparities in access to and spending on care (Obermeyer et al., 2019). As previous research has highlighted, individuals’ ability to anticipate and evaluate relevant hypothetical scenarios is likely to be correlated with their sociocultural identities and assumed social roles (Catellani et al., 2021). In participatory and value-sensitive design approaches, awareness of this type of correlation has traditionally led to an emphasis on diversity in prototype and foresight studies (Friedman and Hendry, 2019). More recently, increased recognition of diversity’s importance in these contexts has also led to proposals for the adoption of participatory and collaborative techniques in ML problem formulation (Martin Jr et al., 2020). Ultimately, however, the appropriate use of these techniques will depend upon attention to the relevant concepts, measures, and pathways.
Details of the task and the social context of operation can inform the choice of relevant diversity concepts. For instance, in many high-stake domains that concern the allocation of public resources, problem formulation often involves deliberative mini-publics and town halls (Pennsylvania Commission on Sentencing, 2020). In such cases, the ideal of democratic participation might support a representative notion of diversity. On the other hand, normic notions of diversity can be relevant if the proposed technology imposes a disproportionate risk on a community that has been historically disadvantaged by, or excluded from, key societal decisions. For example, in deliberations about the potential deployment of recidivism prediction algorithms as part of the US criminal justice system, considerations of social justice and anti-oppression can require that Black, Latinx, and Indigenous communities are represented in a greater proportion than what corresponds to their presence in the country’s population, precisely because these communities have been disproportionately harmed by the criminal justice system.
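As a purely numerical illustration of how these two conceptions can come apart in practice, the short sketch below allocates seats on a hypothetical 20-person deliberative panel. The population shares, risk weights, and panel size are invented for illustration and carry no empirical claim.

```python
# Illustrative only: seat allocation on a hypothetical 20-person deliberative
# panel under a representative vs. a normic conception of diversity.
# Population shares and risk weights below are invented numbers.
population_share = {"group_A": 0.60, "group_B": 0.25, "group_C": 0.15}
panel_size = 20

# Representative conception: mirror population proportions.
representative = {g: round(panel_size * p) for g, p in population_share.items()}

# Normic conception: up-weight groups bearing disproportionate risk from, or
# historically excluded by, the decision at issue.
risk_weight = {"group_A": 1.0, "group_B": 2.0, "group_C": 3.0}
weighted = {g: population_share[g] * risk_weight[g] for g in population_share}
total = sum(weighted.values())
normic = {g: round(panel_size * w / total) for g, w in weighted.items()}

print("representative:", representative)  # {'group_A': 12, 'group_B': 5, 'group_C': 3}
print("normic:", normic)                  # {'group_A': 8, 'group_B': 6, 'group_C': 6}
```

The point of the sketch is only that the two conceptions prescribe different compositions for the same panel, so the choice between them must be argued for, not assumed.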
Insofar as the deliberative interaction among problem formulation teams takes place in institutional and societal settings subject to various group dynamics, information elaboration considerations also matter. A particular challenge here is ensuring the
Design and development
Once a problem has been formulated, a series of design choices ought to be made. These choices concern the data used for training the model, the performance metrics optimized, and the models considered for prediction. Similar to the problem formulation phase, these decisions may be distributed across multiple teams or may all be the responsibility of one team. Unlike many of the decisions involved in problem formulation, however, this stage requires significant machine learning expertise. It is thus useful to be specific about the potential pathways and relevant effect modifiers that can govern diversity’s beneficial impact on performance (broadly understood) in the design and development stage.
Both cognitive and information elaboration pathways may impact algorithmic design. For instance, designers’ identity and lived experiences may inform their ability to anticipate modes of failure. If this is so, then we may expect marginalized standpoints to better identify the inimical impacts of certain choices, such as the training data and the metric used to assess algorithms’ performance. But we should not expect diversity to be a panacea, and the pathways and effect modifiers that mediate diversity’s impact should be attended to.
Consider, for instance, a recent study of whether fairness properties of algorithms are impacted by the demographic attributes of programmers who train them (Cowgill et al., 2020). The authors “found no evidence that female, minority and low-IAT [Implicit Association Test] engineers exhibit lower bias or discrimination in their code” (Cowgill et al., 2020). Instead, biased predictions were found to be “mostly caused by biased training data” (Cowgill et al., 2020). Taken at face value, the findings seem to suggest that we should not expect sociocultural diversity of design teams to result in any epistemic benefits. It is crucial, however, to interpret the findings in relation to the setup of the experiments. Importantly, engineers worked alone within an experimental setting that pre-defined crucial problem formulation and design choices pertaining to the predictive task, the evaluation metric, and the available input. While representative of many real world settings, this setup significantly restricts the diversity-related inferences that can be drawn from the findings. Without any group interaction, for example, information elaboration pathways are absent. With respect to cognitive pathways, moreover, the highly regimented nature of the task is likely to attenuate any potential benefits of diversity. Given this setup, the findings are thus fully consistent with literature showing that the effects of diversity are dependent on the nature and complexity of the task (Van Dijk et al., 2012). Appropriately interpreted, then, the findings highlight the importance of being specific about the relevant pathways when designing diversity initiatives for improving ML products, but should not be misinterpreted to mean that such potentially beneficial impacts do not exist.
Currently, there is a growing number of voices and initiatives calling for a more diverse workforce in AI, in part motivated by the thesis that this would lead to less biased, more ethical algorithms (Jobin et al., 2019). The lack of clarity in many of these calls regarding pathways through which these benefits may be realized is harmful: it lends itself to performative diversity initiatives that have no tangible impact on the technology built, and can also lead to misleading inferences from findings about diversity’s epistemic impacts. Clarity about pathways of diversity’s benefits is thus a first step—both in research and practice—towards designing impactful diversity interventions.
Training data
Data is a fundamental element of machine learning pipelines. In this section, we discuss diversity considerations related to the population represented in the data, as well as to the collective in charge of providing labels for it.
Consider, then, diversity’s potential benefits via the cognitive pathways. The requisite association between labelers’ identities and their task-relevant knowledge and perspectives might emerge for a variety of reasons. In the mid-90s, for example, those arguing that dismantling affirmative action in medicine could imperil access to care for Black, Latinx, and low-income communities emphasized that women, Black, and Latinx physicians are more likely to serve communities that have been historically underserved by the healthcare system (Komaromy et al., 1996; Cantor et al., 1996). In the context of ML, this type of specialization implies that incorporating diversity considerations when learning from experts’ assessments could enable health algorithms to better serve the needs of diverse patient populations by drawing on this variety of expertise.
Diversity’s benefits via cognitive pathways can also be relevant in crowdsourcing (Duan et al., 2020). For example, in hate speech detection, disparities in performance across African American and non-African American English vary across annotators (Keswani et al., 2021). Labelers’ identity could in part explain the gap in their performance across different dialects, as crowdsourced assessments can vary across cultural communities (Sen et al., 2015), and demographic identities (Davani et al., 2021). Thus, being attentive to the diversity of a group of labelers could help address algorithmic bias in tasks such as automated hate speech detection. 4
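One hedged sketch of how such disparities might be surfaced in practice is an audit that cross-tabulates labeling errors by annotator group and text dialect. The column names, the tiny example table, and the availability of adjudicated reference labels below are hypothetical assumptions, not features of any cited dataset.

```python
# Hypothetical audit sketch: labeling error rates broken down by annotator
# group and by the dialect of the annotated text. The data are invented
# purely for illustration.
import pandas as pd

annotations = pd.DataFrame({
    "dialect":         ["AAE", "AAE", "AAE", "SAE", "SAE", "SAE"],
    "annotator_group": ["in_group", "out_group", "out_group", "in_group", "out_group", "in_group"],
    "label":           [0, 1, 1, 0, 0, 1],
    "adjudicated":     [0, 0, 1, 0, 0, 1],
})

error_rates = (
    annotations
    .assign(error=lambda d: (d["label"] != d["adjudicated"]).astype(int))
    .groupby(["dialect", "annotator_group"])["error"]
    .mean()
    .unstack()
)
print(error_rates)  # large dialect-by-group gaps would signal the disparities discussed above
```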
As before, the appropriate conception of diversity depends on the context and the task. A representative concept of diversity could be epistemically motivated, for example, if it is believed that each individual is best positioned to annotate instances written in their own dialect. Consider the task of hate speech annotation in a Spanish corpus, where terms vary widely across countries and regions (Rodriguez-Diaz et al., 2018). Ideally, one may want to match annotators familiar with a local dialect to the instances written in that dialect. In other contexts, however, a different diversity concept might be more appropriate. Consider that hate speech annotation introduces a second dimension: the legacy and pervasiveness of racist and other discriminatory terms, whose derogatory and demeaning nature tends to go unnoticed by the majority in everyday and even professional discourse (Aspinall, 2005). Here, those who are more likely to be victims of hate speech may be better positioned to identify it, which could motivate the use of a normic concept. 5 Such cases can be seen as an instance of standpoint theorists’ epistemic advantage thesis: cases in which those occupying historically marginalized standpoints are better positioned not only to evaluate their own dialects, but also to identify and address the blind spots of the dominant culture.
Finally, when considering expert panels involved in data labeling, in addition to task-relevant cognitive pathways, one must consider information elaboration pathways. Here, socioculturally heterogeneous groups of labelers may benefit from better information sharing, reduced normative conformity pressures, and enhanced critical assessment. An important first step towards understanding the impact of group diversity on the resulting labels is to standardize practices of reporting demographic information for panels of labelers.
Being explicit about the underlying concept of diversity (and its rationales) is important when exploring potential tensions between diversity and other design desiderata. Consider curating a dataset with the aim of maximizing (i) sociocultural diversity of individuals represented therein, and (ii) coverage over the entire feature space to improve generalizability. While some works have suggested a trade-off between these two (Celis et al., 2016), other works claim that they are in alignment (Asudeh et al., 2019). This apparent contradiction can be resolved by acknowledging the distinct concepts of diversity implicit in these works. In particular, representativeness of the entire feature space is not in conflict with diversity in a
A monolithic understanding of the notion of diversity can also result in misinterpreting potential tensions between diversity and fairness-related desiderata. Suppose, for example, that an ethnic group makes up a small percentage of a relevant population. Here, assuming a representative diversity concept—as is often implied in work on diversity of training data (Shankar et al., 2017; Buolamwini and Gebru, 2018)—would mean that this group should be represented in the training data in the same proportion as in the population. Yet, this representatively diverse sampling could lead to disparities in performance as the algorithm may favor accuracy for the majority group, and thus may be undesirable from an algorithmic fairness perspective. This does not mean, however, that there is an inherent trade-off between fairness and diversity per se. Rather, such a trade-off between representative diversity and fairness is relevant only when there is indeed an epistemic or ethical rationale for representative diversity as the contextually appropriate conception of diversity. Acknowledging the variety of diversity concepts and clearly articulating their underlying justifications can thus help avoid confusion.
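The following toy sketch, built entirely on synthetic data, illustrates the tension just described: training on a sample that mirrors population proportions versus a balanced sample, and comparing per-group accuracy. The data-generating process, group sizes, and model choice are illustrative assumptions only; real effects depend heavily on the task and model.

```python
# Toy illustration with synthetic data: representative (proportional) vs.
# balanced sampling of training data, and the resulting per-group accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Each group has its own feature distribution and decision boundary.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n) > 2 * shift).astype(int)
    return X, y

def per_group_accuracy(n_majority, n_minority):
    Xa, ya = make_group(n_majority, shift=0.0)   # majority group
    Xb, yb = make_group(n_minority, shift=1.5)   # minority group, shifted distribution
    model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))
    Xa_test, ya_test = make_group(2000, 0.0)
    Xb_test, yb_test = make_group(2000, 1.5)
    return model.score(Xa_test, ya_test), model.score(Xb_test, yb_test)

print("representative sample (900/100):", per_group_accuracy(900, 100))
print("balanced sample (500/500):", per_group_accuracy(500, 500))
```

In this toy setup, the proportionally sampled model tends to favor the majority group, while balancing narrows the gap; the broader point is that whether such a departure from representative sampling is warranted depends on the contextually appropriate diversity concept and its rationale.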
Human-AI teams
In many use cases, the function of predictive algorithms is to offer informational support to human decision-makers who—either as individuals or teams—make the ultimate decisions. In such settings, diversity-related considerations arise in relation to the characteristics of the human users, the AI tool, and the interaction between the two.
Sociocultural identity may relate to how individuals integrate AI recommendations into their decisions. Research has shown two mechanisms through which this may occur: social context may result in differential rates of adherence to algorithmic recommendations (Albright, 2019), and demographic attributes may be associated with how decision-makers prioritize and integrate different sources of information (Mallari et al., 2020; Peng et al., 2019). In the context of judges’ adherence to risk assessment instruments, Albright (2019) shows that judges are more likely to override algorithmic recommendations in favor of harsher bond conditions in counties with larger Black populations, independent of the defendant’s race. In an MTurk study of recidivism prediction, Mallari et al. (2020) show that decision-makers’ self-identified gender was a significant factor in recidivism predictions, and that this interacted with demographic attributes of the individuals subject to those decisions. The gender of decision-makers was also shown to impact algorithm-informed decisions (and associated biases) in an MTurk study assessing the use of algorithmic hiring tools (Peng et al., 2019). Thus, attending to diversity considerations in relation to those relying on ML recommendations is central to the study of algorithmic adoption, both to ensure the validity and generalizability of empirical studies and to support the effective and responsible deployment of predictive algorithms.
Additional diversity considerations become salient when we examine how the constituents of human-AI teams may
Algorithm-informed decisions and impact
Whether an algorithm is making autonomous decisions, or assisting human decision-makers, the diversity of its output set is also important. This topic has been considered in works on ranking and information retrieval (Singh and Joachims, 2018; Karako and Manggala, 2018), and more broadly on subset selection (Mitchell et al., 2020). In the context of image retrieval, one may care about the diversity of a set of images shown; in targeted advertisement, the diversity of those who will be shown a job ad matters; and if ML is used to pre-select individuals who will be hired or admitted into a program, the diversity of the selected set is of key relevance. As before, the relevant concept of diversity will depend on the task that the algorithm is solving.
Representative notions are frequently invoked when discussing diversity of algorithmic output, and are especially attractive in the context of information retrieval systems (Singh and Joachims, 2018), where the task is often construed as that of “mirroring” the world. That is, if a search engine is believed to be a descriptive tool, an argument for a representative conception of diversity can follow naturally. This argument may falter, however, if demographic information is deemed conceptually irrelevant to a query, in which case an egalitarian concept may be justified. Just as a dictionary does not define “surgeon” as a man, irrespective of the frequency of different genders in that occupation, one may expect a search engine to provide query-relevant results that are unaffected by such contingent frequencies.
Importantly, the societal impact of algorithms goes beyond those directly subjected to algorithmic predictions. For instance, information retrieval and search engines are not purely descriptive tools; instead, they play an active role in shaping beliefs and behaviors. Taking this into account can significantly change which concepts of diversity are appropriate. Thus, normic (and hybrid) notions gain relevance once we consider the active role of such systems in shaping societal beliefs. Indeed, in one of the most in-depth discussions of diversity in algorithmic outputs, and unlike most other approaches, Mitchell et al. (2020) defend a normic notion of diversity. In so doing, they introduce a family of metrics that facilitate the quantification of diversity in algorithmic subset selection from a normic perspective, grounded in considerations of social power differentials.
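To make the contrast between these output-diversity concepts concrete, the sketch below scores the same selected subset against different target compositions, one per concept. The groups, target shares, and the use of total variation distance are our own illustrative choices; they are not the metrics proposed by Mitchell et al. (2020) or the other cited works.

```python
# Illustrative sketch: scoring the composition of one selected subset (e.g.
# retrieved images or a shortlist) against different target distributions,
# one per diversity concept. All numbers are invented for illustration.
def composition(selected, groups):
    return {g: sum(1 for s in selected if s == g) / len(selected) for g in groups}

def total_variation(p, q):
    return 0.5 * sum(abs(p[g] - q[g]) for g in p)

groups = ["group_A", "group_B", "group_C"]
selected = ["group_A"] * 7 + ["group_B"] * 2 + ["group_C"] * 1   # a 10-item output set

targets = {
    "egalitarian (uniform)":       {g: 1 / 3 for g in groups},
    "representative (population)": {"group_A": 0.60, "group_B": 0.25, "group_C": 0.15},
    "normic (up-weighted)":        {"group_A": 0.40, "group_B": 0.30, "group_C": 0.30},
}

observed = composition(selected, groups)
for name, target in targets.items():
    print(f"{name}: divergence from target = {total_variation(observed, target):.2f}")
```

In this example the same output set diverges little from a representative target (0.10) but substantially from egalitarian (0.37) or normic (0.30) targets, underscoring why the choice of concept, and its rationale, matters for evaluating algorithmic output.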
Systems of incentives introduced by the deployment of an algorithm may also have significant impacts on diversity. For instance, algorithmic unfairness may disincentivize investment by individuals who behave rationally (Liu et al., 2020). For example, if the use of standardized tests in college admissions disadvantages a group, this group has less incentive to invest in standardized testing, which would impact the diversity of both the pool of college students and the pool of applicants. Furthermore, in the presence of multiple players—such as multiple companies making hiring decisions—research has shown that partial compliance with (supposedly) fairness-enhancing interventions can result in segregation (Dai et al., 2021). Thus, considering the dynamics of deployment is crucial for understanding the impact of algorithms on the diversity of different groups in society.
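As a minimal toy dynamic, and not a reconstruction of the model in Liu et al. (2020), the sketch below shows how a selection rule that under-rewards one group's investment can erode that group's investment rate over time. All payoffs, costs, and selection probabilities are invented for illustration.

```python
# Minimal toy dynamic (illustrative only): individuals invest in a
# qualification when the expected payoff exceeds its cost, and a selection
# rule that under-rewards one group's investment erodes that group's
# investment rate over repeated rounds.
def update_rate(rate, p_selected_if_invested, payoff=10.0, cost=4.0):
    expected_gain = p_selected_if_invested * payoff - cost
    target = 1.0 if expected_gain > 0 else 0.0   # best-response share
    return 0.8 * rate + 0.2 * target             # gradual adjustment toward the best response

rate_adv, rate_dis = 0.5, 0.5
for _ in range(15):
    # Assumed bias: investment by the disadvantaged group is recognized by the
    # selection rule less reliably (0.4 vs. 0.7 probability of selection).
    rate_adv = update_rate(rate_adv, 0.7)
    rate_dis = update_rate(rate_dis, 0.4)

print(round(rate_adv, 2), round(rate_dis, 2))   # roughly 0.98 vs. 0.02
# The disadvantaged group's investment rate decays, shrinking the diversity
# of the qualified applicant pool over time.
```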
Discussion
The discussion so far has elided a key difficulty arising from value tensions. Consider, for instance, that whether, and to what extent, team diversity can result in epistemic benefits critically depends on how we measure performance. But what if individuals—perhaps in ways that correlate with sociocultural identity—cannot agree on a performance measure? For example, if one is concerned with the impact of teams’ diversity on building fairer algorithms, how to measure “fair” is a crucial matter that may be a subject of disagreement. Similarly, while diverse groups are more likely to come up with better solutions (in variety and quality), they may also take more time to reach consensus or never reach one at all (e.g. due to polarization) (Muldoon, 2018). Yet, these additional considerations are clearly crucial to social and institutional planners working under time and resource constraints. Even more generally, researchers have offered different ethical and epistemic rationales in support of distinct diversity concepts. While in many contexts these rationales work in tandem, it is also possible that they come apart. We offer no context-free solution to these value tensions, but two general points are worth emphasizing.
First, such value tensions are widespread in the context of deliberative and collaborative groups. A useful way to proceed, therefore, is to draw on techniques developed in previous research. For example, researchers in value-sensitive design have developed a range of techniques for addressing value tensions (Friedman and Hendry, 2019). In many contexts, for instance, it might be useful to shift the focus of discussion from underlying values to proposed courses of action, since actions are often over-determined by values—that is, divergent value systems can nonetheless agree on the same action. Similarly, recent works on diversity in political philosophy have sought to devise bargaining techniques for addressing disagreements among individuals with fundamentally different perspectives in ways that are nonetheless acceptable to those involved (Muldoon, 2016).
Second, in thinking about these value tensions, we caution against the trap of myopia. Viewed from a static or short-term perspective, such value tensions might appear inescapable. Things may look different, however, when we broaden our purview. This is familiar from works that examine the situated dynamics of algorithmic decision-making. Fairness measures that appear irreconcilable when considered statically can be jointly satisfied, for instance, when we move beyond the static setting of one-shot classifications to consider strategic plans consisting of multiple interventions over time (Fazelpour et al., 2021). Importantly, as noted above, some of the key enabling conditions for realizing diversity’s
Conclusion
Diversity’s importance at different stages of the ML lifecycle has increasingly been recognized in discussions about the ethics and governance of ML. Meaningful conversations, studies, and interventions hinge on our ability to clearly define and articulate diversity and its implications. This paper provides this clarity by drawing on the broader humanities and social scientific literature on diversity. Building on these works, we identified and explicated diversity-related questions that arise at different stages of the ML pipeline, and demonstrated how clarity about concepts and mechanisms can resolve seemingly contradictory findings in existing work, open up new directions of research, and enable better system design.
By providing a detailed characterization of the various contexts in which diversity can be relevant throughout the ML lifecycle, this paper makes a case for diversity as a design desideratum of teams, data and models. By translating multi-disciplinary literature and bringing it to bear on the study of sociotechnical ML systems, it provides conceptual tools to further advance research and practice grounded in a coherent understanding of sociocultural diversity.
Acknowledgements
We would like to thank the two anonymous reviewers for helpful comments and suggestions. An early version of this article was presented as a tutorial at the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT). We would like to thank the participants for their valuable feedback.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: For parts of the writing of this paper, Sina Fazelpour was a postdoctoral fellow at Carnegie Mellon University, supported in part by funding from the Social Sciences and Humanities Research Council of Canada (No. 756-2019-0289). Maria De-Arteaga was supported in part by a Google AI Award for Inclusion Research and by Good Systems, a research grand challenge at the University of Texas at Austin.
