Abstract

As natural language processing tools powered by big data become increasingly ubiquitous, questions of how to design, develop, and manage these tools and their impacts on diverse populations are pressing. We propose utilizing the concept of linguistic justice—the realization of equitable access to social and political life regardless of language—to provide a framework for examining natural language processing tools that learn from and use human language data. To support linguistic justice, we argue that natural language processing tools (along with the datasets that are used to train and evaluate them) must be examined not only from the perspective of a privileged, majority language user, but also from the perspectives of minoritized language users. Considering such perspectives can help to surface areas in which the data used within natural language processing tools may be (often inadvertently) working against linguistic justice by failing to provide access to information, services, or opportunities in users’ language of choice, underperforming for certain linguistic groups, or advancing harmful stereotypes that can lead to negative life outcomes for members of marginalized groups. At the same time, this framework can help to illuminate ways that these shortcomings can be addressed and allow us to use inclusive language data and approaches to leverage natural language processing technologies that advance linguistic justice.

Keywords

Natural language processing, linguistic justice, language, power

Introduction

As natural language processing (NLP) tools powered by big data become increasingly ubiquitous, questions of how to design and evaluate these tools and their impacts on social justice are pressing. This paper asks: how might we leverage NLP tools to advance social justice through linguistic justice? We define linguistic justice as the realization of equitable access to social, economic, and political life regardless of linguistic repertoire (Gazzola et al., 2021). Language underlies much of this access, and an increasing number of critical services are being rendered with NLP technologies, from automated courtroom transcription to video interviews in organizational hiring. While these technologies have the potential to benefit users, they will only contribute to linguistic justice and social equity if they are designed to avoid linguistic prejudice and to serve all speakers, not just speakers of “privileged” language varieties¹ (e.g. “standard” American English²).

A framework of linguistic justice illustrates that linguistically just NLP tools must: (1) work well for users regardless of the language variety they use and (2) work to counteract inequities based on language use in decision-making and resource allocation. To build linguistically just NLP tools, we must recognize and address power inequities such as over/underrepresentation of linguistic patterns and discourses within datasets. This paper begins with key concepts about language, power, and social identity. We apply these understandings to explore two concerns in NLP systems: differential performance and opportunity allocation, and linguistic profiling. We present a path forward through nine concrete actions across the design, development, and management of NLP tools before concluding.

Language, power, and social identity

To understand linguistic justice, we must first understand standard language ideology. This common belief holds that some language varieties are “better” than others. However, it has no basis in fact; all language varieties are equally capable of expression (Hill, 2008). Nevertheless, some language varieties have been privileged as “standard” or viewed as more “appropriate” because of their association with people in power. This is why, for example, “standard” American English (“S”AE) reflects the linguistic norms of middle-class, White men who have overwhelmingly held power within the United States (Baker-Bell, 2020; Hill, 2008). Meanwhile, other equally valid ways of speaking, such as African-American English (AAE),³ have been devalued (Baker-Bell, 2020; Hill, 2008; King, 2020). Because “standardized” languages are not linguistically better than any others, our definition of linguistic justice requires that users of any language variety be equally able to access services. One should not be denied housing, for example, based on “sounding White” or “sounding Black” (Baugh, 2003). Even when Black people use “S”AE, listeners may show linguistic bias against them (Alim, 2007). For example, when Black students use “S”AE, they can be perceived as “underperforming” (Alim, 2007). This reality reflects how race, along with other aspects of identity including gender, sexual orientation, and nationality, is intertwined with power and continues to affect people’s life outcomes (Crenshaw et al., 1995). In the next section, we delve further into the links between language, power, and identity and how they manifest in NLP systems by exploring two unjust linguistic outcomes of NLP tools: differential performance and opportunity allocation, and linguistic profiling.

Concerns for NLP tools trained on human language data

NLP tools perform better and provide more opportunities for speakers of privileged language varieties

To achieve linguistic justice, we must examine whether NLP tools are providing equitable access to information, services, and opportunities in the language variety of all potential users. Because NLP tools recognize and replicate language patterns based on their training data, we should examine the language contained in NLP datasets to see whose language and viewpoints are (not) represented, as data collection and data itself are not neutral (D’Ignazio and Klein, 2020). Large language models are increasingly powering NLP tools. These language models rely on data from the internet, but internet use varies by social factors, resulting in skews in representation. Some estimate that over 60% of all language content on the internet is English (W3Techs, n.d.), despite only ∼17% of people speaking English globally.⁴ Although ∼7000 languages are in use worldwide, only 7 have the large digital data that are typically called for in machine learning (Joshi et al., 2021). Meanwhile, over 88% of languages have “exceptionally limited resources” in the digital space (Joshi et al., 2021).
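The skew described above can be made concrete with a simple ratio. The following is a minimal sketch rather than part of the original analysis: it divides a language's share of web content by its share of speakers, using the approximate figures cited above. The function name `representation_ratio` is hypothetical, and the ratio is only a rough indicator, since speakers of a language and internet users are not the same population.

```python
def representation_ratio(content_share: float, speaker_share: float) -> float:
    """Return how many times a language's share of content exceeds its share of speakers."""
    if speaker_share <= 0:
        raise ValueError("speaker_share must be positive")
    return content_share / speaker_share


# Approximate figures cited in the text: over 60% of web content, ~17% of people.
english_ratio = representation_ratio(0.60, 0.17)
print(f"English is overrepresented by a factor of roughly {english_ratio:.1f}")
```

A ratio near 1 would indicate content roughly proportional to the speaker population; values well above 1 point to overrepresentation.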

Even among well-represented languages, some perspectives are overrepresented. Reddit users, for example, are 67% male and 70% White; using language from Reddit, then, results in the reification of White, male perspectives (Bender et al., 2021, citing Barthel et al., 2016). Moreover, some perspectives are actively marginalized online. On Twitter, pervasive harassment of women, especially Black women, may lead to self-censorship (Toxic, n.d.), reducing the representation of (Black) women’s viewpoints in NLP datasets (Bender et al., 2021). In addition, while one in 10 Black people in 2010 accessed Twitter daily (a rate over four times higher than White people) (Brock, 2012: 535), Black Tweets are still often considered “inappropriate” uses of Twitter (Brock, 2012: 542) and are more often inaccurately flagged as hateful by automatic hate speech detection tools (Davidson et al., 2019). This deficit model of Black internet use is both inaccurate and harmful, as it minimizes the importance of Black internet content while disproportionately censoring Black speakers.

Underrepresentation of diverse language data contributes to disparities in NLP tool performance, resulting in inequitable access for speakers whose language varieties are not yet served (at all, or as well) by existing NLP tools. This could result in linguistic injustice through differential access to goods, services, and opportunities by language variety. This has been shown, for example, with automatic speech recognition (ASR) tools from Apple, IBM, Google, Amazon, and Microsoft, which show higher error rates for Black speakers than White speakers (Koenecke et al., 2020). Higher error rates in automated hate speech detection tools, as discussed above in the Twitter example, are partially linked to under- and overrepresentation of particular languages in training data (Davidson et al., 2019; Tatman, 2017), impacting use of and access to social media platforms.

Further, data labelers may incorporate their own biases (Davidson et al., 2019)—such as labeling AAE as more negative than “S”AE when labeling training data for hate speech detection, or more often inaccurately labeling AAE as “unintelligible”. In the context of decision-making and resource allocation, NLP errors can have significant negative impacts. For example, certified court reporters in an experimental setting mistranscribed key facts from AAE speech, such as transcribing the utterance, “why your door always locked?” as “why you always lie?” (Jones et al., 2019: e236).
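Transcription errors of the kind discussed above are commonly quantified with word error rate (WER): the word-level edit distance between a reference and a transcript, divided by the number of reference words. The sketch below is one illustrative way to compute it, not the procedure used in the studies cited. The function name `word_error_rate` is hypothetical, and the example strips punctuation from the utterance quoted above for simplicity.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped from the transcript
                curr[j - 1] + 1,     # extra word inserted in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Utterance and transcription quoted in the text (punctuation removed).
wer = word_error_rate("why your door always locked", "why you always lie")
print(wer)
```

Averaging this score separately for each group of speakers is one simple way to check whether a transcription tool, or a human transcriber, performs worse for some language varieties than for others.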

While many speakers of marginalized languages are also speakers of majority languages, asking minoritized language users to modify their language use by adopting dominant language practices results in an inequitable burden, as minoritized individuals may spend significant time, money, and psychological energy to modify their speech (Hughes and Mamiseishvili, 2013). Moreover, addressing linguistic inequities earlier on will ameliorate larger effects that may accrue over time. For example, if an algorithm ranks video search results based on NLP-generated transcripts, but the tool can only transcribe content from some language varieties, others will not surface in search results as easily. This inequity can compound over time if search results are also ranked by popularity.

To work toward linguistic justice in NLP, developers must think carefully about datasets they use and create datasets that are more balanced for language variety (Bender et al., 2021). They must also involve speakers of diverse language varieties so they can accurately analyze and label language data. If only majority language speakers are involved, they may improperly label or even discard language data from other varieties. It is also important to conduct audits and user testing with a diverse set of users to spot and address potential disparities in performance by language variety. Accurately including language from a greater diversity of varieties is an important step toward achieving equitable access, and with it, linguistic justice.
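An audit of the kind described above can start with a per-group error metric. The sketch below is a minimal illustration under stated assumptions: it computes the false positive rate of a hypothetical hate speech classifier separately for each language variety. The function name `false_positive_rate_by_group`, the group names, and the records are all invented for illustration and do not come from any cited study.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the false positive rate per group.

    records: iterable of (group, is_hateful, flagged) tuples, where
    is_hateful is the true label and flagged is the tool's prediction.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, is_hateful, flagged in records:
        if not is_hateful:  # only benign texts can be false positives
            negatives[group] += 1
            if flagged:
                false_positives[group] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


# Hypothetical audit records, invented purely for illustration.
records = [
    ("variety_a", False, False), ("variety_a", False, False),
    ("variety_a", False, True), ("variety_a", True, True),
    ("variety_b", False, True), ("variety_b", False, True),
    ("variety_b", False, False), ("variety_b", True, True),
]
rates = false_positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap between groups would suggest that benign text in one variety is flagged more often than in another, which is the pattern an audit is meant to surface before and after deployment.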

Not all communities, however, wish to contribute data or share the same definition of intellectual property (Tatsch, 2004). Some language data, for example, may contain sensitive, culturally specific knowledge, and use of language data by individuals outside the community may require specific practices (Tatsch, 2004). Although some language communities may be excluded at present, others may choose not to participate. While some may view representation as key to linguistic justice, others may see it as contributing to unjust surveillance and control. Instead, the right to privacy or opacity may be considered more crucial (Blas, 2016; Glissant, 1997). Ultimately, while we recognize it may not be possible to curate datasets representative of all language varieties, we can continue to collaborate with diverse speaker communities and work toward equity, while being transparent in decisions and limitations. This includes giving communities choices in whether or not to share their linguistic data and honoring their decisions.

NLP tools can advance linguistic injustice through linguistic profiling

NLP tools can exacerbate social disparities through their advancement of language-based stereotypes. When we communicate, we not only communicate the literal content, but we simultaneously convey massive amounts of associated information about our social identities—such as race, gender, and nationality (Baugh, 2003; Hughes and Mamiseishvili, 2013). These associations are inherent to human communication, but can result in harm through linguistic profiling—making assumptions about an individual’s identity based on their language use (Baugh, 2003; Hughes and Mamiseishvili, 2013). These associations can be activated with extremely small amounts of linguistic data, making it difficult or impossible to curate datasets without identity markers. Purnell, Idsardi, and Baugh (1999), for example, showed that participants were able to correctly identify the race of 70% of speakers (who were not visible) after hearing them say only the word “hello.” Participants were also largely able to identify speakers’ gender by voice. Thus, natural language data that serve as inputs to NLP systems may lead the systems to learn and adopt identity-based stereotypes.⁵

Research shows that AI systems do form these connections, and they can use that information to discriminate. For example, an AI-powered resume scanning tool from Amazon was shown to discriminate by penalizing resumes containing words like “women” (Dastin, 2018). Even when direct references to gender were removed, discrimination continued. The tool picked up on linguistic patterns, such as men’s higher use of terms like “executed,” to infer the applicant’s gender (Dastin, 2018). While the link between terms like “executed” and masculinity may not be obvious, the repeated use of this term largely by men results in a connection that led to linguistic injustice: women were denied employment because of their language use.

As tools like NLP-powered virtual assistants or assessment tools become mediators for people seeking access to resources and opportunities, we must address the potential for NLP systems to use language to discriminate, and thus perpetuate linguistic injustice. Although Amazon no longer uses its tool, similar NLP-powered services remain (Raghavan et al., 2020). Tools like HireVue, which uses NLP to evaluate video job interviews, make judgments about applicants’ potential based largely on “their word choices and the language of their responses” (Zielinski, 2020).

Language has been shown to be a factor in hireability outside of NLP (Hosoda and Stone-Romero, 2010; Hughes and Mamiseishvili, 2013) despite the fact that human judgments of speakers’ language are notoriously unobjective (Hill, 2008). Hosoda and Stone-Romero (2010), for example, showed that job applicants with French-accented English were preferred over those with Japanese-accented English, even when applicants with Japanese accents were more understandable. If NLP tools are trained on data from human decisions, they will replicate those biased outcomes. To build NLP systems that treat speakers equitably, we must identify how past human decisions contained in the data reflect human biases, work to correct biases while being transparent about limitations, and audit NLP tools. By considering linguistic justice, we can see that it is unjust to build NLP tools that prioritize certain ways of speaking (and with them, certain social identities) over others.

A path forward

In considering linguistic justice, we identified two main areas where injustice can occur in NLP: (1) NLP tools may perform worse for users of minoritized language varieties resulting in inequitable access to information and opportunities and (2) NLP may reproduce injustice through linguistic profiling. To move toward linguistic justice—and thereby, social justice—we provide nine actions to prioritize in NLP tool development and management. These nine actions speak to data and NLP systems themselves, as well as broader power dynamics, values, and priorities in the design, development, and management of NLP tools.

Work with diverse language communities in participatory and empowering ways. Instead of building a tool that serves dominant language groups and asking minoritized language speakers to adapt, start by collaborating with marginalized communities. Understand communities’ wants and prioritize NLP development that serves them. This includes respecting that some may not want their language used for NLP. It also requires learning about and honoring language ownership within different communities (Tatsch, 2004).

Create conditions for members of diverse language communities to thrive on teams building NLP systems. Support linguistic diversity by encouraging multilingualism and avoiding a monolingual/“English-only” mentality in the workplace.

Ensure data is labeled by people familiar with the particular language variety. For example, speakers of “S”AE may be familiar with hearing AAE, but unfamiliar with its grammar. They may erroneously consider AAE features to be “mistakes” and mislabel them. Data labelers—and people providing data annotation instructions—should be trained in how linguistic biases manifest and how to mitigate them.

Transparently curate datasets with diverse language varieties so NLP tools serve diverse users. While it may be impossible to curate a fully equitable dataset, transparently documenting which language varieties are represented is important (Bender et al., 2021). Some NLP systems will not have the necessary data to serve certain groups; presenting such caveats transparently can assist in identifying the next steps for NLP development.

If considering off-the-shelf datasets or language models, ask whether developers of those resources have taken the considerations outlined here. Even if a dataset or model is widely used, it does not necessarily contribute to linguistic justice. In some cases, you might want to build a new dataset that allows linguistic justice to be prioritized.

Set performance goals that advance linguistic justice. While the goal of “human-like” performance may result in successful outcomes for members of some groups, it may reinforce harmful human biases. Goals that center equity may not only lead to better societal outcomes, but also products that avoid regulatory and reputational risk.

Audit tools throughout development and use. NLP tools can learn to discriminate against particular identities. Auditing tools is critical to ensure they perform equitably for different users.

Examine and alter power structures. Changing language alone is insufficient. We must examine existing power structures: who was provided the opportunity to make critical decisions about new technologies? People in power who benefit from unjust NLP tools have the greatest incentives to reproduce the status quo, so prioritizing the needs and perspectives of those at the margins may facilitate the development of systems that work for all (D’Ignazio and Klein, 2020).

Be wary of techno-solutionism. It’s important to recognize the limitations of technology in solving complex societal challenges. Sometimes, the question is not what tool should be developed, but rather should a tool be developed to solve this problem and what other nontechnical interventions might we employ?
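The action on transparently documenting which language varieties a dataset represents can be supported by a simple coverage summary. The sketch below is one possible starting point and not a prescribed method. It assumes each item in a dataset already carries a variety label provided by speakers or community members who consented to share their data. The function name `variety_coverage`, the labels, and the 10% threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def variety_coverage(variety_labels, min_share=0.10):
    """Summarize how often each language variety appears in a dataset."""
    counts = Counter(variety_labels)
    total = sum(counts.values())
    report = {}
    for variety, count in counts.items():
        share = count / total
        report[variety] = {
            "count": count,
            "share": share,
            # Flag varieties that fall below the chosen (arbitrary) threshold.
            "underrepresented": share < min_share,
        }
    return report


# Hypothetical labels, invented purely for illustration.
labels = ["variety_a"] * 18 + ["variety_b"] * 1 + ["variety_c"] * 1
report = variety_coverage(labels)
for variety, stats in report.items():
    print(variety, stats)
```

Publishing a summary like this alongside a dataset makes its limitations visible, although counts alone cannot show whether the data were labeled accurately or collected with community consent.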

Conclusion

This paper presents linguistic justice as a framework for NLP design, development, and management. By effectively centering linguistic justice in NLP, we can advance social justice. Doing so requires that NLP tools equitably serve diverse language users while acknowledging and responding to the harms of linguistic profiling within the current status quo. As Baker-Bell (2020) notes, “Within a Linguistic Justice framework, excuses such as “that’s just the way it is” cannot be used as justification for Anti-Black Linguistic Racism, white linguistic supremacy, and linguistic injustice” (p. 7). Instead, we must imagine and create a world where users of all language varieties are able to equitably access social, economic, and political life. Our nine actions provide a path toward that world whereby we design, develop, and manage NLP systems that advance linguistic justice, and as a result, social justice. This critical work will require practitioners to rethink how we collect data and what data we prioritize and value in NLP development. These shifts will take time and concerted effort. Finally, we must remember that technology alone cannot solve complex societal problems, but should be part of broader efforts toward linguistic and social justice.

Supplemental Material

sj-pdf-1-bds-10.1177_20539517221090930 - Supplemental material for Linguistic justice as a framework for designing, developing, and managing natural language processing tools

Supplemental material, sj-pdf-1-bds-10.1177_20539517221090930 for Linguistic justice as a framework for designing, developing, and managing natural language processing tools by Julia Nee, Genevieve Macfarlane Smith, Alicia Sheares and Ishita Rustagi in Big Data & Society

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The Center for Equity, Gender, and Leadership has received funding from a large Silicon Valley Tech Firm, whose work includes natural language processing and other use of big data. While this work was carried out by independent researchers at the Center, conversations between the researchers and individuals at the tech firm occurred and the research builds on a previous research collaboration between the tech company and the Center.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The Center for Equity, Gender, and Leadership at Berkeley Haas School of Business.

ORCID iDs

Julia Nee

Genevieve Macfarlane Smith

Alicia Sheares

Ishita Rustagi

Supplemental material

Supplemental material for this article is available online.

Notes

References

Alim (2007) Critical hip-hop language pedagogies: combat, consciousness, and the cultural politics of communication. Journal of Language, Identity & Education 6(2): 161–176.

Baker-Bell (2020) Linguistic Justice: Black Language, Literacy, Identity, and Pedagogy. New York: Routledge.

Barthel, Stocking, Holcomb, et al. (2016) Reddit news users more likely to be male, young and digital in their news preferences. Pew Research Center. Available at: https://www.pewresearch.org/journalism/2016/02/25/reddit-news-users-more-likely-to-be-male-young-and-digital-in-their-news-preferences/.

Baugh (2003) Linguistic profiling. In: Black Linguistics: Language, Society, and Politics in Africa and the Americas. Oxon, OX: Routledge, pp. 155–168.

Bender, Gebru, McMillan-Major, et al. (2021) On the dangers of Stochastic Parrots: can language models be too big? In: FAccT ’21, Virtual Event, 2021, pp. 610–623.

Blas (2016) Opacities: an introduction. Camera Obscura 31(2): 149–153.

Brock A (2012) From the Blackhand side: Twitter as a cultural conversation. Journal of Broadcasting & Electronic Media 56(4): 529–549. DOI: 10.1080/08838151.2012.732147.

Crenshaw, Gotanda, Peller, et al. (1995) Critical Race Theory: The Key Writings That Formed the Movement. New York: New Press.

Dastin (2018) Amazon scraps secret AI recruiting tool that showed bias against women. Reuters, 10 October. Available at: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.

Davidson, Bhattacharya and Weber (2019) Racial bias in hate speech and abusive language detection datasets. In: Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, 2019, pp. 25–35.

D’Ignazio and Klein (2020) Data Feminism. Cambridge, Massachusetts: MIT Press.

Gazzola, Wickström B-A and Fettes (2021) Towards an index of linguistic justice. Working Paper. Ulster University. Available at: https://www.ulster.ac.uk/__data/assets/pdf_file/0011/677306/REAL20-1.pdf (accessed 23 July 2021).

Glissant (1997) Poetics of Relation. Ann Arbor: The University of Michigan Press.

Hill (2008) The Everyday Language of White Racism. West Sussex: John Wiley & Sons Ltd.

Hosoda and Stone-Romero (2010) The effects of foreign accents on employment-related decisions. Journal of Managerial Psychology 25(2): 113–132.

Hughes and Mamiseishvili (2013) Linguistic profiling in the workforce. In: Byrd and Scott (eds) Diversity in the Workforce: Current Issues and Emerging Trends. New York: Routledge, pp. 249–265.

Jones, Kalbfeld, Hancock, et al. (2019) Testifying while black: an experimental study of court reporter accuracy in transcription of African American English. Language 95(2): e216–e252.

Joshi, Santy, Budhiraja, et al. (2021) The state and fate of linguistic diversity and inclusion in the NLP world. In: Proceedings of the 58th Annual Meeting of the ACL, Online, 2021, pp. 6282–6293.

King (2020) From African American Vernacular English to African American language: rethinking the study of race and language in African Americans’ speech. Annual Review of Linguistics 6: 285–300.

Koenecke, Nam, Lake, et al. (2020) Racial disparities in automated speech recognition. PNAS 117(14): 7684–7689.

Purnell, Idsardi and Baugh (1999) Perceptual and phonetic experiments on American English dialect identification. Journal of Language and Social Psychology 18(1): 10–30.

Raghavan, Barocas, Kleinberg, et al. (2020) Mitigating bias in algorithmic hiring: evaluating claims and practices. ACM Conference on Fairness, Accountability, and Transparency.

Tatman (2017) Gender and dialect bias in YouTube’s automatic captions. In: Proceedings of the First Workshop on Ethics in Natural Language Processing, Valencia, Spain, 4 April 2017, pp. 53–59. Available at: https://www.aclweb.org/anthology/W17-1606.pdf.

Tatsch (2004) Language revitalization in native North America - issues of intellectual property rights and intellectual sovereignty. Collegium Antropologicum 28(1): 257–262.

Toxic Twitter - The Psychological Harms of Violence and Abuse Against Women Online (n.d.) Amnesty International. Available at: https://www.amnesty.org/en/latest/research/2018/03/online-violence-against-women-chapter-6/#topanchor.

W3Techs Web Technology Service (n.d.) Usage statistics of content languages for websites. Available at: https://w3techs.com/technologies/overview/content_language.

Zielinski (2020) Addressing artificial intelligence-based hiring concerns. SHRM, 22 May. Available at: https://www.shrm.org/hr-today/news/hr-magazine/summer2020/pages/artificial-intelligence-based-hiring-concerns.aspx.
