Solving women’s voices? A theoretical framework for talking through the algorithm

Abstract

This article offers a four-dimensional framework for analyzing technologies of gendered voice, especially those which have sought to correct or “solve” the supposedly unruly, unpleasant, or otherwise problematic voices of women. While modern technologies such as the artificially intelligent assistants Siri and Alexa implement digital algorithms for cultivating feminine voices, this framework insists upon a broad conceptualization of algorithms in order to consider how predigital technologies and institutions of vocal cultivation also anticipate and echo such contemporary means of vocal norm-production and control. By imagining gendered voice through this broad lens of the algorithm, we can begin to deconstruct the ways in which technologies and institutions of voice have historically operated as algorithms that attempt to “solve” women’s voices by making them amenable to hegemonic, patriarchal values, uses, and ideals. The research builds upon existing communication literature on the nature and functionality of algorithms, as well as upon feminist posthumanist theory, which provides a richer conceptualization of how algorithms of voice enact both a political and a material discipline upon women’s voices.

Keywords

Algorithms, feminist studies, gender, voice

Introduction

For whose voice have audio technologies been constructed? Whose voice must be constructed by and for those audio technologies in turn? In her 1991 book, Echo and Narcissus: Women’s Voices in Classical Hollywood Cinema, feminist film scholar Amy Lawrence observes that

if we return to the very beginnings of sound technology with women’s voices in mind, we find something at once obvious and rather startling: the basic ability to record the human voice was predicated on the ability to record the male voice.¹

Lawrence’s observation establishes that early instruments of sound recording were biased toward men’s voices,² reifying existing beliefs that women’s voices were problematic—that they were naturally less powerful and technically deficient. Such gendered technological bias provided seemingly objective evidence, and yet another reason, for why women should train and transform their voices to conform to male-defined standards of speech. In this article, I ask how we might imagine the various acoustic technologies that have tamed and transformed women’s voices across history through the lens of the algorithm. Integral to this work is my conceptualization of algorithmic technologies as extending beyond their modern digital connotations: I understand them in the fullest sense as any fixed set of rules, calculations, or manipulations designed to transform or, more tellingly, to “solve” the original input data—in this case, women’s voices.

While my work takes many of its cues from Philip Napoli’s (2014) conceptualization of algorithms as analogous to norm-defining and norm-producing institutions, recent scholarship from Eubanks (2018), Hoffman (2019), and Noble (2018) provides further definitional insight into algorithms as non-neutral heuristic technologies that so often function to “reinforce oppressive social relationships” (Noble, 2018, p. 1). As these authors explore, algorithms are utilized and justified for their ability to streamline the processing of data, whether for purposes of decision-making, prediction, classification, or sorting. In any case, we can understand an algorithm as a calculus that removes, remediates, or otherwise reforms input data to create potentially more useful in/formation. In her 2019 article, “Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse,” Hoffman explains that “algorithmic or automated systems do not only issue decisions, they are also intertwined in the production of social and cultural meaning” (p. 909). Perhaps more importantly, Hoffman explains that such production is not random but inclined toward a particular kind of meaning that “[reinforces] certain discursive frames over others” (p. 909). To bring sound and voice back into the equation, we might imagine algorithms operating as a sort of noise reduction, composing raw sound data into a meaningful melody. Humans can perform calculus or composition of this sort by themselves, though when we talk about “the algorithm” today, we often refer to the digital, automated technologies that humans have produced to reduce their workload and maximize “efficiency.” But as these authors demonstrate, the seemingly distant, immediate, and immaterial machinations of algorithms give way to deeply felt material realities, like the distribution of wealth or access to resources such as health care.
Humans extend themselves and their preferences into algorithms, and even after the algorithms seem to have “taken over,” Eubanks (2018) posits that the “humans that remain become extensions of algorithms” (p. 315). If this broad conceptualization of algorithms seems to threaten the very meaning or usefulness of the word itself, that is by design: my goal is to push back against any suggestion that these technologies are somehow entirely new and devoid of the legacies of prejudice that have always informed decision-making and social organization.

Language, itself a technology of representation, is also a technology of gender (de Lauretis, 1987). In other words, it is through the social technology of language that we come to express and give shape to constructs such as gender and sexual difference. And I argue that this technology, especially as it is ever-further abstracted by additional mediating technologies of representation, is inherently algorithmic, as it attempts to solve “unruly voices,” bringing them into the normative fold of idealized gender, race, and class. In this way, we may begin to see algorithmic gender bias as a phenomenon that compounds and amplifies itself through a representational feedback loop beginning with the broadest syntactical algorithms of language and building toward the most specific vocal algorithms like finishing school elocution, lossy compression in audio recording, or artificially intelligent fembots such as Siri and Alexa. While no person is wholly immune from the disciplinary pressures exerted through such algorithms of voice, I argue that they carry a different sociological weight for those identifying or identified by others as women, as women’s bodies are already the assumed objects of visual as well as aural consumption, and the assumed feminine proclivity for idle “talk” doubly implicates them as objects—amalgams of chaotic data—in need of being solved through speech. Furthermore, although gendered bias in vocal algorithms is the primary focus of the present framework, it must be emphasized that there is also a unique burden upon non-white voices, as many technologies of speech are coextensive with projects of imperialism and white supremacy (Katz, 2020; T. C. Moran, 2021; Rajendran, 2019). Future work should expressly attend to the intersection of race and gender in the context of algorithmic vocal discipline.

Largely building on the theories of feminist and posthumanist scholars N. Katherine Hayles and Elizabeth Grosz, this article proposes a framework for analysis whose goal is to better account for the ways in which algorithms have brought discursive and material bias to bear upon women’s voices. This framework begins with a brief two-part review of literature that first makes a case for the conceptualization of vocal algorithms as vehicles of “techno-institutional bias,” followed by an interrogation of the nature of discipline as it pertains to the “system of constraints and privations, obligations and prohibitions” (Foucault, 1977, p. 45) inherent within algorithms of voice. Ultimately, I urge a posthumanist perspective on discipline which pushes past the limitations of a Foucauldian conception to emphasize vocal algorithms as material, embodied phenomena. After making these conceptual investigations and justifications, I outline four overlapping dimensions, or ways of looking, from which to begin a thorough appraisal of the roots, causes, mechanisms, and symptoms of techno-institutional bias against women’s voices. By “talking through the algorithm” in these ways, I hope to begin unraveling the complex web of discourses and materialities that give shape to the vocal algorithms that take “unruly” feminine talk and transform it into speech.

The algorithm as techno-institutional bias

In his 2014 article, “Automated Media,” Philip Napoli applies institutional theory to the recent “algorithmic turn” to highlight the inherently parallel functions of institutions and media algorithms as social formations that exert power via “regulatory, normative, and cultural-cognitive” dimensions (p. 342). Napoli borrows the phrase “algorithmic turn” from William Uricchio (2011), who uses Heidegger’s image-driven concept of Weltbild [“world-picture”] as a metaphor for modern society’s recent turn to algorithms as a way of understanding “relations between the viewing subject and the world viewed” (p. 25). Ultimately, Uricchio asserts that the algorithmic turn is predicated upon our recent “increased access to new ways of representing and seeing the world, ways dependent on algorithmic interventions between the viewing subject and the object viewed,” citing and exploring visual technologies such as Global Positioning System (GPS) navigation and augmented reality as examples (p. 25). This article, however, challenges us also to consider the ways in which humans relate to their environment through a sonic Weltbild, so as to make the “algorithmic turn” useful for understanding the plentiful new algorithmic interventions in voice technology, such as artificial intelligence (AI) voice assistants, talk-to-text applications, and voice recognition software. After all, the sonic is another crucial plane through which we experience and represent the world around us, and the interventions between the hearing/speaking subject and the sonic constitute another set of powerful meaning-making institutions worthy of investigation.

Once we have opened our minds to the aural/vocal algorithmic turn, it is easier to expand our imaginations beyond recent digital voice applications to see a long history of algorithms that have sought to intervene in the relations between the hearing/speaking subject and her heard/spoken world, as institutions that “constrain and regularize” (Napoli, 2014) behaviors of the voice. Indeed, my use of the term “techno-institutional bias” is premised upon Napoli’s extension of Katzenbach’s (2011) argument that “media technologies should be thought of as an institution” (Napoli, 2014, p. 343). In other words, the term reflects the fact that algorithms are institutional technologies—the programmatic means by which institutions exert the “regulatory, normative, and cultural-cognitive” power that Napoli observes. This is not to say that algorithms are necessarily secondary to or separate from institutions. Even if a particular algorithmic technology originally developed as the product of a formal institution, the algorithm in and of itself, as a set of “formal or informal routines, norms, rules, or behavioral guidelines” (Napoli, 2014, p. 341), becomes isomorphic with the original institution. Cognitive scientist Yarden Katz explores this exact phenomenon in his recent book Artificial Whiteness (2020). As Katz argues, the technology of A.I., having been developed within Western military and academic institutions and ultimately being composed of algorithms and other programmatic code, is effectively “isomorphic to whiteness in being nebulous and hollow, with its shifting character guided by imperial and capitalistic aims” (p. 9).

In connecting the dots between the various modes of vocal, algorithmic discipline, I make a case for such algorithms as isomorphic with larger, abstract, yet undeniably influential institutions, including whiteness, imperialism, and patriarchy. In some (perhaps most) instances, these inherently biased institutions beget further institutions of bias simply because they provide the hegemonic context within which certain technologies are created. For instance, thinking back to Lawrence’s revelation about the gendered determinism of early sound recording, we can see how the same patriarchal logics that produced a world in which the scientists and inventors primarily responsible for recording technology were male also produced a world in which women’s voices seemingly required optimization, training, or solving to make them suitable for mediated reproduction. In other instances, however, we see such technologies take shape as the express projects of biased social institutions, as is the case with the projects of linguistic/phonetic standardization that flourished under the influence of figures such as the English philologist Henry Sweet and the historian and colonial official Thomas Babington Macaulay, who participated in Anglo-Imperialism by asserting that “English was superior to all non-English vernaculars” and that “linguistic difference was . . . equivalent to intellectual, cultural, and religious inferiority” (Rajendran, 2019, p. 1).

The voice and embodied discipline

Given Napoli’s (2014) emphasis upon algorithms’ subtle exertion of power through “regulatory, normative, and cultural-cognitive” dimensions, it is practically impossible to conceptualize them fully without acknowledging Foucault’s notion of discipline. In Discipline and Punish: The Birth of the Prison, Foucault argues that the advent of the modern carceral system—as well as other institutions like the school and hospital—was the result of a new paradigm of disciplinary power that made individuals into “docile bodies.” The docile body, per Foucault’s (1977) definition, is clearly the product of algorithmic design. Drawing in many ways on La Mettrie’s 1748 text L’homme-Machine (the man as machine), Foucault describes the formation of the docile body through the example of the soldier, using phrases such as “calculated control” and “automatism of habit” to describe the ways in which discipline works by “controlling or correcting the operations of the body” (pp. 128–130, my emphasis). The implementation of such bodily automatization was, moreover, the result of internalized surveillance, a universalizing gaze best exemplified by Jeremy Bentham’s Panopticon. With this all-seeing prison in mind, Foucault (1977) argues that, as opposed to traditional means of power, “disciplinary power . . . is exercised through its invisibility; at the same time, it imposes on those whom it subjects a principle of compulsory visibility. In discipline, it is the subjects who have to be seen” (p. 177).

Perhaps it is not surprising that in theorizing the organizational structures that give shape to the world, Foucault, like Uricchio, takes a visual approach. Just as Uricchio premises his “algorithmic turn” upon the Weltbild—the world as picture—Foucault imagines discipline primarily through the domain of the gaze, using the panopticon as the figurative typification of modern institutional power. Reading through the myth of Echo and Narcissus, Lawrence (1991) identifies a distinct cultural hierarchy that has lifted the visual/seeing above the sonic/hearing, sometimes disconnecting the sensations altogether. This hierarchy may be attributed in no small part to the myth’s linking of sight to the masculine through Narcissus, while sound rests with the feminine Echo. However, the voice is as much a part of any “docile body” as an “erect head” or “taut stomach” (Foucault, 1977, p. 128), and it has been subjected to just as much manipulation and normative regulation, especially, as I argue, if those docile bodies are identified as women. Consider, for instance, the finishing school, where girls were subjected not only to uniform physical composure (imagine the clichéd image of a young lady balancing a book atop her well-coiffed head) but also to vocal training that would rid them of any unattractive dialectal deviations, giving women like Jackie Kennedy distinctive, if somewhat unnatural, accents that would make them suitable for marriage to a good society man (Abad-Santos, 2017). If this example seems too limited to the elite, consider the working-class women of the Bell Telephone Company, whose torturous training to become the ideal, polite feminine voice behind telephone operating service is documented by Elinor Carmi in her 2015 article “Taming Noisy Women.”

As insightful as Foucault’s conceptualization of discipline may be, its operationalization in terms of vocal algorithms must account for embodiment in a more complete, less phallogocentric way. While bodies do not necessarily “disappear” in Foucault’s account of discipline, posthumanist scholar N. Katherine Hayles (1999) considers how the political technology of the Panopticon “abstracts power” in such a way that, for the bodies it subjects, “the specificities of their corporealities fade into the technology as well” (p. 194). Ultimately, Hayles (1999) argues that Foucault’s analysis, precisely because it is so focused on discursive power, “participates in, as well as deconstructs, the Panoptic move of disembodiment,” and that it also participates in a humanist habit of “unreflectively” taking the male body “as the norm” (pp. 194–195). Put in a different context, just as Foucault’s vested interest in the gaze forces a hierarchical disjuncture between sight and sound, so too does his focus on the abstract discourse of the Panopticon force a humanist disjuncture between mind and body. Posthumanism, then, along with the general principles of intersectional feminism, can offer an antidote by insisting upon the connection between discourse and material environments, or lived, bodily realities. Such a perspective can lift the veil of discursive obscurity to help “clarify the mechanisms of change” that link the various manifestations of vocal algorithms across history, and “create feedback loops between materiality and discourse” (Hayles, 1999, p. 195). To this end, Hayles looks to the work of Elizabeth Grosz (1994), whose book Volatile Bodies offers the Möbius strip as a model for understanding how bodies are structured/disciplined from the inside-out, and then from the outside-in again. I posit the four-part framework below as a means of better approximating this recursive, materially focused model.

A multidimensional approach

The following section attempts to briefly trace the underlying logic of vocal algorithms as it manifests across mythological, psycho-linguistic, socio-historical, and techno-material dimensions of gender bias in speech, language, and audio technology. While these dimensions are in no way mutually exclusive, I believe that unraveling them in the order I have outlined may help us conceptualize the ways in which vocal algorithms are as much a function of invisible ideological stigmas cloaked as “common sense” (Hall, 1981) as they are producers of physical transformation, working their way from the most deeply entrenched discourses to the tangible, embodied inscriptions of discipline. These dimensions capture various ways of looking at (or perhaps, in this case, ways of listening to) the phenomenon of algorithmically solved “unruly voices”; together, they help us understand its true texture and depth, and they can help make material what may otherwise be overlooked as merely abstract. In short, I believe that through these dimensions, we can begin to see and hear “the feedback loops between materiality and discourse” (Hayles, 1999, p. 195).

Mythological dimension

The discourse of the inferior, unruly feminine voice is deeply rooted in Western mythology. Recorded by Ovid in 8 CE, the Greco-Roman myth of Echo and Narcissus contains not only the implicit sight/sound, masculine/feminine hierarchy observed by Lawrence (1991) but also the telling suggestion that women’s inclination toward idle, even deceptive chatter was best remedied by their conformity to male speech. As Ovid’s (8 CE/1993) poem explains, before first encountering Narcissus, the nymph Echo used to distract the goddess Juno from finding her husband Jove philandering with the other nymphs by “cunningly” stopping her to “talk and talk” so that the guilty parties had time to run away (Ovid, 1993, p. 91). As punishment, Juno curses Echo so that she can only repeat the “concluding sounds of any words she’s heard” (Ovid, 1993, p. 92). While this curse does not technically limit her to male speech, it effectively does so when Echo finds herself enraptured by the young Narcissus, who, repulsed by her, leaves her to waste away until she is only a voice without a body. While Narcissus’s body will also eventually waste away as he stares at his own image in perpetuity, Lawrence (1991) rightly observes that “Echo suffers more,” because, at the very least, Narcissus sees his “ardour returned” by himself (p. 2).

Also in Ovid’s Metamorphoses is the story of Pygmalion, an artist who, “disgusted by the many sins to which the female mind had been inclined by nature” (Ovid, 1993, p. 335), decides to sculpt for himself an ideal woman, who is then brought to life by the goddess Venus. While this myth of the perfect man-made woman is not inherently concerned with the issue of voice, it is fascinating to see how it takes on this dimension as it evolves with shifting paradigms of cultural influence across history. By the late 19th and early 20th centuries, in the heyday of the British Empire and an era of increased interest in “voice culture,” playwright George Bernard Shaw reimagined the story of Pygmalion as an homage to English philologist Henry Sweet, who envisioned a world in which more and more people spoke a more and more perfect, phonetically optimized form of the English language. So, in Shaw’s adaptation of Pygmalion, which officially premiered in 1913, the male protagonist was a sculptor not of figures but of speech: the fictional professor Henry Higgins (based on Sweet), who makes an ideal society lady out of Eliza Doolittle by training her in proper English dialect. In the century since, Shaw’s Pygmalion has been adapted several times on stage and screen, and Ovid’s original myth has been reconceived for yet another paradigm of cultural influence by filmmakers such as Alex Garland, whose 2015 film Ex Machina tells the story of Pygmalion in the context of cutting-edge A.I. robotics.

Psycho-linguistic dimension

The psycho-linguistic dimension of this framework considers how psychoanalytic theories of subject development reveal the phallogocentrism of language and reinscribe the mythology of the unruly feminine voice. Indeed, as Roland Barthes (1972) writes in Mythologies, “myth in fact belongs to the province of general science, coextensive with linguistics, which is semiology” (p. 111, emphasis in original). While psychoanalysis may be historically complicit in the misogynist pathologization of “unruly women,” it nonetheless provides a framework for understanding the supposed “unruliness” of women’s voices, and woman’s subordination and objectification within and through the masculine signifying economy of language. Put differently, although Freudian and Lacanian psychoanalysis may be entirely mythological themselves, they are “scientific” in their methodical application as semiology, that is, as a means of making sense of the way that mythologies have come to structure the significance of the world around us. It is for this reason that so many feminist film scholars, especially those with a vested interest in language and voice, such as Amy Lawrence (1991) and Kaja Silverman (1988), have turned to Lacanian psychoanalysis to make sense of women’s place in cinema.

Per Lacan’s (1968) theory of the subject’s development through speech, woman as mother is understood to represent a state of pre-language, the chaotic, pure sound of what has been called the chora (Kristeva, 1974; Silverman, 1988); while the mother exists as indivisible from the child in this womb of pure sound, the name of the father represents law, the rules of syntax, and entrance into the realm of the Symbolic. Reading this into the logic of gender dynamics in classical Hollywood cinema, Silverman (1988) observes the ways in which women’s voices are “buried” within the diegesis of films that are almost always directed by male authorial voices. In other words, according to Silverman (1988), male voices can often be said to represent/dictate the syntactical organization of the film, ordering the arrangement of objects much as the male gaze does in Mulvey’s (1975) account, while women’s voices are subsumed within the text, finding escape only under certain limited conditions. In Silverman’s (1988) estimation, one mode of escape is found in the low-pitched voices of women such as Lauren Bacall, Mae West, and Marlene Dietrich, whose conformity with a more masculine connotation of voice allows them to move beyond the feminine, endowing their voices with an “excess” that “confers upon” their bodies a “privileged status vis-à-vis both language and sexuality” (p. 61).

Socio-historical dimension

By acknowledging the phenomenon in classical cinema whereby lower-voiced women were able to achieve a somewhat improved status within narrative film, Silverman points us back to questions about the socio-historical dimensions of vocal algorithms. The goal of this dimension is to emphasize the imbrication of vocal discipline and voice cultures with historical projects of imperialism, as well as with logics of patriarchy and white supremacy, many of which can be said to underlie this corrective habit of voice. Indeed, “solving” women’s voices by habitually lowering their pitch is bound up with the masculine bias of early voice recording established by Lawrence (1991). Such biases helped maintain many myths of the female voice’s deficiency, including that it was too high-pitched. At the same time that voice recording technology began to flourish, many saw the opportunity to use new media of voice reproduction (radio, cinema) as a means of advancing classist/imperialist projects of linguistic prescriptivism. Hamlin Garland, novelist and former chairman of the diction-award committee of the American Academy of Arts and Letters, is quoted in Dr. Harrison Karr’s (1938) book Your Speaking Voice as responding to the question of a standard American dialect as follows:

Manifestly, it cannot be British. The Oxford accent is not acceptable to the radio public, and it is equally evident that we should not adopt the lingo of the New York subway, or the accent of First Avenue . . . It should be a blend of the best usage of the old world and the new. (p. 226)

An artifact of this algorithmic blend of English’s “best usage” is found in Edith Skinner’s Speak with Distinction (1942/1999). This text, which meticulously codifies the phonetic rules of what Skinner refers to simply as “Good American Speech,” is more or less consistent with the dialect we today call the Mid-Atlantic or Transatlantic dialect that was so popular in the prestige cinema of classical Hollywood. However, although Skinner’s work may be remembered today only in its dramatic context, it is important to note that Skinner, as the former assistant to phonetician William Tilly, is connected by academic genealogy directly to Henry Sweet and the institution of English philology, which Shyama Rajendran (2019) has explicitly defined as an institution of colonialism and “raciolinguistic bias.” As scholars such as T. C. Moran (2021) have noted, these same racial biases exist in A.I. voice technologies today, which make assumptions about who can/should have access to intelligence through their projection of whiteness.

While Skinner’s work provides a historical artifact linking vocal algorithms to classist, colonialist, and racist institutions and political paradigms, we can turn to James Bender’s How to Talk Well (1949) to glimpse a more clearly gendered algorithmic artifact. Unlike Speak with Distinction (Skinner, 1942/1999), Bender’s self-help manual is not just a collection of phonetic rules but a combination of rules around pronunciation and the etiquette of public speaking and polite conversation. As such, while this text may be intended for men and women alike, a closer reading ultimately reveals a bias against speech qualities long held as feminine. For instance, the book emphasizes that talking about feelings and emotions is undesirable in a speaker, upholding virtues of stoicism and “rationality” that coincide with familiar values of Western normative masculinity.

Indeed, patriarchal logics run deep in the socio-historical dimension of “good speech” algorithms, which (re)produce not only messages about the cultural supremacy of masculinity but also the sonic ideal of male “phantasy” (Mulvey’s portmanteau of phallogocentric and fantasy). In other words, the techno-institutions of voice that abounded at the turn of the 20th century and persist to this day have played a major role in constructing and regulating what an ideal woman’s voice should sound like within a heteronormative, patriarchal society. Perhaps this is most profoundly highlighted by Joe Moran’s (2014) account of the 1935 “Girl with the Golden Voice” competition. As J. Moran (2014) explains, the competition, which drew over 5,000 applications from women across London’s telephone exchanges, made 26-year-old Ethel Cain the voice of the Post Office’s new “talking clock” based on her superior execution of “dentals, sibilants, phrasing, and pleasing intonation” (p. 462). The talking clock service, colloquially known as “Tim,” allowed people to dial 846 (the numbers corresponding to the letters TIM) to hear a verbal announcement of the current time. The public was so taken with Cain’s vocal performance that “lonely people would dial TIM just to listen to it, and besotted men would attempt to get in touch with its owner,” and eventually an engineer had to program the recording to stop after 3 minutes to prevent callers from staying on the line all night (J. Moran, 2014, p. 462). Similarly, Stefan Schöberlein’s (2018) historical account of telephonic romances and women’s voices demonstrates how early technologies of audio communication constructed a certain “romantic imagination.” By 1900, Schöberlein (2018) asserts, “a woman’s voice became one of her defining features in the technophile culture of the U.S.,” as the voice stood to represent the “totality” of her character as well as her looks.
Indeed, Schöberlein helps demonstrate how the obsession with voice culture in the early 20th century, inspired by technological innovation, is ultimately an obsession with algorithms that might solve women’s voices. In his estimation, the institution of Voice Culture imbued the new “telephonic girl-machine” with ideology, correcting and shaping telephone etiquette (though Voice Culture was not limited to this context) in accordance with values that would make her “of good use for industrialists, traders, and newspapermen” (pp. 9–10). He writes, “Voice Culture attacked the ‘problem’ at its core and attempted to reform the female voice itself” (p. 10). In other words, women’s potential mastery of voice qua telephone was a problem that patriarchal ideology attempted to solve with the technology of Voice Culture.

Techno-material dimension

This final dimension provides a space in which the materials that both give shape to and are shaped by sound are taken into consideration. It is also the dimension in which we can begin to piece together relevant aspects of the previous dimensions. In her philosophical text Volatile Bodies (1994), feminist cultural theorist Elizabeth Grosz pushes toward a posthumanist feminism by positing a Möbius-strip-like model of corporeality in which bodies are first created from the inside-out (from psyche to body) and then from the outside-in (from body to psyche).³ I argue that as we consider the materiality of vocal algorithms, we start to see how voices are first formed from the inside-out (from internalized mythologies to externalized voices) and then from the outside-in again (from the everyday voices around us to the intelligence gathered by techno-institutions like Amazon or Apple). Consider, for example, how the internalized mythologies of women’s inferior speech, part and parcel of the institution of patriarchy, ultimately take material form in the algorithmically conditioned voices of telephone operators (Carmi, 2015) or “talking clocks” (J. Moran, 2014), not to mention radio and movie stars (Lawrence, 1991; Silverman, 1988). These voices shaped the sound, the sonic Weltbild, of the turn of the 20th century. While it may be easy to think of sound as immaterial, let us remember, as Lawrence (1991) does, that sound is a vibration—a physical sensation. Consider, too, the bodily transformations that occur with vocal discipline: the tired, fainting, anxious bodies of the Bell Telephone women (Carmi, 2015), or the loss of voice, hoarseness, and vocal nodules that came with movie stars’ attempts to pitch down their voices.⁴

These materialities of voice, now constituting an external lived reality, shape the way that society thinks about gendered/raced/classed voice. We thus come to literally “‘speak through’ the ideologies which are active in our society” so that this reality becomes “common sense” (Hall, 1981, p. 272). It is no wonder, then, that companies such as Amazon find in their market research that women’s voices are preferred for digital devices that perform affective labor.⁵ Here we see how the external materialities of voice work their way inward again, into the internalized preferences of consumers and into the algorithms that will define yet another techno-institution of feminine voice.

Conclusion

Having worked through these overlapping dimensions, with attention to both discourse and embodiment, we gain a glimmer of clarity as to the “mechanisms of change” that unify a broader history of algorithmic bias against women’s voices. Following Grosz’s (1994) Möbius strip model, we see how the algorithm, first manifest as internalized mythologies and discourses about the inferiority of the female voice or non-white voice, is able to construct docile, embodied voices from the inside out. We then see how the vocal products of this internalized construction can work from the outside in again, normatively dictating the “common sense” feelings and preferences that justify the continued manipulation and exploitation of certain voices as affective labor. Moving forward, I hope to reevaluate how these dimensions might best be implemented in case studies of more specific historical and contemporary sites of algorithmic bias toward women’s voices, so that I may begin more substantial analytical work. Ideally, these case studies will expand in a way that allows for more meaningful examination of how vocal algorithms enact bias across various demographic axes, especially race, sexuality, and class, as well as how they implicate and affect non-binary and trans voices.

Acknowledgements

The author thanks Dr. Soomin Seo, Dr. Carolyn Kitch, Dr. Laura Levitt, and her classmates in GSWS 9991 for their encouragement and feedback in developing this work.

ORCID iD

Kate Dawson

Notes

Author biography

Kate Dawson is currently a student in the PhD program at Temple University’s Klein College of Media and Communication. Her research interests span fields of film history, social memory, and sound studies, with particular attention to questions of gendered voice.

References

Abad-Santos (2017, February 26). Jackie Kennedy’s strange, elegant accent, explained by linguists. Vox.com. https://www.vox.com/culture/2017/2/7/14442410/jackie-kennedy-accent-natalie-portman

Barthes (1972). Mythologies (Lavers, Trans.). Hill and Wang.

Bender, J. F. (1949). How to talk well. Whittlesey House.

Carmi (2015). Taming noisy women. Media History, 21(3), 313–327.

de Lauretis (1987). Technologies of gender: Essays on theory, film, and fiction. Indiana University Press.

Eubanks (2018). Automating inequality: How high tech tools profile, punish and police the poor. St. Martin’s Press.

Foucault (1977). Discipline and punish: The birth of the prison. Vintage Books.

Grosz (1994). Volatile bodies: Toward a corporeal feminism. Indiana University Press.

Hall (1981). Racist ideologies and the media. In Marris & Thornham (Eds.), Media studies (2nd ed., pp. 271–282). New York University Press.

Hayles, N. K. (1999). How we became posthuman: Virtual bodies in cybernetics, literature, and informatics. University of Chicago Press.

Hepburn (1991). Me: Stories of my life. Random House Publishing Group.

Hoffman, A. L. (2019). Where fairness fails: Data, algorithms, and the limits of antidiscrimination discourse. Information, Communication & Society, 22(7), 900–915.

Karr (1938). Your speaking voice. Griffin-Patterson Publishing Co.

Katz (2020). Artificial whiteness: Politics and ideology in artificial intelligence. Columbia University Press.

Katzenbach (2011). Technologies as institutions: Rethinking the role of technology in media governance constellations. In Puppis & Just (Eds.), Trends in communication policy research (pp. 117–138). Intellect Ltd.

Kristeva (1974). Revolution in poetic language (Waller, Trans.). Columbia University Press.

Lacan (1968). Speech and language in psychoanalysis (Wilden, Trans.). Johns Hopkins University Press.

Lawrence (1991). Echo and Narcissus: Women’s voices in classical Hollywood cinema. University of California Press.

Moran, J. (2014). Vox populi?: The recorded voice and twentieth-century British history. Twentieth Century British History, 25(3), 461–483.

Moran, T. C. (2021). Racial technological bias and the white, feminine voice of AI VAs. Communication and Critical/Cultural Studies, 18, 19–36. https://doi.org/10.1080/14791420.2020.1820059

Mulvey (1975). Visual pleasure and narrative cinema. Screen, 16(3), 6–18.

Napoli, P. M. (2014). Automated media: An institutional theory perspective on algorithmic media production and consumption. Communication Theory, 24, 340–360.

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

Ovid (1993). The Metamorphoses of Ovid (Mandelbaum, Trans.). Harcourt Brace & Company. (Original work published 8 CE)

Rajendran (2019). Undoing “the vernacular”: Dismantling structures of raciolinguistic supremacy. Literature Compass, 16(9–10), 1–13.

Schöberlein (2018). Call me maybe: Telephonic romances and the female voice 1880–1920. American Literary Realism, 51(1), 1–20.

Schwär & Moynihan (2020, April 5). Companies like Amazon may give devices like Alexa female voices to make them seem “caring.” Business Insider. https://www.businessinsider.com/theres-psychological-reason-why-amazon-gave-alexa-a-female-voice-2018-9

Silverman (1988). The acoustic mirror: The female voice in psychoanalysis and cinema. Indiana University Press.

Skinner (1999). Speak with distinction. Applause Theatre & Cinema Books. (Original work published 1942)

Uricchio (2011). The algorithmic turn: Photosynth, augmented reality and the changing implications of the image. Visual Studies, 26(1), 25–35.