Does the Voice Have a Body? Esposito’s Vox Munus in an Age of Synthetic Voice

Abstract

This article examines the relationship between voice, embodiment, and community by extending Roberto Esposito’s concepts of munus, communitas, and immunitas to the domain of voice. While voice is often treated as a bodily capacity and marker of individual authenticity, contemporary voice technologies, including brain–computer interfaces, AI-generated voices, and deepfake audio, destabilise this assumption. The article proposes Vox Munus as a conceptual framework for understanding voice as a shared obligation rather than private property. Through analyses of assistive speech technologies and synthetic media, it argues that voice technologies simultaneously enable inclusion and generate new ethical risks, reframing voice as a site where communal responsibility, trust, and governance are negotiated.

Keywords

community, Esposito, social relations, voice, world

Introduction: From Embodied Singularity to Communal Relation

Does a voice have a body? Such a question already exposes an ontological ambiguity: is voice an attribute of the body, a capacity that belongs to it, or does voice itself operate as a mode of embodiment, one that cannot be fully reduced to flesh? This ambiguity becomes especially difficult to ignore where voice exceeds the immediate co-presence of speaker and listener, a condition increasingly structured by technological mediation. On the surface, the answer still seems obvious: a voice issues from a body, carried on breath and shaped by vocal cords, unmistakably tied to flesh. Yet this apparent obviousness conceals a deeper difficulty. Adriana Cavarero complicates this assumption by insisting that voice is one (but not the only) modality within acoustic life through which singular existence becomes perceptible by giving itself in sound as an individuated presence. As she writes, ‘In the uniqueness that makes itself heard as voice, there is an embodied existent, or rather, a “being-there” . . . in its radical finitude, here and now’ (Cavarero, 2005: 174). In this framing, voice discloses who is speaking rather than simply what is said, anchoring vocal expression in the irreducible particularity of embodied existence. The sound of a voice thus conveys personal identity, affect, and presence, mediating between one’s inner life and others’ ears (Bijsterveld et al., 2014; Bull, 2000; Weber, 2010). We instinctively hear someone behind the voice. Yet this intimate bond between voice and embodied singularity is increasingly unsettled as voices circulate, persist, and act beyond the limits of bodily co-presence (Gutierrez, 2024).

Consider a few scenarios in which this ambiguity is no longer just theoretical: A brain-computer interface allows a person who is completely paralysed to ‘speak’ through a digital voice synthesiser, translating neural signals directly into words (Stavisky, 2025; Willett et al., 2023). An artificial intelligence system clones a person’s speech patterns so precisely that the synthetic speech is indistinguishable from their natural voice, raising questions of ownership and authenticity (Kang, 2022; Millière, 2022). A sophisticated deepfake audio circulates a perfect imitation of a public figure’s voice saying things they never actually said, undermining trust in the evidentiary value of voice (de Rancourt-Raymond and Smaili, 2022; Williams et al., 2023). Each of these cases challenges the natural coupling of voice and body (Barrett, 2021). If a machine can generate my voice, does my voice still belong to my body? If my bodily voice is silent but technology speaks on my behalf, is the voice still me (Goering et al., 2021; Luo et al., 2022)? These examples are emerging realities that compel voice scholars to rethink what it means to have a voice, and indeed, to reconsider the very idea of voice as a purely individual possession.

This suggests that we need to consider voice not as an isolated property of an individual body, not as a discrete faculty that belongs without remainder to a sovereign speaker, and not as a private attribute whose significance can be mastered, contained, or fully controlled at the level of individual intention, but as something situated within broader social exchanges and communal relations. In everyday thought, the voice often appears precisely as this kind of private property: an intimate expression bound clearly to one’s own physical body, self-contained and singularly owned. Such a conception aligns closely with the individualised model of voice predominant in contemporary culture, where one’s voice serves as proof of personal authenticity, identity, and presence. This stable, privately owned voice resonates deeply with modern cultural expectations around speaking and identity, emphasising that we are what our voices uniquely reveal about us, and that authenticity becomes everything. This individualised notion of voice can be read as aligning with Esposito’s (2013b) discussion of immunitas, which emphasises an insulated, protected individual whose capacities are safeguarded as personal property. Esposito contrasts this with communitas, a second perspective grounded in obligation rather than ownership. If this account of communitas is extended to the domain of voice, the clear separation between private and communal begins to break down, opening voice as a shared space that exists only through collective circulation, dialogue, and reciprocal obligations. From this perspective, voice can be understood as a common gift (munus) that reshapes relations between self and other through dynamic forms of mutual responsibility and interaction.

The article begins by introducing Esposito’s philosophy of the common body, highlighting his key concepts of munus (gift or obligation), communitas (community), and immunitas (immunity). It then explores the central question, ‘Does the voice have a body?’, proposing, by extending Esposito’s conceptual framework, that voice can be better understood as a communal gift (here described as vox communis), shared between self and other. The analysis continues by examining how contemporary voice technologies, from brain–computer interfaces to deepfake audio, reshape and complicate these communal relationships, traversing tensions between collective openness and protective immunity. The article then offers critical reflections on the ethical and political implications of these technological developments, particularly for historically marginalised or non-normative voices. Finally, it concludes by suggesting that the question of whether the voice has a body ultimately points towards a broader understanding: voice gains its fullest meaning as an event that takes form through its resonance within a collective, communal body.

Munus, Communitas, Immunitas: Esposito and the Common Body

Roberto Esposito’s political philosophy interrogates the relationship between community, obligation, and the limits of individual self-possession. Rather than proposing a substantial or identity-based account of community, Esposito situates his intervention at the level of genealogy, arguing that

the idea of community bears within itself the key for escaping its impolitical turn and for regaining a political significance; but only by traveling back through history all the way to its Latin root of communitas, and even before that to the term from which this derives, namely, that of munus. (Esposito, 2013a: 84)

It is through this etymological and conceptual displacement, from shared identity to shared obligation, that Esposito develops his core analytic terms of munus, communitas, and immunitas. Clarifying these concepts is therefore essential before turning to the question of voice.

First, from the Latin, munus denotes a duty, obligation, or gift. Esposito emphasises that munus is the root of ‘community’: it is the shared burden or gift that binds people together. To be in community means that each member owes something to the others, an openness or giving of part of oneself. As Esposito puts it, ‘the subjects of community are united by an “obligation”, in the sense that we say “I owe you something”, but not “you owe me something”’ (Esposito, 2010: 6). Crucially, this gift is given without the guarantee of return, an ethical bond that operates through what one of Esposito’s key interlocutors, Marcel Mauss, analysed as hau, the spiritual force within gifts that compels circulation and return, ensuring that ‘the thing given is not inactive’ but retains something of the giver’s essence (Mauss, 1990 [1925]: 14–17). One might therefore think of munus as a responsibility towards others that constitutes the communal tie.

For Esposito, then, communitas (community) is defined through a shared munus, a relation constituted by lack and obligation, a condition in which community emerges through shared exposure and debt among its members. This he draws from Victor Turner, whose words remind us: ‘Communitas breaks in through the interstices of structure, in liminality; at the edges of structure, in marginality; and from beneath structure, in inferiority’. This aligns with Esposito’s insistence that community arises where structured obligations loosen and mutual exposure becomes visible. Turner continues, ‘It is almost everywhere held to be sacred or “holy”, possibly because it transgresses or dissolves the norms that govern structured and institutionalized relationships and is accompanied by experiences of unprecedented potency’ (Turner, 1969: 128). In a genuine community, thus, members are united precisely by what they owe or give to each other (such as care or support, and, by extension here, voice), creating a bond of reciprocity. Importantly, community takes shape as an exchange passing through individuals, connecting them while preserving what Jean-Luc Nancy, an important influence on Esposito, describes as their ‘singularity’ (Nancy, 1991 [1986]), and unfolding in the space of the ‘between’, in the mutual with-ness (cum) of those who carry the munus.

Finally, immunitas literally means exemption from obligation. If communitas is what one owes others, immunitas is what one is exempt from giving. Esposito uses this term to describe the protective measures individuals or societies take to defend themselves from communal obligations. As Esposito puts it, ‘the immune mechanism functions precisely through the use of what it opposes’ (Esposito, 2011: 8). This concept resonates closely with Thomas Hobbes’s assertion concerning the necessity of sovereign authority to mitigate mutual fear within society, particularly his claim that individuals inherently seek protection from potential threats: ‘For the use of Lawes. . .is not to bind the People from all Voluntary actions; but to direct and keep them in such a motion as not to hurt themselves by their own impetuous desires’ (Hobbes, 1996 [1651]: 230). Immunity in a biological sense protects an organism from external threats; analogously, social or political immunity protects individuals from the claims of others. Personal rights, private property, and legal immunities, for example, can be understood as mechanisms that shield individuals from the demands of the collective. While such immunity is necessary – it preserves life and individuality – Esposito warns that, taken to an extreme, it can negate community. Michel Foucault echoes this concern, noting how modern societies increasingly deploy strategies aimed at safeguarding life itself, describing power today as ‘a matter of taking control of life and the biological processes of man-as-species and of ensuring that they are not disciplined, but regularized’ (Foucault, 2003 [1997]: 246–247). A society obsessed with immunity – walling itself off and prioritising individual security above all – risks undermining the very bonds that make collective life possible. 
In Esposito’s analysis, modern Western political thought has often privileged immunitas (individual autonomy, self-ownership) at the expense of communitas (shared belonging and obligation), producing a fraught balance between the two (Esposito, 2011).

What matters here, then, is Esposito’s reconceptualisation of the body as a ‘common good’ and the implications of this idea for embodiment and community. The dominant liberal tradition, influenced by Locke, has long treated the human body strictly as private property belonging to the individual, sharply distinguishing persons (as rational, autonomous agents) from things (as objects or property) (Descartes, 1998; Locke, 1980 [1689]). Esposito does not dismiss individual bodily autonomy, but rather seeks to challenge this strictly private conception, arguing, via Merleau-Ponty’s ontology of flesh, for the body’s inherently relational character: bodies are never isolated possessions but are deeply intertwined with other bodies and the world around them (Esposito, 2008, 2010, 2013b; Merleau-Ponty, 1968 [1964]). Such a relational perspective sees bodily existence as fundamentally communal, emphasising mutual vulnerability and reciprocal obligations – the munus – as constitutive of community itself (Durkheim, 1997 [1893]; Spinoza, 1992 [1677]). From this standpoint, Esposito’s biopolitics aligns closely with Arendt’s framing of life as the fundamental stake of politics, highlighting how political communities should treat bodily matters – health, reproduction, autonomy – as inherently collective rather than purely individual concerns (Arendt, 1998 [1958]).

The following section examines key questions arising from the conceptualisation of the body as a common good: If the body is a common good, owned by no one individual but shared in obligation, what does that imply for something like the human voice, which we take provisionally as an emanation of the body? Can the voice be understood in terms of munus, as part of our being-in-common? Before answering, let us first appreciate some unique qualities of voice that straddle the boundary of self and other.

The Voice as Site of Obligation

We often think of our voice as our own: my voice is uniquely mine, an expression of my interiority. Indeed, voices are as distinctive as fingerprints; the tone, timbre, and idiosyncrasies of a person’s voice betray a singular identity shaped by anatomy and experience. Philosophers have celebrated this uniqueness. Cavarero (2005), for instance, emphasises how each voice reveals ‘who’ is speaking beyond the content of words, grounding personal identity in the embodied act of vocal expression. However, it is equally true that the voice always already exists for and through others. A voice is not a voice at all until it is heard. The very purpose of voice is communicative, to traverse the space between bodies. In speaking or singing, I offer my voice to listeners; it leaves my body and enters the commons of sound. In this sense, voice can be understood as resembling a gift or obligation (munus), something that we circulate among ourselves, consciously or not. I owe my interlocutors some intelligible speech just as they owe me a hearing; we trade voices and responses in a dialogical community. The content of what is spoken aside, the act of voicing is fundamentally an act of reaching towards another, a gesture of relation. Here we can already glimpse a transformation from what we might call vox immunis (the ‘immune voice’ conceived as purely personal property) to vox communis (the ‘common voice’ that resounds within a network of shared presence).

Let us unpack this transformation. Under the regime of vox immunis, one might imagine each person’s voice as sealed within the bounds of their throat and under the sovereignty of their will. This perspective aligns with an immunitary logic: the voice is a protected possession, and one speaks purely to express oneself or to assert one’s individual rights (for example, the right to free speech as a shield of personal autonomy). But such a view is in tension with the reality of how voices actually function. Language, the primary vehicle of voice, is itself a common inheritance; we do not invent our language ex nihilo; we receive it from others. My ability to speak meaningfully depends on a shared lexicon, a grammar, a whole history of voices that have come before me. In every word I utter, echoes of communitas can be heard. As the late Jonathan Sterne points out, ‘much of this has to do with technical or technological forms – digital assistants, talking objects, spectacular software – but also the increased availability of sound-making and processing technologies for people to do things with’. This suggests that a voice is less a private organ and more a node in a sociotechnical network. Vox communis is the idea that a voice gains reality and meaning only within this web of communal relations: social conventions, listening publics, technological amplifications, and collaborative dialogues.

Philosopher Hartmut Rosa’s (2019, 2023a, 2023b) work illustrates the notion of a shared field of voice in terms of resonance, a mutual affecting and being-affected between people and their world. Voice, in Rosa’s view, is a prime example of a resonant medium: when one person speaks and another truly listens, they do not remain unchanged; both resonate and respond, finding a connection that can leave each party transformed. Rosa defines resonance as ‘a specific, objectively observable form of relationship between two entities’ (2023b: 5) in which both responsiveness and change are present. Think of a conversation where voices alternately speak and listen with genuine understanding: there is a back-and-forth attunement that is more than an exchange of information; it is a tuning of selves to each other. In such moments, voice serves as a bridge between individual and collective existence, much as Esposito’s ‘common good’ of the body would suggest. The use of resonance as a metaphor underscores that voices co-create a shared space of meaning. One might say that a voice ‘has a body’ not only in the physical sense of lungs and larynx, but in the sense of embodying a relationship. It dwells in the intersubjective body of the community whenever it finds resonance there.

To be clear, treating voice as a common good does not erase the uniqueness of each voice. Rather, it frames that uniqueness as a singular contribution to a plural event. Each voice is an ‘embodied singularity’ that carries its personal history of experiences, yet in the act of voicing, those singular vibrations enter into a common acoustic and social space. We can thus speak of a ‘shared sonorous world’ in which voices mingle. The political theorist Iris Marion Young once evocatively noted that in the public realm ‘we all reciprocally hear one another’ and thus partake in a material commons of sound. Our voices literally vibrate the air between us, a physical commons that does not belong to any one person. In this way, voice can be read as exemplifying Esposito’s idea that community is based on shared lack or exposure: when I speak, I give something of myself (breath, sound, meaning) into a space I do not control, trusting others to receive it. That act of giving voice is laden with vulnerability; I cannot guarantee I will be understood or welcomed, yet it is precisely this risk that forms communal bonds. The voice, one might say, is both embodied and embodying: it originates in a body, but it also helps form a collective body by drawing listeners into relationship.

Esposito’s concept of munus can be mapped onto voice here in this way: the voice is a gift-obligation we continuously share. Social life could be described as a continuous exchange of voices in terms of speaking, listening, and responding through which we negotiate our coexistence. On one hand, this reveals a hopeful vision of solidarity: voices interweaving to create communal understanding, much like a choir in which different parts produce a harmony larger than any single voice. On the other hand, it also introduces tension: voices can clash, compete, or drown each other out. Here, again, Esposito’s immunitary dynamic enters. Just as a community must balance openness with protective measures, the sphere of voices is marked by a constant balancing of communion and exclusion. Some voices dominate while others are marginalised or silenced; some groups claim immunity from listening to voices deemed outside their circle. In extreme cases, entire communities immunise themselves by enforcing silences (censorship) or by creating echo chambers that admit only familiar voices. These are examples of vox immunis impulses persisting within what is ostensibly a shared space.

The challenge, then, is to foster vox communis, a genuinely common voice, amid these tensions. Practically, this might mean cultivating conditions where diverse voices can be heard and respected (an ethical obligation of listening that complements the right of speaking). It also means recognising the ways technology and society mediate voice. As Sterne (2021) and other media theorists remind us, every voice is technologically mediated at least to the extent that language itself is a technology. Today, even more explicit mediations are common: microphones, smartphones, hearing aids, and online platforms all extend our voices’ reach. Far from making voice less personal, these mediations underscore how voice exists between bodies, in circuits of amplification and feedback that are collective. We carry on conversations not just in person but across radio waves, fibre-optic cables, and digital networks, disembodied in one sense, yet re-embodied in another as they vibrate new air in distant places. If the body is a common good, perhaps the myriad technological bodies (devices, networks) that carry our voices can be seen as part of an expanded communal body. This perspective will be vital as we consider contemporary voice technologies: rather than seeing them as alienating voice from body, we might see them as reallocating aspects of the vocal ‘body’ into a broader communal apparatus.

This reconceptualisation of voice as simultaneously individual and communal, as mine yet not mine alone, prepares us to examine how new technologies are altering the voicescape. If voice has always lived in the space between bodies, what happens when that space is filled with microphones, algorithms, and synthetic vocal agents? Do these innovations fulfil the promise of vox communis by extending voice to those who lack it and knitting new forms of community? Or do they intensify vox immunis by commodifying voices and enabling new forms of detachment and deception? Esposito’s biopolitics, with its perpetual tension of communitas and immunitas, offers a lens to interpret these developments. In the next section, we apply this lens to several cutting-edge examples of voice technology, showing how each both reflects and reshapes the communal structures of voice.

When Voice No Longer Guarantees Presence

Although it is not offered here as a direct account of voice, the Cartesian conception of the body can be seen as animating the early formulations of community in Esposito’s work, precisely because the dualistic separation of mind and body presupposes a purely individual, self-contained embodiment. This conception inevitably struggles to integrate the communal dimension of embodiment, leading to fundamental theoretical aporias. Initially, Esposito uses this dualistic starting point in a preparatory manner, with the aim of challenging empirical assumptions about individual embodiment and opening up a relational space of bodily communitas. This article extends that move to the question of voice, leaving it to subsequent analysis to reconsider voice as intrinsically communal. When Cartesian dualism is no longer taken literally, and embodiment is no longer confused with individualistic self-enclosure, embodiment acquires a genuinely affirmative and communal character, prompting different forms of questioning: whose body? which community? under what conditions? in what forms does embodiment realise its communal significance? Thus, Esposito’s inversion of Cartesianism means affirming bodily existence not as isolated self-presence but as inherently relational and communal. It implies defining the body affirmatively, as essentially shared, co-constructed, and interdependent. In doing so, Esposito follows a philosophical maxim analogous to Deleuze’s approach: the best way to engage philosophical tradition is not simply to repeat what it says, but to perform what it does, namely, to generate new concepts responding to evolving communal realities.

First, recent breakthroughs in neural interface technology have enabled what once would have seemed miraculous: giving voice to those who have lost the biological ability to speak (Canny et al., 2023; Stavisky, 2025). In one well-known case, researchers developed a BCI that translates neural signals from a paralysed individual directly into synthesised speech in real time (Willett et al., 2023). Patients with conditions like locked-in syndrome, fully conscious but unable to move or speak, have used such systems to communicate, dramatically improving their social integration and quality of life, findings consistently documented across recent studies (Chaudhary et al., 2021). From an Espositian perspective, this technology exemplifies a restoration of communitas through techne. The munus of voice, one’s obligation and gift to communicate with others, is fulfilled by an extension of the body via machines (Luo et al., 2022). The BCI essentially incorporates a machine into the patient’s embodiment, allowing neural intention to be shared as audible words, a phenomenon explored extensively in current research on hybrid biological-mechanical speech (Angrick et al., 2024). The voice here has a hybrid body: part biological (brain signals), part mechanical (the speech synthesiser). This blurring of the boundary between organism and apparatus calls to mind Donna Haraway’s cyborg metaphor, half organism, half machine, forging new kinships. Indeed, the patient using a BCI becomes a kind of vocal cyborg, their voice co-created by human and computer. Rather than seeing this as dehumanising, we can follow Haraway’s optimistic lens of ‘making kin’ across such boundaries. The machine becomes kin that helps the individual rejoin the human conversation, fostering community by augmenting the common good of voice, an optimistic view also reflected in studies advocating user-centred BCI design (Sankaran et al., 2023).

Under the rubric of Esposito’s thought, BCIs for speech challenge the strict immunitary notion of the body as closed and self-sufficient. The individual here must accept an external gift, technology, to regain voice, an ethical scenario actively discussed in contemporary bioethical research (Chandler et al., 2022). In doing so, they exemplify how dependence can be enabling: the shared network of human caregivers, researchers, and devices collectively supports the individual’s basic right (or duty) to speak (Chaudhary et al., 2021). This collective endeavour highlights that voice is never purely one’s own achievement; it is co-produced by a supportive community (in this case, including scientists and engineers acting out of a social obligation to aid the voiceless), a point increasingly emphasised in literature focusing on collaborative BCI development (Luo et al., 2022; Willett et al., 2023). There is a poignant symbolism in a paralysed person’s voice being generated by another’s invention: it literalises the idea that we give each other voice. The communitas of voice is broadened when someone long silenced can finally participate in dialogue: their thoughts now enter communal discourse, no longer locked inside. One might say the community takes on the munus of providing the means of speech, exemplifying a positive biopolitics that ‘vivifies’ the communal body by ensuring no member is excluded from the shared language (life’s political dimension) due to mere physiology.

Of course, there are also immunitary considerations at play. The reliance on technology introduces new vulnerabilities and forms of control, as discussed extensively in the literature on ethical, legal, and social implications of BCIs (Chandler et al., 2022). The apparatus could fail, or the software could hypothetically be manipulated, raising questions of trust: whose voice is it if a glitch intervenes? Furthermore, not everyone has access to such cutting-edge care; scarcity could create a class of ‘immunised’ individuals who get the technology versus those left voiceless (a new inequality), a risk identified as a significant challenge in recent research focused on user agency (Sankaran et al., 2023). Here the community faces an ethical obligation: if voice is a common good, should access to voice-restoring technology be considered a right? The example of BCIs suggests a hopeful direction in which technology strengthens communitas (by including the excluded), but it also reminds us that this inclusion must be managed justly to avoid simply creating new forms of immunity (privileged access, corporate ownership of devices, etc.). Esposito’s framework would urge that we treat these voice technologies not as private luxuries but as common commitments, shared investments in each other’s ability to participate in the communal discourse, a position implicitly supported by research advocating broader social integration and equitable access to assistive technologies (Chaudhary et al., 2021).

Second, beyond medical applications, voice technologies have pervaded everyday life in the form of digital voice assistants (Siri, Alexa, etc.), speech recognition systems, and AI-generated voices for various media (Kudina, 2021; Melzner et al., 2023). These innovations effectively disembody the human voice and re-embody it in silicon, enabling voices to act at a distance or even in the absence of a human speaker. Modern AI systems can decode human speech with high accuracy and even replicate a person’s voice after sampling a small audio clip, thanks to sophisticated machine learning models. This digitisation of voice has prompted legal and philosophical debates about what it means for one’s voice to be reproduced and used by others (or by machines), a point extensively explored in recent scholarship on identity and biometric imaginaries (Kang, 2022). In one sense, the proliferation of synthetic voices and voice-enabled AI reflects an expansion of the commons: voices now interpenetrate our technological environment, forming what we might call an ‘acoustic commonwealth’ of humans and machines. We speak to our devices and they speak back; human voices are archived in databases to train AI; automated voices make announcements in public spaces. The voice has arguably become ubiquitous and de-localised, flowing through networks that connect millions of people.

Esposito’s notion of the body as common good finds a parallel here: the voice is increasingly treated as a shared resource or interface. Some legal scholars argue for recognising rights over one’s voice print, but at the same time voice data is massively aggregated and used communally (for instance, to improve a speech recognition algorithm for all users, our individual voice samples are pooled – a form of collectivisation of vocal property). This creates a tension between communitas and immunitas. On one hand, voice technology fosters communitas by integrating our voices into broader communicative systems: we can converse with people across the world, access information by voice commands, and have our words instantly translated into other languages – all of which break down bodily and linguistic barriers, arguably enriching the community of communication. On the other hand, these technologies necessitate new forms of immunity to protect individual interests: for example, laws or safeguards to prevent unauthorised cloning of someone’s voice (a very real concern as AI voice cloning becomes easier). Researchers highlight significant privacy and security concerns arising from widespread use of personal voice assistants, underscoring these immunitary needs (Cheng and Roedig, 2022). There is a growing recognition that a person’s voice is part of their identity, and using it without consent is a violation – hence discussions of ‘voice rights’ akin to image rights. The voice, in entering the digital commons, paradoxically must be shielded by immunitary measures (privacy regulations, watermarking of AI voices, etc.) to ensure trust and agency. We see here what Esposito would call the double-edged sword of biopolitics: the empowerment provided by new technologies comes with new risks that must be managed by a community to avoid self-destruction.

One illustrative concept is the idea of the voice becoming a ‘legal subject’. As voice clones and AI assistants take on roles in transactions (imagine an AI voice signing a contract or giving testimony), questions arise about accountability and personhood. If an AI speaks with my exact voice, is it legally me speaking? Should that synthesised voice be afforded credibility, or do we treat it with suspicion? Scholars like Bettina Minder et al. have proposed frameworks for how voice technologies might be governed within households and societies, emphasising that we will likely need new legal and social norms for this era. For example, one proposal is that synthesised voices should be transparently labelled, to uphold an immunitary defence against deception, while also ensuring that those who rely on synthetic speech (say, users of text-to-speech devices) are not unfairly discriminated against. From Esposito’s viewpoint, this is a balancing act: integrating AI voices into the communitas of daily life (‘speaking’ with us and for us) while crafting immunitas in the form of protections and boundaries (so that the communal space of voices is not overrun with fraud or coercion). Indeed, research has already begun exploring how familiarity in synthesised voices (such as family and friends) might influence user trust and engagement, suggesting both potential benefits and vulnerabilities (Chan et al., 2021; Poushneh, 2021). Concerns also remain regarding gender representation and inclusivity in the design and personification of voice assistants, highlighting yet another ethical dimension of communal and immunitary tensions (Malodia et al., 2024; Rincón et al., 2021). Ultimately, the presence of ubiquitous digital voices forces us to re-evaluate what ‘having a voice’ means. It is no longer solely a bodily attribute; it can be a software service, a cloud-based asset. 
But if the voice is part of our shared social body, then even these digital instances of voice must be woven into our communal ethics. We might say the munus of voice now extends to stewarding the digital voice commons, ensuring that the expanding chorus of human and AI voices serves the community and not just narrow interests.

Third, and perhaps most dramatically, the confrontation of voice and body arrives with deepfake audio, where artificial intelligence is used to produce uncanny imitations of real persons’ voices (Gregory, 2021; Guerouaou et al., 2021). Unlike helpful voice assistants or empowering BCIs, deepfakes often hit the news as a tool of deception: for instance, scammers mimicking a CEO’s voice to fraudulently authorise a bank transfer, or fabricated audio of a politician’s ‘speech’ spreading misinformation (Cavedon-Taylor, 2024). This phenomenon intensifies what we might call the illusory dimension of voice. If a voice can be completely detached from any authentic source and fabricated at will, we face the unsettling possibility that hearing a familiar voice is no guarantee that the person is truly present or ever uttered those words, a scenario described by scholars as potentially ushering in an ‘epistemic apocalypse’ (Habgood-Coote, 2023). In terms of our guiding question, deepfakes make the voice’s lack of a stable body horrifyingly clear: here, the voice has no body of its own at all, only a simulacrum of embodiment. It is a ventriloquist’s fantasy on a massive scale, severing the age-old link between vocal sound and the speaker’s physical truth.

From the perspective of Esposito’s communitas/immunitas, deepfakes represent an extreme threat to the communal trust that binds societies. A healthy community relies on a basic level of good faith that voices correspond to honest presence or intention. The munus of communication carries an implicit ethical obligation: we generally expect that people present themselves truthfully in voice, and we in turn owe them our attentive, if critical, listening. Deepfake audio upends this contract by weaponising the gift of voice as a tool for manipulation. It exploits the communal faith in voice’s authenticity, thereby violating the shared norms of discourse. Experiments confirm this threat, showing that voice-based deepfakes can significantly influence trust even when listeners are informed of their AI-generated nature (Schanke et al., 2024). In response, we see a rapidly growing immunitary reaction: new technologies to detect deepfakes, legal proposals to criminalise certain uses, and a healthy scepticism among the public about accepting audio evidence at face value. Scholars have proposed developing an ‘authenticity infrastructure’ precisely to manage these threats and protect vulnerable contexts like journalism and human rights reporting (Boháček and Farid, 2022; Gregory, 2021, 2023). We are, in effect, inoculating ourselves against the harmful effects of fake voices. But such inoculation comes at a cost: it can breed a general atmosphere of doubt, eroding the resonance and openness that characterise vox communis. If everyone starts to suspect that any voice could be a fake, the very power of voice to connect and persuade is diminished. This is a classic immunitary paradox: the community is protected from deception only by partly disabling its communal medium of trust. It calls to mind Esposito’s warning that too much immunity can negate communality.

And yet, the deepfake dilemma might also galvanise a renewed appreciation for authentic voice and the moral obligations of using voice technology responsibly. There is a growing consensus that ethical standards and communal agreements are needed to govern the use of synthetic media. For example, media organisations and tech companies are beginning to watermark AI-generated audio or to agree on disclosure practices, which are attempts to rebuild a shared ethical framework (a communitarian response) rather than leaving individuals alone to fend off fakes. Researchers caution, however, that these protective measures may inadvertently exacerbate surveillance concerns or inequities, suggesting careful implementation to avoid unintended consequences (Gregory, 2021). In an Espositian sense, the community is confronted with the munus of truth-telling in a new form: we collectively owe it to each other to not abuse the power of voice imitation, and to support systems of verification that uphold the integrity of voice as a vehicle of meaning. We might say the communal body is developing new ‘antibodies’ (social norms, detection algorithms) to preserve its vocal integrity. The process is ongoing, but one can cautiously hope that society will adapt without abandoning the openness that makes voice a vibrant communal force. After all, even a perfect deepfake cannot fully replicate the relational context of an honest human voice speaking to those who know and love that person. The community of listeners, armed with new vigilance, may still discern the difference, reasserting that a voice ultimately ‘has a body’ in the form of lived relationships and reputations that fakes cannot easily steal, a perspective echoed in scholarly discussions emphasising the need to rethink social norms and epistemic practices around synthetic media (Cover, 2022; Habgood-Coote, 2023).

Perhaps the tension emerges precisely from our evolving relation to voice itself, once tethered exclusively to bodily presence, now dispersed across technological surfaces and digital representations. As voices move freely through digital networks, we risk becoming overly attached to the ease and convenience offered by disembodied voices, seduced by their ubiquity and fluidity, as if voice alone can fulfil our desire for connection and recognition. Yet this contrasts sharply with the embodied voice we encounter in intimate conversations, voices that ground us not merely in hearing words, but in sensing the relational context and ethical presence of the speaker. In these encounters, the voice carries a unique affective charge; we feel not only through listening but through a deeper perception that connects voice to the embodied integrity and sincerity of the speaker. The allure of synthetic voices, endlessly available and malleable, tempts us towards a kind of detached consumption of vocal presence, a detached enjoyment divorced from the ethical grounding of genuine discourse. Yet, while digital culture clearly thrives on this proliferation and play of voices, embracing novelty, convenience, and even illusion, there remains a critical awareness of its dangers. Thus, the communal imperative becomes our capacity to listen and not listen, trust and suspect, moving fluidly between welcoming technological innovation and preserving the essential ethical fabric that makes communal dialogue meaningful and coherent.

Voice, Ethics, and the Conditions of Inclusion

Rather than emerging as a doctrine developed by Esposito himself, the notion of Vox Munus can be introduced here as a way of rethinking the relationship between individual bodies and communal existence, precisely because the question ‘Does the voice have a body?’ prejudges the voice as inherently singular, only later confronting its fundamentally collective nature. This initial formulation leads the inquiry into productive complexities. In this sense, the early framing of voice developed in this article, drawing on Esposito’s concepts, might be considered preparatory, its primary goal being to silence reductively individualistic interpretations and open up a broader communal region of voice as a shared resource, leaving its ethical and political implications to be elaborated subsequently. When this communal conception of voice ceases to be viewed merely figuratively, when it moves beyond metaphor towards ethical realisation, it acquires a serious and affirmative character, raising other forms of critical questioning. Whose voices matter most urgently? How might technology redistribute vocal power? Under what conditions can technology genuinely enhance communal life without eroding individual autonomy? In what ways might technological mediation of voice preserve or threaten authenticity and dignity?

What, then, is a voice when it does not belong entirely to the one who speaks? How does a voice take form across the practices and relations through which it is heard, answered, and taken up by others? To ask such questions of voice is, in a communal frame, to advance a proposition that places demands on how voice is understood. Yet, unlike approaches that treat vocal essence as fixed or self-contained, the question turns instead to voice as a plural, empirical event, as something taking shape through shared practices of speaking and listening. In this case, the proposition engages the conditions under which vocal essence appears as relational and continually reconstituted through communal exchange.

Indeed, Esposito’s analysis suggests that if one surveys philosophical and ethical traditions around voice, one searches largely in vain for satisfaction with simplistic or reductive individualism. When contemporary philosophers or disability activists confront the question ‘Whose voice matters?’ or ‘How is vocal agency ethically mediated?’ they do so precisely because more basic questions like ‘What is voice?’ obscure complex realities of social exclusion, technological empowerment, and ethical responsibility. Technologies that create synthetic voices for speech-impaired individuals, for example, illustrate the need for questioning not just ‘What is voice?’ but ‘Who should have voice and under what conditions?’ Ethical reflection thus becomes indispensable, since merely technical answers risk erasing crucial individual and social contexts.

Furthermore, Esposito’s approach extends into a wider field of ethical and political concerns surrounding voice technologies: economic arrangements shaping access and control; corporate ownership structures governing platforms and data; surveillance infrastructures capable of tracking, storing, and repurposing vocal expression; and design regimes that variously include or exclude disabled users and other marginalised groups. In some cases, these dynamics appear as questions of benefit and innovation; in others, they register as vulnerabilities, as conditions under which vocal autonomy becomes exposed to proprietary constraint or forms of monitoring that exceed individual control. Within this field, Esposito’s account of immunitary excess names a recurrent pattern in which protective or stabilising systems intensify into restrictive ones, redirecting attention towards practices such as open-source development, public-good frameworks, and participatory design processes that redistribute control over vocal technologies.

In this sense, vox communis also takes shape across practices of listening as well as speaking: modes of attention, amplification, and response through which voices are received, circulated, and sustained. These practices connect to broader efforts to support marginalised speakers, to cultivate inclusive communicative environments, and to develop capacities for shared engagement across difference. Here, Esposito’s framework resonates beyond its initial formulation, entering into ongoing discussions of technological mediation, ethical responsibility, and collective life, and continuing to generate new directions for thinking about voice as a relational and evolving condition.

Conclusion

An assessment of the concept of Vox Munus developed in this article (which intersects profoundly with discussions of community and biopolitics) lies beyond the immediate scope of this inquiry. It is initially through Esposito’s account of communitas and munus, as extended here to the question of voice, that we can pose the central question about the body’s relationship to voice, ‘Does the voice have a body?’, and indicate the significance of rethinking voice in terms of collective embodiment. However, there is a broader context and evolution to consider. After the introduction of a communal conception of voice grounded in Esposito’s framework, the exploration shifts progressively towards the ethical implications of technological mediation and synthetic voices. In this evolution, the individual bodily origin of voice becomes less important than the assemblage of listeners and speakers who actualise its meaning collectively. On the one hand, reading Esposito’s thought through the lens of voice suggests that voices never truly ‘belong’ exclusively to single, isolated bodies. Rather, they find their authentic realisation in communal exchanges and practices of listening, positioning voice as a shared resource rather than an individual possession. Thus, by extending Esposito’s concepts of community and obligation to vocal practice, this article challenges traditional individualistic conceptions of voice, giving way to the concept of communal assemblage, where voice is properly understood as a dynamic actualisation through collective participation.

On the other hand, Esposito does not overly romanticise the communal aspect, unlike some theorists who idealise pre-technological forms of community or voice. Read through the lens of voice, his framework acknowledges the complexities introduced by synthetic and digital voices, highlighting their potential to both empower marginalised speakers and pose significant ethical challenges. In Esposito’s framework, the emphasis is on negotiating a careful balance, between openness and security, inclusion and authenticity, in order to sustain an ethical communal voice. This negotiation, therefore, functions not as nostalgia for a lost original unity, but as a proactive, ethical stance that continuously reimagines community and voice amid technological transformation. Finally, one might suggest that as the concept of Vox Munus, built here on Esposito’s vocabulary, finds resonance beyond his immediate writings, it invites others to further develop and adapt this communal approach to voice in diverse directions, responding to new technological realities and ethical imperatives. Concepts thus demonstrate their own autonomy and evolution, moving beyond their original contexts into broader discourses and practices.

Mickey Vallee

Mickey Vallee is Canada Research Chair (Tier 2) in Sound Studies and Professor of Interdisciplinary Studies at Athabasca University. His research examines voice, listening, and the role of sound in shaping organizational life, culture, and collective experience.

References

Angrick Miguel, Luo Shiyu, Rabbani Qinwan, et al. (2024) Online speech synthesis using a chronically implanted brain–computer interface in an individual with ALS. Scientific Reports 14(1): 9617.

Arendt Hannah (1998 [1958]) The Human Condition (2nd edn). Chicago, IL: University of Chicago Press.

Barrett G Douglas (2021) ‘How we were never posthuman’: technologies of the embodied voice in Pamela Z’s Voci. Twentieth-Century Music 19(1): 3–27.

Bijsterveld Karin, Cleophas Eefje, Krebs Stefan, et al. (2014) Sound and Safe: A History of Listening behind the Wheel. Oxford: Oxford University Press.

Boháček Matyáš, Farid Hany (2022) Protecting world leaders against deep fakes using facial, gestural, and vocal mannerisms. Proceedings of the National Academy of Sciences 119(48): e2216035119.

Bull Michael (2000) Sounding Out the City: Personal Stereos and the Management of Everyday Life. New York: Berg.

Canny Evan, Vansteensel Mariska J, van der Salm Sandra MA, et al. (2023) Boosting brain-computer interfaces with functional electrical stimulation: potential applications in people with locked-in syndrome. Journal of NeuroEngineering and Rehabilitation 20(1): 157.

Cavarero Adriana (2005) For More Than One Voice: Toward a Philosophy of Vocal Expression. Redwood City, CA: Stanford University Press.

Cavedon-Taylor Dan (2024) Deepfakes: a survey and introduction to the topical collection. Synthese 204(1): 14.

Chan Samantha WT, Gunasekaran Tamil Selvan, Pai Yun Suen, et al. (2021) KinVoices: using voices of friends and family in voice interfaces. Proceedings of the ACM on Human-Computer Interaction 5(CSCW2): 1–25.

Chandler Jennifer A, Van der Loos Kiah I, Boehnke Susan, et al. (2022) Brain computer interfaces and communication disabilities: ethical, legal, and social aspects of decoding speech from the brain. Frontiers in Human Neuroscience 16: 841035.

Chaudhary Ujwal, Chander Bankim Subhash, Ohry Avi, et al. (2021) Brain computer interfaces for assisted communication in paralysis and quality of life. International Journal of Neural Systems 31(11): 2130003.

Cheng Peng, Roedig Utz (2022) Personal voice assistant security and privacy: a survey. Proceedings of the IEEE 110(4): 476–507.

Cover Rob (2022) Deepfake culture: the emergence of audio-video deception as an object of social anxiety and regulation. Continuum 36(4): 609–621.

de Rancourt-Raymond Audrey, Smaili Nadia (2022) The unethical use of deepfakes. Journal of Financial Crime 30(4): 1066–1077.

Descartes René (1998) Discourse on Method and Meditations on First Philosophy (trans. Cress; 4th edn; Original work published 1637 and 1641). Indianapolis, IN: Hackett Publishing.

Durkheim Émile (1997 [1893]) The Division of Labor in Society (trans. Halls). New York: Free Press.

Esposito Roberto (2008) Bios: Biopolitics and Philosophy (trans. Campbell). Minneapolis, MN: University of Minnesota Press.

Esposito Roberto (2010) Communitas: The Origin and Destiny of Community. Redwood City, CA: Stanford University Press.

Esposito Roberto (2011) Immunitas: The Protection and Negation of Life (trans. Hanafi). Cambridge: Polity Press.

Esposito Roberto (2013a) Community, immunity, biopolitics (trans. Hanafi). Angelaki: Journal of the Theoretical Humanities 18(3): 83–90.

Esposito Roberto (2013b) Terms of the Political: Community, Immunity, Biopolitics. New York: Fordham University Press.

Foucault Michel (2003 [1997]) ‘Society Must Be Defended’: Lectures at the Collège de France, 1975-1976 (trans. Macey). London: Picador.

Goering Sara, Klein Eran, Sullivan Laura Specker, et al. (2021) Recommendations for responsible development and application of neurotechnologies. Neuroethics 14(3): 365–386.

Gregory Sam (2021) Deepfakes, misinformation and disinformation and authenticity infrastructure responses: impacts on frontline witnessing, distant witnessing, and civic journalism. Journalism 23(3): 708–729.

Gregory Sam (2023) Fortify the truth: how to defend human rights in an age of deepfakes and generative AI. Journal of Human Rights Practice 15(3): 702–714.

Guerouaou Nadia, Vaiva Guillaume, Aucouturier Jean-Julien (2021) The shallow of your smile: the ethics of expressive vocal deep-fakes. Philosophical Transactions of the Royal Society B 377(1841): 20210083.

Gutierrez Ivan (2024) The auditory dimension of the technologically mediated self. Open Philosophy 7: 1–13.

Habgood-Coote Joshua (2023) Deepfakes and the epistemic apocalypse. Synthese 201(3): 103.

Hobbes Thomas (1996 [1651]) Leviathan (ed. Tuck). Cambridge: Cambridge University Press.

Kang Edward B (2022) Biometric imaginaries: formatting voice, body, identity to data. Social Studies of Science 52(4): 581–602.

Kudina Olya (2021) ‘Alexa, who am I?’ Voice assistants and hermeneutic lemniscate as technologically mediated sense-making. Human Studies 44(2): 233–253.

Locke John (1980 [1689]) Second Treatise of Government (ed. Macpherson). Indianapolis, IN: Hackett Publishing.

Luo Shiyu, Rabbani Qinwan, Crone Nathan E (2022) Brain-computer interface: applications to speech decoding and synthesis to augment communication. Neurotherapeutics 19(1): 263–273.

Malodia Suresh, Islam Nazrul, Kaur Puneet, et al. (2024) Why do people use artificial intelligence (AI)-enabled voice assistants? IEEE Transactions on Engineering Management 71: 491–505.

Mauss Marcel (1990 [1925]) The Gift: The Form and Reason for Exchange in Archaic Societies (trans. Halls). New York: W. W. Norton & Company.

Melzner Johann, Bonezzi Andrea, Meyvis Tom (2023) Information disclosure in the era of voice technology. Journal of Marketing 87(4): 491–509.

Merleau-Ponty Maurice (1968 [1964]) The Visible and the Invisible (trans. Lingis). Evanston, IL: Northwestern University Press.

Millière Raphaël (2022) Deep learning and synthetic media. Synthese 200(3): 231.

Nancy Jean-Luc (1991 [1986]) The Inoperative Community (trans. Connor). Minneapolis, MN: University of Minnesota Press.

Poushneh Atieh (2021) Humanizing voice assistant: the impact of voice assistant personality on consumers’ attitudes and behaviors. Journal of Retailing and Consumer Services 58: 102283.

Rincón Cami, Keyes, Cath Corinne (2021) Speaking from experience. Proceedings of the ACM on Human-Computer Interaction 5(CSCW1): 1–27.

Rosa Hartmut (2019) Resonance: A Sociology of Our Relationship to the World. Cambridge: Polity Press.

Rosa Hartmut (2023a) The Uncontrollability of the World (trans. Wagner). Cambridge: Polity Press.

Rosa Hartmut (2023b) Resonance as a medio-passive, emancipatory and transformative power: a reply to my critics. Journal of Chinese Sociology 10: 16.

Sankaran Narayan, Moses David, Chiong Winston, et al. (2023) Recommendations for promoting user agency in the design of speech neuroprostheses. Frontiers in Human Neuroscience 17: 1298129.

Schanke Scott, Burtch Gordon, Ray Gautam (2024) Digital Lyrebirds: experimental evidence that voice-based deep fakes influence trust. Management Science. Epub ahead of print 1 November. DOI: 10.1287/mnsc.2022.03316.

Spinoza Benedict de (1992 [1677]) Ethics, Treatise on the Emendation of the Intellect, and Selected Letters (trans. Shirley). Indianapolis, IN: Hackett Publishing.

Stavisky Sergey D (2025) Restoring speech using brain-computer interfaces. Annual Review of Biomedical Engineering 27(1): 29–54.

Sterne Jonathan (2021) Diminished Faculties: A Political Phenomenology of Impairment. Durham, NC: Duke University Press.

Turner Victor W (1969) The Ritual Process: Structure and Anti-Structure. New Brunswick, NJ: Aldine Transaction.

Weber Heike (2010) Head cocoons: a sensori-social history of earphone use in West Germany, 1950-2010. The Senses and Society 5(3): 339–363.

Willett Francis R, Kunz Erin M, Fan Chaofei, et al. (2023) A high-performance speech neuroprosthesis. Nature 620(7976): 1031–1036.

Williams Tom, Matuszek Cynthia, Jokinen Kristiina, et al. (2023) Voice in the machine: ethical considerations for language-capable robots. Communications of the ACM 66(8): 20–23.