Abstract
Intermingling new and old media, this article introduces the concept of phonographic theatricality to explore the performativity of human-machine vocality. It jointly discusses the theatricality of historical and new sound media: media principles that characterized the phonograph in its emergence are still evident in the speech of contemporary AI-voice agents. Phonographic theatricality is a cross-media concept describing the conditions, playfulness, contingencies, and implications of the actualization of pre-recorded voices. It covers a broad range of cases in which a machine injects a human sounding voice into a situation, usually acousmatic in its detachment from a human body, yet performing as a theatrical agent. The analysis demonstrates how sound media theatricalize truthfulness, fidelity, illusion, and trickery as part of their mediation of voices, and how humans and machines mingle in producing phonographic theatricality. Three historical dramas, as well as contemporaneous newspaper articles, caricatures and ads, unveil the rise of phonographic theatricality as a media principle, inviting and affording a discussion on the performativity of AI voices. When machines speak and laugh in human voices, they become actors, vocally reshaping the space in which they operate, charging it with theatricality.
In March 2018 the New York Times reported that users of Alexa, Amazon’s voice assistant, were startled by its erratic behavior, as it was ‘letting out an eerie laugh’ (Chokshi, 2018). Anxious users reported that these laughing bursts were unprompted and unrelated to any detectable reason (Liao, 2018). Some argued it started independently interrupting conversations; others described its laughter as uncannily resembling that of an actual person laughing near them; one even claimed Alexa spontaneously began listing cemeteries in their vicinity. Alexa deviated from script, and users rationalized this as technical malfunction, or as the accidental unmasking of Amazon listening to their domesticity; others hypothesized about the autonomy of the machine, or speculated about lurking ghosts animating the inanimate. Amazon fixed the problem by tightening the correlation between prompts and their outcomes, but some questions remain: Why was Alexa, an AI agent devised for efficiency, encoded with a laughing voice in the first place? How does this feature shape the relationality between generating AI voices and listening to them? What kind of audial human-machine dynamics does this phenomenon manifest?
Virtual assistants such as Alexa, as well as other platforms using text-to-speech applications, such as TikTok or OpenAI, are predesigned with a persona, programmed to impersonate a human-like entity. In their dependency on algorithmic manipulations of pre-recorded voices, these technologies enmesh human and mechanical elements and may produce uneasiness but also fascination. Humor generates an amusing atmosphere, which might alleviate the uncanniness of a speaking machine, or enhance it, and generally contributes to the liveliness of these synthetic personas. Alongside other characteristics such as human names (Amazon’s Alexa, TikTok’s Jessi) or distinctive voice tint (authoritative, jumpy, snarky), laughter injects elements associated with humanness into Alexa’s algorithmic act. Scripted expressions including sarcastic answers and cynical jokes work towards the same aim (Natale, 2021).
By way of impersonation, Alexa continues age-old imaginations of media as animated: it is covertly dubbed by professional voice actors who create the illusion of liveness; it repurposes these disembodied voices to perform a quasi-independent entity; these nourish its plausible perception as haunted and governed by the occult (Sconce, 2000). No wonder Alexa’s users were disturbed by its unexpected and uncontrolled bursts of laughter; they either perceived it as a technical feature gone wild, looked for the human within the machine, or explained its behavior as a supernatural curiosity. This ambivalence, we suggest, is an outcome of the sonic reshaping of a given space by replayed mechanical voices, a phenomenon we explain in terms of theatricality. The theatrical occurs when someone enacts in front of someone else (Bentley, 1964). When machines speak and laugh in human voices, they become actors, vocally reshaping the space in which they operate, charging it with theatricality.
Theatricality, we argue, is a core principle in all media, affecting multi-sensory experiences. The theatricality of phonographic media, from the phonograph and gramophones to contemporary AI-voice agents, issues from their ability to inscribe, archive, and replay voices, thus dramatizing tensions between presence and absence. The concept of theatricality affords a discussion on the ways phonographic media sonically manipulate spaces by replaying voices, imbuing them with role-playing, impersonation, and identity multiplicities. Theatricality is a dynamic cultural quality recurring in different media operations, and interconnecting previous and contemporary sound technologies by highlighting a particularly understudied aspect of their media-logic. Each medium is theatrical in its own way, summoning idiosyncratic theatrical experiences. To explain the unique ways by which all sound media are potentially theatrical, we coin the term phonographic theatricality. Phonographic theatricality is a cross-media concept describing the conditions, playfulness, contingencies and implications of the actualization of pre-recorded voices. It covers a broad range of cases in which a machine injects a human sounding voice into a situation, usually acousmatic in its detachment from a human body, yet performing as a theatrical agent. Phonographic theatricality highlights the enmeshment of the aforementioned human dramaturgical qualities with non-human features such as automaticity or replication, resulting in beyond-human horizons for voicing voices. As a methodology, it highlights the in-betweenness of the theatrical qualities, situated in the relational dynamics between listeners, sound technologies, and their designers.
This paper traces the roots of phonographic theatricality in historical dramas and newspaper articles as well as caricatures and advertisements. These reveal the imaginaries of phonography when it was still a novelty, and point at the cultural impact of phonographic theatricality outside theater and in the context of everyday life. Phonography introduced new dramaturgies to theater, while theater highlighted the theatricality of replayed voices. This encounter mixed phonic truth, fidelity, and realism, with illusion, magic, and imitation. In what follows, we will first define theatricality in terms of media principles, then examine the crystallization of phonographic theatricality as manifested in three plays. As a media concept, phonographic theatricality advances our understanding of how mechanical voices function in culture. It points at similarities between phonographic moments separated by more than a century: aspects of phonographic theatricality and its imaginaries which were true at the end of the nineteenth century are still relevant, mutatis mutandis, in contemporary society and its algorithmic-ruled voice technologies. The theatricality of other phonographic media, like tape recorders or podcasts, follows the basic principles we discuss. We conclude by bringing together these principles as they manifest in contemporary AI-voice applications.
Theatricality as a media principle
Theatricality is woven into the fabric of social thought: from Erving Goffman’s (1971) dramaturgical model conceptualizing the social drama of everyday life as composed of masking, ‘front-’ and ‘back-stage’; through Butler’s (1990) focus on drag and masquerade as the main technique for gender roleplay and impersonation; continuing with the fluctuation of power structures and social hierarchies as played by actants in Latour’s (1996) Actor-Network Theory. Recently, research has explored the theatricality embedded in the computer and AI industries (Laurel, 1993; Nagy and Neff, 2024; Timplalexi and Rizopoulos, 2024). We further analyze the media qualities of theatricality and the ways in which it is actualized in vocal techniques, jointly created by humans and non-humans. Phonographic theatricality pairs theater studies with critical analyses of past and present media, and enriches voice studies with nuanced sensitivities to multivocalities, ambiguities, and human-machine hybridities.
Research addresses theatricality as a relational artistic concept, which defines the embodied- and time-space manipulations occurring during a performance, and describes social dynamics between interlocutors (Feral, 1982). Theatricality frames role-playing, illusion, and masquerade, as well as superficiality, deception, and frivolousness, within one notion. It is packed with contradictory and complementary meanings, and as such it captures the elusiveness of human experiences in exaggeratedly dramatic situations (Fischer-Lichte, 1995: 88). An obscure term, it marks the attempt to grasp the parallel perceptual planes and communicative channels simultaneously infrastructuring the performance. It complicates our understanding of converging and not necessarily synchronic components – actors, roles, props, scenery, voices and sounds, dramatic scripts and more. Studying theatricality as derived from particular media allows concretizing its affect against its theoretical obscurity, and particularizing the impacts of the performativity of mechanical voices.
Phonographic theatricality conceptualizes the reshuffling of assumed associations between voices and bodies, and humans and machines, highlighting the intentional multiplicity embedded in the mediated performance. As a media principle, it emerged with the phonograph, which introduced mechanical, disembodied human voices by recording and replaying them; subsequent sound media, from the gramophone and radio to podcasts, Siri or Alexa, transfigure these phonographic body-voice associations, with each medium reproducing slightly different configurations of this quality. What, then, are these theatrical qualities that the phonograph introduced? How is the theatricality of contemporary voice media informed by that of early phonography?
Phonographic theatricality specifies theatrical tensions generated by media entanglements of replayed recorded voices. As an analytical term, it brings together the phonograph and the theater, as two media committed to activating vocal potentials, albeit in different ways: on stage, actors perform prewritten and rehearsed texts, whereas the phonograph resounds pre-recorded sounds inscribed in its grooves. Without this activation, both the drama and the record are just sonic potentials awaiting performance. Additionally, both the phonograph and the theater have visual sides that enhance theatrical tensions between the sonic and the visual.
Theatricality and phonography are bonded in a multifaceted connection. Theatricality mobilizes audiences between concrete and imaginary times and places, while they physically remain in the theater hall. Similarly, the phonograph opens sonic dimensions additional to the happenings on stage, thus expanding the theatrical experience via auditory channels. This experience also submits the audience to time manipulations and the branching of parallel, dependent, and interlacing timelines: the actual time that the audience spends in the theater serves as a base for grappling with a variety of fictional times of the performance. Similar time-plays characterize the phonographic operation that both haunts time and is haunted by time: its media-logic necessarily employs a latency derived from the gap between the time of the recording and the time of the replay; its activation brings sounds from another time and is bound to repeat a sonic event that has already happened and faded. Phonographs additionally offer several time manipulating functions, such as repeat, fast/slow-play, stops and jumps, rewind and play backward. On stage, as famously demonstrated in Beckett’s Krapp’s Last Tape, these may impose theatrical illusions or unveil them as serving avant-garde disenchantments (Connor, 2014).
Playing with theatrical disguise and illusion, as well as with their exposure, is also maintained through inter- and intra-changeability of characters and actors. Voices serve for both masquerading and unmasking identities, and for delivering information supporting or confusing the visual channel. Similarly, the phonograph resonates multiple voices, only limited by the records it plays. In performance it may generate the illusion of aliveness emerging from the ‘dead’ machine. For this reason, it has been studied in terms of ventriloquism: like the theatrical practice of embroiling voices and bodies for animating the inanimate (e.g., a puppet), the phonograph replays the voices of others, enlivening dormant acoustic entities. This ventriloquistic act is part of a broader category of media ventriloquism (Baron et al., 2021; Cooren, 2010; Ramati and Abeliovich, 2024) which already defines body-voice entanglements as theatrical and mediated. On stage or behind the scenes, the phonograph operates as a theatrical tool.
Respectively, it fulfills the roles of director, prompter, and actor. Starting with the actor: the phonograph epitomizes some known and inter-related dramatic archetypes, among them the messenger, who is occasionally an immigrant, delivering a message from afar; the witness, informer, eavesdropper, and spy, all of whom report on what they saw or (over)heard; the thief, who transfers items or information from one location to another (typically to his possession); and the fool, who also repeats others’ sayings, mostly without being asked to, and while ridiculing their phrases. Like these types, the phonograph sounds pre-recorded vocal information, delivering it from another time and place; it ‘eavesdrops’ and then repeats others’ sayings, testifying to them, while ‘stealing’ their voices and producing the illusion of inherent vocal aliveness; its exact but also stochastic repetition may inject an eerie atmosphere into the stage or provide comic relief. It occasionally vocalizes an omniscient but absent or extradiegetic entity, associated with the Divine. All these roles serve overlapping theatrical features, embodied in one machine-actor.
Such capacities are the on-stage manifestations of the ways in which the phonograph integrates into the theatrical apparatus, derived from the interrelations between the dramatic text and its performance. Two off-stage characters are associated with this dynamic: the director who realizes the fictional creation of the playwright, and the prompter who extends the authority of the director by repeating their wordings and instructions during the performance. Both stand between the playtext (and its author) and the performance (the actors); they orchestrate the production from behind the scenes, the director in rehearsals, the prompter in real time. The prompter is an intriguing vocal agent acting from the background, heard only by the actors on stage but not by the audience. The prompter is required to listen attentively during the performance, conducting its pace and conveying the dramatic lines to the actors on time. The prompter is a crucial cog in the array that acoustically shapes the theatrical space.
The phonograph reorients the basic theatrical situation, repurposing and enriching its fundamentals: it records sounds and then reenacts them. Sound media, specifically the ones both recording and replaying sounds, similarly act as prompters when they intermingle in everyday situations: they ‘listen’ when they record voices, and when they replay them, they may manipulate their tone, timbre, pace, and rhythm; they acoustically reshape and stimulate their surroundings, blending or interfering in aural events. The theatricality of a computerized voice is a contemporary lived experience, but the same theatrical features are evident in earlier phonographic media. A phonograph playing cantorial music in a room changes its acoustic atmosphere; a radio broadcasting hourly news injects the outside world into the domestic sphere. Such technologies audibly interweave theatricality into everyday situations, transforming both the time and space in which they operate. They integrate as part of the soundtrack of everyday life, orchestrating movements and actions: in airports, when an acousmatic voice commands all manner of stage directions such as the closing or opening of gates, rushing the last passengers to board, or announcing security and other informative instructions; or on public transportation, when a recorded voice of a human voice-professional narrates the journey of a bus or train. Theatricality is part and parcel of the mediation of sound technologies.
Similarly mingling everyday practicalities with artistic expressivity, the theatricality of AI-voice agents depends on blackboxed operations, directing their acoustic front from ‘backstage’, mixing and manifesting various theatrical roles: the actor, director, prompter, and audience. To imitate human-to-human communicative practices, they depend on algorithmic networks that process and repurpose pre-recorded human speech and re-concatenate phonemes and syllables ventriloquistically (Ramati, 2024). Their theatrical act is conversational, bidirectional, and multilayered, deepening users’ participation in their operation. First, in their replies, they compel human feedback and recursive engagements, prompting users to specify their searches and queries. Sometimes users ask them silly questions, as human-machine boundary work that tests the limits of their human-like mechanical personas (Lind and Dickel, 2024). Their predesigned monotonous voice creates a comic tension between the taunting questions and the flat sounds of their answers. In other words, playing with Siri’s ‘mind’ by asking it ‘who is your mother?’ reflects a theatricality that moves humans to action, causing them to check the limits of the machine, thus reconstituting and reassuring themselves as humans and the machine as machine.
Second, for their function of retrieving information from the web, they must eavesdrop, silently anticipating users’ prompts, ready to answer and serve. The persona of a personal assistant is a vocal theater mask, camouflaging an inanimate object. Lurking and listening quietly in the room before suddenly vocally awakening as pseudo-human actors, they spy on users, surveilling and gathering data, and informing tech giants about their intel (Natale, 2021). Under dubious contracts, they take private information, transforming it into tradable commodities, trafficked into the possession of these corporations (Neville, 2020). In their phonographic theatricality, Siri and Alexa accompany everyday domestic life but also act as agents of espionage (Hepp, 2020; Mascheroni, 2024; Szendy, 2017); they are prompted by their users but they also prompt them from backstage; they generate an illusion of a dialogue with a sentient entity which is based on their silent constant presence as animated vessels. Ventriloquistically reassociating seemingly acousmatic voices to a mechanical container, AI agents culminate and reshuffle earlier phonographic theatricalities. Returning therefore to the first decades following the invention of the phonograph, when theater utilized the machine on stage, may reveal the primal scene out of which phonographic theatricality emerged as a media principle.
Theatrical truth and fidelity
Soon after the invention of the phonograph, theater began experimenting with recorded sounds, upgrading convoluted melodramatic plots with the cutting-edge machine. Its aesthetic potentials were exploited for intermingling recorded voices within live staged performances. For example, Eden Phillpotts’ A Platonic Attachment (1889) turned the phonograph ‘to dramatic account’, advancing the plot by exposing private conversations, mischief, and forgery (The Aberdeen Journal, 26 December, 1888, 3). In this play, a phonographic recording of a confession made in the gentlemen’s club unveils the true intentions of the protagonist, informs the other characters of his motivations, and saves the leading lady from the con artist. By eavesdropping on a voice that was uttered outside of the play’s performance-time and channeling it onto the stage, the phonograph plays the role of an informer revealing the truth in a crucial moment. Similarly, the phonograph in The Dangers of London by F. Scudamore (1890) ‘plays a prominent part in proving the innocence of the heroine’ who is wrongly convicted of theft (Bristol Mercury, June 17, 1890, 5). In both plays, the phonograph resounded a new kind of evidence, retrieving otherwise inaccessible vocal moments to which it was an audial-witness. In A Platonic Attachment it resounded the uttered confession of the imposter, whereas in The Dangers of London, it gave voice to the victim.
Heralding the age of media witnessing (Frosh and Pinchevski, 2009), end-of-the-century theater realized a key potency of the phonograph as a witness, implied in Thomas Edison’s The Phonograph and its Future (1878). Edison specifies potential applications of phonography, some targeting administrative contexts such as dictation, while others are imbued with theatricality and generate ‘speculative imaginations’, such as speaking dolls, audio books, capturing the last words of famous people, or listening to deceased ancestors (ibid., 527). Edison ties these future functionalities of recorded sounds with the qualities of ‘authenticity’, ‘accuracy’, and ‘fidelity’, and as generating ‘faithful reproduction’ (ibid., 528–531). These qualities are grounded in the idea that the phonograph witnesses and records on-site acoustic events to resound them later. The phonograph was from its invention committed to testifying to indexical sonic occurrences which were perceived as truthful. The theater of that time picked up on these values, but underscored the theatricality saturating them, thus forming an intermedial connection between phonography and theatricality. Harnessing the phonographic witnessing qualities to theatrical scenes, it exposed their inherent ambivalence, mixing evidentiality, truthfulness, and faithfulness with their imagination and mediation. Theaters staged phonographically mediated testimonials as sound-evidence and as a truth-revealing theatrical mechanism in pivotal dramatic scenes.
The phonograph refashioned well-established theatrical models. For example, it remodeled the conventional ‘screen scene’, in which acousmatic voices are heard on stage without a visible anchor (Rokem, 2015). In the screen scene, a character eavesdrops while the audience witnesses this clandestine presence; the knowledge accumulated via listening thickens the plot and advances the drama. Replacing the eavesdropper and reassociating the voices to a mechanical vessel, the phonograph rearranged the media relations typical to the screen scene: it restructured sound-space relations and resituated the audience, adding a layer of mediation to their witnessing; it channeled mechanical sounds that transgressed the staged present, exposing the audience to them and charging these replayed voices with a sense of imagined authenticity. The phonograph on stage presented mechanically mediated witnessing as a theatrical mechanism for uncovering truth.
The phonograph provides a truth-revealing climactic moment of this type in The Phonograph Witness by Hill (1882), as part of a murder investigation which is ultimately solved in court. Luther Tenniel, a banker, is murdered in his office by his associate Merton Courtney. Courtney sends his fixer Jim Backstone, an ex-con, to cover up evidence by setting the bank on fire. Consequently, the police suspect Edward Lee, who is Tenniel’s employee and the last person to be with him, and also the fiancé of his daughter Helen. A significant part of the drama is dedicated to finding a voice recording from Tenniel’s office: Richard Fellows, Helen’s uncle, has conducted an experiment with a phonograph hidden in Tenniel’s office, transforming it into a ‘Phonographic Chamber’ that casually and continuously records all acoustic happenings in that room (ibid., 9). Although Tenniel was not keen on people being ‘secretly reported in that way’ (ibid.), this recording is the evidence that eventually acquits Lee and convicts the murderer. Alas, throughout the play, this recorded evidence is missing, creating a pivotal gap in the plot and shaping the dramatic tension.
The phonograph is an absent protagonist which drives an extensive search due to its truth-revealing potency. When the missing recording reappears, it creates a vox ex machina moment, in which a recorded utterance uncovers vocal evidence, thus rapidly untangling the dramatic conundrum. Resounding the violent argument that leads to the murder, this drama remodels the familiar theatrical deus ex machina into a voice-machine centered theatrical scheme. It attributes to the phonograph god-like, omni-present powers: always listening and recording, it is potentially able to resound the truth, revealing information at the right moment. The formation of vox ex machina as a theatrical technique informed the mediated voice with ‘authority’ and trust, generating ‘obedience and belief’, subsequently transfiguring in voice tropes such as ‘his master’s voice’, the acousmatic narrator in films, or airport and train station announcement systems (Chion, 1994; Kane, 2014: 213). The acousmatic situation that the phonograph generates is essential to its theatrical zest, and theater relied on it to refashion some of its well-established schemes; subsequent sound-media further entangled the acousmatic experience as part of their phonographic theatricality.
Hill plays with the famous Chekhovian theatrical observation: the phonograph mentioned in the first act eventually ‘shoots’ the vocal evidence at the decisive point of the drama. For this manipulation to be credible, especially as the phonograph was still unknown to most audiences, the play argues for the legitimacy of sound recording as evidence, as emphasized in the name of the play. In grounding the phonograph as a trustworthy device, the play extensively debates the evidential strength of sound recording. In the first trial scene, the judge asks Fellows to explain the work of the phonograph and its capability ‘of giving out the actual record of all that took place in Mr Tenniel’s private office’ (ibid., 19). Hidden during most of the play, the phonograph is revealed only during the court scene, resonating with prevailing conceptions of media as transparent and neutral conduits which capture reality with no ulterior motive. Experimenting with the notion of the phonograph as a device for total surveillance, this play imagines recorded vocal testimonies as self-operating, unintentional, and apparently unmediated. It thus demonstrates an essential side of phonographic theatricality, which is the fantasy of automatic, casual, and thus impartial sound-witnessing.
End-of-the-century theater was not the only arena for these media imaginaries. Contemporaneous newspapers reported on various incidents in which the phonograph served to unveil audial events, and to transmit into court actual sounds from various crime scenes. For example, in 1907 several newspapers reported on a Belgian lawyer who used the phonograph to record the annoying ‘noises of hammering’ in his neighborhood and resounded them in court as evidence in his lawsuit against the factory responsible for the nuisance (Evening World’s Magazine, May 18, 1907). Regardless of the low fidelity of early sound records, and the technical limitations at the time, the notion that the phonograph could faithfully document audial happenings inspired new media imaginations. The Butler Weekly Times (March 26, 1903) reported on a divorce trial in which ‘a young married woman’ used the phonograph to spy on her unfaithful husband, catching on record his love-affair conversations. This recording served in court ‘as a witness’ to his infidelity.
Whether these stories actually took place, were exaggerations, or were total fakes, these articles reflect a broader conception about the potential of the phonograph as a witness, propagated also by theater. Journalists realized the media-shift in the nature of evidence as it took place, envisioning that if ‘the use of the phonograph as a witness becomes general it will also become a universal memorandum. Contracts of all sorts, from a merger deal to a promise of marriage, can be recorded as infallibly as on paper or parchment’ (Deseret Evening News, March 20, 1906, 4). Phonography subsumed the truth-value inscribed in previous media, and assigned authority to auditorily mediated reality. This reality was sometimes fabricated or reenacted, as shown in Stadler’s (2010) study of sound recordings of violence against African-Americans which were disseminated as commercial entertainment. Therefore, the gap between the actual technological phonographic capabilities and the imagination of their potencies gave rise to theatricality as a techno-mental mechanism compensating for lack of knowledge; what was not known was imagined. As part of their phonographic theatricality, mediated voices took on the role of truth-tellers, whether this truth was hyperbolic, fabricated, or imagined.
The alleged ‘faithful reproduction’ of voices that Edison envisioned in his manifesto was charged with duality from the start: striving for accuracy and authenticity, but trusting reproduction, recreation, and duplication to deliver all these values. When theater integrated the phonograph as part of its arsenal of stage techniques, it underscored the theatricality characterizing the phonograph, and the truth-value of sound recording and listening experiences as saturated with multiplicity, doubt, and contradiction. Phonographic theatricality reconfigured the assumed boundaries between truth and fabrication, authenticity and designed mediation, and between reality and its imagination.
This impossible link between imaginative theatricality and mediated truth emerged as a pivotal feature of phonographic theatricality, and thus is also evident in voice assistants: in the same way that theater and newspapers used the phonograph to structure sonic events as real, so do Alexa and Siri gain their credibility from their vocalic omni-presence, subsuming their authority as truth-tellers from their vocal fidelity (Levy-Landesberg and Cao, 2024; Natale and Cooke, 2021). They vocally epitomize Apple and Amazon, but we believe them because they sound human (Fetterolf and Hertog, 2023). They mingle the god-like authoritativeness of the acousmatic voice coming out of the machine with the mundane operation of their human-like speech. They ventriloquize professional human narrators, re-concatenating multiple voices, and can channel any voice put into their algorithmic infrastructure. These human-like acousmatic voices sound true, and at the same time tap into imaginations about automatons (Geoghegan, 2020), underscoring the link between the magical and the mechanical.
The magical and the mechanical
Turn-of-the-century theater not only established phonographic theatricality around truth-revealing moments and authoritative voices, but also approached this media principle playfully, emphasizing illusions, marvels, and trickery. It presented phonographic theatricality as mixing the magical and the mechanical, reconsidered human-machine dichotomies, and thematized vocal utterances as fraught with multiplicity.
David Pinski’s 1918 The Phonograph portrays the introduction of sound technology to a remote Russo-Yiddish town where Jews relied on ‘tradition and miracles’ (Pinski, 1920: 1). It focuses on Nahmen Riskin, a local entrepreneur returning from America to his hometown, smuggling a phonograph across national borders, with which he plans to establish his new business of public phonograph concerts. The marvelous device, perceived by the locals as a miracle, satisfies the community’s hunger for entertainment, resounding the voices of famous cantors from across the continent and beyond the Atlantic Ocean.
The comedy unfolds in Riskin’s home, when he introduces the phonograph to his hometown folks. At center stage, bathed in a single spotlight, stands the phonograph. Family, friends, and some influential members of his community – the Rabbi, the Cantor, a local merchant, and their wives – gather and wait for the recorded concert to begin. In the meantime, a less prestigious triad of typical shtetl characters crashes Riskin’s concert, mockingly reflecting and undermining traditional social hierarchies: the thief, the informer to the Tzar’s authorities, and the shtetl idiot. This arsenal of characters presents actual and metaphorical aspects of vocal relations derived from the phonographic logic: the Rabbi’s authoritative voice leads the community; the Cantor’s chanting of liturgies and religious songs resonates communal and individual emotions; the thief and merchant are both interested in making profit, albeit differently, by stealing the phonograph and the voices it plays; the informer eavesdrops on conversations and threatens to pass them on to the authorities, acting as a vocal medium – he reorients the utterances of others by re-voicing and redirecting them to those who were not their intended addressees. Another phonograph-like character is the shtetl idiot: he repeatedly introduces himself as a ‘funnygraf’, singing wildly (ibid., 23). Throughout the comedy, the town folks’ voices are only heard from off-stage, constantly interjecting into the scene. These characters personify different phonographic functions, and together they amplify the role of the phonograph as a leading actor in this play, reflecting the new sonic reality of the turn of the century. Pinski’s play presents phonographic theatricality as a central dramaturgical mechanism, woven into all aspects of the staged spectacle.
When Riskin’s concert begins, attendees marvel at the machine’s ability to mobilize into the house voices singing sacred songs. They perceive this as either supernatural divine intervention, witchcraft, or as a ventriloquistic theatrical trick. The Cantor is amazed that the ‘box here sings like a cantor and his choir’ (20), and obsessively tries to identify the mechanically reproduced voices of famous cantors. The Rabbi unites the mechanical box with the sacred songs emerging from it: ‘Wonder of wonders! I can’t find words. There stands a box, and out of the box issues a voice, a chorus of voices […] and chants a prayer […] Miracles! A miracle of creation!’ (22). The merchant’s wife echoes his astonishment: ‘Why, the thing is really praying!’ (21), and her husband suggests, ‘I guess that when the Holy Days come […] we’ll rather put this phonograph before the altar, instead of the cantor’ (22). His suggestion of replacing man with machine prompts the cantor’s wife to murmur wryly, ‘If I only had that box for a husband!’ (24).
Conceiving the phonograph as obtaining miraculous creative powers, by producing the voices it replays, and thus as a machine that can potentially replace Cantors, echoed imaginaries regarding human-machine phonic relations. For example, a 1910 caricature from the New York Jewish satirical newspaper Der Kibetzer depicts a congregation in which the new Edison devices – the phonograph and the movie projector – replace the Rabbi, the Cantor and the choir in chanting VeNetane Tokef, one of the most recorded Jewish chants (Image 1). This caricature graphically illustrates phonographic theatricality, confusing the threshold between entertainment and spirituality. But it is also a parody of a real-life historical phenomenon: when early twentieth-century New York synagogues were crowded during the high holidays, some congregations hired local playhouses and movie theaters to accommodate the communities’ requirements (Thissen, 2017). Prayers were delivered in spaces designated for popular entertainment. This caricature celebrates the intersection of several assumed cultural dichotomies: by relocating the holy ceremony it breaches the boundaries between the synagogue and the theater; it assigns technical means to a mystical duty; and it suggests replacing humans with machines. By chanting prayers, the phonograph embroiled spaces, times, humans, and technologies, converging them through its theatricality.
Image 1. Caricature from the satirical newspaper Der Kibetzer, 23/09/1910.
Pinski’s comedy also ponders the phonographic enmeshment of the divine and the earthly, the magical and the technical, the earnest and the frivolous. The Rabbi’s wife is reluctant to believe the phonographic wonder: ‘I thought it was some sort of huge box and that the cantor and his choir got inside’ (20). Later, she ‘peeps under the table seeking some concealed persons, and then eyes Nahmen with a penetrating glance, to see whether the tones are not coming from him’ (21). She laughingly hypothesizes: ‘maybe Reb Nahmen is a ventriloquist’ (23). Finally, she double-checks with Nahmen that ‘there isn’t any witchcraft about this’ (ibid.). She defines the performance of the phonograph as gravitating between theatrical illusion, such as ventriloquism or stage tricks, and the supernatural, such as magic and witchcraft. The Cantor’s wife appeases this conflict by proposing the aesthetic option: ‘Let it be witchcraft, so long as it’s beautiful’ (24). Unlike the Rabbi’s wife, she perceives the magical as emerging from the mechanical. The attractivity of the phonograph therefore derived from the acousmatic gap it opened between heard voices and the absences of their sources on stage. Ventriloquistically bewildering its audience, the phonograph prompted the ambivalence between the supernatural and the rational, embracing them both within its theatricality.
When the Cantor’s wife asks Nahmen to ‘bring the cantor in the box back’ (23), she underscores the tension embedded in phonographic repetitions: understanding the phonograph as a machine for resurrection, performing a ‘miracle of creation’ based on its potency of mechanical reproduction. These two women represent opposite positions along the spectrum between rational doubt and the suspension of disbelief: while one is trying to find the trick behind the aural marvel, the other accepts its elusive charm. Through these opposing stances, Pinski describes the phonograph as a new medium which is explained in terms of old media: witchcraft and sorcery or human-made technical illusion. The phonograph on stage serves Pinski to discuss tensions agitated by the introduction of the machine into domains traditionally regarded as exclusively human: by voicing human voices without detectable live sources, the phonograph problematized assumed causalities between voices, presence, liveness, and the lack of these in the mechanical. The talking machine presented a new type of vocal presence.
This was supported by a visual aspect: the body of the machine replaced the absent human body as a medium for reproduced voices. A spectacle and attraction in its own right, the phonograph was advertised in relation to theater experiences. Turn-of-the-century ads depicted the phonograph as replacing both orchestra and singers. For example, an 1899 poster captioned ‘Have you heard it?’ displays Edison’s phonograph as the central attraction of the show, performing before a full concert house (Image 2). It compares the phonographic sound quality to that of a live concert, implying that the machine can therefore replace human performances. This example stands out: most contemporaneous ads presented the phonograph or gramophone in the privacy of domestic spaces, sounding ‘the stage of the world’ or ‘the grand opera’ at home, thus creating a ‘fireside theater’ (Images 3–5). These ads presented the sound machine as ‘an entertainer’ which encapsulates various shows in one device, reproducing the vocal presence of human performers and evoking recollections from the theatrical event. In their wildest visions, these ads promised the ‘best seat in the house’, stating that ‘a home without a Victor is a stage without a play’. In them an essential principle of phonographic theatricality is concretized: the phonograph instilled external sounds into the domestic sphere, facilitating imaginations of theater sounds while enjoying armchair comfort.
Image 2. The Edison concert phonograph poster. The U.S. Printing Co., 1899. Library of Congress.
Image 3. Columbia Grafonola, 1916. PhonoArt website.
Image 4. The Columbia Grafonola, 1917. PhonoArt website.
Image 5. Fireside theatre, 1907. PhonoArt website.
Like the telephone and the telegraph, albeit with important differences, the phonograph mobilized voices between times and spaces, defining phonographic theatricality as sonically intermingling different spaces. The phonograph introduced vocal proximity independent from physical presence, underscoring the concurrency of its mediated voices. Metonymic to his phonograph, Nahmen is a migrant who crosses borders and returns to his motherland, venturing to profit from importing the device, roaming around the provinces, and playing the records he brought with him. Pinski reflects here on actual historical practices, when phonograph concerts were held outdoors on streets and market squares (Abeliovich, 2024; Morat, 2019). The theatricality of the phonograph was committed to the convergence of separate spaces, overlapping and condensing faraway places as one imagined continuum, based on the teleportation of voices. The phonograph was a bidirectional messenger, delivering voices from afar and creating a shared listening experience while at the same time transporting the audience into an imagined space-time.
In Pinski’s play, the listening experience is based on commercial phonograph records. Distributed across the globe, such records created a popular repertoire that was shared by actual and imagined listening communities. In the comedy, the listening community indulges in guessing the identity of the all-star cantors’ recorded voices, reflecting real-life sonic cultural networks. Akin to theater audiences who share the experience of theatrical space and time, phonograph audiences around the world shared a voice-dependent theatricality that created an imagined co-presence in which sounds, temporalities, and spaces were condensed together. Phonographic theatricality is infrastructural to community building: it capitalizes on a chain of tensions between connectivity and detachment, and between distance and proximity, while relocating voices, places, and human imaginations.
The phonographic opening of sonic spaces, and the building of listening communities, granted status to these celebrities’ voices, introduced renowned performers to new audiences, and elevated lesser-known voice artists to stardom. For example, among the first world-renowned phonograph stars was Enrico Caruso, already famous before his voice was recorded; the recordings amplified his stardom and made his voice famous across the globe (Williams, 2021). This empowerment-reinforcement cycle reflects a media modus operandi: media build on the status of celebrities, extending their popularity, but also grant status to new voices, uplifting them to fame. This mechanism is based on the audiences’ association of a voice with a specific performer, albeit in a different context. Theater counts on a similar process: the accumulation of previous roles, termed ‘ghosting’ in theater studies (Carlson, 2002), activates the memory of the audience, ultimately enlarging the vocal allure of celebrities’ voices. Every performance is imbued with and haunted by ghosts of past roles. Phonographic theatricality exposes vocal utterances as fraught with multiple vocal identities.
As principles of phonographic theatricality, status-granting and ghosting also inform contemporary AI voices. For example, when Waze added the voice of Morgan Freeman to its arsenal, it imbued its app with all the roles associated with his Hollywood persona. Freeman’s vocalic ghost haunts and animates algorithmic navigation, reshaping everyday driving routines as theatrical. Aware of the challenges of voice-ghosting in her profession, Susan Bennett, the voice-actress who became famous as Apple’s ‘original’ virtual assistant, was concerned that this identification would limit her in other projects; eventually she reclaimed her voice-persona as ‘the first Siri’ (Ravitz, 2013). Another voice-status complication is manifested in Scarlett Johansson’s recent refusal to contribute her voice to OpenAI’s algorithm. Relying on the mechanism of ghosting, OpenAI hoped that their new voice feature would be possessed by the ghost of Samantha, the virtual assistant in Spike Jonze’s Her (2014), voiced by Scarlett Johansson. When the Hollywood star declined, the company fabricated a close-enough imitation of her distinct voice, but eventually dropped it to avoid social controversies and legal implications. AI voices are ventriloquistic and therefore theatrical to their core: they channel, imitate, and manipulate voice-segments of real people, famous or not, and they conjure their presence, activating theatrical ghosting. When AI-voice agents operate within a domestic space, they channel these voice-based cultural markers, imbuing everyday-life situations with theatricality. Accordingly, Alexa and other AI agents become actors in their users’ dramas, tinting their voice-functionality with theatricality.
Playing machines
The Phonograph Witness and The Phonograph exemplify two intersecting aspects of phonographic theatricality: one devoted to structuring an aura of truth, authenticity, and realism; the other dedicated to the joy of illusion, make-believe, and trickery. Both themes derive from particular human-machine relations in which the machine extends human aural faculties by transporting voices across spaces and times, compensating for the lack of physical presence by reproducing and delivering acousmatic voices. In both themes, the phonographic machine is an actor performing the theatrical tensions between imaginaries of veracity and the pleasures of artificiality and fabrication. Beginning-of-the-century theater mirrored and complemented these dialectics by shaping human actors as phonographs, embodying phonographic operations on stage. Phonographic theatricality emerged as human-machine dynamism in which humans act like machines and machines perform like humans.
George Bernard Shaw’s Pygmalion, including its subsequent adaptations, thematizes such a human-machine continuum. Shaw’s play presents a modern interpretation of the myth of Pygmalion, which focuses on the identity remodeling of a lower-class girl, Eliza Doolittle, into a high-society lady. Professor Higgins, a phonetician, aims at reshaping her speech style and etiquette through tedious vocal coaching and repetitive pronunciation drills. After much rehearsing, she is sent to mingle at an upper-crust social event, in which she is supposed to perform her new identity. Eventually, Eliza rebels against her patron, and exercises her agency by leaving his custody and attempting to acquire independence.
Higgins’ home is replete with machines for capturing speech and practicing diction, metonymic to Eliza and Higgins, who enact phonographic operations on stage (Buckley, 2015). Higgins is obsessed with phonetically inscribing spoken lingo; his strict methods and lack of empathy shape him as a heartless machine. Eliza is required to rehearse her pronunciation drills and transform into an automaton that speaks like a lady. In this sense, Eliza’s self-emancipation is a revolt against Higgins’ techno-phono control. Both of them are ‘technologized bodies’ who transfigure phonographic technicalities into staged acts of recording, repeating, and replaying (Ibid., 23). Shaw’s characters are designed as an experiment with human-machine vocal entanglements, presenting a theatrical tension between parrot-like vocal imitation-repetition and phono-graphic ventriloquism.
Shaw dedicatedly sought a technique or technology for inscribing phonetic guidelines and dramatic instructions; Pygmalion demonstrates a phono-graphic logic, in which characters act like phonographs, serving his model for the ideal ‘author-actor relationship’ (ibid., 24). Higgins directs Eliza, prompts her speech, and manipulates her staged actions, but he is also Shaw’s fictional creation, giving voice to words Shaw puts in his mouth. The play thus portrays a Shaw-Higgins-Eliza marionette model, as depicted in the 1956 Broadway production album cover of My Fair Lady which presents Eliza as a puppet manipulated by Higgins, who is controlled in turn by Shaw (Image 6). This inter- and intra-diegetic sequence duplicates social and gender hierarchies, structured around re-voicing: Eliza reiterates Higgins’ pronunciation, whereas both of them are ventriloquizing Shaw’s stances. As two overlapping phonographic performative modes, imitation implies the repetition of expressions, whereas ventriloquism channels idiosyncratic vocal utterances. Centered around human-machine vocal contingencies, Pygmalion intersects the actor-author power relations inherent to the theatrical apparatus with phonographic operations, shifting between imitation and ventriloquism.
Image 6. Original Broadway Poster by Al Hirschfeld, 1956. Wikipedia.
Pygmalion’s phonographic theatricality focuses on Higgins’ phonic fixations in acculturating Eliza, replacing her ‘flawed’ habits with re-stylized speech. Upon Higgins’ thundering demand, ‘Say your alphabet!’ she recites: ‘Ahyee, bǝyee, cǝyee’, and he corrects her pronunciation (Shaw, 1957: 49). This learning process concentrates on dismantling her speech to its tiniest phonemes and reconstructing it as refined expressions. Eliza internalizes Higgins’ regulations by repeatedly uttering fragmented syllables and loudly overstressing consonants; this casts her as a phonograph playing a record and draws attention to the phonetics of her voice. While at first she stutters, after many rehearsals she habituates it as ‘natural’ (Buckley, 2015: 38). Like an actor rehearsing her lines, but also like a machine operating automatically and inertly, Eliza imitates these utterances until she eventually embeds the vocal costume of a duchess-like performance.
Eliza’s character manifests a key perspective on human-machine relations as based on imitation, which has become instrumental for AI theory, from Turing’s ‘imitation game’, through Joseph Weizenbaum’s ELIZA effect, to contemporary considerations of AI algorithms as stochastic parrots (Bender et al., 2021; Li, 2024). Famously, Eliza’s automatic language conditioning inspired Weizenbaum to name his chatbot ELIZA to flag its operative similarities with Shaw’s character. Like Eliza, ELIZA was programmed to mimic human conversationality by restructuring speech segments and reflecting them back at her human conversant, producing the effect of people projecting human characteristics on bots (Natale, 2021; Switzky, 2020). However, Eliza’s learning abilities render her more than merely a replaying phonograph; she accomplishes Higgins’ demands and evolves beyond imitation into an actor, gravitating between modes of phonic operations. Her ‘pre-programmed’ vocal performances include impulsive relapses which expose theatricality in phonographic operations: when annoyed or frustrated, she retreats to her gutter-speech, interjecting curses or cries of animosity: ‘Ah-ah-oh-ow-ow-ow-oo!’ (23). Cracking the lady mask and uncovering the flower-girl underneath, these lapses expose her mechanical operation while disclosing her masterful acting as shifting between different phonographic recordings. Eliza’s phonographic theatricality reflexively reveals itself, and human speech in general, as based on mechanical replaying and imitation. Pygmalion argues that imitation characterizes human behavior, speaking and acting alike, as much as it describes machine-as-human operations.
Conclusion
Phonographic theatricality transcends the interconnected dichotomies of truth-magic, realism-illusion and imitation-ventriloquism. These hyphenated categories have continuously informed sound media since the beginning of the twentieth century because they relied on relocating severed voices, proliferating from tensions between presence and absence, and obtaining an indexical link to their human sources. The imagined truth-value relied on this indexicality, whereas the illusionary trickery drew on re-embodiment of voices in mechanical devices to summon ghostly animations.
AI-voice applications further develop these aspects of theatricality, by ventriloquizing voices and imitating speech; they operate like theater actors that are trained to reflect human speech, while reusing actual human voice recordings. They channel voices severed from the bodies of voice-professionals, deconstructing their utterances to tiny phonic particles and then reconstructing them into plausible speech. These voice fragments are manipulated below the threshold of human perception only to appear as human as possible; their human-like voice performances depend on beyond-human language sensitivities. Therefore, they are much more than imitative parrots because they reuse the recorded human voice to ventriloquize new utterances, demonstrating a theatrical human-machine codependency and continuity. They both ventriloquize and imitate human conversation: whereas the materiality of AI voices is produced by rearranging actual human vocal fragments, their expressivity imitates human articulations.
For this reason, when Alexa laughed, it was perceived as a misplaced animation feature, an extra-linguistic expressivity gone rogue. This laughter was thus a mis-performance of a theatrical feature of its persona, a bug just like Eliza’s ‘boo-hooing’ and the resurfacing of her gutter-speech. Amazon quickly fixed this bug, realigning Alexa with its pre-programmed script. Contemporary AI-voice applications, like NotebookLM or OpenAI’s Whisper, re-introduce similar extra-linguistic sounds, such as laughter, humming, ums and ahs, designed to make them sound more human. These iterations theatricalize human expressions, corresponding with prevalent communication conventions of what speaking, conversing, and listening sound like. AI agents’ phonographic theatricality re-introduces the ‘bug’ of extra-linguistic expressivity as a feature. This AI expressivity is rooted in the principles of phonographic theatricality that have emerged and evolved since the introduction of the phonograph to the theater stage.
