Abstract
This article traces the social, technological and legal factors that, from the 1960s onward, transformed karaoke into a televisual medium. The author shows how the incorporation of the television screen into karaoke performance reveals a cross-section of postwar Japanese anxieties surrounding gendered leisure practices, licensing and storage of emerging media formats, and the regulation of the body within urban space. She further argues that the resultant ambient aesthetics of karaoke background videos encode a particular historical moment in the 1980s, in which karaoke emerged as a leisure activity involving the visually orchestrated somatic reconstitution of ambient space under pressure within Japan’s increasingly mediatized urban environment. The article ultimately suggests that resituating karaoke within television and media studies allows for a more resonant understanding of it as an embodied practice in which a visually driven nostalgia for ambient media alters, if not outright displaces, the affective connection to the audio it accompanies. The televisually mediated space of karaoke performance, then, reframes the nostalgia typically associated with both television and popular audio as, instead, nostalgia for the way ambient media allows for renegotiations of the relationship between the body and its surroundings under the attendant pressures of late capitalism.
Introduction: melody, unchained
Title and artist name, rendered in brightly colored, blocky Romanized lettering and transliterated into grainy pink katakana syllabary directly above, overlay a grainy, tilted freeze-frame of two bikers, staring straight ahead through their sunglasses on their Harleys in the middle of traffic on an overpass in some unidentified city. The shot rests, silent for a couple seconds until a single MIDI chord and drum beat precipitate a cut to a zoomed-in bird’s eye view of a very 1980s-looking midtown Manhattan, centering on the Chrysler building, as a simple drum line provides a couple of preparatory measures of the appropriate tempo. Then, as the opening lyrics of the song appear on the screen – steadily morphing from white into blue as a guide in time with the music – the camera jumps back to the bikers’ gruff smiles, sun glinting off chrome. As words to the Righteous Brothers’ ‘Unchained Melody’ slowly unfurl in text across the screen for the next four minutes, we watch these two ride across the Triboro Bridge side by side, bodies moving freely in an overcrowded environment that is, seemingly at random, interpolated with lingering shots of billboards by the highway and views of the suspensions overhead. This montage resists any explicit narrative either relating to or independent of the song’s lyrics; any visual logic that emerges is subordinate to the melodic and rhythmic dictates of the MIDI backing track and its silent textual accompaniment. Cuts pulsate back and forth in time with the beat, zooms tend to track alongside crescendos, and the camera treats us to open shots of sky or unclogged stretches of highway when accommodating sustained notes. We reach no destination save the end of a loop – the song ends just as the bikers ride into the freeze frame of the title screen – and yet we, as singer, audience, or both, have been moved in some less linear fashion, within and beyond the bounds of the screen.
This is television in the world of karaoke: an ambient space of somatic play and soothingly non-specific nostalgia. Karaoke background videos, like the one described above, have long formed a central part of the mediatized karaoke experience in Japan, orchestrating a singer’s relationship to both the music and the space allocated for performed engagement with it. Audio, invariably, has taken the spotlight in any media-centric discussions of karaoke. Yet shifting focus to the televisual elements that have been part of karaoke’s technological apparatus for almost four decades now allows for a richer consideration of how ambient aesthetics cue nostalgic social practice. How does a visual archive simultaneously de- and re-historicize an auditory one? How does the television screen create a space within karaoke performance and social practice that remediates an individual’s engagement with and navigation of environmental pressures?
Karaoke background videos (BGVs) with their thematically cheesy, low production value aesthetics cannot be considered merely subordinate to the audio they accompany. It is tempting to read them as merely ambiently augmenting the karaoke performance space, as the visual amplification of the nostalgia inherent in a music catalog spanning decades. Yet this ‘ambient nostalgia’ does not fully account for the fluid way these videos continue to be deployed across boundaries of musical genre and era, exceeding even the clear visual markers that peg them to the era of their own production in the 1980s and early 1990s. Instead, what emerges from their ambient aesthetics is neither historical specificity nor timelessness, but rather the orchestration of a leisure activity – karaoke – that in its performance temporarily suspends the flow of time in late capitalism and its attendant pressures on the body and the self. The persistence of these background videos into the present day reveals a nostalgia, central to the karaoke experience, for the space this ambient media affords the individual to renegotiate their relationship to their environment.
Unboxing television: recapping discourse to date
The genesis of karaoke background videos in 1980s Japan both aligns with and potentially expands on what is largely Western-centric scholarship on television and video to date. Parallel to the discursive ambivalence surrounding the possibility and peril inherent in television’s invasion of the private sphere that Lynn Spigel traces in her foundational sociocultural study Make Room for Television (1992), the rise of the public–private karaoke performance space – with the television screen serving an organizing function – reflects similar cultural and economic anxieties around the interpenetration of public and private spheres, shifting gender and generational dynamics around consumption and leisure, a renewed interest in theatricality, and technology’s effect on attention and bodily health. Yet karaoke television inhabits a space neither fully public nor private, and thus falls outside the parameters of Spigel’s study. Nonetheless, karaoke opened a new, interstitial window on the tension between bodies and their environments, biorhythms within biopolitics – an even more complex theatricality than Spigel explores.
The interactive nature of the karaoke system set-up and the performative interaction it demands offer the opportunity to elaborate on Sheila Murphy’s (2011: 37) argument that ‘physical, tangible connections between television and new media technologies as demonstrated by systems that enable embodied virtual interactions with interactive video . . . complicate our understanding of media convergence.’ Karaoke background videos remediate both old and new audio, provide rhythmic cues choreographing a performer’s physical engagement with their sensory surroundings and serve as an important nexus for a shared aesthetic and affective experience between performer and audience.
The history of the technological and legal forces that shaped the classic karaoke BGV aesthetic also adds another dimension to Lucas Hilderbrand’s (2009) analysis of the aesthetics of analog video and bootlegs, which he ties to the affective qualities attached to the physical specificities of the tape medium as well as questions of copyright and intellectual property. The dated aesthetic of the Japanese BGV catalog itself and its continued use up to the present day enshrines it as an archive for a communal nostalgia central to karaoke as both social and personal practice of theatrically remediating space. Screened in a space that blurs the boundary between public and private, these background videos are hard to neatly categorize as either television or cinema. The television screen, displaying a remixed archive of pre-recorded texts, becomes an active object of non-contemplation. Thus television in light of karaoke becomes a medium that can function ambiently even when not situated in one’s periphery, an element that simultaneously orchestrates and opens up one’s emotional, embodied engagement with it.
Karaoke: seeing beyond the empty orchestra
Karaoke can also stand to learn from television. To the extent that it makes an appearance at all within the realms of English- or Japanese-language scholarship, karaoke has been primarily treated ethnographically as a means of identity performance or a case study in globalization versus localization (Brown, 2015; Mitsui, 1998; Xun and Tarocco, 2007). However, a serious examination of the social, technological, and legal factors that transformed karaoke into a televisual medium reveals a cross-section of postwar Japanese anxieties surrounding gendered leisure practices, licensing and storage of emerging media formats, and the regulation of the health of the individual body within overdetermined urban space – all compressed into a few minutes’ worth of ad hoc video.
The aesthetic conventions of karaoke BGVs encode a particular historical moment in which karaoke emerged as a leisure activity involving the repetitive, somatic renegotiation of ambient space, yet their persistence into the present day (within, but also beyond, a kitschy sort of afterlife) suggests an evolution to the theatricality of Japanese karaoke practice corresponding to its increased mediatization. A closer look at how increasingly digitalized archival systems stored and reshuffled older visual content, recombining it to match with an ever-expanding auditory catalog, contributes to a more resonant understanding of karaoke as a nostalgic practice in which a visually-driven nostalgia for a specific era of foregrounded ambience itself alters, if not outright displaces, the affective connection to the historicity of the audio. The introduction of the television screen and video into the realm of karaoke performance creates a temporally ambivalent space at once inherently of its time while also utterly outside of it.
Karaoke as the systematic recording of commercial accompaniment music for amateur public performance emerged in late 1960s Japan in the confluence of a longstanding nightlife culture of amateur singing and the proliferation of 8-track stereo technology. Prior to the appropriation of 8-tracks for voiceless accompaniment, bar owners relied on jukeboxes (legacies of the immediate postwar years of the US occupation) and home-recorded audio tapes (available on the market since the late 1950s) of music broadcast on radio to serve as backing tracks for the informal performances of their patrons. The disadvantages of these formats were numerous – the most obvious being their inherently limited repertoire, the distraction of the original vocals inextricably embedded in the recordings, and the lack of potential modulation in key and tempo to better facilitate amateur performance. Collectively, these disadvantages contributed to a strong preference for live piano accompaniment when available (Mitsui, 1998: 34–36). By the late 1960s, any number of independent prototypes combining microphones, stereo systems, and 8-track tapes of prerecorded audio had sprung up across the country, but it is generally musician and producer Daisuke Inoue who is credited with inventing, in 1972, what would become the prototype for the first mass-produced karaoke machine (Ugaya, 2008: 49–53, see Figure 1). Based in Kobe, a city in the Kansai region known for its musical nightlife invariably fueled by a historically strong yakuza mob presence, Inoue ostensibly had the idea to combine commercially prerecorded accompaniment, the 8-track loop cartridge medium, and juke-box-style selection format into a single system when asked by a client to prerecord individually-tailored accompaniments to a handful of popular songs on a reel-to-reel tape so that he could entertain his employees with his singing on a company-sponsored getaway. 
Intrigued by the commercial possibilities of such a request, Inoue and his associates designed the 8–Juke, a portable machine that mixed audio from a microphone input with accompaniment recorded on modified 8-track loop tapes originally intended for car stereos, while incorporating both a small amplifier and a coin-operated juke-box-style timer. Crucially, Inoue’s modifications shortened the track loop, enabling instantaneous song selection and thus allowing for significantly increased customer control over playback (Mitsui, 1998: 38). Initially, Inoue leased this small machine, along with an accompanying book of handwritten and photocopied lyric sheets, to local bars, but within a year entered into cooperation with manufacturer T&M to mass produce both the equipment and its accompanying audio tapes. Over the course of the 1970s, both larger electronics manufacturers – Clarion chief among them – and major record companies such as EMI and Polydor increasingly invested in the nascent karaoke industry, contributing to the widespread incorporation of the 8-track-based karaoke machine into the domestic bar scene, as well as a greater emphasis on the sound quality and standardized, accessible orchestration of the accompanying audio tracks (Mitsui, 1998: 39). However, these systems retained certain limitations – a repertoire confined to the increments permitted by 8-track storage capacity and a lack of any integrated visual component – throughout the rest of the decade.

Figure 1. The 8–Juke system designed by Daisuke Inoue, c. 1972 (via The Appendix). Available at: https://theappendix.net/images/issues/1/4/large-inoue3.jpg
The 1980s saw major innovations in both visual and archival technology that radically redefined the performative possibilities of karaoke. In 1980, Tōei, primarily a film distributor, introduced a supplementary video component to existing karaoke systems that enabled the simultaneous display of song lyrics projected on a small television set or nearby screen, replacing the cumbersome books of printed lyrics that accompanied most machines (Brown, 2015: 33). However, it was with the advent of laser disc technology, largely disseminated starting in 1982 by electronics giant Pioneer, that both the visual and auditory capabilities of karaoke significantly expanded. Pioneer’s laser discs, and later Sony’s introduction of CD-based systems in 1984, not only allowed for a vastly expanded audio library compatible with each company’s distinct software system, but also increased user-driven manipulation of the audio track (especially key-shifting), remote control, and the firm integration of televisual components within a standard karaoke set-up (see Figure 2). In 1985, as Japan began to market systems for home use domestically as well as to export these and more commercial systems to other regions in Asia and across the Asian diaspora, the extended graphic capacities of the newly introduced CD-G format opened up the potential for companies to create accompanying visual content far beyond mere text, extending into the realm of genuine music videos.

Figure 2. Pioneer laser disc-based karaoke system with basic televisual capability, early 1980s (via Flickr).
The switch to cable-based systems (tsūshin, or ‘communication’ karaoke) in the 1990s (see Figure 3), spearheaded by Taitō and XING/JoySound in 1992, enabled the compilation and reorganization of vast quantities of extant audiovisual material (including the conversion of audio into a standard MIDI format to facilitate broadcasting and reduce copyright fees) and set the stage for increasingly digitized archives and higher-resolution interfaces that currently include karaoke-centric videogames as well as a variety of web- and mobile-based karaoke apps reliant on satellite as opposed to cable (Ugaya, 2008: 141–142). Overall, the main trends driving the development of karaoke technology over the past half-century are these: the enlargement of the song library instantaneously accessible through a single system, expanded user control over adjustments to prerecorded audio, the pairing of this audio with increasingly complex televisual accompaniments, and the digital modification, archiving, and remediation of audiovisual property.

Figure 3. Current cable-based commercial karaoke system with interactive digital remote (via WikiMedia Commons). Available at: https://commons.wikimedia.org/wiki/File:Hitokara_at_Karaoke-Box.jpg
Ain’t noise pollution: postwar leisure space and the rise of the karaoke box
It is meaningless to discuss this ostensible evolution without situating karaoke’s development within the sociocultural currents that shaped postwar leisure and performance space in Japan and the technologies that delimited it. Up through the early 1980s, karaoke remained a primarily homosocial practice composed largely of middle-aged men (and the occasional bar hostess) belting out traditional ballad standards in boozy nightlife venues as both a professional and personal bonding activity (Ogawa, 1998: 46). However, starting in 1984, the strategic mass production and marketing of karaoke systems designed for the private home targeted the vast potential consumer base of mothers, largely excluded from the workforce, and children; the so-called karaoke boom of the mid-1980s has been directly traced to this expansion of karaoke performance into a family-friendly activity, even as its preexisting ties to bar culture remained undiminished (Xun and Tarocco, 2007: 35). Yet, due to the peculiarities of Japanese urban planning, both public and private karaoke would soon come under fire for the same reason – noise pollution.
Given the extremely tight quarters found in the bulk of Japanese residential construction, along with its notoriously poor soundproofing, noise pollution has long been an issue in urban neighborhoods. As legal scholar Mark West (2005: 90) discusses in an examination of how karaoke-centric disputes were formally recorded and resolved in urban neighborhoods in the mid-1980s, the proliferation of home systems corresponded to a marked spike in noise-related complaints. However, these complaints were not limited to home users – many were directed at commercial establishments as well. Japanese law, West highlights, ‘often results in notoriously messy mixed land use . . . zoning regulation is extremely lenient, overlay zones are common, and much land, even in cities, is unzoned’ (pp. 96–97). Thus a partial solution was conceived: a well-insulated space, situated at a slight remove from residential neighborhoods, where the commercial focus would be on karaoke as opposed to food and alcohol.
The first of these ersatz ‘boxes’ (see Figure 4) sprang up around 1984 in the countryside of Okayama prefecture, when an entrepreneur
transformed a freight car into a karaoke facility by insulating it, installing a machine, and providing a few basic furnishings. The converted freight car, which sat in the middle of a rice paddy, proved extremely popular . . . [especially with] teenagers, older people, and others who would not have visited a bar to sing. (West, 2005: 94)

Figure 4. An early container-style karaoke box (via Ameba user yamanogogo). Available at: https://ameblo.jp/yamanogogo/image-12453734037-12875341834.html
Despite early ‘box’ karaoke’s liminal location on the fringe of the urban environment, the emblematic figure of the cargo container enmeshed it with larger processes of globalization and late capitalism, though shifts in leisure activity engendered by these same processes would slowly reinstall karaoke within the geographic center of Japanese daily life. Gradually, the karaoke-box-as-leisure-establishment migrated from the suburban periphery back into urban centers where, as alternatives to bars, they became, and remain, a ‘social space for female office workers, housewives, students, and families . . . combining the intimacy of a private living room, a small space to be enjoyed with friends and family, with the vibrancy of a public venue’ (Xun and Tarocco, 2007: 35). These enterprises now serve as nodes for multiple industries – electronics companies leasing the system hardware and software, existing entertainment or commercial centers seeking to expand the range of leisure activities offered, even catering companies facilitating light food and beverage service (Noguchi, 2005: 116). Yet, despite the addition of other amenities, the social and commercial focus remains on the provision of performative space for singing. Customers pay primarily not for food and drink (as they would in a bar), nor by song (as in earlier venues offering karaoke); instead, they reserve and pay for quantities of time within a given room. The individual ‘boxes’, or rooms, themselves have increasingly diversified to accommodate a wide variety of social configurations: a single facility can situate a large party within a spacious, themed room replete with a mini stage while also offering a solo customer the most basic of options, the economical dimensions of the no-frills, single-seater wankara (‘one-karaoke’) box. However, certain spatially organizing elements remain universal, no matter in which box one might find oneself.
The walls are always lined with sofa-style seating, with some low, flat coffee table-like surface (in the center or to the side) intended for microphones, remote controls, menus, or any refreshment you care to order (see Figures 5 and 6). These two essential furnishings already begin to sketch the intimacy of domestic space beyond the confines of the home. But, above all, in the center of one’s field of vision, invariably placed on the wall opposite all seating arrangements and dominating any other decoration the room affords, sits the glowing television screen, as much a centerpiece of the postwar living room in Japan (as in the US) as it is the primary feature of any karaoke box.

Figures 5 and 6. Contemporary rooms in a karaoke box showing the television screen as focal point and organizing principle of the space (via Flickr).
Taking space to make space: the genesis of karaoke background videos
The final piece in the puzzle of karaoke’s strange televisuality is the history of the background videos themselves. The switch to CD-Gs and their expanded graphic capacities as the preferred storage format around 1985 roughly coincides with the initial marketing of karaoke systems designed for private use and the subsequent rise of the karaoke box as semi-public entertainment space. The potential for the screen to add an enticing visual richness – both for singer and audience – to the atmosphere of these performance environments was not lost on Pioneer and other manufacturers, but the opportunity also presented a few major challenges.
First, audience demographics and the technological limitations of earlier generations of karaoke had directed the focus of most companies’ audio libraries to older hits, albeit standards from both Japanese and Western artists. While MTV’s establishment in 1981 had solidified the music video as an integral artistic component and marketing tool for popular music on a domestic and global scale, it also raised yet another tier of licensing concerns for a karaoke industry already struggling to adjust to increased copyright fees collected by JASRAC (Japanese Society for Rights of Authors, Composers and Publishers) around karaoke versions of new hits, added by popular demand to the CD-circulating audio library with increasingly short turnaround times (Brown, 2015: 21–22). Paying more for potentially exorbitant copyrights to the additional visual content of the hottest music videos did not seem like the most cost-effective option for quickly amassing large quantities of visual content. Furthermore, the vast majority of the existing audio catalog of golden oldies and cultural touchstones had never possessed corresponding visual content to begin with.
While access to archival materials in Japan is limited and thus what follows is only a partial view of the solution that inevitably ignores nuances in domestic production, what appears to have happened is as follows: Japanese companies such as Pioneer would hire directors in between other projects in film, television, and commercials to produce batches of economically shot background videos. US directors were in demand for videos geared toward largely foreign hits – each one filmed within a couple of days and costing around $4,200 – that imitated traditional music video aesthetics without incurring any copyright entanglements or referencing any other performance versions of the song being shot (Raftery, 2008: 155). Directors were given vast artistic license in what they shot so long as it avoided nudity or violence, which enabled many to experiment with filmic techniques or shoot multiple videos on locations that simply happened to fall geographically between other jobs. The tight budget and schedule for most shoots had the effect of deprioritizing narrative cohesiveness in favor of atmospheric experimentation, though not universally so, and directors were almost never credited explicitly for their work.
A similar archive seems to have evolved domestically for Japanese-language songs, both contemporary hits and oldies, shaped by similar production methods and concerns, and it is important to note that, while domestically produced videos are occasionally paired with foreign songs, likely because of greater availability, domestic hits tend to remain resolutely Japanese-coded in the televisual space that accompanies them. Japanese directors were typically tasked with videos that paired domestic oldies with sentimentalized natural landscapes and contemporary Japanese-language hits with the industrialized cityscapes of Japan’s bubble economy. For the latter half of the 1980s and the early 1990s, each song in a company’s audio library would receive its own corresponding background video, but the conversion to cable-based systems connecting individual machines by phone line to a much larger, digitized repository of audiovisual content enabled companies to create new taxonomies within their stock of BGVs. Rather than a one-to-one correspondence between unique audio and visual content, existing BGVs could be categorized according to ‘mood’ – sad, romantic, playful, energetic, and so on – and then digitally recombined with any songs in the audio catalog sorted into the corresponding category, thus eliminating the need for, and expense of, the generation of new visual content. Thus the ambient aesthetics of these BGVs are in part the product of thrift on the part of these karaoke companies, and the recycling of visual content became, in turn, the recycling or recombining of mood or affect as well. The televisual realm of Japanese karaoke remains, in some sense, aesthetically bound to this very historically specific era of its production. Yet, a formal analysis of BGVs across different ‘mood’ genres or musical styles exposes a shared nostalgic visual sensibility that goes beyond kitsch and foregrounds a visceral engagement with ambient space.
Navigating BGV aesthetics: from oldies to top 40s
The JoySound system’s background video for ‘Ai Sansan’ (愛燦燦, ‘Drenched in Love’, 1986), a late-in-life melancholy ballad hit for Misora Hibari – roughly, Japan’s Judy Garland – opens on a recycled stock image featuring two glasses of champagne with a chilled bottle resting nearby. However, once the MIDI melody begins to flow, we cut to the quivering motions of a parakeet trembling about a cage, only to pull focus to view the exquisitely made-up face of an elegant woman tearfully gazing at it. We next see the same woman in full profile, sitting sorrowfully, face upturned into a stream of light pouring through the open space of an empty dance studio (see Figure 7). The briefest of plots is hinted at through the interpolation of an under 10-second wedding scene – perhaps a divorce hangs over the rest of the sequence? – but the majority of the screen time is devoted to the same woman in different outfits traversing empty spaces all alone. Late at night she solitarily wanders along a beach in a stylish green suit, then mopes under a wilting cherry blossom tree in a white dress. Suddenly, the camera cuts to an extended sequence with her driving a powerboat in the middle of a deserted lake wearing an unreadable expression, then to a shot of her in a red dress running down an empty street toward the camera. Finally, the camera cuts away to a recapitulation of the first scene with the parakeet, closing on a final lingering image of her tearstained cheeks. Given the close rhythmic tracking of shots with the accompanying audio, as well as the song’s mid-1980s release date, this seems to be an instance of a video shot expressly for the song it accompanies.
If so, it is perhaps not too much of a stretch of the imagination to suppose a deliberate choice in how these visuals, rather than pursuing a strong narrative arc, instead choose to ambiently register the tone of the melody and its lyrics – about smiling through life’s myriad disappointments – through repeated reconsiderations of the female body’s affective response to, and rhythmic movement through, both manmade and natural surroundings. Similar concerns inform JoySound’s BGV for the Carpenters’ ‘Yesterday Once More’ (1973), another central nostalgic touchstone in the Japanese cultural imagination. The visuals interpolate imagery of ecosystems and wildlife vaguely reminiscent of the coastal south US with tracking shots of a slightly smiling young brunette riding a bicycle through a generic-looking town (see Figure 8), followed by close-ups of her lounging on a dock while gazing wistfully out across a marina. There is even less plot to be found than in ‘Ai Sansan’, possibly because, given the markedly looser rhythmic correspondence between video and audio, there is a high probability that this particular video was drawn from a stock ‘mood’ category to backfill gaps in JoySound’s audio library (of which the Carpenters’ catalog surely represented a substantial one). Yet, despite these non-negligible differences, both videos share a marked interest in how individual bodies in motion rhythmically conduct and reorganize the space surrounding them.

Figure 7. The opening sequence of Misora Hibari’s ‘Ai Sansan’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=dFEno3Y40tg&t=168s

Figure 8. Bicycling shot from the Carpenters’ ‘Yesterday Once More’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=f9dGuNZPri8
Some background videos for more recent hits, which seem to have been chosen from pre-sorted ‘mood’ pools, are less inclined to center on a singular organizing human figure as a visual and emotive guide through the scenes they present, opting instead for a sequence of exotic natural or urban environments that may have only the most tenuous thematic links to the songs with which they are paired. JoySound’s BGV for the Backstreet Boys’ plaintive ballad, ‘I Want It That Way’ (1999, see Figure 9), for example, features a continuous sequence of rolling tropical surf scenes, with hardly a human to be found. On the opposite end of the spectrum, the upbeat pop of Carly Rae Jepsen’s ‘Call Me Maybe’ (2012, see Figure 10) plays along to a touristy montage of New Orleans that eschews reference to the city’s rich music history in favor of scenes of Bourbon Street and the waterfront, reflecting globalized industry over local specificity. While these more recent televisual supplements stray a bit from the conventions of their predecessors, perhaps digging deep into B-roll left over from the heyday of BGV production, they, too, remain largely divorced from the historical and cultural identity of the audio, concentrating instead on matching the emotional ‘mood’ of the song with the rhythmic cues and pacing of a given visual environment. The primary object of these videos seems to be to impart a visceral sense of movement through ambient space to both performers and audiences looking at the television screen of the karaoke system, particularly one housed within the orderly confines of a karaoke box that, by design, forecloses the peripheral distraction of other stimulation.

The surf of Backstreet Boys’ ‘I Want It That Way’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=Z8Vff5pc8U4

New Orleans in Carly Rae Jepsen’s ‘Call Me Maybe’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=u0MCI61eAYs&t=78s
Television, ambient media, and technologies of health
While karaoke as a highly theatrical mode of emotive performance facilitated by an ambient televisual medium might be a novel theorization, the notion of television itself as an ambient medium in a public setting is not. In Ambient Television: Visual Culture and Public Space (2001), Anna McCarthy examines how the presence of television monitors in public spaces in the postwar US directed, delineated, and reflected the individual’s spectatorship, routine, and location within the institutional pressures of environments outside the home. Many of her arguments about how a pervasive, public televisual presence continuously regulates and inscribes the ‘modern mobile subject [within a] diffuse network of gazes and institutions, subjects and bodies, screens and physical structures . . . entwined domains of contest, control, and consumption’ (p. 3) apply equally to postwar Japan, where screens were actively integrated into urban spaces beginning in the 1970s, moving shoppers through commercial centers and directing commuters along public transit. Even at their inception, karaoke BGVs were by no means the only ambient televisual elements in Japanese daily life that orchestrated a somatic engagement with a built environment. They do, however, have much in common with a host of content generated in the self-care boom of 1980s Japan, in which, through new environmental media technologies such as the Walkman, individuals ‘sought to manage their own somatic potentials in relation to an intensifying set of biopolitical pressures and demands’ within the urban environment (Roquet, 2016: 14).
In his book on the subject, Paul Roquet draws upon Foucauldian biopolitics, the idea of subjectivation in particular, to argue that, rather than mere tools for social pacification, media designed for mood regulation exist in a much more ambivalent relation to the individual. Ambient media, according to Roquet, shape social behavior while concurrently creating a space of greater attunement within which to rethink and re-feel the self’s relationship to its increasingly regulated environment – a state he calls ‘ambient subjectivation’ and defines thus:
Ambient subjectivation in this urban context consists of attunement to both the larger flows of the metropolis and the smaller flows moving within, without, above, and below the body. This is where ambient media comes in. By recasting urban rhythms in ambient forms more focused on biorhythmic attunement, ambient media serve as a training in how to sense and sway to the syncopated cycles emerging with every new mixture of people and place. Ambient media bring urban rhythms into circulation with aesthetic materials and through this blending locate points of attunement between them. To find a way to dance here amid the crowds is to resist falling back into the isorhythm of a sedimented social identity (of the state, of the company, of the family) and to resist tripping forward into the isorhythm of the isolated body, a form of social withdrawal radically severed from other cycles of life. (p. 82)
This notion of attunement resonates with the rhythmic ambience of karaoke’s televisual aesthetics and how it mediates the embodied theatricality of karaoke performance, itself a leisure practice deeply enmeshed in discourses around mood regulation and bodily health. Singing as a form of fitness has roots as far back as the inclusion of music performance in the compulsory prewar educational curriculum, and the physical benefits of consistent karaoke ‘practice’ continue to be regularly included in books on general exercise and diet. One leading popular Japanese wellness expert claims that regular karaoke singing helps maintain memory function and stimulates proper, relaxed breathing technique – essential to both physical longevity and the ability to regulate one’s mental health (Maekawa, 2009: 197–199). The immersive yet bounded, repeatable and interchangeable ambient environments of karaoke BGVs offer the singer, and to a lesser extent other individuals within the performance space, the opportunity to rehearse an embodied negotiation of otherwise overwhelming biopolitical pressures that pervade public space.
Scripting ambient space
Televisual karaoke functions as an almost perfect inverse of the Sony Walkman, a contemporaneous environmental media technology that debuted to great fanfare in the summer of 1979. The portable nature of the audio player offered users – through a mass-produced commercial device – the personal autonomy of being able to soundtrack their daily routine, regulate their mood, and block out or manage overwhelming sensory stimuli intrinsic to public spaces shaped by postwar consumerism. It reduced, in other words, the external environment down to a manageable volume that would not threaten to drown out an individual’s own somatic rhythms. The experience of the karaoke box, by contrast, is the equivalent of shrinking down to miniature size, clambering inside the cassette deck, pulling the door shut behind you and then peering out through the plastic window onto the world outside. It is environmental media made performative rather than portable. Within the safe, streamlined intimacy of the standard karaoke room, singers can attune themselves to the somatic effects and affective qualities of both the sonic parameters of the backing track and the visual rhythms of diverse environments offered up within the manageable window afforded by the physical boundaries and limitations of the space of the television screen.
Of course, the Walkman and the set-up of the karaoke box differ in that the former facilitates a solitary listening experience whereas the latter typically demands a social performance, even as the emergence of the karaoke box enabled the rise of solo karaoke as another popular recreational activity. But the inherently theatrical nature of karaoke demands that the performer actively interpret, rather than merely passively follow, audiovisual cues, and thus, similar to the Walkman, it becomes another exercise of personal autonomy facilitated by the regulatory indeterminacy of the ambient medium. One can emote without attachment to a specific narrative. One’s emotiveness, instead, becomes attached to the performance space of the karaoke box itself and its televisually guided ambience. The socialization of this active engagement with the ambient environmental aesthetics of the karaoke box, in turn, helps foster the cultural nostalgia attached to the BGVs themselves. At once deeply aesthetically bound to the era of their production, these background videos, endlessly recombined for songs of different eras as well as different audiences within the karaoke box itself, also offer visceral access to an ambient non-specificity of space and time that is both individual and shared.
This, finally, seems to explain classic karaoke BGVs’ thematic preoccupation with the meticulously attended movement of human bodies through vastly under- or overdetermined space – a mirror of the amateur singer’s own remediation and orchestration of mass-produced audiovisual content through performative engagement. Consider JoySound’s video for ‘Tokyo Waltz’ (see Figures 11, 12 and 13), a melancholy 1984 pop hit for established singer and actress Yuki Saori. Montages of a pseudo-heroine (as much a character as the genre will admit) navigating busy urban scenes dissolve into close-ups of her facial expressions, the slowly softening lines of her posture and profile reframing the lingering visual structures and rhythms of the metropolitan street. In an entirely different genre, the video for ‘Linda Linda’ (1987), signature song of seminal punk band the Blue Hearts and universally acknowledged karaoke staple, features a young couple spinning vertiginously around in open, empty plazas encircled by skyscrapers, the visceral joy of their sensory disorientation reanimating their otherwise palpably sterile surroundings. It is precisely these ambient aesthetics that have become the televisual contribution to the nostalgic aspect of karaoke performance.

The body remixing urban space in ‘Tokyo Waltz’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=h76BcJoFgul

Spinning around in ‘Linda Linda’ (JoySound, via YouTube). Available at: https://www.youtube.com/watch?v=hC4Lf65RPTs
From ambient nostalgia to nostalgia for ambience
From the very beginning, back with the tipsy salarymen and 8–Juke in the bars of Kobe, karaoke’s auditory nostalgia has been bound up in the specific historical and cultural moment of a given song’s creation and reception that could be affectively accessed through individual or collective performance. The ambient aesthetics of BGVs, on the other hand, defy attempts to pin them down to any specific, symbolically meaningful time or location, generating instead a sense of what is not so much timelessness, per se, as it is the momentary suspension of the imposition of time upon the self – a window into the space of ambient subjectivation. Recentering this mediatization of the television screen within our larger understanding of karaoke as both social activity and multimedia format allows us to rethink both our conceptions of ambient media, as ambience becomes increasingly portable, and the terms of the nostalgia associated with karaoke practice.
The ambience of karaoke BGVs complicates the definition of ambient media as offering ‘imaginary sensory landscapes to filter, unify, and stabilize existing environments . . . absolute background [that] provides people with a ready-made framework for grounding their sense of ontological security’ (Roquet, 2016: 51–52). While the non-specific, interchangeable spaces of these videos encourage the karaoke performer’s renegotiation of the relationship between self and environment, the performer’s engagement is necessarily less oblique than it would be with most other ambient media. Through the design of the karaoke box, with the television screen as organizing principle and performance aid (featuring lyrics and musical direction), the ambience of these background videos is, in fact, foregrounded for the performer.
All of this has important implications for the auditory experience of karaoke. On the one hand, the cheesy 1980s aesthetics of these videos re-historicize the era associated with any given song, older or more contemporary. On the other, the recyclability of these videos’ affect and deliberate non-specificity with regard to space, time, and narrative vies with the specificity of the nostalgia associated with the audio, remediating the relationship of the music to the performer and their audience. As the ambient audiovisual moves in the Walkman-like direction of ever greater portability – the screens we carry about our person, the increasing compactness and connectivity of the audio in our ears – the almost anachronistic persistence of the karaoke box and its BGVs raises the question of what remains so compelling about its particular ambience. Our typical engagement with ambient media is solitary, passive, and now instrumentalized, coopted by neoliberal logics of productivity that constrain our engagement to the principle of utility. The ambience of karaoke BGVs, however, contained within the recreational space of the karaoke box, requires active engagement through performance and is, more often than not, communally experienced. Karaoke’s televisuality, in other words, provides a more visceral and complete framework of ambience than the vast majority of other popular forms of ambient media today.
Reconsidering how ambience is televisually foregrounded within karaoke aesthetics, in turn, reframes how we think about the nostalgia intrinsic to karaoke’s social practice. If the nostalgia of the historical specificity of the audio is remediated by the ambient non-specificity of the visuals of BGVs, then it stands to reason that this ambient non-specificity might itself be an object of nostalgia. It is possible to construe any visually-focused karaoke nostalgia as directed toward the specifically 1980s aesthetics of these videos. Yet, in accepting this historical specificity, one can also argue that the socio-economic conditions of Japan’s 1980s were themselves intrinsic to not only the production conditions that shaped karaoke’s ambient visual aesthetics, but also the pressures of late capitalism that made ambient media so popular as a means of engaging with and navigating environmental pressures. In other words, in a karaoke context, nostalgia for the 1980s also means nostalgia for the foregrounded experience of ambient media it engendered.
If nostalgia is typically construed as longing for a defined past moment, then karaoke’s televisuality paradoxically fulfills the longing for a specific historical moment that popularized the soothing, dehistoricized non-specificity of the audiovisual ambient aesthetic. In other words, insofar as they survive within the framework of an increasingly digitized, screen-mediated form of leisure performance, karaoke BGVs do not so much invoke an ambient nostalgia for their own era of production as provoke a nostalgia for the notion of ambience itself. BGVs’ ongoing remediation of the historical specificity of audio within karaoke provides space within karaoke performance that allows for renegotiations of the relationship between the body and its surroundings, as well as a nostalgic engagement with the ambient aesthetics that these BGVs continue to lend to karaoke social practice. That is to say, what moves us to return to the karaoke box again and again is not merely music we miss, but also the desire to be moved by – and beyond – the television screen.
Footnotes
Address: Department of East Asian Languages and Cultures, Green Hall 236, Wellesley College, 106 Central Street, Wellesley, MA 02481, USA.
