Abstract
If music is so varied, how do we understand it? Is there anything universal about it? And if not, can there be a cognitive science of it? Radically limiting examples so they fit certain frameworks but then calling everything else an exception is not helpful. We propose a redefinition of music that is based not on specific features but rather on creative experimentation with what we term “virtual universals.” These are universals that exert force even when they are not actualized or sounded. Our argument has applicability beyond the domain of music; in principle, the ideas in this paper could be applied to any domain of human behavior.
Introduction
If music is a human universal, then an intriguing possibility is raised: music might have characteristics that are common across all cultures, past and present. Although this idea has been around for a long time (e.g., Darwin, 1871), a spate of recent studies tested it by compiling large, cross-cultural music datasets and asking if there are any statistics that show common features among them. Here are a few examples. One study found that, while there are no absolute universals, there are many features across the world’s music that the authors dubbed “statistical universals” (those for which there are some exceptions). These included the use of discrete pitches (as in a musical scale) and a beat (a regular periodic pulse), and that performances occurred in groups (Savage et al., 2015). Another study built and analyzed both an ethnography and discography of some of the world’s music and showed that there are common contexts associated with music (e.g., infant care, love, dance) that can be predicted by acoustic features such as tempo and pitch ranges (Mehr et al., 2019). Yet another large-corpus study linked music perception to language experience, suggesting that speakers of tonal languages (like Chinese) were better at discriminating melodies versus beats (Liu et al., 2023). The findings from these big data studies are supported by theoretical work arguing for the presence of universal design features in music (Haiduk & Fitch, 2022; Patel, 2023; Tomlinson, 2015). Universality means these design features are shared by humans everywhere and that the observed uniformity is based on some species-wide, biologically based characteristic.
Conversely, it is well known that music can take on extremely diverse forms. It is for this reason that ethnomusicologists are typically resistant to musical and cognitive universals (for some important discussions, see Blacking, 1995; Bohlman, 2000; Gourlay, 1984; Titon & Slobin, 2009; Wong, 2014). And their resistance is not without merit (Meyer, 1960). Too often, proposed universals run roughshod over cultural difference, which is why most recent ethnomusicological studies are highly localized and largely jettison generalization (e.g., Chávez, 2017; Mahon, 2020; Meintjes, 2017; Wong, 2019). For instance, despite the size of the databases employed, the cross-cultural studies cited above still draw on a limited sample of the world’s cultures (Patel, 2023). There are around 7,000 extant languages, indicating roughly as many distinguishable cultures, which is an order of magnitude more than the samples of music used in any of the studies claiming musical universals. Moreover, these studies depend upon a circular argument: they derive universal music traits from a set of recordings that are already categorized as music. If these recordings are recognizable as music, that only raises the question of who is listening and recognizing the recordings as such. Another way of saying this is that researchers (in general, not just those that focus on music) often formulate questions and criteria based on the tools and datasets at hand, rather than interrogating the assumptions behind those tools and datasets.
In summary, there exist two divergent tendencies in music research: one searches for universal characteristics, while the other emphasizes diversity and pluralization. The same is true of many aspects of human behavior, including behaviors related to kinship, value, and so on. Universality vs. plurality is a central conflict in 21st-century intellectual life.
Here, we offer less a rapprochement between disciplinary approaches than a synthesis through redefinition. We argue that synthesizing contradictory commitments (universalizing vs. pluralizing) requires understanding our objects of study differently. We suggest replacing definitions of music based on identifiable traits with a view that every instance of music is a particular dynamic and creative process. Indeed, it is the creative process that is universal. To account for the dialectic of universality and plurality, we propose the concept of the virtual universal, which we believe is necessary to move beyond the current intellectual deadlock.
By “virtual,” we do not mean anything to do with virtual reality or the digital realm. We use the term, rather, in the more technical sense to denote those elements in a system which are not actual (i.e., they cannot be heard directly in the case of music), but which nonetheless exert real effects (see DeLanda, 2002; Deleuze, 1994; Grosz, 2020). In other words, commonalities across music are not identifiable as a list of traits, but rather as points within a psychoacoustical system that hold significance whether or not they are literally present. But we need to lay out the intellectual paradigm more clearly before elaborating this concept.
Exceptions and Synthesis: A Fresh Look
Evolutionary and cognitive research analyzes human expression in broad categories such as music and speech/language and then compares and contrasts them (Albouy et al., 2020; Albouy et al., 2024; Patel, 2010). In this section, we present the dominant view, with which we only partly agree. Generally speaking, two main distinctions between music and language are noted in the literature:
- Language is composed primarily of phonemes (units that can be combined in different ways to form words), whereas music is composed of pitches (musical “notes” used to form melodies). Phonemes are discerned primarily through timbre, that is, the spectral arrangement and spectral envelope of a sound (e.g., the contrasting vowel sounds in “me,” “may,” and “my,” or the contrasting consonants in “me,” “we,” and “see”). By contrast, pitches are usually perceived independently from timbre (e.g., middle C is the same note whether played on the piano or the oboe).
- Music has a clear rhythm and an underlying pulse (or “beat”), whereas languages are spoken in a more fluid and less periodic way. People dance together to the sounds of West African drumming, march in time to military drums, co-ordinate their work in fields through singing and swaying, and tap out the rhythmic cycles (tala) of Indian classical music. Language can be used in many ways, but in non-musical contexts it seldom has a melody or periodic beat.
Violin concerti, Andean panpipe music, Indonesian gamelan, and Zimbabwean mbira music all fit this definition of music, since they are made up of notes that unfold through melodies that have rhythms and a periodic pulse. Public speeches in English, whispered secrets in Zulu, giving directions in German, and asking whether a friend has eaten in Korean all fit the definition of language, since they are made up of phonemes that are combined to make words to create meaningful phrases, sentences, questions, and so on.
While recognizing exceptions and in-between cases, most researchers are satisfied with broad generalizations such as these. After all, definitional traits are rarely completely comprehensive, and to focus on exceptions and in-between cases rather than on common features would cast doubt on the scientific enterprise in toto.
The question that concerns us, however, is this: What if there are so many exceptions that they overwhelm the generalizations? How many exceptions are researchers willing to ignore?
Consider the following:
- Tone languages (such as Chinese and Ewe, a language spoken by millions in West Africa) use pitch as well as timbre to create syllables. They are spoken by as much as one quarter of the world’s population. It is no coincidence that tone languages have been neglected in music cognition (for a recent exception, see Liu et al., 2023) considering the globalization of English (Arac, 2002), which is not a tone language, and its hegemonic status as the language of contemporary science.
- In West Africa, drums are frequently used to say quite specific things (Ong, 1977). For example, based on the fact that Ewe is a tone language, Ewe speakers use “talking” drums in ways that are every bit as linguistic as they are musical.
- Hip-hop (or rap) is one of the most popular global genres of music today, with millions of daily listeners. The vocal delivery of rap depends as much, and perhaps more, on non-melodic aspects of language (rhyming, flow, prosody, meaning, etc.) as on manipulating pitch to create melodies (Krims, 2000).
- Many genres of traditional and popular music deploy non-pitched sound (or “noise”) for musical effect (Fales & McAdams, 1994). There is also an entire genre of music (pioneered in Japan and the United States) paradoxically called “noise music” (Novak, 2013).
- In parts of Central Asia (such as Tuva), singers use a technique called “throat singing” or “overtone singing” (Levin & Edgerton, 1999). The practice has been observed elsewhere in the world, too (Dargie, 1991; Nattiez, 1999). With this technique, singers isolate individual partials from a harmonic complex tone, and in so doing break down (or at least manipulate) the distinction between “pitch” and “timbre.”
- Not all music has a discernible beat or pulse. Indeed, there are exceptions galore, from the drones and unmetered sections of Hindustani classical music to the slow glissandi of some contemporary Western music (e.g., work from the composer Catherine Lamb). And there may even be exceptions within this exception: at least one scholar of Indian music has observed that sections typically assumed to be “unmetered” may, in fact, unfold over a very slow and subtle pulse. This pulse is “not invariable,” however, and it is discernible to a listener only with direct instruction from the performer (Widdess, 1994, p. 68).
Again, we ask: Can so many exceptions, encompassing the behavior of millions of humans, really be elided? How useful is a definition if so many kinds of music do not fit within its scope?
There is another way. Rather than pushing the many objections to the side and relegating them to the benign status of mere exceptions, we propose a synthesis of the generalizing and pluralizing impulses. Our aim is to take the exceptions seriously without devolving into culturally-specific pluralism.
Virtual Universals
As they are typically understood, musical universals are those features of music that are thought to be common across humanity and independent of culture. One way or another, and at least since the 19th century, many researchers have staked a claim on their existence and on which features may count as “universal” (Savage, 2019). Indeed, a typology of categories of musical universals has been put forth, with a list of 70 features that are potential candidates (Brown & Jordania, 2013). To put this list of features in a quantitative framework that could test these potential candidates, Savage et al. (2015) developed statistical criteria upon which musical features could count as “universal,” the most important of which is that the feature is present “on average” in music sampled across the globe as well as represented in nine specific geographic regions. We do not really have a quarrel with this approach. It has a specific goal. We are, instead, proposing a redefinition of music that takes the focus off features and traits. The concept of virtual universals is not a new typology or an alternative to statistical universals. It is a different way of looking at music altogether.
Instead of defining music as the organization of pitch (rather than timbre) and of periodic temporality (rather than fluid temporality), we understand it as creative experimentation. Musical creativity, in this definition, is not simply the deployment of discrete pitch, but rather the play of pitch and non-pitched sound. For example, it can be particularly compelling when performers deliberately do not use varied pitch when one might expect it.
To be sure, this reconceptualization takes us in a radically new direction, threatening, even, to explode the very possibility of a definition of music. This, we believe, is a risk worth taking. We hold no special commitment to music as an ontological category, and we fully recognize that in many cultures sonic expression exists side-by-side with dance, ritual, and other practices (e.g., Jankowsky, 2022; Stone, 2010; for critiques of associating music directly with sound, see Barrett, 2016; Leach, 2007).
What we want to suggest, in fact, is that human behaviors can be thought of as a more-or-less stable set of relations in a constellation of virtual universals. And those behaviors may intersect or overlap. It’s less important, then, to pinpoint a stable category called “music” than it is to understand how humans operate creatively within a cultural, cognitive, and biological landscape (for more about which, see below). The notion of virtual universals is a nifty way to shift in that direction.
Returning to some of the so-called exceptions we cited above: “Noise music” (e.g., Merzbow and Boredoms) is sensorially and affectively potent precisely because it jettisons harmonically pure pitches, and because it mixes pitches with non-periodic frequencies. African talking drums, for their part, deliberately mix rhythm and speech. Or consider the so-called “twelve-tone” music of the Second Viennese School (which includes the composers Schoenberg, Webern, and Berg). Rather than existing as an exception to the system of tonality (another oft-cited musical universal), this music can be understood as having an oppositional relationship to tonality, which some scholars have termed “anti-tonal” rather than the conventional “atonal” (Von Hippel & Huron, 2020).
Instead of viewing these as outliers or exceptions, we might understand them as the very stuff of music, and of human creativity itself.
The challenging case of tone languages is instructive; we therefore dwell on it for a moment longer. Music theorist Kofi Agawu (1995) acknowledges that the gap between language and music is “not ultimately eliminable.” But he finds something else more interesting. In his analysis of the Ewe language (which is tonal), Agawu focuses on scenarios where ambiguity or tension arises. Although the transition from language to music proper is breached only through “explicit, usually externalized meter,” the tendency of language to become music is heightened by the tonal richness of the Ewe language itself, and in this sense, Ewe is always “on its way to music.” For Agawu, the interesting thing about the relationship between music and language “is to be found in this dynamic and unstable condition.”
We do not deny that, in many scenarios, music is produced through the complex arrangement of pitches unfolding over a regular pulse. But sonic creative expression, which some cultures label as “music” and which often exists alongside dance and other cultural practices, can also be a play with the very thresholds of pitch, pulse, and other parameters. Our expanded definition places value on schismogenesis, on surprise, and on maximizing information in creativity. Any definition of music that aims for sufficient generality cannot, therefore, be satisfied with producing a list of universal traits purportedly hardwired into us by biology. What we want to emphasize in this paper, instead, is the creative act of toying with cognitive capacities.
Let us return to what Agawu identifies as the ineliminable gap between music and language. We can still hold onto notions such as discrete pitch and discrete time without, however, using them as straightforward definitions of what music (or language) is. This is why we are proposing the new concept of the virtual universal. The philosopher Gilles Deleuze initially developed the concept of the virtual to describe forces that are not “actual” (i.e., we cannot directly see or hear them), but which nonetheless have very real effects. The virtual does not lie ahead, in the future, like a vague possibility. Rather, it is a potent force that acts on the behavior of a system.
The virtual is every bit as real as the actual. From a complex systems perspective, we could state it like this: if the actual is a concrete reality (e.g., the state of a thing), the virtual is a non-actualized attractor in a dynamical system. This virtual attractor exerts a force on the state of a thing, thereby defining (at least in part) the thing’s behavior within a system of relations.
Levi Bryant helps elucidate the notion of the virtual through a simple example, that of linguistic structure (Bryant, 2006):
Take the example of language. We say that [the structure of language] is the condition of [actual instances of speaking]. Without language there is no speech. Thus, when I say to a friend “please leave me alone,” this enunciation isn’t possible without a prior shared system of language. There is a condition upon which this enunciation depends and that condition is language. Yet where is language? [L]anguage qua language is something that can’t be heard (as it’s composed of pure differences or phonemes…), it can’t be seen, it can’t be touched, it can’t be found in any particular object in the way that we might discern a quality such as red belonging to a ball.
We propose that what researchers typically call musical universals (e.g., pitch, pulse) should be better understood as virtual. As we have suggested, virtual universals can be understood as attractors (or basins of attraction), since they are notches which “pull” elements in particular directions without necessarily pulling something all the way into a basin (DeLanda, 2013; Eubank & Farmer, 1997). They exert force even though they may not be directly audible in a performance or recording.
We’ve emphasized that calling these universals virtual does not imply that they are unreal. Nor are virtual universals mere theoretical abstractions. On the contrary, virtual universals exert very real pressures on the way that humans behave (DeLanda, 2013; Deleuze, 1994). Discrete pitch, for example, may indeed be a human universal, a product, for instance, of our vocal anatomy and auditory perceptual capabilities, but that does not mean that we should expect to find it in all forms of music. Instead, discrete pitch acts as a basin of attraction for music. What we call “music” denotes a range of creative processes by which humans use sound to either pull away from, get closer to, or otherwise play with that basin of attraction.
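As a purely illustrative aside, the idea of an attractor that exerts force without being reached can be sketched as a toy one-dimensional system. Nothing here is a model proposed in this paper: the state variable, the linear pull, and the constant counter-force (standing in, hypothetically, for a performer's deliberate avoidance of stable pitch) are all assumptions made for the illustration.

```python
def simulate(x0, attractor=0.0, pull=1.0, counter_force=0.0,
             dt=0.01, steps=5000):
    """Euler-integrate dx/dt = -pull * (x - attractor) + counter_force.

    x is a hypothetical scalar describing how far a performance sits
    from the basin (0.0 = fully inside it). All parameters are
    illustrative assumptions, not measured quantities.
    """
    x = x0
    for _ in range(steps):
        x += dt * (-pull * (x - attractor) + counter_force)
    return x


# With no counter-force, the state falls all the way into the basin.
settled_in_basin = simulate(x0=2.0)

# With a steady counter-force, the state settles at a distance of
# counter_force / pull from the basin: the attractor still shapes the
# outcome even though it is never actually reached.
held_off_basin = simulate(x0=2.0, counter_force=0.5)

print(round(settled_in_basin, 6), round(held_off_basin, 6))
```

In the second run the resting point is set jointly by the pull and the counter-force, which is one simple way to picture a universal that is operative but not sounded.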
Virtual universals are not susceptible to forms of analysis that search for traits or features in the world’s music. In this sense, they compel a reorientation of what music cognition should strive for. Rather than looking for traits, the notion of virtual universals prioritizes the study of creative process, alongside the constraints that influence that process. Those constraints are cultural, cognitive, and biological.
Complexity
Although we do not have space here to elaborate on the ontological makeup of virtual universals, a few remarks are in order. Researchers are increasingly realizing the extent to which physical and biological processes are nonlinear. If we understand the human as a biocultural system, then the combination of context-dependent constraints provides coherence to disparate parts (Juarrero, 2023). The parts themselves then become affected by this coherence and we then have a different type of causation, often called “mereological.” Mereological causation explains why it is not wise to focus analysis on isolated features or traits. How does this type of causation apply to music?
Music is a creative practice of intensifying, contradicting, and playing among the attractors in a cultural, cognitive, and biological landscape. Let us unpack what we mean by that (borrowing from Falandays & Smaldino, 2022). In every culture, there is a probability landscape that describes the likelihood of observing different musical variants; its shape is the result of the population of individuals within it, their patterns of social interactions, and ecological factors (e.g., sound transmission, availability of materials for instruments, etc.). The landscape has valleys—basins of attraction—that correspond to high-probability musical variants for that population. Individuals within the landscape are themselves a mini-landscape within the larger one: each is a probability function of different behaviors given a range of inputs. These behaviors are a combination of sensation, motor control, memory, and other cognitive and biological processes that influence how an individual responds to sounds and generates new behaviors. On the biological side, our human biology is constrained by our hearing range (Masterton et al., 1969), what sounds we can produce with our voices (Ghazanfar & Rendall, 2008), what rhythms we can produce with our motor system (Jacoby et al., 2024; Kotz et al., 2018), and so on. Given a particular environment, individuals may be more or less likely to produce and perceive some sounds versus others.
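The nested landscapes described above can be made concrete with a small toy sketch. It is offered only as an illustration under stated assumptions: the six "variants," their weights, and the rule for combining a cultural and an individual landscape (a pointwise product, renormalized) are hypothetical choices of ours and are not taken from Falandays and Smaldino (2022) or from any dataset.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combine(cultural, individual):
    """Assumed combination rule: pointwise product, renormalized."""
    return normalize([c * i for c, i in zip(cultural, individual)])


def basins(probabilities):
    """Indices of local probability maxima (valleys of -log p)."""
    found = []
    last = len(probabilities) - 1
    for i, p in enumerate(probabilities):
        left = probabilities[i - 1] if i > 0 else -math.inf
        right = probabilities[i + 1] if i < last else -math.inf
        if p > left and p > right:
            found.append(i)
    return found


# Hypothetical weights over six musical variants (indices 0..5).
cultural = normalize([1, 4, 1, 1, 3, 1])    # culture-level landscape
individual = normalize([1, 1, 1, 2, 4, 1])  # one person's landscape
combined = combine(cultural, individual)

print(basins(cultural), basins(combined))
print(cultural.index(max(cultural)), combined.index(max(combined)))
```

In this toy case the culture and the combined landscape share the same two basins, but the deepest one differs: variant 1 for the culture alone, variant 4 once the individual's tendencies are folded in. That is one way to picture how an individual can work within a culture's attractors while shifting which of them is most likely to be realized.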
In light of these landscapes, the creation of every musical variant is simultaneously the product of both. There is a mutual dependency. As individuals develop within and learn from a particular musical culture (and beyond), their unique cognitive landscapes may in turn produce new musical variants in the next generation. The culture is constrained by the individuals within it, and the individuals are constrained by the culture. The causation is virtual, a product of the mutual interaction (as opposed to efficient causation like a billiard ball hitting another) (Juarrero, 2023). Pitch (the fusing of harmonic spectra by a complex auditory system within an equally complex listening environment) and rhythmic periodicity (the bottom-up synching of bodies and attentions through sound) are virtually existing realities available to, but not always directly used by, musicians.
Concluding Remarks
Our proposed notion of the virtual universal has a dual function. On the one hand, it helps us get away from conventional understandings of universality, which always generate many exceptions. Exceptions undermine the very definition of “universality,” and cannot be easily wished away or banished. On the other hand, human evolution is obviously undergirded by biological and physical processes, which means that comparison between geographically and historically dispersed groups must be possible on some level. It’s not important to us whether any one thing “X” (e.g., “music”) has universally consistent features. From a zoomed-out perspective, it obviously has, and from a zoomed-in perspective, it obviously hasn’t. Both perspectives have their own truths. But understanding how the two perspectives relate to each other and advance the overall picture requires inventive new ways of thinking. We hope we’ve offered one such fruitful possibility here.
Our thinking on this topic generates a host of other questions, which lie beyond the bounds of this short paper. Here’s a key one: What if we consider that the universality and diversity of music is really an outcome of many varied processes, not any biological specializations or pre-adaptations? The history of human societies brims with countless ways of living, governing, and creating, often as a response to environmental conditions including neighboring societies (Graeber & Wengrow, 2021). Musical behavior could be just one such creative response, perhaps created on many occasions in the face of common challenges.
Our goal here has been to provide a way of thinking about music that makes room for both universality and diversity. In so doing, we hope to have found a way to mediate scientific and humanistic modes of inquiry. Why is this endeavor necessary? We contend not that both sides have it “wrong.” Nor do we want to suggest that each side is somehow limited, and that scientists can somehow fulfill something lacking in ethnomusicology and vice versa. We wish rather to emphasize the opposite, that both perspectives are stunningly illuminating. What interests us, then, is this: if both sides seem to be doing so well, why is there so little dialogue between the two disciplinary formations? And how is it that the (perfectly reasonable and often profound) arguments of scientists and ethnomusicologists often clash with or contradict each other?
Of course, we know why and how this is the case. But things need not be this way. Rather than continuing to speak past each other, researchers might find common ground with a revised conceptualization of the object of study. Foregrounding creativity and paying heed to the potency of virtual universals offers one way forward.
Acknowledgments
This work was supported by a grant to G.S. and A.A.G. from Princeton University’s Humanities Council Magic Project.
Action Editor
Ian Cross, University of Cambridge, Faculty of Music.
Peer Review
Jonathan Stock, University College Cork, Music.
Patrick Savage, University of Auckland, School of Psychology, and Keio University, Faculty of Environment and Information Studies.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This research did not require ethics committee or IRB approval. This research did not involve the use of personal data, fieldwork, or experiments involving human or animal participants, or work with children, vulnerable individuals, or clinical populations.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
