Abstract
The interactions between species-specific predispositions and cultural plasticity in the development of human musical behavior have recently become the rationale for a possible Baldwinian origin of human musicality. In the previously suggested Baldwinian scenarios of music origin, social bonding has been indicated as the crucial adaptive value that became the main cause of the co-evolutionary process that led to our musicality. However, the adaptive value of social bonding does not explain the cultural variability of musical expressions that enabled the Baldwinian evolution of musicality. The main aim of this article is to show that free rider recognition, along with social bonding and signaling commitment, could have been a possible adaptive function of hominin musical rituals. In the proposed scenario, free rider recognition became a “flywheel” of the arms race between deception and cooperation. As a result, the interplay between the canalization and plasticity of musical learning became a part of music evolution. This process created a cultural niche in which hominin vocal learning was specialized in the imitation of discrete pitch and rhythm.
Human musical behavior is driven by species-specific predispositions and culture-specific factors. As a result, music from different cultures is very diverse (Mehr et al., 2019; Merriam, 1964). However, despite this diversity there are widespread features of music, known as musical universals (Brown & Jordania, 2013; Mehr et al., 2019; Nettl, 2000; Savage et al., 2015; Trehub, 2015), which suggest that music, along with crying, laughter, and speech, belongs to well recognizable human-specific auditory signaling systems. The coexistence of species- and culture-specific elements in music additionally suggests that the evolution of human musicality had to coincide with the evolution of cognitive plasticity that led to the development of a cultural environment (Podlipniak, 2021). Such circumstances, if stable long enough, would have created a good opportunity for gene-culture co-evolution (Lumsden & Wilson, 1982). In line with this assumption, the origin of human musicality has recently been hypothesized as the result of this co-evolutionary process (Killin, 2016, 2017, 2018; Patel, 2018, 2021; Podlipniak, 2015, 2016, 2017, 2021; Savage et al., 2021a; Shilton, 2022; Tomlinson, 2015; van der Schyff & Schiavio, 2017). Taking into account the important role of inventiveness in human musical behavior, some of these co-evolutionary scenarios of music origin have included the “Baldwin effect” (Podlipniak, 2015, 2016, 2017, 2021; Savage et al., 2021a), that is, a type of gene-culture co-evolution in which an initially invented behavioral trait is transformed by means of natural selection into an instinctive behavior (Baldwin, 1896a, 1896b). Savage et al. (2021a) have additionally elaborated on the Baldwinian scenario by postulating an “iterated Baldwin effect,” as an evolutionary mechanism that led to the emergence of the co-evolving system we know as music. All these Baldwinian explanations of music origin have so far indicated social bonding as an adaptive value of music. 
In fact, there are many premises that support the crucial role of music in establishing and sustaining social bonds (Dunbar et al., 2012; Pearce et al., 2015, 2017; Tarr et al., 2014). However, the unanswered question concerning the role of social bonding in the Baldwinian evolution of music is why the cultural variation of music would be necessary for social bonding. The “social bonding” hypothesis alone does not seem to explain this issue, for the following reasons.
The first step in every Baldwinian scenario is a social invention (Baldwin, 1896a). In this process, learning must be costly enough in terms of energy and time to allow natural selection to favor instinctively behaving individuals (Dor & Jablonka, 2000). The crucial role of cultural information in this process means that the adaptive function of this behavior should be initially achieved more effectively by cultural change than by fixed instinctive features (Godfrey-Smith, 2007). To use cultural information in the domain of vocal signaling, “vocal production learning” is necessary (Merker, 2021). As far as musicality is concerned, the key sound features to be learned are pitch and rhythm (Bannan, 2009). Therefore, the actual reason for the emergence of the Baldwinian evolution of music is related to the selective pressures that were responsible for the appearance of vocal production learning of pitch and rhythm. It is difficult, however, to envisage how social bonding could have contributed to the evolution of vocal production learning in the domain of pitch and rhythm among hominins, taking into account the postponed adaptive effect of social cooperation. The strengthening of social bonds by means of a learned behavior, even in the case of music, which is relatively fast at creating social bonds (Pearce et al., 2015), is a delayed benefit in comparison to, for example, the instant profit of vocal warning obtained by innate signaling (Seyfarth et al., 1980). From this perspective, social bonding is a beneficial consequence of signaling (e.g., signaling commitment), rather than the main reason for the appearance of a particular vocalization. In line with this argument, Mehr et al. (2021) have suggested that music evolved as a tool of coalition and parental attention signaling. This idea does not explain, however, why music is a signaling system composed of such a large number of culture-specific traits.
The learning of culture-specific musical traits is costly in terms of time and energy. Why would natural selection have supported such a costly signaling system, instead of preferring innate culturally unchanged vocalizations? After all, both signaling of coalition strength and parental attention can be achieved by innate vocalizations as seen in chimpanzee pant-hooting (Fedurek et al., 2013) and mammalian distress calls (Root-Gutteridge et al., 2021).
Another unsolved quandary with the claim that social bonding is the reason for the appearance of the Baldwinian evolution of music is related to the problem of the reliability of the signal. It is known that costly signals of commitment can be an important factor in influencing the acceptance of group membership (Ohtsubo & Watanabe, 2009; Power, 2017; Yamaguchi et al., 2015). Signaling commitment and trust can also lead to the strengthening of social bonds. However, as signaling can also be used as a tool of deception (Searcy & Nowicki, 2005), there is a risk that signaling commitment by an individual can be an egoistic strategy designed to cheat the community. As a result, every group that uses signals of commitment must face the challenge of the recognition of their credibility. After all, only a credible signal of commitment can be a good enough source for social bonding. From this perspective, a newly invented code, as in the case of the initial Baldwinian invention, being susceptible to deception, seems to be a poor tool for the creation of social bonds without any mechanism that ensures its credibility. Therefore, while social bonding can be a good reason for sustaining the Baldwinian feedback loop (Savage et al., 2021a) and strengthening the selective pressure during the iteration of the Baldwin effect, it is hard to account for this alone as the trigger for the Baldwinian evolution of music. The aim of this article is to indicate that the Baldwinian evolutionary scenario of music origin should be completed by adding the initial selective forces that led to the next stages of music evolution. To find this music origin missing link, the concept of “free rider” recognition as an adaptive value of the first socially invented musical ritual is proposed. In other words, by indicating “free rider” recognition as a possible additional adaptive function of music, the proposed view is an extension of the social bonding hypothesis.
The puzzle of the origin of vocal production learning among hominins
Vocal production learning is the ability to reproduce perceived sounds by voice (Janik & Knörnschild, 2021; Janik & Slater, 2000; Merker, 2012). This rare ability consists of adjusting the structure of produced sounds to the acoustic parameters of heard sounds by means of vocal control. Apart from Homo sapiens, this ability has also been noticed in other mammalian taxa such as bats, cetaceans, pinnipeds, and elephants (Janik & Knörnschild, 2021), as well as in three groups of birds, that is, songbirds, parrots, and hummingbirds (Päckert, 2018). Although some convergence between the acoustic parameters of produced and perceived sounds has also been observed in the vocalizations of other mammals, including primates (Janik & Knörnschild, 2021), it is claimed that Homo sapiens is the only primate endowed with vocal production learning (Fitch & Jarvis, 2013; Janik & Slater, 1997; Jarvis, 2019; Petkov & Jarvis, 2012). As imitation is a necessary condition for the development of vocal culture, vocal production learning must have been crucial for the Baldwinian evolution of music. In other words, without vocal production learning, no social invention of even the simplest song would have been possible. Merker, in his commentary on Savage and colleagues’ proposal, has claimed, however, that vocal production learning could not have evolved by the Baldwinian mechanism (Merker, 2021). In response, Savage et al. (2021b) have agreed, suggesting that vocal production learning evolved biologically. While the appearance of vocal production learning may well have been the result of solely biological forces (Merker, 2021), this does not necessarily mean that the origin of music has no connection with the Baldwinian model of evolution. First, vocal production learning is not a “binary trait” (Arriaga & Jarvis, 2013; Janik & Knörnschild, 2021; Martins & Boeckx, 2020; Petkov & Jarvis, 2012; Vernes et al., 2021).
Indeed, it is characterized by many dimensions such as “accuracy of the copy,” or “type of vocal modifications” (Vernes et al., 2021). This means that once vocal learning evolved, the changes of and within its dimensions could have been induced in a Baldwinian way. Therefore, even if vocal production learning among our predecessors did not originally evolve as a musical ability, musicality could still have appeared as a result of the Baldwin effect.
Second, although human vocal production learning is treated as one of the most elaborate modes of learning among mammals (Janik & Knörnschild, 2021), people seem to be especially talented in the imitation of particular sound features rather than in a precise duplication of all acoustic traits that characterize every perceived sound (Lemaitre et al., 2016). Not surprisingly, humans are most efficient in the imitation of the features crucial for the recognition of speech and singing units. Moreover, the vocal learning of the distinctive features of the mother tongue (Warlaumont, 2020) and of a culture-specific music system, such as pitch intervals and rhythms (Benetti & Costa-Giomi, 2019), is spontaneous and happens from infancy. The tight connection between human vocal learning and the learning of speech and singing seems to be facilitated by infants’ special attention directed toward speech (Vouloumanos et al., 2010) and singing (Costa-Giomi, 2014; Costa-Giomi & Ilari, 2014). Therefore, the vocal learning that we observe today among humans seems to be especially tuned to speech and music, which are functionally involved in the expression of language-specific propositional meanings and music-specific emotional sensations, respectively.
Third, the fact that vocal production learning is absent in our closest relatives, chimpanzees, suggests the relatively faster evolution of volitional vocalizations among hominins. There are, however, at least four abilities observed among some other non-human primates that, if also present in hominins, can be interpreted as the preadaptations for hominin vocal production learning: (1) the ability to adjust vocal production to social context (Seyfarth & Cheney, 2018), (2) the ability to modify certain spectral features of vocalizations (Kalan et al., 2015; Watson et al., 2015), (3) the ability to associate a particular type of vocal signal with referential meaning (Slocombe & Zuberbühler, 2005, 2006), and (4) the use of pitch and rhythm in affective prosody (Zimmermann et al., 2013). While the repertoire of primate calls is relatively constrained and resistant to acoustic modifications (Hammerschmidt & Fischer, 2008), both a primate’s decision to vocalize and its choice of a particular call often depend on the social context (Seyfarth & Cheney, 2018; Slocombe & Zuberbühler, 2007). The same dependence most probably characterized the last common ancestor of humans and chimpanzees. This means that even before hominins were able to vocally control their calls, the use of their vocalizations was susceptible to cultural change. Such an ability could have been a good starting point for the creation of vocal habits related to signaling social intentions. In addition, the ability to combine the acoustic features of vocalizations with culturally flexible meaning, which we observe nowadays among chimpanzees (Watson et al., 2015), could have opened an enormous space for coding information, restricted only by hominins’ memory and perceptive resolution. Such a tendency to exploit sounds as a medium of cultural information, provided that it was adaptive, must have been fertile ground for the evolution of vocal culture.
This means that the beginning of the evolution of hominin vocal culture preceded the appearance of vocal production learning. Nevertheless, as vocal production learning is definitely a crucial ability, which facilitates and accelerates cultural evolution in the domain of vocal culture, it seems reasonable to hypothesize that vocal production learning was a milestone in this process. The main questions, however, are which acoustic features were the first sound objects to imitate, and what adaptive value was responsible for the appearance of vocal production learning and for the signal variability among hominins in the domain of music.
Adaptive factors in favor of musical signal variability
Judging by the interspecies comparison of signaling, Griebel and Oller (2008) have indicated different functions, such as intra- and intersexual competition, social cohesion, including parent–offspring bonding, and deception as the potential reasons for the evolution of signal variability. Parent–infant bonding as a reason for the evolution of musical signal variability does not seem very promising. Parental singing to infants, that is, “infant directed singing,” is usually an exaggerated and simplified version of adult singing in terms of pitch contour and tonal complexity, respectively (Trainor et al., 1997; Trehub et al., 1993). Lullabies are also less rhythmically complex compared with other songs (Mehr et al., 2019) and infants prefer tonal simplicity (Trainor, 1996; Trehub et al., 1993; Unyk et al., 1992), which suggests the lack of a tendency to complicate musical structure in parent–infant musical communication. Although the simplicity of infant directed singing does not exclude variability, the openness to complexity observed in adult-mode singing increases the scope of variability. Therefore, even if parent–infant bonding had been an initial source of musical variability (Leongómez et al., 2021), the subsequent evolutionary trajectory of musical signal variability would not have been related to this function, but would have been redirected toward social bonding among adults. Sexual selection as a source of musical signal variability does not seem to be a very convincing explanation either. While sexual competition may result in the appearance of culturally variable signals, as in the case of bird and whale songs (Catchpole, 2000; Garland & McGregor, 2020; Noad et al., 2000), there are some characteristics of music that suggest that social factors rather than sexual selection played a crucial role in the origin of musical signal variability. The main clue supporting this claim is the fact that singing, in contrast to speech, tends to be simultaneous (Bannan, 2020).
Singing in unison and singing in polyphony both impose coordination between singers, which is costly. Even antiphonal singing necessitates the matching of harmonic series between calls separated in time from responses (Wagner & Hoeschele, 2022), which imposes coordination between singers too. The result of this coordination blends all individual displays into a more or less homogeneous signal, which makes it an ineffective strategy for the individual fitness advertisement that is indispensable for sexual competition. In fact, although all communal singing can be interpreted as a signal of social cohesion, the value of this signal is measured by the level of similarity. In contrast, sexual display is oriented to show an individual advantage over other individuals. In this game, there can be only one winner. Therefore, although one cannot entirely exclude a role of sexual selection in the evolution of musicality (Darwin, 1871; Miller, 2000; Ravignani, 2018), the predominantly communal character of music and its connection with social life (Blacking, 1973; Merriam, 1964; Savage et al., 2015; Turino, 2008) indicate that cooperation, not competition, must have been the more important force related to the evolution of musical signal variability. The social origin of hominin collective and antiphonal singing is additionally supported by the fact that bird (Tobias et al., 2016) and mammalian (King & McGregor, 2016; Tyack, 2008) duets and choruses are also associated with establishing stable social bonds and territoriality.
Free riding as an inevitable component of social life
If social factors had been related to the appearance of musical signaling and its cultural variability, what would have been the actual adaptive advantages linked to the social life of hominins that would have created the pressure for this process? The obvious benefits of living in social groups include more effective detection of, deterrence of, and defense against predators (Dunbar, 1996), more successful hunting of large prey (MacNulty et al., 2014; Scheel & Packer, 1991), increased probability of food localization (Bickerton, 2010; Bugnyar, 2013), and so on. However, to achieve these advantages, gregarious animals have to create and sustain social bonds. This task is associated with many challenges such as inter-individual conflicts resulting from competition within a group, uneven contribution of individual efforts to the group, and the recognition of group members. All these benefits and challenges are the consequence of two antithetical forces that govern life in social groups: the “centripetal force” that sustains cooperation and the “centrifugal force” that promotes selfish behavior (Dunbar, 1996, p. 19; Nowak, 2006). On one hand, to sustain a social group, the individual benefits of group members obtained from cooperation must exceed the profits achieved individually. On the other hand, as reproduction necessitates inter-individual competition, it is impossible to eliminate every selfish behavior from a social group. An obvious egoistic strategy is to reap the social benefits without contributing one’s own efforts, that is, “free riding” (Axelrod, 1984). One possible way to achieve this aim is to use deception (Searcy & Nowicki, 2005). If some hominin vocalizations had been used as credible signals of commitment to the group, the simplest way to obtain free rider benefits would have been to mimic these credible signals. In other words, free riders would have received greater benefits than others (Grafen, 1990).
In line with this reasoning, a lack of countermeasures against free riding (e.g., in the case of a certain “musical” individual being endowed with a mutation that prevented him or her from musically induced prosocial behavior) has been posed as one of the arguments used to undermine the “social bonding hypothesis” (Mehr et al., 2021, but see Harrison & Seale, 2021; Wood, 2021). This means that hominins had to face yet another challenge—the recognition of deception. Importantly, in the case where deception is a part of communication, a crucial condition for signal flexibility is learnability (Griebel & Oller, 2008). From this perspective, changing or adding a new learned variant of vocalization can act as a protection against the fake signals of commitment. Learning this new variant necessitates devotion of time and energy, which can test the veracity of a hominin’s intentions.
Which sounds were vocally learned first by hominins?
As living primates use spectral shape (Watson et al., 2015) and F0 (Kalan et al., 2015) as the sound signatures of objects, it seems reasonable to hypothesize that hominins also used them for these same purposes. The so-called “affective prosody” (Brown, 2017) that is observed in many living mammalian species, including all primates (Scheumann et al., 2014; Zimmermann et al., 2013), is also based on the modulation of these acoustic parameters, which lends credibility to the presence of this vocalization among hominins. Therefore, it seems reasonable to assume that hominins used these acoustic features to code information in at least two important ways: (1) to communicate about external objects, for instance, danger, the location of food sources, and types of food, and (2) to communicate subjective attitudes such as aggression, distress, and appeasement (Podlipniak, 2022). While some of these vocalizations were well-established instinctive fixed signals, other vocalizations became subject to volitional control. From this perspective, the evolution of hominin vocal production learning is in fact the taking of volitional control over two types of vocalizations designed to inform about the concepts of objects and internal emotional states.
If among the vocal repertoires of ancient hominins there were vocally learned calls that were sound symbols of mental concepts referring to perceived objects such as a food source, predators, or prey, the vocal imitation of acoustic features would have started from the sound traits previously used by hominins for this same function. The fact that we observe a tendency to arbitrarily use particular sounds as food symbols among chimpanzees (Kalan et al., 2015; Watson et al., 2015), and the instinctive character of affective prosody (Filippi, 2016; Filippi et al., 2017; Scheumann et al., 2014; Zimmermann et al., 2013), suggests that hominin vocal culture also started from this type of signaling. It seems possible that the competition between hominin groups for food resources created a pressure for the use of group-specific signals that informed not only about the location of fruits but also about other individuals. To restrict the intelligibility of the signals, the cultural modifications of these calls would have been the best solution. As a result, the acoustic features of existing vocalizations were not only mimicked but also modified. The increasing number of concepts to communicate was probably the main pressure for the evolution of the vocal control of the aforementioned acoustic traits. This suggests that the appearance of vocal production learning among hominins was related to sound symbols of mental concepts rather than the sound expressions of preconceptual emotional sensations that characterize music.
The beginnings of hominin vocal culture and proto-music
Once the plasticity of vocalization was accessible, both the cultural evolution and gene-culture co-evolutionary mechanisms could have operated. The learnable group-specific signals could have been prone to functional flexibility (Griebel & Oller, 2008), giving them the possibility of being used as signals informing about belonging to a particular group. Taking into account that social bonding is nowadays facilitated by music in which rhythm plays a crucial role (McNeill, 1995), this type of signaling would most probably have been achieved at the beginning by means of sound synchronization. Alternatively, it has been proposed that well synchronized “musical” signals could have been used as the signals of group consolidation (Hagen & Bryant, 2003; Hagen & Hammerstein, 2009; Mehr et al., 2021), which could have functioned as an acoustic aposematism (Jordania, 2011). However, even if hominin synchronized signaling had evolved originally as an acoustic deterrent oriented against other groups of the same species (Hagen & Bryant, 2003; Hagen & Hammerstein, 2009) or against predators (Jordania, 2011), none of these functions could explain the cultural flexibility of the synchronized signals. While well synchronized signals can be an obvious indicator of consolidation, the cultural variations of such signals seem like an unreasonable expenditure of energy and would make the signal more ambiguous. In other words, the synchronization by itself, not the variations of the synchronized patterns, is enough to send a deterring message. Therefore, the cultural flexibility of the synchronized rhythms must have evolved for other reasons, most probably related to social bonding.
Although Homo sapiens is the only living primate endowed with the ability to synchronize with periodic sounds in different tempi (Honing, 2019; Patel, 2008), some studies show that the synchronization of movement with musical beat in a restricted periodicity, that is, to 600 ms, can be strenuously learned by chimpanzees (Hattori et al., 2013, 2015). This means that hominins were probably able to learn rituals based on movements synchronized with sounds. Interestingly, it has been proposed that the appearance of the brain mechanism that enables auditory–motor synchronization in humans was possible thanks to the evolution of vocal production learning (Patel, 2006, 2008, 2021; Patel & Iversen, 2014; but see Brown, 2022; Cook et al., 2013). Alternatively, it has been suggested that the evolution of human auditory–motor synchronization has its deeper evolutionary roots in perceptive abilities that evolved in primates long before the appearance of vocal production learning (Honing et al., 2012, 2018; Merchant & Honing, 2014). Regardless of which of these hypotheses is true, they do not exclude the view that the gradual evolution of vocal production learning was directed toward broadening the volitional control of vocalization timing. Specifically, Patel (2021) has suggested that vocal learning was a preadaptation for the sporadic perception of and synchronization to beat. According to Patel (2021), the advanced form of beat perception and synchronization to it, which we observe among Homo sapiens, is a result of gene-culture co-evolution.
Another premise suggests that apart from rhythm, pitch could also have been a feature used at this time to signal group belonging. This is the crucial role of pitch in affective prosody. As social bonding is emotional (Shultz & Dunbar, 2010), which means that social relations are based on subjective internal states, the exaptation of some elements present in affective prosody, being designed to communicate subjective attitudes, seems the most parsimonious explanation. One of these elements is pitch, which can be used as an emotional signal. An effective way to signal belonging to a particular group is the alignment of emotional states (Bharucha et al., 2011; Feldman, 2017; Shilton, 2022). As pitch is an important ingredient of affective signaling, emotional alignment can also be achieved by the synchronization of pitches between individuals. As vocalizations produced by the vocal cords are complex harmonic sounds, the synchronization of pitches does not necessarily mean unison singing. Instead, it can be synchronization between F0 and other harmonics leading to polyphony (Bannan, 2012). In fact, even singing the same melody by men and women is usually a specious unison, which is actually singing in octave (Bannan et al., 2023), which means synchronization between the fundamental frequency and the first harmonic. Thus, the synchronization of pitches can be viewed as a kind of spectral synchronization (Wagner & Hoeschele, 2022). To synchronize vocalized pitches between a group of singing individuals, every singer must predict the changes of pitch in time that will occur in their co-singers’ singing. However effective the synchronization of affective calls may seem to be in signaling group belonging, the continuous changes of pitch in affective prosody are hard to predict in comparison to the relatively stable use of pitch in singing (Zatorre & Baum, 2012). This can explain the transition from the former to the latter.
The use of a stable pitch in response to the recognized pitch of co-singers’ vocalizations necessitates, however, the volitional control of F0. This is the moment when the vocal production learning of pitch would have evolved, giving the foundations for the ability of monotonous singing. This ability would have become a milestone in the evolution of singing (Bannan, 2012) and would have opened a space for the cultural variability of pitch sequences.
The role of deception in Baldwinian feedback
If culturally flexible sequences of discrete pitches and rhythms became the signals of group belonging, then deception strategies could have entered the game. As the invention of a complex sound sequence forces the learners to spend time together, communal singing can serve as a measure of commitment by checking the effort devoted to learning a group-specific vocal signal. Those individuals who devoted less time to learning the group-specific tunes (spending this time on their egoistic aims) could have been detected by the rest of the group by means of the recognition of their low standard performance. One of the hypothetical scenarios of such “free rider” detection could have occurred when a member of the group, instead of participating in ritual group singing, took the opportunity to steal food gathered and stored by other group members. While the rest of the group strenuously learned a new variant of a ritual song by means of many repetitions, the free rider would not have practiced enough to learn this new variant. As a result, during the next communal singing the free rider’s poor performance would have attracted the attention of the rest of the group, exposing the free rider to the risk of ostracism. In this scenario, communal singing serves as an activity that allows all members of a group to control the behavior of others. Alternatively, one can imagine that a free rider could have tried to cheat by inventing and promoting a new vocalization rather than devoting time to learning the existing song. In this scenario, however, the free rider (not endowed with the musicality that characterizes modern humans) would have had to devote as much time and energy to inventing this new song (and to persuading the rest of the group that this new song is better than the existing one) as to learning the song proposed by the group. Because of this, the former scenario seems more probable.
In such long-term conditions, due to inter-individual in-group competition, natural selection would have preferred those who learned faster, in other words those who avoided effort. Under these circumstances, the appearance of deception triggered an arms race between deception and the recognition of deception (Griebel & Oller, 2008). On one hand, the effective recognition of cheaters could have led to strengthening social bonds between individuals who had recognized themselves as non-cheaters. On the other hand, the presence of “fast learners” who abused the group—free riders—created the pressure for plasticity that enabled the inventiveness of vocalizations. This arms race is based on the canalization of pitch and rhythm learning as well as on the plasticity, which is necessary for creating new vocalizations. This means that free rider recognition can be an important factor in the Baldwinian evolution of music.
The functional specificity of the Baldwinian evolution of music
What is the primordial reason for the Baldwinian evolution of music? Is it the signaling of commitment and trust, the creation and strengthening of social bonds, or the recognition of “free riders”? In some sense, all these functions can be viewed as different sides of the same coin. There are theoretical models indicating that costly signals can co-evolve with costly cooperative traits (Salahshour, 2019). Taking this model into account, the signaling of commitment by means of vocalized sequences, which is costly because it requires the vocal production learning of discrete pitches and rhythms, could have facilitated cooperation between hominins. Cooperation, in turn, could have induced the tendency to complicate the “musical” sound signals. This process could have facilitated social bonds, since social bonding and cooperation need trust (Roberts, 2020). In addition, to eliminate the inevitable instances of deception, the recognition of “free riders” would have had to evolve. As the arms race between deception and cooperation would have resulted in the interplay between the canalization and plasticity of musical learning, respectively, the recognition of “free riders” would have been included in the set of forces influencing the Baldwinian evolution of music. In other words, apart from their role in social consolidation, the culture-specific variations of music structure can function as a hallmark of group identity. Only a learned music structure allows an individual to successfully synchronize with the other members of a group. Those individuals who are unable to synchronize with the group reveal their lack of integration and risk ostracism (Podlipniak, 2017). Therefore, the adaptive value of “musical plasticity” is not social bonding itself but the recognition of “self-other” in terms of the assessment of trustworthiness.
From this point of view, in the process of music evolution, all three of these functions, that is, the signaling of commitment and trust, the creation and strengthening of social bonds, and the recognition of “free riders,” have been interdependent, leading to the appearance of a functional feedback loop.
Conclusion
The presented idea concentrates on “free rider” recognition as a function that has so far been neglected in the Baldwinian scenarios of music evolution (but see Podlipniak, 2017). This idea does not diminish the role of social bonding and signaling commitment, but treats them as equally important factors in the Baldwinian evolution of music. The proposed extension of the Baldwinian model of music evolution focuses only on the ultimate explanation (Fitch, 2015; Tinbergen, 1963), leaving questions about behavioral and neurobiological mechanisms and their development in ontogeny for further research. Of course, claims about the adaptive functions of hominins’ musical behavior are difficult to test because all hominins, except for Homo sapiens, are extinct. Therefore, we cannot conduct any experiments on our ancestral species that were not endowed with contemporary human musicality, or observe the behavioral changes that occurred in our ancestral lineage. The scope of data that can shed light on the possible role of “free rider” recognition in the process of shaping our musicality is therefore restricted to data that can be obtained from interspecies comparative studies and from research on modern humans. However, neither living primates nor modern humans can be treated as reliable models of hominins, since both their brains and behavioral repertoires differ from those of hominins as a result of phylogenetic distance. Nevertheless, some of our own traits and some traits of our close animal relatives can be interpreted as the remnants and pre-adaptations of hominins’ abilities, respectively. Therefore, a useful method to detect possible pre-adaptations for the ability to recognize “free riders” by means of music would be to look for the use of vocalizations as hallmarks of group identity among chimpanzees (cf. e.g., Crockford et al., 2004).
Similarly, the suggested idea of “free rider” recognition allows us to predict behavioral facts among modern humans that would imply a possible role of this function in the evolution of human musicality. One such implication concerns the level of social cohesion obtained by means of communal singing. For example, a possible way to trace the remnants of a “free rider” recognition strategy is to compare the level of social cohesion between the “devoted” singers of a spontaneously created choir and the singers who avoid singing or who sing out of tune. Another way is to measure the behavioral, physiological, and neural correlates of ostracism (Hudac, 2019; McGuire & Raleigh, 1986; Morese et al., 2019) among the aforementioned “poor” singers before and after singing. Both a higher level of social cohesion among “devoted” singers in comparison to “poor” singers and ostracism toward “poor” singers, if observed, could not be explained solely by social bonding, credible signaling, and mate selection theories. Another source of evidence that could suggest that “free rider” recognition was an important factor in shaping our musicality is research on the convergent evolution of vocalizations. As the convergent evolution of similar traits is usually the result of similar selective pressures (Losos, 2017), the use of culture-specific vocalizations as tools for “free rider” recognition by animals phylogenetically distant from us (such as birds) could support the presented view.
As human musicality is a set of abilities (Fitch, 2015; Honing, 2018) rather than one uniform trait, its origin has probably been a complex process influenced by many selective pressures. Therefore, hypotheses of music origin must take into account the multifaceted evolutionary paths that have led to the appearance of different abilities. In fact, many contemporary studies have shown that, among the abilities we use in music production and perception, only some can be treated as music-specific. Nonetheless, looking for their origin is not only an abstract, theoretical task but can also contribute to answering many questions, such as what is the scope of the possible use of music in the resolution of social conflicts resulting from suspicion of free riding, and what is an optimal strategy for music education. In the former case, learning and singing new songs together should reduce the conflict between feuding parties, while in the latter, greater care for the individual speed of learning in choirs and musical ensembles would improve the development of teamwork skills. Another important conclusion related to the Baldwinian scenario of music origin is that the different adaptive functions of music proposed so far are not mutually exclusive (Harrison & Seale, 2021). Instead, they could have influenced the appearance of different musical features. Last but not least, research on the functions of music should take pragmatics into account. After all, the interpretation of any signal can depend on the context (Seyfarth & Cheney, 2018). As this way of attributing meaning has been observed among chimpanzees (Kalan et al., 2015), one should take it into account in the evolutionary scenarios of music origin.
For example, the same well-synchronized singing could have been experienced by hominins as formidable when the listener was hearing outsiders, and as encouraging when the listener belonged to the singers’ group. This means that what acted as a deterrent from one perspective could have functioned at the same time as a social glue from another.
Acknowledgements
I would like to thank the reviewers for their useful suggestions and inspiring questions. I would also like to thank Peter Kośmider-Jones for his language consultation.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded in whole by the National Science Centre, Poland [grant number 2021/41/B/HS1/00541].
