Abstract

Predictive coding has emerged as a key framework for analyzing music listeners’ behavior and experiences. However, the simplicity of stimuli in empirical research frequently fails to reflect the multifaceted complexity of real-world music, potentially limiting the applicability of the predictive coding model within the broader sphere of music perception. To address this shortcoming, this study explores how principles of predictive processing manifest across different sections of sonata form. The analysis suggests that the tonal uncertainty encountered at the start of Beethoven's Symphony No. 1 may lead listeners to activate diverse hidden states, thereby enriching their listening experience. Sequential modulations in the development section prompt listeners to frequently revise their internal model of tonality, while concurrently reinforcing melodic predictability through consistent repetition of motifs/phrases. Conversely, the transitions within the exposition and recapitulation, as well as the retransition within the development, display unexpected changes in rhythm, dynamics, texture, timbre, melodic contour, motif, and figuration against a relatively predictable harmonic background. Such interactions between multiple musical aspects embody a compositional principle: balancing uncertainty or surprise in some musical aspects with predictability in others to enhance listeners’ engagement. By examining how this balancing principle unfolds in various sections of sonata form, this study offers fresh insights into the integration of large-scale structural considerations with predictive processing in music cognition.

Keywords

Cognitive neuroscience, musical theme, music theory, predictive coding, sonata form

Introduction

Recent developments in music cognition have highlighted predictive processing as a key mechanism by which the brain anticipates and interprets musical information. Predictive processing provides a theoretical framework that not only encapsulates sensory expectations derived from immediate features within sensory inputs but also emphasizes cognitive expectations shaped by our long-term exposure to and understanding of musical structures (Cheung et al., 2024). Within this conceptualization, predictive coding offers a more detailed model focusing specifically on cognitive expectations. According to this model, higher cognitive processes continuously generate top-down predictions about incoming sensory data. These predictions are subsequently matched against actual bottom-up sensory inputs, with any discrepancies (i.e., prediction errors) serving to refine and adjust future predictions (Friston, 2010). The landmark paper “Music in the Brain” by Vuust et al. (2022) has synthesized existing research in music cognition through the lens of predictive coding. While Savage and Fujii (2022) acknowledged that this review revitalized interest in predictive coding, they also noted its predominant focus on Western music, suggesting a potential expansion to include non-Western musical traditions for a more comprehensive understanding of music cognition.

From a musicological standpoint, Vuust et al. (2022) appeared to marginally neglect certain essential elements of music theory, even within the scope of Western music. Despite its extensive review of rhythm, melody, and harmony, the article did not sufficiently address crucial aspects of music such as form, theme, texture, timbre, dynamics, articulation, and figuration. These elements are instrumental in evoking expectations, surprises, and uncertainties among listeners. Furthermore, Vuust et al. (2022) did not delve into more intricate musical concepts such as tonicization, hypermeasure, and motivic development, all of which are highly relevant to predictive processing of harmony, rhythm, and melody.

These limitations of Vuust et al.'s (2022) review may stem more from the longstanding gap between disciplines than from the article itself. Musicologists typically examine music as experienced in real life, whereas cognitive psychology experiments tend to employ artificial stimuli. These simplified stimuli enable researchers to manipulate specific factors to test targeted hypotheses. However, in order to ensure greater rigor in experimental design and interpretation of results, such experiments may compromise ecological validity. Ecological validity is contingent upon three facets of experimental methodology: the nature of the experimental research environment, the types of responses made by participants, and the types of stimuli. This article emphasizes the importance of selecting stimuli, advocating for the incorporation of more complex and authentic musical examples.

Criticism has been raised toward the use of stimuli with low ecological validity in cognitive neuroscience, leading to the incorporation of naturalistic stimuli in neuroimaging studies over the past decade. To approximate brain functions in real-life contexts more accurately, studies have incorporated complex stimuli such as music, movies, and narratives (Sonkusare et al., 2019). It is important to note that the choice of stimulus material is closely linked to the capabilities of the measurement tools in use. Tervaniemi (2023) highlighted that functional magnetic resonance imaging (fMRI) studies have begun employing longer music excerpts to investigate the neural underpinnings of music preference and emotion. In contrast, methodologies such as event-related potentials (ERP) or mismatch negativity (MMN) typically utilize shorter musical phrases to affirm or challenge participants’ predictions.

While empirical studies have contributed significantly to our understanding of predictive coding in music, the simplicity of the stimuli used in such studies often fails to reflect the intricacy of real-world music. This discrepancy may limit the applicability of the predictive coding model in the broader field of music cognition. To overcome this limitation, the current article seeks to intertwine the predictive coding model—and, more broadly, the predictive processing framework—with the analysis of selected works that are widely acknowledged to be musical masterpieces from the high Classical era. This approach endeavors to contribute to the burgeoning discourse between the realms of aesthetics and predictive processing. Recent discussions highlight that while the predictive coding model enriches our understanding of aesthetic experiences with substantial theoretical depth, it also beckons humanists for a closer examination and inspiration. This interdisciplinary exchange strives to transcend reductionism, fostering a more holistic comprehension of aesthetic experiences (Frascaroli et al., 2024; Pepperell, 2024).

In this vein, this article zeroes in on sonata form, a musical form prevalent in Western classical music from the mid-eighteenth century through the entirety of the nineteenth century. Rohrmeier and Koelsch (2012) have established a connection between predictive processing and sonata form by noting that experienced listeners often anticipate standard modulations (i.e., changes in tonality) within the form and have specific expectations for its crucial sections. The choice of sonata form for the current study is based on its sophisticated melding of tonal structure and thematic development. This form is distinguished by its short-range topical flexibility, forward-driving dynamism, balance, symmetry, closure, and the rational resolution of tensions (Hepokoski & Darcy, 2006). These characteristics create a nuanced framework that is particularly conducive to exploring predictive processing in music and its role in the evolution of expectations throughout various sections.

This article pursues two research objectives. The first is to explore how different sections of the sonata form exhibit a dynamic balance between predictability and unpredictability. I suggest that part of music's appeal lies in how certain aspects exhibit low predictability while others maintain high predictability, a concept inspired by the work of Cheung et al. (2019). Their research utilized a machine learning model to study harmonic progressions in American pop music, highlighting the joint influence of prediction error (or surprise) and certainty. They identified two scenarios in which chord progressions induce pleasure: first, when progressions align with expectations in an uncertain context (low surprise and low certainty); and second, when they deviate significantly from firm expectations (high surprise and high certainty). The present theoretical study endeavors to substantiate and expand upon these insights within various sections of sonata form. It is important to clarify that this article does not aim to establish “rules” for the balance between uncertainty and surprise in music. Instead, it seeks to illustrate how the principle of balancing predictability and unpredictability relates to the designated functions of various sections of sonata form.
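The two scenarios can be made concrete with a small numerical sketch. It assumes, as is common in information-theoretic accounts of expectation, that surprise is measured as the information content of the chord that occurs and that uncertainty is measured as the Shannon entropy of the listener's distribution over possible next chords. The chord labels and probabilities below are invented for illustration and are not taken from Cheung et al. (2019).

```python
import math

def surprise(prob):
    """Information content (in bits) of an event that had probability `prob`."""
    return -math.log2(prob)

def uncertainty(dist):
    """Shannon entropy (in bits) of a distribution over possible next chords."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-chord distributions (illustrative numbers only).
uncertain_context = {"I": 0.30, "IV": 0.25, "vi": 0.25, "ii": 0.20}
certain_context = {"I": 0.90, "IV": 0.05, "vi": 0.03, "ii": 0.02}

# Scenario 1: uncertain context, and the most likely chord occurs (low surprise).
scenario_1 = (uncertainty(uncertain_context), surprise(uncertain_context["I"]))

# Scenario 2: certain context, but an unlikely chord occurs (high surprise).
scenario_2 = (uncertainty(certain_context), surprise(certain_context["ii"]))

print("scenario 1 (entropy, surprise):", scenario_1)
print("scenario 2 (entropy, surprise):", scenario_2)
```

In this toy setup the first scenario has the higher entropy and the lower surprise, and the second has the reverse, which is the pairing described above.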

The second aim of this study is to broaden the scope of predictive processing in music cognition beyond the traditional domains of rhythm, melody, and harmony to encompass other musical aspects. Rohrmeier and Koelsch (2012) pointed out that less commonly theorized forms of prediction or surprise, such as those involving orchestration, timbre, or a pause in the texture, suggest that listeners hold expectations for the continuity in these aspects. Building upon the hypothesis that listeners anticipate continuity in musical aspects, this study aims to delve into predictive processing as it pertains to timbre, dynamics, texture, melodic contour, and more.

Predictive Coding, Musical Rhythm, and Tonality

In music psychology, predictive coding has increasingly been applied to understand listeners’ behavior and experiences. Vuust and Witek (2014) posited that individuals employ schematic knowledge and structural regularities to construct internal models predicting future musical events. For instance, the brain establishes meter based on the temporal regularities of previous musical passages, leading to a belief about the likelihood of note occurrences. Building on this, Vuust et al. (2018) refined the model of rhythm perception in predictive coding. In this hierarchical neural network model, lower-level layers process auditory inputs encompassing musical rhythm, while upper layers generate predictions about the likelihood of future musical events based on established meter. The model operates through a bidirectional flow of information: top-down predictions are conveyed to lower levels, and bottom-up information, bearing prediction errors, enables upper layers to refine their predictive models. This model effectively elucidates why people respond differently to various musical rhythms. Music with minimal syncopation may be perceived as uninteresting due to the lack of substantial prediction errors, whereas music with excessive syncopation might undermine the confidence in the internal model or its precision. Music that incorporates moderate syncopation strikes a balance between prediction error and precision, having a higher likelihood of eliciting bodily movement and pleasure (Witek et al., 2014). Vuust et al. (2018) also hypothesized that individual preferences for syncopation are linked to the product of prediction error and precision arising from meter clarity, termed precision-weighted prediction error (pwPE). Rhythms with medium complexity often yield the highest levels of pwPE, making them more pleasurable compared with rhythms with either high or low complexity. This observation suggests a fundamental principle in musical composition: effective and engaging music typically achieves equilibrium between prediction error and uncertainty, thereby generating relatively high levels of pwPE.
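A minimal sketch can show why a product of prediction error and precision peaks at medium complexity. The linear relations used here are assumptions made purely for illustration: prediction error is taken to rise in proportion to rhythmic complexity, and precision to fall in proportion to it. Neither relation, nor the function name toy_pwpe, comes from Vuust et al. (2018).

```python
def pwpe(prediction_error, precision):
    """Precision-weighted prediction error as a simple product."""
    return prediction_error * precision

def toy_pwpe(complexity):
    """Toy model for a complexity value between 0 and 1.

    Assumption (not from the source): prediction error grows linearly
    with complexity while precision shrinks linearly with it.
    """
    prediction_error = complexity
    precision = 1.0 - complexity
    return pwpe(prediction_error, precision)

# Evaluate eleven complexity levels from 0.0 to 1.0.
levels = [i / 10 for i in range(11)]
scores = {c: toy_pwpe(c) for c in levels}
best = max(levels, key=toy_pwpe)

print("pwPE by complexity:", scores)
print("complexity with highest pwPE:", best)
```

Under these assumptions the product vanishes at both extremes and is largest at the midpoint, which reproduces the inverted-U pattern described above. Any pair of monotonic relations running in opposite directions would give a peak somewhere between the extremes.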

While meter is the internal model for the temporal aspect of music, tonality is the internal model for the pitch aspect of music. As listeners engage with music, preceding notes establish tonality, leading to the anticipation that ensuing notes will adhere to this tonality. Deviations from this established tonality result in prediction errors. In addition, the concept of enculturation highlights the significant impact of exposure on musical cognition. It suggests that through ongoing exposure, listeners absorb the patterns inherent in musical styles, subsequently shaping their tonal perception and musical expectations (Pearce, 2018).

A recent study by Li et al. (2021) investigated the stability/clarity of tonality in Western music, paralleling the relationship between syncopation and pleasure elucidated by Vuust et al. (2018). In diatonic music, the alignment of all notes with the established tonality minimizes prediction errors. In contrast, atonal music presents challenges in defining a clear tonality, leading to minimal prediction precision. Chromatic music, positioned between these extremes, strategically utilizes out-of-key notes to create harmonic tension. This approach results in an optimal mix of prediction error and precision, thereby maximizing pwPE.

The concept of tonicization in chromatic harmony illustrates the sophistication of predictive coding in music. This process involves a nontonic chord assuming the tonic role temporarily, referred to as a tonicized chord. The chord preceding this tonicized chord is known as a secondary dominant, serving as the dominant of the tonicized chord's temporary key. Such tonicization showcases the multilayered approach inherent to the predictive coding model. The perception of a secondary dominant may prompt the low-level neural layer to signal a prediction error to the middle-level layer, leading to an update in the belief of local tonality. Meanwhile, the high-level layer might maintain a stable global tonality belief after receiving the prediction error signal from the middle-level layer, waiting for further musical developments. In many instances, the resolution of the secondary dominant to the tonicized chord subsequently reaffirms the unchanged global tonality.

To illustrate this concept in practice, let us analyze the opening of Ludwig van Beethoven's Symphony No. 1 in C major, Op. 21 (Figure 1). While this passage is frequently discussed in music theory classes in relation to the concept of secondary dominant chords, our approach here is through the fresh perspective of predictive coding. This analysis posits that neural layers at different levels process information over varying time spans; the low-level layer processes the pitches currently presented, the middle-level layer discerns local tonality spanning one to two measures, and the high-level layer interprets global tonality spanning three to six measures. This approach mirrors the way the human brain employs predictive coding to process language, engaging predictions that span multiple timescales and hierarchical representation levels (Caucheteux et al., 2023).

Figure 1.

Simplified musical score of the beginning of Beethoven's Symphony No. 1.

In Figure 1, the first two chords form a V–I progression in F major, likely leading the middle-level layer to identify the local tonality as F major. This belief is then challenged by a V–vi progression in C major in the subsequent measure. Here, the high-level layer integrates the initial four chords and hypothesizes a global tonality of C major, possibly reinterpreting the first chord as I^♭7 in C major (Aldwell et al., 2003). This hypothesis gains some support from the tonicized half cadence in C major in measures 3–4, which features the resolution of the dominant of the dominant—the most common secondary dominant—to the tonicized dominant chord.
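The idea that different layers integrate over different time spans can be sketched with a deliberately crude key estimator. The sketch assumes that a "belief" about tonality can be approximated by picking the major key whose scale contains the most sounding notes within a window. It considers major keys only, so it cannot represent A minor, and it is not a model from the predictive coding literature. The chord spellings are inferred from the description above (dominant seventh chords resolving to F, A minor, and G) rather than transcribed from the score.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def best_major_key(pitch_classes):
    """Return the major key whose scale contains the most of the given notes."""
    def fit(tonic):
        scale = {(tonic + step) % 12 for step in MAJOR_SCALE}
        return sum(pc in scale for pc in pitch_classes)
    return NOTE_NAMES[max(range(12), key=fit)]

# Assumed chord tones per measure as pitch classes (C = 0).
measures = [
    [0, 4, 7, 10, 5, 9, 0],  # m. 1: C7 -> F
    [7, 11, 2, 5, 9, 0, 4],  # m. 2: G7 -> A minor
    [2, 6, 9, 0],            # m. 3: D7
    [7, 11, 2],              # m. 4: G
]

# Middle-level layer: short windows of one to two measures.
windows = [measures[0], measures[1], measures[2] + measures[3]]
local_keys = [best_major_key(w) for w in windows]

# High-level layer: one window spanning all four measures.
global_key = best_major_key([pc for m in measures for pc in m])

print("local keys:", local_keys)
print("global key:", global_key)
```

With these inputs the short windows favor F, C, and G major in turn, while the four-measure window favors C major, in line with the reading of the passage given above.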

The preceding discussion enables us to gain a better understanding of the effect created by Beethoven. In sonata form, the introduction section paves the way for the exposition section. It sometimes introduces tonal uncertainty but ultimately resolves into the dominant harmony of the piece's tonic key. This resolution sets the stage for the exposition, which typically commences in the tonic of that key. In Beethoven's Symphony No. 1, the listeners’ brains encounter initial tonal uncertainty, prompting the middle-level neural layer to decipher unstable tonality across the first few measures. Meanwhile, the high-level layer progressively reinforces its prediction of C major as the dominant harmony emerges more clearly. In contrast to symphonies that lack an introduction or that establish tonality unambiguously, the initial measures of Beethoven's Symphony No. 1 seem to deliberately provoke tonal prediction errors and a sense of gradually diminishing uncertainty in the listener. Beethoven's strategic use of chromaticism not only enriches the auditory experience but also actively involves the listener in the unfolding musical narrative.

Drawing on predictive coding theory, the perceived richness or colorfulness of chromatic harmony is likely derived from an extensive spectrum of hidden states. Friston (2010) describes these hidden states as the brain's internal predictions or models about the causes of sensory inputs. These are labeled ‘hidden’ because they are inferential constructs rather than directly perceived sensory information. When encountering chromatic harmonies, the brain utilizes a wide array of harmonic progressions from different tonalities for interpretation. For example, in the first four measures of Beethoven's Symphony No. 1, the brain may consider harmonic progressions to F major, A minor, C major, and G major as hidden states. Each tonality, with its distinct harmonic progression, significantly informs the brain's anticipatory mechanisms and interpretations of the music. This is in contrast to the predictive coding of diatonic harmonies, which typically involve hidden states related to a single tonality.

This link between rich hidden states and chromatic music is supported by an fMRI study conducted by Li et al. (2021). Their research examined neural responses to diatonic, chromatic, and atonal music, revealing increased activation in the dorsolateral prefrontal cortex and the inferior parietal lobule in response to chromatic harmony. Given the established role of these brain regions in working memory functions (Bunge et al., 2001; Crottaz-Herbette et al., 2004; Diwadkar et al., 2000), this finding suggests that chromatic music requires more extensive retrieval, maintenance, and manipulation of complex hidden states of harmonic progression compared with diatonic and atonal music.

Hierarchical and Nonhierarchical Processing in Musical Prediction

Building on the prior discussion of meter and tonality, a critical examination of the interrelation between hierarchical processing and predictive processing in music cognition is warranted. This section aims to show that meter and tonality are inherently involved in hierarchical processing. In contrast, other musical aspects such as motif, dynamics, texture, timbre, pitch range, melodic contour, and tempo do not engage in this hierarchical paradigm to the same extent. This distinction is pivotal in highlighting how predictive coding differs from other prediction-based approaches within music cognition.

The processing of sequential information critically involves the concept of chunking, which is integral to hierarchical processing, a phenomenon observed in both language and music (Newport et al., 2004). Chunking involves organizing elements into significant groups or “chunks,” facilitating the establishment of connections between nonadjacent elements and the decoding of a sequence's inherent structure within a hierarchical context. In linguistics, the comprehension of sentence structure necessitates the identification and interpretation of various grammatical constructs, including those comprising nonadjacent words, situated within the language's hierarchical architecture. For example, a relative clause, which typically consists of a noun and its modifiers, serves as a lower-level chunk within the broader hierarchical structure of a sentence. Analogously, in music, tonicization resembles linguistic relative clauses regarding hierarchical processing. Tonicization marks a temporary shift where a nontonic chord briefly assumes the tonic's role, similar to a noun in a relative clause momentarily taking a central position in a sentence's structure. Organizing the tonicized chord and its preceding chords, such as the secondary dominant chord, into a “chunk” facilitates the understanding of the passage's overall harmonic progression.

Emerging hypotheses and evidence suggest that the cognitive processing of both linguistic and musical syntax, owing to their hierarchical nature, implicates the involvement of the inferior frontal gyrus (IFG) (Asano et al., 2021; Fitch & Martins, 2014; Levitin & Menon, 2003; Li et al., 2021; Tillmann, 2012). The right IFG, in particular, is recognized for its crucial role in hierarchical processing of tonal information (Asano et al., 2022; Bianco et al., 2016; Kim et al., 2011; Koelsch et al., 2005; Musso et al., 2015; Tillmann et al., 2006). Furthermore, the perception of rhythm in music, especially under conditions of conflicting metrical information, also requires the engagement of the IFG (Heard & Lee, 2020; Thaut et al., 2014; Vuust & Witek, 2014). This is particularly significant considering that concepts of compound meter and hypermeasure are founded on the hierarchical organization and chunking of musical information. For example, in music with a 6/8 time signature, three smaller beats can be chunked into one larger beat, and two larger beats form a measure. Furthermore, measures can be grouped to create a hypermeasure unit (Cone, 1968). It has been argued that analyzing sonata form requires a comprehension of how hypermeasure, meter, harmonic rhythm, and motif interact with each other (Ng, 2012; Smyth, 1990; Temperley, 2008).
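The metrical chunking described above can be written out as nested grouping. This is a minimal sketch: the eight-measure span and the grouping of four measures into one hypermeasure are assumptions chosen for illustration, since hypermeasure length varies from passage to passage.

```python
def chunk(items, size):
    """Group a flat list into consecutive sublists of length `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Forty-eight eighth-note pulses, numbered from 1 (eight measures of 6/8).
eighths = list(range(1, 49))

# Three eighth notes form one larger (dotted-quarter) beat.
beats = chunk(eighths, 3)

# Two larger beats form one 6/8 measure.
measures = chunk(beats, 2)

# Assumption: four measures form one hypermeasure.
hypermeasures = chunk(measures, 4)

print("beats:", len(beats))
print("measures:", len(measures))
print("hypermeasures:", len(hypermeasures))
print("first measure:", measures[0])
```

Each level is built only from the chunks of the level below it, so a pulse's position in the hypermeasure can be recovered only by passing through the beat and measure levels.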

While the preceding discussion underscores the critical role of hierarchical processing in music, it is important to note that music cognition involves both hierarchically and nonhierarchically organized structures. Whereas tonality and meter are typically processed through a hierarchical framework, the formation of other types of expectations seems to deviate from this hierarchical approach. It is essential to distinguish between different types of expectations, as they likely have distinct neural correlates. While the IFG is actively engaged in processing tonality and meter, it does not respond to veridical expectations, which are expectations of specific events or patterns in a familiar musical sequence (Vuust et al., 2022). Burunat et al. (2014) investigated neural networks related to working memory by tracking repetitions of salient musical motifs in a tango piece by Astor Piazzolla. They observed that working memory processing of motifs involved integrated neural activity across cognitive, motor, and limbic subsystems. Notably, hippocampal connectivity with the dorsolateral prefrontal cortex, supplementary motor area, and cerebellum was modulated by motif repetitions. Conducted under naturalistic listening conditions, this research emphasized the hippocampus's key role in processing musical motifs. Crucially, the expectations of musical motifs did not engage the IFG, likely due to the nonhierarchical nature of motif processing.

In addition to motif repetitions, musical expectations regarding dynamics, texture, timbre, pitch range, melodic contour, and tempo are also unlikely to be related to hierarchical organization. Listeners generally anticipate stability in these musical aspects (Rohrmeier & Koelsch, 2012). Consequently, changes in these aspects can lead to prediction errors. However, it is crucial to acknowledge that such errors may not be processed by the neural networks that manage hierarchical processing. Rather, these changes are likely to elicit automatic orienting responses in listeners, characterized by distinct physiological reactions to novel or motivationally significant stimuli (Lynn, 1966). These responses, engaging the brain's bottom-up attentional mechanisms, are an immediate reaction to unexpected auditory stimuli. As suggested by Grewe et al. (2007), emotional and physiological arousal responses to changes in pitch range, dynamics, texture, or timbre might be viewed as orienting responses.

Two Forms of Transition in Sonata Form

Having explored various types of musical predictions, we can now turn our attention back to sonata forms. In the major-key sonata-allegro form, the structure unfolds in three distinct sections. Initially, the exposition presents the primary theme within the tonic key, then shifts to the dominant key to introduce secondary themes. The development elaborates and varies these themes across different tonal areas, leading to the recapitulation where both primary and secondary themes are revisited, this time entirely within the tonic key. Within the exposition, although a wealth of thematic material is essential for enriching the musical experience, sudden changes in themes can lead to difficulties in comprehension, similar to abrupt shifts in topics during language processing. Consequently, carefully managing thematic transitions—those connecting the first and second theme groups—is crucial to maintain musical coherence and engage the listener effectively. Thematic transitions within sonata form are marked by uncertainty and surprise in various musical aspects, including weak motivic associations and disrupted hypermeasure structures, leading to varied timespans and marked changes in texture and dynamics from the preceding primary theme (Batt, 1988). These elements appear to enhance the activity and energy within transitions, traits that are considered hallmark features of transitions (Hepokoski & Darcy, 2006).

Some surprising elements in transitions are connected to the medial caesura (MC), a concept introduced by Hepokoski and Darcy (2006). The MC refers to a pause or break in the texture at the end of transitions, creating a clear division before the arrival of the second theme. Various methods are used to accentuate the MC and enhance its rhetorical impact, including expressive deformations that hint at or reveal unusual events or challenges within the musical narrative; textural and dynamic shifts that highlight the pause; expanded caesura-fill, where the MC space is intentionally prolonged and filled with material to serve expressive purposes such as indicating energy loss or introducing wit or surprise; and the definitive use of general pauses or silence across all voices to mark the MC's presence, ensuring a distinct textural break. Hepokoski and Darcy's (2006) analysis demonstrates that transitions in sonata form, especially from the late eighteenth century forward, are deliberately designed to present a diverse array of musical changes. Composers from this period focused on innovation, ensuring that the transitions within their compositions possessed unique characteristics. Such innovative efforts likely enhanced the unpredictability of transitions.

Transitions in sonata form take two distinct forms: the exposition form and the recapitulation form. Through the lens of predictive processing, these transitions illustrate scenarios of high predictability in global tonality and low predictability in various other musical aspects. In the exposition, the primary theme moves to the secondary themes through a modulatory transition, which introduces low predictability in terms of motif, rhythm (including at the hypermeasure level), texture, and dynamics. Because this transition usually modulates to a tonally close area (often the dominant in major keys or the relative major in minor keys), the overall harmonic progression remains predictably structured. In the recapitulation, the primary theme moves to the secondary themes without modulating to a new key, resulting in minimal surprise in global tonality. Yet this phase incorporates a form of surprise grounded in veridical expectations. As audiences engage with a sonata-form composition, they may compare the recapitulation with their memory of the exposition. This memory fosters a veridical expectation that the recapitulation's transition will modulate as the exposition's did. Because the recapitulation instead reaffirms the tonic key, it subtly contradicts this expectation, creating a unique form of prediction error. This reaffirmation of the tonic key, differing from the modulation remembered from the exposition's transition, paradoxically enhances listener satisfaction by providing tonal closure and a sense of resolution. From this perspective, uncertainty and surprise in motif, rhythm, texture, and dynamics during the recapitulation's transition can intensify the musical tension, ultimately leading to a profound sense of resolution.

The preceding discussion of thematic transitions in the exposition and recapitulation sections highlights how musical tension and energy, by fostering uncertainty and surprise in the listener's perception, effectively set the stage for the emergence of the second theme. Analogously, this effect is mirrored in the concluding passage of the development section, known as the retransition, which cultivates anticipation in the audience, priming them for the re-emergence of the primary theme. The exploration of this phenomenon will be the subject of the next section.

Predictive Processing during Retransition

While schematic knowledge of musical forms can enhance the enjoyment and comprehension of music, this benefit is not limited to individuals with formal training in classical music. Commonly, listeners familiar with pop songs intuitively anticipate the primary theme, the chorus, at the conclusion of the verse. In the context of unfamiliar pop songs, the musical cues signaling the end of the verse were found to activate the frontoparietal network and dopaminergic pathways in the brain (Li & Tsai, 2024). This anticipatory response involves cognitive processes reliant on the listener's schematic knowledge of the verse–chorus form and sensitivity to musical cues preceding the primary theme. In the realm of sonata form, analogous anticipatory experiences are evident. The final portion of the development section, the retransition, notably elevates listener attention and emotional anticipation for the main theme's recapitulation. An fMRI study by Li and Tsai (2022) reported that during listening to familiar sonata-form compositions, retransitions heightened activity in the frontal regions and dopaminergic pathways. Recognizing the importance of anticipation of the primary theme, this section will analyze retransitions in selected celebrated works from the high Classical era to explore various facets of predictive processing.

The retransition in sonata form serves to alleviate the tonal uncertainty cultivated during the development section by re-establishing the dominant harmony of the tonic key. Typically, the retransition consists of two distinct stages: initially, it transitions back to the dominant harmony, and subsequently, it elaborates on the dominant pedal (Ivanovitch, 2011; Shamgar, 1981; Webster, 2001). I posit that retransitions in works from the high Classical era often juxtapose contrasting or evolving musical elements (indicating low predictability) against a backdrop of highly predictable harmony, culminating in a focus on the dominant harmony.

The elaboration on the dominant pedal during the second stage of retransitions showcases composers’ skill in introducing uncertainty and surprise within a stable harmonic environment. A notable illustration of this technique is found in the second movement of Wolfgang Amadeus Mozart's Piano Concerto No. 25 in C major, K. 503. The retransition of this movement features an extraordinarily long dominant prolongation, the most extended in Mozart's oeuvre. In performance, this dominant prolongation often lasts about a minute. According to Ivanovitch (2011), it can be divided into four stages: an initial piano ascent targeting the chordal seventh with a surprising leap downward just before reaching this seventh note (measures 59–63), increased tension through syncopation and instrumental texture variation (measures 63–68), subtle hints of minor mode with 5–♭6–5 fluctuations (measures 68–70), and a melodic descent with intricate woodwind textures that creates a surprising effect of temporal compression and extension (measures 70–74). Throughout this dominant prolongation, Mozart ingeniously incorporated variations in melody, figuration, rhythm, texture, modal color, and timbre. These variations introduce significant levels of unpredictability, which, when juxtaposed with the stable harmony, create a tranquil yet dynamic beauty.

In the first movement of Mozart's Symphony No. 40 in G minor, K. 550, the retransition also exemplifies the principle of balancing predictability with unpredictability. During the initial stage of the retransition, the tonality moves toward a state of increasing certainty, directing back to the dominant harmony of the tonic key. This stage is marked by a two-measure rhythmic pattern involving dialog between strings and woodwinds (measures 139–152). This rhythmic regularity shifts into more frequent dialog between strings and woodwinds (measures 153–160) as the second stage begins (Figure 2). As the retransition progresses into its second stage, there is a dramatic surge in dynamics and texture. This increase, along with the condensed dialogs between strings and woodwinds, generates a sense of surprise. Subsequently, there is a gradual increase in the certainty of the melodic contour, transitioning from initial fluctuations to a consistent descending pattern. In measures 160–165, the woodwinds take over the descending melodic contour from the strings, imparting a more lyrical quality to the passage. The changes in timbre and dynamics, combined with the chromatically descending stepwise motion, could induce a prediction error in the listener's brain. In the retransition's last two measures, the shift in timbre, as the theme is taken up by the strings, is likely to provoke a delightful surprise in the listener. Throughout this retransition, Mozart meticulously crafted rich variations in rhythm, timbre, dynamics, texture, and melodic contour to create low levels of predictability, which adds a significant degree of drama and allure to the predictable harmony.

Figure 2.

Simplified musical score of the first movement of Mozart's Symphony No. 40 in G minor, K. 550. Motifs played by the strings and woodwinds are marked in light blue and light yellow, respectively. The initial dialogs between the strings and woodwinds (measures 139–152) set an expectation for the continuation of such dialogs. However, starting from the last beat of measure 152, these dialogs become more condensed, as evident from the shortening of the light blue sections. This condensation of the dialog creates a surprising effect, which is further amplified by an elevated pitch range and enhanced texture.

Sequential Modulation in Development Section

While the transitions and retransition in sonata form feature predictable harmonic goals, the development section is distinguished by frequent modulations that amplify the tonal tension initially set during the exposition's transition. Given that modulations increase activity in the listener's prefrontal cortex (Koelsch et al., 2003; Tsai & Li, 2019), frequent modulations are likely to pose cognitive challenges to listeners. To counterbalance these challenges, composers may strategically maintain motivic predictability during modulations.

The development section typically navigates various tonal areas through sequential modulation, which refers to a compositional technique where a specific melodic or harmonic pattern is transposed and repeated at different pitch levels, allowing for this pattern to be reinterpreted within various tonal contexts. To illustrate this concept, an excerpt from Mozart's Piano Sonata in F major, K. 547a, is examined, in which sequential modulations along the circle of fifths are employed. In this example, the material used for sequential modulations is a four-measure phrase with motifs reminiscent of the codetta in the exposition. Each phrase within a sequential modulation can be perceived as a hypermeasure (Figure 3). The hypermeasure covering measures 98–101 is approximately a transposed version of the hypermeasure covering measures 94–97. Listeners may develop an expectation after hearing the first hypermeasure, which is then almost fulfilled in the subsequent hypermeasure. Meanwhile, the tonality moves across C minor, G minor, and D minor, necessitating updates to the listeners’ internal model of local tonality.

Figure 3.

Musical score of the first movement of Mozart's Piano Sonata in F major, K. 547a.

This example highlights how expectations, harmony, tonality, and hypermeasure interact in music. Composers such as Mozart have utilized phrases for sequential modulations in the development section, thereby reducing the precision of the listener's belief in global tonality while still supporting the prediction of melodic progressions and shifts in tonality. This can lead to a scenario where, amid an uncertain global tonal context, listeners rely more heavily on the predictable repetition of a phrase to navigate the progression of the music.
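The transposition logic behind a sequential modulation can be illustrated with a minimal Python sketch. The four-note motif below is hypothetical (it is not taken from K. 547a), pitches are encoded as MIDI note numbers, and the transposition is exact, whereas the hypermeasures in Mozart's score are only approximately transposed. The helper names `transpose`, `intervals`, and `sequence_by_fifths` are introduced here purely for illustration.

```python
# Hypothetical four-note motif as MIDI pitch numbers (NOT the notes of K. 547a).
MOTIF = [60, 63, 62, 67]  # C4, E-flat4, D4, G4

NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def transpose(pattern, semitones):
    """Shift every pitch in the pattern by the same number of semitones."""
    return [pitch + semitones for pitch in pattern]


def intervals(pattern):
    """Successive melodic intervals of the pattern, in semitones."""
    return [b - a for a, b in zip(pattern, pattern[1:])]


def sequence_by_fifths(pattern, start_tonic, steps):
    """Restate the pattern `steps` times, each statement a perfect fifth
    (7 semitones) above the previous one. Returns (tonic name, pitches) pairs."""
    statements = []
    for i in range(steps):
        shift = 7 * i
        tonic = NOTE_NAMES[(start_tonic + shift) % 12]
        statements.append((tonic, transpose(pattern, shift)))
    return statements


# Three statements starting on C (pitch class 0), labeled as minor keys
# to mirror the C minor, G minor, D minor succession described in the text.
statements = sequence_by_fifths(MOTIF, start_tonic=0, steps=3)
for tonic, pitches in statements:
    print(f"{tonic} minor: pitches={pitches} intervals={intervals(pitches)}")
```

In this sketch the interval list is identical in every statement, which corresponds to the sustained melodic predictability, while the tonic label changes at each step, which corresponds to the repeated updating of the listener's model of local tonality.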

Sequential modulations engage listeners in forming and adjusting dynamic expectations, which are short-lived expectations that evolve with the musical context, such as when repeated phrases lead listeners to expect similar future sequences (Vuust et al., 2022). This engagement requires the temporary retention of these phrases in short-term memory, potentially manifesting in subvocal rehearsal, such as covert humming. Although the idea of listeners engaging in covert humming of motifs remains speculative, the research by Burunat et al. (2014) into musical motifs has provided valuable insights. Their findings reveal that motif repetitions lead to alterations in hippocampal connectivity with the dorsolateral prefrontal cortex, supplementary motor area, and cerebellum. Notably, the supplementary motor area and cerebellum are involved in audiomotor processing and the generation of musical imagery (Gordon et al., 2018; Tsai et al., 2010; Zatorre et al., 2007).

It is noteworthy that melodic sequences, which involve repeating a motif/phrase at a higher or lower pitch within the same voice, are widely employed for the elaboration of melodies, such as in the exposition of the first or second themes within sonata form. Temperley (2014) observed that when an intervallic pattern defining the motif or phrase is repeated with one interval altered, the altered interval tends to be larger in the second occurrence than in the first. Moreover, the second occurrence of an intervallic pattern typically contains more chromaticism than the first. He explained this trend by noting that the repetition of a motif or phrase is typically highly predictable; occasionally, the information density or level of surprise resulting from such repetition may be lower than desired. To compensate for this, composers commonly inject elements of surprise, such as larger intervals or chromatic notes, into the repetitions. His view aligns with the balance principle discussed in the current study, which emphasizes the importance of juxtaposing unpredictability with predictability across various musical aspects to enhance listener engagement.
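The information-density argument can be made concrete with a small sketch that treats the surprise of a note as its information content, -log2(p), where p is the probability a listener assigns to that note. The probabilities below are invented for illustration and are not drawn from Temperley's (2014) data or model; the sketch only shows the direction of the effect described above.

```python
import math


def surprisal(probability):
    """Information content in bits: less probable events carry more information."""
    return -math.log2(probability)


def mean_surprisal(probabilities):
    """Average information per note, a rough proxy for information density."""
    return sum(surprisal(p) for p in probabilities) / len(probabilities)


# Hypothetical per-note probabilities (assumed values, for illustration only).
first_statement = [0.25, 0.25, 0.25, 0.25]   # pattern heard for the first time
exact_repeat = [0.5, 0.5, 0.5, 0.5]          # repetition makes every note more expected
altered_repeat = [0.5, 0.5, 0.125, 0.5]      # one larger or chromatic interval is less expected

for label, probs in [
    ("first statement", first_statement),
    ("exact repeat", exact_repeat),
    ("altered repeat", altered_repeat),
]:
    print(f"{label}: {mean_surprisal(probs):.2f} bits per note")
```

Under these assumed values, the exact repeat carries less information per note than the first statement, and altering a single interval moves the repeat back toward the density of the first statement.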

Limitations and Future Directions

The current study has four notable limitations that warrant future research. First, the detailed impact of sustained predictability on the processing of sequential modulations has not been fully explored. Hepokoski and Darcy (2006) observed that the development section frequently employs patterns such as circle-of-fifths progressions and tonal movements by seconds and thirds. An experienced listener may predict subsequent modulations following the initial ones. While the initial fulfillment of such predictions can be engaging, an overabundance of predictable sequential modulations may diminish the surprise element, potentially leading to a less captivating musical experience. This study has not examined the ideal number of sequential modulations needed to maintain an optimal balance of predictability and unpredictability, presenting a gap in our understanding of how predictability affects listener engagement. Second, Hepokoski and Darcy (2006) also highlighted that a development section might shift between strategies, such as from a descending circle of fifths to ascending sequences by whole steps via a chromatic bass line, creating opportunities for higher-order surprises. The present study has not addressed this shift between strategies, which represents another critical area for future research.

The third limitation of this study is its lack of assessment regarding the impact of familiarity with specific musical pieces on predictive processing. While listeners may form schematic expectations from their first encounter with a piece, drawing on both the regularities within the heard music and their musical knowledge (Vuust et al., 2022), the full scope and impact of familiarity with specific musical pieces on schematic expectations are not completely determined. It is noteworthy that during the first listening experience, without the repetition of the exposition, listeners might find it challenging to develop veridical expectations for the recapitulation's transition based on the exposition's transition. Enhanced familiarity with particular compositions could bolster the development of veridical expectations and might even encourage covert humming along with the music, thereby deepening engagement and facilitating more effective predictive processing (Li & Tsai, 2022; Tsai, in press). Furthermore, repeated exposure to rhythmic patterns can lead to the development of short-term rhythmic expectations that, over time, evolve into veridical expectations regarding a musical piece's temporal structure (Vuust et al., 2022). Thus, familiarity with music may facilitate the integration of different types of expectations, representing a complex and worthy area for further investigation.

Fourth, while this study unveils a balancing principle where low predictability in certain musical aspects often coincides with high predictability in others, the precise mechanism through which this balance operates and its overall effectiveness in capturing and maintaining listener interest are not yet fully understood. The concept of pwPE provides an intriguing framework for examining how listeners react to musical passages that balance predictability and unpredictability. Music that incorporates surprises within an overall predictable context appears to be more engaging. This suggests that listeners are equipped to process these surprises. Future research should consider a broader application of the pwPE concept to investigate how the interplay between predictability and unpredictability across different musical aspects influences listener engagement.

Conclusion

This article endeavors to deepen our understanding of predictive processing of music by examining celebrated works in sonata form from the high Classical era, highlighting how each section in this form employs various techniques to navigate and manipulate musical uncertainty and surprise. In the introduction section, a gradual consolidation of the listener's perception of global tonality is observed. In the exposition, the modulation from the primary theme to secondary themes presents a harmonic surprise at a low level. This transition often gains energy through unexpected changes in rhythm, motif, dynamics, and texture. The development intensifies tonal uncertainty and tension via sequential modulations while maintaining melodic predictability. The retransition, marking the development's conclusion, leads back to and establishes the dominant harmony of the tonic key. Despite its highly predictable harmony, the retransition often incorporates unpredictability in diverse musical aspects such as melodic contour, modal color, texture, timbre, dynamics, and figuration. In the recapitulation, the transition serves as a reinterpretation of its counterpart in the exposition. Notably, the veridical expectation linked to the exposition's transition is not fully actualized in the recapitulation's transition. This creates a paradoxical effect: the resulting prediction error ultimately fosters listener satisfaction by affirming tonal resolution and certainty near the end of this movement. By considering the large formal structure in music, this study extends the predictive processing framework to include a broader range of musical aspects.

Footnotes

Action Editor

Markus Neuwirth, Anton Bruckner Privatuniversität für Musik, Schauspiel und Tanz, Institut für Theorie und Geschichte.

Peer Review

One anonymous reviewer. Poundie Burstein, The City University of New York, Hunter College, Music.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical Approval

This research did not require ethics committee or IRB approval. This research did not involve the use of personal data, fieldwork, or experiments involving human or animal participants, or work with children, vulnerable individuals, or clinical populations.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science and Technology Council (grant number NSC 112-2410-H-002-071).

ORCID iD

Chen-Gia Tsai

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

Aldwell, Schachter, & Cadwallader, A. C. (2003). Harmony & voice leading (3rd ed.). Schirmer/Cengage Learning.

Asano, Boeckx, & Seifert (2021). Hierarchical control as a shared neurocognitive mechanism for language and music. Cognition, 216, 104847. https://doi.org/10.1016/j.cognition.2021.104847

Asano & Brown (2022). The neural basis of tonal processing in music: An ALE meta-analysis. Music & Science, 5, 20592043221109958. https://doi.org/10.1177/20592043221109958

Batt (1988). Function and structure of transitions in sonata-form music of Mozart. Canadian University Music Review, 9, 157–201. https://doi.org/10.7202/1014927ar

Bianco, Novembre, Keller, P. E., Kim, S. G., Scharf, Friederici, A. D., Villringer, & Sammler (2016). Neural networks for harmonic structure in music perception and action. Neuroimage, 142, 454–464. https://doi.org/10.1016/j.neuroimage.2016.08.025

Bunge, S. A., Ochsner, K. N., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. (2001). Prefrontal regions involved in keeping information in and out of mind. Brain, 124(10), 2074–2086. https://doi.org/10.1093/brain/124.10.2074

Burunat, Alluri, Toiviainen, Numminen, & Brattico (2014). Dynamics of brain activity underlying working memory for music in a naturalistic condition. Cortex, 57, 254–269. https://doi.org/10.1016/j.cortex.2014.04.012

Caucheteux, Gramfort, & King, J. R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441. https://doi.org/10.1038/s41562-022-01516-2

Cheung, V. K. M., Harrison, P. M. C., Koelsch, Pearce, M. T., Friederici, A. D., & Meyer (2024). Cognitive and sensory expectations independently shape musical expectancy and pleasure. Philosophical Transactions of the Royal Society B, 379, 20220420. https://doi.org/10.1098/rstb.2022.0420

Cheung, V. K. M., Harrison, P. M. C., Meyer, Pearce, M. T., Haynes, J. D., & Koelsch (2019). Uncertainty and surprise jointly predict musical pleasure and amygdala, hippocampus, and auditory cortex activity. Current Biology, 29(23), 4084–4092.e4. https://doi.org/10.1016/j.cub.2019.09.067

Cone, E. T. (1968). Musical form and musical performance (1st ed.). W. W. Norton.

Crottaz-Herbette, Anagnoson, R. T., & Menon (2004). Modality effects in verbal working memory: Differential prefrontal and parietal responses to auditory and visual stimuli. NeuroImage, 21(1), 340–351. https://doi.org/10.1016/j.neuroimage.2003.09.019

Diwadkar, V. A., Carpenter, P. A., & Just, M. A. (2000). Collaborative activity between parietal and dorso-lateral prefrontal cortex in dynamic spatial working memory revealed by fMRI. NeuroImage, 12, 85–99. https://doi.org/10.1006/nimg.2000.0586

Fitch, W. T., & Martins, M. D. (2014). Hierarchical processing in music, language, and action: Lashley revisited. Annals of the New York Academy of Sciences, 1316(1), 87–104. https://doi.org/10.1111/nyas.12406

Frascaroli, Leder, Brattico, & Van de Cruys (2024). Aesthetics and predictive processing: Grounds and prospects of a fruitful encounter. Philosophical Transactions of the Royal Society B, 379, 20220410. https://doi.org/10.1098/rstb.2022.0410

Friston (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787

Gordon, C. L., Cobb, P. R., & Balasubramaniam (2018). Recruitment of the motor system during music listening: An ALE meta-analysis of fMRI data. PLoS One, 13(11), e0207213. https://doi.org/10.1371/journal.pone.0207213

Grewe, Nagel, Kopiez, & Altenmuller (2007). Emotions over time: Synchronicity and development of subjective, physiological, and facial affective reactions to music. Emotion, 7, 774–788. https://doi.org/10.1037/1528-3542.7.4.774

Heard & Lee, Y. S. (2020). Shared neural resources of rhythm and syntax: An ALE meta-analysis. Neuropsychologia, 137, 107284. https://doi.org/10.1016/j.neuropsychologia.2019.107284

Hepokoski & Darcy (2006). Elements of sonata theory: Norms, types, and deformations in the late-eighteenth-century sonata. Oxford University Press.

Ivanovitch (2011). Mozart’s art of retransition. Music Analysis, 30(1), 1–36. https://doi.org/10.1111/j.1468-2249.2011.00305.x

Kim, S. G., Kim, J. S., & Chung, C. K. (2011). The effect of conditional probability of chord progression on brain response: An MEG study. PLoS One, 6(2), e17337. https://doi.org/10.1371/journal.pone.0017337

Koelsch, Fritz, Schulze, Alsop, & Schlaug (2005). Adults and children processing music: An fMRI study. Neuroimage, 25(4), 1068–1076. https://doi.org/10.1016/j.neuroimage.2004.12.050

Koelsch, Gunter, Schroger, & Friederici, A. D. (2003). Processing tonal modulations: An ERP study. Journal of Cognitive Neuroscience, 15(8), 1149–1159. https://doi.org/10.1162/089892903322598111

Levitin, D. J., & Menon (2003). Musical structure is processed in “language” areas of the brain: A possible role for Brodmann area 47 in temporal coherence. NeuroImage, 20(4), 2142–2152. https://doi.org/10.1016/j.neuroimage.2003.08.016

Li, C. W., Guo, F. Y., & Tsai, C. G. (2021). Predictive processing, cognitive control, and tonality stability of music: An fMRI study of chromatic harmony. Brain and Cognition, 151, 105751. https://doi.org/10.1016/j.bandc.2021.105751

Li, C. W., & Tsai, C. G. (2022). Attention control and audiomotor processes underlying anticipation of musical themes while listening to familiar sonata-form pieces. Brain Sciences, 12(2), 261. https://doi.org/10.3390/brainsci12020261

Li, C. W., & Tsai, C. G. (2024). Motivated cognitive control during cued anticipation and receipt of unfamiliar musical themes: An fMRI study. Neuropsychologia, 194, 108778. https://doi.org/10.1016/j.neuropsychologia.2023.108778

Lynn (1966). Attention, arousal, and the orientation reaction. Pergamon Press.

Musso, Weiller, Horn, Glauche, Umarova, Hennig, Schneider, & Rijntjes (2015). A single dual-stream framework for syntactic computations in music and language. Neuroimage, 117, 267–283. https://doi.org/10.1016/j.neuroimage.2015.05.020

Newport, E. L., Hauser, M. D., Spaepen, & Aslin, R. N. (2004). Learning at a distance II. Statistical learning of non-adjacent dependencies in a non-human primate. Cognitive Psychology, 49(2), 85–117. https://doi.org/10.1016/j.cogpsych.2003.12.002

(2012). Phrase rhythm as form in classical instrumental music. Music Theory Spectrum, 34(1), 51–77. https://doi.org/10.1525/mts.2012.34.1.51

Pearce, M. T. (2018). Statistical learning and probabilistic prediction in music cognition: Mechanisms of stylistic enculturation. Annals of the New York Academy of Sciences, 1423(1), 378–395. https://doi.org/10.1111/nyas.13654

Pepperell (2024). Being alive to the world: An artist’s perspective on predictive processing. Philosophical Transactions of the Royal Society B, 379, 20220429. https://doi.org/10.1098/rstb.2022.0429

Rohrmeier, M. A., & Koelsch (2012). Predictive information processing in music cognition. A critical review. International Journal of Psychophysiology, 83(2), 164–175. https://doi.org/10.1016/j.ijpsycho.2011.12.010

Savage, P. E., & Fujii (2022). Towards a cross-cultural framework for predictive coding of music. Nature Reviews Neuroscience, 23(10), 641. https://doi.org/10.1038/s41583-022-00622-4

Shamgar (1981). On locating the retransition in classic sonata form. Music Review, 42(2), 130–143.

Smyth, D. H. (1990). Large-scale rhythm and classical form. Music Theory Spectrum, 12(2), 236–246. https://doi.org/10.1525/mts.1990.12.2.02a00040

Sonkusare, Breakspear, & Guo (2019). Naturalistic stimuli in neuroscience: Critically acclaimed. Trends in Cognitive Sciences, 23(8), 699–714. https://doi.org/10.1016/j.tics.2019.05.004

Temperley (2008). Hypermetrical transitions. Music Theory Spectrum, 30, 305–325. https://doi.org/10.1525/mts.2008.30.2.305

Temperley (2014). Information flow and repetition in music. Journal of Music Theory, 58, 155–178. https://doi.org/10.1215/00222909-2781759

Tervaniemi (2023). The neuroscience of music - towards ecological validity. Trends in Neurosciences, 46(5), 355–364. https://doi.org/10.1016/j.tins.2023.03.001

Thaut, Trimarchi, & Parsons (2014). Human brain basis of musical rhythm perception: Common and distinct neural substrates for meter, tempo, and pattern. Brain Sciences, 4(2), 428–452. https://doi.org/10.3390/brainsci4020428

Tillmann (2012). Music and language perception: Expectations, structural integration, and cognitive sequencing. Topics in Cognitive Science, 4, 568–584. https://doi.org/10.1111/j.1756-8765.2012.01209.x

Tillmann, Koelsch, Escoffier, Bigand, Lalitte, Friederici, A. D., & von Cramon, D. Y. (2006). Cognitive priming in sung and instrumental music: Activation of inferior frontal cortex. Neuroimage, 31(4), 1771–1782. https://doi.org/10.1016/j.neuroimage.2006.02.028

Tsai, C. G., Chen, C. C., Chou, T. L., & Chen, J. H. (2010). Neural mechanisms involved in the oral representation of percussion music: An fMRI study. Brain and Cognition, 74, 123–131. https://doi.org/10.1016/j.bandc.2010.07.008

Tsai, C. G. (in press). Anticipating the main theme: A model for understanding prospective memory and reward learning in sonata-form listening. Musicae Scientiae. https://doi.org/10.1177/10298649231217121

Tsai, C. G., & Li, C. W. (2019). Increased activation in the left ventrolateral prefrontal cortex and temporal pole during tonality change in music. Neuroscience Letters, 696, 162–167. https://doi.org/10.1016/j.neulet.2018.12.019

Vuust, Dietz, M. J., Witek, & Kringelbach, M. L. (2018). Now you hear it: A predictive coding model for understanding rhythmic incongruity. Annals of the New York Academy of Sciences, 1423(1), 19–29. https://doi.org/10.1111/nyas.13622

Vuust, Heggli, O. A., Friston, K. J., & Kringelbach, M. L. (2022). Music in the brain. Nature Reviews Neuroscience, 23(5), 287–305. https://doi.org/10.1038/s41583-022-00578-5

Vuust & Witek, M. A. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology, 5, 1111. https://doi.org/10.3389/fpsyg.2014.01111

Webster (2001). Retransition. In Sadie & Tyrrell (Eds.), The new Grove dictionary of music and musicians (2nd ed., Vol. 21, p. 230). Macmillan.

Witek, M. A., Clarke, E. F., Wallentin, Kringelbach, M. L., & Vuust (2014). Syncopation, body-movement and pleasure in groove music. PLoS One, 9(4), e94446. https://doi.org/10.1371/journal.pone.0094446

Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory-motor interactions in music perception and production. Nature Reviews Neuroscience, 8, 547–558. https://doi.org/10.1038/nrn2152

Predictive Processing within Music Form: Analysis of Uncertainty and Surprise in Different Sections of Sonata Form
