Abstract
This study aims to create a dialogue between music psychology and music-theoretical discourse on the perception of musical form. In a lab and online study, we investigated how listeners segment Western classical music in real time, focusing on (1) inter-rater agreement, (2) alignment with music-theoretical analyses of sonata form, (3) subjective qualities of segment boundaries (beginnings, endings, importance ratings), and (4) the effects of repeated listening. Participants listened to two stylistically contrasting orchestral movements—one classicist, early romantic and one romantic work—and marked perceived segment boundaries during two exposures. In line with previous research, results reveal consistent above-chance agreement among listeners, particularly at structurally prominent moments with clear surface cues. While most boundaries aligned with music-theoretical segmentations, others did not, especially when formal boundaries were not reinforced by salient acoustic features. Listeners frequently identified boundaries based on perceived beginnings, but higher agreement was associated with strong endings and higher importance ratings. Exploratory categorization of boundary types revealed a tendency to emphasize transitional moments, suggesting a processual, forward-directed mode of engagement. Repeated listening reduced the number of segmentation responses, but its effect on agreement varied by piece, suggesting that structural clarity and ambiguity modulate how listeners integrate musical form. Findings underscore the importance of both surface salience and formal function in shaping real-time segmentation. We argue for a listener-centered approach to musical form that integrates perceptual and analytical perspectives, and we propose that perceptually salient yet analytically unmarked moments merit greater attention in theories of musical form.
Introduction
How do listeners follow a piece of music in real time? And how do they segment and integrate a continuous auditory stream into coherent units such as phrases, sections, or entire formal architectures? This longstanding question lies at the intersection of Western music theory and music perception. While music theorists have long considered segmentation a cornerstone of musical understanding (Koch, 1802/1969; Riemann, 1897, p. 233), the extent to which analytically defined structures align with listeners’ real-time perceptual experiences remains empirically underexplored. Traditions of Formenlehre, especially in the work of Dahlhaus (1978), Caplin (1998), and Hepokoski and Darcy (2006), emphasize top-down analysis based on tonal trajectories, thematic functions, and their stylistic conventions. Yet the psychological plausibility of such structures, that is, the degree to which they are perceptually salient, has received comparatively little attention in music theory. In this study, we address this gap by comparing music-theoretical analysis of two sonata-form movements with listeners’ spontaneous segmentation behavior in a real-time task.
Psychological research has questioned whether global structures in music are perceptually relevant in real-time listening. Some suggest that listeners may not consciously perceive or prioritize large-scale formal structures at all (Levinson, 1997; Tillmann & Bigand, 2004). Listeners frequently overlook key structural features such as tonal closure and modulation (Cook, 1987; Marvin & Brinkman, 1999), or the overall order of sections (Gotlieb & Konečni, 1985; Eitan & Granot, 2008; Karno & Konečni, 1992; Konečni, 1984; Tillmann & Bigand, 1996), implying that the global organization of events may not shape their aesthetic experience. However, as McDonald and Wöllner (2022) argue, listeners’ willingness to accept altered versions of a piece as aesthetically coherent does not necessarily indicate insensitivity to formal disruption. Their findings call into question the methodology of earlier studies, many of which relied on scrambled or artificially manipulated stimuli. Segmentation studies, by contrast, preserve the musical structure and thus offer a more ecological approach to investigating how listeners perceive form. We note that our design does not assess aesthetic or emotional responses; participants’ judgments concern the cognitive perception of structural boundaries rather than liking, preference, or affective experience.
Against this backdrop, segmentation studies have shown that listeners are sensitive to perceptual boundaries in real-time musical experience (Clarke & Krumhansl, 1990; Deliège & El Ahmadi, 1990; Hartmann et al., 2016). Such boundaries are often cued by salient acoustic features (pauses, changes in texture, articulation shifts, or dynamic contrasts) that signal a transition. In some cases, listeners also exhibit anticipatory awareness of upcoming formal changes (Peebles, 2011), suggesting that segmentation is not merely reactive but also guided by expectations.
This study aims to bridge a longstanding gap between theoretical models of musical form and empirical research on how listeners perceive it. Drawing on insights from music psychology and cognitive music theory, we investigate which formal boundaries in complete classical works are marked by participants. We compare this spontaneous real-time segmentation with music-analytical models grounded in sonata-form conventions (Caplin, 1998; Hepokoski & Darcy, 2006), allowing us to explore the relationship between perceptual salience and formal structure in an ecologically valid context.
Music Analysis and Psychological Approaches to Segmentation
Although both music theory and music psychology consider segmentation as the identification of boundaries between discrete musical units, they rely on different criteria and methods. Music analysis typically approaches segmentation through style-specific conventions based on the score, focusing on discontinuities of musical parameters (Hasty, 1981; Lefkowitz & Taavola, 2000). These may coincide with salient surface cues, such as changes of timbre, rhythm, or pitch, but are not defined by them alone; there is also a contextual or even purely structural domain (Hanninen, 2001, 2012). In the context of sonata form, contextual and structural principles such as cadential articulation, thematic and tonal development, or rhetoric play a decisive role in delineating formal sections (Caplin, 1998; Hepokoski & Darcy, 2006). Formenlehre traditionally emphasizes hierarchical integration across an entire piece (Caplin, 2010; Riemann, 1888); its interest lies in the relations between annotated segments and their top-down hierarchical integration rather than in the real-time perceptibility of these sections.
In contrast, psychological research approaches segmentation as a real-time perceptual process shaped by attention, memory constraints, and surface-level cues. Drawing on Gestalt theory (Bregman, 1990; Deutsch, 2013), cognitive models such as the Generative Theory of Tonal Music (GTTM; Lerdahl & Jackendoff, 1983) or Event Segmentation Theory (Zacks et al., 2007) propose that listeners parse a flow of information by detecting salient changes—in the case of music, shifts in register, articulation, texture, or rests. Hypotheses of GTTM have been supported by empirical studies showing that surface-level cues like pauses significantly influence perceived segmentation (Deliège, 1987; Frankland & Cohen, 2004). Related computational modeling efforts (Hutchison et al., 2015; Pearce et al., 2010; Temperley, 2004) have replicated similar findings, though these studies have so far focused primarily on monophonic music, limiting their generalizability to richer textures.
Only a few studies have explicitly compared listener segmentation with formal music analysis, in most cases using the analysis as a reference against which listener responses are compared. For example, Deliège and El Ahmadi (1990) found that listeners’ segmentation of a contemporary classical piece closely matched analyses by two composers (see also Deliège, 1989). Pauses emerged as primary segmentation cues but were effective only when accompanied by clear contrasts on either side of the boundary. Similarly, Clarke and Krumhansl (1990) investigated expert musicians’ segmentation of a contemporary piano piece. After an initial familiarization, participants marked boundaries during a second listening and refined them with the score on a third. They rated each boundary's strength, difficulty, and defining features. The group showed consensus on boundary locations, which aligned both with GTTM's grouping hypotheses and the piece's compositional structure. Krumhansl (1996) confirmed these findings in a study on Mozart's Piano Sonata KV 282, a miniature sonata form. Participants reliably marked major sections, while agreement was lower for minor divisions. The experiment included trials for identifying major section endings, continuous tension rating, and marking new musical ideas. Notably, beginnings were more frequently indicated than endings. Krumhansl referenced two independent music-theoretical analyses of the piece: a hierarchical model by Narmour (1996) and a schema-driven approach by Gjerdingen (1996). While she emphasized similarities between these analyses and listener responses, differences were less explored; for example, Gjerdingen's labels for new ideas were considerably more detailed than participants’ markings. Krumhansl's design nevertheless served as an important inspiration for our study, even though she did not statistically evaluate listener agreement.
More recent research has extended segmentation studies beyond scores from the classical repertoire to include popular music (Bruderer, 2008), live performances of classical works (Phillips et al., 2020), and cross-cultural comparisons (Popescu et al., 2021), and has shed new light on methodology and the role of expertise (Hartmann et al., 2016). Consistent with earlier findings, these studies report high listener agreement on boundary locations. Notably, Western listeners demonstrated strong consensus in segmenting Indian ragas and could differentiate between hierarchical boundary levels within this non-Western musical tradition (Popescu et al., 2021), suggesting that perceptual segmentation may transcend cultural boundaries while respecting hierarchical organization.
Despite converging findings on boundary agreement, most studies struggle to disentangle surface salience from formal structural importance. Because acoustically prominent moments often coincide with structurally significant boundaries (in the case of Bruderer, 2008; Deliège & El Ahmadi, 1990; Krumhansl, 1996; Phillips et al., 2020; Popescu et al., 2021), interpretations remain confounded from a music-theoretical perspective. This persistent overlap highlights the need for a more nuanced understanding of the relationship between perception and analysis of musical form, and of how these perceptions may be influenced by both surface-level cues and deeper structural features. In our study, we therefore used two types of stimuli: one piece with very clear surface features and one with ambiguous structural boundaries.
Performance Practice, Music Analysis, and Listener Perception
The distinction between clear and ambiguous boundaries in a musical piece only becomes apparent through the performance of a score. Thus, recent performance research has challenged the assumption that musical structure is fully determined by the score, proposing instead that form and meaning emerge through performative realization and embodied engagement (Cook, 2013; Doğantan-Dack, 2015; Leech-Wilkinson, 2012). This emphasis has implications for understanding listener-based segmentation: Boundaries may also arise from expressive and temporal nuances of performance. To address the possible influence of performance features on listeners’ segmentation behavior, we include a qualitative comparative analysis of the two recordings, outlining expressive and interpretive characteristics (see Supplement 4). This analysis situates the stimuli within a performance-research perspective, highlights interpretive nuances that may contribute to perceptual salience, and finally underscores the decisions made in our music-theoretical analysis.
Challenges in Experimental Design
Differences in the instruction of real-time listening tasks and operational definitions of “segment” complicate cross-study comparisons. Some studies ask participants to mark formal sections (Krumhansl, 1996; Phillips et al., 2020), while others solicit “instants of significant change” (Hartmann et al., 2016), “new ideas” (Krumhansl, 1996; Lalitte & Bigand, 2006), “phrase ends” (Peebles, 2011; Popescu et al., 2021), or “phrase transitions” (Kreutz, 1995). These variations affect both the timing and type of responses. Moreover, few studies distinguish between endings and beginnings of segments or allow for the marking of ambiguous transitions (though see Krumhansl, 1996). This is not only crucial in the context of music analysis: The distinction between anticipation before and consolidation during and after a boundary is also evident in the activation of different brain networks (Burunat et al., 2024).
Familiarity and expertise introduce further complexity in understanding segmentation. While increasing exposure to a piece or style tends to improve segmentation accuracy (Fredrickson, 2000; Peebles, 2011; Phillips et al., 2020), the influence of formal musical training is less straightforward in behavioral tasks. For instance, Deliège (1989) observed that non-musicians sometimes identify more boundaries than experts, possibly focusing on local changes rather than overarching form. However, Hartmann et al. (2016) reported no significant difference between musicians and non-musicians. Electroencephalogram (EEG) studies suggest that the brain registers musical closures even when listeners are not actively attending to phrase structure (Knösche et al., 2005; Neuhaus et al., 2006), supporting the idea of an unconscious or intuitive component to segmentation. Since we were interested in a general approach to music segmentation, we chose participants who were familiar with classical music but who had no formal musical training.
Research Aims and Hypotheses
To serve both our empirical and music-theoretical purposes, we designed a two-phase experiment combining real-time segmentation with an integrated rating task. Our stimuli were two sonata-form movements differing in formal clarity: an early romantic piece (Piece 1) with clearly articulated segments and a romantic piece (Piece 2) with less distinct divisions. Participants without formal musical training marked perceived “sections” during a first listening, with no technical definition of “section” provided, which encouraged intuitive, surface- or structure-driven responses. During a second listening, they marked and then rated each moment on (a) expectancy of something new, (b) sense of an ending, and (c) perceived importance. We decided to include questions on expectation and endings because they play a crucial role in constructing meaning in music theory (Meyer, 1956; Narmour, 1992) and we hoped to gain some qualitative insight into listeners’ decisions.
Our goal was not only to identify where listeners perceive boundaries but also to understand why, and how these perceptions correspond to formal music-analytical segmentation. Theoretical segmentations were coded at two hierarchical levels (see also Popescu et al., 2021): Level 1 for major formal divisions (e.g., exposition vs. development) and Level 2 for sub-sectional phrases. Our research questions and hypotheses were:
A fifth, more qualitative research focus concerns potential divergences between empirical findings and theoretical models. Specifically, do participant ratings mirror hierarchical structural levels? How do empirical results challenge current Formenlehre assumptions? Might these findings prompt revisions to formal theoretical frameworks? These questions will be explored in the discussion and future research section.
By integrating music-theoretical analysis with empirical perception data, this study aims to offer a novel, bidirectional contribution: using music analysis to contextualize listener behavior, and listener data to critically evaluate music-analytical assumptions (Cross, 1998). We propose that segmentation is neither solely surface-driven nor purely theoretical, but emerges from the dynamic interplay of style familiarity, auditory salience, and structural expectations.
Method
Participants
Two studies were conducted to investigate listener-based musical segmentation: a laboratory study incorporating repeated trials, subjective ratings, and questionnaire data, and a complementary, shorter online study designed to boost sample size for inter-rater reliability analyses. All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society (Nr. 2017_12) and were undertaken with the written informed consent of each participant.
Lab Study
Forty-three participants (30 self-identified female, 13 male) were recruited via the participant database of the Max Planck Institute for Empirical Aesthetics. Ages ranged from 20 to 35 years (M = 27.5, SD = 4.2). All reported normal hearing. Participants completed the German version of the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014; Schaal et al., 2014) and the Barcelona Music Reward Questionnaire (BMRQ; Mas-Herrero et al., 2013). The mean Gold-MSI general sophistication score was 62.65 (SD = 13.56; range = 34–90), corresponding to the 40th percentile of the German norm sample. Mean scores for the Gold-MSI Perception subscale and the BMRQ were 44.44 (SD = 8.17; range = 26–62) and 83.60 (SD = 10.59; range = 56–105), respectively.
Online Study
Ninety-seven participants (53 self-reported female, 41 male, 3 unspecified) completed a shortened online version of the task, which included only one of the pieces, one trial of segmentation, and the Gold-MSI general. After excluding 9 participants due to missing data, the final sample consisted of 88 individuals. Participants were recruited via social media outreach through the Max Planck Institute for Empirical Aesthetics and a local classical radio station. The sample's age ranged from 19 to 90 years (M = 43.2, SD = 17.3), and the average Gold-MSI general score was 73.8 (SD = 18.0), consistent with population norms (Schaal et al., 2014).
Material
Two lesser-known romantic orchestral works were selected to investigate segmentation behavior under varying structural clarity: Piece 1 (Farrenc, 1849), the fourth movement of Louise Farrenc's Nonet in E-flat major op. 38 (clear structural boundaries); and Piece 2 (Paine, 1908), the first movement of John Knowles Paine's Symphony No. 1 in C minor, up to the recapitulation (less clear boundaries). Both works exhibit sonata-form structure but differ in their degree of formal clarity. Piece 1 features conventional sectional boundaries (e.g., pauses, texture shifts, dynamic contrasts), whereas Piece 2 includes more ambiguous transitions (e.g., overlapping phrases, modulations, blurred cadences). To minimize familiarity effects while maintaining stylistic coherence, we selected works by underrepresented but idiomatically typical composers of the common practice era. The stimuli Piece 1 (Consortium Classicum, 2010) and Piece 2 (Mehta & New York Philharmonic, 1989) were presented as .wav files (44.1 kHz, 24-bit) always consecutively, with amplitude normalization and fade-outs (2 s for Piece 2) applied using Audacity.
Music-Theoretical Annotation
A detailed formal analysis of both pieces was performed by the first author, drawing on Caplin's theory of formal functions (1998) and sonata theory (Hepokoski & Darcy, 2006). Each piece was annotated for expected segmentation points, labeling phrases, subsections, and sections, combined with a tonal analysis of the main modulations and cadences (Supplement 2). To distinguish between the importance of boundaries, they were categorized in two hierarchical levels: the beginning of complete sections, for example, the beginning of the development (Level 1), and subsections, for example, a half cadence at the end of the primary theme (Level 2). Events were additionally tagged as “beginning,” “ending,” or “ambiguous” depending on their formal character. Sonic Visualiser (Cannam et al., 2010) was used to align the score with the audio files. We reduced the resulting time series to beats that contained section boundaries and labeled them with measure numbers. This resulted in two time series of different event densities, a coarse-grained Level 1 and a fine-grained Level 2, which we used as reference points in the quantitative analysis. Any Level 1 boundary is also considered a Level 2 boundary, which means that this segmentation is hierarchical. In addition to the music-theoretical perspective, we provide a performance analysis of the recordings in Supplement 4, focusing on the parameters of tempo, phrasing, articulation, dynamics, and timbre.
Design and Procedure
Lab Study
Participants completed two segmentation trials per stimulus in a sound-attenuated room, using Beyerdynamic DT 770 Pro headphones. The experimental scenario was implemented using a custom user interface (Muralikrishnan, 2019) programmed in Presentation software (Neurobehavioral Systems, 2020). In each trial, participants listened to the full piece and marked perceived segment boundaries by pressing a designated key. A 100-ms silence preceded playback onset. In the second trial, additional rating features were enabled: When a mark was placed, the playback stopped, and participants could either cancel, replay the preceding 10 s, or confirm the boundary and respond to three Likert-scale questions (1–6): (a) How strong is the impression of an end? (b) How strong is the expectation for something new? (c) How important is that moment in the overall context? (see also Schmuckler, 1989). After each piece, participants rated liking and task difficulty on similar six-point scales. Following both pieces, participants completed the Gold-MSI general and perception subscales (Müllensiefen et al., 2014; Schaal et al., 2014) and the BMRQ (Mas-Herrero et al., 2013) in German. The session lasted one hour; compensation was €14.
Online Study
The online study replicated the first trial of the lab task but used only one piece (randomly assigned). No second trial or rating questions were included. Participants provided demographic data and completed the Gold-MSI general subscale. No compensation was provided.
Analysis
Segmentation Agreement Measures
To assess the agreement between participants and their alignment with analytical boundaries, we employed three complementary methods: Gaussification (see also Bruderer, 2008; Hartmann et al., 2016), dynamic time warping (DTW), and signal detection theory with F1 scores (see also Popescu et al., 2021). We chose these measures because, taken together, they offer a comprehensive assessment of segmentation agreement while tolerating lagged responses, which cannot be controlled for in a real-time task.
Results
We first report the time marker data, all the participants’ segmentation marks, and general observations on the inter-rater agreement with Gaussification, DTW, and F1 scores. In the order of our research questions, we then show how similar the participants’ responses were (1—Inter-rater agreement) and continue with the comparison of music-theoretical boundaries (2—Relation with music analysis). We report which peaks of the inter-rater agreement are similar to the music-theoretical perspective, and which peaks differ. We then present the results of the participants’ ratings of beginning, ending, and importance, and interpret these in relation to the music analysis (3—Subjective ratings). Finally, we check if results differ between first and second listening (4—Effects of trial and familiarity). An interactive visualization of the segmentation data and analytical annotation is available online (Data Availability or Supplement 1, Figure 4).
General Observations
Participants provided time-based segmentation responses across two pieces in both lab (two trials) and online (one trial) settings. An overview of the time marker data is displayed in Table 1. Inter-segment intervals (ISI) from both participant responses and theoretical analyses followed a log-normal distribution (Supplement 1, Figure 1). A linear mixed-effects model (Supplement 1, Table 1), with participants as random effects and interactions of piece and trial and of piece and experiment type as fixed effects, revealed that ISIs were significantly longer for Piece 2 and increased slightly in the second trial, suggesting fewer boundary markings with repeated exposure. Online participants had a tendency for slightly shorter ISIs than lab participants.
Table 1. Overview of time marker data.
Note. ISI = inter-segment interval in seconds.
R1: Inter-rater Agreement
We used Gaussification with a 2-s bandwidth to assess inter-rater agreement. Figure 1 shows the resulting peak structures for both pieces. Visually, each piece shows several clear peaks above a baseline of many small peaks.
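The general idea of Gaussification can be illustrated with a minimal Python sketch. This is not the analysis code used in the study: it assumes that the bandwidth is the standard deviation of each Gaussian kernel, and the function names, the 0.1-s grid step, and the example marks are hypothetical.

```python
import math


def gaussify(marks, duration, bandwidth=2.0, step=0.1):
    """Sum one Gaussian kernel per segmentation mark on a regular time grid."""
    n = int(round(duration / step)) + 1
    grid = [i * step for i in range(n)]
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    curve = [
        sum(norm * math.exp(-0.5 * ((t - m) / bandwidth) ** 2) for m in marks)
        for t in grid
    ]
    return grid, curve


def local_peaks(grid, curve):
    """Return (time, height) for every local maximum of the curve."""
    return [
        (grid[i], curve[i])
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] >= curve[i + 1]
    ]


# Hypothetical marks (in seconds) pooled over several listeners
marks = [10.2, 9.8, 10.5, 30.1, 29.7, 45.0]
grid, curve = gaussify(marks, duration=60.0)
peaks = local_peaks(grid, curve)  # three peaks, the highest one near 10 s
```

Marks that fall close together reinforce each other, so the peak height grows with the number of listeners who responded around the same moment.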
Comparison with 50 random simulations (Table 2), generated from log-normal ISI distributions, confirmed that observed agreement significantly exceeded chance levels (rated: .116–.169, simulated: .025–.041 agreement rate). Piece 1, which features clearer structural cues, elicited higher agreement than Piece 2. As hypothesized and in line with previous studies, listeners chose similar moments as segment boundaries. The agreement for Level 1 boundaries was higher, and clearer surface cues (as in Piece 1) also improved inter-rater agreement.
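A chance baseline of this kind can be sketched as follows. The sketch only shows how random segmentations might be drawn from a log-normal ISI distribution; how the agreement rate itself was computed is not specified here, so it is left out. The function names, the observed ISIs, the piece duration, and the seed are all hypothetical.

```python
import math
import random
import statistics


def fit_lognormal(isis):
    """Estimate mu and sigma of log(ISI) from observed intervals (seconds)."""
    logs = [math.log(x) for x in isis]
    return statistics.mean(logs), statistics.stdev(logs)


def simulate_marks(mu, sigma, duration, rng):
    """Draw ISIs from a log-normal and accumulate them until the piece ends."""
    marks, t = [], 0.0
    while True:
        t += rng.lognormvariate(mu, sigma)
        if t >= duration:
            return marks
        marks.append(t)


# Hypothetical observed ISIs (seconds) and piece duration (seconds)
observed_isis = [8.0, 12.5, 20.0, 15.0, 9.5, 30.0]
mu, sigma = fit_lognormal(observed_isis)
rng = random.Random(0)  # fixed seed so that the sketch is repeatable
runs = [simulate_marks(mu, sigma, 300.0, rng) for _ in range(50)]
```

Each simulated run can then be passed through the same agreement measure as the observed responses to obtain a chance distribution.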
R2: Relation with Music Analysis
Using F1 scores and normalized DTW, we compared participant segmentations (aggregated via Gaussification) with music-theoretical boundaries at two levels of granularity across different bandwidths (0.5 s–4 s). For each bandwidth and level, we selected all peaks exceeding the mean to construct summary segmentations, for which we determined F1 scores and the normalized DTW distance to the theoretical predictions. Additionally, we calculated the same values for N = 25 random segmentations based on the summary segmentation.
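As an illustration of the F1 measure, the following Python sketch scores a summary segmentation against reference boundaries. It assumes a greedy one-to-one matching within a fixed tolerance window; the matching rule, the tolerance value, the function name, and the example times are assumptions made for illustration and may differ from the procedure used in the study.

```python
def f1_with_tolerance(estimated, reference, tolerance):
    """F1 score after greedy one-to-one matching within +/- tolerance seconds."""
    est, ref = sorted(estimated), sorted(reference)
    hits, i, j = 0, 0, 0
    while i < len(est) and j < len(ref):
        diff = est[i] - ref[j]
        if abs(diff) <= tolerance:
            hits += 1  # matched pair: consume both boundaries
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # estimated boundary is too early to match anything later
        else:
            j += 1  # reference boundary was missed
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundary times in seconds
summary = [10.0, 31.0, 50.0, 70.0]
theory = [11.0, 30.0, 90.0]
score = f1_with_tolerance(summary, theory, tolerance=2.0)  # 4/7, about 0.571
```

Two of four estimated boundaries are matched (precision .50) and two of three reference boundaries are found (recall .67), which gives an F1 of about .57. Widening the tolerance can only keep or increase the number of matches.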

Figure 1. Participants’ responses of segments. Gaussification (red) and individual segmentation marks (gray) for Piece 1 (top) and Piece 2 (bottom). Note. The Gaussification depicts a continuous probability density function; high peaks represent high segmentation agreement.
Table 2. Inter-rater agreement of trials vs. random agreement.
Note. Trial 1 includes both lab and online study. For random agreement, a simulation of N = 50 runs with a bandwidth of 2 s was used. The agreement scale ranges from 0 (none) to 1 (absolute).
The results (Figure 2) showed significantly higher alignment with theoretical boundaries than chance, particularly at the major structural boundaries of Level 1. While F1 scores indicated above-chance alignment even at Level 2, DTW distances for Level 2 showed no significant improvement over random segmentations. Unsurprisingly, agreement improved with wider bandwidths, reflecting increased temporal tolerance.
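The DTW comparison can be sketched in Python as well. The sketch uses the classic dynamic-programming formulation with the absolute time difference as local cost. Dividing the result by the combined sequence length is an assumption, since the normalization used in the study is not described here, and the function names and example times are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two boundary sequences (times in seconds)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def normalized_dtw(a, b):
    """DTW cost divided by the combined sequence length (assumed normalization)."""
    return dtw_distance(a, b) / (len(a) + len(b))


# Hypothetical boundary times in seconds
summary = [10.0, 30.0]
theory = [11.0, 33.0]
raw = dtw_distance(summary, theory)      # 1.0 + 3.0 = 4.0
scaled = normalized_dtw(summary, theory)  # 4.0 / 4 = 1.0
```

Unlike the F1 score, DTW has no tolerance window: every boundary contributes its full temporal offset, so dense fine-grained sequences can accumulate cost even when each boundary has a near neighbor.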

Figure 2. Comparison of F1 scores and DTW normalized scores against theoretical predictions. Note. F1 scores (left) and DTW normalized scores (right) for both Piece 1 (Farrenc) and Piece 2 (Paine). Summary segmentation in red; random segmentation in blue. Theoretical predictions are based on Level 1 and Level 2 of the music analysis.
Agreement and Difference Between Music-Theoretical Boundaries and Participants’ Responses
Figure 3 shows the Gaussification of all participant data against the backdrop of sonata form. Peaks for both pieces that reflect the agreement of at least a third of the participants are listed in Table 3 and marked in the score (Supplement 3).

Figure 3. Participants’ responses of segments with annotated sonata form labels. Note. The figure displays all main sections of the sonata form, annotated as Level 1: Introduction (Intro), Primary theme (P), Transition (Tr), Secondary theme (S), Development (Dev), Re-transition (RTr), Coda (C), Medial caesura (MC), Essential expositional closure (EEC), Essential structural closure (ESC).
Table 3. Peaks with highest inter-rater agreement with brief analytical description.
Note. All peaks are indicated in the score (Supplement 3); the given bar is the one preceding the peak. Beginning and ending ratings are summarized on a scale from −6 (beginning) to 6 (ending); importance ratings range from 1 to 6. Perfect authentic cadence (PAC), Essential expositional closure (EEC), Half cadence (HC), Medial caesura (MC), Primary theme (P), Secondary theme (S), Transition (T). For boundary type, see Figure 4.
At the very beginning of Piece 1, we observed a pattern where many participants marked their segments barwise. Since we did not include a practice trial, participants were presumably still getting used to the task, and we therefore excluded these four peaks (.36–.47 relative peak height) from the analysis.
In Piece 1, the top five peaks all corresponded with Level 1 boundaries. In contrast, Piece 2 showed more divergence: Only three of the top five peaks aligned with Level 1, while the others coincided with prominent surface features (e.g., timbral or dynamic changes) lacking formal structural functions.
Notably, some Level 1 boundaries received little attention from participants: Only 6 of 13 (Piece 1) and 6 of 11 (Piece 2) were marked with considerable agreement. Conversely, participants also marked two non-labeled moments, namely a rhetorical pause (Piece 1, m. 245) and a dynamic development toward a cadence (Piece 2, m. 269). In both cases, surface features (crescendo, rhetorical increase) and their resulting structural expectations lead to a kind of “deceptive cadence”: The misleading listening impression in that moment overrides music-theoretical criteria (see Supplement 4, Figure 2, for a qualitative analysis of performance parameters).
While the development and recapitulation boundaries were reliably marked in both pieces, secondary themes were not directly identified as segment boundaries, which challenges assumptions of sonata form perception. Instead, listeners were sensitive to the respective transitions toward the secondary theme. In Piece 1, listener agreement highlights the beginning of the transition section in the exposition (peak no. 1.2 in Table 3), and the end of the transition in the recapitulation (1.3). In Piece 2, the focus lies on the structural cadence (medial caesura, MC) leading into the secondary theme (2.2). Thus, the onset of the secondary theme itself is not as crucial for listener agreement as the transition toward it. Furthermore, the context of the piece itself influences segmentation behavior at similar moments in Piece 1, suggesting a changing understanding of secondary themes in the exposition and recapitulation.
This confirms our hypothesis that participant markings correspond not only with major section boundaries but also with moments lacking music-theoretical importance. These moments coincide with obvious changes in musical parameters, but they nevertheless suggest a structural understanding by revealing certain expectations within a framework (e.g., cadences). Most importantly, the linear sections of sonata theory are weighted very differently in the listening behavior than in music analysis and therefore seem incomplete in places (e.g., missing secondary theme). Each of the two pieces reveals a different listening strategy, highlighting the role of transitions as indirect formal markers.
R3: Subjective Ratings: Beginnings Rated Higher, but Endings More Consistently Identified
In the second trial of the lab experiment, participants rated each respective boundary on three scales (1–6): expectation of something new (beginning), sense of closure (ending), and importance. In this section, we simply refer to these subjective ratings as judgments of beginning and ending.
Beginnings were rated slightly higher (Piece 1: M = 4.43, SD = 0.48; Piece 2: M = 4.73, SD = 0.51) than endings (Piece 1: M = 3.88, SD = 0.41; Piece 2: M = 4.26, SD = 0.78), with a significant difference in both cases (p < .05). There was no overall difference in importance ratings between pieces (t(29) = 1.93, p = .063), though Piece 2 received slightly higher subjective ratings in general. A linear mixed model for the ratings, with rating type, piece, and their interaction as fixed effects and participants as random effects, confirmed these findings: Beginnings received higher ratings than ending and importance ratings (Supplement 1, Table 2). However, no significant interaction was found.
Correlational analysis (Table 4) showed strong relationships between beginning and ending ratings (r = .60), implying that participants often perceived boundaries as events with both closure and initiation. In Piece 2, importance strongly correlated with ending ratings (r = .82); in Piece 1, importance aligned more with inter-rater agreement than with subjective ratings.
Significant correlation of raters’ boundary assessments.
Note. We gathered the Gaussification peaks with a 2-s bandwidth and selected all boundary markers by participants within a 2-s window around the peak position. Next, we aggregated all beginning, ending, and importance ratings for each peak as well as the relative peak height. We correlated the three assessments with each other and with the relative peak height, for both pieces together, and for each piece alone. For simplicity, only the significant correlations are shown.
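The procedure described in this note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the marker data are invented, the 2-s bandwidth is read as the standard deviation of the Gaussian kernel, the 2-s window is read as ±1 s around the peak, and relative peak height is taken relative to the tallest peak. All function names are hypothetical.

```python
import math

def gaussify(marker_times, grid, sigma=2.0):
    """Sum one Gaussian kernel per boundary marker, evaluated on a time grid.
    Assumption: the 2-s bandwidth is the kernel's standard deviation."""
    return [
        sum(math.exp(-0.5 * ((t - m) / sigma) ** 2) for m in marker_times)
        for t in grid
    ]

def find_peaks(curve, grid):
    """Return (time, height) for every strict local maximum of the curve."""
    return [
        (grid[i], curve[i])
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    ]

def markers_near(markers, peak_time, half_window=1.0):
    """Select markers inside the window around a peak (assumed +/- 1 s)."""
    return [m for m in markers if abs(m["time"] - peak_time) <= half_window]

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical boundary markers (seconds) with 1-6 ratings; not study data.
markers = [
    {"time": 9.6, "beginning": 5, "ending": 5, "importance": 5},
    {"time": 10.0, "beginning": 4, "ending": 6, "importance": 5},
    {"time": 10.4, "beginning": 5, "ending": 5, "importance": 6},
    {"time": 29.8, "beginning": 5, "ending": 3, "importance": 3},
    {"time": 30.2, "beginning": 5, "ending": 2, "importance": 3},
    {"time": 50.0, "beginning": 4, "ending": 2, "importance": 2},
]

grid = [i / 10 for i in range(601)]  # 0.0 to 60.0 s in 0.1-s steps
curve = gaussify([m["time"] for m in markers], grid)
peaks = find_peaks(curve, grid)

# Assumption: peak height is expressed relative to the tallest peak.
tallest = max(height for _, height in peaks)
summary = []
for peak_time, height in peaks:
    nearby = markers_near(markers, peak_time)
    summary.append({
        "peak_time": peak_time,
        "relative_height": height / tallest,
        "beginning": mean([m["beginning"] for m in nearby]),
        "ending": mean([m["ending"] for m in nearby]),
        "importance": mean([m["importance"] for m in nearby]),
    })

# Correlate one aggregated assessment with relative peak height.
r_ending_height = pearson(
    [s["relative_height"] for s in summary],
    [s["ending"] for s in summary],
)
```

With real data, the same aggregation would be run once for both pieces together and once per piece, and each pair of assessments would be correlated in turn.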
A linear mixed-effects model confirmed that ending and importance ratings significantly predicted agreement in terms of relative peak height (Table 5), whereas beginning ratings did not. Even though participants often rated beginnings highly, high beginning ratings did not lead to consistent agreement across listeners. This suggests that perceived endings were more consistently shared among listeners, offering nuanced support for our hypothesis that endings play a major role for the listeners.
Linear mixed model for relative peak height of both pieces (as a proxy for inter-rater agreement).
Note. We modelled relative peak height with beginning, ending, and importance ratings, using participants and piece as random effects (intercepts only, as models with slope were singular and did not improve model fit).
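One way to write down the model described in this note is given below. The notation is an assumption introduced for illustration: h is the relative peak height attached to marker i of participant j in piece k; b, e, and m are the beginning, ending, and importance ratings of that marker; u and v are the random intercepts for participant and piece.

```latex
h_{ijk} = \beta_0 + \beta_B\, b_{ijk} + \beta_E\, e_{ijk} + \beta_I\, m_{ijk}
          + u_j + v_k + \varepsilon_{ijk},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
v_k \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

In this reading, the reported result corresponds to the ending and importance coefficients differing reliably from zero while the beginning coefficient does not. No random slopes appear, in line with the intercepts-only specification.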
Music Analysis of Different Boundary Types and Their Subjective Ratings
To obtain a single continuous variable, we combined beginning and ending ratings into one single scale from −6 to 6, with 0 expressing similar beginning and ending qualities. Negative numbers express beginnings (or a high expectancy of something new), and positive numbers endings (a low expectancy of something new). Mirroring the higher means of beginnings, peaks with the highest inter-rater agreement (Table 4) also show higher beginning than ending ratings.
To quantify the inter-rater agreement, we had determined a time window of 2 s around the peak maximum as a suitable bin size. In cases where a phrase end coincides with a new beginning, the high correlation between the subjective ratings of beginning and ending reflects the metrical ambiguity of the musical surface. However, the music analysis includes many instances where two clear segment boundaries appear within the chosen bin size of the time window, for example, when the cadential end of a subsection is followed by a short break and a new beginning within 2 s. Therefore, we looked at the subjective ratings of the previously discussed peaks (Table 3) with a more fine-grained bin size of 0.5 s. The higher resolution should reveal a more accurate picture of whether and how participants distinguished between close segment boundaries. One should keep in mind that we did not ask the participants if they heard a clear beginning or ending (forced choice) but rather how strongly they rated their expectation for something new to come or something to close (single item). This way, each marked event can be seen as a continuum between these two possibilities, leaving room for an ecological, ambivalent description of segment boundaries and connecting to methodologies of expectancy ratings (e.g., Schmuckler, 1989). Participants gave subjective ratings only in the second trial of the lab study. However, we included all trials in our analysis, because the distribution was similar between the trials and between the online and lab experiments.
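The effect of the bin size can be illustrated with a small Python sketch. It rests on several assumptions: the markers are invented, bins are taken as fixed consecutive intervals rather than windows centered on a peak, and the combined rating is computed as ending minus beginning, which is only one possible reading of the combined scale described above.

```python
from collections import defaultdict

def bin_markers(markers, bin_size):
    """Group markers into consecutive time bins of the given width (seconds)."""
    bins = defaultdict(list)
    for m in markers:
        bins[int(m["time"] // bin_size)].append(m)
    return dict(bins)

def mean_balance(group):
    """Mean of (ending - beginning): positive leans toward an ending,
    negative toward a beginning. The formula is an assumption."""
    return sum(m["ending"] - m["beginning"] for m in group) / len(group)

# Hypothetical case: a cadence near 100 s, a short break, and a new
# beginning about 1.2 s later. Ratings use the 1-6 scales.
markers = [
    {"time": 100.0, "beginning": 2, "ending": 6},
    {"time": 100.1, "beginning": 3, "ending": 5},
    {"time": 101.2, "beginning": 6, "ending": 2},
    {"time": 101.3, "beginning": 5, "ending": 3},
]

coarse = bin_markers(markers, 2.0)  # 2-s bins merge both events
fine = bin_markers(markers, 0.5)    # 0.5-s bins keep them apart

coarse_balance = {k: mean_balance(g) for k, g in coarse.items()}
fine_balance = {k: mean_balance(g) for k, g in fine.items()}
```

In this toy case, the 2-s bin averages the two events into an ambivalent value, whereas the 0.5-s bins separate an ending-like event from a beginning-like one.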

Boundary types defined on subjective ratings. Note. Symbolic representation of Type A (Beginnings after clear endings), Type B (Suspended beginnings), and Type C (Clear endings). Symbolic Gaussification (red) and music theoretical analysis (blue).
Taken together, listeners strongly focused on processual tendencies of the music: Even after clear endings from a music-analytical perspective, participants seemed to focus on what was to follow and rated their expectancy of something new higher than the previous ending. However, if participants did agree that a segmentation served as an ending, the importance rating was stronger than for the beginning ratings. Our hypothesis that endings play an important role was confirmed, but we need to carefully distinguish between the individual high ratings for beginnings and the fact that forward-driven listening seems to be as important as the feeling of closure.
R4: Effect of Repeated Exposure Shows Fewer Segmentations but Differences in Agreement Rates
We hypothesized that the repeated exposure in the second trial would result in fewer boundaries, reflecting a growing structural awareness through familiarity. This holds true for both pieces (see Table 1). However, the agreement varied: in Piece 1, agreement decreased in the second trial, while in Piece 2 it increased (Table 1). This could be due to the differences between the stimuli. Piece 2 was more ambiguous and difficult to segment, and a second listening could indeed help to clear up some of the ambiguities and result in a decrease of boundaries, summarizing and correcting experiences of the first trial. Piece 1 had more and clearer boundaries than Piece 2, and therefore the second trial in Piece 1 offered more possibilities for segmentation than Piece 2. Interestingly, even though the number of boundaries did decrease over trials as hypothesized, it was not possible to link this decrease to a growing structural awareness through familiarity.
Discussion
This study examined how listeners segment classical music, focusing on (1) inter-rater agreement, (2) alignment with sonata form boundaries, (3) subjective boundary perceptions, and (4) the effects of repeated listening. Using two stylistically contrasting pieces, we found robust agreement among listeners, partial correspondence with theoretical structures, and distinct roles for perceived beginnings and endings in shaping listeners’ structural understanding. These findings underscore the gap between perceptual and analytical models of form and suggest the potential of perceptual data to inform theoretical discourse. We also consider limitations and outline future directions.
Inter-rater Agreement and Relation to Music Theory
Listeners showed above-chance agreement, particularly at salient formal boundaries, consistent with prior work emphasizing perceptual salience in segmentation (Addessi & Caterina, 2000; Clarke & Krumhansl, 1990; Deliège, 2001; Hartmann et al., 2016; Popescu et al., 2021). Agreement was stronger for Piece 1, which featured clear cadences and dynamic shifts. The Gaussified response peaks (Figure 1) further confirmed shared perceptual structuring, especially at the boundaries of main sections (Level 1). By contrast, the more ambiguous structure of Piece 2 yielded lower agreement, suggesting that formal clarity enhances listener synchronization.
The relationship between listener responses and music-theoretical boundaries reveals a complex interplay. While major structural boundaries—such as exposition-to-development transitions—were reliably identified, secondary themes and less salient boundaries were often missed. For Piece 1, boundaries marking the start of the development (1.1 in Table 3) and recapitulation (1.5) ranked highly, whereas analogous points in Piece 2 were recognized but with less pronounced peaks, reflecting the less distinct sectional divisions typical of Romantic style.
Rotation Principle and Secondary Theme in Sonata Form Theory as Purely Structural Categories?
A defining feature of sonata form is the recapitulation or double return, where the exposition is repeated, typically transposed to the home key. However, participants did not consistently mark formally analogous points. For example, the repetition of the primary theme in the exposition (boundary 1.9 in m. 27, see Table 3) showed high segmentation likelihood, while its recapitulation (m. 169) did not—challenging the perceptual salience of formal “rotations” (Hepokoski & Darcy, 2006). Listeners also highlighted boundaries not reflected in the analysis—for example, dynamic or timbral shifts like boundaries 1.7 or 2.6—emphasizing the weight of surface features in real-time perception.
Such findings suggest that rhetorical salience often overrides formal equivalence. Prior studies may have obscured this by using stimuli where surface and structural changes aligned (Phillips, 2020; Popescu et al., 2021) or by incorporating lyrics (Bruderer, 2008). Our results, by contrast, point to a multidimensional segmentation process shaped by tension and expectation (Huron, 2006).
From a theoretical standpoint, particularly within the Western sonata tradition, these findings raise the question of why some structurally important moments lack salient surface cues. For instance, secondary themes may be intentionally disguised by the absence of clear breaks (Richards, 2013). Hepokoski and Darcy's (2006) sonata theory assumes that each sonata follows a normative sequence of actions that may be fulfilled or denied, allowing for blurred or disguised boundaries. While analysts can infer such boundaries from the score, our real-time listening data suggest that “denied” boundaries have limited perceptual salience. In contrast, acoustically prominent moments—whether formally significant or not—strongly influenced listener segmentation and thus could inform music analysis.
Beginnings and Endings as Constituents of Musical Form
Subjective ratings on beginning, ending, and importance qualities revealed that while beginnings were more frequently marked, endings and importance ratings were more predictive of inter-rater agreement. This asymmetry aligns with studies on musical closure (Sears et al., 2018) and temporal expectancy (Jones & Boltz, 1989), suggesting that closure provides a stronger shared cue than initiation. Beyond perceptual salience, moments of closure may also hold deeper experiential significance: As Doğantan-Dack (2013) argues, the sense of completion and release at musical endings resonates with the embodied experience of temporal flow and affective resolution. In this broader view, closure is not merely a structural cue but a locus of musical meaning.
In our data, endings and importance ratings were especially correlated in Piece 2, suggesting that listeners assign greater significance to closure than to initiation—despite attending to both. While endings served as a powerful reference point in listeners’ mental representation of musical structure, the frequent description of boundaries as beginnings cannot be explained as easily. The segmentation process appears to operate on two levels: one emphasizing a few strong endings that elicit high intersubjective agreement and another focusing on the identification of beginnings in a more individual and variable manner. The prominence of beginnings echoes Krumhansl (1996), where listeners also marked more new ideas than closures in a sonata-form piece. This dual-layered pattern implies that listeners collectively recognize moments of closure as structural anchors, while their attention to beginnings reflects more personal or context-dependent interpretations of musical continuity.
Agreement on Endings and Highlighting the Role of Beginnings
The high rating of boundaries as beginnings, even following clear endings of the music analysis, suggests a processual listening mode—a forward-directed engagement described in dynamic theories of musical form (Huron, 2013). This highlights that listeners attend to both closure and continuity, experiencing segment boundaries as transitional rather than discrete (Burunat et al., 2024). Consensus among listeners may stem more from retrospective closure, when the ending is already in relation with the following music, underscoring listeners’ focus on transitional aspects of the music (Fink & Lange, 2025).
Our exploratory classification revealed a predominance of “beginnings after clear endings” (Type A) and “suspended beginnings” (Type B), indicating sensitivity to onset cues even in the absence of strong closure. Though clear endings (Type C) were rare, they were rated as highly important, highlighting their perceptual salience. Together, these findings suggest that segmentation judgments in our stimuli are characterized by a few clear endings with high agreement and at the same time by a constant sensitivity to directional energy indicated by many individual beginning ratings. This dual focus reflects the tension between static and dynamic conceptions of form (Meyer, 1961): one emphasizing discrete sections and the other emphasizing process and continuity.
Effects of Repeated Listening and Challenges in Hierarchical Integration
As expected, repeated listening led to fewer markings, consistent with the idea that structural understanding develops over time and that exposure refines mental models of musical form (Margulis, 2012; Hansen & Pearce, 2014). However, effects on agreement differed by piece: In Piece 1, agreement decreased, possibly reflecting the exploration of alternative segmentation strategies within a clearer structure. In contrast, Piece 2 showed increased agreement, suggesting that repetition helped listeners converge on a shared interpretation (Peebles, 2011). Notably, increased familiarity did not enhance alignment with analytical boundaries, challenging the assumption that repeated exposure yields “correct” segmentation. Instead, our findings support a listener-centered model in which segmentation is shaped by formal, acoustic, and experiential factors (Clarke, 2005).
Contrary to Popescu et al.'s (2021) result that major boundaries are marked early, which points to a possible top-down approach by the listeners, we found that listeners often waited until after structural pauses to mark segment transitions. For example, Piece 2's most agreed-upon boundary (bar 244) was typically marked after the fermata and onset of the next section, suggesting a reliance on contextual continuity over anticipatory recognition. This furthermore supports our critique that stimuli with obvious surface cues, even in an unknown musical style, lead to prompt responses, while more complex stimuli evoke a different segmentation behavior.
Hierarchical integration, if present, manifested differently across the pieces. For Piece 1, decreased agreement in the second trial may reflect greater interpretive freedom, while Piece 2's increased agreement suggests convergence toward a more unified structure. Thus, clearer formal outlines may invite diverse segmentations, while ambiguous ones encourage shared frameworks through repetition.
Implications for Music Theory
Hanninen (2001) draws a helpful distinction between conceptual “ways of thinking” (p. 357) about segmentation and its perceptual features: Listeners access sonic and contextual cues—such as changes in timbre, texture, or rhythm—in real time, while structural criteria often function at a more abstract level, relevant primarily to composers, analysts, and performers. This may explain why theoretically crucial moments—such as essential expositional or structural closures (EEC, ESC; Hepokoski & Darcy, 2006)—were not marked in our data, suggesting they function more as generic constructs than perceptual events.
Our findings affirm a partial overlap between perceptual and structural segmentation but underscore their non-equivalence. Rather than privileging one framework, we advocate for integrating perceptual data into music-theoretical analysis. Salient surface changes—whether structurally significant or not—profoundly shape listener understanding and warrant analytical attention. Music analysis might benefit from supplementing top-down structural interpretations with bottom-up, perceptually informed insights. From a different perspective, but with a similar goal, Greenberg (2022) has recently criticized the holistic claim of Formenlehre that all individual formal parts are conditional in relation to a greater whole (p. 116). As an alternative interpretation of musical form, he suggests considering it diachronically rather than synchronically, supporting bottom-up models of form construction and emphasizing the cumulative effect of local stylistic cues.
Limitations and Future Studies
This study focused on two contrasting pieces, which, while appropriate for our aims, limits generalizability. Task design may also shape the results (Hartmann et al., 2016): The rating task in the second trial may have reduced the number of boundary markings due to increased attentional demands, complicating interpretations of reduced segmentation as increased familiarity.
Our findings resonate with previous research on inter-rater agreement and the prominence of perceived beginnings but also highlight the limited correspondence with sonata-based theoretical models. If listeners focus more on the dynamic flow of musical events than on analytically distinct sections, alternative paradigms may be warranted. Continuous measures such as tension ratings (Krumhansl, 1996; Lehne et al., 2013) could offer more nuanced insight into listeners’ evolving sense of form.
While the use of fixed recordings ensured comparability, it abstracts from the inherently performative and variable nature of musical form. Recent work in performance studies emphasizes that musical structure and meaning emerge through performative realization and embodied engagement rather than residing solely in the score (Cook, 2013; Doğantan-Dack, 2015; Leech-Wilkinson, 2012). Expressive nuances such as phrasing, articulation, and timing (Keller, 2012; Palmer, 1997) can shape how listeners perceive and segment music. Future studies could systematically vary performance parameters (e.g., Gingras et al., 2016; Keller & Appel, 2010) or combine empirical and qualitative approaches to explore how interpretive choices influence the perception of musical structure.
Overall, these considerations point to the value of integrative approaches that bridge cognitive, analytical, and performative perspectives (Doğantan-Dack, 2022) on musical form.
Conclusion
Segmentation, the parsing of musical flow, plays a central role in both music theory and music psychology, though the two fields approach it from different perspectives. While music theory tends to emphasize structural and deductive criteria, music psychology focuses on perceptual and experiential dimensions. Our study examined how listeners perceive segment boundaries and how those perceptions align with, or diverge from, formal music-analytical interpretations.
Our study confirms that listeners segment music in meaningful, partially shared ways, shaped by both formal structure and surface features. While closures elicited stronger agreement and were rated as more important, beginnings were marked more frequently—reflecting a dynamic tension between retrospective closure and prospective anticipation.
These findings raise important questions for music theory. If composers, following historical (Koch, 1787) or reconstructed (Gjerdingen, 1996; Hepokoski & Darcy, 2006) conventions, crafted musical narratives with a clear dramaturgical intent, then music theory might more directly address how these intentions manifest in the acoustic and perceptual domain. Rather than viewing perception as secondary to structure, a more integrated, listener-centered approach would acknowledge how real-time experience informs and challenges formal interpretation.
Supplemental Material
sj-pdf-1-mns-10.1177_20592043261437593 - Supplemental material for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form
Supplemental material, sj-pdf-1-mns-10.1177_20592043261437593 for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form by L. T. Fink, K. Frieler, R. Muralikrishnan and M. Wald-Fuhrmann in Music & Science
Supplemental Material
sj-pdf-2-mns-10.1177_20592043261437593 - Supplemental material for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form
Supplemental material, sj-pdf-2-mns-10.1177_20592043261437593 for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form by L. T. Fink, K. Frieler, R. Muralikrishnan and M. Wald-Fuhrmann in Music & Science
Footnotes
Acknowledgment
The authors would like to thank the Deutsche Gesellschaft für Musikpsychologie for hosting the online experiment on DOTS, Pauline Larrouy-Maestri and Kyrill Fayn for their helpful suggestions regarding the procedure and data analysis, and Wolfgang Burgert (Divox AG) and Lisa Kahlden (New World Records) for providing the recordings used in this study.
Action Editor
Isabel Martinez, Universidad Nacional de La Plata, Facultad de Bellas Artes.
Peer Review
One anonymous reviewer.
Peter Harrison, University of Cambridge, Faculty of Music.
Ethical Approval
All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society (Nr. 2017_12).
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
All supplementary materials (further statistical analyses, music-theoretical analyses, annotated scores, and performance analyses) are available on the Open Science Framework (Fink et al., 2026). The datasets generated and analyzed during the current study are available at https://github.com/klausfrieler/form_segmentation (Frieler, 2025). An interactive visualization tool of all datasets for exploring listener responses and analytical boundaries is available at https://testing.musikpsychologie.de/form_segmentation/ (Frieler, 2023).
Supplemental Material
Supplemental material for this article is available online.
References
