Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form

Abstract

This study aims to create a dialogue between music psychology and music-theoretical discourse on the perception of musical form. In a laboratory study and an online study, we investigated how listeners segment Western classical music in real time, focusing on (1) inter-rater agreement, (2) alignment with music-theoretical analyses of sonata form, (3) subjective qualities of segment boundaries (beginnings, endings, importance ratings), and (4) the effects of repeated listening. Participants listened to two stylistically contrasting orchestral movements, one early romantic work in a classicist idiom and one romantic work, and marked perceived segment boundaries during two exposures. In line with previous research, results reveal consistent above-chance agreement among listeners, particularly at structurally prominent moments with clear surface cues. While most boundaries aligned with music-theoretical segmentations, others did not, especially when formal boundaries were not reinforced by salient acoustic features. Listeners frequently identified boundaries based on perceived beginnings, but higher agreement was associated with strong endings and higher importance ratings. Exploratory categorization of boundary types revealed a tendency to emphasize transitional moments, suggesting a processual, forward-directed mode of engagement. Repeated listening reduced the number of segmentation responses, but its effect on agreement varied by piece, suggesting that structural clarity and ambiguity modulate how listeners integrate musical form. The findings underscore the importance of both surface salience and formal function in shaping real-time segmentation. We argue for a listener-centered approach to musical form that integrates perceptual and analytical perspectives, and we propose that perceptually salient yet analytically unmarked moments merit greater attention in theories of musical form.

Keywords

Boundary detection, classical music, form perception, listener agreement, music theory

Introduction

How do listeners follow a piece of music in real time? And how do they segment a continuous auditory stream and integrate it into coherent units such as phrases, sections, or entire formal architectures? This longstanding question lies at the intersection of Western music theory and music perception. While music theorists have long considered segmentation a cornerstone of musical understanding (Koch, 1802/1969; Riemann, 1897, p. 233), the extent to which analytically defined structures align with listeners’ real-time perceptual experience remains empirically underexplored. Traditions of Formenlehre, especially in the work of Dahlhaus (1978), Caplin (1998), and Hepokoski and Darcy (2006), emphasize top-down analysis based on tonal trajectories, thematic functions, and their stylistic conventions. Yet the psychological plausibility of such structures, that is, the degree to which they are perceptually salient, has received comparatively little attention in music theory. In this study, we address this gap by comparing music-theoretical analyses of two sonata-form movements with listeners’ spontaneous segmentation behavior in a real-time task.

Psychological research has questioned whether global structures in music are perceptually relevant in real-time listening. Some scholars suggest that listeners may not consciously perceive or prioritize large-scale formal structures at all (Levinson, 1997; Tillmann & Bigand, 2004). Listeners frequently overlook key structural features such as tonal closure and modulation (Cook, 1987; Marvin & Brinkman, 1999), or the overall order of sections (Eitan & Granot, 2008; Gotlieb & Konečni, 1985; Karno & Konečni, 1992; Konečni, 1984; Tillmann & Bigand, 1996), implying that the global organization of events may not shape listeners’ aesthetic experience. However, as McDonald and Wöllner (2022) argue, listeners’ willingness to accept altered versions of a piece as aesthetically coherent does not necessarily indicate insensitivity to formal disruption. Their findings call into question the methodology of earlier studies, many of which relied on scrambled or artificially manipulated stimuli. Segmentation studies, by contrast, leave the musical structure intact and thus offer a more ecological approach to investigating how listeners perceive form. Note that our segmentation task does not assess aesthetic or emotional responses; participants’ judgments concern the cognitive perception of structural boundaries rather than liking, preference, or affective experience.

Against this backdrop, segmentation studies have shown that listeners are sensitive to perceptual boundaries in real-time musical experience (Clarke & Krumhansl, 1990; Deliège & El Ahmadi, 1990; Hartmann et al., 2016). Such boundaries are often cued by salient acoustic features, such as pauses, changes in texture, articulation shifts, or dynamic contrasts, that signal a transition. In some cases, listeners also exhibit anticipatory awareness of upcoming formal changes (Peebles, 2011), suggesting that segmentation is not merely reactive but also guided by expectations.

This study aims to bridge a longstanding gap between theoretical models of musical form and empirical research on how listeners perceive it. Drawing on insights from music psychology and cognitive music theory, we investigate which formal boundaries in complete classical works are marked by participants. We compare this spontaneous real-time segmentation with music-analytical models grounded in sonata-form conventions (Caplin, 1998; Hepokoski & Darcy, 2006), allowing us to explore the relationship between perceptual salience and formal structure in an ecologically valid context.

Music Analysis and Psychological Approaches to Segmentation

Although both music theory and music psychology consider segmentation as the identification of boundaries between discrete musical units, they rely on different criteria and methods. Music analysis typically approaches segmentation through style-specific conventions based on the score, focusing on discontinuities in musical parameters (Hasty, 1981; Lefkowitz & Taavola, 2000). These may coincide with salient surface cues, such as changes of timbre, rhythm, or pitch, but are not defined by them alone; segmentation also has a contextual and even a purely structural domain (Hanninen, 2001, 2012). In the context of sonata form, contextual and structural principles such as cadential articulation, thematic and tonal development, or rhetoric play a decisive role in delineating formal sections (Caplin, 1998; Hepokoski & Darcy, 2006). Formenlehre traditionally emphasizes hierarchical integration across an entire piece (Caplin, 2010; Riemann, 1888); its interest lies in how annotated segments relate to one another within a top-down hierarchy rather than in whether these sections are perceptible in real time.

In contrast, psychological research approaches segmentation as a real-time perceptual process shaped by attention, memory constraints, and surface-level cues. Drawing on Gestalt theory (Bregman, 1990; Deutsch, 2013), cognitive models such as the Generative Theory of Tonal Music (GTTM; Lerdahl & Jackendoff, 1983) or Event Segmentation Theory (Zacks et al., 2007) propose that listeners parse a flow of information by detecting salient changes—in the case of music, shifts in register, articulation, texture, or rests. Hypotheses of GTTM have been supported by empirical studies showing that surface-level cues like pauses significantly influence perceived segmentation (Deliège, 1987; Frankland & Cohen, 2004). Related computational modeling efforts (Hutchison et al., 2015; Pearce et al., 2010; Temperley, 2004) have replicated similar findings, though these studies have so far focused primarily on monophonic music, limiting their generalizability to richer textures.

Only a few studies have explicitly compared listener segmentation with formal music analysis, and most of them used the analysis merely as a reference against which listener responses were compared. For example, Deliège and El Ahmadi (1990) found that listeners’ segmentation of a contemporary classical piece closely matched analyses by two composers (see also Deliège, 1989). Pauses emerged as primary segmentation cues but were effective only when accompanied by clear contrasts on either side of the boundary. Similarly, Clarke and Krumhansl (1990) investigated expert musicians’ segmentation of a contemporary piano piece. After an initial familiarization, participants marked boundaries during a second listening and refined them with the score on a third. They rated each boundary's strength, difficulty, and defining features. The group showed consensus on boundary locations, which aligned with both GTTM's grouping hypotheses and the piece's compositional structure. Krumhansl (1996) confirmed these findings in a study on Mozart's Piano Sonata KV 282, a miniature sonata form. Participants reliably marked major sections, while agreement was lower for minor divisions. The experiment included trials for identifying major section endings, continuous tension rating, and marking new musical ideas. Notably, beginnings were indicated more frequently than endings. Krumhansl referred to two independent music-theoretical analyses of the piece: a hierarchical model by Narmour (1996) and a schema-driven approach by Gjerdingen (1996). While she emphasized similarities between these analyses and listener responses, she explored the differences less; for example, Gjerdingen's labels for new ideas were considerably more detailed than participants’ markings. Although Krumhansl did not statistically evaluate listener agreement, her design served as an important inspiration for our study.

More recent research has extended segmentation studies beyond score-based studies of the classical repertoire to include popular music (Bruderer, 2008), live performances of classical works (Phillips et al., 2020), and cross-cultural comparisons (Popescu et al., 2021), and has shed new light on methodology and the role of expertise (Hartmann et al., 2016). Consistent with earlier findings, these studies report high listener agreement on boundary locations. Notably, Western listeners showed strong consensus in segmenting Indian ragas and could differentiate between hierarchical boundary levels within this non-Western musical tradition (Popescu et al., 2021), suggesting that perceptual segmentation may transcend cultural boundaries while respecting hierarchical organization.

Despite converging findings on boundary agreement, most studies struggle to disentangle surface salience from formal structural importance. Because acoustically prominent moments often coincide with structurally significant boundaries (as in Bruderer, 2008; Deliège & El Ahmadi, 1990; Krumhansl, 1996; Phillips et al., 2020; Popescu et al., 2021), interpretations remain confounded from a music-theoretical perspective. This persistent overlap highlights the need for a more nuanced understanding of how perception relates to the analysis of musical form, and of how perception may be shaped by both surface-level cues and deeper structural features. In our study, we therefore used two types of stimuli: one piece with very clear surface features and one with ambiguous structural boundaries.

Performance Practice, Music Analysis, and Listener Perception

The distinction between clear and ambiguous boundaries in a musical piece only becomes apparent when a score is performed. Indeed, recent performance research has challenged the assumption that musical structure is fully determined by the score, proposing instead that form and meaning emerge through performative realization and embodied engagement (Cook, 2013; Doğantan-Dack, 2015; Leech-Wilkinson, 2012). This emphasis has implications for understanding listener-based segmentation: Boundaries may also arise from expressive and temporal nuances of performance. To address the possible influence of performance features on listeners’ segmentation behavior, we include a qualitative comparative analysis of the two recordings that outlines their expressive and interpretive characteristics (see Supplement 4). This analysis situates the stimuli within a performance-research perspective, highlights interpretive nuances that may contribute to perceptual salience, and helps justify the decisions made in our music-theoretical analysis.

Challenges in Experimental Design

Differences in the instruction of real-time listening tasks and operational definitions of “segment” complicate cross-study comparisons. Some studies ask participants to mark formal sections (Krumhansl, 1996; Phillips et al., 2020), while others solicit “instants of significant change” (Hartmann et al., 2016), “new ideas” (Krumhansl, 1996; Lalitte & Bigand, 2006), “phrase ends” (Peebles, 2011; Popescu et al., 2021), or “phrase transitions” (Kreutz, 1995). These variations affect both the timing and type of responses. Moreover, few studies distinguish between endings and beginnings of segments or allow for the marking of ambiguous transitions (though see Krumhansl, 1996). This is not only crucial in the context of music analysis: The distinction between anticipation before and consolidation during and after a boundary is also evident in the activation of different brain networks (Burunat et al., 2024).

Familiarity and expertise further complicate our understanding of segmentation. While increasing exposure to a piece or style tends to improve segmentation accuracy (Fredrickson, 2000; Peebles, 2011; Phillips et al., 2020), the influence of formal musical training in behavioral tasks is less straightforward. For instance, Deliège (1989) observed that non-musicians sometimes identify more boundaries than experts, possibly because they focus on local changes rather than overarching form. However, Hartmann et al. (2016) reported no significant difference between musicians and non-musicians. Electroencephalography (EEG) studies suggest that the brain registers musical closures even when listeners are not actively attending to phrase structure (Knösche et al., 2005; Neuhaus et al., 2006), supporting the idea of an unconscious or intuitive component to segmentation. Since we were interested in a general, non-expert approach to music segmentation, we chose participants who were familiar with classical music but had no formal musical training.

Research Aims and Hypotheses

To serve both our empirical and our music-theoretical purposes, we designed a two-phase experiment combining real-time segmentation with an integrated rating task. Our stimuli were two sonata-form movements differing in formal clarity: an early romantic piece (Piece 1) with clearly articulated segments and a romantic piece (Piece 2) with less distinct divisions. Participants without formal musical training marked perceived “sections” during a first listening, with no technical definition of “section” provided, which encouraged intuitive, surface- or structure-driven responses. During a second listening, they marked boundaries again and rated each marked moment on (a) expectancy of something new, (b) sense of an ending, and (c) perceived importance. We included questions on expectation and endings because these play a crucial role in constructing meaning in music theory (Meyer, 1956; Narmour, 1992), and we hoped they would offer some qualitative insight into listeners’ decisions.

Our goal was not only to identify where listeners perceive boundaries but also to understand why, and how these perceptions correspond to formal music-analytical segmentation. Theoretical segmentations were coded at two hierarchical levels (see also Popescu et al., 2021): Level 1 for major formal divisions (e.g., exposition vs. development) and Level 2 for sub-sectional phrases. Our research questions and hypotheses were:

R1: Do listeners agree on segmentation points? (Inter-rater agreement)

H1: Consistent with prior work, we expect higher agreement for Level 1 than for Level 2 boundaries, especially in Piece 1 with clearer surface cues.

R2: Do participant markings align with theoretical segmentation? (Relation to music analysis)

H2: Participant markings should correspond with major section boundaries, though listeners may not indicate all structural boundaries and may also mark salient surface changes lacking structural importance.

R3: Are endings rated as more important and salient than beginnings? (Subjective ratings)

H3: We anticipate a bias toward endings, with ambiguous or dual-function transitions perceived as especially salient and important.

R4: Does repeated listening affect segmentation? (Effect of repetition and familiarity)

H4: We predict fewer boundaries on the second listening, reflecting increased structural awareness through familiarity rather than formal expertise.

A fifth, more qualitative research focus concerns potential divergences between empirical findings and theoretical models. Specifically, do participant ratings mirror hierarchical structural levels? How do empirical results challenge current Formenlehre assumptions? Might these findings prompt revisions to formal theoretical frameworks? These questions will be explored in the discussion and future research section.

By integrating music-theoretical analysis with empirical perception data, this study aims to offer a novel, bidirectional contribution: using music analysis to contextualize listener behavior, and listener data to critically evaluate music-analytical assumptions (Cross, 1998). We propose that segmentation is neither solely surface-driven nor purely theoretical, but emerges from the dynamic interplay of style familiarity, auditory salience, and structural expectations.

Method

Participants

Two studies were conducted to investigate listener-based musical segmentation: a laboratory study incorporating repeated trials, subjective ratings, and questionnaire data, and a complementary, shorter online study designed to boost sample size for inter-rater reliability analyses. All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society (Nr. 2017_12) and were undertaken with the written informed consent of each participant.

Lab Study

Forty-three participants (30 self-identified female, 13 male) were recruited via the participant database of the Max Planck Institute for Empirical Aesthetics. Ages ranged from 20 to 35 years (M = 27.5, SD = 4.2). All reported normal hearing. Participants completed the German version of the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014; Schaal et al., 2014) and the Barcelona Music Reward Questionnaire (BMRQ; Mas-Herrero et al., 2013). The mean Gold-MSI general sophistication score was 62.65 (SD = 13.56; range = 34–90), corresponding to the 40th percentile of the German norm sample. Mean scores for the Gold-MSI Perception subscale and the BMRQ were 44.44 (SD = 8.17; range = 26–62) and 83.60 (SD = 10.59; range = 56–105), respectively.

Online Study

Ninety-seven participants (53 self-reported female, 41 male, 3 unspecified) completed a shortened online version of the task, which included only one of the pieces, one trial of segmentation, and the Gold-MSI general. After excluding 9 participants due to missing data, the final sample consisted of 88 individuals. Participants were recruited via social media outreach through the Max Planck Institute for Empirical Aesthetics and a local classical radio station. The sample's age ranged from 19 to 90 years (M = 43.2, SD = 17.3), and the average Gold-MSI general score was 73.8 (SD = 18.0), consistent with population norms (Schaal et al., 2014).

Material

Two lesser-known romantic orchestral works were selected to investigate segmentation behavior under varying structural clarity: Piece 1 (Farrenc, 1849), the fourth movement of Louise Farrenc's Nonet in E-flat major op. 38 (clear structural boundaries); and Piece 2 (Paine, 1908), the first movement of John Knowles Paine's Symphony No. 1 in C minor, up to the recapitulation (less clear boundaries). Both works exhibit sonata-form structure but differ in their degree of formal clarity. Piece 1 features conventional sectional boundaries (e.g., pauses, texture shifts, dynamic contrasts), whereas Piece 2 includes more ambiguous transitions (e.g., overlapping phrases, modulations, blurred cadences). To minimize familiarity effects while maintaining stylistic coherence, we selected works by underrepresented but idiomatically typical composers of the common-practice era. The recordings of Piece 1 (Consortium Classicum, 2010) and Piece 2 (Mehta & New York Philharmonic, 1989) were always presented consecutively as .wav files (44.1 kHz, 24-bit), with amplitude normalization and fade-outs (2 s for Piece 2) applied in Audacity.

Music-Theoretical Annotation

A detailed formal analysis of both pieces was performed by the first author, drawing on Caplin's theory of formal functions (1998) and sonata theory (Hepokoski & Darcy, 2006). Each piece was annotated for expected segmentation points, labeling phrases, subsections, and sections, combined with a tonal analysis of the main modulations and cadences (Supplement 2). To distinguish boundaries by importance, we assigned them to two hierarchical levels: the beginnings of complete sections, for example, the beginning of the development (Level 1), and subsections, for example, a half cadence at the end of the primary theme (Level 2). Events were additionally tagged as “beginning,” “ending,” or “ambiguous” depending on their formal character. Sonic Visualiser (Cannam et al., 2010) was used to align the score with the audio files. We reduced the resulting time series to the beats that contained section boundaries and labeled them with measure numbers. This yielded two time series of different event densities, a coarse-grained Level 1 and a fine-grained Level 2, which served as reference points in the quantitative analysis. Every Level 1 boundary is also included as a Level 2 boundary, so the segmentation is hierarchical. In addition to the music-theoretical perspective, we provide a performance analysis of the recordings in Supplement 4, focusing on tempo, phrasing, articulation, dynamics, and timbre.

Design and Procedure

Lab Study

Participants completed two segmentation trials per stimulus in a sound-attenuated room, using Beyerdynamic DT 770 Pro headphones. The experiment was implemented using a custom user interface (Muralikrishnan, 2019) programmed in Presentation software (Neurobehavioral Systems, 2020). In each trial, participants listened to the full piece and marked perceived segment boundaries by pressing a designated key. A 100-ms silence preceded playback onset. In the second trial, additional rating features were enabled: When a mark was placed, playback stopped, and participants could cancel the mark, replay the preceding 10 s, or confirm the boundary and answer three questions on six-point Likert scales (1–6): (a) How strong is the impression of an ending? (b) How strong is the expectation of something new? (c) How important is this moment in the overall context? (see also Schmuckler, 1989). After each piece, participants rated liking and task difficulty on similar six-point scales. Following both pieces, participants completed the Gold-MSI general and perception subscales (Müllensiefen et al., 2014; Schaal et al., 2014) and the BMRQ (Mas-Herrero et al., 2013) in German. The session lasted one hour; compensation was €14.

Online Study

The online study replicated the first trial of the lab task but used only one piece (randomly assigned). No second trial or rating questions were included. Participants provided demographic data and completed the Gold-MSI general subscale. No compensation was provided.

Analysis

Segmentation Agreement Measures

To assess agreement between participants and their alignment with analytical boundaries, we employed three complementary methods: Gaussification (see also Bruderer, 2008; Hartmann et al., 2016), dynamic time warping (DTW), and signal detection theory with F1 scores (see also Popescu et al., 2021). We chose these measures because together they provide a comprehensive assessment of segmentation agreement while tolerating variable response lags in real-time marking, which cannot be controlled experimentally.

Gaussification: Segmentation responses were aggregated via Gaussian kernel density estimation following Frieler (2004). Each participant's mark was represented by a Gaussian function centered on the response time, with a standard deviation of 2 s (see also Burunat et al., 2024; Margulis, 2012, p. 380). Summing across participants yielded a continuous probability density function. The peaks of this curve represent moments of high segmentation agreement. The height of a peak is approximately proportional to the number of participants who placed a mark in that vicinity.
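A minimal Python sketch of the Gaussification step, assuming marks are stored as one list of times (in seconds) per participant and evaluated on a 0.1-s grid. The function name `gaussify`, the grid resolution, and the use of unit-height kernels are illustrative assumptions rather than the study's implementation; unit height is chosen so that a peak of height k roughly counts k participants, which fits the later division of peak heights by the number of participants.

```python
import numpy as np

def gaussify(marks_per_participant, duration, sigma=2.0, dt=0.1):
    """Sum unit-height Gaussian kernels centred on every segmentation mark.

    marks_per_participant: one sequence of mark times (s) per participant.
    Returns (time grid, summed curve). Assumption: kernels have height 1,
    so a peak of height k means roughly k marks fell close together.
    """
    t = np.arange(0.0, duration + dt, dt)
    curve = np.zeros_like(t)
    for marks in marks_per_participant:
        for m in marks:
            curve += np.exp(-0.5 * ((t - m) / sigma) ** 2)
    return t, curve

# Hypothetical marks from three listeners (not study data)
marks = [[30.0, 90.0], [30.5], [29.0, 60.0]]
t, curve = gaussify(marks, duration=120.0)
peak_time = t[np.argmax(curve)]
normalized_height = curve.max() / len(marks)  # close to 1 when all three agree
```

With a 2-s standard deviation, marks within about a second of each other merge into a single peak, while isolated marks remain as small bumps of height near 1.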

Dynamic time warping (DTW): To quantify alignment between participant responses and music-theoretical boundaries, we used DTW, implemented with the dtw package (v. 1.23-1) in R (v. 4.4.2), which optimally matches two time series by minimizing cumulative alignment cost. The resulting normalized DTW distance served as a metric of temporal correspondence. Because segmentation marks were distributed over fixed-length stimuli (322 s for Piece 1; 421 s for Piece 2), offset correction was unnecessary. For a visualization of DTW, see Supplement 1, Figure 2.
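The DTW comparison can be illustrated with a small Python sketch. This is not the R dtw package: the hypothetical helper `dtw_normalized` aligns two sorted sequences of boundary times using an absolute-difference cost, weights diagonal steps by 2 and horizontal or vertical steps by 1 (a symmetric2-style recurrence), and divides the cumulative cost by the combined sequence length. The package's own step patterns and normalization may differ in detail, so this sketch should not be expected to reproduce the reported distances.

```python
import numpy as np

def dtw_normalized(a, b):
    """Normalized DTW distance between two sequences of boundary times (s).

    Local cost is |a_i - b_j|; diagonal steps count twice, horizontal and
    vertical steps once; the total is divided by len(a) + len(b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.abs(a[:, None] - b[None, :])
    D = np.full((n, m), np.inf)
    D[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0 and j > 0:
                best = min(best, D[i - 1, j - 1] + 2 * cost[i, j])  # diagonal
            if i > 0:
                best = min(best, D[i - 1, j] + cost[i, j])  # vertical
            if j > 0:
                best = min(best, D[i, j - 1] + cost[i, j])  # horizontal
            D[i, j] = best
    return D[n - 1, m - 1] / (n + m)

# Hypothetical boundary times (s): listener summary vs. analysis
print(dtw_normalized([0.0, 10.0, 20.0], [1.0, 11.0, 21.0]))
```

Because the two sequences may differ in length, DTW can map several listener boundaries onto one analytical boundary, which a simple pairwise distance cannot do.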

F1 scores: F1 scores were calculated to assess agreement based on information retrieval metrics. Each stimulus was divided into equal-length time bins; a bin was coded as “1” if it contained at least one segmentation mark, and “0” otherwise. Precision, recall, and F1 were computed for participant data against the annotated ground truth (Level 1 and Level 2 of the music-theoretical analysis), capturing both over- and under-segmentation.
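The binning logic behind the F1 scores might look as follows. The bin width is not specified above, so the 1-s default is an assumption, as are the helper names `binary_bins` and `precision_recall_f1`. Each bin is marked true if at least one boundary falls into it, and precision, recall, and F1 are then computed over bins, so false positives capture over-segmentation and false negatives capture under-segmentation.

```python
import numpy as np

def binary_bins(times, duration, bin_width):
    """Boolean vector over equal-length bins: True if a bin holds >= 1 mark."""
    n_bins = int(np.ceil(duration / bin_width))
    bins = np.zeros(n_bins, dtype=bool)
    idx = np.floor(np.asarray(times, dtype=float) / bin_width).astype(int)
    bins[np.clip(idx, 0, n_bins - 1)] = True
    return bins

def precision_recall_f1(predicted, reference, duration, bin_width=1.0):
    """Bin-wise precision, recall and F1 of predicted vs. reference boundaries."""
    p = binary_bins(predicted, duration, bin_width)
    r = binary_bins(reference, duration, bin_width)
    tp = int(np.sum(p & r))   # bins marked in both
    fp = int(np.sum(p & ~r))  # over-segmentation
    fn = int(np.sum(~p & r))  # under-segmentation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical: one boundary matches, one is extra, one is missed
print(precision_recall_f1([10.2, 30.0], [10.0, 50.0], duration=60.0))  # (0.5, 0.5, 0.5)
```

Wider bins tolerate larger response lags but also make chance matches more likely, which is one reason to compare against random segmentations.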

Inter-rater Agreement and Comparison with Music Analysis. We quantified inter-rater agreement using two metrics derived from the Gaussification curve: mean normalized peak height, defined as the average height of all major peaks divided by the number of participants, and peak-to-trough contrast, calculated as the difference between mean peak and mean trough height, also normalized by sample size. Both metrics yield values between 0 and 1, with higher scores reflecting greater consensus. To determine whether the observed agreement among raters and with the music analysis exceeded chance levels, we generated simulated random segmentations using a log-normal model of inter-segment intervals (ISIs), based on the observed ISI distributions (see Results). Values were drawn from a normal distribution fitted to the log-transformed ISIs, exponentiated, and cumulatively summed to construct synthetic segmentations. Comparing the actual segmentation metrics (Gaussification metrics, DTW, F1) with those from the simulated data allowed us to test whether listener behavior diverged significantly from random patterns.
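The two agreement metrics and the chance model can be sketched together in Python. Several details are assumptions not stated in the text: peaks and troughs are taken as simple local maxima and minima of the sampled curve, “major” peaks are those above the curve's mean (the criterion used later for summary segmentations), and the normal distribution for log-ISIs is fitted by its sample mean and standard deviation. The function names are hypothetical.

```python
import numpy as np

def local_extrema(curve):
    """Indices of interior local maxima (peaks) and minima (troughs)."""
    mid = curve[1:-1]
    peaks = np.where((mid > curve[:-2]) & (mid >= curve[2:]))[0] + 1
    troughs = np.where((mid < curve[:-2]) & (mid <= curve[2:]))[0] + 1
    return peaks, troughs

def agreement_metrics(curve, n_participants):
    """Mean normalized peak height and peak-to-trough contrast.

    Assumption: 'major' peaks are local maxima above the curve's mean.
    """
    curve = np.asarray(curve, dtype=float)
    peaks, troughs = local_extrema(curve)
    major = peaks[curve[peaks] > curve.mean()]
    mean_peak = curve[major].mean() if major.size else 0.0
    mean_trough = curve[troughs].mean() if troughs.size else 0.0
    return mean_peak / n_participants, (mean_peak - mean_trough) / n_participants

def simulate_segmentation(observed_isis, duration, rng):
    """One random segmentation with log-normally distributed ISIs.

    A normal distribution is fitted to log(ISI); draws are exponentiated and
    cumulatively summed until the stimulus duration is reached.
    """
    log_isi = np.log(np.asarray(observed_isis, dtype=float))
    mu, sd = log_isi.mean(), log_isi.std(ddof=1)
    times, t = [], 0.0
    while True:
        t += float(np.exp(rng.normal(mu, sd)))
        if t >= duration:
            return np.array(times)
        times.append(t)

# Hypothetical ISIs (s), not study data; 50 chance segmentations of a 322-s piece
rng = np.random.default_rng(seed=1)
chance = [simulate_segmentation([12.0, 18.5, 25.0, 9.0, 31.0], 322.0, rng)
          for _ in range(50)]
```

Each simulated segmentation can then be passed through the same Gaussification and metric functions as the real data, so observed and chance agreement are computed identically.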

Results

We first report the time marker data, all participants’ segmentation marks, and general observations on inter-rater agreement based on Gaussification, DTW, and F1 scores. Following the order of our research questions, we then show how similar the participants’ responses were (R1: Inter-rater agreement) and compare them with music-theoretical boundaries (R2: Relation with music analysis). We report which peaks of inter-rater agreement coincide with the music-theoretical segmentation and which diverge from it. We then present the participants’ ratings of beginning, ending, and importance and interpret these in relation to the music analysis (R3: Subjective ratings). Finally, we examine whether results differ between the first and second listening (R4: Effects of trial and familiarity). An interactive visualization of the segmentation data and analytical annotation is available online (see Data Availability; Supplement 1, Figure 4).

General Observations

Participants provided time-based segmentation responses for two pieces in both the lab (two trials) and online (one trial) settings. An overview of the time marker data is displayed in Table 1. Inter-segment intervals (ISIs) from both participant responses and theoretical analyses followed a log-normal distribution (Supplement 1, Figure 1). A linear mixed-effects model (Supplement 1, Table 1), with participants as random effects and the interactions of piece and trial and of piece and experiment type as fixed effects, revealed that ISIs were significantly longer for Piece 2 and increased slightly in the second trial, suggesting fewer boundary markings with repeated exposure. Online participants tended to produce slightly shorter ISIs than lab participants.
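For reference, ISI statistics like those in Table 1 could be derived from raw marks roughly as follows. This sketch assumes that only intervals between consecutive marks of the same participant count (the interval from stimulus onset to the first mark is dropped) and that intervals are pooled across participants before computing M and SD; both choices, and the helper name `pooled_isis`, are assumptions.

```python
import numpy as np

def pooled_isis(marks_per_participant):
    """Inter-segment intervals (s) between consecutive marks, pooled across participants.

    Assumptions: the interval from stimulus onset to a participant's first mark
    is excluded, and participants with fewer than two marks contribute nothing.
    """
    per_person = [np.diff(np.sort(np.asarray(m, dtype=float)))
                  for m in marks_per_participant if len(m) > 1]
    return np.concatenate(per_person) if per_person else np.array([])

# Hypothetical marks for two listeners (not study data)
isis = pooled_isis([[10.0, 30.0, 60.0], [5.0, 15.0]])
m_isi, sd_isi = isis.mean(), isis.std(ddof=1)
log_isis = np.log(isis)  # approximately normal if the ISIs are log-normal
```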

Table 1.

Overview of time marker data.

Source	N	Number of boundary markers	M ISI (s)	SD ISI (s)
Piece 1 (Farrenc, Nonet, 4th mvt.)
Lab study, Trial 1	43	676	17.3	12.0
Lab study, Trial 2	43	414	30.2	17.3
Online study	11	266	11.6	7.6
Music analysis, Level 1	–	13	26.8	20.4
Music analysis, Level 2	–	43	7.7	5.1
Piece 2 (Paine, Symphony No. 1, 1st mvt.)
Lab study, Trial 1	43	542	27.8	16.7
Lab study, Trial 2	43	429	36.5	19.3
Online study	77	1,163	24.1	15.5
Music analysis, Level 1	–	11	38.6	28.3
Music analysis, Level 2	–	27	14.9	8.3

Note. N = number of participants; ISI = inter-segment interval in seconds.

R1: Inter-rater Agreement

We used Gaussification with a 2-s bandwidth to assess inter-rater agreement. Figure 1 shows the resulting peak structures for both pieces. Visually, each piece shows several clear peaks, while many small peaks populate the baseline.

Comparison with 50 random simulations (Table 2), generated from log-normal ISI distributions, confirmed that observed agreement significantly exceeded chance levels (observed: .116–.169; simulated: .025–.041). Piece 1, which features clearer structural cues, elicited higher agreement than Piece 2. As hypothesized and in line with previous studies, listeners chose similar moments as segment boundaries. Agreement was higher for Level 1 boundaries, and clearer surface cues (as in Piece 1) also improved inter-rater agreement.
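The text does not state which t-test underlies Table 2. One possibility consistent with the negative t values is a one-sample test of the simulated agreement values against the observed agreement; the hypothetical helper below computes that statistic with NumPy. `scipy.stats.ttest_1samp(simulated, popmean=observed)` returns the same statistic together with a p-value.

```python
import numpy as np

def one_sample_t(simulated, observed):
    """One-sample t statistic of simulated chance agreements against the observed value.

    Returns (t, degrees of freedom). Negative t means chance agreement falls
    below the observed agreement.
    """
    x = np.asarray(simulated, dtype=float)
    se = x.std(ddof=1) / np.sqrt(x.size)
    return (x.mean() - observed) / se, x.size - 1

# Hypothetical simulated agreements (not the study's values)
t_stat, df = one_sample_t([0.03, 0.04, 0.05], observed=0.169)
```

Because the simulated values vary little from run to run, even a modest gap to the observed agreement yields very large absolute t values, as in Table 2.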

R2: Relation with Music Analysis

Using F1 scores and normalized DTW, we compared participant segmentations (aggregated via Gaussification) with music-theoretical boundaries at two levels of granularity across different bandwidths (0.5 s–4 s). For each bandwidth and level, we selected all peaks exceeding the mean to construct summary segmentations, for which F1 scores and normalized DTW distance with the theoretical predictions can be determined. Additionally, we calculated the same values for N = 25 random segmentations based on the summary segmentation.
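A sketch of how a summary segmentation could be extracted at each bandwidth: sum Gaussian kernels as in the Gaussification step, keep local maxima whose height exceeds the curve's mean, and repeat for several bandwidths within the reported 0.5–4 s range. Unit-height kernels, the 0.1-s grid, and the helper name `summary_segmentation` are assumptions; the resulting peak times are what F1 and DTW would then compare with the Level 1 and Level 2 annotations.

```python
import numpy as np

def summary_segmentation(marks_per_participant, duration, sigma, dt=0.1):
    """Peak times of the Gaussified curve that lie above the curve's mean."""
    t = np.arange(0.0, duration + dt, dt)
    curve = np.zeros_like(t)
    for marks in marks_per_participant:
        for m in marks:
            curve += np.exp(-0.5 * ((t - m) / sigma) ** 2)
    mid = curve[1:-1]
    peak_idx = np.where((mid > curve[:-2]) & (mid >= curve[2:]))[0] + 1
    return t[peak_idx[curve[peak_idx] > curve.mean()]]

# Hypothetical marks (not study data); one summary segmentation per bandwidth
marks = [[10.0, 40.0], [10.5, 40.2], [9.8, 70.0]]
summaries = {s: summary_segmentation(marks, 100.0, s) for s in (0.5, 1.0, 2.0, 4.0)}
```

Note that the mean threshold is relative: in a sparse curve even a single isolated mark (here at 70 s) can survive, which is one reason to compare the summary against random segmentations built the same way.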

Figure 1.

Participants’ segmentation responses. Gaussification (red) and individual segmentation marks (gray) for Piece 1 (top) and Piece 2 (bottom). Note. The Gaussification depicts a continuous probability density function; high peaks represent high segmentation agreement.

Table 2.

Inter-rater agreement of trials vs. random agreement.

Piece	Trial	N	Mean agreement (rated)	Mean agreement (simulated)	t	p
1	1	54	.169	.041	−286.62	<.000***
1	2	43	.144	.029	−271.01	<.000***
2	1	120	.116	.030	−289.85	<.000***
2	2	43	.123	.025	−394.18	<.000***

Note. Trial 1 includes both lab and online study. For random agreement, a simulation of N = 50 runs with a bandwidth of 2 s was used. Agreement scale lies between 0 (none) and 1 (absolute).

The results (Figure 2) showed significantly higher alignment with theoretical boundaries than chance, particularly at the major structural boundaries of Level 1. While F1 scores indicated above-chance alignment even at Level 2, DTW distances for Level 2 showed no significant improvement over random segmentations. Unsurprisingly, agreement improved with wider bandwidths, reflecting increased temporal tolerance.

Figure 2.

Comparison of F1 scores and DTW normalized scores against theoretical predictions. Note. F1 Scores (left) and DTW normalized scores (right) for both Piece 1 (Farrenc) and Piece 2 (Paine). Summary segmentation in red; random segmentation in blue. Theoretical predictions are based on Level 1 and Level 2 of the music analysis.

Agreement and Difference Between Music-Theoretical Boundaries and Participants’ Responses

Figure 3 shows the Gaussification of all participant data against the backdrop of sonata form. Peaks for both pieces that reflect the agreement of at least a third of the participants are listed in Table 3 and marked in the score (Supplement 3).

Figure 3.

Participants’ segmentation responses with annotated sonata form labels. Note. The figure displays all main sections of the sonata form, annotated as Level 1: Introduction (Intro), Primary theme (P), Transition (Tr), Secondary theme (S), Development (Dev), Re-transition (RTr), Coda (C), Medial caesura (MC), Essential expositional closure (EEC), Essential structural closure (ESC).

Table 3.

Peaks with highest inter-rater agreement with brief analytical description.

No.	Rel. Peak Height	Measure	Section	Preceding musical event	Analysis level	Beginning/End (diff.)	Importance	Boundary type
Piece 1
1.1	0.89	94	Exp	End exposition / beginning dev	1	0.03	4.16	A
1.2	0.67	36	Exp	End primary theme	1	−0.48	4.00	B
1.3	0.66	211	Recap	End transition	1	−0.28	3.97	C
1.4	0.66	6	Exp	Beginning preparatory phrase	1	−0.50	4.07	A
1.5	0.62	152	Recap	Beginning recapitulation	1	−0.52	4.33	A
1.6	0.56	262	Recap	PAC after EEC	1	−0.36	3.68	A
1.7	0.53	245	Recap	Surface feature	–	−0.73	4.09	B
1.8	0.35	73	Exp	Repetition secondary theme	2	−0.88	3.38	A
1.9	0.34	27	Exp	Repetition primary theme, HC	2	−1.20	4.00	B
1.10	0.34	186	Recap	Beginning transition	2	−0.27	3.73	B
1.11	0.34	160	Recap	Continuation primary theme	2	−0.20	3.70	A
Piece 2
2.1	0.81	244	Dev	Fermata, retransition	1	1.32	4.97	C
2.2	0.67	69	Exp	MC, caesura fill	1	−0.70	4.09	B
2.3	0.66	22	Exp	Continuation P	2	−0.36	5.00	A
2.4	0.64	30	Exp	Primary theme, HC	2	−1.10	4.43	A
2.5	0.56	209	Dev	Development S-space	1	−0.38	3.88	A
2.6	0.48	269	Dev	Deceptive cadence	–	−0.95	4.00	B
2.7	0.46	159	Exp	End exposition	1	0.00	4.65	B
2.8	0.46	307	Recap	End development, Beg. recap	1	0.50	4.86	A
2.9	0.42	102	Exp	New key	2	−0.38	4.46	A
2.10	0.42	54	Exp	T, repetition	2	−0.67	3.78	A
2.11	0.39	92	Exp	New S	2	−0.31	3.81	A
2.12	0.39	227	Dev	New S	2	−0.22	3.78	A
2.13	0.35	74	Exp	S-space	1	−0.44	3.78	A
2.14	0.32	184	Dev	Development core (H)	2	−0.75	4.00	A

Note. All peaks are indicated in the score (Supplement 3); the given bar is the one preceding the peak. Beginning and ending ratings are combined into a single scale from −6 (beginning) to 6 (ending); importance ratings range from 1 to 6. Perfect authentic cadence (PAC), Essential expositional closure (EEC), Half cadence (HC), Medial caesura (MC), Primary theme (P), Secondary theme (S), Transition (T). For boundary types, see Figure 4.

At the very beginning of Piece 1, we observed that many participants marked segments bar by bar. Since we did not include a practice trial, participants were presumably still getting used to the task; we therefore excluded these four peaks (relative peak heights .36–.47) from the analysis.

In Piece 1, the top five peaks all correspond to Level 1 boundaries. In contrast, Piece 2 showed more divergence: Only three of its top five peaks align with Level 1, while the others coincided with prominent surface features (e.g., timbral or dynamic changes) and lacked a Level 1 structural function.

Notably, some Level 1 boundaries received little attention from participants: only 6 of 13 (Piece 1) and 6 of 11 (Piece 2) were marked with considerable agreement. Conversely, participants also marked two moments not labeled in the analysis, namely a rhetorical pause (Piece 1, m. 245) and a dynamic build-up toward a cadence (Piece 2, m. 269). In both cases, surface features (crescendo, rhetorical intensification) and the structural expectations they create lead to a kind of “deceptive cadence”: The misleading listening impression at that moment overrides music-theoretical criteria (see Supp. 4, Fig. 2, for a qualitative analysis of performance parameters).

While the development and recapitulation boundaries were reliably marked in both pieces, secondary themes were not directly identified as segment boundaries, which challenges assumptions about how sonata form is perceived. Instead, listeners were sensitive to the transitions toward the secondary theme. In Piece 1, listener agreement highlights the beginning of the transition section in the exposition (peak no. 1.2 in Table 3) and the end of the transition in the recapitulation (1.3). In Piece 2, the focus lies on the structural cadence (medial caesura, MC) leading into the secondary theme (2.2). Thus, the onset of the secondary theme itself is less crucial for listener agreement than the transition toward it. Furthermore, the context within the piece influences segmentation behavior at analogous moments in Piece 1, suggesting a changing understanding of secondary themes between exposition and recapitulation.

This confirms our hypothesis that participant markings correspond not only to major section boundaries but also to moments lacking music-theoretical importance. These moments coincide with obvious changes in musical parameters, but they still suggest a structural understanding by revealing certain expectations within a framework (e.g., cadences). Most importantly, the linear sections of sonata theory are weighted very differently in listening behavior than in music analysis, and the sonata-theoretical sequence therefore seems incomplete in places (e.g., the missing secondary theme). Each of the two pieces reveals a different listening strategy, highlighting the role of transitions as indirect formal markers.

R3: Subjective Ratings: Beginnings Rated Higher, but Endings More Consistently Identified

In the second trial of the lab experiment, participants rated each boundary they had marked on three scales (1–6): expectation of something new (beginning), sense of closure (ending), and importance. In what follows, we simply refer to these subjective ratings as judgements of beginning and ending.

Beginnings were rated slightly higher (Piece 1: M = 4.43, SD = 0.48; Piece 2: M = 4.73, SD = 0.51) than endings (Piece 1: M = 3.88, SD = 0.41; Piece 2: M = 4.26, SD = 0.78), with a significant difference in both cases (p < .05). Importance ratings did not differ significantly between pieces (t(29) = 1.93, p = .063), although Piece 2 received slightly higher subjective ratings overall. A linear mixed model with rating type, piece, and their interaction as fixed effects and participants as random effects confirmed these findings: Beginnings received higher ratings than endings and importance (Supplement 1, Table 2). However, the interaction was not significant.

Correlational analysis (Table 4) showed a strong relationship between beginning and ending ratings (r = .64), implying that participants often perceived boundaries as events with both closure and initiation. In Piece 2, importance correlated strongly with ending ratings (r = .82); in Piece 1, importance aligned more with inter-rater agreement than with subjective ratings.

Table 4.

Significant correlation of raters’ boundary assessments.

Piece	x	y	r	p	n
Both	Beginning	Ending	0.638	<.001	64
Both	Beginning	Importance	0.452	.001	64
Both	Ending	Importance	0.645	<.001	64
1	Beginning	Ending	0.584	.002	33
1	Beginning	Relative peak height	0.548	.004	33
1	Ending	Relative peak height	0.616	.001	33
2	Beginning	Ending	0.648	<.001	31
2	Beginning	Importance	0.524	.010	31
2	Ending	Importance	0.819	<.001	31

Note. We gathered the Gaussification peaks with a 2-s bandwidth and selected all boundary markers by participants within a 2-s window around the peak position. Next, we aggregated all beginning, ending, and importance ratings for each peak as well as the relative peak height. We correlated the three assessments with each other and with the relative peak height, for both pieces together, and for each piece alone. For simplicity, only the significant correlations are shown.
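The aggregation described in the note to Table 4 can be sketched as follows. The sketch assumes that the 2-s window is centered on the peak (±1 s), averages the ratings of all marks inside it, and computes Pearson's r by hand; p-values, which `scipy.stats.pearsonr` would also return, are omitted to keep the example dependency-free. The data and function names are hypothetical.

```python
import math


def aggregate_ratings(peaks, ratings, window=2.0):
    """Average the ratings of all marks within +/- window/2 seconds of each peak.

    peaks:   list of (time, relative_peak_height)
    ratings: list of (time, beginning, ending, importance), one per rated mark
    Returns one dict per peak that attracted at least one rated mark.
    """
    half = window / 2
    rows = []
    for peak_time, height in peaks:
        near = [r for r in ratings if abs(r[0] - peak_time) <= half]
        if not near:
            continue
        rows.append({
            "height": height,
            "beginning": sum(r[1] for r in near) / len(near),
            "ending": sum(r[2] for r in near) / len(near),
            "importance": sum(r[3] for r in near) / len(near),
        })
    return rows


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Usage with made-up data: three peaks and five rated marks (one far from any peak).
peaks = [(10.0, 0.9), (40.0, 0.5), (70.0, 0.3)]
ratings = [(9.6, 5, 5, 5), (10.4, 5, 6, 6), (39.8, 4, 4, 4), (70.2, 4, 2, 3), (55.0, 6, 1, 1)]
rows = aggregate_ratings(peaks, ratings)
r = pearson([row["ending"] for row in rows], [row["height"] for row in rows])
print(f"{len(rows)} peaks, r(ending, relative peak height) = {r:.2f}")
```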

A linear mixed-effects model confirmed that ending and importance ratings significantly predicted agreement in terms of relative peak height (Table 5), whereas beginning ratings did not. Although participants often rated beginnings highly, these ratings did not translate into consistent agreement across listeners. This suggests that perceived endings were more consistently shared among listeners, offering nuanced support for our hypothesis that endings play a major role for listeners.

Table 5.

Linear mixed model for relative peak height of both pieces (as a proxy for inter-rater agreement).

Effect		Estimate	SE	Df	t	p
Fixed
Intercept		0.151	0.053	14.3	2.845	.013
Beginning		0.007	0.008	656.5	0.910	.363
Ending		0.052	0.007	659.1	7.745	< .001
Importance		0.029	0.008	654.2	3.793	< .001
Random
Participants	SD (Intercept)	0.081
Piece	SD (Intercept)	0.037
Residual	SD (Observation)	0.185

Note. We modelled relative peak height with beginning, ending, and importance ratings, using participants and piece as random effects (intercepts only, as models with slope were singular and did not improve model fit).

Music Analysis of Different Boundary Types and Their Subjective Ratings

To obtain a single continuous variable, we combined beginning and ending ratings into a single scale from −6 to 6, with 0 expressing equal beginning and ending qualities. Negative values express beginnings (a high expectancy of something new), and positive values express endings (a low expectancy of something new). Mirroring the higher mean ratings for beginnings, the peaks with the highest inter-rater agreement (Table 3) also show higher beginning than ending ratings.

To quantify inter-rater agreement, we had chosen a time window of 2 s around the peak maximum as a suitable bin size. In cases where a phrase end coincides with a new beginning, the high correlation between the subjective ratings of beginning and ending reflects the metrical ambiguity of the musical surface. However, the music analysis includes many instances where two clear segment boundaries fall within this 2-s window, for example, when the cadential end of a subsection is followed by a short break and a new beginning within 2 s. We therefore examined the subjective ratings of the previously discussed peaks (Table 3) with a finer bin size of 0.5 s. This higher resolution should give a more accurate picture of whether and how participants distinguished between closely spaced segment boundaries.

One should keep in mind that we did not ask participants whether they heard a clear beginning or ending (forced choice) but rather how strongly they expected something new to come or something to close (single item). In this way, each marked event can be placed on a continuum between these two possibilities, leaving room for an ecological, ambivalent description of segment boundaries and connecting to methodologies of expectancy ratings (e.g., Schmuckler, 1989). Participants gave subjective ratings only in the second trial of the lab study. However, we included all trials in our analysis, because the distribution was similar between the trials and between the online and lab experiments.
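The following sketch illustrates why a coarse bin can mask two closely spaced boundaries. It assumes that the combined score is simply the ending rating minus the beginning rating, so negative values indicate beginnings and positive values endings; with 1–6 scales this difference spans −5 to 5, so the study's mapping onto −6 to 6 presumably differs, and the sketch only illustrates the sign convention and the binning effect. The marks, window centers, and function names are hypothetical.

```python
def combined_score(beginning, ending):
    """Single beginning-ending score: negative = beginning, positive = ending, 0 = balanced.

    Assumed construction: ending rating minus beginning rating (both on 1-6 scales).
    """
    return ending - beginning


def mean_score_near(marks, center, window):
    """Mean combined score of all marks within +/- window/2 seconds of center (None if empty)."""
    half = window / 2
    scores = [combined_score(b, e) for t, b, e in marks if abs(t - center) <= half]
    return sum(scores) / len(scores) if scores else None


# Made-up marks (time in s, beginning, ending): a cadential ending around 100.0 s
# followed 0.8 s later by a new beginning around 100.8 s.
marks = [(99.9, 2, 6), (100.1, 1, 5), (100.7, 6, 2), (100.9, 5, 1)]

print("2-s bin at 100.4 s:", mean_score_near(marks, 100.4, 2.0))
print("0.5-s bin at 100.0 s:", mean_score_near(marks, 100.0, 0.5))
print("0.5-s bin at 100.8 s:", mean_score_near(marks, 100.8, 0.5))
```

Averaged over a 2-s window centered between the two events, the ending (positive scores) and the new beginning (negative scores) cancel out to 0 and suggest an ambivalent boundary, whereas 0.5-s windows centered on each event recover a distinct ending and a distinct beginning.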

Boundary Types. Based on the most prominent peaks in Figure 1 and Table 3, we inductively derived three symbolic boundary types (Figure 4). These types combine the music-theoretical annotations of beginnings and endings with the subjective characterization of the individual peaks. In most cases, the subjective ratings highlighted perceived beginnings. Thus, the most common boundary type emphasizes a new beginning closely following a strong ending in the music-theoretical analysis (Type A, beginnings after clear endings). Another type appears around suspended entrances, ambivalent harmonic turning points, or caesura fills in the music analysis, and likewise strongly anticipates what follows (Type B, suspended beginnings). We also note peaks that did not appear among the top 15 because they are spread more widely than the 4 s covered by the 2-s bandwidth. They appear in both pieces around fermatas or caesura fills (e.g., Piece 1, m. 5), are likewise characterized by a high expectation of something new, and also belong to this second type. Finally, the rarest type, which interestingly received the highest importance ratings, was the perceived end of a section; these boundaries were always connected with very obvious surface features and tempo changes (Type C, clear endings).

Figure 4.

Boundary types based on subjective ratings. Note. Symbolic representation of Type A (beginnings after clear endings), Type B (suspended beginnings), and Type C (clear endings). Symbolic Gaussification (red) and music-theoretical analysis (blue).

Taken together, listeners strongly focused on processual tendencies of the music: Even after clear endings from a music-analytical perspective, participants seemed to focus on what was to follow and rated their expectancy of something new higher than the preceding ending. However, when participants did agree that a boundary served as an ending, its importance rating was higher than for beginnings. Our hypothesis that endings play an important role was confirmed, but we need to distinguish carefully between the individually high ratings for beginnings and the fact that forward-driven listening seems to be as important as the feeling of closure.

R4: Effect of Repeated Exposure Shows Fewer Segmentations but Differences in Agreement Rates

We hypothesized that repeated exposure in the second trial would result in fewer boundaries, reflecting a growing structural awareness through familiarity. This holds true for both pieces (see Table 1). However, agreement varied: In Piece 1, agreement decreased in the second trial, whereas in Piece 2 it increased (Table 2). This could be due to differences between the stimuli. Piece 2 was more ambiguous and difficult to segment, and a second listening could indeed help to resolve some of the ambiguities and result in fewer boundaries, summarizing and correcting experiences from the first trial. Piece 1 had more and clearer boundaries than Piece 2, and therefore the second trial offered more possibilities for segmentation in Piece 1 than in Piece 2. Interestingly, even though the number of boundary markings decreased over trials as hypothesized, we could not link this decrease to a growing structural awareness through familiarity.

Discussion

This study examined how listeners segment classical music, focusing on (1) inter-rater agreement, (2) alignment with sonata form boundaries, (3) subjective boundary perceptions, and (4) the effects of repeated listening. Using two stylistically contrasting pieces, we found robust agreement among listeners, partial correspondence with theoretical structures, and distinct roles for perceived beginnings and endings in shaping listeners’ structural understanding. These findings underscore the gap between perceptual and analytical models of form and suggest the potential of perceptual data to inform theoretical discourse. We also consider limitations and outline future directions.

Inter-rater Agreement and Relation to Music Theory

Listeners showed above-chance agreement, particularly at salient formal boundaries, consistent with prior work emphasizing perceptual salience in segmentation (Addessi & Caterina, 2000; Clarke & Krumhansl, 1990; Deliège, 2001; Hartmann et al., 2016; Popescu et al., 2021). Agreement was stronger for Piece 1, which featured clear cadences and dynamic shifts. The Gaussified response peaks (Figure 1) further confirmed shared perceptual structuring, especially at the boundaries of main sections (Level 1). By contrast, the more ambiguous structure of Piece 2 yielded lower agreement, suggesting that formal clarity enhances listener synchronization.

The relationship between listener responses and music-theoretical boundaries reveals a complex interplay. While major structural boundaries—such as exposition-to-development transitions—were reliably identified, secondary themes and less salient boundaries were often missed. For Piece 1, boundaries marking the start of the development (1.1 in Table 3) and recapitulation (1.5) ranked highly, whereas analogous points in Piece 2 were recognized but with less pronounced peaks, reflecting the less distinct sectional divisions typical of Romantic style.

Rotation Principle and Secondary Theme in Sonata Form Theory as Purely Structural Categories?

A defining feature of sonata form is the recapitulation or double return, where the exposition is repeated, typically transposed to the home key. However, participants did not consistently mark formally analogous points. For example, the repetition of the primary theme in the exposition (boundary 1.9 in m. 27, see Table 3) showed high segmentation likelihood, while its recapitulation (m. 169) did not—challenging the perceptual salience of formal “rotations” (Hepokoski & Darcy, 2006). Listeners also highlighted boundaries not reflected in the analysis—for example, dynamic or timbral shifts like boundaries 1.7 or 2.6—emphasizing the weight of surface features in real-time perception.

Such findings suggest that rhetorical salience often overrides formal equivalence. Prior studies may have obscured this by using stimuli where surface and structural changes aligned (Phillips, 2020; Popescu et al., 2021) or by incorporating lyrics (Bruderer, 2008). Our results, by contrast, point to a multidimensional segmentation process shaped by tension and expectation (Huron, 2006).

From a theoretical standpoint, particularly within the Western sonata tradition, these findings raise the question of why some structurally important moments lack salient surface cues. For instance, secondary themes may be intentionally disguised by the absence of clear breaks (Richards, 2013). Hepokoski and Darcy's (2006) sonata theory assumes that each sonata follows a normative sequence of actions that may be fulfilled or denied, allowing for blurred or disguised boundaries. While analysts can infer such boundaries from the score, our real-time listening data suggest that “denied” boundaries have limited perceptual salience. In contrast, acoustically prominent moments—whether formally significant or not—strongly influenced listener segmentation and thus could inform music analysis.

Beginnings and Endings as Constituents of Musical Form

Subjective ratings on beginning, ending, and importance qualities revealed that while beginnings were more frequently marked, endings and importance ratings were more predictive of inter-rater agreement. This asymmetry aligns with studies on musical closure (Sears et al., 2018) and temporal expectancy (Jones & Boltz, 1989), suggesting that closure provides a stronger shared cue than initiation. Beyond perceptual salience, moments of closure may also hold deeper experiential significance: As Doğantan-Dack (2013) argues, the sense of completion and release at musical endings resonates with the embodied experience of temporal flow and affective resolution. In this broader view, closure is not merely a structural cue but a locus of musical meaning.

In our data, endings and importance ratings were especially correlated in Piece 2, suggesting that listeners assign greater significance to closure than to initiation—despite attending to both. While endings served as a powerful reference point in listeners’ mental representation of musical structure, the frequent description of boundaries as beginnings cannot be explained as easily. The segmentation process appears to operate on two levels: one emphasizing a few strong endings that elicit high intersubjective agreement and another focusing on the identification of beginnings in a more individual and variable manner. The prominence of beginnings echoes Krumhansl (1996), where listeners also marked more new ideas than closures in a sonata-form piece. This dual-layered pattern implies that listeners collectively recognize moments of closure as structural anchors, while their attention to beginnings reflects more personal or context-dependent interpretations of musical continuity.

Agreement on Endings and Highlighting the Role of Beginnings

The high rating of boundaries as beginnings, even following clear endings in the music analysis, suggests a processual listening mode, a forward-directed engagement described in dynamic theories of musical form (Huron, 2013). This highlights that listeners attend to both closure and continuity, experiencing segment boundaries as transitional rather than discrete (Burunat et al., 2024). Consensus among listeners may stem more from retrospective closure, when the ending is already heard in relation to the following music, underscoring listeners’ focus on transitional aspects of the music (Fink & Lange, 2025).

Our exploratory classification revealed a predominance of “beginnings after clear endings” (Type A) and “suspended beginnings” (Type B), indicating sensitivity to onset cues even in the absence of strong closure. Though clear endings (Type C) were rare, they were rated as highly important, highlighting their perceptual salience. Together, these findings suggest that segmentation judgments for our stimuli are characterized at the same time by a few clear endings with high agreement and by a constant sensitivity to directional energy, indicated by many individual beginning ratings. This dual focus reflects the tension between static and dynamic conceptions of form (Meyer, 1961): one emphasizing discrete sections and the other emphasizing process and continuity.

Effects of Repeated Listening and Challenges in Hierarchical Integration

As expected, repeated listening led to fewer markings, consistent with the idea that structural understanding develops over time and that exposure refines mental models of musical form (Margulis, 2012; Hansen & Pearce, 2014). However, effects on agreement differed by piece: In Piece 1, agreement decreased, possibly reflecting the exploration of alternative segmentation strategies within a clearer structure. In contrast, Piece 2 showed increased agreement, suggesting that repetition helped listeners converge on a shared interpretation (Peebles, 2011). Notably, increased familiarity did not enhance alignment with analytical boundaries, challenging the assumption that repeated exposure yields “correct” segmentation. Instead, our findings support a listener-centered model in which segmentation is shaped by formal, acoustic, and experiential factors (Clarke, 2005).

Contrary to the finding of Popescu et al. (2021) that major boundaries are marked early, in light of a possible top-down approach by listeners, we found that listeners often waited until after structural pauses to mark segment transitions. For example, Piece 2's most agreed-upon boundary (bar 244) was typically marked after the fermata and the onset of the next section, suggesting a reliance on contextual continuity over anticipatory recognition. This further supports our critique that stimuli with obvious surface cues, even in an unfamiliar musical style, lead to prompt responses, whereas more complex stimuli evoke a different segmentation behavior.

Hierarchical integration, if present, manifested differently across the pieces. For Piece 1, decreased agreement in the second trial may reflect greater interpretive freedom, while Piece 2's increased agreement suggests convergence toward a more unified structure. Thus, clearer formal outlines may invite diverse segmentations, while ambiguous ones encourage shared frameworks through repetition.

Implications for Music Theory

Hanninen (2001) draws a helpful distinction between conceptual “ways of thinking” (p. 357) about segmentation and its perceptual features: Listeners access sonic and contextual cues—such as changes in timbre, texture, or rhythm—in real time, while structural criteria often function at a more abstract level, relevant primarily to composers, analysts, and performers. This may explain why theoretically crucial moments—such as essential expositional or structural closures (EEC, ESC; Hepokoski & Darcy, 2006)—were not marked in our data, suggesting they function more as generic constructs than perceptual events.

Our findings affirm a partial overlap between perceptual and structural segmentation but underscore their non-equivalence. Rather than privileging one framework, we advocate for integrating perceptual data into music-theoretical analysis. Salient surface changes—whether structurally significant or not—profoundly shape listener understanding and warrant analytical attention. Music analysis might benefit from supplementing top-down structural interpretations with bottom-up, perceptually informed insights. From a different perspective, but with a similar goal, Greenberg (2022) has recently criticized the holistic claim of Formenlehre that all individual formal parts are conditional in relation to a greater whole (p. 116). As an alternative interpretation of musical form, he suggests considering it diachronically rather than synchronically, supporting bottom-up models of form construction and emphasizing the cumulative effect of local stylistic cues.

Limitations and Future Studies

This study focused on two contrasting pieces, which, while appropriate for our aims, limits generalizability. Task design may also shape the results (Hartmann et al., 2016): The rating task in the second trial may have reduced the number of boundary markings due to increased attentional demands, complicating interpretations of reduced segmentation as increased familiarity.

Our findings resonate with previous research on inter-rater agreement and the prominence of perceived beginnings but also highlight the limited correspondence with sonata-based theoretical models. If listeners focus more on the dynamic flow of musical events than on analytically distinct sections, alternative paradigms may be warranted. Continuous measures such as tension ratings (Krumhansl, 1996; Lehne et al., 2013) could offer more nuanced insight into listeners’ evolving sense of form.

While the use of fixed recordings ensured comparability, it abstracts from the inherently performative and variable nature of musical form. Recent work in performance studies emphasizes that musical structure and meaning emerge through performative realization and embodied engagement rather than residing solely in the score (Cook, 2013; Doğantan-Dack, 2015; Leech-Wilkinson, 2012). Expressive nuances such as phrasing, articulation, and timing (Keller, 2012; Palmer, 1997) can shape how listeners perceive and segment music. Future studies could systematically vary performance parameters (e.g., Gingras et al., 2016; Keller & Appel, 2010) or combine empirical and qualitative approaches to explore how interpretive choices influence the perception of musical structure.

Overall, these considerations point to the value of integrative approaches that bridge cognitive, analytical, and performative perspectives (Doğantan-Dack, 2022) on musical form.

Conclusion

Segmentation, the parsing of musical flow, plays a central role in both music theory and music psychology, though the two fields approach it from different perspectives. While music theory tends to emphasize structural and deductive criteria, music psychology focuses on perceptual and experiential dimensions. Our study examined how listeners perceive segment boundaries and how those perceptions align with, or diverge from, formal music-analytical interpretations.

Our study confirms that listeners segment music in meaningful, partially shared ways, shaped by both formal structure and surface features. While closures elicited stronger agreement and were rated as more important, beginnings were marked more frequently—reflecting a dynamic tension between retrospective closure and prospective anticipation.

These findings raise important questions for music theory. If composers, following historical (Koch, 1787) or reconstructed (Gjerdingen, 1996; Hepokoski & Darcy, 2006) conventions, crafted musical narratives with a clear dramaturgical intent, then music theory might more directly address how these intentions manifest in the acoustic and perceptual domain. Rather than viewing perception as secondary to structure, a more integrated, listener-centered approach would acknowledge how real-time experience informs and challenges formal interpretation.

Supplemental Material

sj-pdf-1-mns-10.1177_20592043261437593 - Supplemental material for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form

Supplemental material, sj-pdf-1-mns-10.1177_20592043261437593 for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form by L. T. Fink, K. Frieler, R. Muralikrishnan and M. Wald-Fuhrmann in Music & Science

sj-pdf-2-mns-10.1177_20592043261437593 - Supplemental material for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form

Supplemental material, sj-pdf-2-mns-10.1177_20592043261437593 for Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form by L. T. Fink, K. Frieler, R. Muralikrishnan and M. Wald-Fuhrmann in Music & Science

Footnotes

Acknowledgment

The authors would like to thank the Deutsche Gesellschaft für Musikpsychologie for hosting the online experiment on DOTS, Pauline Larrouy-Maestri and Kyrill Fayn for their helpful suggestions regarding the procedure and data analysis, and Wolfgang Burgert (Divox AG) and Lisa Kahlden (New World Records) for providing the recordings used in this study.

Action Editor

Isabel Martinez, Universidad Nacional de La Plata, Facultad de Bellas Artes.

Peer Review

One anonymous reviewer.

Peter Harrison, University of Cambridge, Faculty of Music.

Ethical Approval

All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society (Nr. 2017_12).

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability

All supplementary materials (further statistical analyses, music-theoretical analyses, annotated scores, and performance analyses) are available on the Open Science Framework (Fink et al., 2026). The datasets generated and analyzed during the current study are available at https://github.com/klausfrieler/form_segmentation (Frieler, 2025). An interactive visualization tool of all datasets for exploring listener responses and analytical boundaries is available at https://testing.musikpsychologie.de/form_segmentation/ (Frieler, 2023).

ORCID iDs

L. T. Fink

K. Frieler

R. Muralikrishnan

M. Wald-Fuhrmann

Supplemental Material

Supplemental material for this article is available online.

References

Addessi

A. R.

Caterina

(2000). Perceptual musical analysis: Segmentation and perception of tension. Musicae Scientiae, 4(1), 31–54. https://doi.org/10.1177/102986490000400102

Bregman

A. S.

(1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.

Bruderer

M. J.

(2008). Perception and modeling of segment boundaries in popular music. PhD Thesis Technische Universiteit Eindhoven.

Burunat

Levitin

D. J.

Toiviainen

(2024). Breaking (musical) boundaries by investigating brain dynamics of event segmentation during real-life music-listening. Proceedings of the National Academy of Sciences, 121(36), e2319459121. https://doi.org/10.1073/pnas.2319459121

Cannam

Landone

Sandler

(2010). Sonic Visualiser: An Open Source Application for Viewing, Analysing, and Annotating Music Audio Files. (Version 4.5.1) http://sonicvisualiser.org.

Caplin

E. (2010). What are Formal Functions?. In Caplin

W. E.

Hepokoski

J. A.

Webster

(Eds.), Musical Form, Forms & Formenlehre: Three Methodological Reflections (pp. 21–40). Leuven University Press.

Caplin

W. E.

(1998). Classical Forms: A Theory of Formal Functions for the Instrumental Music of Haydn, Mozart, and Beethoven. Oxford University Press.

Clarke (2005). Ways of listening: An ecological approach to the perception of musical meaning. Oxford University Press.

Clarke, E. F., & Krumhansl, C. L. (1990). Perceiving musical time. Music Perception: An Interdisciplinary Journal, 7(3), 213–251. https://doi.org/10.2307/40285462

Consortium Classicum (2010). Jeanne-Louise Farrenc: Nonet & Clarinet Trio. Nonet in E-flat major, op. 38, IV, Adagio–Allegro [Audio recording]. Divox (CDX29205).

Cook (1987). The perception of large-scale tonal closure. Music Perception: An Interdisciplinary Journal, 5(2), 197–205. https://doi.org/10.2307/40285392

Cook (2013). Beyond the Score: Music as Performance. Oxford University Press.

Cross (1998). Music analysis and music perception. Music Analysis, 17(1), 3–20. https://doi.org/10.2307/854368

Dahlhaus (1978). Der rhetorische Formbegriff H. Chr. Kochs und die Theorie der Sonatenform. Archiv für Musikwissenschaft, 35(3), 155–177. https://doi.org/10.2307/930814

Deliège (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Perception: An Interdisciplinary Journal, 4(4), 325–359. https://doi.org/10.2307/40285378

Deliège (1989). A perceptual approach to contemporary musical forms. Contemporary Music Review, 4(1), 213–230. https://doi.org/10.1080/07494468900640301

Deliège (2001). Introduction: Similarity perception ↔ categorization ↔ cue abstraction. Music Perception, 18(3), 233–243. https://doi.org/10.1525/mp.2001.18.3.233

Deliège & El Ahmadi (1990). Mechanisms of cue extraction in musical groupings: A study of perception on Sequenza VI for Viola Solo by Luciano Berio. Psychology of Music, 18(1), 18–44. https://doi.org/10.1177/0305735690181003

Deutsch (2013). The Psychology of Music. Academic Press.

Doğantan-Dack (2013). Tonality: The shape of affect. Empirical Musicology Review, 8(3–4), 208–218. https://doi.org/10.18061/emr.v8i3-4.3943

Doğantan-Dack (2015). Artistic Practice as Research in Music: Theory, Criticism, Practice. Taylor & Francis.

Doğantan-Dack (2022). Expanding the Scope of Music Theory: Artistic Research in Music Performance. Zeitschrift der Gesellschaft für Musiktheorie, 19(2), 13–42. https://doi.org/10.31751/1169

Eitan & Granot, R. Y. (2008). Growing oranges on Mozart's apple tree: “Inner form” and aesthetic judgment. Music Perception, 25(5), 397–418. https://doi.org/10.1525/mp.2008.25.5.397

Farrenc (1849). Nonet in E-flat major, op. 38 [Musical score]. Julian Gau. https://imslp.org/wiki/Nonet,_Op.38_(Farrenc,_Louise)

Fink, L. T., Frieler, K., Muralikrishnan, R., & Wald-Fuhrmann, M. (2026). Supplementary Material & Data for the Article "Listener Perception and Musical Structure: A Study of Segmentation in Sonata Form". Open Science Framework. https://doi.org/10.17605/OSF.IO/RGUZV

Fink, L. T., & Lange, E. B. (2025). Beginning, middle, end: Perception of temporal functions in sonata form. Music Perception, 43(4), 1–19. https://doi.org/10.1525/mp.2025.2327163

Frankland, B. W., & Cohen, A. J. (2004). Parsing of melody: Quantification and testing of the local grouping rules of Lerdahl and Jackendoff's A Generative Theory of Tonal Music. Music Perception, 21(4), 499–543. https://doi.org/10.1525/mp.2004.21.4.499

Fredrickson, W. E. (2000). Perception of tension in music: Musicians versus nonmusicians. Journal of Music Therapy, 37(1), 40–50. https://doi.org/10.1093/jmt/37.1.40

Frieler, K. (2004). Beat and meter extraction using Gaussified onsets. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004). Universitat Pompeu Fabra, Barcelona, Spain.

Frieler, K. (2023). Form Segmentation Analysis App. https://testing.musikpsychologie.de/form_segmentation/

Frieler, K. (2025). form_segmentation [GitHub repository]. https://github.com/klausfrieler/form_segmentation

Gingras, Pearce, M. T., Goodchild, Dean, R. T., Wiggins, & McAdams (2016). Linking melodic expectation to expressive performance timing and perceived musical tension. Journal of Experimental Psychology: Human Perception and Performance, 42(4), 594–609. https://doi.org/10.1037/xhp0000141

Gjerdingen, R. O. (1996). Courtly behaviors. Music Perception, 13(3), 365–382. https://doi.org/10.2307/40286175

Gotlieb & Konečni, V. J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg Variations by Johann Sebastian Bach. Music Perception, 3(1), 87–101. https://doi.org/10.2307/40285323

Greenberg (2022). How Sonata Forms: A Bottom-Up Approach to Musical Form. Oxford University Press.

Hanninen, D. A. (2001). Orientations, criteria, segments: A general theory of segmentation for music analysis. Journal of Music Theory, 45(2), 345–433. https://doi.org/10.2307/3653443

Hanninen, D. A. (2012). A Theory of Music Analysis: On Segmentation and Associative Organization. University of Rochester Press.

Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing. Frontiers in Psychology, 5, 1052. https://doi.org/10.3389/fpsyg.2014.01052

Hartmann, Lartillot, & Toiviainen (2016). Multi-scale modelling of segmentation: Effect of music training and experimental task. Music Perception, 34(2), 192–217. https://doi.org/10.1525/mp.2016.34.2.192

Hasty (1981). Segmentation and process in post-tonal music. Music Theory Spectrum, 3(1), 54–73. https://doi.org/10.2307/746134

Hepokoski, J. A., & Darcy (2006). Elements of Sonata Theory: Norms, Types, and Deformations in the Late Eighteenth-century Sonata. Oxford University Press.

Huron (2006). Sweet Anticipation: Music and the Psychology of Expectation. MIT Press.

Huron (2013). A psychological approach to musical form: The habituation–fluency theory of repetition. Current Musicology, 96, 7–35. https://doi.org/10.7916/cm.v0i96.5312

Hutchison, J. L., Hubbard, T. L., Hubbard, N. A., Brigante, & Rypma (2015). Minding the gap: An experimental assessment of musical segmentation models. Psychomusicology: Music, Mind, and Brain, 25(2), 103–115. https://doi.org/10.1037/pmu0000085

Jones, M. R., & Boltz (1989). Dynamic attending and responses to time. Psychological Review, 96(3), 459–491. https://doi.org/10.1037/0033-295X.96.3.459

Karno & Konečni, V. J. (1992). The effects of structural interventions in the 1st movement of Mozart Symphony in G-Minor K-550 on aesthetic preference. Music Perception, 10(1), 63–72. https://doi.org/10.2307/40285538

Keller, P. E. (2012). Mental imagery in music performance: Underlying mechanisms and potential benefits. Annals of the New York Academy of Sciences, 1252(1), 206–213. https://doi.org/10.1111/j.1749-6632.2011.06439.x

Keller, P. E., & Appel (2010). Individual differences, auditory imagery, and the coordination of body movements and sounds in musical ensembles. Music Perception, 28(1), 27–46. https://doi.org/10.1525/mp.2010.28.1.27

Knösche, T. R., et al. (2005). Perception of phrase structure in music. Human Brain Mapping, 24(4), 259–273. https://doi.org/10.1002/hbm.20088

Koch, H. C. (1787). Versuch einer Anleitung zur Composition. Zweiter Teil. Adam Friedrich Böhme.

Koch, H. C. (1802). Musikalisches Lexikon. August Hermann.

Konečni, V. J. (1984). Elusive effects of artists’ ‘messages’. Advances in Psychology, 19, 71–93. https://doi.org/10.1016/S0166-4115(08)62346-8

Kreutz (1995). Aspekte musikalischer Formwahrnehmung. In de la Motte-Haber & Kopiez (Eds.), Der Hörer als Interpret (Vol. 7, pp. 125–147). Peter Lang.

Krumhansl, C. L. (1996). A perceptual analysis of Mozart's piano sonata K. 282: Segmentation, tension, and musical ideas. Music Perception, 13(3), 401–432. https://doi.org/10.2307/40286177

Lalitte & Bigand (2006). Music in the Moment? Revisiting the Effect of Large Scale Structures. Perceptual and Motor Skills, 103(3), 811–828. https://doi.org/10.2466/pms.103.3.811-828

Leech-Wilkinson (2012). Compositions, Scores, Performances, Meanings. Music Theory Online, 18(1). https://doi.org/10.30535/mto.18.1.4

Lefkowitz, D. S., & Taavola (2000). Segmentation in Music: Generalizing a Piece-Sensitive Approach. Journal of Music Theory, 44(1), 171–229. https://doi.org/10.2307/3090673

Lehne et al. (2013). The influence of different structural features on felt musical tension in two piano pieces by Mozart and Mendelssohn. Music Perception, 31(2), 171–185. https://doi.org/10.1525/mp.2013.31.2.171

Lerdahl & Jackendoff, R. S. (1983). A Generative Theory of Tonal Music. MIT Press.

Levinson (1997). Music in the Moment. Cornell University Press.

Margulis, E. H. (2012). Musical repetition detection across multiple exposures. Music Perception, 29(4), 377–385. https://doi.org/10.1525/mp.2012.29.4.377

Marvin, E. W., & Brinkman (1999). The effect of modulation and formal manipulation on perception of tonic closure by expert listeners. Music Perception, 16(4), 389–407. https://doi.org/10.2307/40285801

Mas-Herrero, Marco-Pallares, Lorenzo-Seva, Zatorre, R. J., & Rodriguez-Fornells (2013). Individual differences in music reward experiences. Music Perception, 31(2), 118–138. https://doi.org/10.1525/mp.2013.31.2.118

McDonald & Wöllner (2022). Appreciation of form in Bach’s Well-Tempered Clavier: Effects of structural interventions on perceived coherence, pleasantness, and retrospective duration estimates. Music Perception, 40(2), 150–167. https://doi.org/10.1525/mp.2022.40.2.150

Mehta & New York Philharmonic (1989). John Knowles Paine: Symphony No. 1. I, Allegro con brio [Audio recording]. Anthology of Recorded Music.

Meyer, L. B. (1956). Emotion and Meaning in Music. University of Chicago Press.

Meyer, L. B. (1961). On rehearing music. Journal of the American Musicological Society, 14(2), 257–267. https://doi.org/10.2307/829760

Müllensiefen, Gingras, Musil, & Stewart (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLOS ONE, 9(2), e89642. https://doi.org/10.1371/journal.pone.0089642

Muralikrishnan, R. (2019). Presentation UI Modules (Version 4.3) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.4032981

Narmour (1992). The Analysis and Cognition of Melodic Complexity: The Implication-Realization Model. University of Chicago Press.

Narmour (1996). Analyzing form and measuring perceptual content in Mozart's Sonata K. 282: A new theory of parametric analogues. Music Perception, 13(3), 265–318. https://doi.org/10.2307/40286173

Neuhaus, Knösche, T. R., & Friederici, A. D. (2006). Effects of musical expertise and boundary markers on phrase perception in music. Journal of Cognitive Neuroscience, 18(3), 472–493. https://doi.org/10.1162/jocn.2006.18.3.472

Neurobehavioral Systems (2020). Presentation (Version 22.0) [Computer software]. Neurobehavioral Systems.

Paine, J. K. (1908). Symphony No. 1 [Musical score]. Breitkopf und Härtel. https://imslp.org/wiki/Symphony_No.1,_Op.23_(Paine,_John_Knowles)

Palmer (1997). Music performance. Annual Review of Psychology, 48, 115–138. https://doi.org/10.1146/annurev.psych.48.1.115

Pearce, M. T., Müllensiefen, & Wiggins, G. A. (2010). Melodic grouping in music information retrieval: New methods and applications. In Z. W. Raś & A. A. Wieczorkowska (Eds.), Advances in Music Information Retrieval (pp. 364–388). Springer. https://doi.org/10.1007/978-3-642-11674-2_16

Peebles (2011). The Role of Segmentation and Expectation in the Perception of Closure. The Florida State University.

Phillips, Stewart, A. J., Wilcoxson, J. M., Jones, L. A., Howard, Willcox, du Sautoy, & De Roure (2020). What determines the perception of segmentation in contemporary music? Frontiers in Psychology, 11, 1001. https://doi.org/10.3389/fpsyg.2020.01001

Popescu, Widdess, & Rohrmeier (2021). Western listeners detect boundary hierarchy in Indian music: A segmentation study. Scientific Reports, 11(1), 3112. https://doi.org/10.1038/s41598-021-82629-y


Richards (2013). Sonata form and the problem of second-theme beginnings. Music Analysis, 32(1), 3–45. https://doi.org/10.1111/musa.12011

Riemann (1888). Wie hören wir Musik? Drei Vorträge. Max Hesse.

Riemann (1897). Grundriss der Kompositionslehre (Musikalische Formenlehre). I. (theoretischer) Teil. Max Hesse.

Schaal, N. K., Bauer, A.-K. R., & Müllensiefen (2014). Der Gold-MSI: Replikation und Validierung eines Fragebogeninstrumentes zur Messung Musikalischer Erfahrenheit anhand einer deutschen Stichprobe. Musicae Scientiae, 18(4), 423–447. https://doi.org/10.1177/1029864914541851

Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception, 7(2), 109–149. https://doi.org/10.2307/40285454

Sears, D. R. W., et al. (2018). Expecting the end: Continuous expectancy ratings for tonal cadences. Psychology of Music, 48(3), 358–375. https://doi.org/10.1177/0305735618803676

Temperley (2004). The Cognition of Basic Musical Structures. MIT Press.

Tillmann & Bigand (1996). Does formal musical structure affect perception of musical expressiveness? Psychology of Music, 24(1), 3–17. https://doi.org/10.1177/0305735696241002

Tillmann & Bigand (2004). The relative importance of local and global structures in music perception. The Journal of Aesthetics and Art Criticism, 62(2), 211–222. https://doi.org/10.1111/j.1540-594X.2004.00153.x

Zacks, J. M., et al. (2007). Event perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273–293. https://doi.org/10.1037/0033-2909.133.2.273
