Abstract
In the present study, a classically trained pianist investigated whether the visual cues in her performance influenced observers’ evaluations of her singing tone quality. The experiment used audiovisual stimuli in which some performances mimicked a prerecorded track, while others contained matching or mismatching musical material and body movements. Thirty-seven music students from the Sydney Conservatorium of Music and the Faculty of Music, University of Arts in Belgrade, observed these audiovisual stimuli and rated the quality of the singing tone on a scale from 1 (poor) to 7 (excellent). A one-way repeated-measures analysis of variance with hand movement (minimal, medium, and large) as the factor was used to test the effect of movement type on ratings of singing-quality tone production in the mimicking part of the experiment, while a two-way repeated-measures analysis of variance was used for the mismatching part. The results for the mimicking part revealed no significant differences in ratings. Conversely, in the mismatching part of the experiment, the effect of hand movement was significant, while the effect of sound and the interaction between hand movement and sound were not. Altogether, the results indicate that the perception of tone quality is a complex phenomenon in which the visual aspect of a pianist's movements could play a vital role, as suggested by prior research. Further studies are needed to investigate other factors influencing the perception of tone quality, such as the possible connection between performers’ somatic senses and the audience's recognition of them.
Introduction
Musical performance is a multisensory experience in which visual cues play a vital role in both the performer's and the observer's perception of music. Seeing performers can provide crucial insight into their expressive ideas (Broughton & Stevens, 2009; Davidson, 1993; Siminoski et al., 2020; Vuoskoski et al., 2014; Wapnick et al., 2004), action understanding (Broughton et al., 2021), and specific emotions (Dahl & Friberg, 2007). Performers’ hand-arm movements are considered “bodily analogues” of their expressive musical goals (Leman et al., 2017, p. 184; see also Leman, 2007). Although gestures can enhance musicians’ auditory communication, they can also alter the audience's perception of musical elements, such as the pitch and size of intervals (Thompson et al., 2010) and the duration of notes (Schutz & Lipscomb, 2007), as well as the evaluation of phrasing, rubato, and dynamics (Juchniewicz, 2008). While numerous studies have explored the effect of visual information on observers’ perception of musicians’ expression, the literature lacks empirical investigations of the impact of pianists’ movements on the audience's evaluation of tone quality (also known as timbre or sound). Yet tone quality is an essential feature of piano playing and serves as a critical expressive tool for pianists (Berman, 2000/2017; Li & Timmers, 2020; Neuhaus, 1973/2002; Sándor, 1981; see also Bernays, 2013). Moreover, pianists and piano pedagogues believe that embodied experience plays a crucial role in producing timbre (Doğantan-Dack, 2011, 2016; Li et al., 2021; Li & Timmers, 2020), whose perception and communication depend on both auditory and visual facets (Li et al., 2021; Li & Timmers, 2020; Ortmann, 2011; Parncutt, 2013; Parncutt & Troup, 2002; Wapnick et al., 2004). According to Parncutt (2013), tone quality “often depends on feelings in the body while performing, or an audience projection of those feelings” (p. 4).
Because a singing-quality sound is indispensable for every pianist, I, as a pianist myself, was interested in investigating whether these findings from the literature could benefit my communication with the audience. Accordingly, I designed and performed the experimental stimuli for the current study. This paper investigates whether the visual aspect of gestures influenced observers’ ratings of my singing-quality tone production or, more specifically, whether different degrees of movement amplitude (minimal, medium, and large) affected the evaluation of singing tone quality. The study thereby contributes to research on music perception by offering insights from the performer's perspective and outlining practical applications of the research for performers.
The Role of a Pianist's Movement and Touch in Tone Production
For centuries, influential pianists and teachers have written about piano playing, offering commentary and instructions on developing technical skills and optimizing movements to achieve the intended tone production while preventing injuries. This extensive literature grew hand in hand with the development of the piano in the 19th century. Although the teaching approach differed significantly between the first and second halves of the 19th century, a majority of pedagogues regarded the way the key was touched as a crucial determinant of tonal quality (for an overview, see Gerig, 1974/2007). The teachings of Clementi (1801/1974), Cramer (1812), and Hummel (1827) endorsed a “finger technique” without much arm movement. By contrast, educators of the late 19th and early 20th centuries recognized the need to incorporate the arm in piano performance to reduce stiffness and achieve better tone production, and the flexibility of the hand and arm came to be viewed as a crucial factor in creating cantabile melodic lines.
However, throughout the last century, the literature has presented opposing views on whether a player's touch can influence the quality of a single tone independently of hammer velocity. Some scholars have argued that the speed at which the hammer hits the string is the only factor determining the timbre and volume of the sound (Hart et al., 1934; Ortmann, 2011; White, 1930). The effect of different touches on key movement was also examined. Ortmann (2011) invoked Newton's second law of motion, whereby the force applied to the key equals the product of mass and acceleration in the direction of the force: applying more force to the keys produced greater key acceleration and, consequently, a louder sound. For example, a note played with a “rigid arm” would sound louder than a note produced with a “relaxed arm” because of the different key velocities (pp. 22–23). However, the key movements of the two touches did not differ when they were played at the same intensity. According to Ortmann (2011), the only way a player can control the quality of a single tone is by minimizing the noise of the finger hitting the key when striking from a distance.
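Ortmann's reasoning can be made explicit with a deliberately idealized model, which is an assumption for illustration and not Ortmann's own formulation: treat the key as a rigid mass m accelerated from rest through its travel d by a constant net force F, ignoring lever action, friction, and hammer escapement.

```latex
F = m\,a \;\Rightarrow\; a = \frac{F}{m},
\qquad
v^{2} = 2\,a\,d \;\Rightarrow\; v = \sqrt{\frac{2\,F\,d}{m}}
```

Under these assumptions, a greater force yields a greater acceleration and therefore a higher key velocity at the end of the travel, which is the sense in which, on Ortmann's account, a more forceful touch produces a louder tone.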
Recent studies have tested listeners’ ability to discriminate between two equally loud single tones produced by two different types of touch: pressed and struck (Furuya et al., 2010; Goebl et al., 2004; Suzuki, 2007). In the pressed condition, the player touched the key without raising a finger, while in the struck condition, the key was hit with a raised finger. Goebl et al. (2004) reported that half of the participating musicians could differentiate between two notes of identical sound intensity based on the auditory cues of finger strikes on the keys. Although these touch noises might be imperceptible in live performance, they were crucial for discriminating between the two touches in a controlled study; without them, the participants would have been unable to recognize the difference. As the authors proposed, the noise of a struck touch might also explain the harsher timbre perceived in the study by Furuya et al. (2010).
Suzuki (2007) compared the spectra of soft and hard touches on the notes G3, G4, and G5. The spectra of the two touches differed slightly only for G5, and some participants were able to perceive these differences in sound (p. 6). However, the author acknowledged that the change in tone was marginal compared with pianists’ expectations. Goebl et al. (2014) further found that participants could distinguish tones containing key-bottom sounds, which occur when the key reaches the keyframe felt, from tones in which this contact was avoided.
Contemporary literature acknowledges that the role of piano touch extends beyond merely producing sound. The visual component of a pianist's movement is crucial for observers to perceive the player's intended tone quality (Goebl, 2017; Li & Timmers, 2020; Li et al., 2021; Parncutt, 2013).
Multimodal Experience of Tone and Music Performance
Timbre is regarded as a perceptual characteristic of sound that cannot be measured in terms of purely physical properties (McAdams, 2019; Wallmark et al., 2021). The literature suggests that the integration of several sensory modalities contributes to the experience and evaluation of tone quality.
Ortmann (1927, as cited in Gerig, 1974/2007) suggested that pianists’ imagination and “kinesthetic sensation” of the desired tone could be illustrated by their movements, enabling the audience to perceive performers’ intentions visually (p. 440). For example, seeing a pianist “simulating vibrato” may give an observer the illusion of a longer and more resonant tone (Rosen, 2002, p. 31). Conversely, a tone produced with a “flat” hand posture may take on a corresponding quality, sounding “dull” and “dry” (Caland, 1903/2010, p. 41). Movements can therefore both improve and impair the perception of a performer's sound quality. Parncutt and Troup (2002) speculated that watching a performer violently hammering the keyboard could lead to the perception of a harsher tone. These views suggest that if a pianist plays a cantabile melodic line using circular hand movements, the timbre might be perceived as “rounder” and more resonant than one produced with the same hammer velocity but with restricted movement. Indeed, a recent empirical investigation by Li et al. (2021) showed that the visual aspect of a performance significantly contributed to the communication of certain timbres; the study also found a clear cross-modal relationship between tone qualities and shapes.
Studies of audiovisual sensory integration have shown that vision can alter auditory information. For instance, in speech perception, the syllable “ba” was perceived as “da” when paired with lip movements uttering a different syllable, “ga” (McGurk & MacDonald, 1976). The same holds for music, where the perception of mismatched auditory and visual stimuli is influenced by the visual stimulus. Research indicates that vision can affect listeners’ perceived and felt emotions (Krahé et al., 2015), alter perceived note duration (Schutz & Lipscomb, 2007), and affect the perception of pitch and interval size (Thompson et al., 2010). For example, in marimba performances, Schutz and Lipscomb (2007) found that the same note could appear to have a different duration when paired with hand movements of different sizes: a long movement gave the impression of a significantly longer note than a short movement.
Empirical studies have also identified movement as crucial to conveying musicians’ expressive intentions (Broughton & Stevens, 2009; Davidson, 1993; Nusseck & Wanderley, 2009; Vuoskoski et al., 2014) and specific emotions (Dahl & Friberg, 2007) to the audience. Research has also demonstrated that observers can understand a pianist's musical communication from visual kinematic information alone (Davidson, 1993; Siminoski et al., 2020). In Davidson's (1993) study, participants rated a performer's expressivity in three presentation modes: sound only, sound and vision, and vision only. The pianist played in three manners: deadpan (without expression), projected (regular expression), and exaggerated (overstated expression). Expressivity ratings for the projected and exaggerated performances were similar in the sound-only mode but distinguishable in the other two modes, in which the performer could be seen. Similarly, Vuoskoski et al. (2014) found that although both sound and vision are pivotal in conveying a player's expressivity, the visual mode had a greater influence on the audience's perception than the auditory mode.
In a different methodological approach in which the pianist played along with the prerecorded audio in three performing conditions, the player's body movements appeared critical in assessing specific musical elements, such as phrasing, dynamics, and rubato (Juchniewicz, 2008). The results showed that, as the bodily movements of the performer increased, the observers’ perceptual judgment of the amount of phrasing, rubato, and dynamics was higher.
Many researchers and pianists have noted anecdotally that observing a specific touch, such as a flat or curved finger, used by a pianist may affect the assessment of tone quality. To the best of my knowledge, this phenomenon has not been empirically examined. Given the scarcity of studies on timbre perception, the current study reviewed broader literature on visual perception in music. The investigations predominantly employed experimental approaches wherein musicians were video-recorded while executing an excerpt of the musical piece under different performance conditions:
(1) Performing with different expressive intentions, emotions, or distinct categories of gestures, after which the auditory and visual elements of the stimuli were swapped to generate congruent and incongruent pairings for the audience (Krahé et al., 2015; Saldaña & Rosenblum, 1993; Schutz & Lipscomb, 2007; Thompson et al., 2010; Vuoskoski et al., 2014).
(2) Mimicking prerecorded audio using different categories of movement, such as no movement, head and facial movement, and full-body movement (Bland & Cho, 2021; Juchniewicz, 2008).
In summary, the research has demonstrated that performers’ movements are crucial for communicating their expressive goals or particular emotions to the audience. Studies have documented that visual cues can alter aural information and that observing a musician's gestures can affect performance evaluation.
Gestures Today: No Rules, No Methods
The literature has confirmed that gestures in music performance serve various, often overlapping, purposes (Bishop & Goebl, 2018; Dahl et al., 2010). According to Jensenius et al. (2010), gestures fall into four main types: sound-producing, communicative, sound-facilitating, and sound-accompanying. A single gesture can nevertheless serve more than one purpose at the same time. For example, Dahl et al. (2010) suggested that gestures directly involved in sound production may also convey a meaningful musical message to an audience. By this reasoning, a flexible circular wrist motion could both soften the sound intensity and indicate the performer's intended tone quality (e.g., a round timbre).
There is no consensus in the current literature on a single method that players should follow when organizing their movements to produce sound (Dahl et al., 2010; Doğantan-Dack, 2016; Li & Timmers, 2020; Li et al., 2021; MacRitchie, 2015; MacRitchie & Zicari, 2012). The way a pianist moves their hands and touches the piano key is highly individual (Bernays & Traube, 2014; Dalla Bella & Palmer, 2011; Li & Timmers, 2020; Li & Timmers, 2021) and depends on many factors, the most important of which are the player's imagination of an intended tone and the musical circumstances (Berman, 2000/2017; MacRitchie & Zicari, 2012). Tone production is integrated with a performer's “feeling” of the finger-key contact (Doğantan-Dack, 2011, 2016; Goebl, 2017; Li & Timmers, 2020; Parncutt, 2013, p. 765), thus forming a player's unique signature (Berman, 2000/2017; Neuhaus, 1973/2002; Sándor, 1981). Consequently, there is no definitive guide that pianists can follow to communicate their sound intentions to an audience successfully. The literature agrees that proprioception and haptic feelings are essential in interpreting a piece and creating a desired tone (Doğantan-Dack, 2016; Li & Timmers, 2020; Li et al., 2021).
Furthermore, the visual aspect of a pianist's touch or movements can significantly contribute to the perception of tone quality (Li et al., 2021; Ortmann, 2011; Parncutt, 2013; Parncutt & Troup, 2002; Rosen, 2002). Because pianists have limited control over timbre once the key is pressed, Dahl et al. (2010) suggest that using “larger gestures” could compensate for the instrument's limitation in producing a continuous sound (p. 62).
A Singing Tone
A singing tone mainly implies an impression of a resonant sound with a “long-lasting” intensity (Berman, 2000/2017, p. 17), usually described by piano educators in timbre terms, such as round, rich, and deep (for an overview, see Doğantan-Dack, 2016, p. 180). To generate an “illusion” of a tone that sings, a pianist has to conceal the impression of the hammer striking the strings (Berman, 2000/2017, pp. 29–31) while minimizing the impact caused by the finger touching the key (for an overview, see Berman, 2000/2017; Lhevinne, 1924/2014).
A player's haptic sensations are a crucial element in the production of a singing sound. For example, pianists have been advised to experience a sensation of “grasping the key” (Lhevinne, 1924/2014, p. 21), carrying the “pressure” to the following one (Doğantan-Dack, 2016, p. 185), and “shaping the phrase as if moulding warm clay” (Berman, 2000/2017, p. 27; for an overview, see Doğantan-Dack, 2016, p. 180). To attain a desirable “kinesthetic/tactile/aural” experience, a pianist should perceive the keys not as rigid but as an “ever-changing surface” that reacts to touch with “buoyancy and rebound” (Doğantan-Dack, 2016, p. 178).
However, the guidance on a singing touch is too broad to apply to every circumstance demanding a cantabile manner of playing. Doğantan-Dack (2016) asserts that written teaching instructions lack “specific examples” for executing singing-quality sound; hence, the guidance on sound images and haptic sensations is not “related” (p. 181). She states that the literature insufficiently examines the various haptic sensations experienced by a performer in different musical compositions or contexts. For example, a player does not experience a uniform haptic sensation in all situations involving a singing-quality sound. Instructions unrelated to the musical context may also induce physical discomfort in a performer. Doğantan-Dack (2016) posits that the selection of gestures should arise from the performer's autonomy to interpret the work based on their personal convictions. According to her, pianists must be true to their individual experiences and beliefs to evolve their artistry.
Present Study
The present study investigated whether the visual aspect of my movements affected the audience's ratings of singing tone quality in my performance of Chopin's Nocturne No. 21 in C Minor. I created two experiments combining the established methodologies reviewed above: some excerpts consisted of matching and mismatching audiovisual parts (Saldaña & Rosenblum, 1993; Schutz & Lipscomb, 2007; Vuoskoski et al., 2014), while in others I played along with prerecorded audio (Juchniewicz, 2008). Accordingly, the first part of the experiment is called “mismatching,” and the second part “mimicking.”
Method
Participants
Thirty-seven undergraduate and postgraduate students (15 males and 22 females) from the Sydney Conservatorium of Music and the Faculty of Music, University of Arts in Belgrade, participated in the study. I recruited participants from two institutions to increase the sample size, as recruitment was challenging under the COVID-19 restrictions in place when the study was conducted. The participants were between 18 and 28 years old, with a mean age of 20.62 years (SD = 2.21). All participants were musicians: 25 reported piano as their main instrument, while the others were studying composition, voice, classical percussion, opera singing, viola, flute, cello, trumpet, bass guitar, oboe, or horn. None of the participants were familiar with me as a performer before the study.
I gave each participant a participant information statement: students at the Sydney Conservatorium of Music received the English version, and students at the Faculty of Music, University of Arts in Belgrade, received a Serbian translation. All participants signed a written consent form and received either a $10 canteen voucher (Sydney Conservatorium of Music) or €10 (Faculty of Music, University of Arts in Belgrade) as reimbursement for their participation.
Stimuli and Materials
I performed the audiovisual stimuli in this study as part of my Doctor of Musical Arts (DMA) degree in piano performance at the Sydney Conservatorium of Music. I have been playing the piano for 30 years. I selected an excerpt from Chopin's Nocturne No. 21 in C Minor and performed it with three types of movement, aiming for a consistent singing tone quality with each: (1) minimal hand and arm movements, (2) medium hand and arm movements, and (3) large hand and arm movements.
In this study, “minimal movements” refer to the movements necessary solely for the technical execution of the composition; they involve only the fingers, while the rest of the arm remains stationary. “Medium movements” represent my typical approach to playing cantabile and incorporate both finger and forearm movement. “Large movements” involve the fingers, forearm, and upper arm, so that the entire arm participates in the execution. The medium and large movements follow curvilinear contours (circular forearm motion in the medium movements and circular whole-arm motion in the large movements), executed on the same notes in the piece.
Figures 1, 2, and 3 present snapshots of stimuli demonstrating the three movements varying in amplitude.

Figure 1. Snapshots of stimuli demonstrating minimal movement.

Figure 2. Snapshots of stimuli demonstrating medium movement.

Figure 3. Snapshots of stimuli demonstrating large movement.
I recorded the audio in a recording studio on an Alex Steinbach grand piano, using Neumann KM 84 microphones in a close XY configuration. I filmed the visuals with a Canon EOS 100D digital camera in HD (1920 × 1080 pixels) at 25 frames per second. The audio and video were recorded in the same music studio. As Figures 1, 2, and 3 illustrate, I positioned the camera on the right side of the piano, capturing my torso, shoulders, and hands, and excluded my head from the frame to prevent facial expressions from influencing the audience's perception. I edited the audiovisual clips in Vegas Movie Studio 16 Platinum, limiting both video and sound in all 12 videos to 38 s. No audiovisual combination was artificially synchronized; in the mismatched combinations, the rhythm and tempo of the sound and video were not adjusted to match each other.
After several months of practicing and recording 12 Chopin nocturnes, I selected Nocturne No. 21 for the current study, as its structure allowed me to execute the three types of movement at the same places without experiencing bodily tension (Figure 4).

Figure 4. Chopin: Nocturne No. 21 in C Minor, bars 1–8.
Material Preparation
In the mimicking part, I prerecorded a single audio track with a particular singing-quality tone and then filmed myself mimicking that audio with three types of movement varying in size. However, because different movements may alter the sound in real performance, it was essential to also provide audiovisual recordings in which each type of movement generated its own sound. To address this concern, I designed the mismatching part of the experiment, which presented all combinations of the three sounds and three movements in a 3 × 3 design. This design allowed me to control for confounds arising from unintentional sound changes.
First Part of the Experiment – Mismatching
I recorded a 38-s excerpt three times in audiovisual mode, aiming for a specific singing-quality tone each time and using a different one of the three movement types described above on each take. After recording the three audiovisual excerpts, I combined each sound (S) recording with each movement (M) video, which resulted in nine combinations (see Table 1).
Second Part of the Experiment – Mimicking
First, I made an audio-only recording of a 38-s excerpt, aiming for a singing tone quality and playing with large movements. I then video-recorded myself playing along to this recording three times, each time with a different type of movement: minimal, medium, and large. This part of the experiment thus comprises three audiovisual stimuli in which the pianist mimics the same prerecorded sound.¹
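To make the structure of the stimulus set concrete, the sketch below enumerates the 12 clips: the nine sound × movement pairings of the mismatching part plus the three mimicking clips. The S1–S3 and M1–M3 labels follow the note to Table 1; the `MIM-` identifiers and the dictionary layout are hypothetical conveniences, not the study's actual file naming.

```python
from itertools import product

# S1/S2/S3: sound produced with minimal/medium/large movements (Table 1 note).
# M1/M2/M3: video of minimal/medium/large movements.
SOUNDS = ["S1", "S2", "S3"]
MOVEMENTS = ["M1", "M2", "M3"]


def mismatching_stimuli():
    """All 3 x 3 sound-movement pairings; diagonal pairs are the original takes."""
    return [
        {"id": f"{s}{m}", "sound": s, "movement": m,
         "original": SOUNDS.index(s) == MOVEMENTS.index(m)}
        for s, m in product(SOUNDS, MOVEMENTS)
    ]


def mimicking_stimuli():
    """One prerecorded track mimed with each movement type (hypothetical IDs).

    The video was filmed separately from the audio, so no clip here is an
    original audiovisual take.
    """
    return [{"id": f"MIM-{m}", "sound": "prerecorded", "movement": m,
             "original": False} for m in MOVEMENTS]


stimuli = mismatching_stimuli() + mimicking_stimuli()
print(len(stimuli))  # 12 clips in total
print([x["id"] for x in stimuli if x["original"]])
```

Keeping the diagonal flag makes it straightforward to separate the three original takes from the six mismatched pairings when analysing ratings.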
Movement Analysis
To check the stimulus manipulation, that is, whether the minimal, medium, and large movements differed significantly in size, I measured movement in the performance stimuli for each movement type in Adobe Premiere Pro, as follows:
I captured a snapshot of each movement type from the stimuli at the same location, that is, during the execution of the same note in the right-hand melodic line (note G on the third beat, Bar 1). I then repeated this process at five further locations in the performance (e.g., note C on the fourth beat), giving six snapshots per movement type. I overlaid each snapshot with a coordinate system whose origin (0, 0) was at the lower right corner of the picture, and I marked and measured two points on each snapshot: the wrist and the elbow. This process yielded 12 measurements per movement type (six snapshots × two points) and 36 measurements in total across the three types of movement stimuli.
I calculated movement amplitude as the length of the vector defined by each point's x and y coordinates, using Pythagoras’ theorem. One-way analyses of variance (ANOVAs) revealed a significant effect of movement type on both wrist amplitude (F[2, 15] = 64.25, p < .01, η² = .89) and elbow amplitude (F[2, 15] = 350.91, p < .01, η² = .98). As anticipated, amplitude was highest for the large movement stimuli and lowest for the minimal movement stimuli.
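The amplitude measure and the one-way ANOVA can be sketched as follows, assuming that amplitude is the distance of each marked point from the overlay origin and that each joint is analysed separately with six values per movement type (18 values in total, which gives the reported df of 2 and 15). The function names are illustrative, and the toy data in the example are not the study's measurements.

```python
import math
from statistics import fmean


def amplitude(x, y):
    """Length of the vector from the overlay origin (0, 0) to a marked point."""
    return math.hypot(x, y)  # sqrt(x**2 + y**2), i.e. Pythagoras' theorem


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for independent groups."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f_stat, df_between, df_within, eta_sq


# Toy check: amplitude(3, 4) is 5.0; three groups of three values give df = (2, 6).
print(amplitude(3, 4))
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

With three lists of six wrist (or elbow) amplitudes, one per movement type, the same function returns df = (2, 15).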
Audio Analysis
I measured the peak level and loudness (LUFS) of each sound to examine loudness variation in the audio material of the two experiments. One-way ANOVAs revealed no significant differences in either peak values (F[3, 21] = 0.97, p > .05) or LUFS values (F[3, 21] = 2.38, p > .05) among the four performing conditions (three from the mismatching part and one from the mimicking part of the experiment). These results suggest that all types of sound had comparable loudness.
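The paper does not state which tool produced the peak and LUFS values. As a hedged illustration of the simpler of the two measures, the sketch below computes the sample peak in dBFS for floating-point audio scaled to the range −1.0 to 1.0; the function name is hypothetical. LUFS values additionally require the K-weighting and gating defined in ITU-R BS.1770, which are not reproduced here.

```python
import math


def sample_peak_dbfs(samples):
    """Sample peak level in dBFS for float audio scaled to [-1.0, 1.0].

    This is the plain sample peak, not an oversampled true-peak measurement.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak)


print(sample_peak_dbfs([0.0, 0.5, -1.0]))  # full-scale peak
print(sample_peak_dbfs([0.25, -0.5]))      # about -6 dBFS
```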
Procedure
I conducted the study in individual sessions or in pairs in quiet rooms at the Sydney Conservatorium of Music and the Faculty of Music, University of Arts in Belgrade. Before the study began, participants completed a short questionnaire about their age and primary musical instrument. I then assigned each participant a computer containing the stimuli and asked them to watch the 12 audiovisual clips, presented in random order, and rate the singing tone quality of each on a scale from 1 (poor) to 7 (excellent). The participants confirmed that they understood the task and were familiar with the term “singing tone quality.” They listened to the stimuli through Sennheiser HD 580 headphones.
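The paper does not describe how the random order was generated. One reproducible way to give each participant an independent shuffle of the 12 clips is sketched below; the clip names, the seed scheme, and `presentation_order` are hypothetical.

```python
import random

# Hypothetical file names for the 12 audiovisual clips.
CLIPS = [f"clip_{i:02d}" for i in range(1, 13)]


def presentation_order(participant_id, clips=CLIPS, base_seed=2021):
    """Return a reproducible random order of the clips for one participant.

    Seeding a private Random instance per participant keeps each order
    independent of the others and recoverable later for analysis.
    """
    rng = random.Random(f"{base_seed}-{participant_id}")
    return rng.sample(clips, k=len(clips))


print(presentation_order(1))
```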
Each video was followed by a short pause, which gave participants time to complete the rating. The experiment lasted approximately 15 min.
Results
“Mismatching” Part of the Experiment
A two-way repeated-measures analysis of variance (ANOVA) revealed a significant effect of hand movement (F[2, 72] = 10.02, p < .001, η² = .22), whereas neither the effect of sound (F[2, 72] = 0.63, p > .05, η² = .02) nor the interaction between hand movement and sound (F[4, 144] = 2.13, p = .08, η² = .06) was significant. Despite the non-significant interaction, the Sidak post hoc tests suggested a possible interaction between movement and sound, as differences between movement types emerged only for certain sound-movement combinations.
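The effect sizes reported above appear to be partial eta squared, which can be recovered from an F ratio and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error). The short check below, with an illustrative function name, reproduces the three reported values after rounding to two decimals.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F, df_effect, df_error, and reported eta squared from the mismatching part.
reported = {
    "movement": (10.02, 2, 72, 0.22),
    "sound": (0.63, 2, 72, 0.02),
    "movement x sound": (2.13, 4, 144, 0.06),
}
for effect, (f, df1, df2, eta) in reported.items():
    print(effect, round(partial_eta_squared(f, df1, df2), 2), eta)
```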
Table 2. Sidak post hoc tests for differences in singing tone quality among the three movements for each of the three types of sound.
Mauchly's test of sphericity was not significant for any effect: sound (χ²[2] = 1.8, p > .05), movement (χ²[2] = 3.72, p > .05), or the interaction between the two (χ²[9] = 12.63, p > .05), indicating that the sphericity assumption was met. Moreover, in all nine conditions, the singing tone quality ratings did not deviate significantly from a normal distribution: absolute values of standardized skewness ranged from 0.13 to 2.09 and of kurtosis from 0.07 to 1.06, so neither the standardized skewness nor the kurtosis values exceeded the critical value of 2.58.
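The normality criterion above divides skewness and kurtosis by their standard errors and compares the result with 2.58, the two-tailed critical z value at α = .01. The paper does not say which estimators were used; the sketch below assumes the adjusted sample skewness and excess kurtosis (G1, G2) and the standard errors reported by packages such as SPSS. Function names are illustrative.

```python
import math
from statistics import fmean, stdev


def standardized_skew_kurtosis(x):
    """Return (skewness / SE, excess kurtosis / SE) using adjusted sample formulas."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    m, s = fmean(x), stdev(x)  # stdev uses the n - 1 denominator
    if s == 0:
        raise ValueError("all ratings are identical")
    z = [(v - m) / s for v in x]
    g1 = n / ((n - 1) * (n - 2)) * math.fsum(v ** 3 for v in z)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * math.fsum(v ** 4 for v in z)
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return g1 / se_skew, g2 / se_kurt


def looks_normal(x, critical=2.58):
    """True if neither standardized statistic exceeds the critical value."""
    z_skew, z_kurt = standardized_skew_kurtosis(x)
    return abs(z_skew) <= critical and abs(z_kurt) <= critical


print(looks_normal([1, 2, 3, 4, 5, 6, 7]))  # symmetric ratings
print(looks_normal([1] * 30 + [7] * 3))     # strongly right-skewed ratings
```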
As previously mentioned, this part of the experiment contained the three types of movements that produced the three types of sounds: derived from minimal movement (minimal sound), derived from medium movement (medium sound), and derived from large movement (large sound).
For minimal sound, the participants rated the singing tone quality lower when matched with large movements than when matched with minimal or medium movements. For medium sound, the participants rated the singing tone quality lower when matched with minimal movements than when matched with medium or large movements. For large sound, they rated the singing tone quality lower when matched with large movements but higher when matched with medium movements (see Table 2, Figure 5).

Figure 5. Mean ratings of singing tone quality for all three movements and all three sounds.
Conversely, when the three types of sound were compared separately within each movement type, the Sidak post hoc test revealed significant differences only when the sounds were paired with large movements: large sound was perceived as having lower singing tone quality than either minimal (p < .05) or medium sound (p < .05).
Because the sample comprised 25 (67.6%) pianists and 12 (32.4%) non-pianists, I additionally controlled for the effect of the instrument played by the participants (i.e., piano versus another instrument). I applied a three-way mixed ANOVA with hand movement (minimal, medium, or large) and sound (derived from minimal, medium, or large hand movements) as within-subjects factors and instrument (piano or non-piano) as a between-subjects factor, again using the participants’ ratings of singing tone quality as the dependent measure. This analysis confirmed the previous results: only the effect of hand movement was significant (F[2, 70] = 6.37, p < .01, η² = .15), while the effects of sound (F[2, 70] = 0.52, p > .05, η² = .01) and instrument (F[1, 35] = 1.03, p > .05, η² = .03) and all interactions, including sound × instrument (F[2, 70] = 0.10, p > .05, η² = .003), movement × instrument (F[2, 70] = 1.96, p > .05, η² = .05), sound × movement (F[4, 140] = 2.12, p > .05, η² = .04), and sound × movement × instrument (F[4, 140] = 0.46, p > .05, η² = .01), were non-significant.
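The degrees of freedom reported above follow from the design. For a within-subjects effect, the error term has (within-subjects effect df) × (number of participants − number of groups) degrees of freedom, so adding instrument as a two-group between-subjects factor reduces the error df from 72 to 70 for the movement effect and from 144 to 140 for the sound × movement interaction, while instrument itself is tested on (1, 35). The helper below is an illustrative sketch that reproduces these values; its names are hypothetical.

```python
def within_effect_dfs(n_subjects, within_levels, n_groups=1, crossed_with_group=False):
    """(df_effect, df_error) for a within-subjects effect in a (mixed) RM-ANOVA.

    within_levels lists the number of levels of each within-subjects factor
    in the effect; crossed_with_group adds the between-subjects factor to it.
    """
    df_within = 1
    for k in within_levels:
        df_within *= k - 1
    df_effect = df_within * (n_groups - 1) if crossed_with_group else df_within
    # Subjects are nested in groups, so the error term loses one df per group.
    df_error = df_within * (n_subjects - n_groups)
    return df_effect, df_error


def between_effect_dfs(n_subjects, n_groups):
    """(df_effect, df_error) for the between-subjects factor."""
    return n_groups - 1, n_subjects - n_groups


print(within_effect_dfs(37, [3]))          # movement, no instrument factor
print(within_effect_dfs(37, [3], 2))       # movement, with instrument factor
print(within_effect_dfs(37, [3, 3], 2))    # sound x movement, with instrument
print(between_effect_dfs(37, 2))           # instrument
```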
Mimicking Part of the Experiment
The ANOVA with one repeated factor of hand movement (minimal, medium, and large) revealed no significant differences (F[2, 72] = 1.91, p = .156, η2= .05). Mauchly's test of sphericity was not significant for the effect of movement (χ2[2] = 3.7, p > .05), indicating that the sphericity assumption was satisfied. Additionally, in all three conditions, the dependent variable did not deviate significantly from the normal distribution: absolute values of standardized skewness ranged from 0.23 to 0.63 and those of standardized kurtosis from 0.69 to 1.47, so neither exceeded the critical value of 2.58 (two-tailed p = .01). These results indicate that the participants did not rate the audiovisual stimuli containing minimal, medium, or large movements differently in singing tone quality when these were combined with a prerecorded sound (Figure 6).

Mean ratings of singing-quality tone for three movement types.
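The normality check reported for the mimicking part divides each skewness and kurtosis statistic by its standard error and compares the absolute result with 2.58. The following is a minimal Python sketch, assuming the bias-adjusted sample skewness (G1) and excess kurtosis (G2) and the exact standard-error formulas used by common statistics packages such as SPSS. The function names and the ratings in the usage example are hypothetical and are not the study's data.

```python
import math


def standardized_skew_kurtosis(x):
    """Return (z_skewness, z_kurtosis) for a sample of at least 4 values."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("zero variance")
    g1 = m3 / m2 ** 1.5        # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0    # moment-based excess kurtosis
    # Bias-adjusted sample statistics G1 and G2.
    big_g1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    big_g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors of G1 and G2.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return big_g1 / se_skew, big_g2 / se_kurt


def looks_normal(x, critical=2.58):
    """True if neither |z| exceeds the critical value (2.58 = two-tailed p .01)."""
    z_skew, z_kurt = standardized_skew_kurtosis(x)
    return abs(z_skew) <= critical and abs(z_kurt) <= critical


# Hypothetical 1-7 ratings for one condition (illustrative only).
ratings = [4, 5, 3, 6, 5, 4, 5, 6, 4, 3, 5, 7, 4, 5, 2, 5, 6, 4, 5, 3]
print(standardized_skew_kurtosis(ratings), looks_normal(ratings))
```

The 2.58 cutoff is a conventional rule of thumb for moderate sample sizes. A formal test such as Shapiro-Wilk is a common alternative.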
As in the mismatching part of the experiment, I additionally controlled for the effects of the instrument played by the participants. I applied a two-factor mixed ANOVA with hand movement (minimal, medium, or large) as a within-subject factor and instrument (piano or non-piano) as a between-subjects factor to test their effects on singing tone quality ratings. Again, this analysis confirmed the results, as all effects, including hand movement (F[2, 70] = 1.69, p > .05, η2= .05), instrument (F[1, 35] = 0.52, p > .05, η2= .01), and their interaction (F[2, 70] = 0.04, p > .05, η2= .001), were non-significant.
Discussion
In this study, as a performer and researcher, I investigated whether the different degrees of my movements would affect observers’ evaluations of my singing tone quality in solo piano performance by creating two types of audiovisual performances: “mismatching” and “mimicking” (see Table 1).
Audiovisual stimuli material representing the original sound (S) and movement (M) productions (in the diagonal) and their mismatched combinations.
Note. The sound levels from the three audiovisual recordings are labeled as S1 (produced by minimal movements), S2 (produced by medium movements), and S3 (produced by large movements). The movement levels from the three audiovisual recordings are labeled as M1 (minimal movements), M2 (medium movements), and M3 (large movements).
Mismatching Part of the Experiment
The mismatching part of the experiment revealed that, in most combinations of sound and movement, participants rated the tone quality of the same audio significantly differently depending on the type of movement displayed. Ratings were predominantly lower when audiovisual stimuli displayed minimal movements than when they displayed medium or large movements. Participants gave low singing tone quality ratings to all three types of sound (those produced by minimal, medium, and large movements) when the sounds were presented alongside a video featuring minimal movements. By contrast, when, for example, a sound produced by minimal movements was paired with a video of large movements, the participants rated its singing tone quality significantly higher. The same held for the participants’ perception of sounds produced by medium and large movements: They rated the singing tone quality of the sound produced by medium movements significantly higher when it was combined with a video of medium or large movements than when it was combined with a video of minimal movements.
Similarly, the participants rated the sound produced by large movements significantly higher when it was matched with a video of medium movements than with a video of minimal movements. In other words, whether the sound originated from minimal, medium, or large movements, the participants’ perception of the singing tone seemed diminished when audiovisual performances displayed minimal hand activity. Since the participants rated almost every sound lower when it was combined with minimal movements, this result complements Davidson's (2002) finding that the lowest expressivity rating was given to “low amplitude movements” (p. 26).
Participants rated the singing tone quality similarly in all three sound recordings (S1, S2, and S3) when I presented them with a video of medium hand movements, and most of these pairings received the highest ratings. As medium movement had a greater amplitude than minimal movement, the present findings align with a postulation by Dahl et al. (2010, p. 62) that larger gestures would better facilitate a pianist's expressive intention of sound. By contrast, when audiovisual stimuli were presented with large hand movements, the participants judged the singing-quality tone to be higher in some combinations but lower in others. For instance, placing a video of large hand movements over the sound produced by minimal movements significantly improved how the quality of the singing tone was judged. However, the large movement in its congruent form (S3 + M3) lowered the participants’ judgment of its quality. These results, based on Sidak post hoc pairwise comparisons, suggest that some interaction between movement and sound might exist even though the interaction effect was not significant. Although one can only speculate about what caused the improvement of tone quality in some sound and movement combinations but not in others, in the rest of this discussion I address, from a performer's standpoint, some possible factors that may have contributed to the observers’ ratings.
Regarding minimal movements, the participants likely deemed the singing quality of the sound to be inherently poorer when it was produced with minimal hand activity, despite my effort to keep the tone quality as similar as possible across conditions. Producing a sound with minimal wrist and hand movements might have affected its singing quality, as Lhevinne (1924/2014) and many other piano pedagogues have suggested. However, when I paired the same sound with a video featuring large wrist and arm movements, the pairing seemed to enhance the participants’ perception of its quality.
Similarly, a large hand motion could have affected the timbre in some way, such as by producing a more uneven tone on certain occasions (referred to as “bumps” in Lhevinne, 1924/2014, p. 21), despite my efforts to create a similar singing sound across the three conditions. Although the audio analysis revealed no significant differences in average peak and LUFS values among the sounds produced with minimal, medium, and large movements, participants likely perceived certain tones as less smooth, which might have contributed to their judgment of tone quality. If so, the visual aspect of large movements might have amplified the potential roughness of certain tones, consequently lowering the rating when both the sound and the movement were presented in their natural form. These minor auditory variations might have had only a negligible impact on the audience's perception of tone quality had the performances been presented in audio-only mode.
Nevertheless, pairing the same sound with a video of minimal hand activity did not contribute to the perception of a more rounded tone quality, given the static impression of the hand. Deppe has noted that “a flat pose of the hand sounds flat” (Caland, 1903/2010, p. 45). However, when the same sound, produced by a large movement, was matched with a video of medium movement, the movement could have improved the perception of tone quality in a way that reduced the potential roughness, such as uneven tones, and smoothed the bumps. For example, in the conditions portraying medium and large movements, I was able to “round” the sound (round timbre is a key attribute of a singing tone) by using round movements. Following Deppe's recommendation (Caland, 1903/2010), which complemented my intuition, I sought to avoid the perception of a sharp or flat tone through the shape of limb motions. Given the generally better rating of tone quality when sounds were combined with medium and large movements in contrast to minimal movements, the participants might have had an “association between the sound outcome and the round shape” (Li et al., 2021, p. 13). Indeed, cross-modal interference, also known as weak synesthesia, could play a part in the perception of timbre, as suggested by Parncutt (2013).
Still, one should also consider other potential contributing factors. As stated in the introduction, the observers’ perception of tone quality may be linked to the perception of the performer's bodily sensations (Parncutt, 2013). This argument raises the question of whether there is a relationship between consistently high ratings of tone quality associated with medium movement and my most enjoyable bodily sensations when using medium movement. For example, despite my attempts to remain faithful to my intention to produce a singing tone regardless of the type of movement used, I experienced a sensation of “moving” sound while the hand was being raised in the medium condition only. More specifically, I had a profound feeling of the hand sculpting an oval-shaped sound, which complemented the intention of the round timbre in the best way and provided the most satisfying haptic feeling. This sensation could result from various factors, including the characteristics of the medium movement type that might be more suitable for the execution of the intended tone in this specific musical context than the other two types of movements. For example, in the medium condition, only the hand movement involved a circular motion, in contrast to the large condition, in which the entire arm was used. It seemed easier to experience the “grasping” of sound from the keys and “lifting” it when the hand was raised without the interference of other arm movements (e.g., elbows). I did not experience the sensation of the hand sculpting the “oval-shaped” sound while playing with large movements. This lack of sensation might be due to the involvement of parts of my arm that were not directly engaged in sound production, such as my elbows and shoulders, which influenced my overall kinesthetic and tactile experience.
Indeed, research has demonstrated that expressive gestures go hand in hand with musical structure (Davidson, 2002; MacRitchie et al., 2013; Thompson & Luck, 2012). Medium movement may therefore have corresponded best to the musical character of the piece used in this study, and the participants may have recognized it as the most suitable for achieving the musical goal. Research in cognitive neuroscience has demonstrated that the same sensorimotor system is activated in both performers and spectators, which allows people to understand others’ actions (Rizzolatti & Sinigaglia, 2010). Calvo-Merino et al. (2005) have indicated that the mirror neuron system is activated more strongly when observers watch movements with which they are familiar. The researchers concluded that the “mirror system does not respond simply to visual kinematics of body movement, but transforms visual inputs into the specific motor capabilities of the observer” (p. 1248). Since all participants in this study were undergraduate and postgraduate music students, and 25 of the 37 studied piano performance, it is possible that most of them not only shared my proprioceptive experience but also recognized the medium movement as the most appropriate action for accomplishing the musical objective. According to Hou et al. (2020), “The frontoparietal mirror neuron system allows audiences to experience or comprehend the mind of the performer” and share the performer's playing perspective (p. 9).
However, the participants’ inclinations towards particular interpretations may also have influenced their evaluations. For instance, while every pianist has a distinct way of playing cantabile (Berman, 2000/2017), an observer who considers wrist movement essential for producing a high-quality singing tone, as suggested by Lhevinne (1924/2014), may assign lower ratings to performances with limited wrist motion, such as those with minimal movements in this study. Indeed, Blom and Poole (2004) discovered that undergraduate music students exhibited certain biases in evaluating their peers’ performances, with judgments shaped by their own interpretations of the same composition (p. 120). According to McPherson and Thompson (1998), expert examiners can be swayed by an interpretation of a composition that aligns with their taste, resulting in a more favorable performance assessment. Repp (1997) likewise found that listeners prefer standard musical interpretations to those that diverge significantly from the average. This preference may also extend to the visual domain, which could help explain the participants’ preference for medium movement.
Mimicking Part of the Experiment
The mimicking part of the experiment demonstrated that the pianist's hand and arm movements did not significantly influence the observers’ evaluation of singing tone quality. In other words, the participants rated the sound as having similar quality regardless of the degree of movement. This result differs from that of Juchniewicz (2008), who used a similar methodology to examine the influence of a performer's movements on audience ratings of dynamics, rubato, and overall expressivity. However, one distinction between the two studies is that the current study focused specifically on ratings of tone quality, and its stimuli displayed only the hands and torso rather than the entire body. Seeing whole-body movements might have a different impact on the audience's perception. It remains unclear, though, why the outcome of this part of the experiment differs from that of the mismatching part, in which the movements significantly influenced the perception of tone quality in several performing situations. The audio analysis indicated that both parts of the experiment had comparable volume levels.
As a player's embodiment plays a critical role in the production of timbre, it could be useful to consider the differences in my bodily sensations while performing the mimicking and mismatching parts of the experiment. These differences are not surprising, given that the stimuli were produced differently. In the mismatching part of the experiment, I was producing and recording the sound concurrently, whereby “matching touch with sound information” is intertwined with haptic sensations (Saitis et al., 2018, p. 84; see also Campbell, 2014). In the mimicking part, I played along with the prerecorded audio, aligning each keyboard note with the corresponding note in the audio. While producing the intended movement amplitudes, I experienced bodily sensations that differed from those in the mismatching part of the experiment. The fact that I was mimicking the sound while wearing headphones and therefore could not hear the resulting keyboard tone, including finger-key and bottom-key noises, might have affected how I “felt” the keys.
The more profound haptic sensation in the mismatching part of the experiment likely stemmed from the close performer-instrument interaction while I was producing and monitoring the sound coming from the piano. According to O’Modhrain and Gillespie (2018), performers’ haptic sensations go hand in hand with the acoustic feedback from the instrument. Neuroscientific research has suggested that an observer could “experience and directly understand the tactile experience of others” due to the activation of the same neural mechanism in the somatic sensory cortex area of both the performer and observer (Gallese et al., 2007, p. 141). This raises the question of whether the player's bodily sensations in either part of the experiment influenced the observers’ perception of tone quality.
Conclusion
This paper extended current knowledge of performance perception by investigating the influence of a pianist's movements on the audience's evaluation of tone quality in performances of a solo piano piece. The study employed methodologies of 1) mismatching combinations of sound and movements and 2) mimicking the prerecorded sound.
In the mismatching part of the experiment, the results demonstrated that, in the majority of sound and movement combinations, the tone quality was rated significantly differently depending on the type of movement presented. In the mimicking part of the experiment, the tone quality was rated with no significant difference when presented with different movement types.
At this stage, one explanation for the difference in findings might be that observers’ perception of tone quality partly depends on projecting the player's bodily sensations during performance (Parncutt, 2013, p. 766). For example, in the mismatching part of the experiment, participants predominantly gave the highest tone quality ratings to stimuli with medium hand movements, and I personally found medium movements to provide the most satisfying haptic sensation. By contrast, I did not experience an embodied feeling of the singing tone during the performances in the mimicking part of the experiment, probably because I was not producing the sound while miming the preexisting audio.
Another explanation might be that, although the interaction effect was ultimately not significant in the mismatching portion of the experiment, some interaction between movement and sound could have occurred, as the differences in sound ratings were found in specific combinations of sound and movement. For example, some audio features of each condition may have been “more capable of being combined” with different movements, in contrast to the prerecorded audio in the mimicking part, leading to differences in tone quality rating. This interpretation is similar to that of Vuoskoski et al. (2014, p. 600) regarding the influence of vision on players’ expressivity.
Overall, the results indicate that evaluating tone quality is a multifaceted process in which seeing a performer's movements might play a critical role in the observer's assessment. The present findings align with current views that the production and perception of tone quality depend on more than auditory facets (Li et al., 2021; Li & Timmers, 2020; Parncutt, 2013). The study also suggests that pianists might be able to improve spectators’ evaluations of their intended timbre. Although researchers have acknowledged that the visual characteristics of piano touch, such as whether it appears rigid or relaxed, may change the assessment of tonal quality (Goebl et al., 2014; Ortmann, 2011; Parncutt & Troup, 2002), to the best of my knowledge, this phenomenon had not been empirically investigated until now.
Considering that the communication of tone quality transcends the auditory domain and can be enhanced by visual stimuli, performers should devote attention to the visual aspect of their touch, which has been largely ignored by piano schools of previous centuries. Furthermore, it would be beneficial to develop video recording and movement analysis equipment that performers can use at home or in a practice studio to test how their musical intentions are perceived, instead of relying on less affordable devices intended for laboratories, such as the point-light technique or marker-based movement analysis (MacRitchie, 2015). By designing research studies from a practice-led perspective, which is the novelty of this study, performers might be encouraged to experiment with the visual aspect of their performances and to share their findings on social media, reaping the benefits of audience feedback.
As a performer, I benefited from the current investigation by learning that my typical movement when playing pieces requiring a singing tone, such as the excerpt chosen for this study, can communicate my intended singing sound most successfully in situations in which I experience a greater sense of sound embodiment. Since Chopin's nocturnes share a common general structure and expressive character featuring a lyrical melodic line, I postulate that similar results would be obtained in the majority of those pieces.
The controlled structure of the study, which featured a more uniform distribution of movements than is usually seen in natural settings (such as concerts), and its focus on a single piece limit the ability to generalize this conclusion to all performance contexts and musical compositions. Another limitation is that most of this study's participants were piano students, who might have preferred specific movements for generating a singing tone, which could have influenced their ratings. A further analysis that accounted for the potential effects of piano training reaffirmed the earlier findings: only the influence of hand movement was significant, while all other factors remained non-significant. However, future research should include performances on a wider variety of musical instruments to gain a more comprehensive understanding of the connections between the current findings and participants’ musical training, as well as to improve the generalizability of the results. Nevertheless, it is noteworthy that, despite their years of musical training, piano students may be swayed by non-acoustic factors when assessing sound quality. This observation aligns with the position of other researchers who have advocated for music faculties not only to prepare students for performing careers but also to equip them with critical thinking and expert evaluation skills (Mitchell & Benedict, 2017, p. 206).
This study highlights the need for further investigations to explore in more depth the factors that contribute to the perception and assessment of tone quality. For example, studies could examine whether audiences can identify a player's most satisfying haptic feedback and bodily sensations during performance and assess how these fundamental elements of tone production affect quality ratings. Furthermore, since the current study focused solely on assessing a singing-quality tone, future studies could broaden the investigation by incorporating different timbres and an audio-only condition. Additionally, involving more non-pianists as participants could improve the generalizability of the results.
Footnotes
Acknowledgement
I wish to thank Dr. Daniel Yeadon for his guidance and insightful discussions throughout this project. I am also thankful to Prof. Oliver Toskovic for his contributions to the statistical analysis and thoughtful conversations. Additionally, I appreciate Milan Prokop's assistance with audio analysis and would like to thank Bojana Soro for our engaging discussions.
Action Editor
Emily Payne, University of Leeds, School of Music.
Peer Review
One anonymous reviewer.
Laura Bishop, University of Oslo, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion.
Data Availability Statement
Available from the author on request.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The study was approved by the University of Sydney Human Research Ethics Committee (protocol number 2019/002).
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the University of Sydney (grant: Postgraduate Research Support Scheme [PRSS]).
1.
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental Material
Supplemental material for this article is available online.
References
