Abstract
To what extent did early recording technology affect the creation and representation of musical performances? According to Mark Katz (from 1999 onwards), historical studio environments led to crucial shifts in 20th century violin performances due to the restrictions imposed by early recording and reproduction devices (“phonograph effects”). In particular, this may have affected sonic gestures that include expressive means such as vibrato, portamento, articulation, and timbre variation. In order to trace potential modifications, we reenacted a 1911 “Liebesleid” performance by one of the most influential violinists of the 20th century, Fritz Kreisler. We then digitally ascertained the full acoustic transfer paths (impulse responses, IRs) from the 1911 studio to 20 historical gramophone setups and applied them to the reenactment. In this way, for the first time, our study generated comparative IR findings across multiple gramophones, soundboxes, and horns built by different manufacturers between 1901 and 1933. Sonic gestures were found to induce significant level modifications of up to 20 dB due to the devices’ resonances, leading to dynamical variations that have never been part of the performance. Accordingly, Kreisler's famous “golden tone” is due, in part, to the recording technologies of his time. Therefore, early recordings should not be understood as “neutral witnesses” but rather as artifacts with substantial influence on the creation and reproduction of musical performance(s).
Introduction
Early sound recording and reproduction technology has not only changed the way we listen to music but also had a profound effect on musical performances of the 20th century. What was at least implied by early media philosophers such as Walter Benjamin (1936), Günther Anders (1956), and Friedrich Kittler (1986), when faced with the accelerating technological revolutions of their times, gained an essentially new dynamic with the rediscovery of early recordings as sources for musicological research owing to the writings of Robert Philip (1992, 2004). This research has been developed substantially by Mark Katz from the turn of the century onward (Katz 1999, 2000, 2004, 2006, 2010). Two of Katz's central cases point to crucial shifts in the practice of sonic gestures between 1900 and 1950: the strong decline of portamento (Katz 2006) and the emergence of what he calls “the ‘new’ vibrato” (Katz 2000, p. 174; 2010, p. 95), that is, continuous and highly intense pitch oscillation, may, according to Katz, partially be due to the new medium. Moreover, concerning the rise of the “new” vibrato, Katz suggests that aesthetic considerations might have played a rather secondary role; instead, he offers a “radical possibility: that recording was largely responsible. I propose that this shift in performance practice is, in fact, a phonograph effect” (2010, p. 94).
The term “phonograph effect” might be misleading in part, since for Katz it is “any change in musical behavior or activity that is in some way a response to the distinctive characteristics of sound recording technology” (2006, p. 225), or, in an even broader sense, “any observable manifestation of recording's influence” (2010, p. 2). As such, phonograph effects essentially derive from the shift from the stage to the studio, therefore being “ultimately responses to differences between live and recorded music” (Katz 2010, p. 4), and sometimes, as in the case of the “new” vibrato, they arise “as a practical and largely unconscious response to the limitations of a machine.” (p. 95). If Katz is right, these propositions a priori entail at least three implications. (1) With sound technology (in conjunction with early sound engineers and producers) being another player in the game, the recording studio situation was (and is) neither identical nor comparable to live performances—that is, early recordings give no reliable information about the original intentions of the performers and performance practices outside the studio. (2) If the sound modifications due to recording and reproduction chains were so far-reaching that performers had to react in their playing, we cannot even be sure we are listening to the original, unaltered studio performance. That being so, we have to fundamentally question the capacity and reliability of early recorded sound sources for performance research purposes. (3) However, at some point in recording history, musicians began adapting what they heard from the gramophone for their performance practices (otherwise, we would not be discussing a general “shift” today).
These implications shall serve us as justification for a thorough evaluation of some of Katz's central premises from a perspective of embodiment and musical acoustics. Motivated by some ongoing research on changing attitudes towards the musical score in early recorded violin performance practice (1912–1956), 1 we decided to focus on potential modifications in the representation of sonic gestures performed on a violin due to mechanical recording and reproduction devices built between 1901 and 1933. Among these, we concentrate on the expressive means vibrato, portamento, articulation (attack/sustain/decay of a sound), and (changes in) timbre.
We will examine them by proposing an extended, empirically grounded approach which makes use of some very recent research in musical acoustics and digital signal processing. Based on the methods and results of the research project Technologies of Singing in Detmold, Germany (2016–2019), 2 we were able to reproduce the impulse responses (IR) and frequency response functions (FRFs) of various mechanical recording and reproduction chains from the years 1901 to 1933. This enabled us to digitally simulate a recording and playback session all the way from the musician in the studio to the listeners in front of their gramophones, situated in the year 1911. In that way, virtually every modern record can be modified to sound as if it were recorded and played back by a specific setting from a century ago (forward FRFs) and, conversely, even the sound of a specific original studio performance from a historical recording session can be restored to some extent by inverting the acoustic impact the machines had on the recorded signal (inverse FRFs).
Following the FRF approach, we built our study on three cross-disciplinary stages. First, a historical performance was reenacted by imitating the sonic gestures within a 1911 “Liebesleid” recording by violinist Fritz Kreisler (1875–1962). The main point here was neither to provide an exact copy of Kreisler's playing nor to imitate a historical recording session, but to produce a representative reference signal that formally comes close to the original with regard to pitch content, duration, and the gestures of interest. As a secondary goal, we aimed for some insights into the original sound of the violinist with the often-quoted “golden tone” (Hartnack, 1993, p. 137), including Kreisler's choice of strings (gut vs. steel), his dynamics and timbre, and their representation by mechanical recording. In a second step, 20 historical gramophone setups, manufactured between 1901 and 1933 and featuring different horns and soundboxes (from the collection of J. Notenboom, Tiel, the Netherlands), were acoustically measured while playing back the original 1911 Kreisler record in order to obtain authentic reproduction chain IRs and FRFs. In this way, our study generated, for the first time, comparative FRF findings across multiple gramophones, soundboxes, and horns. Finally, having reconstructed the full transfer path from the musician to the listener, we evaluated the effects of the recording and playback chains on the representation of sonic gestures, using methods from digital signal processing to measure the effect the historical devices had on frequency spectra.
The study concludes with a discussion on how these modifications may in return have transformed performance practice in general, and sonic gestures on bowed string instruments during the first half of the 20th century in particular. The Appendix and an online repository provided as supplementary material contain all data and resources associated with this study in open access (Vollmer & Bolles, 2024), including the recording and reproduction devices’ IRs and FRFs, sound resources, and a short glossary of central terms. The results of an accompanying comparative listening survey, a sheet music edition of Kreisler's original 1911 playing which was the basis for the reenactment, and a short video documentation which directly compares the 1911 recording with the 2019 reenactment results can be retrieved from Vollmer (2021a, 2021b) and Vollmer & Bolles (2021).
Background
Case Example: Kreisler's 1911 “Liebesleid” Production
Fritz Kreisler has frequently been cited as a highly (or even the most) influential violinist of the 20th century who supposedly was largely responsible for some crucial aspects of contemporary violin playing. In a well-known passage, Carl Flesch (1873–1944), himself a seminal violin pedagogue, suggested that with his colleague “a new era was beginning in the history of violin playing” (Flesch, 1957, p. 37). In particular, with respect to the continuous use of vibrato typical for 20th century violin performance practice, Robert Philip considered Kreisler to be “[t]he single greatest influence towards its adoption” (1992, p. 106), although the novelty of Kreisler's playing and especially of his approach towards vibrato has been contested in some more recent research (see, e.g., Leech-Wilkinson 2009, ch. 5, paragraphs 7–10).
With regard to his “Liebesleid” (No. 2 from the Alt-Wiener Tanzweisen, originally published in 1910 by B. Schott's Söhne, Mainz, plate no. 29029), Kreisler recorded at least seven performances during his active career (in 1910, 1911, 1912, 1926 (twice), 1930, and 1942). It would be a highly promising endeavor to conduct a comparative study on changing attitudes in his performance practice, particularly considering Kreisler's later use of the steel E string (from around 1918) and the changing technological preconditions entailed by electrical recording (from around 1925 onward; cf., for instance, Leech-Wilkinson's (2009, ch. 5, paragraphs 11–15) discussion of Kreisler's vibrato in the 1912 and 1926 “Liebesleid” versions). However, our approach required us to focus on a single recording as a reference signal, and the 1911 production fulfills several criteria for this. First, Kreisler's 1911 recordings, the first for His Master's Voice (HMV) in the UK, were produced by the pioneering recording engineer Fred Gaisberg (as opposed to the 1910 and 1912 ones, both produced in Victor's New York studios). 3 Tully Potter (2010, p. 2) considers their “sound quality” to be “amazing, a tribute to the engineer's skills,” which suggests that the 1911 “Liebesleid” may constitute a representative sample of state-of-the-art mechanical recording technologies preceding World War I. In fact, it was precisely this recording which, due to its popularity, was repeatedly re-pressed at least until 1927, 4 even though electrical recording was established by then. Second, Katz himself takes precisely this recording as a paradigm to prove a performance practice that “fully realized” the “potential” of the so-called “new vibrato” (2010, pp. 98–99).
Third, aside from continuous vibrato, Kreisler performs a generous multitude of varying portamentos in this recording as well as a number of surprising articulation decisions such as sharply accented onsets, over-dotted rhythms, and long legato passages (see Sound examples 1 and 2 in Appendix, Folder 3).
Sonic Gestures and Resonating Systems: Definitions
Rephrasing Leech-Wilkinson's (2009, ch. 8, paragraphs 15 and 16) reading of expressive gestures, sonic gestures for the purpose of this study shall be understood as sounding elements of performance practice that break musical continuities at meso and micro level, such as in creating culmination points within a phrase or in pointing out single, meaningful notes. These elements may involve expressive means such as articulation, vibrato, portamento, and changes in timbre, but their concrete compositions are theoretically limitless and strongly depend on musical genres and regional performance traditions. Furthermore, sonic gestures may be executed on single notes or to interconnect multiple notes.
Articulation is defined here as the degree of noise and dynamics at the beginning (attack), development (sustain), and ending (decay) of a single sound. This definition binds articulation to single notes, containing elements such as accents or bowing techniques, for example, flat (détaché) or bouncing (spiccato) strokes. Moreover, it excludes gestures such as vibrato, portamento, or (changes in) timbre, which are commonly considered modes of articulation as well but, in our opinion, call for separate treatment and measurement methods.
Portamento is understood here as an intended glide of pitch, either at the beginning or end of a note or connecting two notes. Deviating from various more common definitions (e.g., Brown, 1999, p. 436; Sprau, 2023, pp. 25–30), this reading, in the sense of Kreisler's contemporary Carl Flesch (1929, p. 16), excludes unconscious or accidental glides (“glissandi”). (For an approach to distinguishing intended from unintended elements in recorded performance practices, see Köpp, 2016, especially pp. 17–18.)
Regarding vibrato, Carl Seashore and his colleagues distinguished between three main types as early as 1932: a pitch, a volume, and a timbre vibrato, respectively (Seashore, 1932, p. 10; also see the study's glossary in Appendix, Folder 7). In modern violin practice the term vibrato is mostly understood as an intended oscillation of pitch caused by the left hand (which corresponds to pitch vibrato, although this affects volume and timbre as well). Although in historical singing practice the term was used interchangeably with tremolo (which from a modern standpoint would rather correspond to volume vibrato) up to at least 1900 (Mecke, 2016, p. 672) or even until the 1920s (Hähnel & Martensen, 2019, p. 30), we stick with the notion of pitch oscillation for the purpose of this study. On the violin, a pitch vibrato, which corresponds to a frequency modulation (FM), is produced by the movement of a player's left hand on the string. In principle, this also results in some amplitude modulation (AM) due to the complex interaction of the string's partial tones (fundamentals and harmonics) with the instrument's resonances (i.e., the relative levels of all partial tones change dynamically in conjunction with the periodically changing pitch), and to the varying amount of damping caused by the moving finger.
To conclude with the fourth expressive means examined in this study, the change of timbre (tone color), according to Donald Hall (2002), can be described as an alteration of everything “what characterizes a tone besides its pitch and loudness” (p. 470). To be more specific about what we are looking for, it will be defined in a more physical way as (1) the specific relation between the fundamental of a note (first partial) and its harmonics (upper partials) in spectral content, which together represent the tonal components of the signal, and (2) the relative amount of (nonharmonic) noise components of the signal and their respective spectral shape. Together with its transients (attacks and decays), timbre is not only constitutive for our recognition of the instrument played (see, e.g., Hall, 2002, pp. 107–108 and 401–403) but also for our perception of the specific character of a tone, which we typically label with terms such as “silvery,” “golden,” “warm,” or “cold.” Hence, descriptions of Fritz Kreisler as the violinist with the “golden tone” (Hartnack, 1993, p. 137) ultimately seem to be descriptions of his timbre.
Given that sonic gestures, at least up to the early 20th century, were deeply rooted in regional performance traditions, they often are the main subject of instructive editions. Articulation, portamento, and even vibrato, for instance, have been historically indicated by specific symbols or fingerings (see, e.g., Brown, 1999; Vollmer, 2023b), but their actual execution usually cannot be retraced and described without a close analysis of recordings. This becomes even more crucial with regard to changes in timbre, where detailed descriptions in written sources are the exception rather than the rule (for the former see, e.g., Flesch 1934).
Recordings, on the other hand, are both products and starting points of resonating systems, whose effect on incoming frequencies for the most part is linear and static (unless the playback speed modulates, which would render the device useless for music and sound reproduction). As the theory of linear time-invariant (LTI) systems states, the linear influences of any complex device on a signal can be described by its IR or, equivalently, by its complex FRF, which is the Fourier transform of the IR. When the IR of a system is known, an established way of computationally simulating this system is to apply the IR to an input signal by means of convolution. Alternatively, with the input signal transferred to the frequency domain by Fourier transform, convolution can be replaced by multiplication of the FRF with the input signal's spectrum. Both processes are equivalent and yield the same result. Since working in the frequency domain is computationally much simpler, it has become the standard implementation. Furthermore, for displaying the system's properties in a way that can be easily interpreted with respect to music, the FRF is the appropriate representation, as it shows directly which frequencies are affected and, thus, gives information about the audible results.
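The equivalence of time-domain convolution and frequency-domain multiplication can be illustrated with a minimal sketch. It assumes NumPy is available; the input signal and the decaying impulse response are made-up placeholders, not the measured IRs of this study, and the function names are hypothetical.

```python
import numpy as np

def apply_ir_time(signal, ir):
    """Apply an impulse response by direct (linear) convolution."""
    return np.convolve(signal, ir)

def apply_ir_freq(signal, ir):
    """Apply the same impulse response by multiplying spectra (FRF route)."""
    n = len(signal) + len(ir) - 1        # length of the linear convolution
    frf = np.fft.rfft(ir, n)             # FRF = Fourier transform of the IR
    spectrum = np.fft.rfft(signal, n)    # spectrum of the zero-padded input
    return np.fft.irfft(spectrum * frf, n)

rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)       # placeholder input, not a violin recording
ir = np.exp(-np.arange(64) / 8.0) * rng.standard_normal(64)  # made-up decaying IR

y_time = apply_ir_time(signal, ir)
y_freq = apply_ir_freq(signal, ir)
print(np.allclose(y_time, y_freq))       # both routes agree up to rounding error
```

Zero-padding both inputs to the full output length is what makes the spectral product correspond to a linear rather than a circular convolution.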
The Claim of a “Phonograph Effect”
A general shift in violin vibrato practice during the first two or three decades of the 20th century is as evident in early recordings as it has often been emphasized in research. Robert Philip (1992, 2004), rediscovering the enormous potential of early recordings for musicological purposes, claimed almost three decades ago that while violinists in the tradition of the 19th century “used the vibrato as an occasional ornament, applied only to notes which required expressive emphasis” (1992, p. 209), violin vibrato from the 20th century on followed a “general trend towards greater power and intensity” (p. 97). As vibrato grew substantially in variety and reached new heights as an essential means of expression, portamento simultaneously experienced a substantial decline (see Philip 1992, pp. 143, 155–178). As can be traced in recordings, the practice of gliding pitches was employed frequently during the early 20th century, but from the 1930s at the latest, it was reduced significantly (Katz, 2006, p. 211), at least by violinists born around the year 1900 (Leech-Wilkinson, 2009, ch. 5, paragraphs 40–46, especially Figures 7 and 8, even though Leech-Wilkinson points to a surprising rebound in the practice of violinists born from around the 1970s onward). Finally, as for articulation, Philip stated that “[t]here was also a very general tendency [during the early 20th century], in patterns of long and short notes, to lengthen the long notes and hurry and lighten the short notes” (Philip 1992, p. 70). More recent studies building on Philip's initial observations, such as Brown (1999, 2022), Leech-Wilkinson (2006, 2009, 2011), Milsom (2020), Gebauer (forthcoming), or the contributions to the volumes by Cook et al. (2009) and Moreda Rodríguez and Stanović (2023), to name but a few, have contributed significantly to a much more detailed picture but in general confirmed the tendencies in violin performance observed by Philip (for some markedly different developments in the performance practices of other instruments as well as in singing, see, e.g., Billiet, 2008; Kennaway, 2014; or, most recently, Sprau, 2023).
Katz (1999, 2004, 2006, 2010) proposes “phonograph effects” to be the main reason for these crucial shifts. In the case of violin vibrato, he suggests “that its development was not—or at least not solely or at first—tied to aesthetic considerations” (2010, p. 100); moreover, Katz wants to “maintain that sound recording was the most direct cause, and perhaps the only necessary condition for the rise of the new vibrato” (Katz 2010, p. 107). Based on, among others, the early research of Carl Seashore (1932 et seqq.) and his colleagues (“intensity vibrato”), Katz grounds this thesis in three presumptions: “First, it helped accommodate the distinctive and often limited receptivity of early recording equipment. Second, it could obscure imperfect intonation, which is more noticeable in a recording than in a live setting. And third, it could offer a greater sense of the performer's presence on a record, conveying to unseeing listeners what body language and facial expressions would have communicated in concert” (Katz, 2010, p. 102).
These assumptions seem to be based on the observation of Seashore's colleagues that a loudness vibrato may be induced by and even be confused with a pitch vibrato, both of which potentially raise the presence of the violinist on record (Seashore, 1932).
As for portamento, Katz emphasizes that “[i]ts decline reveals the profound and underappreciated impact of sound recording on performance practice” (2006, p. 230), leading to the claim that “[t]he real force behind the change was the microphone” (p. 226). Here again, he suggests three possible “phonograph effects”: (1) the repeatability of recordings, which may have counteracted the spontaneity of portamenti and made them “sound calculated or contrived” (p. 225); (2) the introduction of the microphone around 1925, which potentially boosted formerly inconspicuous aspects of performance (such as unintended glissandi) (p. 226); and finally, (3) a “secondary phonograph effect” (p. 227): since recording fostered the “new” vibrato(s) as well as “stricter rhythmic practices” (p. 227), portamento, which on string instruments is usually executed without vibrato and, according to Philip (2004, p. 111), blurs precise beats, no longer had a place in modern violin performance.
It has to be noted that researchers have repeatedly pointed to a lack of evidence to back these claims (most recently, e.g., Bork, 2020, p. 150). Moreover, Katz's strong focus on technology seems to obscure the fact that the shifts in violin performance practice at the turn of the 20th century were due to a much broader range of aesthetic, economic, technological, and social developments (for a comprehensive survey see, e.g., Milsom, 2020). However, other authors have lately pointed to an abundance of sources that strongly indicate a substantial influence of mechanical recording devices and studio environments on various performance elements, namely articulation, dynamics, or vibrato, leading to adjustments of the performance in order to deal with the manifold limitations of the early devices (Hähnel & Martensen, 2019; Martensen, 2019; Schaper, 2021, pp. 154–159).
Recent Research on Mechanical Sound Reproduction
Research on the influence of recording and playback devices on sound and, in particular, on the reproduction of musical performances started as early as 1890 (Kob & Weege, 2019, p. 338) and has attracted growing interest, particularly within the last few years. Aside from the 2009 Cambridge Companion to Recorded Music (Cook et al., 2009), a present-day awareness of the peculiarities of historical recordings is reflected in the most recent book Early Sound Recordings. Academic Research and Practice (Moreda Rodríguez & Stanović, 2023), building on (and substantially carrying forward) the findings of the Centre for the History and Analysis of Recorded Music (CHARM), founded in 2004, among others. As essential parts of the book, the contributions of David Milsom (2023) and of Adam Stanović (2023, p. 248: “to what extent can we really trust early recordings?”) provide insightful surveys on the informative value of historical recordings. Milsom, building on Inja Stanović's seminal research on the reenactment and production of mechanical recordings at the University of Huddersfield (UK) (also see Stanović & Stanović, 2021; Stanović & Billiet, 2023), conclusively suggests “that the acoustic technology is so very primitive it is unable to pick up even quite substantial differences of sound” (Milsom, 2023, p. 114, referencing the 1903 recordings of Joseph Joachim). However, even though Moreda Rodríguez and Stanović claim that recent “successful collaborative projects” in the area typically took place “mostly outside English-speaking musicology” (2023, p. 5), a number of contributions to their book seem not to have taken notice of these projects, or at least not to have applied rigorous source criticism to the mechanical recordings addressed, and to expressive gestures in particular (see, for instance, Kaufman, 2023, p. 54, claiming without any substantiation that “expressive portamento is less affected by the technical limitations of the recordings”). Even Daniel Leech-Wilkinson's elementary Changing Sound of Music (2009), frequently quoted by the authors as supporting evidence, largely lacks further empirical studies to back the numerous claims made concerning fundamental acoustic and psychoacoustic aspects (which in fact were barely available back then; see chapter 3.1, in particular §§21–28).
Yet, some crucial research in this regard was carried out lately as part of the Technologies of Singing project in Detmold, Germany (2016–2019, hereinafter ToS); see Note 2. Kob and Weege (2019; see also Kob et al., 2018; Martensen et al., 2015; Weege et al., 2018), building on an LTI approach, claimed that in mechanical recordings of singing the transfer path from musicians in the studio to listeners in front of their playback devices had in fact “significant impact on various properties of the voice signals” (Kob & Weege, 2019, p. 335). Broadly speaking, input signal frequencies that excite strong inherent resonances of the recording and reproduction system(s) will be either boosted or attenuated when played back. This applies to both the phonograph (wax cylinder) and the gramophone (shellac disc) technologies, for their “main elements in the path of the sound […] are roughly the same” (p. 337). The gramophone setup investigated by Kob and Weege resulted in an FRF with a strong attenuation below 150 Hz, then rising steeply with sharp boosts and cuts on single frequencies (resonances) up to a maximum at around 800 Hz, followed by a decline until a sharp high-cut at around 2.4 kHz (cf. Kob & Weege, 2019, p. 346, Figure 10). This means that singers’ voice formants and articulation properties not only became remarkably modified but also sometimes were simply not represented when exceeding the reproduction range of the device. In fact, given a nominal tuning of A4 = 440 Hz, fundamental frequencies roughly below D♯3 as well as partials and noise above B7 were virtually cut out. When applied to bowed instruments, these ranges correspond to pitches below the viola register; that is, the lower three violoncello strings (C2, G2, and D3) as well as the complete double bass register up to the left hand's third position on the G2 string would lack any fundamental frequencies. In turn, the audibility of these pitches relies only on the psychoacoustic residual tone effect.
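To relate the note names mentioned above to frequencies in hertz, the following minimal sketch may help. It assumes 12-tone equal temperament, which is our assumption for illustration (the text only specifies A4 = 440 Hz); the helper name `note_frequency` is hypothetical.

```python
# Semitone distance of each pitch class from A within the same octave.
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}

def note_frequency(name, octave, a4=440.0):
    """Equal-tempered frequency (Hz) of a note, given the tuning of A4."""
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return a4 * 2.0 ** (semitones / 12.0)

# Open violoncello strings, plus the two boundary notes named in the text.
for name, octave in [("C", 2), ("G", 2), ("D", 3), ("A", 3), ("D#", 3), ("B", 7)]:
    print(f"{name}{octave}: {note_frequency(name, octave):.1f} Hz")
```

Under this assumption, D3 lies just below 150 Hz and D♯3 just above it, which matches the statement that the three lower violoncello strings fall into the strongly attenuated region while the A3 string does not.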
Ultimately, according to Kob and Weege, the transfer path may also affect elements of performance practice in general and sonic gestures in particular: “Since the variation of the fundamental frequency immediately changes all harmonics of the voice sound accordingly, [pitch] vibrato, ornaments and glides [i.e., portamenti] will affect the whole spectrum of voice signals” (2019, p. 337), and in reverse, “the modifications of the voice formants […] could be interpreted as articulatory expression that never was intended or performed by the singer. An ornament such as vibrato or glide could be enhanced or reduced […] without any intention of the singer.” (p. 348). Hence, researchers should be aware that certain elements of a recording might be due to the technology, not the performance itself (p. 349).
Two other recent studies sought to examine the impact of early recording devices on the reproduction of singers’ vibrato in particular, specifically in recordings made in the Edison wax cylinder phonograph process. Glasner and Johnson (2020) chose an approach similar to Kob and Weege (2019) by reconstructing the transfer path of an Edison Home phonograph in combination with a “New Edison Recorder,” but aiming more closely at possible modifications of vibrato parameters caused by the technology. By having 20 professional Western opera singers sing into both the historical horn and flat-response omnidirectional microphones simultaneously, they found that recordings in the Edison process resulted in slight discrepancies in range, jitter, and shimmer (cf. Teixeira et al., 2013) compared with the modern recording process. However, given a just noticeable difference (JND) of 2.9 cents for a sinewave of 500 Hz, the measured mean discrepancy of only 3 cents in vibrato range seems too small to actually be perceived by most listeners (p. 11).
Hähnel and Martensen (2019), as part of the ToS research group, conducted an examination of Thomas A. Edison's studio recording policy and, more precisely, the potential impact of Edison's recording process on singers’ vibrato in the early 20th century. Building on measurements of up to 10 dB in level increase triggered by a pitch vibrato when passing a simulated mechanical transfer path (p. 40), they infer (among other things) that “pitch vibrato is likely to trigger artificial and audible volume changes” (p. 43). Applied to the scope of our study, this leads to a remarkable possibility: a pitch vibrato on the violin, if reproduced by means of early recording technologies, may result in steep volume modulations complementing those due to the resonances of both the violin corpus and the recording room.
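How a pitch vibrato can turn into a level modulation when it passes a resonant device can be sketched in a few lines. This is a deliberately simplified illustration, not the simulation used in the studies cited: it assumes a single second-order resonance and a quasi-static view (the level at each instant follows the resonance curve at the momentary frequency), and all parameter values (780 Hz tone, ±50 cents vibrato, resonance at 800 Hz, Q = 20) are hypothetical.

```python
import math

def resonance_gain_db(freq, f0, q):
    """Level (dB) of a single second-order resonance at frequency freq."""
    r = freq / f0
    magnitude = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)
    return 20.0 * math.log10(magnitude)

def vibrato_level_swing_db(center, extent_cents, f0, q, steps=200):
    """Peak-to-peak level change while the pitch sweeps +/- extent_cents."""
    levels = []
    for i in range(steps + 1):
        cents = -extent_cents + 2.0 * extent_cents * i / steps
        freq = center * 2.0 ** (cents / 1200.0)   # momentary vibrato frequency
        levels.append(resonance_gain_db(freq, f0, q))
    return max(levels) - min(levels)

# Hypothetical example: a tone near a sharp resonance of the playback chain.
swing = vibrato_level_swing_db(center=780.0, extent_cents=50.0, f0=800.0, q=20.0)
print(f"level swing near the resonance: {swing:.1f} dB")

# The same vibrato far away from any resonance barely changes in level.
flat = vibrato_level_swing_db(center=780.0, extent_cents=50.0, f0=8000.0, q=20.0)
print(f"level swing far from the resonance: {flat:.2f} dB")
```

The comparison shows the principle only: the closer a vibrato sweeps to a narrow resonance, the larger the induced level variation, whereas the same pitch oscillation in a flat region of the response leaves the level nearly untouched.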
Aims and Hypotheses
As can be inferred from the studies considered previously, the most momentous effects of early recording and reproduction devices on recorded sound included major level modifications as results of the devices’ complex interactions with the signals’ pitch variations. However, the results of all existing studies known to us in this area were bound to single gramophone setups and are therefore of limited value for generalizations. As pointed out by Kob and Weege (2019), for instance, their findings “are based on measurements performed on specific recording and reproduction setups. […] individual resonances will have different central frequencies and damping if using another recording setup or when the media is played on a different gramophone model” (pp. 346–347). This, of course, pertains to the sound representation of sonic gestures as well; and it becomes even more important if gramophones of different years of construction and diverging stages of development are considered. Regarding vibrato and portamento, we derived our working hypotheses directly from Katz's propositions:
Ultimately, these hypotheses pose some central questions for performance research in general: How much of the vibrato, portamento, articulation, and change in timbre we hear in mechanical recordings was actually due to the studio environment? What if recordings were not “neutral witnesses” of changing attitudes but rather influential protagonists themselves? In reverse: (How far) can we trust recordings when it comes to reconstructing performance practices of the late 19th and early 20th centuries?
Materials and Methods
Reference Signal: Reenacting Kreisler's 1911 “Liebesleid” Performance
Our reenactment of Kreisler's 1911 recording consisted of three steps: (1) A close analysis of his recorded performance, particularly concerning his use of sonic gestures; (2) the identification of instruments and acoustic circumstances as close as possible to the original recording situation, taking into account the resources available at the Stuttgart University of Music; finally, (3) three rehearsal sessions and the recording of the reenactment.
Approaching the first step, some general characteristics of Kreisler's performance practice, as reflected in more recent analyses (for instance, his continuous, wide vibrato and his use of portamentos and harmonics especially during repeated phrases; see, e.g., Füri, 2009; Schmidt, 2022; Meyer & Vollmer, forthcoming; Vollmer, forthcoming), were considered and discussed with the reenactment musicians. Subsequently, an original shellac copy of Kreisler's 1911 recording was acquired (see Note 5) and a linear, digital transfer was made by means of a modern turntable specially equipped for shellacs and variable playback speeds. In accordance with Zwarg (2018, p. 130), a playback speed of 77.7 rpm was chosen, which corresponds to a mean standard pitch of A4 = 435 Hz for the piano. Kreisler's open A string, on the other hand, is at 433 Hz.
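The relation between playback speed and pitch can be made concrete with a short calculation. The following Python sketch is an illustration only: it assumes that the reproduced frequency scales proportionally with rotational speed, and the nominal speed of 78 rpm is a hypothetical comparison value that is not taken from our measurements. The function names are likewise illustrative.

```python
import math

def cents(f_ref: float, f: float) -> float:
    """Interval from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

def pitch_at_speed(f_at_ref: float, rpm_ref: float, rpm: float) -> float:
    """Pitch heard when a disc is played at `rpm` instead of `rpm_ref`.

    Assumption: frequency scales linearly with rotational speed.
    """
    return f_at_ref * rpm / rpm_ref

# Values from the text: piano A4 = 435 Hz at 77.7 rpm, violin open A at 433 Hz.
violin_vs_piano = cents(435.0, 433.0)

# Hypothetical comparison: the same disc played at a nominal 78 rpm.
a4_at_78 = pitch_at_speed(435.0, 77.7, 78.0)

print(f"violin A relative to piano A4: {violin_vs_piano:.1f} cents")
print(f"piano A4 at 78 rpm: {a4_at_78:.2f} Hz ({cents(435.0, a4_at_78):+.1f} cents)")
```

Under these assumptions, the violin's open A lies about 8 cents below the piano's A4, and a speed deviation of only 0.3 rpm already shifts the pitch by several cents.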
Following that, the transfer was analyzed in spectrogram view, making use of the software Sonic Visualiser (SV) (Cannam et al., 2020). SV's spectrogram settings and a fast Fourier transform (FFT) hop size of 2,048 samples have been chosen in accordance with Vollmer (2023a, 2023b). As shown in Figure 1, Kreisler's use of vibrato and portamento in particular, but also of articulation (sound attack, sustain, and decay), can easily be traced this way, whereas changes in timbre are rather difficult to follow. This is due to the complex mixture of sounds in polyphonic repertoire, which cannot be adequately decomposed based on waveform or spectrogram visualizations alone. Since timbre is nonetheless important to determine, for example, Kreisler's choices of strings (G3, D4, A4, or E5, respectively), analysis in these cases was supplemented by close listening and a reenactment of fingering options on the violin. 5 Apart from that, Kreisler's tempo choices as well as dynamics have been measured using the SV manual time instant layer for bar onsets and the Mazurka Project's “MzPowerCurve” plug-in (Sapp, 2009) for changes in volume level. Subsequently, the analysis results were documented in the form of a full transcription of the performance in score (see Figure 2 and Vollmer, 2021a).

The beginning of Kreisler’s 1911 “Liebesleid” recording, bars 1 (with upbeat) to 13 (see Sound example 1, Appendix, Folder 3). Spectrogram view in Sonic Visualiser; x-axis is time in seconds, y-axis is frequency in hertz. The violin is easily recognizable through its largely oscillating tones. See Kreisler's manifold choice of broad portamenti, for example, as indicated in A and B. Kreisler's mean vibrato was found to be continuous, moderately fast (mean at 7 Hz), and medium wide (40–60 cents in amplitude).

Excerpt from the transcription of Kreisler’s 1911 recorded “Liebesleid” performance (Vollmer, 2021b, p. 1).
As for the second step, some information about the musical instruments and the recording studio from the original 1911 performance had to be gathered. According to Tully Potter (2017), Kreisler owned at least six violins at that time, of which his Guarneri “del Gesù” (built in around 1741 in Cremona, known today as “Hart, Kreisler”; cf. Tarisio, 2019) was most likely played during his recording sessions between 1904 and 1916.
Of particular interest is a portrait photograph by Aime Dupont for the New York Times from 1912, showing Kreisler with this respective Guarneri violin (see Figure 3), possibly by the time he made another “Liebesleid” recording for Victor. The photograph contains additional information of relevance for Kreisler's sound at that time: his choice of strings. As can be seen, the two upper strings (E5 and A4) are of blank gut, whereas the lower two strings (D4 and G3) seem to be of wound gut. Correspondingly, Stephen Redrobe (cf. Potter, 2017) and Mark Katz (2023) assume that Kreisler used gut E strings until the end of World War I, based on the fact that Armour (a U.S. maker of gut strings) published a testimonial from Kreisler in 1917.

Kreisler in 1912, holding his Guarneri, the “Hart” (Aime Dupont, 1912, New York Times), presumably played at his 1911 “Liebesleid” recording. E5 and A4 strings are of blank gut, D4 and G3 strings seem to be of wound gut.
Due to the limited resources available for this study, borrowing an original Guarneri (or even the “Hart, Kreisler”) was not feasible. Instead, we found a violin of similar design to the “Hart,” built in the early 20th century by an unknown maker from Markneukirchen, Germany (see Figure 4). Although we were unable to run comparative acoustic tests (since, to the best of the authors’ knowledge, no linear recordings of the “Hart” are currently available), we came close to the original at least in terms of violin construction this way. As for the strings, Köpp (2019) pointed out that gut string cores before World War II consisted of sheep gut—not cow gut, as commonly used in historically informed performance—which are rarely manufactured today. This is crucial, since sheep and cow gut strings differ remarkably in terms of sound qualities and handling. Luckily, we were able to obtain the last remaining set of blank sheep gut strings traditionally manufactured by Wolfgang Frank of the company EFRANO (Markneukirchen, Germany 6 ; also see Köpp & Kainzbauer, 2019). For the reenactment, we used the blank E5 and A4 EFRANO strings, whereas for the D4 and G3—since not wound in the EFRANO set—we applied silver wound cow gut strings made by another brand. Since most of Kreisler's 1911 “Liebesleid” was played on the E and A strings, this inexactness seemed tolerable to us.

Left: The original Guarneri “del Gesù” (“Hart, Kreisler”), built around 1741 in Cremona. Photograph from around 1910 (cf. Tarisio, 2019). Right: Violin built in the early 20th century by an unknown maker in Markneukirchen, Germany, played for the 2019 reenactment. Sheep gut strings by EFRANO; for the reenactment, D4 and G3 were replaced with silver wound cow gut strings.
As for his bows, Kreisler mostly preferred Tourte models built by Tubbs, Hill, or Pfretzschner (Lochner, 1981, p. 357). Furthermore, he is said “to have the bow hair very taut and anecdotal evidence suggests that he often did not loosen it between gigs” (Lochner, 1981, p. 357). According to both Langner (1980) and Schwarz (2001), while performing, Kreisler executed a uniquely elegant yet exceptionally economic playing close to the bridge, with as much pressure as necessary to control the tone. For the reenactment, we were able to provide a historical Tourte bow, built by Hill around 1900, and instructed our violinist on Kreisler's bowing practice in accordance with Potter (2017) and Schwarz (2001). Additional guidance was given by Christine Busch, a professor for historically informed violin performance practice at the Stuttgart University of Music and Performing Arts.
As can be inferred from historical accounts and photographs, recording studios in the “mechanical era” tended to be arranged as acoustically dry as possible, with the musicians as close and directed as possible to the horn in order to avoid “interference of any sort” (Anonymous, 1918; also see Martensen, 2019). Therefore, we chose an acoustically insulated recording chamber at the Stuttgart University of Music and Performing Arts with extremely low sound reflection by means of deadening ceiling, floor, and walls. Since the room was very small, we equipped it with a stand-up piano (instead of a grand piano) in the back and placed the violin in the middle of the room (see Figure 5). Following our FRF approach, instead of using historical devices, we applied modern microphones with highly linear recording patterns: an omnidirectional Earthworks M30 in front of the violin, and, to compensate for the missing grand piano, a directed spot microphone (Neumann KM184) to raise the volume of the stand-up piano, if needed. In accordance with our pitch analysis of the 1911 recording (see above), the piano was tuned to A4 = 435 Hz (equally tempered), whereas the violin was tuned to A4 = 433 Hz.

Acoustically insulated recording studio at the Stuttgart University of Music and Performing Arts (“Sprecherkabine”, room 9.14), equipped for the reenactment with a stand-up piano and a music stand (S). An omnidirectional room microphone M1 (Earthworks M30) was placed next to the violin, a directed spot microphone M2 (Neumann KM184) above the piano; both are highly linear in response. Room height approximately 2.5 m.
For the reenactment, two trained musicians with experience in historically informed performance practice were engaged: Violinist Johannes Brzoska and pianist Sophia Weidemann carried out three rehearsals prior to the reenactment to familiarize themselves with the original 1911 sound file, its transcription in score (Figure 2), and the historical instruments. The recording was conducted on November 16, 2019, and consisted of two takes: one with a gut string setup as described previously (see Sound example 4, Appendix, Folder 4), the other with modern steel core strings (Sound example 8, Appendix, Folder 4).
Recording Chain: Reconstructing Kreisler's 1911 Recording FRFs
Our modern and largely linear recording of the reenactment now had to be transferred into a sound as close as possible to the historical original. The basic principle is depicted in Figure 6 and consists of two major steps. The first step is to simulate the recording chain. In the historical original, this chain was fully mechanical and consisted of the following elements: (1) an adequate horn to pick up the sound field, adapted to the music, the performing instruments and the acoustical situation; (2) a tone-arm, working as a waveguide; and (3) a soundbox with a needle. The soundbox converted the soundwave into the vibration of a diaphragm. The needle, attached to the diaphragm by means of an elaborate mechanism, wrote the signal into a rotating wax disc. The engraved disc served as the basis for later duplication processes into shellac discs. Every component of this path has a specific influence on the signal and, thus, contributes to what we find as the characteristics of the recording chain. Physically, this means that certain frequencies are boosted or attenuated due to resonances—constituting the so-called linear effects—but there is also nonlinear distortion and added noise.

Basic source–target–filter model for simulation of the 1911 Kreisler sound. The filter is represented by its FRF (see the text). “∗” denotes convolution.
We examined two different ways to identify the appropriate FRF for a simulation of the Kreisler recording chain: a synthetic method and an analytic one. A rigorous synthetic method would mean building a physical replica of a historical recording chain and measuring its IR, either in parts, by determining an FRF for each of its modules separately and combining them mathematically, or in its entirety, by assembling the modules into a working recording machine and measuring the full actual path. This approach has been followed by the ToS research group at Erich Thienhaus Institute, Detmold. Although their results have not yet been published, we had the chance to work with an experimental IR from the group. This IR represents the measurement of a complete physical reproduction of a historical recording chain incorporating a conical horn. Conical horns were the models used most often for instrument recording during that time period, as shown by contemporary images (see Figure 7). However, an abundance of different horns with rather diverse acoustic properties were built and used during that time. This makes it very unlikely that a system can be found whose IR exactly matches the sound of our Kreisler recording.

Violinist Jan Kubelík in a recording session, 1913.
Therefore, an analytical method was developed: By simplifying certain aspects, it is possible to obtain the approximate characteristics of the historical recording chain right from the recording itself. The first key to this is the reenactment recording (Stuttgart, 2019), which serves as a reference signal; the second key is to transfer the original recording (the historical Kreisler recording from 1911) into a digital system by means of modern reproduction devices. This procedure allows us to extract the recorded signal from the shellac disc without the influence and coloration of historical reproduction devices (i.e., a gramophone), resulting in a signal that only carries the characteristics of the original 1911 recording chain.
The idea behind this is as follows: Let us assume the two acoustical music signals, taking place in front of the 1911 and 2019 recording devices, respectively, are sufficiently similar regarding their long-term frequency content (the distribution of energy over the acoustic spectrum). This is provided by a congruency of, among others, the recorded piece, choice and tuning of instruments, instrumental balance, and performance decisions such as similar tempi, as given by our 1911 and 2019 recordings. If we further assume as a first approximation that the influences of the respective recording rooms are also rather similar (or at least that their influence is much smaller than that of the historical recording device), then the spectral difference between the Kreisler recording (as imprinted into the historical disc) and the modern reenactment (which has been digitally recorded) is dominated by the FRF of the historical recording chain. Furthermore, it is reasonable to assume that the frequency responses of the modern recording devices, namely the microphones and the digital recording interface, are linear (i.e., have a flat FRF). While this is not exactly true, the error is insignificant compared with the strong resonances of the historical recording FRF.
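The core of this reasoning, that the ratio of two long-term average spectra approximates the magnitude response of the intervening system, can be sketched in a few lines of Python. This is a toy illustration under strong assumptions and not the procedure documented in the online repository: the source is a synthetic periodic signal (so that every analysis frame contains exactly one period), the "historical chain" is a hypothetical two-tap filter, and all function names are illustrative. Real recordings would additionally require windowing, spectral smoothing, and noise matching.

```python
import cmath
import math
import random

def ltas(signal, n):
    """Long-term average power spectrum: mean |DFT|^2 over non-overlapping frames of length n."""
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    bins = n // 2 + 1
    power = [0.0] * bins
    for frame in frames:
        for k in range(bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            power[k] += abs(acc) ** 2 / len(frames)
    return power

def estimate_frf_db(source, target, n, eps=1e-12):
    """Magnitude response in dB, estimated as the ratio of target to source spectra."""
    src, tgt = ltas(source, n), ltas(target, n)
    return [10.0 * math.log10((t + eps) / (s + eps)) for s, t in zip(src, tgt)]

# Synthetic "reenactment": one random period of n samples, repeated (assumption).
random.seed(0)
n = 32
period = [random.uniform(-1.0, 1.0) for _ in range(n)]
source = period * 8

# Hypothetical stand-in for the historical chain: a two-tap low-pass filter.
h = [1.0, 0.5]
target = [sum(h[j] * source[i - j] for j in range(len(h)) if i - j >= 0)
          for i in range(len(source))]

# Skip the first period so that the filter transient is excluded.
frf_db = estimate_frf_db(source[n:], target[n:], n)
print([round(g, 2) for g in frf_db])
```

In this idealized setting the estimate reproduces the filter's magnitude response at every bin; with real performances, the estimate is only as good as the assumption that both signals share the same long-term spectral content.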
To illustrate this approach, Figure 8 explains the formation and composition of the sound of both the original Kreisler recording and the reenactment. The signal path has been analytically split into subsystems whose influences on the sound can be determined, must be estimated, or were neglected. A detailed mathematical explication of this approach can be retrieved from the study's online repository (Vollmer & Bolles, 2024; see also Appendix, Folder 7). In order to make the spectrum of the 1911 record available, a linear digital transfer was made by Christian Zwarg (Truesound Transfers, Berlin 7 ; see Sound example 1, Appendix, Folder 3). For this process, it was played back on an adapted modern turntable, equipped with a needle suitable for shellac discs and an electric mono pick-up system.

Signal flow and recording chain subsystems of the 1911 and 2019 signal paths. Comparison of model components and their assumed properties.
Another important component of the historical sound is noise. Naturally, the amount of noise present in the 1911 recording is significant. When measuring the 1911 signal spectrum, the spectral energy of this noise must be taken into account, because it would otherwise impair the calculation of the 1911 recording device's system FRF. As the noise mostly originates from the imperfect cutting and mechanical scanning of the groove on the disc, it has not undergone the filtering of the acoustical recording path and thus introduces high-frequency content that would not occur in the filtered music signal. Both the horn and the soundbox have strong low-pass characteristics, which we aim to measure; these are masked by the noise. Two options arise: In order to make both recordings comparable, either the 1911 recording must be denoised, or suitable noise must be added to the 2019 recording before measuring its spectrum. Denoising, even with the most modern algorithms, is by no means an accurate process, whereas adding noise in a sufficiently accurate quantity and spectral distribution, as present in the 1911 recording, is easy and works effectively: A noise signal that meets the requirements can be recorded from the empty end groove of the respective disc. We therefore decided on the latter method.
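The masking effect described here can be illustrated numerically. In the following Python sketch, all band levels are hypothetical values chosen for illustration (they are not measurements from this study), and signal and noise are assumed to be uncorrelated, so that their powers add.

```python
import math

def power_sum_db(*levels_db):
    """Level of the sum of uncorrelated components, each given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels_db))

# Hypothetical band levels in dB (illustrative numbers only).
bands = ["500 Hz", "2 kHz", "8 kHz"]
source_db = [-20.0, -25.0, -35.0]   # modern reference recording
chain_db = [0.0, -15.0, -40.0]      # assumed true response of the recording chain
noise_db = [-50.0, -45.0, -40.0]    # surface noise, not filtered by the chain

naive_estimate_db = []
for band, src, chain, noise in zip(bands, source_db, chain_db, noise_db):
    disc_db = power_sum_db(src + chain, noise)   # what is measured on the disc
    naive_estimate_db.append(disc_db - src)      # estimate that ignores the noise
    print(f"{band}: true {chain:6.1f} dB, naive estimate {naive_estimate_db[-1]:6.1f} dB")
```

In the band where the noise dominates, the naive estimate stays far above the assumed true attenuation, which is why the noise content of both recordings has to be matched before their spectra are compared.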
Eventually, as soon as the 1911 recording system's FRF has been adequately determined, it even becomes possible to restore a version of Kreisler's original 1911 sound (within narrow limits, as the surface noise will still be preserved; also see Limitations): By inverting the system's FRF and applying it to Zwarg's largely linear transfer of the 1911 disc, an idea of Kreisler's nonconvolved sound in the 1911 studio may be obtained.
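In terms of magnitude responses expressed in decibels, inverting the FRF amounts to a change of sign. The following Python sketch illustrates this with hypothetical values. The optional limit on the maximum boost is an assumption added here for illustration (it is not described as part of our procedure); it merely shows why the restoration works "within narrow limits", since any boost applied to strongly attenuated bands raises the surface noise as well.

```python
def inverse_frf_db(frf_db, max_boost_db=None):
    """Invert a magnitude response given in dB; optionally cap the boost (assumption)."""
    inverse = [-g for g in frf_db]
    if max_boost_db is not None:
        inverse = [min(g, max_boost_db) for g in inverse]
    return inverse

# Hypothetical band-pass-like recording response in dB (illustrative values).
recording_frf_db = [-30.0, -5.0, 0.0, -12.0, -40.0]

inverse_db = inverse_frf_db(recording_frf_db, max_boost_db=20.0)
restored_db = [g + inv for g, inv in zip(recording_frf_db, inverse_db)]

print("inverse :", inverse_db)
print("restored:", restored_db)
```

Bands that were attenuated by less than the cap are restored to 0 dB, whereas the outermost bands remain attenuated.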
Playback Chain: Determining 20 Historical Gramophone FRFs, 1901–1933
After having determined a method for simulating the recording chain, the next step was the measurement and simulation of appropriate playback devices. This basically works the same way as previously: by determining the frequency response of a reproduction gramophone and by calculating the respective filter function. In order to represent a wide range of different playback systems from the mechanical era, 16 different gramophone models with varying horns and soundboxes, together forming 20 unique reproduction chain setups built between 1901 and 1933, 8 were acoustically measured on November 28, 2019, in Tiel, the Netherlands (for a full list and the corresponding gramophone IRs/FRFs, see Appendix, Folder 1; gramophone backgrounds according to Oakley & Proudfoot, 2011).
For this purpose, the original 1911 Kreisler recording was played back on each setup while the outputs were recorded with two highly linear condenser microphones as shown in Figure 9 (one Neumann KM184, one Earthworks M30). The KM184 was positioned directly at the horn of each gramophone (M1). Due to this close placement and the virtually frequency-independent cardioid polar pattern of the microphone, we assumed the FRFs measured here to be less influenced by room reflections and reverb. However, room modes at that position may still have affected the results. Moreover, it should be considered that this can also be regarded as a near-field position, where small placement variations might yield drastic changes of the FRF measured. By contrast, the Earthworks M30 microphone with its omnidirectional directivity was placed farther away (M2) and was meant to pick up an FRF that more closely resembles a room impression. The recording process was monitored by two expert listeners (M.A. degrees in music performance), placed at different distances in front of the gramophones (L1/L2).

Gramophone recording setup. Two microphones, M1 (Neumann KM184) and M2 (Earthworks M30), and two listeners, L1 and L2, at the denoted distances. Small room approximately 3 m × 5 m × 2.6 m, lightly furnished with a table, shelves, and curtains, with a carpeted floor. One window. For gramophone setups, see Appendix, folder 1 (“GrList”).
With the recordings from all gramophones as output measurements and with the linear, digital transfer of our piece as a reference signal, it was now possible to determine the FRF and IR of each gramophone, analogous to the procedure described in the previous section (for a more detailed explication of these steps, taking into consideration potential deviations from “wow and flutter” too, see Vollmer & Bolles, 2024; see also Appendix, Folder 7). Finally, the IRs were exported as audio files (WAV format), which can be used with typical convolution effect plug-ins within any audio processing software to simulate the respective reproduction devices. For the following steps, the linearly transferred 2019 reenactment recording was convolved with Kreisler's 1911 recording chain IR and the virtual gramophone models’ IRs by means of the software Reaper. Surface noise, gained from the start and end groove of the 1911 disc while played back on the respective gramophones (thus, already entailing both the recording and respective reproduction filters), was added after convolution. This establishes the similarities between the 1911 and 2019 recordings and ensures that noise with suitable characteristics is used. Eventually, the results were loudness normalized to −23 LUFS.
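The order of operations described above (convolution with the recording chain IR, convolution with a gramophone IR, then addition of surface noise) can be sketched as follows. The Python code is a minimal illustration with hypothetical toy signals; it does not reproduce the Reaper workflow or the loudness normalization. It also shows that convolving with both IRs in sequence is equivalent to a single convolution with their combined IR.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def add_noise(signal, noise):
    """Add noise sample by sample, looping the noise if it is shorter than the signal."""
    return [s + noise[i % len(noise)] for i, s in enumerate(signal)]

# Hypothetical toy data (not measured IRs).
dry = [1.0, 0.0, 0.0, 0.5]            # stand-in for the reenactment recording
recording_ir = [1.0, -0.5]            # stand-in for the 1911 recording chain IR
gramophone_ir = [0.5, 0.25, 0.125]    # stand-in for one gramophone IR
surface_noise = [0.01, -0.01]         # stand-in for groove noise

# Two convolutions in sequence ...
sequential = convolve(convolve(dry, recording_ir), gramophone_ir)

# ... equal one convolution with the combined IR of the full chain.
combined_ir = convolve(recording_ir, gramophone_ir)
single = convolve(dry, combined_ir)

# Noise is added only after convolution, because it already carries both filters.
output = add_noise(single, surface_noise)
print(combined_ir)
print(output)
```

Adding the noise after convolution avoids filtering it a second time, which matches the rationale given above for using noise recorded from the respective gramophones.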
Measuring the Effects on Sonic Gestures
To trace the consequences of the recording and playback chains on the representation of sonic gestures, we chose a measurement approach making use of digital signal processing, parametrized by gesture type. To do so, the respective transfer chains’ IRs had to be combined first by multiplication of the 1911 recording FRF with a gramophone FRF. The resulting full mechanical reproduction chains were then applied as effects in a digital audio workstation to the reenactment recording as well as to some additional audio. The latter featured short samples of various isolated expressive means, such as a few seconds of a spiccato section, broad portamenti on every string of the violin, or various vibrato rates and ranges on each note on a chromatic scale covering the whole range of the Markneukirchen violin (see Appendix, Folder 6). These samples were recorded by Johannes Brzoska immediately after the reenactment, making use of the same materials (violin, strings, bow) and studio setup as before (cf. Figure 5). Table 1 provides an overview of the performance parameters measured in these samples before (“nonconvolved”) and after (“convolved”) convolution with a full reproduction chain (Kreisler’s original 1911 recording combined with Gramophone No. 1).
Performance parameters measured in convolved and nonconvolved samples of sonic gestures (convolution chain: Kreisler 1911 and Gramophone No. 1).
a Measured for sound decomposition within polyphonic samples; cf. Results and Discussion.
To examine the effects on articulation, we selected two bowing techniques which may have been used interchangeably for certain musical passages, depending on interpretation and speed of playing (tempo): détaché (long, separated bow strokes) and spiccato (short, bouncing bow strokes). These two techniques were used as case examples to test whether the quality of translation and, hence, the probability of correct recognition of playing techniques (in general) may be impaired by the mechanical reproduction. To do so, we analyzed the respective samples’ spectra before and after convolution. In a second step, we added device-related surface noise (recorded from the respective gramophone in question) to the audio in order to adjust to an authentic signal-to-noise ratio (SNR).
As has been previously concluded from Kob and Weege (2019), a general property of all gramophone transfer paths is the irregularity of resonances. Therefore, an inherently pitch-changing gesture such as portamento can be expected to undergo a device-induced level modulation. To trace this, 28 instances of portamento (seven on each violin string) from the additional audio samples were convolved with the IR of a selected full mechanical transfer path (again, Kreisler’s original 1911 recording, combined with Gramophone No. 1). Following that, a three-step MATLAB analysis was conducted. First, the level of each slide over time was calculated before and after convolution by a modified root mean square (RMS) measurement (time constant: 50 ms). The difference between both level trajectories was determined, resulting in a level-modulation curve showing the isolated effect of the transfer path. Second, the pitch trajectory was detected for each slide, using the “SWIPE” pitch detection algorithm by Camacho and Harris (2008) in MATLAB. Although each portamento sample was executed bidirectionally (i.e., up and down), only the downward part was selected for further processing since it had the most continuous progression. Third, the level-modulation curve of each selected part was mapped onto its corresponding pitch detection curve to obtain a level-versus-pitch assignment. The pitch curve was carefully interpolated beforehand to be fully continuous (gap free) and monotonically decreasing, assuring an unambiguous assignment to the level curve.
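The three analysis steps can be illustrated with a simplified Python sketch. It is not the MATLAB script used in this study: the "modified" RMS measurement is approximated here by a one-pole exponential average of the squared signal (an assumption), the pitch trajectory is known by construction instead of being detected with SWIPE, and the transfer path is replaced by a hypothetical pitch-dependent gain.

```python
import math

def rms_level_db(signal, fs, time_constant=0.050, floor=1e-12):
    """Running RMS level in dB using exponential averaging (assumed form of the measurement)."""
    alpha = math.exp(-1.0 / (time_constant * fs))
    mean_sq, levels = 0.0, []
    for x in signal:
        mean_sq = alpha * mean_sq + (1.0 - alpha) * x * x
        levels.append(10.0 * math.log10(mean_sq + floor))
    return levels

def level_vs_pitch(dry, wet, pitch_hz, fs):
    """Pair each pitch value with the level change introduced by the transfer path."""
    dry_db = rms_level_db(dry, fs)
    wet_db = rms_level_db(wet, fs)
    return [(f, w - d) for f, d, w in zip(pitch_hz, dry_db, wet_db)]

# Synthetic downward slide from 880 Hz to 440 Hz (toy data).
fs, n_samples = 8000, 4000
pitch = [880.0 - 440.0 * i / n_samples for i in range(n_samples)]
phase, dry = 0.0, []
for f in pitch:
    dry.append(math.sin(phase))
    phase += 2.0 * math.pi * f / fs

# Hypothetical "transfer path": doubles the amplitude above 660 Hz.
wet = [x * (2.0 if f > 660.0 else 1.0) for x, f in zip(dry, pitch)]

curve = level_vs_pitch(dry, wet, pitch, fs)
for f, delta in curve[1600::800]:
    print(f"{f:6.1f} Hz: {delta:+.2f} dB")
```

Because the pitch curve decreases monotonically, each pitch value occurs only once, so the level difference can be assigned to it unambiguously. The 50 ms averaging introduces a short lag after abrupt level changes.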
As Hähnel and Martensen (2019) have suggested, the vibrato of singers can undergo a significant AM when passing through a mechanical audio device chain. Since in that case the strength of the effect is dependent on specific voice formants and their interaction with gramophone formants (or resonances, respectively), it shall be investigated here whether the formant and overtone structure of the violin can lead to such a modulation effect too. For this purpose, the additional violin audio samples containing chromatic scales with an increasing pitch vibrato (FM) on each note of the four strings were convolved with a full reproduction chain (Kreisler 1911, combined with Gramophone No. 1). To measure the effects on AM, the samples were processed by a MATLAB script, determining the difference in AM before and after convolution. For this purpose, a momentary RMS level (time constant: 50 ms) was calculated before and after, and the difference was returned. The amount and speed of the vibrato oscillations were also measured.
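Why a pitch vibrato can turn into a level modulation may be illustrated with a deliberately simple model. The Python sketch below is not the measurement script used here: it assumes a single second-order resonance with hypothetical center frequency and quality factor, considers only one partial, and treats the vibrato quasi-statically (the partial's level simply follows the resonance curve, so the vibrato rate does not enter the result).

```python
import math

def resonance_gain_db(f, f0, q):
    """Magnitude in dB of a second-order resonance with center f0 and quality factor q."""
    r = f / f0
    magnitude = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)
    return 20.0 * math.log10(magnitude)

def vibrato_am_depth_db(f_center, extent_cents, f0, q, steps=200):
    """Peak-to-peak level change of one partial over a full vibrato cycle (quasi-static)."""
    levels = []
    for i in range(steps):
        deviation = extent_cents * math.sin(2.0 * math.pi * i / steps)
        f = f_center * 2.0 ** (deviation / 1200.0)
        levels.append(resonance_gain_db(f, f0, q))
    return max(levels) - min(levels)

# Hypothetical resonance at 1.1 kHz with Q = 10; vibrato amplitude of 50 cents.
near = vibrato_am_depth_db(1050.0, 50.0, f0=1100.0, q=10.0)   # partial on the resonance flank
far = vibrato_am_depth_db(300.0, 50.0, f0=1100.0, q=10.0)     # partial far below the resonance
none = vibrato_am_depth_db(1050.0, 0.0, f0=1100.0, q=10.0)    # no vibrato

print(f"near resonance: {near:.2f} dB, far from resonance: {far:.2f} dB, no vibrato: {none:.2f} dB")
```

In this model, the same vibrato produces a level modulation of several decibels when a partial lies on the flank of a resonance, but almost none when it lies far from it. The effect therefore depends on how the partials of each note coincide with the resonances of the device chain.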
To measure the effect of the chains on string timbre and, in particular, Kreisler's original 1911 sound, the gut string version of the 2019 reenactment was compared spectrographically with the steel string version, making use of the software SV for spectrograms and of iZotope Insight for waterfall diagrams (i.e., 3D spectrograms). Finally, in addition to string timbre, we applied spectrograms in order to obtain some information on the piano sound as well as on interinstrumental balance. This seemed indispensable as in Kreisler's “Liebesleid,” the accompanying piano contributes to the music (and, therefore, to the recordings’ frequency contents) to a considerable degree. However, due to the scope of this study (exclusively focusing on violin gestures), we refrained from more detailed examination of the piano sound apart from its effects on instrumental balance.
Results
Reenactment and Recording Chain: Restoration of the 1911 Kreisler Sound
While rehearsing for the reenactment recording, violinist Brzoska reported that Kreisler's portamenti and articulation were a lot easier to reproduce on the sheep gut strings than on modern strings, since the former would respond faster and be more flexible to the bow strokes. Kreisler's continuous vibrato, on the other hand, was remarkably harder to imitate because it demanded more power than on a modern violin setup. Up to and including the recording, Kreisler's vibrato therefore could not be fully realized. A demonstration video, published on YouTube, allows a direct comparison of the original 1911 Kreisler performance with the outcomes of the reenactment (Vollmer & Bolles, 2021; also available by entering “Liebesleid 1911 / 2019” in YouTube's search bar; cf. Sound examples 4–7, Appendix, Folder 4).
The FRFs resulting from the spectrum measurements of the reenactment and the original 1911 Kreisler recording are given in Figures 10–12. Comparing source and target spectra in Figure 10, it can be seen that from the rich and full spectrum of the performance (source) only a fraction in the mid-range remains in the historical target. One may expect the historical recording chain to exhibit a distinct bandpass characteristic, which is exactly what we find for the source–target–filter FRF displayed in Figure 11. The main passband spans from 300 Hz to 1.5 kHz, which is only a little more than two octaves; we also observe several resonances typical for horn–soundbox combinations of mechanical recording setups (cf. Weege et al., 2018, p. 1709): in the attenuated low-frequency area between 100 and 300 Hz, within the passband at 350 Hz, 650 Hz, and 1.1 kHz, and a rather mild soundbox resonance around 2.8 kHz. Parts of these resonances may also represent the tonal differences between the respective violins, which most likely have slightly different construction-related resonances. The low-frequency resonance around 60 Hz is probably related to mechanical rumble in the historical recording. From 4 kHz on, a plateau is reached. This is most likely due to noise that masks other information in this range. Finally, the result from applying the inverted 1911 recording device's system FRF (Figure 12) to a transfer of the 1911 disc (experimentally restoring Kreisler's original sound) can be retrieved from the online repository (Sound example 3, Appendix, Folder 3) and is discussed in the following.

Long-term average spectra of source (reenactment, reference) and target (digital transfer of 1911 recording). Smoothed with 1/6-octave interpolation. Noise was added to the reenactment recording prior to spectral analysis to match the noise-induced high-frequency content of the historical recording.

Result of filter-matching calculation: estimated FRF of the 1911 Kreisler recording chain setup; raw calculation result (light gray) and smoothed with 1/6-octave interpolation (black).

Inverse FRF of the 1911 Kreisler recording chain setup, serving to approximately restore the original acoustical frequency response of the 1911 performance.
Reproduction Chain: Gramophones, 1901–1933
In the following, three gramophones, whose FRFs display substantial differences, are depicted for direct comparison (Nos. 1, 10, and 17; Figures 13 and 14). In addition, we compared all 20 gramophone systems’ FRFs from the collection systematically in order to determine whether there is any trend or similarity which might have contributed to a technology-borne “phonograph effect.” This was done by calculating the average of all 20 curves in MATLAB for each microphone position. The whole ensemble as well as the resulting averages can be seen in Figure 15.
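The averaging step can be sketched as follows. The Python code is an illustration with hypothetical curves and does not reproduce the MATLAB calculation: in particular, it assumes that the curves are averaged as decibel values per frequency bin, which is only one of several possible conventions.

```python
def average_frf_db(curves):
    """Arithmetic mean of several magnitude responses (in dB) per frequency bin (assumed convention)."""
    count = len(curves)
    return [sum(values) / count for values in zip(*curves)]

def ripple_db(curve):
    """Peak-to-peak variation of a magnitude response in dB."""
    return max(curve) - min(curve)

# Hypothetical responses of three devices on a shared frequency grid (dB).
device_a = [12.0, 0.0, -6.0, 0.0]
device_b = [0.0, 12.0, 0.0, -6.0]
device_c = [-6.0, 0.0, 12.0, 0.0]

average = average_frf_db([device_a, device_b, device_c])
print("average:", average)
print("ripple of average:", ripple_db(average), "dB")
print("ripple of single devices:", [ripple_db(c) for c in (device_a, device_b, device_c)], "dB")
```

When the resonances of the individual devices fall on different frequencies, the averaged curve is considerably flatter than any single response; a feature that survives in the average therefore points to a property shared by the devices or the measurement room.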

FRF of Gramophone No. 1. Comparison of close, cardioid (KM184, left) and distant, omnidirectional (M30, right) microphone results. From 600 Hz and above, the characteristic resonances are located at identical frequencies. When the broad dip between 150 and 500 Hz is neglected, this range is also quite similar in shape, apart from a room antiresonance at 300 Hz on the right.

Comparison of FRF graphs of gramophone No. 10 (HMV cabinet type) and 17 (HMV portable type), gained from the close position measurement. For further details on the models, see Appendix, Folder 1, “GrList.”

FRFs of 20 analyzed gramophone systems for both microphone positions (upper row) and the respective average curves (bottom row). Left: KM184, close position; right: M30, room position. The resulting averages represent a combination of device and room properties.
The FRFs of the three selected gramophones differ in many aspects: While No. 1 shows an overall high-pass character (attenuation of lowest frequencies) and several strong resonances, No. 10 is generally more of a low-pass type with weaker resonances. The most even response is displayed by gramophone No. 17. This diversity in response properties seems to be characteristic of mechanical gramophone systems in general (see Figure 15): Although the individual FRF curves in the series (Figure 15, top row) display a wide variety of resonances, the respective average is remarkably even. This means that most peaks of the different gramophone resonances balance each other out on average, so there seems to be no typical set of resonances universally valid for all systems. When investigating the graphs for general trends, we must note that room influences have not been eliminated from the FRFs and will therefore remain present in the average too. However, it is possible to discern room contributions from actual gramophone sound characteristics by comparing the averages of the two microphone positions. It turns out that both positions’ graphs are quite similar in general. There is a distinct broad peak at 1.3 kHz and an increasingly steep low-pass decline starting from that frequency. The lower limit of the frequency range is determined by a similarly steep high-pass component with a corner frequency of around 130 Hz. Still, there are two marked differences: An analysis of the M30 plot showed additional bass resonances at 75 and 150 Hz and an additional attenuation in the mid-range centered around 330 Hz. Both effects can be explained by room resonances, for which these are typical frequency ranges. As the closer positioned KM184 average is relatively less prone to room influences, these graphs are chosen for discussion in the following.
Effects on Sonic Gestures
Figure 16 shows the combined frequency response curves of two full mechanical recording and reproduction paths, representing the quality range between an early model and one of the most “linear” models we found in the collection. Compared with the close-to-linear reproduction of today's audio equipment, there is an enormous deviation from the ideal, linear transfer curve.

Combined FRFs of full acoustic transfer paths: estimated Kreisler 1911 recording path (both) plus reproduction path of Gramophone No. 1 (left; table gramophone, around 1912) and reproduction on Gramophone No. 17 (right; portable gramophone, around 1925).
Articulation
Examining the effects on articulation, two exemplary samples (close-to-linear recordings, no filtering) from the set of additional audio files (see Appendix, Folder 6) have been considered: (1) A few repetitions of one note (B5, approx. 1 kHz) played in détaché strokes on a violin (on the E-string), directly followed by (2) the same notes played in spiccato. We found that when a transient-rich signal such as spiccato is filtered by the strong low-pass characteristics of the gramophone FRF, its constituent high-frequency content is considerably weakened, reducing the prominence of the transient. The high-frequency attenuation also weakens characteristic noise components such as those found in détaché. Nevertheless, this did not make it entirely impossible to discern these bowing techniques on even the “darkest” sounding gramophones, i.e., those with the least pronounced high-frequency response. As illustrated by the waterfall diagrams of Figure 17, détaché and spiccato are still distinguishable due to their structural differences in the time domain: We see many similarities in the spectra, but the spiccato notes with all their harmonics are separated more clearly by small but distinct gaps in time (obvious in the frequencies from about 5,000 Hz upwards). This did not change under the application of the mechanical transfer path. However, the differences almost disappeared as soon as the device-related surface noise was added to the convolved samples, as shown in Figure 18.
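The effect of added noise on the gaps between separated notes can be sketched as follows. All ingredients are assumptions for illustration: the gated 1 kHz tone stands in for spiccato strokes, a short moving average stands in for a measured low-pass IR, white noise stands in for surface noise, and the 10 dB SNR is an arbitrary choice rather than a value from the study.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Return (signal + noise, noise) with white noise scaled to the given SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(noise_power / np.mean(noise ** 2))
    return signal + noise, noise

fs = 16000
t = np.arange(fs) / fs
# Hypothetical test signal: four 1 kHz bursts separated by silent gaps.
gate = (np.floor(t * 8) % 2 == 0)
dry = gate * np.sin(2 * np.pi * 1000.0 * t)

# Hypothetical low-pass IR (8-tap moving average) in place of a measured path.
ir = np.ones(8) / 8.0
wet = np.convolve(dry, ir)[: dry.size]

rng = np.random.default_rng(0)
noisy, noise = add_noise_at_snr(wet, snr_db=10.0, rng=rng)
measured_snr = 10.0 * np.log10(np.mean(wet ** 2) / np.mean(noise ** 2))

def contrast_db(x):
    """Level difference between the sounding bursts and the gaps between them."""
    return 10.0 * np.log10(np.mean(x[gate] ** 2) / (np.mean(x[~gate] ** 2) + 1e-20))

print(f"SNR: {measured_snr:.2f} dB")
print(f"burst/gap contrast without noise: {contrast_db(wet):.1f} dB")
print(f"burst/gap contrast with noise:    {contrast_db(noisy):.1f} dB")
```

In this sketch the filtering alone leaves the gaps almost empty, whereas the added noise fills them and sharply reduces the level contrast on which the visual distinction between the two bowing styles rests.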

Waterfall diagrams (3D spectrograms) of détaché (left) and spiccato strokes (right), unaltered. Samples identical to Figure 17. Axis explanation: Time grows from right to left, frequency grows from top to bottom, height displays level.

Waterfall diagrams (3D spectrograms) of détaché (left) and spiccato (right) with added noise, convolved with full transfer path (1911 Kreisler recording chain, 1912 HMV gramophone reproduction chain [No. 1]). Axis explanation: Time grows from right to left, frequency grows from top to bottom, height displays level.
Portamento
Figure 19 shows two examples of a level-versus-pitch diagram after convolution of the represented portamentos with the full mechanical recording path (for further results, see Appendix, Folder 6). These two examples are largely representative of the effects occurring in all portamento samples: Irregular level modulations with a range of up to 20 dB have a severe effect on any part of the portamento slide during its progression, resulting in slides significantly uneven in level. However, it must be mentioned that the consistency among the examples was rather weak. The measured level difference before and after convolution at a specific pitch may vary considerably. Whereas some resonances in the mechanical transfer path have an effect on every pass and, therefore, appear in most graphs, other areas seem to be affected by additional factors, such as varying phase and level relations of the harmonics in each example.
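Why such level-versus-pitch curves differ from a frequency response can be sketched with a harmonic tone passing a single resonance. The resonance shape, its centre frequency of 1.3 kHz, the quality factor, and the 1/k harmonic roll-off are all assumptions chosen for illustration, not measured values.

```python
import numpy as np

def resonance_gain(freq_hz, f0=1300.0, q=8.0):
    """Hypothetical single-resonance magnitude response (linear gain)."""
    ratio = np.asarray(freq_hz, dtype=float) / f0
    return 1.0 / np.sqrt((1.0 - ratio ** 2) ** 2 + (ratio / q) ** 2)

def level_change_db(fundamental_hz, harmonic_amps):
    """Overall level change of a harmonic tone after passing the filter."""
    amps = np.asarray(harmonic_amps, dtype=float)
    orders = np.arange(1, amps.size + 1)
    gains = resonance_gain(orders * fundamental_hz)
    power_in = np.sum(amps ** 2)
    power_out = np.sum((amps * gains) ** 2)
    return 10.0 * np.log10(power_out / power_in)

# Assumed 1/k harmonic roll-off for an idealized bowed-string tone.
amps = 1.0 / np.arange(1, 11)
glide = np.linspace(300.0, 900.0, 7)  # fundamentals passed during a slide
for f in glide:
    print(f"{f:6.0f} Hz: {level_change_db(f, amps):+5.1f} dB")
```

The level rises whenever any harmonic, not only the fundamental, crosses the resonance (here, for instance, at a fundamental of 650 Hz, whose second harmonic lies at 1.3 kHz). This is one reason why the measured curves depend on the harmonic content of each sample.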

Level modification due to simulated full mechanical transfer path during portamento progression, depending on pitch. Two examples: A-string portamento (left), G-string portamento (right); executed on the 2019 reenactment violin. Simulated path: 1911 Kreisler recording chain plus Gramophone No. 1 reproduction chain. Graphs are not to be confused with frequency responses.
Vibrato
Of the convolved vibrato samples, four examples have been selected to illustrate the extent of AM that was due to mechanical reproduction alone (Figure 20): On some notes, this AM deviated significantly from the AM already present in the original (nonconvolved) samples. On others, additional AM was virtually negligible, or the original AM was even reduced.

AMs induced by pitch vibrato and further modified by mechanical transfer path (simulation: 1911 Kreisler recording chain plus Gramophone No. 1 reproduction chain). The mean amplitude of each modulation is given in the legend. Note the different pitch of each fundamental tone. Example numbers refer to those in Appendix (Folder 6).
In Example 01 (top left), the played note oscillates at a pitch of approximately 605 Hz, with an FM amount of 6–18 Hz, and a mean AM amount of about 3 dB. After convolution (at the output of the simulated transfer path), the mean AM amount is 2.2 dB larger, resulting in a 10.6 dB dynamic range of the note. Note that the phase of the amplitude oscillation is also shifted, resulting in a large difference signal. In Example 03, where the mean pitch is around 307 Hz, the AM stays virtually equal after convolution, which results in a small difference signal. Example 05, with a sound oscillating at approximately 492 Hz, shows the opposite behavior, where the mean AM amount after convolution is smaller than before. However, the phase of the AM is fully inverted, resulting in a large difference signal. Example 08 shows a note with a mean pitch of 820 Hz. Comparable to the first example, this segment shows an enormous increase of AM after convolution, which is not fully reflected by the numbers of the mean calculation, probably due to the numerical outlier in the middle of the sample. This example is particularly interesting because the original has very little AM itself and no clear phase of AM, but it is transformed into a signal with a strong and distinctly periodic AM. As in all examples, the AM is in phase with the FM from the vibrato, proving the latter is the cause and the former is the effect. This can be verified from the given pitch detection graphs.
String Timbre
As mentioned above, the 2019 reenactment included one recording with sheep gut strings and one with steel strings to investigate the effects of recordings on the timbre of strings in general and on Kreisler's “golden sound” in particular. Juxtaposing the first bars of both recordings in a spectrogram shows that, in the unconvolved version, the gut strings contained slightly heightened overtone and noise components compared with the steel strings, especially from about 10 kHz upwards, which can be both heard and seen (see Figure 21, left column; “a”). However, after convolution with the full mechanical transfer chain, these differences seemed to be almost completely eliminated (right column).

Comparison of the 2019 reenactment (bars 2–4) in spectrogram view, played on sheep gut (upper row) and steel strings (bottom row), both before (left column) and after (right column) convolution with 1911 reproduction chain IRs. The x-axis is time in seconds, the y-axis is frequency in hertz (primary y-scale on the right), color representation is level in decibels (secondary y-scale on the left). See Sound examples 4, 7, 8, and 9 in Appendix, Folder 4.
For a more detailed picture, Figure 22 again shows samples of open A4 and D4 string strokes on both gut and steel strings in waterfall visualization, with a fine time resolution set to reveal small irregularities in the sound. They suggest that sheep gut sound contains several spectral and general irregularities over time (see marks within Figure 22): (a) The tone onset is slightly creaky, whereas on the steel string, the start is rather smooth; (b) mid-range harmonics show more overall roughness; (c) higher harmonics show more chaotic granularity; (d) at certain moments the tone almost breaks entirely, which points to a greater difficulty in playing an even tone.

Sound comparison of open A4 strings made from sheep gut (left) and steel (right) in waterfall spectrogram view. Time passes from front-right to back-left; frequency grows from top to bottom, height displays level.
When convolved with the IRs of the mechanical recording and reproduction chains, the recordings’ characteristic overtone spectra above 3.5 kHz are attenuated significantly, leaving only the fundamental and first few overtones (see Figure 23). This makes it difficult to identify the type of string merely from the spectral composition. However, the temporal differences described above are still visible in the remaining frequencies (see, for instance, the gut string's creaky tone break in the middle of the left sample).

Waterfall spectrograms of sheep gut (left) and steel string (right) after convolution with full historical transfer path. Creaky tone onset and tone break of the gut string are still visible. Note that despite the visual impression, components above 3.5 kHz are not fully removed, as the level range of the spectrogram is too small to display the actual dynamic range of the sample.
Piano Sound and Interinstrumental Balance
Figure 24 shows the first three bars of the 2019 reenactment in spectrogram representation, before (left) and after (right) convolution with the reproduction chains (Kreisler 1911, combined with Gramophone No. 1): whereas the violin shows oscillating fundamentals at around 326 Hz (upbeat, E4) and 433 Hz (A4), respectively, the piano is signified by clusters of almost straight, nonoscillating “frequency bars” starting from second 3.48 at around 100 Hz (boxes “a” and “b”). As can be both heard and seen (Sound examples 4 and 7, Appendix, Folder 4), the piano sound in the nonconvolved audio (left) consists of low fundamentals with rather quiet overtones. However, after convolution, most of these frequencies appear largely suppressed, leaving only a few traces of the lower overtones that start from around 300 Hz (“b”; overtones around 1,000 Hz are even intensified, as can be seen at seconds 4.0 and 5.5). This, in turn, means that parts of the bass register in the piano accompaniment (especially the lower octaves) are barely present in gramophone reproduction (“a”) and can be heard only because of their still present overtones, by way of the psychoacoustic residue (missing fundamental) effect. By contrast, the treble register (“b”) stands out to a certain degree. However, compared with the nonconvolved audio, the convolved piano fades strongly into the background and, at some notes, is barely audible. Moreover, considering that frequencies above approximately 1,400 Hz appear virtually obliterated after convolution, it seems impossible to make robust statements on the piano's original broadband timbre.
In addition (and comparable to the observations made with regard to violin articulation), the pianist's articulation appears critically subjected to technical surface noise, as the variance of attack and decay intensities during note onset as well as the dynamic developments during sustain and release phase are substantially dampened (see, for instance, the seemingly shortened note durations at around 4.0 and 4.5 s).

Comparison of the 2019 reenactment (first three bars) in spectrogram view, before (left) and after (right) convolution with 1911 reproduction chains. Piano-related contents are marked red. X-axis displays time in seconds, y-axis displays frequency in Hz (primary y-scale on the right), color representation is level in dB (secondary y-scale on the left). See Sound examples 4 (left) and 7 (right) in Appendix, Folder 4.
Discussion
In examining Mark Katz's hypothesis of a “phonograph effect” from an acoustical point of view, we found that the convolution with the full transfer paths of 20 different mechanical reproduction chains of the years 1901–1933 caused some remarkable modifications in the reproduction of sonic gestures such as articulation, pitch vibrato, portamento, timbre, and interinstrumental balance. In the following, we discuss the results of each step (reenactment, recording chain, reproduction chains), their implications for Kreisler's original 1911 sound, and for the representation of sonic gestures in general.
Reenactment
Even though a number of variables remain unclear and, by principle of the method, escape full control (see Limitations), the reenactment of Kreisler's 1911 “Liebesleid” production gave some substantial insights into his performance practice. First, the analysis of his playing revealed a vast number of varying expressive means, which together form an abundance of sonic gestures creating musical culmination points (Vollmer, 2021b). Second (and this may serve as evidence for the limits of experimental reenactments in general), it turned out that most of these practices cannot simply be “reproduced” by modern performers, as they elude modern playing conventions: our violinist Johannes Brzoska's struggle with Kreisler's vibrato (i.e., a large, continuous oscillation, immediately from onset to offset) suggests that the latter, whether due to recording or not, did indeed reach a remarkable extent in terms of persistence and range. At the same time, modern violinists seem to be used to smaller and more selective vibrato practices again (cf., e.g., Leech-Wilkinson, 2011, p. 19), which, inverting Mark Katz's hypothesis (H1), raises the question of whether the departure from mechanical recording from around 1925 may have played a role (among numerous other possible reasons). Third, Brzoska reported that portamento was easier to execute on (sheep) gut strings compared with modern strings. Although this claim is surely subjective and should, therefore, be taken with great caution, it might point to a remarkable parallel between the decline of portamento practices and the general tendency towards steel strings from around 1917 onward (Katz, 2023; Köpp, 2019; Leech-Wilkinson, 2006; Potter, 2017).
Reproduction Chains: Restoration of the “Original” 1911 Kreisler Sound
From comparing the convolved and nonconvolved 2019 reenactment recordings to the 1911 Kreisler original both visually and aurally (Figures 10–12; Sound examples 1, 4, and 5, Appendix, Folders 3 and 4), it becomes clear how constant and non-dynamic Kreisler's tone production is. This qualifies as a technique able to cut through not only the noise of the recording medium but also a full orchestra in the concert hall.
A comparison of the 1911 Kreisler original, played back on a c. 1912 horn gramophone, with the “de-convolved” recording (i.e., the attempted restoration of Kreisler's “original sound” by application of an inverse FRF; see Sound examples 2 and 3, Appendix, Folder 3 and Figure 12) supports this theory. In the historical original, Kreisler's sound gains a specific “warmth” due to the mentioned bandpass characteristics (Figures 11, 15, 16): While lower midrange frequencies (starting at c. 400 Hz) are boosted, frequencies from c. 1.2 kHz upward, which foster the impression of noise as well as tone “brilliance,” are strongly attenuated. In this way, his sound is rendered smooth and gallant rather than sharp and direct. After convolution with the inverse filter, the gain of both lower and higher frequencies leads to the perception of a more balanced sound. Simultaneously, noise components are strongly boosted. Kreisler's violin, however, seems to change slightly in timbre, which can be explained by the hybrid nature of the inverse response function that also contains information about the reenactment violin (see Limitations). Furthermore, the listening distance seems to grow, which may partially be due to reflections from the Stuttgart recording room, but also to psychoacoustic effects of the removed bandpass resonances.
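The link between inverse filtering and boosted noise can be illustrated with a regularized inverse in the frequency domain. This is a generic sketch under stated assumptions, not the procedure used for the restoration: the IR is a hypothetical single resonance near 1.3 kHz, and the regularization constant `eps` is an arbitrary illustrative choice.

```python
import numpy as np

def regularized_inverse(ir, n_fft, eps):
    """Frequency-domain inverse of an IR with Tikhonov-style regularization."""
    h = np.fft.rfft(ir, n=n_fft)
    return np.conj(h) / (np.abs(h) ** 2 + eps)

fs = 16000
n_fft = 4096
t = np.arange(1024) / fs
# Hypothetical band-limited path: one decaying resonance near 1.3 kHz.
ir = np.exp(-400.0 * t) * np.sin(2 * np.pi * 1300.0 * t)

h = np.fft.rfft(ir, n=n_fft)
peak = np.max(np.abs(h))
h_inv = regularized_inverse(ir, n_fft, eps=1e-3 * peak ** 2)

freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
restored = np.abs(h * h_inv)                       # 1.0 means fully undone
inverse_gain_db = 20.0 * np.log10(np.abs(h_inv) * peak)

k_peak = np.argmin(np.abs(freqs - 1300.0))
k_high = np.argmin(np.abs(freqs - 6000.0))
print(f"1.3 kHz: restored {restored[k_peak]:.3f}, inverse gain {inverse_gain_db[k_peak]:+.1f} dB")
print(f"6.0 kHz: restored {restored[k_high]:.3f}, inverse gain {inverse_gain_db[k_high]:+.1f} dB")
```

The inverse applies its largest gain where the original path is weakest, so any noise in those regions is raised along with the signal. The regularization caps that gain at the cost of leaving the weakest regions only partly restored.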
Overall, there is much less of the “golden tone” left than in the historical original. This affects per-tone phenomena as well: Strong level peaks on single pitches, which in the historical original lead to the perception of slight tone accentuations (e.g., all D5 pitches), appear to be evened out. This leads to the impression of a dynamically more linear playing of Kreisler. The same applies to his vibrato and portamento, which both sound slightly less conspicuous and less versatile. In our opinion, Kreisler's performance sounds surprisingly less vivid this way. We therefore suggest that his 1911 “Liebesleid” actually benefitted from the modifications done by the mechanical devices, rendering Kreisler's sound and performance more versatile. This leads us to the assumption that one of the secrets behind Kreisler's “golden tone”—at least concerning his early recordings—is in the early recording technology; in other words: The famous Kreisler sound in part may be due to technical limitations of the mechanical recording devices.
Effects on Sonic Gestures
Considering the tonal range of the violin that embraces fundamental frequencies from 196 Hz (≙ G3) up to 1,568 Hz (≙ G6) plus a wealth of upper harmonics, it is safe to assume that the combined filtering of mechanical reproduction devices has a significant, irregular impact on timbre, level, and loudness of every note of a violin performance, varying strongly from device to device. This is caused by manifold resonances with irregular center frequencies and varying magnitudes as well as by a general, relatively narrow bandpass characteristic (Figure 15), which, for the violin, results in an attenuation of lower register fundamentals and of the harmonics in the mid to upper register. The two FRF graphs in Figure 16 show the quality range between an early model and one of the most “linear” models we found in the collection. Although the latter certainly reflects some technical progress toward a more linear sound reproduction, the general restrictions mentioned above still hold true for both.
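The quoted range limits follow from twelve-tone equal temperament. The sketch below assumes A4 = 440 Hz (the reenactment's actual tuning may differ) and uses an illustrative upper band edge of 3.5 kHz to count how many harmonics of each note would fall below it.

```python
import math

def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

g3 = midi_to_hz(55)  # lowest open string of the violin
g6 = midi_to_hz(91)  # three octaves above G3

band_edge_hz = 3500.0  # assumed illustrative upper band edge
harmonics_g3 = math.floor(band_edge_hz / g3)
harmonics_g6 = math.floor(band_edge_hz / g6)

print(f"G3 = {g3:.2f} Hz, harmonics below the band edge: {harmonics_g3}")
print(f"G6 = {g6:.2f} Hz, harmonics below the band edge: {harmonics_g6}")
```

Under these assumptions, a low note keeps many harmonics inside the band while a high note keeps only its first two, so the same bandpass alters the timbre of different registers to very different degrees.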
Consequently, we can assume that one particular violin performance must have been perceived differently when reproduced on different devices, depending on the respective gramophone's unique resonance characteristics. This also involves the balance between the violin and accompanying instruments. While human listeners may get used to the idiosyncrasies of a particular gramophone they know, we can still assume that contemporaries of the mechanical era must have had very diverse impressions of a particular recording.
For a modern analysis of historical recordings, the problem of determining the “real” timbre of a (nonconvolved) performance by analytic, quantitative methods can only be circumvented partly, that is, for the reproduction path, by transferring the recordings with modern reproduction equipment. However, reversing the influence of the recording path still seems virtually impossible in most cases due to the vast number of unknown parameters. It should be emphasized that this imposes an essential difficulty on extracting valid data from early recordings for the purpose of performance analysis.
Articulation and Interinstrumental Balance
The variety of articulation within our samples in mechanical reproduction was affected by strong attenuations particularly in higher-frequency areas (from approximately 2 kHz upward) due to the frequency responses of the transfer paths, which entailed a damping of sound transients present in these areas. Moreover, the devices’ resonances boosted or attenuated note onsets on specific pitches, providing additional accentuations while suppressing others. This was complemented by a substantially decreased SNR due to the addition of device-related noise (cf. Figure 18). A sufficient SNR in historical recordings therefore seems essential to allow for a differentiation of various articulation or playing techniques. This also affects the characteristics of new note onsets and, therefore, various bowing techniques.
Supported by Schaper's (2021, pp. 156–157) sources and Philip's (1992, p. 70) hypothesis of a tendency towards sharper agogic accentuation and over-dotted rhythms in recordings of the early 20th century (see also Katz, 1999), this in turn suggests that musicians in historical studio situations may have played louder (resulting in a smaller dynamical range) as well as more accentuated than they would have on stage, in order to drown out device-related noise and to ensure that contrasts in articulation remained audible when reproduced on a gramophone (H3). With regard to the convolved piano sound, we found comparable evidence, with significant consequences for interinstrumental balance: Mechanical reproduction strongly dampens the lower registers of the piano while fostering those of the violin, bringing forward the melody to the disadvantage of the accompaniment. In case of artists with considerable studio experience such as Kreisler's 1911 “Liebesleid” pianist Haddon Squire, it even seems conceivable that they tried to compensate for the devices’ frequency responses by playing the bass and alto registers “louder,” or by intensifying their articulation such as in over-dotting and over-accentuating consecutive note onsets.
Portamento
Portamento, when passing through the mechanical path, can undergo significant short-term level modifications with ranges of up to a remarkable 20 dB (cf. Figure 19, Example 05) at specific frequencies traversed. Portamentos from real music examples typically have a smaller ambitus than those in the examples used here, but they would be affected accordingly, depending on the covered part of the pitch scale. However, since level boosts due to portamento glides often last for a few milliseconds only, they may pass by too fast to be perceived as being as extreme as the measurements suggest. Instead, the time-invariant attenuation of higher frequency areas leads to a general level decrease of the recording as a whole (not only of the portamento slides). This suggests that portamentos in a linear transfer sound more conspicuous compared with those in a convolved one, especially if the low SNR within mechanical recordings is also considered. It may be further concluded that the level of static (i.e., nonportamento) notes is also modified in the same irregular way depending on pitch. This causes dynamic variations in the music that are due only to the interaction of the acoustical signal with the mechanical devices, in other words, variations that have never been part of the performance. The same was observed for the small, periodic pitch variations of a vibrato (below). These observations offer supporting evidence for Mark Katz's (2006) hypothesis, which traces the decline of portamento during the first half of the 20th century back to the introduction of the microphone (and, therefore, to a more linear frequency range reproduction) as the new studio standard from around 1926 onward (H2).
Vibrato
Pitch vibrato is a pitch modulation by definition, but it also initially induces AMs (i.e., volume vibratos) due to the violin's own resonances. However, when convolved with the mechanical devices’ IRs, these modulations were either boosted (resulting in a level range of up to 10.6 dB) or attenuated (range reduced to ∼2 dB). To understand this effect, it should be noted for all examples that AM is a result of spectral components moving in and out of resonances, caused by a pitch modulation originating from the signal itself. The simulated system is time invariant and therefore does not induce AM directly. In accordance with the theory of LTI systems and convolution, pitch modulation has not been modified by our simulation, apart from small numerical deviations of the detection algorithm, as is apparent from the pitch detection graphs (cf. Figure 20). In fact, even a real gramophone, being neither a linear nor a fully time-invariant system (because of mechanical irregularities that cause wow and flutter), would not change the frequency of the incoming signal markedly unless its playback speed were modulated at a rate and by an amount of several hertz. Therefore, we state that whereas the original overall level as well as the level modulation (AM) of sounds in historical recordings should be considered indeterminable (unless the recording devices’ IRs are known), the speed and width of pitch modulation (FM), i.e., pitch vibrato, can be confidently taken to reflect the performance as it was actually played.
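The mechanism described above, a time-invariant resonance converting pitch vibrato into level modulation while leaving the pitch contour essentially intact, can be reproduced with a synthetic tone. All values are assumptions for illustration: a constant-amplitude sine at 570 Hz with ±10 Hz vibrato at 6 Hz, and a hypothetical resonance at 600 Hz in place of a measured IR.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (discrete Hilbert-transform construction)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def am_depth_db(x):
    """Peak-to-trough range of the amplitude envelope in dB."""
    env = np.abs(analytic_signal(x))
    return 20.0 * np.log10(env.max() / env.min())

def inst_freq(x, fs):
    """Instantaneous frequency in Hz from the analytic-signal phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

fs = 16000
t = np.arange(fs) / fs                      # exactly one second (periodic tone)
f_carrier, f_vib, dev = 570.0, 6.0, 10.0    # assumed pitch, vibrato rate, deviation
phase = 2 * np.pi * f_carrier * t - (dev / f_vib) * np.cos(2 * np.pi * f_vib * t)
dry = np.sin(phase)                         # pure FM, constant amplitude

# Hypothetical resonance at 600 Hz as a decaying sinusoid (time-invariant IR).
t_ir = np.arange(2048) / fs
ir = np.exp(-150.0 * t_ir) * np.sin(2 * np.pi * 600.0 * t_ir)

# Circular convolution via FFT: steady-state response of the periodic tone.
wet = np.fft.irfft(np.fft.rfft(dry) * np.fft.rfft(ir, n=dry.size), n=dry.size)

am_dry, am_wet = am_depth_db(dry), am_depth_db(wet)
f_dry, f_wet = inst_freq(dry, fs), inst_freq(wet, fs)
fm_dry, fm_wet = f_dry.max() - f_dry.min(), f_wet.max() - f_wet.min()
print(f"AM range: {am_dry:.2f} dB before, {am_wet:.2f} dB after")
print(f"FM range: {fm_dry:.2f} Hz before, {fm_wet:.2f} Hz after")
```

In this sketch the filtered tone acquires a clearly measurable AM although the input had none, while the mean pitch and the width of the pitch oscillation remain nearly unchanged.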
Somewhat contrary to Katz's presuppositions (H1), this in turn suggests that pitch vibrato does not necessarily help against the limited receptivity of the early devices from a purely acoustical point of view. However, this still does not contradict Katz's hypothesis of an increased vibrato use due to the missing visual dimension, as is apparent in recordings. Yet, while this seems to be a subject of psychoacoustical rather than acoustical studies, this hypothesis would have to apply to all forms of recordings, not only those from the mechanical era. In other words: If a violinist's vibrato raises his presence on record in general, no matter whether he performs in front of a 1911 horn or a state-of-the-art microphone, the rise of a “new vibrato” would not have been due to the technical limitations of early recording equipment, but rather to recording as such.
String Timbre
Finally, we suggest some more fundamental findings touching the debate on historical string choices in general and its connection to Kreisler's “golden tone” in particular. Based on the observations made in the comparison of convolved gut string recordings with modern steel string recordings (Figures 22 and 23), we conclude that even though violin string timbre is strongly affected by mechanical transfer paths with regard to its spectral composition, temporal aspects such as timbre changes (e.g., repeating a note on another string, switching the contact point between the bow and the string, or changing the bow pressure) remain discernible in the lower-frequency areas of the signal. Within narrow limits, this allows for distinguishing between different string materials as well as for an identification of expressive timbre changes when analyzing historical recordings of the mechanical era. Considering the temporal characteristics of gut string sound as shown in Figures 22 and 23, it can even be confirmed that gut strings were used in the original Kreisler 1911 recording (cf. Sound example 1, Appendix, Folder 3): the creakiness is both visible and audible, and the way certain portamento slides and bow changes progress also indicates the use of gut strings.
Overarching Conclusions
Altogether, the characteristics of mechanical recording and reproduction devices led to dynamical variations that have never been part of the performance. Recordings from the so-called “mechanical” era therefore reflect not only historical performance practices but also the manifold influences of the recording technologies in place. Furthermore, our observations suggest that these technologies substantially affect the representation of various interpretation styles: In the case of Kreisler's 1911 “Liebesleid” performance, large articulation variations were rendered in inverse convolution as sounding rather balanced, rather discreet portamentos as highly conspicuous, large dynamical ranges as well as warm timbres as rather small and bright. In other words, Kreisler's playing may have sounded far more linear on stage than it appears from the record. This becomes all the more crucial if the significance of Kreisler (and his “Liebesleid” performances) for contemporary violin performance is considered, as has been done by Flesch, Philip, Katz, and others. Without overstating a singular technological influence on shifting performance practices, it seems plausible to us that recordings from the beginning of the 20th century onward retroacted on musical performance: Musicians had to react to the changing expectations of their audiences, potentially by adopting the practices they heard on record. Understood this way, the shifts in gestural practices may be described as secondary “phonograph effects”: new developments in performance practice, which previously had been formed on stages, were now increasingly initiated by records.
What does this tell us about the reliability of early recordings as sources for musical performances? We suggest that they each stand for singular studio situations rather than for (generalizable) performance styles, each representing unique syntheses of performance and technology. Moreover, if an exact reconstruction of a performance from a recording is pursued, the reconstruction of the whole associated recording chain would be the first step to filter out the modifications by the technology. This, in fact, seems highly unrealistic from both a synthetic and an analytic point of view, as to the present day, we still have a very limited knowledge of the original recording devices as well as of the various recording strategies in place. A possible solution, however, may be in comparative studies across multiple recordings of the same piece by the same performer in search of recurring sonic elements, addressing generalizable distinctions between intentional and technological aspects of performance this way.
Limitations
It must be noted that uncertainties remained in every single step of the study. First and foremost, certain premises of our approach may seem somewhat circular. This particularly concerns the use of a mechanical recording as a source for Kreisler's performance practice while eventually concluding that these very sources are highly problematic. However, as stated in the introduction, this restriction has been of rather secondary importance: With the reenactment mainly serving as a representative reference signal, it was only required to formally come close to the 1911 original with regard to pitch content, duration, and the gestures of interest, corresponding to the original in a relative rather than in an absolute dimension. In other words, due to their already highly convolved audio, historical records may not serve as reliable references to pin down exact performance practices, but by recording the sounds and gestures of interest with modern equipment and subsequently convolving them with historical filters, we may deduce which parts have been further modified by the reproduction technology (and, therefore, appear manipulated in the first place already).
As for the reenactment, it turned out that certain aspects (i.e., vibrato) were not fully consistent with the 1911 model. To match the model sufficiently, however, a modern violinist would require months, if not years, of in-depth training (e.g., in Kreisler's posture and bow grip technique), which was impossible to achieve for this study. Moreover, considering instruments and rooms, we cannot determine with absolute certainty how significant the sound differences between Kreisler's Guarneri and our violin actually were, or how acoustically dry the original recording studio was. As for our attempt to restore Kreisler's original sound via a filter function (Figure 11), this implies that the room responses and instrument timbres have also been partly transformed into their respective counterparts. Comparably, this applies to nonlinear distortions of the devices and early reflections of the respective studios. Thus, apart from undoing the effects from horn and soundbox, we placed Kreisler half-way in Stuttgart, half-way in London, playing a hybrid Guarneri–Markneukirchen violin, so to speak. Accordingly, our experimental restoration of Kreisler's “original sound” (Sound example 3, Appendix, Folder 3) may in fact come close to the original, but only to a certain, ultimately nondeterminable degree.
Ultimately, due to this study's LTI approach, our findings on the potential effects of mechanical reproduction systems on sound are limited to statements on frequency-related influences. What has not been further investigated from this point of view are nonlinear modifications due to the process of mechanical recording, e.g., sound-box distortion, wax cutting, and general record surface noise. This, in turn, has consequences for statements about “musical impulses” (accents) and other transient signal characteristics: we may describe how their spectrum is influenced, but not whether they are compressed or distorted. However, since transients in music in most cases also constitute changing spectral compositions, this limitation may not be as severe as it might seem at first glance.
Implications and Future Studies
A number of claims made here, particularly regarding the potential long-term effects of early sound reproduction on performance, call for further examination from the perspectives of psychoacoustics and reception history. Although not reported here so far, a small, nonrepresentative, exploratory pilot study among undergraduate music students (N = 16), originally a central aim of this project, was conducted by the authors to gather first impressions of the potential effects of mechanical reproduction chains on the perception of modern-day listeners. Since the qualitative design of the study (mainly free-text answers) did not allow for inferential statistics, we refrain from discussing the results here. However, a number of observations pointed strongly to additional evidence for the assumptions made in the preceding discussion. A detailed report of this study is available in Vollmer (2021a; in German).
To provide a starting point for further studies, we publish the IRs of our 20 gramophone setups as well as of Kreisler's 1911 recording chain in open access (Creative Commons 4.0; see Vollmer & Bolles, 2024, and Appendix, Folder 2). As described previously, they can easily be applied as effects in any digital audio workstation. In this way, they may contribute to a broader understanding of mechanical recordings as highly ambiguous sources for performance practice research and, in particular, of the strong interconnections between musical expressivity and early sound reproduction technologies.
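Outside a digital audio workstation, applying an IR amounts to a convolution. The following is a minimal sketch under stated assumptions rather than the workflow used in this study: the function names are hypothetical, signals are assumed to be mono arrays at a common sample rate, and synthetic data stand in for the published IR files, whose names and formats are not assumed here.

```python
import numpy as np


def apply_impulse_response(signal, ir):
    """Convolve a mono signal with a mono IR via FFT (linear convolution)."""
    signal = np.asarray(signal, dtype=float)
    ir = np.asarray(ir, dtype=float)
    n = len(signal) + len(ir) - 1          # length of the full convolution
    n_fft = 1 << (n - 1).bit_length()      # next power of two, avoids wrap-around
    spectrum = np.fft.rfft(signal, n_fft) * np.fft.rfft(ir, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n]


def normalize_peak(x, peak=0.9):
    """Scale a signal so that its absolute maximum equals `peak`."""
    max_abs = np.max(np.abs(x))
    return x if max_abs == 0 else x * (peak / max_abs)


# Synthetic placeholders: in practice, `dry` would be a dry recording and
# `gramophone_ir` one of the published IRs, both loaded at the same sample rate.
rng = np.random.default_rng(0)
dry = rng.standard_normal(2000)
gramophone_ir = rng.standard_normal(100) * np.exp(-np.arange(100) / 20.0)

wet = normalize_peak(apply_impulse_response(dry, gramophone_ir))
```

Peak normalization is included only because convolution can change the overall level considerably; any comparison of levels between setups should be made before this step.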
Footnotes
Acknowledgements
We would like to thank the Stuttgart University of Music and the Performing Arts (Institute for Musicology, Music Pedagogy, and Aesthetics) for generous support with resources, which enabled this project in the first place. The 2019 reenactment could not have been realized without Johannes Broszka (violin performance) and Sophia Weidemann (piano performance), Philip Wetzler (project assistance), Christine Busch and Arne Morgner (counsel), Florian Wiek and Peter Kranefoed (piano supply and tuning), and Christian Büsen (room supply). Xenia Bömcke provided invaluable help by taking part in our gramophone recording tour to Tiel (Netherlands, 2019). Christian Zwarg (see Note 8) not only processed the linear 1911 Kreisler recording, but also provided us with invaluable feedback and sources on historical recording practices. Finally, we are grateful for the opportunity to discuss our project at the conference “Mechanical technologies and their transfers,” held in February 2022 in Berlin as part of the AHRC-funded research project “Redefining Early Recordings as Sources for Performance Practice and History” (project leaders: Karen Martensen, Inja Stanović, and Eva Moreda Rodríguez).
Action Editor
Frank Hentschel, Universität zu Köln, Musikwissenschaftliches Institut.
Peer Review
Eithan Ornoy, The Academic College Levinsky-Wingate, Music Education.
David Milsom, University of Huddersfield, Department of Music & Design Arts.
Contributorship
Johannes Broszka (violin performance), Sophia Weidemann (piano performance), Philip Wetzler (project assistance), Christine Busch (performance advice), Arne Morgner (technical advice), Florian Wiek (piano supply), Peter Kranefoed (piano tuning), Christian Büsen (room supply), Xenia Bömcke (data collector), Christian Zwarg (signal processing).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This research did not require ethics committee or IRB approval. This research did not involve the use of personal data, fieldwork, or experiments involving human or animal participants, or work with children, vulnerable individuals, or clinical populations.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
All resources and data generated for the purpose of this study, including original recordings and IRs of all recording and reproduction chains (such as Kreisler's matched 1911 recording chain and the 20 measured gramophone setups), are openly available at Zenodo.org: “In Search of the ‘Phonograph Effect’: Online Repository”, https://doi.org/10.5281/zenodo.10493795 (Vollmer & Bolles, 2024; CC 4.0 license). See also the listing of the repository's contents in the Appendix as well as the supplementary material in Vollmer and Bolles (2021) and Vollmer (2021a, ).
Notes
Correction (April 2024):
Article updated to add the reference detail of Stanović & Stanović, 2021.
Appendix
Table of online repository contents (Vollmer & Bolles, 2024).
