Abstract
Music analysis, in particular harmonic analysis, is concerned with the way pitches are organized in pieces of music, and a range of empirical applications have been developed, for example, for chord recognition or key finding. Naturally, these approaches rely on some operationalization of the concepts they aim to investigate. In this study, we take a complementary approach and discover latent tonal structures in an unsupervised manner. We use the topic model Latent Dirichlet Allocation and apply it to a large historical corpus of musical pieces from the Western classical tradition. This method conceives topics as distributions of pitch classes without assuming a priori that they correspond to either chords, keys, or other harmonic phenomena. To illustrate the generative process assumed by the model, we create an artificial corpus with arbitrary parameter settings and compare the sampled pieces to real compositions. The results we obtain by applying the topic model to the musical corpus show that the inferred topics have music-theoretically meaningful interpretations. In particular, topics cover contiguous segments on the line of fifths and mostly correspond to diatonic sets. Moreover, tracing the prominence of topics over the course of music history over
Introduction
Central questions in music theory and music information retrieval (MIR) concern the discovery of latent structures in musical pieces that abstract from the musical surface, that is, the observed notes or audio signal. Two common tasks in MIR that address these questions with respect to Western classical music are chord recognition and key finding, which, respectively, entail the retrieval of the harmonic content of a (usually short) segment of music (Müller, 2015; Korzeniowski and Widmer, 2018b; Micchi et al., 2020; McLeod and Rohrmeier, 2021) and the classification of the entire piece or larger segments into a musical key, for example, one of the 24 major and minor keys (Faraldo et al., 2016; Korzeniowski and Widmer, 2018a; Temperley, 1999; Weiß et al., 2020).
While chord recognition focuses on local and key finding on global levels, music theory suggests the existence of several intermediate levels that are frequently conceptualized as being hierarchically nested (e.g., Hauptmann, 1853; Schenker, 1935; Salzer, 1952; Lerdahl and Jackendoff, 1983; Lerdahl, 2001; Rohrmeier, 2011, 2020; Rohrmeier and Moss, 2021). A number of psychological studies provide evidence for the perceptual reality of hierarchical organization in music (Krumhansl, 2004; Tillmann and Bigand, 2004; Koelsch et al., 2013; Farbood, 2016; Herff et al., 2021) but the exact relation between theoretically postulated and perceived hierarchies is not yet fully understood.
Moreover, a clear distinction between different levels might not always be possible, which suggests that discrete categories (such as chords and keys) might not entirely capture all aspects of tonal organization. Circumventing the notion of a fixed number of discrete levels, Sapp (2001) introduces key scapes that map a key estimate provided by the Krumhansl–Kessler algorithm (Krumhansl, 1990) to all possible segments of a piece. His analyses show that stable tonal regions form over the course of tonal pieces, which he interprets as points on a continuous spectrum of increasing specificity, namely ‘key of piece’, ‘strong keys’, ‘weak keys’, ‘tonicizations’, ‘cadences’ and ‘chords’ (Sapp, 2005). This approach was extended by Lieck and Rohrmeier (2020) to infer prototypical modulation plans of musical pieces, and by Viaccoz et al. (in press), who combine key scapes with the application of the discrete Fourier transform to pitch-class distributions. Because it neither assumes a fixed number of levels nor relies on pre-defined concepts such as chords, local keys, and global keys, this more continuous view of hierarchical organization suggests using unsupervised methods to infer latent musical structures.
Several unsupervised and/or statistical methods have been successfully employed in music research for the analysis of musical structures across a large range of musical styles and genres. For instance, a number of studies have analysed Rock music with respect to harmony (De Clercq and Temperley, 2011; Temperley and de Clercq, 2013; Tan et al., 2019) and form (de Clercq, 2017). Others have studied harmony or improvised melodic solos in Jazz (Broze and Shanahan, 2013; Pfleiderer et al., 2017), harmony and form in Brazilian Choro (Moss et al., 2020b), or pitch-class distributions in atonal music (Ballance, 2020). Western classical music has been studied most extensively, with studies of harmony based on distributions of pitch classes, chord labels, or features extracted from audio (Albrecht and Shanahan, 2013; White, 2013; Moss et al., 2019; Rohrmeier and Cross, 2008; Hedges and Rohrmeier, 2011; Bellmann, 2012; van Balen, 2016). An increasing number of recent studies address questions about the historical development of Western music using data-driven approaches (Serrà et al., 2012; Burgoyne et al., 2013; Rodrigues Zivic et al., 2013; Albrecht and Huron, 2014; Gauvin, 2015; Mauch et al., 2015; Huang et al., 2017; Weiß et al., 2018; Nakamura and Kaneko, 2019; Moss, 2019; Yust, 2019; Anzuoni et al., 2021; Harasim et al., 2021).
In natural language processing as well as the digital humanities, a widely used approach for the unsupervised discovery of latent structures in textual sources is commonly subsumed under the term topic modelling (Steyvers and Griffiths, 2007; Piper, 2018), one of its most prominent variants being Latent Dirichlet Allocation (LDA; Blei et al., 2003). Taking a textual example, one would expect that a document with the topic ‘politics’ contains many names of politicians, institutions, states, or political events such as elections, wars, and so forth. It is rather unlikely that such a text would contain the names of composers, musical pieces, or music-theoretical terms such as ‘augmented-sixth chord’, ‘symphony’, ‘Passacaglia’, and the like. Topic models turn this argument around and assume that the hypothetical text is about politics precisely because it contains many words from political topics. That is, topics are defined by the frequency of co-occurrence of certain words. This is sometimes called the distributional hypothesis (Harris, 1954). Therefore, topic models define a composite model that describes documents (e.g., texts), which are represented as ‘bags of items’ (e.g., words) as mixtures of several latent topics which, in turn, are defined as distributions over the same items.
This notion of a topic translates directly to the musical case if one considers pieces to be the documents and pitch classes to play the role of words in these documents. The vocabulary is then the set of all pitch classes that appear in any document in a corpus, and topics correspond to distributions over these pitch classes that represent which pitch classes frequently co-occur in the pieces. Accordingly, a piece containing only one or a few topics can be considered tonally more coherent than one that has a larger number of topics, in the same manner that texts are thematically more coherent when they talk about fewer topics.
This translation of the LDA topic model to the case of music, however, is only structural and not semantic, in that the algorithmic structure is identical but the interpretation of basic units and topics is highly domain-dependent. Crucially, applying LDA to music does not entail that pitch classes play a role similar or even analogous to that of words in language, for the simple reason that pitch classes bear very little semantic meaning on their own. While the semantics of words are also not completely independent of their context (Baayen, 2001), musical meaning is fundamentally contextualized (Huron, 2006; Koelsch, 2012; Schlenker, 2017). In the following sections, we thus do not claim that pitch classes in music possess functions similar to those of words in language but that the topics defined by the LDA model attain their meaning purely through the collocation of pitch classes in musical pieces, and that they need to be interpreted accordingly. Arguably, somewhat larger units in music – such as melodic fragments, motives, or short harmonically consistent segments – constitute more meaningful basic units. They are, however, not explicitly represented in the music and must be identified by manual or automated analysis, which in itself is a difficult task associated with some degree of ambiguity. Relying on ‘atomic’ pitch classes as basic units thus makes the interpretation of the topics less straightforward but circumvents the problem of inferential uncertainties of higher-order musical patterns. Moreover, the particular representation of pitch classes that we employ entails some notion of membership of a certain scale, which aids in the interpretation of the topics obtained (see below for more details).
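The document-and-vocabulary analogy described above can be made concrete with a minimal sketch. The two toy "pieces" and the ASCII spelling convention (`#` for sharps) are assumptions made purely for illustration; they are not taken from the corpus.

```python
from collections import Counter

# Hypothetical toy pieces: lists of spelled pitch classes (octave removed).
pieces = {
    "piece_a": ["C", "E", "G", "C", "F", "A", "G", "B", "D", "C"],
    "piece_b": ["F#", "A#", "C#", "F#", "E#", "G#", "C#"],
}

# Bag of notes: only the counts survive, the order of notes is discarded.
bags = {name: Counter(notes) for name, notes in pieces.items()}

# Vocabulary: every pitch class that occurs in any document of the corpus.
vocabulary = sorted(set().union(*bags.values()))


def relative_frequencies(bag):
    """Turn the counts of one piece into a pitch-class distribution."""
    total = sum(bag.values())
    return {pc: n / total for pc, n in bag.items()}


print(vocabulary)
print(relative_frequencies(bags["piece_a"]))
```

In this representation, a topic would be one more distribution over the same vocabulary, and a piece is described by how strongly each topic contributes to its counts.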
LDA for symbolic representations of music
Whereas topic models are ubiquitous in the digital humanities, in particular in the context of literary studies (e.g., Blei, 2012b; Underwood, 2012; Jockers and Mimno, 2013; Goldstone and Underwood, 2014; Rhody, 2012; Piper, 2018), there are to date only a few applications of the LDA topic model to music that are not based on textual data (such as metadata or lyrics) but that take the note content of musical pieces into account.
Mauch et al. (2015) use LDA for harmonic and timbral descriptors of features derived from audio recordings of Pop music and study the historical evolution of these topics between 1960 and 2010. Lieck et al. (2020) introduced the Tonal Diffusion Model (TDM), which is conceptually related to LDA but also incorporates information about interval relations between pitch classes given by the topology of tonal space to model the distributions of pitch classes in pieces by Bach, Beethoven and Liszt. It essentially explains the occurrence of tonal pitch classes by tracing them back to a common tonal centre through paths on the Tonnetz (Cohn, 1998) that are given by interval combinations of ascending and descending perfect fifths, major thirds and minor thirds, each also associated with certain inferred weights. This allows them, for a given piece, to infer the most likely combination of interval weights that gave rise to the piece’s pitch-class distribution according to the particular assumptions of their model.
Most closely related to the present study is the work by Hu and Saul (2009a, 2009b). They interpret musical keys as topics and use LDA to infer tone profiles (in the sense of distributions over the 12 enharmonically equivalent pitch classes) from pieces. They evaluate the inferred topics against the tone profiles provided by Krumhansl and Kessler (1982) and use their topics to trace key changes over the temporal course of a musical piece. Unfortunately, they do not report the numerical values of their profiles so that a direct comparison with the other profiles is impossible. The present approach differs from theirs in several regards.
First, they use a relatively small sample of manually selected pieces. Their dataset spans a historical range of approximately 260 years, whereas the number of pieces and composers, as well as the historical range of the corpus supporting the present study, is much larger (see ‘Corpus and representation of notes’).
Furthermore, in Hu and Saul’s (2009b) approach, the basic units of the model are not individual pitch classes but short segments of music that contain a certain amount of temporal information, which is not the case here. This allows them to trace changes of key (modulations) in the pieces and evaluate these assignments against the predictions by the Krumhansl–Kessler key-finding algorithm (Krumhansl, 1990). Our approach, in contrast, traces the presence and absence of latent topics on a historical time scale and interprets this as reflections of underlying changes in compositional procedures that bring about changes in tonality.
The two topics found by Hu and Saul (2009a) resemble the major and minor profiles by Krumhansl and Kessler. Their major profile in particular, however, emphasizes the notes of a major triad rather than those of a major key: pitch classes 0, 4 and 7 receive high weights (possibly an artefact of the segment level that they introduce), the other in-scale pitch classes have very low weights, and pitch class 10 is stronger than pitch class 11 (the leading tone). This again suggests a more fine-grained conception of latent musical structure than just chords and keys.
Most importantly, their data is encoded in Musical Instrument Digital Interface (MIDI) format, which represents only 12 distinct pitch classes and therefore cannot distinguish enharmonically equivalent notes. In contrast, the dataset used here encodes the exact spelling of the notes (but without octave information), which is sometimes called the tonal pitch-class representation (Temperley, 2000; Moss et al., in press), leading to a larger vocabulary of pitch-class types and enabling enharmonic distinctions, for example, between F
Corpus and representation of notes
The corpus used in this study is the Tonal Pitch-Class Counts Corpus (TP3C; Moss et al., 2020a). It consists of 2,012 musical pieces by 75 composers encoded in MusicXML format, spanning a range of almost 600 years and containing more than 2.7 million notes in total. It is available at https://github.com/DCMLab/TP3C/ and it was assembled from a range of different sources, such as scores from the Electronic Locator of Vertical Interval Successions (ELVIS) project and the Humdrum **kern scores of the Center for Computer Assisted Research in the Humanities (CCARH), as well as files from public repositories such as the Choral Public Domain Library (CPDL) or the community page of the MuseScore notation software, whereas others have been transcribed at the Digital and Cognitive Musicology Lab (DCML).
A full list of the pieces and sources used is given in (Moss, 2019). The MusicXML encoding allows representing notes as tonal (spelled), as opposed to enharmonically equivalent, pitch classes. Using the chromatic circle or the circle of fifths would thus discard potentially valuable information, for instance, enharmonic differences between notes (e.g., between C and B

Schematic depiction of the tonal pitch classes on the line of fifths mapped to integers in
Note, in particular, that we represent pieces as absolute pitch-class distributions, that is, we do not transpose them to a common centre, for example, the tonic of a piece. The main rationale for this decision is that defining or inferring a tonic to which all pitch classes can be related is not equally feasible for all historical periods in our corpus. It is certainly appropriate for common-practice compositions (roughly from the Baroque to the early Romantic periods) to draw on the concept of a global tonic, but earlier Renaissance pieces based on modality as well as late-Romantic or Modernist pieces in the idiom of extended tonality may employ different notions of tonal centres. Moreover, the general assumption of transpositional equivalence has been called into question, in particular on historical grounds (Quinn and White, 2017; Rom, 2011). It is important to bear this in mind, in particular when interpreting our results. Since we base our study on pitch-class distributions from untransposed pieces, the inferred topics will likewise reflect underlying absolute pitch-class distributions and thus, for instance, allow us to draw conclusions about the absolute prevalence of certain scales, keys, or modes (in the modal music sense) in the corpus. It would, in principle, be possible to adapt the present study to the case of relative pitch classes by first estimating for each piece its tonal centre and transposing all of its pitch classes accordingly. The results and their interpretation would necessarily change since one would essentially ask a different research question, and we plan to explore this avenue in our future work.
Consider the first movement of Charles Valentin Alkan’s Concerto for Solo Piano, op. 39, no. 8 (1857). The distribution of tonal pitch classes in this piece on the line of fifths is shown in Figure 2. The colours emphasize the ordering of the line of fifths: their intensity indicates the distance from the central D, and blue and red hues indicate the direction towards more flat or sharp tonal pitch classes, respectively (see Figure 1). Note that this piece contains more than 12 different tonal pitch classes and spans a range from F
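The difference between tonal (spelled) and enharmonically equivalent pitch classes can be illustrated with a small sketch that places spelled note names on the line of fifths. Two choices here are assumptions for illustration only: D is used as the origin (index 0), and note names are written in ASCII with `#` for sharps and `b` for flats. The function names are hypothetical.

```python
import re

# Positions of the natural note names on the line of fifths, relative to D.
# Using D as the origin is an assumption made for this sketch.
NATURALS = {"F": -3, "C": -2, "G": -1, "D": 0, "A": 1, "E": 2, "B": 3}


def line_of_fifths(name: str) -> int:
    """Map a spelled pitch class such as 'F#' or 'Bb' to an integer."""
    match = re.fullmatch(r"([A-G])(#*|b*)", name)
    if match is None:
        raise ValueError(f"not a spelled pitch class: {name!r}")
    letter, accidentals = match.groups()
    # Each sharp moves 7 steps up the line of fifths, each flat 7 steps down.
    shift = accidentals.count("#") - accidentals.count("b")
    return NATURALS[letter] + 7 * shift


def chromatic_pitch_class(name: str) -> int:
    """Collapse a spelled pitch class to one of 12 pitch classes (C = 0)."""
    # One step on the line of fifths is 7 semitones; D is pitch class 2.
    return (2 + 7 * line_of_fifths(name)) % 12


for note in ["C", "B#", "F#", "Gb"]:
    print(note, line_of_fifths(note), chromatic_pitch_class(note))
```

Enharmonically equivalent spellings receive different line-of-fifths indices but the same chromatic pitch class, which is exactly the information that a 12-class representation discards.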

Distribution of tonal pitch classes of the first movement of Alkan’s Concerto for Solo Piano, op. 39, no. 8 (1857).
The multi-modal shape of the distribution suggests modelling its generative process as a weighted mixture of a small number of simpler distributions. This modelling assumption entails understanding the overall distribution of tonal classes in a piece as a mixture of different tonal profiles. The components might, for example, correspond to the tonal pitch-class distributions of sections of a piece that are in different keys, and which may moreover contain chromaticism and enharmonicism. LDA naturally expresses these relations of weighted latent components (topics) that underlie the observed notes in a musical piece.
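The mixture assumption described above can be written compactly. The notation is assumed here for illustration: K is the number of topics, \theta_{d,k} is the weight of topic k in piece d, and \phi_{k,w} is the probability of tonal pitch class w under topic k.

```latex
p(w \mid d) \;=\; \sum_{k=1}^{K} \theta_{d,k}\, \phi_{k,w},
\qquad \theta_{d,k} \ge 0, \quad \sum_{k=1}^{K} \theta_{d,k} = 1 .
```

Under this reading, a piece dominated by a single topic has a pitch-class distribution close to that topic's profile, whereas a piece with several sizeable weights shows a multi-modal distribution.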
Overview
In the following section (‘Topic modelling with LDA’), we first illustrate the generative process for the tonal pitch-class distribution of a musical piece according to the assumptions and structure of the LDA model by creating an artificial corpus of pieces according to arbitrary parameter settings of the model (‘Generating pieces’). Subsequently, we apply this procedure to the TP3C corpus, inferring the latent topics using Gibbs sampling (‘Inferring topics’), and discussing their music-theoretical interpretations both qualitatively (‘Tonal profiles obtained by topic modelling’) and quantitatively (‘Topic similarities’). We further observe changes in the prominence of topics over the course of music history (‘Historical evolution of topics’) and conclude with a general discussion of the approach, outlining potential avenues for future research (‘Conclusion’).
Topic modelling with LDA
Generating pieces
The LDA model establishes probabilistic relations between topics and documents in a corpus and specifies a generative process that is assumed to underlie the distribution of notes in pieces (Blei, 2012a). This generative model can thus also be used to create new documents, given a certain setting of the parameters of the model. It is important to note that ‘generating’ does not mean that LDA attempts to simulate the process of the composition of a piece. We will first illustrate the model by generating an artificial corpus with
We start by creating a distribution of topic weights for each document in the corpus. These distributions
The sampled probabilities of the three topics in all 20 documents are shown in Figure 3. Topic 1 is shown in red, topic 2 in orange, and topic 3 in gray. For example, the topic distribution in document 1 is relatively balanced with

Topic weights for
This happens in the next step, where each of the

Tonal pitch-class distributions for
So far, we have sampled distributions over the
Next, we use the topic weights
A graphical representation of the LDA model in so-called plate notation (Bishop, 2006; Koller and Friedman, 2009) is shown in Figure 5. Circular nodes represent the observed (shaded) and latent (white) random variables of the model. The two hyperparameters

Graphical model for Latent Dirichlet Allocation (LDA) describing the relations between random variables
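The generative process can be sketched with NumPy as follows. The 20 documents and three topics follow the artificial corpus described in this section; the vocabulary size, the Dirichlet hyperparameters, and the number of notes per document are arbitrary values assumed for this sketch, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 20, 3            # documents and topics, as in the artificial corpus
V = 35                  # assumed vocabulary size (tonal pitch classes)
alpha, beta = 0.5, 0.1  # arbitrary Dirichlet hyperparameters (assumption)
N = 100                 # assumed number of notes per document

# Step 1: topic weights for each document.
theta = rng.dirichlet(np.full(K, alpha), size=D)  # shape (D, K)

# Step 2: a tonal pitch-class distribution for each topic.
phi = rng.dirichlet(np.full(V, beta), size=K)     # shape (K, V)

# Step 3: for every note, draw a topic from the document's topic weights,
# then draw a tonal pitch class from that topic's distribution.
counts = np.zeros((D, V), dtype=int)
for d in range(D):
    topics = rng.choice(K, size=N, p=theta[d])
    for k in topics:
        w = rng.choice(V, p=phi[k])
        counts[d, w] += 1

print(counts.shape, counts.sum(axis=1)[:5])
```

Each row of `counts` is one artificial bag of notes. Because the topics are drawn without any musical constraint, nothing forces them to occupy contiguous segments of the line of fifths.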
The artificial corpus of all

Counts of tonal pitch classes in
Comparing the overall distribution of tonal pitch classes of all documents in the artificial corpus (Figure 7a) with that of the corpus of musical pieces used in this study (Figure 7b) shows a number of important differences: the former is spread out across the whole line of fifths, with the three most common tonal pitch classes being G

Tonal pitch-class distribution in a corpus of artificially sampled pieces (top) and the corpus (bottom). (a) Overall tonal pitch-class distribution in artificially generated corpus with
Inferring topics
While generating pieces corresponds to sampling from the joint distribution given in equation (5), finding topics in the pieces corresponds to computing the conditional distribution of all latent and observed variables given a corpus, which is given by
Consider the tonal pitch-class counts in 20 randomly selected pieces from the corpus shown in Figure 8. Contrary to the 20 pieces in the artificial corpus (Figure 6), this set of pieces is much more diverse in many regards. None of the pieces spreads across the whole range of the line of fifths and contains, for example, instances of tonal pitch classes with two flats and sharps at the same time. While some pieces contain almost only ‘sharp’ (red) tonal pitch classes, for example

Tonal pitch-class counts in a sample of 20 pieces from the corpus.
The task of the Gibbs sampling procedure is now to find the most likely topics, that is, tonal pitch-class distributions
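For readers unfamiliar with the procedure, the following is a minimal sketch of a collapsed Gibbs sampler for LDA operating on a matrix of pitch-class counts. It is a textbook variant written for illustration and is not claimed to be the implementation used in this study; the hyperparameter values, the number of iterations, and the toy data are assumptions.

```python
import numpy as np


def gibbs_lda(counts, K, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a (documents x vocabulary) count matrix."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    # Expand the counts of each document into a list of individual notes.
    docs = [np.repeat(np.arange(V), counts[d]) for d in range(D)]
    # Random initial topic assignment for every note.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    n_dk = np.zeros((D, K))  # notes in document d assigned to topic k
    n_kw = np.zeros((K, V))  # occurrences of pitch class w in topic k
    n_k = np.zeros(K)        # total notes assigned to topic k
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment of this note from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic given all other notes.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    # Point estimates from the final sample only.
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return theta, phi


# Hypothetical toy corpus: three documents over a vocabulary of four items.
toy_counts = np.array([[5, 5, 0, 0], [0, 0, 5, 5], [5, 4, 0, 1]])
theta_hat, phi_hat = gibbs_lda(toy_counts, K=2, n_iter=50)
print(theta_hat.round(2))
print(phi_hat.round(2))
```

The sampler only ever sees counts, so it treats each piece as a bag of notes. The topic labels it returns are arbitrary, which is why the order of inferred topics carries no meaning.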
Tonal profiles obtained by topic modelling
Which topics are latent in the pieces of the corpus? The LDA model takes a fixed parameter
The inferred tonal pitch-class distributions for seven topics are shown in Figure 9. Note that the order of the topics has no particular meaning and that some weights for tonal pitch classes in the topics are too small to be displayed. The numerical values for the probabilities of the tonal pitch classes in the respective topics
Tonal pitch-class distribution for
It turns out that the shapes of the tonal pitch-class distributions for

The note distributions for the
This means that the best explanation according to the LDA model for the tonal pitch-class distributions in the corpus consists of several diatonic sets in the middle range of the line of fifths (mostly without accidentals) plus two topics representing the two extremes, flats (Topic 1) and sharps (Topic 2). We can interpret this as a corroboration of the validity of the results since Western classical pieces are commonly organized around one or a few keys that are related by fifths.
Along the same lines, Topics 1 and 2 can be interpreted as chromaticism that for instance occurs when composers write chromatic passing notes in otherwise diatonic passages. Moreover, they are responsible for pitch-class distributions stemming from less frequently employed keys relatively far from the center of the line of fifths, for example, E
The topic distributions for other values of
Topic similarities
Since topics are defined as distributions over tonal pitch classes, one can define appropriate measures to assess the similarity between them. For two discrete probability distributions
The similarities between all pairs of the

Jensen–Shannon similarities for
Because the probabilities of tonal pitch classes in these two topics are different, their similarity is relatively low in absolute terms while still being the largest among all pairs of topics. The two most distinct topic pairs are topics 1 and 2, topics 1 and 5 and topics 2 and 7, each with a similarity score of
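A similarity measure of this kind can be sketched as follows. The exact definition used for the reported scores is not reproduced here; the sketch assumes that similarity is one minus the Jensen–Shannon divergence with base-2 logarithms, which bounds the score between 0 and 1. The function names and the toy distributions are hypothetical.

```python
import numpy as np


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the Kullback-Leibler sum.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_similarity(p, q):
    """Assumed similarity score: 1 for identical, 0 for disjoint distributions."""
    return 1.0 - jensen_shannon_divergence(p, q)


topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.0, 0.2, 0.3, 0.5]
print(js_similarity(topic_a, topic_a))
print(js_similarity(topic_a, topic_b))
```

The measure is symmetric in its two arguments, so a matrix of pairwise topic similarities is symmetric as well.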
Historical evolution of topics
We now investigate how the prevalence of certain topics in musical pieces changes historically. Recall that in the LDA model, each note

Average distribution of topics for all documents in the corpus for
Using topic modelling in the context of historical studies entails certain assumptions. As mentioned before, LDA is based on the bag-of-notes model and thus does not know the order of notes within a piece. Beyond that, it also does not have a concept for the order of pieces in the corpus, although some recent variants of the model attempt to incorporate chronological information (e.g., Blei and Lafferty, 2006; Zhu et al., 2016; Beykikhoshk et al., 2018). Under the basic LDA model, all pieces in the corpus are treated equally to infer the overall topics, regardless of the time of their composition. This loosely corresponds to the synchronic perspective that a music theorist has when analysing pieces. Since dates of composition or publication dates of the pieces in the corpus are known, we are in a position to compare the topic distributions in the pieces diachronically to consider historical changes in these distributions.
To trace the topic evolution in the corpus, we first calculate the average topic distribution of all pieces for each year for which we have pieces in the corpus. For years within the time range for which we do not have data, we assume that the average topic distribution does not change. Subsequently, we calculate a moving average with a window size of 35 years over the distributions that returns smoothed values for each topic while at the same time ensuring that the distributions per year always sum to 1. As mentioned above, we expect changes in tonality to be reflected in the historical development of the latent topics, in particular the increase in the usage of less common keys, as well as chromaticism and enharmonicism over the course of the 18th and 19th centuries.
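The smoothing procedure can be sketched as follows. The 35-year window follows the text; whether the window is centred or trailing and how it behaves at the edges of the time range are not specified, so the centred, edge-truncated window used here is an assumption, as are the function name and the toy data.

```python
import numpy as np


def topic_evolution(years, thetas, window=35):
    """Yearly average topic distributions, gap-filled and smoothed."""
    years = np.asarray(years)
    thetas = np.asarray(thetas, dtype=float)
    unique_years = np.unique(years)
    timeline = np.arange(unique_years.min(), unique_years.max() + 1)
    yearly = np.full((len(timeline), thetas.shape[1]), np.nan)
    # Average topic distribution of all pieces dated to the same year.
    for y in unique_years:
        yearly[y - timeline[0]] = thetas[years == y].mean(axis=0)
    # Years without data keep the distribution of the previous year.
    for t in range(1, len(timeline)):
        if np.isnan(yearly[t, 0]):
            yearly[t] = yearly[t - 1]
    # Centred moving average, truncated at the edges (assumption).
    half = window // 2
    smoothed = np.empty_like(yearly)
    for t in range(len(timeline)):
        lo, hi = max(0, t - half), min(len(timeline), t + half + 1)
        smoothed[t] = yearly[lo:hi].mean(axis=0)
    return timeline, smoothed


# Hypothetical topic weights for four pieces and two topics.
toy_years = [1700, 1700, 1703, 1710]
toy_thetas = [[0.8, 0.2], [0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
timeline, smoothed = topic_evolution(toy_years, toy_thetas)
print(timeline[0], timeline[-1], smoothed.sum(axis=1))
```

Because every yearly distribution sums to 1 and the moving average weights all years in the window equally, the smoothed distributions also sum to 1 without any further normalization.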
The topic evolution over time for

Topic evolution for
The ‘Supplemental material’ shows the tonal pitch-class distributions, topic similarities, as well as average topic distributions and topic evolution plots for other values of
Conclusion
In this study, we applied the LDA topic model to a historical corpus of Western classical musical pieces and showed that it can be used to infer music-theoretically meaningful topics. The latent topics obtained are readily interpretable from a music-theoretical perspective and correspond to contiguous segments on the line of fifths. They roughly fall into two classes: topics that correspond approximately to diatonic collections of notes, and topics that represent chromaticism and less common keys. Subsequently, we observed the prominence of the topics in the corpus diachronically and found that the number of topics increases over the course of history and that the two ‘chromatic topics’ are particularly prevalent in the 19th century.
This approach opens up a number of possible extensions. Drawing on the results from this study, it seems promising to extend the basic model by including our findings, for example, to restrict topics to line-of-fifths segments or to introduce a variable that decides whether the tonal/spelled or enharmonically equivalent pitch-class representation is more appropriate. Moreover, while the pitch dimension is often at the centre of music-theoretical analysis, incorporating other features into the basic LDA model, in particular the duration of notes, formal sections of pieces, or rhythmical and metrical properties, as well as information about the sequential order of notes, is a promising direction for future research.
Acknowledgements
We are grateful to Markus Neuwirth, Christoph Finkensiep, and Gabriele Cecchetti for comments on an earlier version of this manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Swiss National Science Foundation within the project ‘Distant listening: The development of harmony over three centuries (1700–2000)’ (GA No. 182811). We thank Claude Latour for supporting this research through the Latour Chair in Digital Musicology at EPFL.
Action Editor
David Meredith, Aalborg University, Department of Architecture, Design and Media Technology.
Peer Review
Emilios Cambouropoulos, Aristotle University of Thessaloniki, School of Music Studies.
Christof Weiß, Friedrich-Alexander-Universität Erlangen-Nürnberg, International Audio Laboratories Erlangen.
