Abstract
We present a new dataset of speeches given by Danish and Dutch politicians at party congresses between 1946 and 2017. The dataset brings together materials from different party archives and digital repositories. It offers a unique opportunity to analyse the issues discussed in these speeches, the positions taken and the rhetoric used by party elites over time and between countries. We describe the data and illustrate them with one application: a sentiment analysis that describes differences between parties and over time.
Introduction
Analyses of intra-party politics feature prominently in political science classics (Duverger, 1964; Michels, 1915). Even today, intra-party procedures are a fundamental aspect of democracy: they guide the selection of candidates for public office and they shape the policy platforms parties pursue. Many of these processes, however, take place behind closed doors and, therefore, it has been difficult to study them systematically. This is why, even today, researchers often speak of the black box of intra-party politics. Opening this black box a little, we present a new dataset of political speeches given by Danish and Dutch politicians at party congresses (1946–2017). This publicly available dataset is a unique combination of speeches archived in various institutions throughout the Netherlands and Denmark, and speeches obtained from party websites (both current websites and older versions saved on the Internet Archive). This article presents the data, describes the data collection and provides an application using sentiment analysis. In this way, we also contribute to the ‘text as data’ revolution: the proliferation of text data and text methods for studying politics.
Political scientists use various sources of text data: election manifestos (Volkens et al., 2017); party press releases (Sagarzazu and Klüver, 2017; Van der Velden et al., 2017); parliamentary speeches (Proksch and Slapin, 2015); media reports about party positions (De Nooy and Kleinnijenhuis, 2013); tweets (Barberá, 2015); bills (Baumgartner et al., 2006); and executive speeches (Jennings et al., 2011). Typically, the data are used to address questions about the topics politicians talk about or the ideological positions they take on these topics. Others use text data to study the rhetorical aspects of political speech, such as the use of nouns (Cichocka et al., 2016), complexity (Bischof and Senninger, 2017) and the use of emotion (Rheault et al., 2016).
The dataset presented in this article expands the text sources political scientists can use. These congress speeches provide politicians with different constraints and incentives than other texts such as legislative speeches, election manifestos or press releases. Therefore, analyses of congress speeches have additional theoretical value. First of all, the speeches at party congresses address an audience of party members or delegates directly. Members and delegates may differ from the average voter in terms of the issues they care about and the positions they take on these issues. In addition, on many occasions, members and delegates need to consent to the parties’ policies and the leadership. Therefore, a speaker’s incentive in these speeches is more to echo intra-party preferences than those of the general electorate. Second, there are procedural constraints too. Legislative speech is constrained by the parliamentary agenda. Typically, election manifestos are written by an entire committee, not a single speaker. Congress speeches, in turn, are constrained by intra-party rules. Not everyone is allowed to speak, and speaking time is likely to be highly correlated to the position of the speaker in the party hierarchy. Third, at a party congress, typically, speakers from different factions of the party are allowed to speak. This allows researchers to juxtapose the position of the party leader with that of, for example, the position of the party chairperson, or the position of a regional party leader and, thereby, provide an insight into intra-party dynamics. In sum, party congress speeches provide an interesting additional data source. This will facilitate analyses of the different incentives politicians have when facing the party or the electorate and, thereby, contribute to understanding the dynamics of representation.
Party congress speeches and intra-party politics
Collecting party congress speeches is very time consuming and, currently, applications exist for only a few countries: Austria (Kaltenegger and Müller, 2017), France, Germany, Italy and the UK. Ceron uses motions presented at Italian party congresses by intra-party factions to identify their different ideological positions and the ideological heterogeneity of a party. He concludes that factions bind the party leader, but less so if he/she is directly elected by the members (Ceron, 2012). In addition, intra-party ideological heterogeneity reduces party unity in parliamentary voting behaviour (Ceron, 2015) and lowers the party’s ability to stick to a coalition government agreement (Ceron, 2016). Greene and Haber collected speeches at party congresses of the German Christian Democrats (CDU, CSU), and at those of the Social Democrats (SPD), the Parti Socialiste (PS) and the centre right UMP in France.
The authors used automated text analysis to identify the positions of the different speakers at the meetings and the disagreement therein (Ceron and Greene, 2019; Greene and Haber, 2016, 2017). They conclude that strong economic growth produces disagreement at opposition party congresses but not at government party congresses. In addition, they find no consistent effect of electoral losses on disagreement (Greene and Haber, 2016). Finally, Ceron and Greene (2019) find that the speeches and motions from the majority faction in the French PS predict manifesto content better than texts from the broader congress.
Data collection
We collected speeches given at party congresses. A speech is a stand-alone, scheduled address to the party congress. Unlike speeches to parties in more factionalized contexts (e.g. Ceron, 2012; Greene and Haber, 2016), the majority of these speeches are given by the party leader, the party chairperson or other prominent party members. The speeches are not contributions to a plenary debate, although in some cases they did function as introductions to a plenary debate that followed. The speeches are scheduled and, typically, are one of the highlights of the meeting.
In Denmark and the Netherlands, party congresses typically take place on a yearly basis. Before national elections or under other special circumstances there are often extraordinary congresses. In the past, party congresses were more likely to take place on a bi-annual basis. There are similarities and differences in the function of party congresses and congress speeches over time and between parties (Katz and Mair, 1994). The most important similarity in the speeches is that the party leader or leaders give speeches to party members reporting on the party’s current and future activities. These speeches typically involve sections on policies and policy-making, on party strategy and coalition possibilities (Van der Velden, 2018) and also on the performance of the party. In addition, these speeches address specific issues such as the proceedings of the day and commemorations of recently deceased (famous) party members. As a whole, the speeches portray the image of the party. In these speeches, politicians may pursue different goals: strengthening the internal cohesion of the party; signalling policy priorities to either satisfy policy activists or alert voters; or communicating strategic intentions to other parties. The media have always been present at these congresses, and parts of the speeches were often broadcast on television and radio. These days, many congresses can be followed fully online. In this sense, party congresses are somewhat more open nowadays than they have been in the past. Formal decisions were often made during party congresses, decisions such as the appointment of candidates for an election or the approval of an election manifesto. For this reason, the audience present during such a congress is the party’s selectorate. In the past, this consisted exclusively of delegates from different branches. Nowadays, some parties have broadened their selectorate to include all members (Cross and Pilet, 2015).
Dutch dataset
Table 1 provides an overview of the speeches that were collected and the archives that were consulted. Speeches from before 2000 were collected from archives in paper format. We retrieved the speeches, scanned them and used optical character recognition software (ABBYY FineReader 10) to produce machine-readable text. The quality of the primary documents differed markedly, and this is reflected in the quality of the scans. Some of the speeches were highly annotated by the speaker, some were written on typewriters that were most likely low on ink and two speeches by the CPN were typed on extremely thin blue paper. As a result, the machine-readable speeches had to be read and corrected. In some cases, words were illegible in the scans and, thus, could not be processed.
Table 1. Overview of Dutch speeches in dataset.
Note: If two archives are indicated then, typically, both had speeches from the same period.
Newer speeches (after 2000) were primarily collected from party websites. Parties typically remove speeches from past congresses from their websites. Therefore, we used the Internet Archive to access old party websites (see Table 1), allowing us to retrieve .doc or .pdf versions of the speeches. We did not process these documents any further. As is usual for this period, we also had access to the agendas of the meetings, and this allowed us to check whether we had retrieved all speeches.
For the period before 2000 we relied on archival material. The availability and quality of this material varies between parties. For the PvdA and the ARP the archives up to the 1970s are of exceptional quality (see Table 1 for an overview of the archives consulted). Speeches were bundled together with the meeting agenda in books. For these parties, we are certain that we have retrieved all speeches up to the 1970s. Afterwards, we usually retrieved speeches from the party leader and party chairperson per conference, but found no agendas for the meetings. This means we are not certain whether we retrieved all speeches. The ARP gradually merged into the CDA in the 1970s, and this may explain the absence of speeches during this period. For the KVP, we found speeches by key persons in the party for the relevant time period but, typically, we found no agenda. For the VVD we have few data points until the 1980s, which suggests we are missing some speeches. At the same time, their frequency of party conferences was much lower than that of other parties. After the 1980s we have many speeches, but again no agenda to confirm whether we have everything. Still, we have speeches from the main actors. The same is true for the CDA. For D66, GL and the SP we found many speeches at different time points, again, usually, including the speeches from the party leader and the party chairperson. For D66 and GL, typically, we did not find the agendas of the meetings, so we cannot be sure that we have all of the material. For the SP, we did have the agendas and we collected the speeches that were on these agendas. For other parties, we found very little to nothing. For the three small left-wing parties PSP, PPR and CPN, we only found speeches from the latter and only six in total at that, with a large time gap between them.
We also had no success with now-defunct parties such as the Boerenpartij (Farmers’ Party) (no archive) or the Centrum Democraten (Centre Democrats) (no speeches in archive). The PVV has no official party organization. However, it did organize several public meetings for people interested in becoming active in the party and we used speeches from these meetings.
We retrieved all speeches that we could find except for speeches from congresses with some very specific purpose. For example, the Labour Party organized a few thematic congresses on subjects such as agriculture or women’s rights. We found only a few examples of these. We did include speeches from party councils (partijraad), because for some parties the party councils were a secondary, or even primary, institution for decision-making among members. These councils were considerably smaller than the main party congress, and only included members elected to the council.
Danish dataset
For the Danish data, we followed approximately the same procedures as in the Netherlands (see Table 2 for an overview). As in the Dutch case, older speeches stem from archives and newer speeches from party websites, with the latter sometimes accessed through the Internet Archive. We found most speeches for the two dominant parties, the Social Democrats (A) and the centre right Venstre (V). Unfortunately, we were unable to collect data for two important Danish parties: the Conservative Party refused to cooperate and the Radikale Venstre (a centrist social-liberal party) never responded to any of the requests. We did obtain speeches from two smaller left-wing parties, Unity List (Ø) and the Socialist People’s Party (F). The latter had a period during the 1980s when there were no major speeches at its congresses. Instead, it organized debates in which all kinds of members took part. We did not collect these data, but they are available in the archive. We also retrieved the speeches of the Danish People’s Party (O) online. We had no success in finding speeches from the Progress Party, the Centre Democrats or the Christian Democrats. Their records were either not kept or not found by us.
Table 2. Overview of Danish speeches in dataset.
In the Danish case, we are certain that we have retrieved a full sample of speeches, as meeting agendas could be cross-checked with the available material in the archives.
Applications of congress speech data
Most applications of congress speech data use scaling techniques to estimate the position of speeches, to identify the ideology of a speaker or to establish the ideological heterogeneity in a party (Ceron, 2015; Greene and Haber, 2016, 2017). However, text data allow for more applications, and these can give us various insights into intra-party decision-making. We will demonstrate one of these approaches: sentiment analysis. Party leaders’ speeches can strengthen the internal cohesion of a party by motivating delegates, members and partisans to maintain their support for the leader and redouble their efforts for the party. To do this, party leaders need to use emotionally engaging language. Sentiment analysis may also yield another insight into intra-party politics: speeches by party members that express strong negative emotions could indicate that there are internal calls for change. We demonstrate below how to apply sentiment analysis and explore between-party and over-time differences in the use of sentiment in party leader speeches (for examples of sentiment analysis applied to election manifestos or legislative speeches see Crabtree et al., 2018; Kosmidis et al., 2018 and Rheault et al., 2016).
Application: Sentiment in speeches
In our first application, we explore the use of sentiment in speech. Do politicians use increasingly sentimental speech? Is the right more or less negative than the left? To evaluate this, we calculated the polarity and level of arousal for each speech. To do this we use the NRC Emotion Lexicon (Mohammad and Turney, 2013). This dictionary was compiled from crowdsourced evaluations and distinguishes the positive and negative sentiment of English words. Subsequently, these words have been translated into different languages and are, therefore, applicable to the Danish and Dutch speeches we have. The fact that the dictionary is available in both languages is a major benefit of this dictionary because sentiment dictionaries usually only cover a handful of languages. Clearly, using translations and focusing only on words rather than sense has some limitations, which are discussed at length elsewhere (e.g. Grimmer and Stewart, 2013).
For each speech, we calculated the percentage of positive and negative words in the entire text. Arousal was calculated by summing the percentages of positive words and negative words. To calculate polarity, we subtracted the percentage of negative words from the percentage of positive words, and then we weighted it with arousal. Figure 1 presents these data. Panel A shows the 95% confidence region of the mean of polarity over time, aggregated by five-year intervals. On average, polarity hovers around 0.3 (with a 95% confidence interval between 0.02 and 0.59). This means that, on average, politicians use more positive sentiment compared to negative sentiment. The trend lines are fairly stable, with some notable peaks. For example, in both Denmark and the Netherlands positive sentiment increased substantively in the 1990s, and dipped again in the 2000s. Panel B shows the 95% confidence region for arousal. In both countries, arousal increases over time; in particular, arousal in the period 2000–2010 is higher than in the period 1960–1980.
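The scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the two small word sets are hypothetical stand-ins for the translated NRC Emotion Lexicon, the tokenizer is a simple regular expression, and "weighted with arousal" is read here as dividing by arousal. Polarity is computed as positive minus negative, so that values above zero mean more positive than negative words, in line with the interpretation of the reported mean.

```python
import re

# Hypothetical stand-ins for entries of the translated NRC Emotion Lexicon.
POSITIVE = {"good", "hope", "success"}
NEGATIVE = {"bad", "crisis", "fear"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def sentiment_scores(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (pct_positive, pct_negative, arousal, polarity) for one speech."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0, 0.0, 0.0
    # Percentage of all words in the speech that are positive / negative.
    pct_pos = 100.0 * sum(t in positive for t in tokens) / len(tokens)
    pct_neg = 100.0 * sum(t in negative for t in tokens) / len(tokens)
    # Arousal: sum of the two percentages.
    arousal = pct_pos + pct_neg
    # Assumption: "weighted with arousal" is read as division by arousal,
    # which bounds polarity between -1 and 1.
    polarity = (pct_pos - pct_neg) / arousal if arousal > 0 else 0.0
    return pct_pos, pct_neg, arousal, polarity


speech = "We have good hope and success despite the crisis today"
print(sentiment_scores(speech))
```

In this toy sentence, three of the ten words are in the positive set and one is in the negative set, so arousal is 40 and polarity is 0.5. With the real lexicon, the same function would be applied to each speech before averaging by party or by five-year interval.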

Figure 1. Arousal and polarity of speeches over time and between Dutch parties.
Panel C presents the mean polarity per party and the 95% confidence intervals. In Denmark, the two largest parties, the Liberal Party (V) and the Social Democratic Party (A), are the most positive, whereas the radical left Unity List (Ø) is the least positive. The other two parties, the Danish People’s Party (O) and the Socialist People’s Party (F), are in between these two extremes. With regard to the Dutch parties, there are few (significant) differences between them. The CDA is more positive than the other parties, and the radical left SP is the least positive. However, these differences are much smaller than in Denmark. Panel D displays the mean arousal per party. In Denmark, surprisingly, the Danish People’s Party (O) speeches contain much less arousal than the speeches from the other parties. Similarly, in the Netherlands, the PVV speeches contain much less arousal than those of the other parties. Admittedly, the number of observations for the PVV is very low (n = 9). Furthermore, the two centrist parties (CDA and D66) also have speeches with, on average, less arousal than the other parties.
A set of OLS regressions confirms the general picture described in Figure 1 (see Appendix A). For each country, we predict polarity and arousal separately using left–right ideology (absolute and relative), year, seat share, a dummy for government parties, a dummy for party leader speeches and a dummy for male vs female speakers. Large parties, and parties in government, use more positive emotions than negative emotions. The latter finding is in line with other work using sentiment analysis on legislative speeches and election manifestos (Crabtree et al., 2018; Kosmidis et al., 2018; Rheault et al., 2016). Party leaders use more negativity and more arousal in their speeches compared to speeches by party chairpersons, MPs and ministers. According to our regressions, these variables describe the differences between parties better than ideology does. Left–right ideology (relative and absolute) has inconsistent effects across the models. Surprisingly, unlike Crabtree et al. (2018), we do not find that moderate parties use more positive emotions in their speeches. Finally, unlike Rheault et al. (2016), we do not find a strong increase in the use of positive sentiment over time. In sum, our results show both similarities to and differences from sentiment analyses of political texts from other sources. This further underlines the importance of taking congress speeches into account.
In sum, this brief example provides inconsistent evidence for claims that politicians use more emotion now than in the past and that right-wing or populist politicians use more negative sentiment. At the same time, this example opens additional avenues for research: for example, are congress speeches more emotional than other speeches, and what are the effects of (positive and negative) emotional speech?
Conclusion
Our new dataset offers a unique opportunity to study intra-party elite behaviour in the Netherlands and Denmark (1946–2017). We illustrate this with an application in the form of a sentiment analysis. However, there are many more applications that could be used, for example, qualitative analyses, ideological scaling and hand coding, or automated coding of frames and topics. We provide an additional application in Appendix B. Here, we identified different topics in speeches using a technique called topic modelling, and we plotted the prevalence of these topics over time. Such an analysis could be used to predict the content of election manifestos for national elections, and also to identify who is most influential in setting the topics for the election manifesto. This enquiry could identify more closely the role of intra-party dynamics in party competition. In addition, our dataset has historical value. For example, the first party congress in the data is the founding congress of the Dutch Labour Party (1946), a defining moment in that party’s history. By making our dataset digitally accessible we contribute to the preservation of our political cultural heritage.
Intra-party politics is a vital and often underestimated aspect of democracy. Intra-party processes such as candidate selection and leadership election procedures determine whom ordinary citizens can elect. Typically, party congresses decide on the policy platform that the party will pursue in office. Many of these processes, however, take place behind closed doors and, therefore, it has been difficult to study them. This is why researchers often speak of the black box of intra-party politics. Our data will contribute to understanding these processes, together with several other initiatives that all aim to break open this illustrious object. First, there are several other initiatives currently underway to collect party congress speeches that have been organized by the Party Congress Manifesto Research Group. Second, other teams have recently collected datasets that pertain to other aspects of intra-party politics: a coding of the institutional configurations of parties across time and countries (Scarrow et al., 2017) based on the original contribution of Katz and Mair (1994); a collection of expert evaluations of the balance of power within parties (Schumacher and Giger, 2017); a dataset on leadership elections (Cross and Pilet, 2015); and membership surveys (Van Haute and Gauja, 2015). With the combined power of these datasets, the black box of intra-party politics could possibly be cracked open.
Supplemental Material
Supplemental material, Appendix for A new dataset of Dutch and Danish party congress speeches by Gijs Schumacher, Daniel Hansen, Mariken A.C.G. van der Velden and Sander Kunst in Research & Politics
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The data collection was partially financed by a grant from the Danish Council for Independent Research (the Sapere Aude Young Elite Researcher project).
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from the Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
