Abstract
We present the PARTYPRESS Database, which compiles more than 250,000 published press releases from 68 parties in 9 European countries. The database covers the press releases of the most relevant political parties in these countries from 2010 onward. It provides a supervised machine learning classification of press releases into 21 unique issue categories according to a general codebook. The PARTYPRESS Database can be used to study parties’ issue agendas comparatively and over time. We extend a recent analysis in Gessler and Hunger (2022) to illustrate the usefulness of the database in studying dynamic party competition, communication, and behavior.
Introduction
Scholars of party competition and political representation have struggled to measure issue priorities of political parties for decades. Evaluating which policy issues political parties talk about is at the core of studying party competition and the way that political parties respond to their voters (Ansolabehere and Iyengar, 1994; Budge and Farlie, 1983; De Sio and Weber, 2014; Petrocik, 1996). Previous empirical research has long focused almost exclusively on studying parties at the time of elections based on election manifestos (e.g., Adams et al., 2006; Ezrow, 2010; Klüver and Spoon, 2016; Meyer, 2013). We therefore know relatively little about how party competition and responsiveness work outside election campaigns.
Recent literature has developed different approaches to measure parties’ issue agendas more dynamically. Scholars have used speeches (see, e.g., Quinn et al., 2010) and other parliamentary activity (Green-Pedersen and Mortensen, 2010), newspaper articles (see, e.g., Helbling and Tresch, 2011), social media data (see, e.g., Barberá et al., 2019), and press releases (see, e.g., Gessler and Hunger, 2022; Sagarzazu and Klüver, 2017) to evaluate the issue agendas of parties and political candidates over the course of the electoral cycle. These data sources consist of written or spoken words and require text-as-data methods to extract parties’ issue agendas. Researchers have relied on different approaches, such as hand coding (Bulut, 2017; Helbling and Tresch, 2011), dictionary approaches (Laver and Garry, 2000), unsupervised topic models (Grimmer, 2010; Quinn et al., 2010), and supervised learning models (Hemphill and Schöpke-Gonzalez, 2020) to extract parties’ issue priorities from these data sources. However, compiling and analyzing these more dynamic data sources requires a substantial data collection and analysis effort, so existing research is limited to case studies of specific parties or countries.
Press releases are a particularly valuable data source for analyzing parties’ issue agendas dynamically. They constitute an ideal data source for the study of political parties, as parties can independently choose what to communicate to their voters on a daily basis (Grimmer, 2010; 2013; Hopmann et al., 2012). Political parties publish several press releases per day, so we can measure their issue attention and their issue positions on a daily basis. This constitutes a major advantage over election manifestos or expert surveys, which are only available every couple of years. Press releases are furthermore unconstrained: unlike speeches or questions in parliament, they are not bound by the parliamentary agenda, and parties can independently choose which issues they want to talk about. Press releases are also an ideal instrument for parties to present themselves to their constituents. Newspapers regularly pick up press releases issued by political parties and communicate their content to citizens (Haselmayer et al., 2017; Meyer et al., 2020; Schaffner, 2006). Despite these advantages, press releases have so far only been used by a handful of scholars who focused on case studies of specific countries, given the massive investments that are required for collecting and analyzing press releases (Brandenburg, 2006; Gessler and Hunger, 2022; Haselmayer et al., 2017; Hopmann et al., 2012; Meyer et al., 2020; Sagarzazu and Klüver, 2017; Schumacher, 2015).
To address this shortcoming and to allow scholars to study party communication dynamically over time and across countries, we have compiled the new PARTYPRESS Database that covers more than 250,000 press releases published by 68 parties in 9 European countries (Austria, Denmark, Germany, Ireland, the Netherlands, Poland, Spain, Sweden, and the UK) from 2010 onward. The database provides a hand-labeled subset of press releases based on a designated codebook that builds on the Comparative Agendas Project (Bonafont et al., 2020). In this article, we describe the database and evaluate different supervised machine learning approaches to obtain evolving issue agendas of the different parties. We show that supervised learning approaches allow for generating convincing estimates of policy topics of the press releases so that the PARTYPRESS Database can be used to study issue agendas comparatively and over time.
The PARTYPRESS Database can find many applications in the study of party competition, communication, and political behavior. The data can, for instance, be used to test whether parties react to issue priorities of the general electorate or follow long-standing issue perceptions. Many studies in this area either rely on single-country case studies (Klüver and Sagarzazu, 2016) or have to use party manifestos that are only published every few years in the context of an election campaign (De Sio and Weber, 2014). The new database with its fine-grained measurement of issue attention across countries will allow for the study of parties’ reactions to events and subtle changes in public mood. In this regard, it can also prove helpful to evaluate who leads and who follows when it comes to setting issue priorities (Barberá et al., 2019; Gilardi et al., 2022). Beyond that, the European context provides additional institutional variation in studying parties’ evolving issue agendas. For example, how parties that govern together in a coalition differentiate themselves from each other over the legislative term is of primary interest in proportional electoral systems (Sagarzazu and Klüver, 2017).
The PARTYPRESS Database
Retrieving press releases
To facilitate the usage of press releases in the study of parties’ issue agendas, we have compiled the new PARTYPRESS Database that covers published press releases in 9 European countries from 2010 onward comprising 269,418 texts. The main dataset provides information on the level of individual press releases.
We retrieve press releases published by the parties, mainly from their websites. We use web scraping techniques that are tailored to the different websites. If a party did not maintain a publicly accessible press release archive, we contacted party offices and political archives directly and requested the full list of press releases. In some instances, we asked colleagues in the respective countries for assistance in retrieving the data. We provide a full list of the sources for the different parties in Table C1.
The number of press releases in our database varies between parties and countries, from below 200 to above 1,500 per year and party, as some parties tend to issue more press releases than others. We illustrate and discuss the coverage of our database in more detail in SM D.
Hand coding of press releases
To extract parties’ issue agendas from the documents, we start with a hand coding of a random subset of around 2,600 to 3,500 press releases in each country (Austria 3,450; Denmark 2,749; Germany 2,612; Ireland 3,475; Netherlands 3,299; Poland 2,746; Spain 3,421; Sweden 2,741; UK 2,750). We recruited native speakers with country-specific expertise to perform the coding based on a designated codebook, applying the coding scheme from the Comparative Agendas Project (CAP) (Baumgartner et al., 2019; Bonafont et al., 2020) to press releases. Note that our data can be merged directly with other CAP-coded data by joining two issue categories. Additionally, we have two categories for non-thematic press releases (category “Non-thematic”) and press releases covering topics not specifically captured by any of the other categories (category “Other”).
The coders received thorough training and were closely supervised to ensure the quality of the codings. The inter-coder reliability for the different categories and the different parties is acceptable, but also shows that it is not an easy task to decide on a single issue focus for a press release (see Table E12). Over all countries, the agreement between coders measured as Krippendorff’s α is 0.707. This inter-coder reliability is comparable to other applications of the CAP codebook. For example, for the coding of parliamentary questions, Sevenans and Vliegenthart (2016) find a Krippendorff’s α of 0.60. The value varies slightly between countries, with Austria having the highest agreement and Poland the lowest. Additionally, the agreement between coders depends on the specific issue, with Defense having the highest agreement and Other the lowest (see Table E13). The hand coding procedure, the issue categories, and the inter-coder reliability are discussed in more detail in SM E.
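For readers less familiar with the reliability statistic reported above, the following is a minimal sketch of Krippendorff’s α for nominal labels, computed from a coincidence matrix. It is an illustration only and not the script used for the database; the function name and the toy labels are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the labels that
    the coders assigned to one document (hypothetical input format).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a document coded only once carries no agreement information
        # Every ordered pair of labels from different coders counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one category was ever used, so no disagreement is possible
    return 1 - observed / expected


# Toy example: two coders, four press releases, one disagreement.
toy_units = [
    ["Immigration", "Immigration"],
    ["Defense", "Defense"],
    ["Immigration", "Defense"],
    ["Immigration", "Immigration"],
]
toy_alpha = krippendorff_alpha_nominal(toy_units)
```

Values close to 1 indicate near-perfect agreement, while values near 0 indicate agreement no better than chance.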
Supervised learning classification of press releases
Based on the hand coded data, we employ different supervised approaches to arrive at a fine-grained measure of issue attention over time for all press releases in our sample. We optimize both the accuracy for individual press releases and the accuracy for the aggregated measure. Depending on the specific research question, either measure may be given more importance. We use five individual supervised machine learning classifiers, an ensemble of these (Barberá et al., 2021), and Transformer models to predict the issue categories of unlabeled press releases. To approximate the out-of-sample deviations, we perform a five-fold cross-validation, in each fold using 80% of the data for training while reserving 20% for testing. For the evaluation, models are never tested on data that they were trained on. We discuss our approaches in more detail in SM F. The results from the classification tasks allow us to aggregate the individual classified press releases and obtain a measure of parties’ issue focus, during a specific period, by taking the share of press releases in a specific category. For the further evaluation and application, we use the Multilingual Transformer model due to its superior performance. We discuss the evaluation and model choice in more detail in SM G.
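The five-fold cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and a majority-class baseline stands in for the actual classifiers, which are documented in SM F.

```python
import random
from collections import Counter


def k_fold_splits(n_items, k=5, seed=42):
    """Yield (train_indices, test_indices) for each of k folds.

    With k=5, each fold trains on 80% of the items and tests on the
    remaining 20%, so no item is tested by a model trained on it.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        ]
        yield train_idx, test_idx


def majority_baseline_accuracy(labels, k=5, seed=42):
    """Mean out-of-fold accuracy of a majority-class stand-in classifier."""
    fold_scores = []
    for train_idx, test_idx in k_fold_splits(len(labels), k, seed):
        # "Training": pick the most frequent label in the training folds.
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / k


# Toy labels (assumption): 70 Immigration and 30 Environment press releases.
toy_labels = ["Immigration"] * 70 + ["Environment"] * 30
toy_splits = list(k_fold_splits(len(toy_labels)))
toy_accuracy = majority_baseline_accuracy(toy_labels)
```

Replacing the majority-class step with a real classifier leaves the splitting logic unchanged, which is what guarantees that the reported accuracy approximates out-of-sample performance.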
Accessing the database
The database is publicly available from the Research & Politics Dataverse, and the Transformer classifiers are available for use from the platform Hugging Face. Researchers can directly access data on each individual press release including the classification from the four main classifiers (Main) as well as the raw text (Texts). There is also a dataset with weekly (Weekly) and monthly (Monthly) aggregates of the issue proportion, which researchers can use as a panel dataset. In SM H, we introduce the available datasets and variables in more detail.
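To illustrate how aggregates of this kind relate to the individual classified press releases, the following sketch computes monthly issue shares per party from press-release-level records. The field names (`party`, `date`, `issue`) and the toy records are assumptions for this example and do not necessarily match the variable names in the published datasets (see SM H for those).

```python
from collections import Counter, defaultdict


def monthly_issue_shares(releases):
    """Share of a party's press releases per issue category and month.

    `releases` is a list of dicts with hypothetical keys:
    'party', 'date' (ISO format, YYYY-MM-DD), and 'issue'.
    """
    counts = defaultdict(Counter)
    for release in releases:
        month = release["date"][:7]  # keep YYYY-MM
        counts[(release["party"], month)][release["issue"]] += 1

    shares = {}
    for party_month, issue_counts in counts.items():
        total = sum(issue_counts.values())
        shares[party_month] = {
            issue: n / total for issue, n in issue_counts.items()
        }
    return shares


# Toy records (assumption): four releases by one party in one month.
toy_releases = [
    {"party": "A", "date": "2015-09-01", "issue": "Immigration"},
    {"party": "A", "date": "2015-09-03", "issue": "Immigration"},
    {"party": "A", "date": "2015-09-10", "issue": "Immigration"},
    {"party": "A", "date": "2015-09-21", "issue": "Environment"},
    {"party": "B", "date": "2015-09-02", "issue": "Defense"},
]
toy_shares = monthly_issue_shares(toy_releases)
```

The resulting party-month shares form a panel in which each row is a party in a given month and each issue category has a value between 0 and 1.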
The database can be easily combined with other sources. The ParlGov (Döring and Manow, 2012) party identifiers allow for merging the database with other datasets, such as the MARPOR project (Volkens et al., 2017) for manifesto coding and the Comparative Political Data Set (Armingeon et al., 2020).
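A merge on party identifiers can be sketched as a simple inner join. The key name `parlgov_id`, the identifier values, and the example fields below are hypothetical placeholders; the actual variable names are described in SM H and in the documentation of the respective external datasets.

```python
def merge_on_party_id(press_rows, external_rows, key="parlgov_id"):
    """Inner join of two lists of dicts on a shared party identifier.

    Rows from `press_rows` without a match in `external_rows` are dropped.
    """
    lookup = {row[key]: row for row in external_rows}
    merged = []
    for row in press_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Press-release fields take precedence if a field name collides.
            merged.append({**match, **row})
    return merged


# Toy data (assumption): identifiers and values are made up for illustration.
toy_press = [
    {"parlgov_id": 1, "month": "2015-09", "immigration_share": 0.30},
    {"parlgov_id": 2, "month": "2015-09", "immigration_share": 0.05},
    {"parlgov_id": 3, "month": "2015-09", "immigration_share": 0.10},
]
toy_external = [
    {"parlgov_id": 1, "party_family": "radical right"},
    {"parlgov_id": 2, "party_family": "green"},
]
toy_merged = merge_on_party_id(toy_press, toy_external)
```

An inner join is used here so that unmatched parties are visible as missing rows; a left join would be preferable when all press release observations must be retained.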
Can radical right parties shape party competition on immigration?
The PARTYPRESS Database is a suitable resource for analyzing the evolving issue agendas and dynamic party competition. While past research on party competition has mostly examined long-term changes in positions and priorities, sudden and reactive agenda changes can also occur due to unforeseen events or in response to other parties’ agendas. Recent research by Gessler and Hunger (2022) argues that external events play a crucial role in the short-term politicization of issues.
The “refugee crisis” was such a highly politicized event that had a significant impact on the political agenda throughout Europe. During the refugee crisis, radical right parties (RRPs) attempted to gain support by increasing the salience of immigration issues in their communication efforts. Given the extraordinary event, Gessler and Hunger argue that mainstream parties had little choice but to increase their salience of immigration in response to RRPs. Gessler and Hunger test this argument based on a dynamic analysis of a corpus of press releases between 2013 and 2017 from Austria, Germany, and Switzerland. To measure parties’ immigration agendas, they employ a dictionary approach and find that “increasing levels of salience by radical right parties are associated with an immediate rise in attention to immigration by mainstream parties” (Gessler and Hunger, 2022, p. 525). The evidence based on the three countries is compelling, but raises the question of whether the findings are externally valid and hold in other countries as well. The PARTYPRESS Database permits us to extend the analysis to five additional countries. Whereas the original study was based on 646 observations of monthly immigration salience from 14 parties, our replication consists of 2,362 observations from 46 parties. For Germany and Austria, we can directly compare our measure of immigration salience to Gessler and Hunger’s measure. The correlation between the two measures is 0.7263 based on 504 party-month observations that match between the two analyses.
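A comparison of two salience measures on matching party-month observations can be sketched as below. This is an illustrative example, not the replication code: the function name and the toy values are assumptions, and the correlation is a plain Pearson coefficient computed over the keys present in both measures.

```python
from math import sqrt


def pearson_on_matches(measure_a, measure_b):
    """Pearson correlation over observations present in both measures.

    Both arguments map a (party, month) key to a salience value.
    Returns the correlation and the number of matched observations.
    """
    keys = sorted(set(measure_a) & set(measure_b))
    xs = [measure_a[k] for k in keys]
    ys = [measure_b[k] for k in keys]
    n = len(keys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y), n


# Toy values (assumption): a classifier-based and a dictionary-based measure.
toy_classifier = {
    ("P", "2015-01"): 0.1,
    ("P", "2015-02"): 0.2,
    ("P", "2015-03"): 0.3,
    ("Q", "2015-01"): 0.5,  # no counterpart in the second measure
}
toy_dictionary = {
    ("P", "2015-01"): 0.2,
    ("P", "2015-02"): 0.4,
    ("P", "2015-03"): 0.6,
    ("R", "2015-01"): 0.9,  # no counterpart in the first measure
}
toy_r, toy_n = pearson_on_matches(toy_classifier, toy_dictionary)
```

Restricting the computation to matched keys mirrors the comparison in the text, where only party-month observations available in both analyses enter the correlation.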
We first present evolving immigration salience of parties in Germany from 2010 to 2020 in Figure 1. A central event during the refugee crisis was Chancellor Merkel’s press conference on 31 August 2015 where she first used the statement “We will make it!” (German: “Wir schaffen das!”). In the timeline, we see that all parties increase their issue attention to immigration in 2015, with the largest increase for the AfD, and small increases for the Greens and the SPD. After 2015, the mainstream parties’ focus on immigration decreases again and stabilizes around the same levels as before. We describe similar patterns for the other countries with an increase in issue attention to immigration in 2015 (see SM I).
Figure 1. Issue attention to immigration (category 9, Immigration) of parties in Germany. The vertical line marks Chancellor Merkel’s press conference during the 2015/2016 refugee crisis where she first used the statement “We will make it!”. The values are based on a Multilingual Transformer model.
Regression results for mainstream parties’ salience of immigration.
Robust standard errors in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
The replication illustrates that the PARTYPRESS Database has a set of important advantages for applied researchers. First, it is easy to use for the comparative analysis of dynamic party competition. Second, researchers do not need to scrape and archive press releases and design their own dictionaries. Third, it also covers a broad set of issue categories.
In this regard, additional hypotheses can be formulated and directly tested. In SM J, for example, we evaluate the general argument of Gessler and Hunger (2022) for the environmental issue, where Green parties arguably take up the role of the issue owners attempting to set the agenda during external events like the Fukushima nuclear disaster. The results show that Green parties do not have the same agenda-setting power for the environmental issue as RRPs have for immigration. Other parties react to the German and Irish Greens’ increases in environment salience, but we find no such reaction in the other countries.
Discussion
We present the newly compiled PARTYPRESS Database, which comprises more than 250,000 published press releases from 68 parties in 9 European countries from 2010 onward. Press releases are particularly suitable when researchers are interested in dynamic, centralized, and unconstrained communication efforts of the party leadership. We evaluate and illustrate a supervised count-and-classify approach for parties’ issue agendas.
Even though our database is unique in covering a large share of press releases from different countries, the data collection effort comes with certain limitations. In this section, we briefly discuss the potential to extend our database in different directions.
Additional countries and actuality
We collected data from 2010 onward in 9 countries. Newer press releases and additional countries might be of interest to researchers. In general, the database can be updated easily, and we encourage other researchers to contribute to this endeavor. The trained classifiers are likely to work also for newly added press releases from countries in our sample. Classifying cases from new countries requires a sample of hand-coded press releases. Using multilingual classifiers might make this relatively cheap: as the sample size simulations in SM K show, even with a sample of 250 new press releases, we reach around 70% accuracy using multilingual transformers.
Positions
The primary focus of our project is on the issue focus of political parties. Our codebook also asks coders to classify the position of a press release within an issue area. For instance, for the issue of Immigration, we ask if the press release is pro- or anti-immigration. However, when working with the positions, we noticed a clear selection mechanism, where mostly certain positions are taken up in an issue area. This makes it challenging to classify issue-specific positions, as certain positions are not communicated. Nevertheless, we think it is possible to construct dynamic positions over different issues using appropriate modelling. Researchers with access to our data can use the database to construct a model for the dynamic measurement of party positioning.
Coverage
We try to retrieve all press releases of political parties in the 9 countries included in our study. For some parties, we have a comparatively small number of press releases, and it appears unlikely that we archived all press releases. However, even a sample of press releases is informative about the issue emphasis of these parties. Still, it is difficult to say how the population and our sample differ. There are some ways in which the sample could be systematically biased: (a) parties may systematically take down press releases from specific issue areas, and (b) certain periods may not cover specific issue areas. Researchers should keep these potential biases in mind when using our data.
Supplemental Material
Supplemental Material - The PARTYPRESS Database: A new comparative database of parties’ press releases
Supplemental Material for The PARTYPRESS Database: A new comparative database of parties’ press releases by Cornelius Erfort, Lukas F Stoetzer and Heike Klüver in Research and Politics
Footnotes
Acknowledgements
We are grateful for valuable comments and suggestions by Eri Bertsou and excellent research assistance by Paul Bochtler, Daniel Cruz, Leonie Fuchs, Violeta Haas, Felix Heimburger, Johannes Lattmann, and Tim Wappenhans. Research for this contribution is part of the Cluster of Excellence “Contestations of the Liberal Script” (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy. Cornelius Erfort is moreover grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Deutsche Forschungsgemeinschaft (EXC 2055, Project-ID: 390715649).
Supplemental Material
Supplemental material for this article is available online.
