Abstract
Every year since 1946, the General Debate has taken place at the beginning of the United Nations (UN) General Assembly session. Representatives from all UN member states deliver an address, discussing the issues that they consider most important in global politics, revealing their governments’ positions, and seeking to persuade other states of their perspectives. The annual UN General Debate statements provide invaluable information for scholars of international relations – comparable globally and over time. However, these texts are stored as poor quality images without relevant metadata, preventing researchers from applying data science methods. This paper introduces the complete UN General Debate Corpus (UNGDC). Building on a previous incomplete release of UNGDC, we have extended the corpus to cover the entire 1946–present period, included additional data on all speakers and provided advanced search and data visualization tools on a new website. The complete corpus contains over 10,000 speeches from 202 countries, including historical countries – making it the most comprehensive, unique and accessible collection of global political speeches. We discuss the complete UNGDC, provide relevant information for data users and present illustrative examples of how the corpus can be employed to address key questions in world politics.
Introduction
Since 1946, representatives from all United Nations (UN) member states – currently 193 states – have gathered annually for the UN General Debate (UNGD). The General Debate enables all member states – regardless of size, power, wealth, location, or regime type – to deliver an address to the UN General Assembly (UNGA) presenting the state’s perspective on different issues in world politics. Governments discuss the issues they consider to be most important – including peace and conflict, economic development, climate change, or the reform of the UN system. Furthermore, they explain their position on key themes and events, and seek to convince other states of the merits of these positions by framing issues in a particular light (Koschut et al., 2017: 483). Consequently, the General Debate has been referred to as a ‘barometer of international opinion’ (Smith, 2006: 153).
The UNGD, therefore, provides an invaluable data source for scholars of international relations (IR). The growing use of data science and natural language processing (NLP) tools and methods in social sciences has meant that researchers have been able to apply such approaches to textual data. While scholars have emphasized the potential for such approaches to provide crucial insights in IR, they have also noted more limited uptake in IR research compared to other areas of political science (Mitrani et al., 2022). A key obstacle has been the availability of relevant textual data, with the potential application of such methods to UNGD speeches, in particular, severely hampered by the poor quality of the documents and other challenges linked to their storage in the UN library system.
This paper introduces the complete UN General Debate Corpus (UNGDC). Building on a previous, incomplete release of the UNGDC (1970–2017) (Baturo et al., 2017), we have extended the UNGDC back to the first UNGA session in 1946, and up to 2023. We have prepared the corpus in ways that enable the application of text analysis at scale. In addition to collecting, cleaning and formatting texts from 1946 to 1969, we have thoroughly audited the entire corpus, including relevant metadata, and added recent sessions up to 2023. Hence, the UNGDC covers all speeches from 1946 to 2023, and will be updated annually. The extended UNGDC is publicly available at the Harvard Dataverse. It contains over 10,000 speeches from 202 countries, current and historical, making it the most comprehensive collection of global political discourse. It also includes information about the identity and institutional posts of all speakers for 1946–2023.
The UNGDC can be widely used by scholars of IR to address questions related to peace and conflict, development, human rights, climate change, issues of race and more. Indeed, the extended time period of the UNGDC means that it now includes a crucial period in international politics (1946–1969), as we discuss in more detail below. Significantly, because the UNGDC consists of textual data, it is useful to scholars from different theoretical approaches – from rational choice scholars to constructivists and critical theorists. Indeed, while the UNGDC can be employed to examine state preferences or the dynamics of global norms (Baturo et al., 2017; Chelotti et al., 2022; Kentikelenis and Voeten, 2021; Simmons and Shaffer, 2023), it can also be used to study the widely recognized discursive and performative aspects of diplomatic speech (see Risse, 2000), and the role of emotions in communicative strategies of international actors (see Bleiker and Hutchison, 2014; Holmes, 2015). Therefore, the UNGDC provides information about what representatives talk about, and how they express themselves and justify their positions. In the next section, we provide additional information about the UNGD, and discuss applications of the UNGDC in the literature. We then explain the process of expanding the UNGDC, the additional data and tools included and the practical issues researchers should consider in using the corpus. Finally, we showcase three illustrative examples of how the UNGDC can be applied in terms of visualizations, description and inference. While necessarily brief, these examples demonstrate the richness and breadth of the UNGDC.
UNGD speeches and world politics
The UNGD typically occurs every September, shortly after the start of the annual regular session of the UNGA. It is the most high-profile part of the annual assembly session, receiving significant media attention (Keens-Soper, 1985: 80). The UNGD consists of UN member state representatives delivering an address discussing key issues from their government’s perspective. In addition to country delegates, high-level UN officials, such as the assembly-elected presidents or the UN Secretary-General (since 1997), and selected non-state representatives (e.g. European Union (EU) officials), may also address the UNGA. The address is typically delivered by the country’s head of state or government, its foreign minister, or the head of its delegation to the UN (see Baturo et al., 2017; Gray and Baturo, 2021; Smith, 2006). While the UN Secretary-General may indicate a theme for a session, states are free to discuss any issue they consider to be important. Countries use their statements for various purposes, such as: putting on public record their position on key events and issues; offering justifications for these positions; persuading other states to undertake particular actions; praising or criticizing other countries or UN institutions; highlighting areas where consensus exists; and discussing domestic issues (Bailey, 1960; Baturo et al., 2017; Nicholas, 1959; Smith, 2006).
Several characteristics of the UNGD make it a fundamental data source for analysing global politics. First, while other data may contain more detailed information on the preferences of individual countries on specific issues (e.g. Patz and Thorvaldsdottir, 2021), the UNGD statements provide comparable textual data on the foreign policy preferences and priorities of all states over time – including smaller and less powerful nations. Second, states go beyond presenting their positions, to providing explanations and justifications for these stances, often seeking to persuade others to follow. In this regard, speeches differ substantially from UNGA roll-call votes (Simmons and Shaffer, 2023), and can shed light on why countries vote in particular ways (Peterson, 2008). Third, countries’ UNGD statements offer a view of governments’ foreign policy preferences that has not been filtered by the media or other sources (Binder and Heupel, 2021; Kentikelenis and Voeten, 2021; Simmons and Shaffer, 2023). Fourth, by gauging the extent of a country’s attention to an issue (e.g. ‘share of text’), we can infer the degree of importance assigned to that issue. Fifth, these speeches enable the study of the practice of diplomacy, including how states articulate their positions, the degree of lexical complexity or simplicity, and sentiment (Gray and Baturo, 2021). Finally, a crucial feature of the UNGD is that speeches are not institutionally connected to UN decision-making (Baturo et al., 2017; Chelotti et al., 2022). Hence, governments face fewer external constraints and pressure, and are able to more freely express their perspective on different issues, including more contentious views (Baturo et al., 2017: 2).
A potential criticism of the UNGD statements is that the lack of external constraints may mean these speeches are ‘cheap talk,’ unrepresentative of the actual perspectives of states on different issues. Another concern is that even with fewer external constraints, states may use their UNGD statements for strategic signalling rather than presenting their underlying preferences. Several studies address these concerns directly (e.g. Baturo et al., 2017; Chelotti et al., 2022; Kentikelenis and Voeten, 2021; Simmons and Shaffer, 2023; Smith, 2006). Chelotti et al. (2022) conduct interviews with representatives from EU member states’ national delegations to the UN to understand how these countries produce, and more broadly view, their UNGD statements. Their analysis finds that these representatives view the UNGD speeches as an important representation of states’ foreign policy preferences and priorities (Chelotti et al., 2022: 5). Kentikelenis and Voeten (2021: 746) find that state representatives follow up on preferences about debt relief expressed in UNGD statements, in annual International Monetary Fund and World Bank meetings, which, they argue, demonstrates that UNGD statements ‘convey meaningful information about underlying state preferences.’ Furthermore, the substantial coverage of countries’ UNGD statements in the international media underlines that such speeches are recognized as perhaps the most important signal of a country’s foreign policy preferences for a given year (Simmons and Shaffer, 2023). This is not to suggest that states engage in no posturing and strategic signalling in their UNGD addresses. It has long been noted that in the UNGD, ‘member states present themselves exclusively in the guise in which they wish to be known’ (Nicholas, 1959: 98). Hence, these statements should be viewed as public representations of states’ foreign policy preferences and priorities.
The characteristics of the UNGDC have meant that it increasingly has been used by scholars to study key issues in IR. For example, Chelotti et al. (2022) examine whether EU membership has a socialization effect on member states’ preferences. Others use the corpus to study whether China influences other states’ foreign policies (Carmody et al., 2020; Turcsanyi et al., 2022). The UNGD has also been used to understand challenges to the liberal international order and global institutions (Binder and Heupel, 2021; Debre and Dijkstra, 2023; Kentikelenis and Voeten, 2021), and for studying international affinity networks and interstate conflict (Pomeroy et al., 2019). Scholars also rely on the corpus to assess the influence of domestic politics, such as party ideology, on state preferences (Finke, 2023), and for studying regime legitimization strategies in dictatorships (Baturo and Tolstrup, 2024).
The rich textual data of the UNGDC has also been used to examine specific topics in global politics. This includes the examination of states’ climate change discourse (Arias, 2023), government engagement with the health dimensions of climate change (Dasandi et al., 2021; Romanello et al., 2022; Watts et al., 2018), border discourse and anxiety (Simmons and Shaffer, 2023) and moral concerns in relation to threat and harm in international politics (Rathbun and Pomeroy, 2022). The complete UNGDC, which includes the entire post-Second World War period, enables scholars to examine longer term trends in global politics. For example, Dasandi et al. (2023) use the 1946–2023 UNGDC to examine human rights rhetoric by governments around the world going back to the first UNGA session in 1946. Future research will be able to examine the emergence and decline of different issues in world politics, the spread of global norms and formation of international laws and treaties.
The new corpus covers a particularly important period in IR, from 1946 to 1969, characterized by the creation of global governance structures, the formation of the Cold War era blocs (Alker and Russett, 1965), and ideational and normative changes underpinning the world order (Finnemore and Sikkink, 1998). This is particularly important given renewed attention to fundamental international norms and issues of race and inequality (Buzas, 2021; Paris, 2020). It was in the first two decades of the UNGD that UN members clashed over the competing concepts of sovereignty, self-determination, racial equality, colonialism and human rights (Dudziak, 2011; Keohane, 1967). This period saw significant ideational shifts resulting in the transformation of the global system. For example, some have suggested that rather than changes in political or economic structures, it was the ideational change among the colonizers and their abandonment of previous norms regarding statehood such as the ‘acquisition of (Western-style) “civilization”’ that explains the acceptance of decolonization that began in the mid-1950s (Jackson, 1993: 114). In particular, exchanges between European and Global South states in the UNGD during this period document the emerging normative framework on colonialism and self-determination, which was only later enshrined in UNGA resolutions (Keohane, 1967). The expanded UNGDC, thus, enables scholars to systematically study the origins and emergence of international norms, the spread and contestation of these norms and global acceptance and shifts in such norms over time.
Extending the UNGDC
Turning to the corpus itself, we briefly explain the process of building the UNGDC, describe the changes made to improve its usability and discuss several issues to be considered in applications. Speeches made in the General Debate are subsequently deposited at the UN Dag Hammarskjöld Library. The documents are stored as verbatim records of the plenary meetings in six official UN languages. As such, individual country statements need to be extracted from the plenary records with unrelated information removed. This pre-processing and cleaning stage is complicated by the poor quality of the files, varying formats across the documents and inconsistent speaker and country designations. Furthermore, all transcripts prior to 1992 are stored as image copies of typewritten documents. These normally required additional pre-processing using optical character recognition software, followed by manual text correction and re-entry of significant amounts of text. This was particularly acute for the texts covering the first three decades of the UNGD. All of this makes automating the process of individual country statement extraction extremely challenging. We therefore heavily relied on manual processing, particularly for the historic corpus.
Building on an earlier, incomplete version of the UNGDC, from 1970 to 2017 (see Baturo et al., 2017), we have extended the corpus back to the first UNGA session in 1946, and up to the 78th session in 2023. We will continue to update the UNGDC each year. To ensure a consistent approach to spelling, cleaning and related considerations, we have re-checked every text file in the entire 1946–2023 corpus. We have also ensured the use of consistent country code categorization across years. Importantly, the UNGDC now includes information about the identity and institutional posts of all speakers, for 1946–2023. To do this, we sourced new information from governmental webpages where speaker information was not provided in the UN system. Furthermore, on the website accompanying the complete UNGDC (https://www.ungdc.bham.ac.uk/), we offer advanced search and data visualization tools enabling scholars not trained in NLP methods to engage with the corpus.
There are several important considerations for scholars in using the updated UNGDC, which we discuss in turn. First, there has been a substantial increase in UN membership between 1946 and 2023, which has implications for the number of UNGD statements delivered each year (see Figure 1). Therefore, scholars should account for changing UN membership in their analyses, particularly if, for example, they focus on the number of countries that invoke a particular concept each session. A positive trend in engagement with a particular concept may be driven by increasing membership rather than topic salience. Second, the average length of speeches has decreased over time (see Figure 2). This may affect results if scholars rely on total frequency of terms in their analysis or pre-processing (see Denny and Spirling, 2018). In the late 1940s and 1950s, there were no formal limits on the length of speeches. As the UN membership has grown, delegates agreed to limit speeches to 30 minutes in 1963, and to 15 minutes in 2003 (Peterson, 2005: 80–81). It is worth noting that this is a voluntary time limit, and many speakers exceed it. However, users of the corpus may need to make adjustments by examining both raw term frequency and the term as a proportion of a speech (or session).
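The two adjustments described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build or analyse the corpus: the `(year, country, text)` tuple layout, the function names and the toy speeches are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a speech and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share(text, term):
    """Share of tokens in one speech equal to `term` (adjusts for speech length)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)


def share_of_speakers(speeches, term):
    """Per year, the fraction of speeches that mention `term` at least once.

    Dividing by the number of speeches in that year adjusts for growing
    UN membership. `speeches` is a list of (year, country, text) tuples;
    this layout is an assumption for illustration only.
    """
    mentions = defaultdict(int)
    totals = defaultdict(int)
    for year, _country, text in speeches:
        totals[year] += 1
        if term.lower() in tokenize(text):
            mentions[year] += 1
    return {year: mentions[year] / totals[year] for year in totals}


# Invented toy speeches, not corpus data.
toy = [
    (1950, "AAA", "Peace and security require disarmament."),
    (1950, "BBB", "Trade and development matter."),
    (2020, "AAA", "Climate change threatens peace."),
    (2020, "BBB", "Climate action and development."),
    (2020, "CCC", "Security and climate finance."),
    (2020, "DDD", "Reform of the council."),
]
shares = share_of_speakers(toy, "climate")
```

Reporting the share of speakers alongside the raw count of countries makes it easier to see whether a rising trend reflects salience or simply a larger membership.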

Figure 1. United Nations (UN) members and speeches in the UN General Debate.

Figure 2. Minimum/maximum UN General Debate speech size (words) with the median line.
A third issue is that in the early UNGD, there were no formal restrictions on countries speaking twice. There are, however, only a handful of cases when states delivered two official speeches in a year. In 1959, representatives agreed to limit themselves to one speech (Peterson, 2005: 80–81). For the few instances prior to 1959 in which states delivered two addresses in a session, we have concatenated the speeches into a single statement, noting this in the codebook accompanying the UNGDC. Fourth, many users of text data subset their analyses by paragraphs. We draw attention to the fact that the UNGD designates numbered paragraphs in text from 1946 to 1984, omitting this practice altogether after 1984. From 1993, paragraphs are easily separated by including a space. Therefore, scholars who separate statements into paragraphs for their analyses should account for this, as it requires additional preparatory work for 1984–1992.
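One way to handle the changing paragraph conventions is to dispatch on the session year. The sketch below is illustrative only: it assumes that numbered paragraphs begin a line with a number followed by a full stop, that later paragraphs are separated by blank lines, and that the intermediate years are returned unsplit for manual preparation. The exact year boundaries and the function names are assumptions and should be checked against the files and the codebook.

```python
import re

# Assumed convention: a paragraph number such as "12. " at the start of a line.
NUMBERED = re.compile(r"^[ \t]*(\d+)\.[ \t]+", flags=re.MULTILINE)


def split_numbered(text):
    """Split a statement at line-initial paragraph numbers.

    Any text before the first number is ignored, and a line that happens
    to start with a number and a full stop is treated as a new paragraph.
    """
    starts = [m.start() for m in NUMBERED.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    bounds = starts + [len(text)]
    parts = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    # Drop the leading number from each paragraph.
    return [NUMBERED.sub("", part, count=1).strip() for part in parts]


def split_blank_lines(text):
    """Split a statement on one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_paragraphs(text, year):
    """Choose a splitting rule by session year (boundaries are assumptions)."""
    if year <= 1984:
        return split_numbered(text)
    if year >= 1993:
        return split_blank_lines(text)
    # Intermediate years: no reliable marker, so return the whole statement.
    return [text.strip()]
```

For example, `split_paragraphs("1. First point.\n2. Second point.", 1950)` yields two paragraphs with the numbers removed.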
Fifth, while national political leaders are particularly prominent in recent UNGD sessions, this has not always been the case. Based on the speaker data now included with the UNGDC, Figure 3 shows that during the Cold War, the UNGD was primarily dominated by foreign ministers and by representatives of national UN missions (diplomats) (Baturo and Gray, 2024; Gray and Baturo, 2021). Around 24% of speakers across the 1946–2022 period are heads of state and government, with the presence of national leaders increasing over time. Other politicians, such as vice-presidents or cabinet ministers, also address the assembly during the General Debate. The intra-country and inter-country variation in national representatives delivering the UNGD address may therefore be an issue that users need to pay attention to, and potentially opens up new avenues of research regarding state preferences and national representation and delegation.

Figure 3. Speaker categories in the United Nations General Debate.
Exploring the UNGDC
The UNGDC can be used in a variety of ways to provide new insights into the study of international peace and conflict, and global processes more broadly. We provide three motivating examples. First, we explain how the key terms used by the USA and Russia/Soviet Union over time reveal their policy priorities, through a simple descriptive analysis. The second example focuses on the Sustainable Development Goals (SDGs) – described as a ‘shared blueprint for peace and prosperity for people and the planet, now and into the future’ (United Nations, 2015). Using topic models, we examine trends in governments’ engagement with the issues that make up the 17 SDGs. The final example uses countries’ UNGD statements to predict foreign policy outcomes, focusing on the response to Russia’s 2014 annexation of Crimea.
USA–Russia rhetoric over time
Interstate rivalry has played a central role in a large number of military conflicts (Thompson, 2001: 557). Specific issues contested by rivals are often articulated and debated in international diplomacy (Mitchell and Thies, 2011). The UNGDC, because of its rich textual information, can illustrate the dynamics of the policy priorities of rival states over time. As a simple illustration, we produce word clouds of the most frequent words used in United States (US) and Russian statements, by time periods. The US–Union of Soviet Socialist Republics rivalry dominated global politics during the Cold War. Following a brief period of closer relations shortly after the collapse of the Soviet Union, relations between the USA and Russia again deteriorated – particularly in the UN (Maness and Valeriano, 2015). In the UNGA, the two countries frequently clashed over competing visions of the world order.
Figure 4 highlights the changing priorities of the two states over time, allowing researchers to identify patterns and trends from UNGD speeches. During the Cold War, both states had a strong focus on security, as seen from the prominently placed ‘military’, ‘weapons’, ‘security’, ‘nuclear’, ‘peace’ and ‘disarmament’ terms. The US statements also covered economic issues (‘economic’, ‘growth’ and ‘markets’), the functioning of the UN (e.g. ‘charter’, ‘assembly’ and ‘council’) and values of ‘progress’, ‘opportunity’ and ‘freedom.’ In contrast, while the Soviet Union focused on security, in an effort to win new allies, its statements also emphasized ‘race’, ‘colonial’ and ‘colonialist.’ Figure 4 shows that following the brief 1990–99 interlude, Russian statements have increasingly emphasized security concerns and Russia’s perceived rivalry with the US (e.g. ‘nato’, ‘west’ and ‘washington’). In contrast, the US no longer recognized Russia as its rival, as demonstrated by the relative absence of references to Russia in the US statements.

Figure 4. Comparative word clouds for the United States of America and Union of Soviet Socialist Republics/Russia for selected time periods.
The UNGDC can also be used to examine whether rivalries have been resolved (Mitchell and Thies, 2011; Thompson, 2001). While we have opted to present simple visualizations here, scholars can turn to scaling, dictionary or topic analyses to understand the dyadic rivalry relationships. Beyond rivalry, Figure 4 demonstrates that the UNGDC can be used to study the emergence and decline of issues in world politics. For example, while the erstwhile emphasis on nuclear disarmament wanes over time, in the 1990s a ‘peace-keeping’-centred frame emerges.
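The word-frequency tables that underlie word clouds of this kind can be approximated with the Python standard library. The following is a rough sketch, not the pipeline used for Figure 4: the stop-word list, the period boundaries, the `(country, year, text)` layout and the toy texts are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stop-word list; real analyses would use a fuller one.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "we", "our", "for"}


def top_terms(texts, k=3):
    """Return the k most frequent non-stop-word tokens across `texts`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


def terms_by_period(speeches, periods, k=3):
    """Group speeches by country and period, then rank terms in each group.

    `speeches` is a list of (country, year, text) tuples and `periods` a
    list of inclusive (start, end) year pairs; both are assumed layouts.
    """
    grouped = defaultdict(list)
    for country, year, text in speeches:
        for start, end in periods:
            if start <= year <= end:
                grouped[(country, start, end)].append(text)
    return {key: top_terms(texts, k) for key, texts in grouped.items()}


# Invented toy texts, not corpus data.
toy = [
    ("AAA", 1960, "Peace and disarmament. Nuclear disarmament is the goal."),
    ("AAA", 1965, "Disarmament for peace."),
]
ranked = terms_by_period(toy, periods=[(1946, 1989)], k=2)
```

The resulting term counts can be passed to any plotting tool; comparing them across periods gives a tabular view of the shifts that the word clouds display.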
Engagement with the SDGs
In 2015, UN member states unanimously adopted the 17 inter-connected SDGs, related to global development, climate change and peace, as part of the 2030 Agenda for Sustainable Development (United Nations, 2015). In addition to ‘promoting peaceful and inclusive societies for sustainable development’ (SDG 16), the growing recognition of the links between conflict, development and climate meant that the SDGs were conceived as a ‘shared blueprint for peace and prosperity for people and the planet’ (United Nations, 2015). While the SDGs were adopted in 2015, they ‘build on decades of work by countries and the UN.’ For example, SDG-3, which focuses on ensuring healthy lives and promoting well-being, is closely connected to Article 25 of the 1948 Universal Declaration of Human Rights (UDHR). Similarly, SDG-16’s emphasis on promoting peaceful and inclusive societies has its roots in the UDHR and the International Covenant on Civil and Political Rights. To that end, we can examine the extent to which states have engaged with the issues contained in the SDGs between 1946 and 2023 by fitting topic models, which aim to extract latent themes (topics) from documents. We use the Keyword-Assisted Topic Models (KeyATM) approach (Eshima et al., 2020). For each topic, KeyATM uses a set of key terms most representative of that topic, and assigns a high probability to those terms. Hence, we created dictionaries for each SDG based on their definition (see Table 1).
Table 1. List of Sustainable Development Goals (SDGs) and corresponding keywords used in the Keyword-Assisted Topic Models approach.
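To give a sense of how keyword dictionaries map text onto SDG topics, the sketch below computes a plain dictionary-based share of keyword hits per SDG. This is a much simpler stand-in and not the KeyATM model, which estimates topics probabilistically with the keywords acting as a prior. The keyword lists shown here are hypothetical placeholders and do not reproduce the dictionaries in Table 1.

```python
import re

# Hypothetical keyword lists for three SDGs, for illustration only.
SDG_KEYWORDS = {
    "SDG-3": {"health", "disease", "mortality"},
    "SDG-13": {"climate", "emissions", "warming"},
    "SDG-16": {"peace", "justice", "institutions"},
}


def sdg_shares(text, dictionaries=SDG_KEYWORDS):
    """Return each SDG's share of all keyword hits in one statement.

    Shares sum to 1 when at least one keyword occurs; otherwise every
    share is 0. This counts exact token matches only.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = {
        sdg: sum(1 for token in tokens if token in words)
        for sdg, words in dictionaries.items()
    }
    total = sum(hits.values())
    if total == 0:
        return {sdg: 0.0 for sdg in dictionaries}
    return {sdg: count / total for sdg, count in hits.items()}


example = sdg_shares(
    "Climate change harms health; peace needs justice and climate action."
)
```

Averaging such shares by year would give a crude counterpart to a topic distribution over time, although a fitted topic model additionally captures words that co-occur with the keywords.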
Figures 5 and 6 present the results. Figure 5 displays the distribution of topics (the proportion of SDG topics within UNGD statements for each year on a vertical axis) from 1946 to 2023. Each coloured line corresponds to a specific SDG. Figure 6 displays the relative proportion of each SDG topic over time using a streamgraph. Streamgraphs are particularly informative for presenting data that have multiple categories. They enable us to observe the changes and patterns in coverage over time at a glance.

Figure 5. Sustainable Development Goals’ topic distribution over time.

Figure 6. Streamgraph of Sustainable Development Goals’ topic distribution over time.
Both Figures 5 and 6 show that virtually all topics – with the exception of climate change (SDG-13) – have been present since 1946. The focus of the UNGD, however, has shifted over time. For example, Figure 5 shows that there was significantly higher engagement with water and sanitation (SDG-6) immediately after the Second World War, but this declined sharply from the late 1960s. In contrast, there is little engagement with climate change (SDG-13) until the 1990s, before rising sharply in the 2000s to become the topic with one of the highest distributions in countries’ UNGD statements. Figure 6 shows that topics gain and lose prominence over time. The funnel shape of the graph, with fewer topics dominating in the earlier years and a greater diversity of topics later, reflects both changes in the number of countries (and hence UNGD statements), and the changing global priorities. Both Figures 5 and 6 clearly demonstrate that the end of the Cold War was a key inflection point. Figure 5 shows how inequality (SDG-10), conservation of the oceans and marine resources (SDG-14) and responsible consumption and production (SDG-12) declined during the Cold War, while peace and inclusive societies (SDG-16), industry and infrastructure (SDG-9), global partnerships (SDG-17) and climate change (SDG-13) rose in prominence during the same period. Figure 6 underlines countries’ engagement with a broader range of topics after 1990. Moving forward, scholars can use the UNGDC to visualize and study the dynamics of global norms and issues, beyond the SDGs.
Responses to Russia’s annexation of Crimea
The UNGD statements can be used to examine specific foreign policy outcomes. As a final example, we analyse states’ responses to Russia’s annexation of Crimea in 2014, focusing on two specific responses: support for UNGA Resolution 68/262 on the ‘Territorial integrity of Ukraine’ adopted on 27 March 2014; and the imposition of sanctions on Russia in 2014. The sanctions, which targeted specific sectors, were enacted in a coordinated manner by the EU, USA, Canada and others. We examine the extent to which countries’ UNGD statements in 2013 can predict states’ responses to Russia’s actions in 2014. To that end, we estimate countries’ ideological alignment with the US and Russia using Wordscores, a text scaling method that enables the measurement of ideology, attitudes and positions of political actors (Laver et al., 2003). We select US and Russian speeches as reference texts. We then use the obtained scores to scale the speeches of all states on the dimension defined by the rivalry between the US and Russia. We choose a simplified, illustrative model specification with economic development, North Atlantic Treaty Organization membership, democracy, Russian gas dependency and ideological proximity as predictors.
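The core of the Wordscores procedure can be sketched compactly. Each word receives a score equal to the average of the reference positions, weighted by how strongly the word is associated with each reference text; a new text is then scored as the mean score of its scored words. The sketch below follows that logic but omits the rescaling and uncertainty estimates of the full method. The reference positions of +1 and -1 and the toy texts are assumptions chosen for illustration.

```python
import re
from collections import Counter


def rel_freq(text):
    """Relative frequency of each token within one text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    n = sum(counts.values())
    return {word: c / n for word, c in counts.items()} if n else {}


def word_scores(reference_texts, reference_positions):
    """Score every word appearing in the reference texts.

    P(r | w) is the word's relative frequency in reference r divided by
    its summed relative frequency over all references; the word score is
    the P(r | w)-weighted mean of the reference positions.
    """
    freqs = {r: rel_freq(t) for r, t in reference_texts.items()}
    vocab = set().union(*freqs.values())
    scores = {}
    for word in vocab:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        scores[word] = sum(
            (f.get(word, 0.0) / total) * reference_positions[r]
            for r, f in freqs.items()
        )
    return scores


def score_text(text, scores):
    """Mean word score over the tokens that have a score (None if none do)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t in scores]
    if not tokens:
        return None
    return sum(scores[t] for t in tokens) / len(tokens)


# Invented reference texts and hypothetical positions, not corpus data.
refs = {
    "USA": "freedom freedom markets peace",
    "RUS": "sovereignty sovereignty nato peace",
}
positions = {"USA": 1.0, "RUS": -1.0}
scores = word_scores(refs, positions)
```

With these toy references, a word used equally by both sides (such as "peace") scores 0, while a statement combining "freedom" and "peace" falls halfway towards the first reference position.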
For our estimation model, we use a random forest algorithm – an ensemble learning method that combines multiple decision trees to make predictions. Here, the vote prediction model has a 35.23% out-of-bag (OOB) error rate while the sanctions model has a 6.74% OOB error rate. The MeanDecreaseGini is a measure used to interpret the importance of each predictor variable. Simply put, a high MeanDecreaseGini value indicates a strong association with the target variable, signifying its importance as a predictor.
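The quantity behind this importance measure is the reduction in Gini impurity achieved when a tree splits on a variable; in a random forest these reductions are accumulated for each predictor and averaged over the trees. The sketch below computes the reduction for a single threshold split only, as an illustration of the building block rather than a random forest implementation. The variable values, labels and threshold are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels: 1 minus the sum of squared shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity reduction from splitting observations at `threshold`.

    Observations with value <= threshold go left, the rest go right; the
    child impurities are weighted by the share of observations in each.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Invented example: a hypothetical scaled position and a yes/no vote.
position = [-0.8, -0.5, 0.4, 0.9]
vote = ["no", "no", "yes", "yes"]
informative = gini_decrease(position, vote, threshold=0.0)
uninformative = gini_decrease([1, 2, 1, 2], vote, threshold=1)
```

A split that separates the classes perfectly removes all impurity, whereas a split on an unrelated variable removes none, which is why predictors with larger accumulated decreases are read as more important.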
Figure 7 displays the results. We observe that ideological positions estimated from the UNGD speeches are closely associated with voting on UNGA Resolution 68/262, but are less important for the imposition of sanctions. Given that very few states responded to Russia’s annexation of Crimea with economic sanctions in 2014, it is perhaps unsurprising that the Wordscores measure is not strongly associated with the imposition of sanctions. Still, ideological positions derived from countries’ UNGD statements in 2013 are an important predictor of voting on the UNGA Resolution. Hence, the analysis suggests that a fruitful area of future research may rely on the UNGDC to fit predictive models to understand different outcomes in global politics.

Figure 7. Variable importance: Mean decrease Gini.
Conclusion
The UNGDC contains annual statements by all UN member states discussing their perspectives on the most important issues in world politics. Extending the corpus to cover the entire 1946–2023 period ensures that these annual statements can be compared across all UN member states for the entire post-Second World War period. Hence, the UNGDC arguably represents the most important source of textual data in IR. Many key text data sources remain undigitized, and subsequently, most textual data applications in IR focus on the post-Cold War period (Patz and Thorvaldsdottir, 2021; Schonfeld et al., 2019). The complete UNGDC helps to address this limitation.
As we have discussed, the UNGDC has been used to study key issues in foreign policy and IR. Extending the corpus enables researchers to study longer-term processes, such as the politics of state rivalries and protracted conflicts (Mitchell and Thies, 2011; Thompson, 2001), the spread of international norms (Finnemore and Sikkink, 1998), shifts in engagement with critical issues and the emergence of new problems and solutions in global politics (e.g. Buzas, 2021). Yet, the UNGD is not only a venue for position-taking; it has discursive, and even performative, characteristics. Representatives engage in argumentation, deliberation and persuasion (Risse, 2000) – an important aspect of diplomatic interaction, which can be studied systematically using UNGD statements. Hence, the UNGD not only provides information on what representatives talk about, but also how they express themselves in terms of the complexity of their speech, their sentiment and emotion (see Bleiker and Hutchison, 2014; Holmes, 2015). Therefore, the UNGDC is a unique resource for scholars from across different IR perspectives. It can also be used as an ‘off-the-shelf’ resource for teaching IR, and by the media and policy actors applying data visualizations, as our illustrative examples demonstrate, and as the new UNGDC website facilitates. The breadth, size and scope of the UNGDC mean it can also be used in interdisciplinary research at the intersection of natural language processing and social science – for example, in developing new approaches to evaluate semi-supervised text classification methods (Watanabe and Zhou, 2018), improving the accuracy of sentence classification (Watanabe and Baturo, 2024) and highlighting the risks of artificial intelligence algorithms imitating real speeches made at the UNGA (Bullock and Luengo-Oroz, 2019). Therefore, the complete UNGDC will enable scholars to tackle wide-ranging questions.
Moving forward, we will update the UNGDC annually to ensure that it can be used to examine both past and present events in global politics.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Notes
SLAVA JANKIN, PhD in Political Science (Trinity College, University of Dublin, 2009); Professor of Data Science and Government and Director of Centre for Artificial Intelligence in Government at the University of Birmingham (2023–present). His research focuses on the development and application of data science and artificial intelligence (AI) methods to track politics and governance of health effects of climate change. Slava Jankin has also been working with national and local governments on embedding AI and data science in the delivery of public services.
ALEXANDER BATURO, PhD in Political Science (Trinity College, University of Dublin, 2007); Associate Professor of Government at Dublin City University (2008–present); various visiting academic fellowships in the UK, USA, Netherlands and Australia. He studies comparative authoritarianism, including the behaviour of authoritarian actors in international organizations. Most recent book: The New Kremlinology (Oxford University Press, 2021).
NIHEER DASANDI, PhD in Political Science (University College London, 2013); Professor of Global Politics and Sustainable Development at the University of Birmingham (2017–present). His research looks at the global politics of sustainable development and human rights, focusing in particular on: the relationship between development and human rights; the health dimensions of climate change; and the measurement and determinants of foreign policy.
