Abstract
With the increasing relevance of ethnic groups as political actors, the literature has attempted to identify and study the ethnic organizations representing these groups. How do these organizations use digital communication channels to reach their domestic and international audiences? To enable research on these questions, this article introduces the Ethnic Organizations Online dataset, a new data collection focusing on the online channels that ethnic organizations use. The dataset includes four types of channels: Twitter (since July 2023, rebranded by Elon Musk as X); Facebook; Instagram; and regular websites. It relies on the Ethnic Power Relations – Organizations database, and is therefore compatible with an entire family of datasets on ethnic politics. Featuring more than 2000 online channels used by 265 groups, it allows researchers to study a wide variety of questions related to digital ethnic mobilization. The article presents three examples of how the dataset can be used. We study: (a) how a group’s political goals influence social media adoption; (b) how elections impact the organizations’ communication frequency and how this differs between democracies and autocracies; and (c) how the power status of a group affects the content of their communication. We provide replication codes facilitating the use of the dataset in applied research.
Introduction
In the last decades, the study of ethnic politics has made considerable progress to better understand a variety of phenomena, ranging from ethnic voting (Fox, 2018) and clientelism (Koter, 2013) to democratization (Giuliano, 2006) and ethnic conflict (Cederman et al., 2011). Much of this progress was made possible by the creation of datasets that allow researchers to identify the ethnic groups involved, their changing political demands and representation over time, as well as other relevant characteristics. One of these efforts is the Ethnic Power Relations (EPR) family of datasets (Vogt et al., 2015), which consists of different interlinked data collections that all rely on the same group definition. However, ‘groups’ rarely decide and act as unitary actors, which is why researchers have focused on political organizations that claim to act on behalf of ethnic groups. These organizations, such as parties, non-governmental organizations (NGOs) or social movement organizations, can influence ethnic identification (Flesken, 2018), foster mobilization (Brubaker, 2006), or even pursue their goals with violent means (Vogt et al., 2021).
To advance research on these organizations, we introduce a new data collection on their online communication channels: the Ethnic Organizations Online (EO2) dataset identifies the main online communication platform(s) that organizations rely on. This enables comparative research of organizational communication and facilitates in-depth analysis of the content of this communication. The final dataset records 2120 online communication channels of ethnic organizations, and focuses on four types: Twitter (X); Facebook; Instagram; and organizational websites. This article describes the structure of the EO2 dataset and the coding process. We illustrate the use of the data with three applications analyzing: (a) the adoption of particular channels; (b) their use; and (c) the type of message that channels disseminate.
The EO2 dataset
The EO2 dataset, accessible via https://doi.org/10.7802/2612, identifies digital communication channels used by ethnic organizations. Rather than recording these organizations anew, the EO2 dataset relies on the Ethnic Power Relations – Organizations (EPR-O) dataset (Gremler et al., 2020; Vogt et al., 2021). EPR-O lists organizations ranging from political parties to guerrilla groups that are connected to ethnic groups coded in the EPR. It includes 1154 ethnic organizations linked to 354 ethnic groups in 90 countries covering the years 2000 to 2019. For a full list of countries with organizations in EO2, see online appendix A2.
Organizations in EPR-O are linked to ethnic groups in EPR via ethnic claims (public demands in favour of a group), ethnic recruitment (membership largely based on ethnicity) and ethnic electoral support (voters elect the party because of group membership). If an organization satisfies any of these criteria and is relevant to national politics, the EPR-O dataset further collects information on organizational goals, activities and strategies. In online appendix A1 we provide an overview of the distribution of social media usage and organizational types. More details on the coding criteria and the variables in EPR-O can be found in Vogt et al. (2021).
For ethnic organizations listed in EPR-O, EO2 identifies all official online channels on Twitter, Facebook and Instagram, as well as their websites. We focus on these channels in particular because of their broad adoption across various contexts and countries. For instance, Facebook counted up to 3 billion monthly active users in 2022, with many world leaders and political organizations among them (Barberá et al., 2022). While Twitter counted only about 230 million monthly active users in 2022, the platform holds special relevance for political content. By also covering organizational websites and Instagram, we aim to enable researchers to compare patterns of online communication across platforms and countries.
We restrict the temporal coverage of the dataset to the years 2000–2019 for two reasons. First, none of the social networking sites existed prior to 2000: Facebook was founded in 2004 and did not reach widespread usage for years, while Twitter and Instagram were established later (2006 and 2010, respectively). Second, this limitation enables comparability across the EPR-O countries, as all countries are covered within this period.
Online channels are defined as publicly accessible, authentic websites and/or social media accounts that represent an organization, which creates its content or supervises content production. This definition entails the following inclusion criteria:
The channel is official. This means that it is operated and/or supervised within the structure of the ethnic organization. For instance, this excludes accounts that are administered by individual members without permission of party officials or internal networks.
The channel is public in the sense that it can be viewed without prior approval by the organization. We exclude closed channels for two reasons. First, we are interested in public channels designed to communicate with audiences extending beyond an organization’s members. Private Facebook friend pages, for instance, whose content may only be viewed with an accepted friendship request, or Facebook groups populated by the organization’s members, do not satisfy this condition. Second, restricted channels are difficult for coders to find, so including them would result in a high degree of missingness.
Coding procedure
EO2 relies on human coding. Coders were assigned individual countries, and they searched for online channels of all the country’s organizations in the EPR-O dataset that did not dissolve before the year 2000. Since most of the coders were also involved in the work on the EPR-O dataset, they had already obtained considerable knowledge about the organizations in their assigned countries. Using an online spreadsheet interface, they then located all online channels for each organization. In the codebook, we specify strategies to reliably detect accounts on Twitter, Facebook and Instagram, as well as websites. For instance, coders were asked to first search for websites and check whether these linked to any social media accounts, in order to better gauge an organization’s social media ecosystem. Moreover, coders assessed authenticity via ‘About’ pages, profile pictures, distributed content, or the presence of blue checkmarks (‘verified’). Coders were instructed to collect all possible channels, so multiple channels on a platform for one organization are possible. However, we always attempt to identify which online channel constitutes the primary one on a particular platform at the time of coding, that is, the account that serves as the main communication channel for the organization. Coders also retrieved dormant online channels that are no longer in active use.
Tests of intercoder reliability indicate that our process provides a replicable way of identifying online channels. We tested this by assigning two countries to different coders without their knowledge and calculating the overlap of accounts coded by both. Both coders found 89% of the online channels. If we only consider primary accounts, this number increases to 97%.
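The overlap measure can be illustrated with a short Python sketch. It is not the authors' replication code. It assumes that overlap is the share of all channels found by either coder that were found by both, and that URLs are compared after light normalization. The function names and example URLs are hypothetical.

```python
def normalize(url):
    """Lowercase a URL and drop the scheme and any trailing slash."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")


def coder_overlap(coder_a, coder_b):
    """Share of channels found by either coder that both coders found.

    Assumption: the denominator is the union of both coders' channels.
    """
    a = {normalize(u) for u in coder_a}
    b = {normalize(u) for u in coder_b}
    union = a | b
    if not union:
        raise ValueError("neither coder recorded any channel")
    return len(a & b) / len(union)


# Hypothetical channels coded independently for the same country.
coder_a = [
    "https://example.org/",
    "https://twitter.com/ExampleParty",
    "https://facebook.com/exampleparty",
]
coder_b = [
    "http://example.org",
    "https://twitter.com/exampleparty",
]
overlap = coder_overlap(coder_a, coder_b)
```

Restricting both lists to primary accounts before calling `coder_overlap` would give the corresponding figure for primary channels.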
Nevertheless, any data project has gaps, and this is no different for EO2. In particular, any data collection conducted on the Internet is inherently difficult due to the dynamic nature of online content. Similar to almost all data collection on the Internet, our coding is based on the status quo at the time of coding, but cannot consistently capture past information. For example, if we fail to find an account for a given organization, this does not mean that one has never existed. Accounts could have been deleted due to changing organizational strategies and resources, or due to a shift to different platforms or providers. This biases data collection towards organizations that have recently or consistently used the same means of online communication. Therefore, the EO2 dataset should be viewed as a snapshot of digital communication channels rather than a time-invariant, exhaustive list. To help researchers to understand and address selection, we provide account creation dates, dates of Wayback Machine snapshots to access older versions of the websites and dates of latest posts (see Online appendix A3). We also conduct two additional analyses of adoption patterns in online appendix A4.
EO2 does not cover all social media platforms. Since one of the goals of the data collection was to offer a source for comparative research across countries, we did not retrieve data on country or region-specific social networking sites. For instance, Russia’s VKontakte or China’s Weibo are important national platforms that are missing from our data. Additionally, we did not code video platforms (YouTube) or relatively new platforms (TikTok, Mastodon). In our experience, these sites are – for now – rarely employed by ethnic organizations. Finally, we explicitly code organizational accounts but exclude personal accounts of political candidates that often only appear for a shorter time span or individual campaigns.
EO2’s coding approach is to identify pointers to online channels such as account names and uniform resource locators (URLs), rather than providing the content published. This is intentional, and in line with the approach taken by other comparable datasets. Account names and URLs provided by EO2 allow researchers to obtain digital content by web scraping or using automated access via application programming interfaces (APIs). For the social media platforms that we focus on, automated access to digital content is closely regulated and, if at all possible, requires a personal account with the necessary privileges. The distribution of this content is generally prohibited.
Key variables
Each row in the main EO2 table describes an online channel attributed to an ethnic organization in the EPR-O dataset. For each channel, there are several key variables:
URL: The URL of the specific online channel. For webpages, this is the frontpage. For social media platforms, this is the URL under which the account can be reached. If publicly available, we also provide Facebook and Twitter profile identifiers for accounts.
Channel Type: Whether the channel is a webpage, or a Twitter/Facebook/Instagram account.
Primary: Whether the online channel is considered the primary outlet of the organization on that particular platform. For any type of channel on a particular platform, there can only be one primary channel. If there are several channels on a single platform by an organization, coders were asked to assess authenticity of the channel to choose the primary one.
Confidence: Coders estimate the level of confidence that they have in the authenticity of the channel (1 = low, 5 = high). While these are subjective evaluations, the codebook outlines specific criteria.
First/Last Snapshot: First and last snapshot on Wayback Machine.
Account Creation Date: Creation date of social media channel.
Newest Publication: Latest publication of social media channel.
Additionally, the dataset includes the coding date, a country identifier and the organization’s name. The latter two fields enable merging with the EPR-O dataset. Coders also entered notes regarding a particular case into a comment field.
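As an illustration of this structure, the following Python sketch represents one row of the main table, checks the rule that an organization has at most one primary channel per platform, and attaches EPR-O records via the country identifier and organization name. The field names, the `org_type` attribute and the example values are assumptions made for this sketch and may not match the column names in the released files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    country_id: int      # country identifier shared with EPR-O
    organization: str    # organization name shared with EPR-O
    channel_type: str    # "website", "twitter", "facebook" or "instagram"
    url: str
    primary: bool
    confidence: int      # 1 = low, 5 = high


def duplicate_primaries(channels):
    """Return (country, organization, platform) keys with more than one primary channel."""
    counts = Counter(
        (c.country_id, c.organization, c.channel_type)
        for c in channels
        if c.primary
    )
    return sorted(key for key, n in counts.items() if n > 1)


def merge_with_organizations(channels, organizations):
    """Pair each channel with its organization record, or None if there is no match."""
    return [
        (c, organizations.get((c.country_id, c.organization)))
        for c in channels
    ]


# Hypothetical rows and a hypothetical EPR-O style lookup table.
channels = [
    Channel(1, "Example Party", "twitter", "https://twitter.com/example_party", True, 5),
    Channel(1, "Example Party", "twitter", "https://twitter.com/example_youth", False, 3),
    Channel(1, "Example Party", "website", "https://example.org", True, 4),
]
organizations = {(1, "Example Party"): {"org_type": "party"}}
merged = merge_with_organizations(channels, organizations)
```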
Data overview
In total, the EO2 data comprises 2120 online channels by 265 ethnic groups in 86 countries. Table 1 shows the numbers of the different channels (first row) and the primary channels (second row) connected to all the EPR-O organizations that have not dissolved before the year 2000. Out of the 2120 total organizational online channels collected in the EO2, 764 are on Facebook, making it by far the most popular platform. Moreover, many organizations also operate organizational websites (415 primary accounts) underscoring the sometimes overlooked importance of websites. Relatively new platforms such as Twitter or Instagram are also quite popular; however, they are used by a much lower number of organizations.
Table 1. Summary table of the Ethnic Organizations Online (EO2) channels.
A descriptive analysis of our data already reveals considerable variation in social media adoption and usage. Even though Facebook is the most popular platform, less than 50% of all organizations use it. Taking organizations that operate either a Facebook page or a Twitter account, this number rises to 550, almost 60%. In combination with the Twitter API, EO2 permits some interesting insights on Twitter usage by organizations. For instance, our data reveal considerable variation across countries: Figure 1 visualizes the tweet frequency by organizations linked to the ethnic groups shown on the map, with group settlement areas taken from the Geo-referencing Ethnic Power Relations dataset (Wucherpfennig et al., 2011). While some organizations such as the Partido Nacionalista Peruano (PNP) in Peru publish 211 tweets per month on average, the National Freedom Party (NFP) of South Africa only does so about eight times per month. Figure 1 also serves to illustrate the benefit of being able to combine the EO2 dataset with other datasets from the EPR family: researchers can easily combine digital trace data with other types of data sources to study online communication patterns.

Figure 1. Tweet frequency of groups with organizations in the Ethnic Organizations Online dataset.
The increasing importance of social media in the communication of ethnic organizations is illustrated in Figure 2. Using all tweets from accounts identified as primary online channels, it displays the average number of monthly tweets over the years. Most of the spikes in the number of tweets can be attributed to country-level events, in particular around elections. For instance, the spike in 2020 corresponds to the United States presidential election and the subsequent storming of the Capitol on 6 January 2021. In total, these data include close to 3.5 million tweets from accounts in the EO2 as of 1 October 2022.

Figure 2. Distribution of tweets in the Ethnic Organizations Online Twitter profiles.
The dataset thus enables researchers to study publishing patterns across a wide range of political and organizational contexts as well as different platforms. For replication purposes, we make the tweet IDs for all accounts in the dataset available (a so-called ‘dehydrated’ dataset).
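A monthly tweet frequency of the kind reported in this section could be computed from the timestamps of an account's tweets along the following lines. This is a minimal Python sketch under one assumption that the text does not spell out: the average is taken over all calendar months between the account's first and last tweet, including months without tweets. The function name and example timestamps are hypothetical.

```python
from datetime import datetime


def average_monthly_tweets(timestamps):
    """Average number of tweets per calendar month between first and last tweet.

    Assumption: months without any tweet inside that span count as zero.
    """
    if not timestamps:
        return 0.0
    first, last = min(timestamps), max(timestamps)
    # Number of calendar months covered, counting both end months.
    months = (last.year - first.year) * 12 + (last.month - first.month) + 1
    return len(timestamps) / months


# Hypothetical tweet timestamps for one account.
tweets = [
    datetime(2020, 1, 1, 9, 30),
    datetime(2020, 1, 15, 18, 0),
    datetime(2020, 3, 3, 12, 45),
]
frequency = average_monthly_tweets(tweets)
```

Here the three tweets span January to March 2020, so the account averages one tweet per month even though February is empty.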
Applications of the EO2 dataset
In the following, we provide three examples of empirical analyses that can be performed with our data, looking at: (a) adoption; (b) usage frequency over time; and (c) content of online communication channels.
Social media usage and separatism
Recent research has shown that social media can serve to reach out to international audiences to further organizational goals. For instance, Jones and Mattiacci (2019) find that international support towards Libyan rebel groups increases if they advocate for democratic ideals and highlight government atrocities. An organization’s online communication can also appeal to international NGOs and foreign governments that can exert pressure on international organizations in support of the group (Murdie and Peksen, 2014). For ethnic organizations that strive to establish their own nation state or follow irredentist goals, digital media can thus serve as a strong means to signal intentions and garner international support. Consequently, we expect ethnic organizations with goals of secession to have a higher likelihood of adopting digital channels, and to tweet more frequently. Moreover, a focus on advancing secession should also be reflected in language use. To address international actors (e.g. the European Union, the United Nations, international NGOs) we expect these organizations to use English, the international system’s lingua franca, more frequently compared to other ethnic organizations.
In order to analyse adoption patterns, we use the EO2 dataset to code whether an organization operates social media channels on Twitter or Facebook. Much of the social media literature exclusively focuses on a single platform, often Twitter. Yet, our data show that other platforms such as Facebook or conventional websites are often equally, if not more, important. The corresponding dummy variable takes a value of one if the organization has either a Facebook or Twitter account (or both) in the EO2 dataset, and zero otherwise. We further construct a variable that measures the percentage of an organization’s tweets that are in English. Our independent variable, which codes secessionist or irredentist goals, is taken from the EPR-O.
As control variables, we include the number of EPR groups in the respective country for the latest year available in the EPR. Economic wealth of an ethnic group is proxied with the per capita amount of night light emitted from the group’s settlement region. We also include the (log-transformed) population of an ethnic group in a particular country in the year 2010 (the latest time point available) as well as the organization’s age (the number of years from foundation to dissolution). Finally, we also control for whether an organization has at any time participated in national elections as recorded in the EPR-O dataset.
For organizations linked to multiple ethnic groups, we take the mean of the group-level covariates (nightlights, population numbers) instead of choosing one value or including the organization multiple times, once for each organization–group link. We exclude organizations in EPR-O that dissolved before 2009, since this is the first year with a social media account on either platform in our data. We also exclude channels that are not flagged as primary in the EO2 dataset, that is, secondary channels.
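The two outcome variables described in this paragraph can be sketched in Python as follows. The sketch assumes that each channel carries a platform label and that each tweet comes with a language code such as "en". How tweet language is actually determined is not shown here, and the function names are hypothetical.

```python
def adopts_social_media(channel_types):
    """Return 1 if the organization has a Twitter or Facebook channel, else 0."""
    return int(any(t in {"twitter", "facebook"} for t in channel_types))


def english_share(tweet_languages):
    """Percentage of an organization's tweets labelled as English.

    Returns None for organizations without tweets, so that they can be
    treated as missing rather than as 0%.
    """
    if not tweet_languages:
        return None
    english = sum(1 for lang in tweet_languages if lang == "en")
    return 100.0 * english / len(tweet_languages)


# Hypothetical organization with a website and an Instagram account only.
adoption_a = adopts_social_media(["website", "instagram"])
# Hypothetical organization with a website and a Facebook page.
adoption_b = adopts_social_media(["website", "facebook"])
# Hypothetical language labels for four tweets.
share = english_share(["en", "fr", "en", "es"])
```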
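The averaging step for organizations linked to several groups can be sketched in Python as follows. The sketch assumes that group-level covariates are stored per group and that missing values are skipped when taking the mean. The covariate names, group identifiers and values are hypothetical.

```python
from statistics import mean


def organization_covariates(linked_groups, group_covariates):
    """Average each group-level covariate over the groups linked to one organization.

    Assumption: missing values (None) are skipped; a covariate that is
    missing for every linked group stays None.
    """
    names = sorted(
        {name for g in linked_groups for name in group_covariates.get(g, {})}
    )
    result = {}
    for name in names:
        values = []
        for g in linked_groups:
            value = group_covariates.get(g, {}).get(name)
            if value is not None:
                values.append(value)
        result[name] = mean(values) if values else None
    return result


# Hypothetical covariates for two groups linked to the same organization.
group_covariates = {
    "group_a": {"nightlights": 0.2, "population": 1000},
    "group_b": {"nightlights": None, "population": 3000},
}
covariates = organization_covariates(["group_a", "group_b"], group_covariates)
```

The organization then enters the models once, with a single value per covariate.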
Table 2 shows the results of the analysis. Model 1 is a logistic regression estimating whether an organization has opened a Twitter or Facebook account. Model 2 is a linear model that estimates the average number of monthly tweets (natural logarithm) for an organization. Model 3 is a linear model that estimates the proportion of tweets in English, based on a reduced sample that excludes countries where English is an official language. All models include region fixed effects.
Table 2. Impact of separatist demands on social media adoption (Model 1, logit), tweet frequency (Model 2, ordinary least squares (OLS)) and proportion of tweets in English (Model 3, OLS).
*** p < 0.001; **p < 0.01; *p < 0.05. All models with region fixed effects.
Generally, these models provide no evidence that actors pursuing goals of secession are more likely to leverage social media. Model 1 shows no significant effect of separatist goals on adoption; if anything, the coefficient points towards a negative impact. Similarly, there is no significant effect on the number of tweets by separatist organizations (Model 2). Their use of English does not differ either (Model 3). These null results differ from previous research on separatist rebel groups (Loyle and Bestvater, 2019). One reason could be that previous work has only examined Twitter, while our adoption measure also includes Facebook. In Online appendix A5, we present a model that mirrors previous analyses as closely as possible by including only organizations that have used violence, excluding political parties and testing only Twitter adoption (rather than Twitter and Facebook). These models provide evidence that violent organizations pursuing secessionist aims are more likely to use Twitter, confirming the findings of Loyle and Bestvater (2019). This shows that focusing solely on Twitter can lead to different conclusions as compared to analyses including more platforms such as Facebook.
Election cycles and online activity
The EO2 dataset can also be used to study how the political environment shapes political communication on social media. Here, we analyse how political organizations respond to elections and demonstrate that there are differences between autocracies and democracies.
One of the defining features of democracy is the presence of contested elections that serve to fill executive posts (Alvarez et al., 1996), so almost by definition, elections are more meaningful in democracies than they are in autocracies. In democracies, political organizations thus should invest resources around elections, in order to mobilize voters for political gains. At the same time, most autocracies still hold regular elections. However, in an environment where the result is known a priori, organizations should be less likely to invest strongly in mobilization: For opposition actors, electoral manipulation and repression will reduce their electoral success, while for the incumbent, there will often be no need to mobilize as election wins are certain regardless of political campaigning. Consequently, we expect that political organizations in autocracies face fewer incentives to invest in digital communication around election dates. We test this hypothesis with almost 3 million tweets from 68 countries in the EO2 dataset.
Our unit of observation here is an ethnic organization in a country in a particular week, for which we count the number of tweets published. For each tweet in our dataset, we measure the (log-transformed) number of weeks to the closest national election according to the publicly available Election Guide by the International Foundation for Electoral Systems. To measure regime type, we use V-Dem’s main democracy variable v2x_regime (Lindberg et al., 2014). It differentiates between closed autocracies, electoral autocracies, electoral democracies and liberal democracies. We recode the variable to a dichotomous measure of autocracies (closed and electoral) and democracies (electoral and liberal).
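The election-timing and regime variables can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' replication code: the log transformation adds one so that an election week maps to zero, distances are counted in whole weeks, and the regime variable follows V-Dem's four-category coding (0 = closed autocracy, 1 = electoral autocracy, 2 = electoral democracy, 3 = liberal democracy). The function names and dates are illustrative.

```python
import math
from datetime import date


def weeks_to_closest_election(day, election_dates):
    """Whole weeks between a date and the nearest election, past or future."""
    return min(abs((e - day).days) for e in election_dates) // 7


def weeks_to_next_election(day, election_dates):
    """Whole weeks until the next election, or None if none is upcoming."""
    upcoming = [e for e in election_dates if e >= day]
    if not upcoming:
        return None
    return (min(upcoming) - day).days // 7


def log_weeks(weeks):
    """Log-transform a week count; adding 1 is an assumption of this sketch."""
    return math.log(weeks + 1)


def is_democracy(regime_code):
    """Dichotomize the regime code: 0/1 = autocracy, 2/3 = democracy."""
    if regime_code not in (0, 1, 2, 3):
        raise ValueError("regime code must be 0, 1, 2 or 3")
    return int(regime_code >= 2)


# Illustrative national election dates and one observation date.
elections = [date(2016, 11, 8), date(2020, 11, 3)]
closest = weeks_to_closest_election(date(2020, 10, 20), elections)
upcoming = weeks_to_next_election(date(2020, 10, 20), elections)
```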
Table 3 presents two multilevel models with the natural logarithm of the number of posts in a specific week as the dependent variable. The control variables are the same as before; however, as our dependent variable is a country level measure, we nest our observations within organizations and countries. For both the weeks to the next election (Model 1) as well as the weeks to the closest election (Model 2), we interact our democracy measure with the time to election variable. For ease of interpretation, we present the marginal effects of Model 1 in Figure 3.
Table 3. Multilevel models estimating how time to the next election (Model 1) or the closest election (Model 2) affects the weekly number of tweets.
*** p < 0.001; **p < 0.01; *p < 0.05.

Figure 3. Effect plots for Model 1 from Table 3, illustrating how the proximity of an election affects the frequency of tweets.
These models support our expectation that elections have a positive effect on posting frequency, and that this effect is stronger in democracies. Figure 3 shows that the expected number of posts per week for organizations in democracies is around 50 at the time of election, while it is only close to 10 approximately 50 weeks before. This stands in stark contrast to autocracies, where this difference is far less pronounced. The findings are similar for the model that also includes past elections (Model 2). In online appendix A6, we present models that use independent variables without log transformations, and models that use an expanded regime variable comparing all four regime types coded by V-Dem. Again, these models provide evidence in favour of our expectations.
Power status and content type
To illustrate how the EO2 dataset can be used to study the content of digital communication, we analyze how the type of messages sent by organizations depends on the political power status of the group they represent. We build on work by Gadjanova (2021), who argues that incumbent African presidential candidates will turn to targeted transfers and material incentives during campaigning, while strong opposition candidates will address ethnic grievances to attack and split the ruling coalition. For incumbent candidates, appealing to grievances is less credible as their position in office could have enabled them to address (and potentially, reduce) them.
Following this logic, ethnic parties linked to groups in power should try to divert from ethnic grievances in their online communication. Rather, we expect them to emphasize their political work, for example by promoting their activities. Along these lines, Barberá et al. (2022) show that world leaders turn to emphasizing foreign policy on Twitter and Facebook during times of domestic contention and elections. While emphasizing foreign policy can be a viable strategy, we focus on an alternative pathway: the attempt to downplay and de-emphasize policy issues per se. Consequently, we hypothesize that parties linked to groups in opposition will mention policy issues – among them grievances – more often, while incumbent ethnic parties turn to highlighting their government work by reporting on organizational activities.
To capture this empirically, we identify whether a French, English, or Spanish tweet by a political party in our sample contains a political statement or references activities and events. The former is defined as a substantive argument or a political statement that does not only provide neutral information about an issue. The latter captures posts that report on events or activities of a party. For each tweet, we construct two dependent variables: whether it contains an argument/statement (0/1); and whether it mentions activities/events (0/1). The prediction is based on a multilingual BERT-Transformer model trained on the EO2 data that were labelled by human coders. Human coding, machine learning models and conceptualization are based on Gremler and Haiges (2022). The machine learning model achieved an accuracy of 83% (arguments/statements) and 88% (activities/events). For instance, the following tweet by the Economic Freedom Fighters, South Africa, is classified as a statement: ‘It becomes more important during this pandemic, where we find ourselves unable to objectively conduct free and fair elections’ (twitter.com/EFFSouthAfrica/status/1349663664429789184). An example of an activity or event tweet would be the following: ‘[Twitterhandle] hosts the #NCLR16 Health Town Hall’ (twitter.com/WeAreUnidosUS/status/757665651188260864) by United States-based advocacy group UnidosUS.
In total, 1,017,513 out of 1,605,686 tweets are classified as arguments/statements in our data, and 253,289 as activities/events (the categories are not mutually exclusive). We restrict the sample to parties that use their account regularly and have tweeted more than 30 times. Our independent variable measures whether a party is linked to an ethnic group in power. For this, we rely on the EPR’s power status variable: if any ethnic group linked to a party is coded as having a power monopoly or as a senior or junior partner, we code it as ‘in power’ (1), and ‘excluded’ (0) otherwise. The control variables are the same as above.
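The recoding of the power status variable and the sample restriction can be sketched in Python as follows. The status labels are assumptions about how the EPR categories are spelled in the data. The set of 'in power' categories follows the wording of the text above, and the party names and tweet counts are hypothetical.

```python
# Assumed spellings of the EPR categories that the text treats as 'in power'.
IN_POWER_STATUSES = {"MONOPOLY", "SENIOR PARTNER", "JUNIOR PARTNER"}


def in_power(group_statuses):
    """Return 1 if any linked group holds one of the listed statuses, else 0."""
    return int(any(s.upper() in IN_POWER_STATUSES for s in group_statuses))


def active_parties(tweet_counts, minimum=30):
    """Keep parties that have tweeted more than `minimum` times."""
    return {party for party, n in tweet_counts.items() if n > minimum}


# Hypothetical party linked to two groups, one of them a junior partner.
status_a = in_power(["POWERLESS", "JUNIOR PARTNER"])
# Hypothetical party linked to a single discriminated group.
status_b = in_power(["DISCRIMINATED"])
# Hypothetical tweet counts per party.
sample = active_parties({"party_a": 30, "party_b": 31, "party_c": 5000})
```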
Table 4 presents four logistic multilevel models with tweets nested within organizations as the unit of observation. The outcome in Models 1 and 2 is whether a tweet is a statement and in Models 3 and 4 it is whether the tweet describes an event. Both Models 2 and 4 are estimated without the nightlights and population control variables, since these are systematically missing for groups without clear-cut settlement patterns (urban/dispersed). Such settlement patterns and corresponding missingness of control variables could bias our results.
Table 4. Multilevel models estimating whether tweets contain a political statement (Models 1 and 2) or a political activity (Models 3 and 4).
*** p < 0.001; **p < 0.01; *p < 0.05.
Models 1 and 2 provide support for our expectation that parties linked to incumbent ethnic groups exhibit a lower probability of turning to political arguments in their online communication. A tweet published by these parties is less likely to include an argument or statement. Holding all other variables at their average values, a tweet by an organization linked to a group in power has a probability of 45% of containing a political argument, while it is 61% for a group that is not in power, a difference of 16 percentage points. Interestingly, however, incumbent parties do not seem to counterbalance this by highlighting their organizational activities. Instead, Models 3 and 4 indicate that they are less likely to report on events, since both coefficients of interest are negative and the effect in Model 4 is significant. Substantively, however, this effect is quite small: the predicted probability of a post containing a description of an activity is 11% for an excluded group and 14% for a group in power. While we can only speculate on reasons for this finding, incumbents could be more likely to communicate through other types of content, such as promises of patronage or valence appeals (Bleck and Van De Walle, 2013).
Conclusion
The EO2 dataset enables researchers to tackle new questions in the field of ethnic politics and beyond by providing a global collection of ethnic organizations’ online channels. Compatible with the EPR data collection, it can easily be combined with a variety of existing datasets on ethnic politics. EO2 employs a simple and versatile structure, which enables it to be easily extended and updated to include more platforms, other types of organizations and further countries. With the help of our dataset, scholars can access huge amounts of online material. This also includes data that goes beyond text, such as images or videos posted on social media.
EO2 supplements existing work in the field of ethnic politics in a number of ways. First, research on ethnic organizations’ communication strategies has been hampered by a lack of comparative data. Often, existing studies rely on single-case studies. Although their findings produce important insights, external validity remains unclear. With the EO2, we provide a database of organizational communication, which enables quantitative tests in a comparative framework. Second, the ethnic politics literature often leaves social media communication aside, thus ignoring increasingly important political dynamics online. Third, beyond a specific interest in social networking sites, social media also offers data that enable comparison across countries, time and actors.
Examining online communication of ethnic organizations also matters for scholars in social media studies as well as conflict research. The former can leverage EO2 to examine whether and how political communication based on social identities relates to political attitudes in the online sphere and the real world. The dataset also includes numerous countries outside of Europe and North America, which should encourage further research on political organizations’ digital media beyond the Global North. For conflict research, the EO2 includes a number of ethnonationalist rebel groups and provides valuable information to scholars of rebel diplomacy.
Footnotes
Acknowledgements
We thank our research assistants Sima Bulut, Inés Ramírez Caballero, Frederike Kaiser, Johanna Kleemann, Mia Nahrgang, Klara Panther, Christin Rudolph and Ann-Kathrin Schnelle for their outstanding work. We gratefully acknowledge comments from Lea Haiges, Manuel Vogt and Christina Zuber, and technical assistance from Sebastian Nagel and Lars-Erik Cederman’s International Conflict research group (in particular, Luc Girardin).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the German Research Foundation (DFG) as part of the Excellence Strategy (EXC-2035/1–390681379).
Notes
FREDERIK GREMLER, b. 1994, PhD in Political Science and Public Administration (University of Konstanz, 2023), Researcher, University of Konstanz – Cluster of Excellence: The Politics of Inequality (since 2019).
NILS B. WEIDMANN, b. 1976, PhD in Political Science (ETH Zurich, 2009); Professor of Political Science, University of Konstanz, Germany (since 2012).
