Abstract
Analyzing issue cycles usually begins with observing selected events and then tracking the course of their media coverage. This approach fails when the events of interest are hidden, overlaid, or even distorted by extensive coverage of other events. One such complicated case is news about terrorism in Africa. While previous studies have started from single media hypes, we propose modeling the general pattern of such issue cycles with distributed lag models on the basis of large-scale data. To assess the utility of distributed lag models, two basic principles of issue cycles are derived from theory and tested empirically. Furthermore, using the Global Database of Events, Language, and Tone, we evaluate the usefulness of automated methods for news research. Although the data are quite noisy, automated content analysis combined with distributed lag models is a promising approach for studying issue cycles, and the fitted model can be used to visualize them. In the case of news about terrorism in Africa, we found a sudden increase in coverage, followed by a second local maximum after a few weeks.
Every day, every hour, and every minute, news is produced. However, despite the continuous coverage, some events stand out as extraordinary, and some topics are of special interest. This is where the analysis of issue cycles starts. After a topic such as climate change (Brossard et al., 2004; Lörcher & Neverla, 2015), care for the elderly (Wien & Elmelund-Præstekær, 2009), or war (Miltner & Waldherr, 2013) has been selected, trigger events are identified and individual issue cycles are investigated further. If the rise in attention is sudden and media driven, these issue cycles are called media storms, hypes, or waves (Boydstun et al., 2014, p. 509; Vasterman, 2005, p. 509).
While this approach gives insights into specific issue cycles, our analysis examines general patterns of issue cycles in cases where the events of interest are hidden, overlaid, or even distorted by extensive coverage of other events. One of the more complicated cases is news about terrorism in Africa. Although terrorist events are usually conceived of as serious, exceptional occurrences, in Africa they happen far more often and on far more days than on other continents such as Europe or America. In 2016, one-sided violence occurred in 19 African countries, events that were distributed over 218 days of the year (Uppsala Conflict Data Program [UCDP], 2016; see analysis below). Amnesty International and the United Nations, for example, describe the Democratic Republic of the Congo as the most dangerous place in the world (Schlindwein, 2013). Moreover, violence is perpetrated by organizations over extended periods. For example, Boko Haram, which attracted worldwide attention in 2014 with the kidnapping of 276 schoolgirls, originated in Nigeria, has existed for many years, and is still active today (CNN, 2019). Further countries with terrorist events on at least 14 days of the year include Cameroon, the Central African Republic, Ethiopia, Somalia, South Sudan, and Sudan (UCDP, 2016).
Under these conditions, day-to-day violence is hardly surprising enough to trigger notable media storms, waves, or hypes. While media reactions to terror events are obvious in the Western world, the reporting about such events in Africa is much more muted, especially from a Western perspective: “Each terror victim in Western democracies receives more media attention from the New York Times than a similar death in a developing country” (Rohner & Frey, 2007, p. 140). This may be explained by news value theory, according to which journalists tend to prefer messages about elite nations (Galtung & Ruge, 1965; Schulz, 1990). Therefore, an analysis of news values needs to be complemented by a study of the regional African media. This further challenges the analysis of issue cycles because the media landscape in Africa is quite complex (Brüne, 2013); according to the Freedom in the World report, only 18% of all African countries had a free press in 2016 (Freedom House, 2016, p. 11).
Furthermore, an analysis of issue cycles under these circumstances is of particular interest because of the special relationship between the media and terrorism. Terrorist violence not only causes physical and emotional damage but also has communicative consequences. Because the media cannot ignore these events, they spread the terrorists’ messages by reporting on them; terrorism and the media are thus bound together for mutual benefit (Barnett & Reynolds, 2009, p. 2; Beck & Quandt, 2011, p. 86; Rohner & Frey, 2007, p. 130). On the level of international politics, possible consequences of media coverage include an agenda-setting effect and accelerated decision making on policy, for example, to legitimate military intervention (Livingstone, 1997, p. 2). At the same time, the achievement of policy goals can be hindered by an emotional climate created by the media: “At the heart of the Vietnam syndrome was the concern that media coverage had the potential to undermine public support for an operation and erode troop morale on the ground” (Livingstone, 1997, p. 4). The potential effects of CNN news reporting on politics have been called the CNN effect (Gilboa, 2005, p. 28). Empirical findings are contradictory: for example, whether U.S. interventions in Somalia and Rwanda can be traced to media coverage has been the subject of some controversy (Gilboa, 2005, p. 33). Nevertheless, theory development in this area has been enriched by research on the CNN effect, and blind spots have been identified (Gilboa et al., 2016, p. 663).
Extending the gaze beyond single events and single countries affords a broader picture. In this study, we explore a method for distilling the issue cycle of terrorism from a large database covering all the countries on the African continent. First, we evaluated the Global Database of Events, Language, and Tone (GDELT) from a communication research perspective. The GDELT collects worldwide news coverage, translates it from over 100 languages into English, and automatically identifies places, actors, events, and much more (GDELT, 2018b). To assess data quality, we analyzed precision and recall for the terrorist events and news reports used in the issue cycle analysis. Second, the relationship between terrorist events and news coverage was analyzed with a distributed lag model. Unlike other modeling approaches, distributed lag models can reveal the shape of a process. This allowed us to explore the general patterns that arise under special circumstances in which reporting about violence may be overlaid by other news coverage. As a testing ground for the method, we derived two standard hypotheses from news value theory.
Issue Cycles and News Value Theory
The relationship between events and news covering these events can be analyzed from very different perspectives. If the focus of interest is the causes and motives of violence and communication, an analysis of actors at the microlevel of concrete actions is appropriate (e.g., communicator and recipient research). An analysis of political or media systems requires a systematic comparison of systems or countries (e.g., media and terrorism in Europe vs. Africa). Our analysis of communication dynamics is located between microlevel behavior and macrolevel systems. We analyze terrorism and the media by aggregating individual communication and events at the levels of days and countries.
In the first part of the article, we outline our theoretical assumptions. The study is concerned with the shape and course of issue cycles and with news values as drivers of issue cycles. Although news coverage may in turn influence terrorist events, the present study treats news coverage as the outcome, because our interest lies in revealing short-term attention processes and explaining them with well-established communication theory. The effects of media reporting on terrorist violence (see Brosius & Weimann, 1991, p. 68) and on political outcomes, such as the CNN effect mentioned above, are more long-term, intricately interwoven with the interests of terrorist organizations, and therefore need to be investigated in dedicated studies. Moreover, media coverage can be both cause and result at the same time if coverage depends on previous reporting (Kolb, 2005, p. 80; Schulz, 1990). Although the concept of issue cycles is based on cumulative media coverage, we focus only on terror events as drivers of attention and merely control for overall coverage. These are clear limitations of the study. However, we accept these simplifications in order to keep the focus on the applicability of automated procedures and on the transfer of the modeling approach to issue cycles.
Issue Cycles, News Waves, and Media Hypes
The occurrence of a terrorist attack is spatiotemporally bounded, and so is its news coverage, though more loosely: whereas the attack happens at a specific time and place, the news coverage of the event typically consists of different reports from diverse sources at varying times. The course of aggregated news coverage can be described in terms of issue cycles (Kolb, 2005, p. 168). One of the most frequently applied concepts regarding the dynamics of media attention toward certain issues is Downs’s (1972) model of an issue-attention cycle. He describes the course of a specific issue in five stages. The first is the pre-problem stage, in which an issue is not discussed in the public forum. It is followed by the second stage of alarmed discovery and euphoric enthusiasm (Downs, 1972, p. 39), when the occurrence of key events draws the media’s attention toward a certain issue and shapes its subsequent news coverage (Kepplinger & Habermeier, 1995). The third stage is realizing the cost of significant progress: the public becomes aware of the complexity of an issue and the high level of effort required to solve the problem (Downs, 1972, pp. 39–40). There follows the fourth stage, a gradual decline of intense public interest, until the issue reaches the fifth and final post-problem stage, in which it “moves into a prolonged limbo—a twilight realm of lesser attention or spasmodic recurrences of interest” (Downs, 1972, p. 40). Thus, this model of issue cycles echoes the life cycle model: Both have a defined beginning and end and recurring characteristics but allow for individual variation along the course of the cycle (Kolb, 2005, pp. 46–47). The ideal–typical course of media attention to an issue can be visualized as a bell-shaped curve (Kolb, 2005, p. 80).
The term issue cycle is frequently used to refer to patterns deriving from Downs’s model of an issue-attention cycle. In contrast, the term issue career describes “[t]he long-term development of media attention toward an issue,” which can also comprise multiple issue-attention cycles (Waldherr, 2014, p. 852). Another concept used to describe media attention to an issue is news waves (e.g., Geiß, 2011), which constitute “a sharp and continuous increase of reporting on a specific issue for a limited period of time” (Geiß, 2011, p. 272). Furthermore, the development of news coverage of one issue can be described through media hypes (e.g., Vasterman, 2005; Wien & Elmelund-Præstekær, 2009), which differ from news waves mainly in their mechanisms of self-reinforcement: “During a media-hype, the sharp rise in news stories is the result of making news, instead of reporting news events, and covering media-triggered social responses, instead of reporting developments that would have taken place without media interference” (Vasterman, 2005, p. 515). Describing issue cycles as media hypes does not necessarily imply that the attention is exaggerated or distorted (Vasterman, 2005, p. 512); rather, it is the existence of key trigger events that turns news waves into media hypes (Vasterman, 2005, p. 516). In the case of terrorist events, there is usually a key event followed by news coverage. Therefore, the issue cycles triggered by terrorist events are not only news waves involving sharply increasing attention but also media hypes displacing other topics.
Since Downs’s introduction of the issue-attention cycle, the model has been elaborated and applied many times (e.g., Brossard et al., 2004; Daw et al., 2013; Lörcher & Neverla, 2015; McComas & Shanahan, 1999; Shih et al., 2008; Waldherr, 2012), leading to the discovery that his ideal–typical description of an issue cycle is only one of numerous possible developments (Waldherr, 2012, p. 20). Regarding the development of an issue with intense news coverage, Wien and Elmelund-Præstekær (2009) found that multiple waves of attention can occur, with follow-up waves decreasing in intensity. The authors explained that these dynamics occur because journalists draw on different sources as time goes by, which can result in less news coverage in subsequent waves (Wien & Elmelund-Præstekær, 2009, pp. 196–197). Brossard et al. (2004) found an effect of culture on the pattern of issue cycles, and Lörcher and Neverla (2015) affirmed that the course of issues is shaped by the specifics of the arena in which they are discussed. In her overview of important variables, Waldherr (2012, pp. 23–29) differentiated between events, topics, actors, and constellations among actors. Although these groups together cover the potentially influential factors, the concrete triggers that turn an issue into a matter of public concern always depend on the specific circumstances of the issue cycle.
Previous studies show that the patterns of issue cycles vary profoundly and depend on the object of investigation and the time span considered. The thematic fields in which issue cycles have been investigated to date are broad, ranging from news coverage of environmental issues (e.g., Downs, 1972; Kolb, 2005; McComas & Shanahan, 1999) and medical and health issues (e.g., Daw et al., 2013; Shih et al., 2008) to war (Waldherr, 2012). Regarding war coverage, depending on the duration (short/long) and the degree of surprise (expected/unexpected), different forms of issue cycles are theoretically possible and have been demonstrated empirically (Miltner & Waldherr, 2013, p. 275). In the case of an unexpected war, news coverage typically rises suddenly, whereas longer wars can involve several peaks of news coverage (Miltner & Waldherr, 2013, p. 276). Terrorist attacks, like unexpected wars, usually occur without warning, and they are of short duration. For such events, the relationship between terror and the media has been confirmed at a general level (e.g., Rohner & Frey, 2007, p. 139). What remains unclear is the shape of issue cycles under the special circumstances of terrorist events. Thus, deriving from the issue cycle of short, unexpected wars, the following hypothesis can be formulated:
Usually, issue cycles are investigated in two stages: First, the issue is identified and typically aggregated at the year or month level; second, the issue cycle is divided into different stages, which are then explored in depth (Kolb, 2005, p. 168; Waldherr, 2014, p. 854). Studies along these lines usually explore the dynamics of issues descriptively in one or a few case studies and examine separate influential factors or cyclical patterns. In this study, a modeling approach has been chosen to enable exploration of the general scheme for the dynamics of media attention regarding terrorist attacks.
News Values as an Underlying Cause of the Emergence of Issue Cycles
The emergence of issue cycles is explained by news value theory (Waldherr, 2012, p. 20). According to different news value approaches, the selection, scope, and placement of the reporting on an event are determined by its characteristics. Whether an event receives media attention is therefore affected by different news factors. A catalog of 12 news factors was first introduced by Galtung and Ruge (1965, pp. 70–71), who identified, among others, frequency, threshold, meaningfulness, unexpectedness, relevance, and reference to something negative as crucial criteria for news selection. According to Galtung and Ruge (1965, p. 71), the news value of an event also increases if multiple news factors coincide within it. This basic concept has been frequently elaborated since then, both with regard to the order and importance of different factors and epistemologically (e.g., Eilders, 1997; Schulz, 1990; Staab, 1990).
Terrorist attacks activate multiple news factors: “Terrorism is violence, or the threat of violence, calculated to create an atmosphere of fear and alarm…This violence or threat of violence is generally directed against civilian targets” (Gardela & Hoffman, 1991, p. 1). Some definitions of terrorism explicitly exclude state terrorism (e.g., Waldmann, 2011, p. 14), but this perspective is restricted to constellations where the political system is settled, which is not always the case for African countries (Frère, 2007, p. 2). Furthermore, the common notion of “one man’s terrorist is another man’s freedom fighter” highlights that what is seen as terrorism is a question of political and theoretical perspectives (for a critical discussion, see Barnett & Brooke, 2009, pp. 21–23; Ganor, 2002). Therefore, in this study, every form of organized violence against civilians is understood as terrorism. Terrorist events are characterized as disruptive events that result in unexpectedly grave damage, often with casualties. Damage and unexpectedness count as empirically confirmed news factors. Negative events such as terrorist attacks are therefore likely to be considered particularly newsworthy by the media (Galtung & Ruge, 1965, pp. 69–70; Robertz & Kahr, 2016, p. 18; Schulz, 1990, pp. 81–117). Hence, according to news value theory in connection with terrorist attacks, the following can be assumed:
Both hypotheses comply with common models in news research. Plausible hypotheses are important because they serve as a theoretical touchstone for automated methods. Although the hypotheses are adapted to the specific case of news about terrorism, they stand for two basic principles when modeling issue cycles. On the one hand, the analysis of issue cycles means describing the shape of aggregated attention over time. It can be assumed from previous research that different issues generate different shapes. On the other hand, the characteristics of anchor events need to be considered since they have consequences for the amount of attention independently of time. The following section outlines how both principles were empirically implemented.
Methods and Materials
Our analysis is based on the coverage of terrorist events in Africa in 2016. The hypotheses relate terrorist incidents and their severity to the amount of reporting. Data about terrorist events are taken from the Geocoded Events Dataset (GED) of the UCDP. Statistics for the amount of reporting are gathered from the GDELT. Thus, we rely on secondary data that were collected and prepared by automated methods. Our measurement procedure is evaluated by testing the recall and precision of the automated aggregation procedure. Finally, the relationship between the derived measures is modeled by a distributed lag model, which not only quantifies the correlation but also describes the time-dependent shape of the relationship. Each step of the analysis is detailed in the following sections.
Data Collection: Events, News, and Fatalities
The well-established UCDP GED was used as the gold standard for the operationalization of terrorist events and deaths due to terrorism. Events are manually researched, coded, and made available for research by the UCDP (Sundberg & Melander, 2013, p. 525). Organized violence directed at civilians is explicitly recorded in the UCDP GED as one-sided violence.¹ One-sided violence includes events usually considered to be terrorism (Eck & Hultman, 2007, pp. 235–236) as well as state violence against civilians (e.g., genocide; Eck & Hultman, 2007, p. 235). The reference to one-sided violence does not capture the intention of spreading fear, which is a defining element of terrorism. Since intentions cannot be determined by content analysis, we accept this imprecision in our operationalization.
For the period under study and the selected geographical region, the database contains 502 events of one-sided violence spread over 19 countries on 219 of 366 days (see Online Appendix Table A1). According to the estimates given,² the events resulted in a total of 2,525 deaths. The data were prepared as a time series for further analysis by separately calculating the number of deaths for each day in each country. We excluded 36 events with 183 fatalities whose date could not be determined to within a time window of 1 week.
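The preparation step described above (summing deaths per country and day, and setting aside events that cannot be dated to within a week) can be sketched as follows. This is a minimal illustration, not the authors' script: the tuple layout (country, start date, end date, deaths), the function name, and the rule of assigning a kept event to its start date are all assumptions.

```python
from datetime import date

# Hypothetical event records: (country, date_start, date_end, deaths).
# This layout is an illustrative assumption, not the UCDP GED schema.


def daily_death_series(events, countries, start, end, max_window_days=7):
    """Sum deaths per country and day over the inclusive period [start, end].

    Events whose date window (date_start..date_end, inclusive) is longer than
    max_window_days are set aside, mirroring the 1-week rule in the text.
    Kept events are assigned to their start date (another assumption).
    """
    n_days = (end - start).days + 1
    series = {c: [0] * n_days for c in countries}  # zero-filled country-day panel
    excluded = []
    for country, d_start, d_end, deaths in events:
        window_days = (d_end - d_start).days + 1
        if window_days > max_window_days:
            excluded.append((country, d_start, d_end, deaths))
            continue
        idx = (d_start - start).days
        if country in series and 0 <= idx < n_days:
            series[country][idx] += deaths
    return series, excluded


events = [
    ("Nigeria", date(2016, 1, 29), date(2016, 1, 29), 10),
    ("Nigeria", date(2016, 1, 29), date(2016, 1, 30), 3),
    ("Somalia", date(2016, 2, 1), date(2016, 3, 1), 5),  # dated only to within a month
]
series, excluded = daily_death_series(
    events, ["Nigeria", "Somalia"], date(2016, 1, 1), date(2016, 12, 31)
)
print(len(series["Nigeria"]), series["Nigeria"][28], len(excluded))  # 366 13 1
```

Keeping zero-death days in the panel matters: a distributed lag model needs the full daily sequence, not only the days on which violence was recorded.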
The analysis of the news coverage is based on the GDELT. According to their self-description, the project “monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages” (GDELT, 2018a). The text sources include news agencies such as Agence France Presse, Associated Press (AP), BBC Monitoring, AfricaNews, and Xinhua (Leetaru & Schrodt, 2013, pp. 2–3). The content goes back to 1979 but is much more extensive for the most recent years.
News reports are automatically recorded at 15-min intervals and go through a multistage preparation and analysis process (Leetaru & Schrodt, 2013, pp. 15–22). One of the core technologies used is the TABARI system, which automatically identifies phrase structures and extracts events. All events are geocoded and automatically coded with categories from the Conflict and Mediation Event Observations: Event and Actor Codebook (CAMEO; see Schrodt, 2012). At the time of the data collection (January 2018), the event data set contained over 300 million records and 61 variables, and these events were mentioned almost 1 billion times in different articles. Topics, tonality, and many other variables are automatically coded for each article and recorded in a data set called the Global Knowledge Graph (GKG). For example, the number of fatalities and other counts are coded. The GKG data set contained more than 680 million entries with 27 columns, each containing additional subcolumns.
The initial challenge in dealing with this database was coping with the data volume. We used Google BigQuery to access the database. Using Structured Query Language (SQL), the subset of all records related to African countries in 2016 was exported to Google Cloud Storage and downloaded using Facepager (Jünger & Keyling, 2018). The results comprised over 5,000 comma-separated values (CSV) files totaling more than 500 GB.
Additional processing was carried out with parallelized R-scripts. Only mentions of events that were extracted with a high degree of certainty were taken into account. For this purpose, GDELT provides a parameter that indicates the confidence of the algorithm in a range of 10%–100%. Following the interpretation of common reliability values, we considered only records with a confidence parameter greater than 80%. In addition, the only events considered were those mentioned at least once in the first paragraph of an article, which could thus almost certainly be considered main events. This resulted in a total of 4,783,327 news reports about events in Africa in 2016. The reports originated from 27,616 different sources, such as international news aggregators (e.g., yahoo.com), news agencies (e.g., Reuters and AP), or national newspapers (e.g., Al-Wasat from Libya; see Online Appendix Table A2). Most sources are African and report on several countries, such as Albawabh News (based in Egypt), AllAfrica (based in South Africa and other countries), or Modern Ghana (based in Ghana); indeed, 120 sources cover over 50 countries. The major languages are Arabic, French, and English. The diversity of the media landscape in Africa seems to be mirrored in the sample and is complemented by international news providers.
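The two quality filters described above (a confidence parameter greater than 80% and a mention in the lead paragraph) can be sketched as a per-file filter. The original processing used parallelized R scripts; this Python version is only an illustration. The column names are modeled loosely on GDELT 2.0 fields (a per-mention `Confidence` score from 10 to 100 and an `IsRootEvent` lead-paragraph flag) and are assumptions to be checked against the GDELT codebook; the sketch also assumes both fields have already been joined into one row per mention.

```python
import csv
import io

# Assumed column names, loosely modeled on GDELT 2.0 (verify before use).
CONFIDENCE_COL = "Confidence"
LEAD_FLAG_COL = "IsRootEvent"


def keep_mention(row, min_confidence=80):
    """True if extraction confidence exceeds the threshold and the event
    was found in the article's lead paragraph."""
    return int(row[CONFIDENCE_COL]) > min_confidence and row[LEAD_FLAG_COL] == "1"


def filter_csv(text, min_confidence=80):
    """Filter one exported CSV chunk, passed as text. Each of the thousands of
    files can be processed independently, which is what makes parallelization easy."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if keep_mention(row, min_confidence)]


sample = """GlobalEventID,Confidence,IsRootEvent
1,100,1
2,80,1
3,90,0
4,90,1
"""
kept = filter_csv(sample)
print([r["GlobalEventID"] for r in kept])  # ['1', '4']
```

Note the strict inequality: a mention with a confidence of exactly 80 is dropped, matching "greater than 80%".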
In a second step, the data set was tailored to contain only reporting about concrete terrorist events. First, from the 260 topic categories listed in the GKG, those related to violence (e.g., suicide attack, terror, and killing) were selected. Second, only articles in which the GDELT algorithms identified a count of dead or wounded were retained. Third, only events classified as one-sided violence were considered (assault in the CAMEO scheme). This resulted in 33,783 mentions of terrorist events in 55 African countries. All data were prepared as time series by calculating the total coverage and the amount of terror reporting for each day and country.
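The three-step narrowing to terror reporting, followed by the daily aggregation of total and terror coverage per country, might look like the sketch below. The record structure, the theme labels, and the casualty field are hypothetical simplifications of the GKG and event tables; the only external fact relied on is that CAMEO root code 18 denotes "Assault".

```python
from collections import Counter

# Hypothetical, simplified mention records. Theme labels and field names are
# illustrative assumptions; CAMEO root code "18" denotes "Assault".
VIOLENCE_THEMES = {"TERROR", "SUICIDE_ATTACK", "KILL"}  # illustrative subset
ASSAULT_ROOT = "18"


def is_terror_report(m):
    """Apply the three filters: violence theme, casualty count, assault code."""
    return (
        bool(VIOLENCE_THEMES & set(m["themes"]))
        and m["casualties"] > 0
        and m["cameo_root"] == ASSAULT_ROOT
    )


def daily_counts(mentions):
    """Return (total, terror) Counters keyed by (country, day)."""
    total, terror = Counter(), Counter()
    for m in mentions:
        key = (m["country"], m["day"])
        total[key] += 1  # overall coverage, used as a control
        if is_terror_report(m):
            terror[key] += 1  # coverage of concrete terrorist events
    return total, terror


mentions = [
    {"country": "Nigeria", "day": "2016-01-30", "themes": ["TERROR"], "casualties": 10, "cameo_root": "18"},
    {"country": "Nigeria", "day": "2016-01-30", "themes": ["ECON"], "casualties": 0, "cameo_root": "04"},
    {"country": "Nigeria", "day": "2016-01-30", "themes": ["KILL"], "casualties": 0, "cameo_root": "18"},
]
total, terror = daily_counts(mentions)
print(total[("Nigeria", "2016-01-30")], terror[("Nigeria", "2016-01-30")])  # 3 1
```

Counting total coverage in the same pass keeps both series on an identical country-day grid, which the later model needs in order to control for overall news volume.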
Data Quality: Data Analysis as a Classification Problem
The main part of the analysis depends on data that were collected and prepared by machines. This process involves many potentially error-prone processing steps. Since the resulting database represents only a small fraction of the world’s news, the process is especially prone to selection errors. Not every report in the world is covered by the GDELT, and an even smaller portion of the database, only two parts per million (ppm), was used in our analysis (see Online Appendix Figure A1). The same principle applies to terrorist events: It is likely that not every conflict event is covered by the UCDP because events near capital cities, for example, may attract more attention (Hammond & Weidmann, 2014, p. 2; Kalyvas, 2004). Moreover, terrorism is often defined as acts that intentionally spread fear. On the basis of content analysis, we can only speculate about such intentions and must rely on the indicators provided by the databases.
Previous validation efforts have investigated the GDELT mainly from a political science perspective or have compared the database with other data sources (Arva et al., 2013; Hammond & Weidmann, 2014; Kwak & An, 2016; Ward et al., 2013; Yonamine, 2013). Overall, these studies point to a rather “high level of noise” (Hammond & Weidmann, 2014, p. 5). The question now arises as to whether this noise restricts meaningful findings in the field of communication science. We conceived of the entire process of data preparation as a classification problem and investigated these potential error sources:
Both databases (GDELT and UCDP GED) contain rudimentary information on the sources of the data. The UCDP GED contains short quotations for each event from the news reports from which the information was taken, for example: “Agence France Presse, 2016-01-29, At least 10 dead in NE Nigeria suicide bombing: witnesses.” This often allows at least the rough context of action to be understood. In the GDELT, many reports are based on web pages; in this case, the URLs are contained in the data set.
To assess the precision of our analysis, a random sample of 120 news reports was drawn after inaccessible websites had been automatically excluded. French and Arabic pages were translated into German with Google Translate, and we checked whether a matching event could be found in each article. We coded whether the country, article date, event date, and event content were consistent between the GDELT and the source. Interestingly, the decision regarding the event date was relatively difficult and showed only moderate intercoder reliability (Holsti = 0.71, κ = 0.37, Gwet’s agreement coefficient [AC1] = 0.83; n = 38 articles). This was because a specific event date is rarely stated in an article, so its plausibility has to be inferred from the context. The other variables had very good reliability values (country: Holsti = 0.89, κ = 0.75, Gwet’s AC1 = 0.87; article date: Holsti = 1.0, κ = 1.0, Gwet’s AC1 = 1.0; n = 38 articles). Of particular interest is the evaluation of the event content, that is, whether an event not only matches with regard to country and date but also concerns terrorist violence. The reliability of this variable was very good (Holsti = 0.83, κ = 0.73, Gwet’s AC1 = 0.83; n = 38 articles).
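The three reliability coefficients reported here can be computed as follows for two coders and nominal codes. This is a generic sketch; the function name and the toy codings are hypothetical and do not reproduce the study's data. Holsti's index reduces to simple percent agreement when both coders rate the same units. Cohen's κ estimates chance agreement from the product of each coder's marginals and drops sharply when one category dominates, whereas Gwet's AC1 uses the averaged marginals and is less affected. This is one reason the two can diverge, as several of the values above do.

```python
from collections import Counter


def agreement_stats(coder_a, coder_b):
    """Percent agreement (Holsti's index for identical units), Cohen's kappa,
    and Gwet's AC1 for two coders and nominal categories."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same, non-empty set of units")
    n = len(coder_a)
    categories = sorted(set(coder_a) | set(coder_b))
    q = len(categories)
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Cohen: chance agreement from the product of the two coders' marginals
    pe_kappa = sum((freq_a[k] / n) * (freq_b[k] / n) for k in categories)
    # Gwet: chance agreement from the averaged marginals pi_k
    pi = [(freq_a[k] + freq_b[k]) / (2 * n) for k in categories]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (q - 1) if q > 1 else 0.0
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else 1.0
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return p_obs, kappa, ac1


# Toy example with a dominant category (1 = "consistent", 0 = "not consistent")
coder_a = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
coder_b = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
stats = agreement_stats(coder_a, coder_b)
print(tuple(round(v, 3) for v in stats))  # (0.8, 0.375, 0.706)
```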
To investigate recall, 120 events were randomly selected from the UCDP GED. All articles about terrorist events in the GDELT that matched the date, country, and topic were reviewed to see whether the event could be found in them. Websites that were no longer available were automatically excluded. In addition, only articles in which the GDELT algorithms had identified events with a certainty of 100% (confidence parameter, see above) were checked. This procedure yielded 484 articles that were subjected to a manual content analysis. For this purpose, the pages were translated into German with Google Translate and coded according to whether the event could be reliably, possibly, or not reliably identified. In addition, we recorded when pages could not be checked, for example, because the translation was incomprehensible. The identification of events is inherently difficult: There may be various versions of a story that differ significantly in terms of the number of victims, the actors, or even the date, despite referring to the same event. It can be assumed that the conflict parties, in particular, tell different stories about an event. Coding translated articles adds a further difficulty, and the source descriptions in the UCDP GED often contain only a few keywords. A reliability test of the codebook on 44 articles nevertheless yielded high intercoder reliability (Holsti = 0.95, κ = 0.78, Gwet’s AC1 = 0.94).
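One way to turn these article-level codes into an event-level recall estimate is sketched below. The aggregation rules are assumptions for illustration, not the authors' documented procedure. An event counts as recalled if any matched article identifies it (only "reliable" under a strict reading, "reliable" or "possible" under a lenient one). Events whose articles were all uncheckable are dropped. Events with no matching article count as misses. A Wilson score interval is added because a sample of 120 events leaves noticeable sampling uncertainty.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default: ~95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def event_recall(article_codes, lenient=False):
    """article_codes maps each sampled event to the codes of its matched
    articles: 'reliable', 'possible', 'not', or 'unchecked'.
    Returns (events found, events assessable); rules are assumptions."""
    hits = {"reliable", "possible"} if lenient else {"reliable"}
    found = checked = 0
    for codes in article_codes.values():
        if codes and all(c == "unchecked" for c in codes):
            continue  # nothing could be assessed for this event
        checked += 1
        found += any(c in hits for c in codes)
    return found, checked


codes = {
    "e1": ["reliable", "not"],
    "e2": ["possible"],
    "e3": [],  # no matching article in the filtered GDELT subset
    "e4": ["unchecked"],
}
print(event_recall(codes), event_recall(codes, lenient=True))  # (1, 3) (2, 3)
print(wilson_interval(*event_recall(codes, lenient=True)))
```

Reporting both the strict and the lenient figure makes the effect of the ambiguous "possibly identified" category explicit.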
Data Analysis: The Distributed Lag Model
The hypotheses relate two time-ordered series, news reports and deaths; therefore, a time-series analysis is required. Accordingly, the data encompass information on fatalities and news over a period of 366 days for 58 countries. The choice of modeling strategy depends on which data-generating processes (Pickup, 2015, pp. 5–8) are assumed.³ When modeling issue cycles, the shape of changes in the outcome variable is of particular interest. The shape of a relationship between time series can be estimated with distributed lag models (Almon, 1965; Gasparrini, 2011; Gasparrini et al., 2010).
Distributed lag models appear to have been used only rarely in communication science, although this form of modeling is well suited to the analysis of issue cycles. The method was developed in the context of economic studies and is now increasingly used in medical studies (Almon, 1965; Gasparrini, 2011, p. 2; Simons et al., 2016, p. 2). In this study, unlike in medical studies, death is the cause and not the consequence. As in every time series analysis, the basic idea is to take into account not only the influence of same-day data but also that of previous time points (i.e., lagged data). Such a time series can be viewed from two different perspectives:
“We can say that a specific exposure event produces effects on multiple future outcomes, or alternatively that a specific outcome is explained in terms of contributions by multiple exposure events in the past. The concept of lag can then be used to describe the relationship either forward (from a fixed exposure to future outcomes) or backward in time (from a fixed outcome to past exposures).” (Gasparrini, 2011, p. 2)
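The forward and backward perspectives in the quotation are the same calculation read in two directions. In the sketch below, the lag weights are hypothetical numbers chosen only for illustration. The backward view computes each day's outcome as a weighted sum of past exposures, and feeding in a single exposure pulse shows the forward view: the effect of one event spreading over the following days.

```python
def lagged_response(exposure, weights):
    """Backward view: outcome[t] = sum over l of weights[l] * exposure[t - l]."""
    out = []
    for t in range(len(exposure)):
        out.append(sum(w * exposure[t - l] for l, w in enumerate(weights) if t - l >= 0))
    return out


weights = [5.0, 2.0, 1.0]  # hypothetical lag weights for lags 0, 1, 2

# Forward view: one exposure on day 2 produces effects on days 2, 3, and 4.
print(lagged_response([0, 0, 1, 0, 0, 0], weights))  # [0.0, 0.0, 5.0, 2.0, 1.0, 0.0]

# Backward view: the outcome on day 2 collects 5 * 3 (same day) + 1 * 1 (two days earlier).
print(lagged_response([1, 0, 3, 0], weights)[2])  # 16.0
```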
In principle, a separate variable with the number of deaths could be entered into the regression model for each preceding day (a shifted time series) in order to take past days into account. However, these lagged copies of the same series are strongly correlated with one another, and each additional lag costs a degree of freedom. As a result, the individual lag coefficients are estimated imprecisely and can no longer be reliably interpreted. Distributed lag models solve this problem by entering into the model not the lagged data themselves but the parameters of a function that describes the effect of previous days (Almon, 1965, p. 179). This function directly reflects the shape of the issue cycle. By computing its predictions for selected conditions, the function can be visualized, and the development of the issue cycle can be read directly off the visualization.
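In standard notation (ours, not necessarily the authors'), let x_t be the number of deaths on day t, y_t the coverage, and L the maximum lag. The Almon restriction then replaces the L + 1 free lag coefficients with the d + 1 parameters of a polynomial in the lag index. For d = 3, this leaves four parameters, γ0 to γ3. The substitution shows why only the function's parameters enter the regression (with the convention 0^0 = 1 for j = 0):

```latex
% Unrestricted finite distributed lag of deaths x on coverage y, lags 0..L
y_t = \alpha + \sum_{l=0}^{L} \beta_l \, x_{t-l} + \varepsilon_t

% Almon restriction: the lag weights follow a polynomial of degree d in l
\beta_l = \sum_{j=0}^{d} \gamma_j \, l^{j}, \qquad l = 0, \dots, L

% Substituting gives a regression on d + 1 constructed variables
y_t = \alpha + \sum_{j=0}^{d} \gamma_j \, z_{j,t} + \varepsilon_t,
\qquad z_{j,t} = \sum_{l=0}^{L} l^{j} \, x_{t-l}
```

Once the γ_j are estimated, plugging them back into the second line yields the full lag curve β_0 to β_L, which is the curve that is plotted to display the issue cycle.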
A disadvantage of this method is that different assumptions about the form of the lag function are possible in the course of the analysis; thus, model selection becomes necessary (Box-Steffensmeier et al., 2014, p. 87). The analysis is therefore partly exploratory. To date, there is little experience with modeling issue cycles using distributed lag models, so no rules of thumb are available for this exploratory step. We chose a third-degree (cubic) polynomial for the lag function. This captures a wide range of shapes, including constant thematization, linear increase or decrease, and curves with up to two turning points (one local minimum and one local maximum).
One of the critical points of distributed lag models is choosing the lag length. Although studies of communication processes have a long tradition, especially in agenda-setting research, no systematic investigation of lag lengths and aggregation decisions has yet been conducted (Kohler, 2019, p. 17; Maurer, 2010, p. 29). Agenda-setting studies investigate the influence of media coverage on the salience and importance of issues on the public agenda. In general, agenda-setting effects last about 2 weeks (Gehrau, 2014, p. 265), but older studies recommend much longer optimal lag lengths from 5 to 7 weeks (Salwen, 1988, p. 106). The question of the lag between the media agenda and the public agenda was intensively explored in the 1980s and 1990s. Nevertheless, the findings and recommendations for choosing lag lengths are highly variable, ranging from a few weeks to several months (Kohler, 2019, p. 84).
The optimal time lag depends on the type of media; for example, television has a shorter optimal time lag than newspapers (Wanta & Hu, 1994, p. 235). For online media, agenda-setting effects are analyzed over time frames of 7 days (Guggenheim et al., 2015, p. 214). For a combination of different media, Wanta and Hu (1994) recommended a range between 1 and 4 weeks (p. 234). Many of these studies investigate general issues such as environmental issues or policy making. Studies that describe the course of news coverage on such general issues even aggregate to months (e.g., on climate change; Daw et al., 2013, p. 69) or years (e.g., on prescription drug financing; McComas & Shanahan, 1999; Nisbet & Huge, 2006). More sensitive topics, such as right-wing violence in Germany, produce shorter lags of a few days (Krause & Fretwurst, 2007, p. 194).
Another strand of research related to news processes does not correlate different agendas but instead identifies key events by suddenly increased coverage of specific topics in the stream of news. A week-based analysis of cumulative article counts makes sense in view of the weekly news production cycle (Boydstun et al., 2014, p. 519), but shorter sampling intervals, for example, 3 days, are also found (Geiß, 2011, p. 272). Research on the temporal relationship between nonmedia events and news coverage is even sparser. For long-lasting events such as wars, the length of coverage depends on the length of the event (Miltner & Waldherr, 2013, p. 282). Coverage of shorter-term right-wing extremist violence in Germany lasts only a few days (Gehrau, 2014, p. 254). More general studies conclude that issue cycles start with a trigger event and last approximately 3 (Wien & Elmelund-Præstekær, 2009, p. 183) or 4 weeks (Vasterman, 2005, p. 524).
Because neither theory nor previous research offers robust guidelines, we determined the lag length empirically. Following common practice in model selection, we chose the model with the lowest value of the Akaike information criterion (AIC). To this end, we estimated the model for maximum lag lengths between 7 (1 week) and 63 days (9 weeks) and evaluated the robustness of the model (Online Appendix Figures A3 and A4). This range reflects the time frames common in communication research, as discussed above. The AIC was lowest for a maximum lag length of 44 days.
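A minimal sketch of AIC-based lag selection, assuming simulated data, a Poisson GLM fitted by iteratively reweighted least squares in plain NumPy, and a cubic lag basis in the rescaled lag l / L. All names, the simulated lag structure (effects fading out after 20 days), and the rescaling are hypothetical choices for illustration; the study itself used dlnm in R. One detail matters for a fair comparison: every candidate is fitted on the same days, namely those after the longest candidate lag, because AIC values are only comparable across models estimated on identical observations.

```python
import math
import numpy as np

def lag_matrix(x, max_lag):
    """Row i is day t = max_lag + i; column l holds x[t - l]."""
    T = len(x)
    return np.column_stack([x[max_lag - l:T - l] for l in range(max_lag + 1)])

def lag_basis(max_lag, degree=3):
    """Cubic polynomial in the rescaled lag u = l / max_lag (keeps columns comparable)."""
    u = np.arange(max_lag + 1) / max_lag
    return np.vander(u, degree + 1, increasing=True)

def fit_poisson(X, y, n_iter=50, tol=1e-9):
    """Poisson GLM with log link via IRLS; column 0 of X must be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                 # working response
        XtW = X.T * mu                          # working weights equal mu under the log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    loglik = np.sum(y * np.log(mu) - mu) - sum(math.lgamma(v + 1) for v in y)
    return beta, loglik

rng = np.random.default_rng(1)
T, TRUE_LAG, MAX_CANDIDATE = 1000, 20, 63
x = rng.poisson(0.3, size=T).astype(float)                   # hypothetical daily fatalities
true_w = 0.3 * (1 - np.arange(TRUE_LAG + 1) / TRUE_LAG)      # hypothetical fading effect
eta = 0.5 + lag_matrix(x, TRUE_LAG) @ true_w                 # days TRUE_LAG .. T-1
y = rng.poisson(np.exp(eta))[MAX_CANDIDATE - TRUE_LAG:].astype(float)  # days 63 .. T-1

aic = {}
for L in range(7, MAX_CANDIDATE + 1):                         # 1 to 9 weeks
    X_lag = lag_matrix(x, L)[MAX_CANDIDATE - L:]              # same days 63 .. T-1 for every L
    design = np.column_stack([np.ones(len(y)), X_lag @ lag_basis(L)])
    _, loglik = fit_poisson(design, y)
    aic[L] = 2 * design.shape[1] - 2 * loglik

best_lag = min(aic, key=aic.get)
print("maximum lag with the lowest AIC:", best_lag)
```

Because every candidate here has the same number of parameters, AIC differences reduce to log-likelihood differences; the penalty term would only matter if, for instance, the polynomial degree varied as well.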
Distributed lag models can also implement a nonlinear relationship between the dependent and independent variables. For example, the response to a single death may be stronger than the response to another death that follows 100 deaths. On the other hand, 100 deaths in a single event clearly constitute an extreme occurrence and could lead to even more extreme consequences. We chose a second-degree polynomial for the relationship between fatalities and news coverage. This function covers constant or linear relationships, as assumed in our second hypothesis, as well as inverted u-shapes (i.e., a ceiling effect). Moreover, we did not use ordinary least squares (OLS) regression because we were working with count data. Following previous literature, we estimated a generalized linear model with Poisson-distributed error terms and a log link function (Fox, 2008, p. 387), which captures relative rates of change and abstracts from the specific circumstances of singular cases.
The model tests both hypotheses in one step and thus simultaneously accounts for the course of time and for event characteristics. The response variable was the total amount of reporting about terrorism on one day in a country. The number of fatalities due to terrorism was regarded as the most important independent variable; it represents both the event and the seriousness of the damage. In line with news values theory and to keep the model parsimonious, violence was treated only as an input. Since we assumed that the amount of reporting also depends on country-specific factors, the total amount of reporting in a country was included in the model as a control variable. Reasons for varying degrees of overall reporting may include international thematization effects, news factors such as the focus on elite nations, and the respective media landscape of a country. Apart from overall reporting, we did not control for autoregression of reporting about violence, for two reasons. First, when investigating issue cycles, the aggregated amount of news following trigger events is of interest, regardless of whether self-reinforcing processes are at play. Second, a parsimonious model better demonstrates how distributed lag models can be used in issue cycle research, which is the goal of the study.
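Read together with the note to Table 1, the specification can be written roughly as follows. This is our reading rather than a formula given by the authors: the subscripts c (country) and t (day), the symbols, and the unscaled polynomial bases are notation introduced here, and dlnm may rescale or center the bases internally.

```latex
y_{ct} \sim \operatorname{Poisson}(\mu_{ct}), \qquad
\log \mu_{ct} = \alpha + \sum_{j=1}^{2} \sum_{k=1}^{4} \theta_{jk}\, w_{jk,ct} + \beta\,(n_{ct} - \bar{n}),
\qquad
w_{jk,ct} = \sum_{\ell=0}^{44} x_{c,t-\ell}^{\,j}\, \ell^{\,k-1}
```

Here y is the number of terror reports, x the number of fatalities, and n the total number of reports in country c on day t; the eight coefficients theta correspond to the terms cb_v1_l1 to cb_v2_l4, and beta to the centered control totalnews.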
Results
Quality of the Automated Analysis Pipeline
Our analysis of issue cycles was based on a pipeline of automated processing steps. The news reports were scraped automatically and underwent automated content analysis provided by the GDELT. The entire pipeline of news processing was carried out without manual coding; not a single text was read by humans. To assess the validity of our results, we opened the black box and checked a subsample of the final records.
In fact, the data quality is quite low (see Online Appendix Figure A2). Many articles are no longer available or cannot be verified because their texts are incomprehensible, and only 57% of the articles examined clearly identify events of one-sided violence at the given location and date (precision = 57%; 95% CI [0.47, 0.67]). The main weakness lies in the combination of the different data points for an event. Considered on their own, country (85%), event date (77%), article date (95%), and the topic of violence (97%) are each plausible most of the time and reach acceptable reliability scores.
The situation is similar with regard to events. Nearly one third (31%) of the events in the UCDP GED cannot be found in the GDELT by our procedure. For a further 18%, the data are too vague to make an identification. Only about half of the events recorded in the UCDP GED can be clearly found in the GDELT (recall = 51%; 95% CI [0.41, 0.60]).
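Precision and recall are simple proportions over the validation subsample: precision = correct records / checked records, and recall = events found / events checked. The text does not state the subsample sizes or how the 95% intervals were computed, so the sketch below assumes the Wilson score interval and hypothetical counts of 57 out of 100 and 51 out of 100; its bounds need not reproduce the reported ones exactly.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical validation counts (not taken from the study)
checked_records, correct_records = 100, 57      # precision = correct / checked
checked_events, found_events = 100, 51          # recall = found / checked

precision = correct_records / checked_records
recall = found_events / checked_events
precision_ci = wilson_interval(correct_records, checked_records)
recall_ci = wilson_interval(found_events, checked_events)
print(f"precision = {precision:.2f}, 95% CI [{precision_ci[0]:.2f}, {precision_ci[1]:.2f}]")
print(f"recall = {recall:.2f}, 95% CI [{recall_ci[0]:.2f}, {recall_ci[1]:.2f}]")
```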
Issue Cycles in Terror News
Besides precision and recall, the hypotheses serve as a touchstone for automated methods, which entail a multitude of potentially faulty processing steps. Surprisingly, the data quality seems to be good enough for selection errors and inaccuracies to average out at an aggregate level. The analysis is based on data for the 19 African countries in which at least one death due to one-sided violence occurred in 2016. Summed across all countries, fatalities occurred on 353 country days, which equates to 5% of all country days. On these days, the average death count was 6.6 (SD = 11.5, min = 1, mdn = 2, max = 91); thus, on average, 0.34 fatalities occurred per country and day of the year (SD = 3.0, min = 0, mdn = 0, max = 91). The daily news coverage in each country comprised, on average, 441.8 reports (SD = 814.5, min = 0, mdn = 158, max = 14,651). Of these, on average, 4.1 news reports (SD = 19.2, min = 0, mdn = 0, max = 511) related to terrorist events. Overall, news about terrorist events in the analyzed countries accounted for approximately 9‰ of all reports. Given the skewed distributions of fatalities and news coverage, a typical day is better described by the median, which was zero both for deaths due to terrorism and for news about these fatalities. Thus, on a typical day in an African country, no terrorism and no news on terrorism occurred, while 158 reports on other topics were published.
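The derived figures in this paragraph follow from simple arithmetic on the reported values, and the short check below reproduces them. The only input not stated in the paragraph itself is the total of 2,342 fatalities from the note to Figure 1.

```python
COUNTRY_DAYS = 366 * 19                 # 6,954 country days
DAYS_WITH_DEATHS = 353
MEAN_DEATHS_ON_EVENT_DAYS = 6.6
TOTAL_DEATHS = 2342                     # from the note to Figure 1
MEAN_REPORTS, MEAN_TERROR_REPORTS = 441.8, 4.1

share_event_days = DAYS_WITH_DEATHS / COUNTRY_DAYS                   # about 5%
mean_deaths_per_day = TOTAL_DEATHS / COUNTRY_DAYS                    # about 0.34
mean_from_rounded = MEAN_DEATHS_ON_EVENT_DAYS * share_event_days     # about 0.34 as well
terror_share_per_mille = 1000 * MEAN_TERROR_REPORTS / MEAN_REPORTS   # about 9 per mille

print(round(share_event_days, 3), round(mean_deaths_per_day, 2),
      round(mean_from_rounded, 2), round(terror_share_per_mille, 1))
```

The ratio of the two daily means equals the ratio of the totals because both are averaged over the same 6,954 country days.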
When the time series of fatalities and news coverage are compared, the development of death counts correlates significantly with reporting about terrorism (r = .58, p < .001; n = 337; running 30-day averages) and runs much more in parallel with reporting about terrorism than with overall reporting (see Figure 1). A comparison of the totals at the country level also shows a significant correlation (see Figure 2; r = .60, p < .001; n = 58). The northern and southern countries of Africa were more peaceful than those in the middle of the continent, in terms of both fatalities and reporting.
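The running averages and the correlation can be sketched as follows, assuming simulated series in place of the study data and a window that only averages over complete 30-day spans (the text does not say how the window was aligned). A 30-day window over 366 days leaves 366 - 30 + 1 = 337 values, which matches the reported n.

```python
import numpy as np

def running_mean(series, window=30):
    """Mean over each complete window; returns len(series) - window + 1 values."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

rng = np.random.default_rng(42)
deaths = rng.poisson(0.3, size=366)                # hypothetical daily fatalities (all countries)
terror_news = rng.poisson(2.0 + 5.0 * deaths)      # hypothetical daily terror reports

deaths_smooth = running_mean(deaths)
news_smooth = running_mean(terror_news)
r = np.corrcoef(deaths_smooth, news_smooth)[0, 1]
print(len(deaths_smooth), round(r, 2))
```

One caveat worth keeping in mind: smoothing makes neighboring values strongly dependent, so the effective number of independent observations is well below 337.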

Figure 1. Time series of news reports and fatalities. Note. Running 30-day averages of total reporting (top; n = 4,783,327), reporting about terrorism (middle; n = 33,783), and fatalities resulting from terrorism (bottom; n = 2,342). Basis: 366 days in 2016 for 58 countries in Africa. Data sources: Global Database of Events, Language, and Tone; Geocoded Events Dataset of the Uppsala Conflict Data Program.

Figure 2. Number of deaths and reports related to terrorist violence. Note. Number of deaths (left; n = 2,342) and news reports (right; n = 33,577) related to terrorist violence in 2016 in Africa. Basis: 51 countries (excluding small islands). Data sources: Global Database of Events, Language, and Tone; Geocoded Events Dataset of the Uppsala Conflict Data Program.
The distributed lag model confirms the relationship between the two constructs and, in particular, the issue-cycle hypotheses. The explained variance⁴ of the entire model is moderate, with a proportion of explained deviance (D²) of 0.20. All coefficients and the reduction in deviance compared with the null model are highly significant (see Table 1). The raw coefficients are hardly interpretable because of the lag function and the link function of the generalized linear model. However, the model can be easily visualized by predicting the outcome for fixed conditions, in this case either a fixed number of deaths or a fixed lag in days. Thus, the model allows both hypotheses to be considered and visualized at the same time.
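For a Poisson GLM, the proportion of explained deviance is D² = 1 - D_model / D_null, where the null model contains only an intercept and the Poisson deviance is 2 Σ [y log(y / μ) - (y - μ)], with y log(y / μ) taken as 0 when y = 0. The sketch below computes it for toy counts and fitted values that are purely hypothetical; in practice, the fitted values would come from the estimated model.

```python
import numpy as np

def poisson_deviance(y, mu):
    """2 * sum(y * log(y / mu) - (y - mu)), with the log term set to 0 where y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.ones_like(y)
    np.divide(y, mu, out=ratio, where=y > 0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def explained_deviance(y, mu_model):
    """D^2 = 1 - model deviance / deviance of the intercept-only model."""
    mu_null = np.full(len(y), np.mean(y))
    return 1.0 - poisson_deviance(y, mu_model) / poisson_deviance(y, mu_null)

y = np.array([0, 1, 2, 5, 8, 0, 3])                          # hypothetical counts
mu_model = np.array([0.5, 1.2, 2.1, 4.5, 7.7, 0.4, 2.6])     # hypothetical fitted values
print(round(explained_deviance(y, mu_model), 3))
```

Unlike R² in OLS, D² is based on the likelihood of the count model, so it remains meaningful for skewed count data such as daily news volumes.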
Table 1. Distributed Lag Model Parameters.
Note. Parameters of the distributed lag model: lag = 44 days; dependent variable = terror news. Terms cb_v1_l1 to cb_v2_l4 are the coefficients (on the log scale) of the lag function (basis l1–l4 for intercept + three degrees) crossed with the fatalities function (basis v1 to v2 for two degrees). The parameter totalnews was centered at the mean; Poisson-distributed errors and log link function; basis: N = 6,954 country days; only countries with at least one fatality (366 days × 19 countries). Cross-basis variables were calculated in R 3.6 using the crossbasis() function of the package dlnm. See Gasparrini et al. (2010, p. 2227) for an explanation of the concept. AIC = Akaike information criterion; df = degrees of freedom.
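The cross-basis described in the note can be illustrated with a simplified re-implementation, which is not dlnm's actual code: each of the eight variables sums, over lags 0 to 44, an exposure term (fatalities or fatalities squared) multiplied by a lag term (a cubic polynomial with intercept in the rescaled lag). The function name `cross_basis` and the lag rescaling are assumptions; dlnm's own bases may be scaled or centered differently, so the numbers would not match its output one to one.

```python
import numpy as np

def cross_basis(x, max_lag=44, var_degree=2, lag_degree=3):
    """Simplified cross-basis (hypothetical, not dlnm's implementation).

    Column (j, k) on day t equals the sum over l = 0..max_lag of v_j(x[t - l]) * b_k(l),
    with v_j(x) = x**j (j = 1..var_degree) and b_k(l) = (l / max_lag)**k (k = 0..lag_degree).
    Columns are ordered v1_l1, ..., v1_l4, v2_l1, ..., v2_l4. The first max_lag rows are NaN.
    """
    x = np.asarray(x, dtype=float)
    V = np.column_stack([x**j for j in range(1, var_degree + 1)])   # T x J exposure basis
    u = np.arange(max_lag + 1) / max_lag
    B = np.vander(u, lag_degree + 1, increasing=True)               # (L + 1) x K lag basis
    W = np.full((len(x), V.shape[1] * B.shape[1]), np.nan)
    for t in range(max_lag, len(x)):
        past = V[t - np.arange(max_lag + 1)]                        # row l holds v(x[t - l])
        W[t] = (past.T @ B).ravel()                                 # J x K, flattened row-wise
    return W

fatalities = np.zeros(100)
fatalities[50] = 10                                                 # one hypothetical event, 10 deaths
W = cross_basis(fatalities)
print(W.shape, W[50, 0], W[94, 0], W[95, 0])
```

A single event thus stays "visible" in the cross-basis for 45 days (lags 0 to 44) and drops out afterwards, which is what lets the model trace the whole cycle after one trigger event.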
Focusing on the day on which the events occur, reporting about terrorism increases by 7% with one fatality (compared with none), which corresponds to an incidence rate ratio of 1.07 (see Figure 3, left). The increase grows with additional fatalities and reaches a maximum ratio of 4.69 at 45 deaths; that is, in such extreme situations news coverage rises to nearly five times its baseline level. Our assumption that the number of news reports increases with the number of deaths (H2) is thus confirmed within this range.
However, when even more fatalities occur, the incidence rate ratio decreases again, indicating a ceiling effect. The interpretation of the model becomes problematic in the upper range of fatalities because it suggests a decrease in reporting for extreme events with more than 90 fatalities. Since the data reach their limit here, the model should be interpreted with caution in this range. The average number of deaths, at 6.6, was much lower. The highest number of deaths on a single day in a single country was 91, on April 8, 2016, in Sudan. Further studies are needed to understand how best to model the relationship in these extreme regions. For now, we assume that the death count and the amount of reporting are related, although predictability is limited in exceptional cases.
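The incidence rate ratios follow from the log link: IRR(x) = exp(f(x)), where f is the fitted exposure-response function on the log scale at lag 0. As a back-of-the-envelope check, assume f is a pure quadratic, f(x) = b1·x + b2·x², with 0 fatalities as the reference. Fixing its peak at 45 deaths with IRR 4.69, as reported, determines both coefficients; these back-calculated values are hypothetical and are not the estimates in Table 1, which are on the cross-basis scale.

```python
import math

PEAK_DEATHS, PEAK_IRR = 45, 4.69           # reported maximum of the lag-0 curve

# Quadratic on the log scale: f(x) = b1*x + b2*x**2 has its vertex at x = -b1 / (2*b2).
# Setting the vertex at 45 gives b1 = -90*b2, hence f(45) = -2025*b2 = log(4.69).
b2 = -math.log(PEAK_IRR) / PEAK_DEATHS**2
b1 = -2 * b2 * PEAK_DEATHS

def irr(deaths):
    """Incidence rate ratio relative to a day without fatalities (hypothetical quadratic)."""
    return math.exp(b1 * deaths + b2 * deaths**2)

for deaths in (1, 10, 45, 90, 91):
    print(deaths, round(irr(deaths), 2))
```

Under this assumption the curve gives about 1.07 for one death and about 1.84 for ten deaths, in line with the 7% and 84% increases reported in the text, and it returns to 1 at 2 × 45 = 90 deaths. Any value above 90 therefore yields a ratio below 1, which is why the model implies less reporting for the most extreme events, exactly where the data run out.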
Fixing the number of fatalities at 10, which is close to the size of typical events, reveals the shape of the issue cycle. On the day of the event, the extent of terror news increases by a factor of 1.8 and then falls back to normal over the next 12 days (see Figure 3, right); this result is in line with H1. In addition, the model shows a second local maximum of reporting about terrorism after 38 days (incidence rate ratio of 1.3 for 10 fatalities).

Figure 3. Distributed lag model. Note. Incidence rate ratio = change in reporting (i.e., number of reports) about terrorism. Left: fixed lag of 0 days; right: fixed number of 10 fatalities. Green = zero change. Gray = 95% confidence interval. Basis: 6,954 = 366 days in 2016 × 19 African countries with at least one victim killed by terrorism. Data sources: Global Database of Events, Language, and Tone; Geocoded Events Dataset of the Uppsala Conflict Data Program.
While the general shape of the model is in line with our hypotheses, the estimates depend on the selected maximum lag. The longer the period under consideration, the more the first local minimum and the second local maximum shift forward. At the same time, the whole curve flattens out (Online Appendix Figure A4). Considering maximum lag lengths between 1 and 9 weeks, the incidence rate ratio for a fixed number of 10 fatalities ranges between 1.7 and 2.6 at the beginning and returns to normal within 1–3 weeks.
Discussion
The GDELT offers an impressive scope and sophisticated automated procedures. At the same time, its scope is a challenge. Preparing the data involves several selection steps. Of all events in the social world, probably only a fraction is covered by the GDELT, although the proportion can hardly be quantified. We further narrowed down the data set and used about 94 ppm (1 ppm = 0.0001%) of all records related to 2016 to arrive at substantive propositions about issue cycles.⁵
The procedure entails many potential sources of error and thus creates uncertainty about the validity of the conclusions. In fact, data quality is poor in terms of both precision and recall: about half of the data are misclassified. While the automated methods yield good decisions for single variables such as the date, country, or content of an event, combining these variables compounds their respective error rates. Even for a simple conception of events with only three criteria (date, country, and content), the probability of error easily reaches problematic levels. In our analysis, the correct localization of events was particularly error-prone, while the content was adequately detected. This points to a fundamental difficulty of automated procedures: while pattern recognition can be trained well for specific characteristics in given domains, a holistic classification is much more difficult to achieve.
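The compounding can be made concrete with a small calculation. Assuming, purely for illustration, that errors in country, event date, and topic occur independently, the share of records that are correct on all three criteria is the product of the component accuracies reported in the Results (85%, 77%, and 97%).

```python
from math import prod

# Component accuracies from the validation in the Results; independence is an assumption.
accuracy = {"country": 0.85, "event date": 0.77, "topic": 0.97}

joint_accuracy = prod(accuracy.values())
print(f"all three correct: {joint_accuracy:.1%}")
```

Even though each criterion is correct most of the time, the joint rate falls to roughly 63%, and every further criterion multiplies in another factor below 1.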
Another limitation causing uncertainty derives from the nature of secondary data. The operationalization of concepts is limited to the indicators contained in the database. For example, with regard to terrorist events, the aspect of spreading fear cannot be captured. When different databases are combined, their indicators also need to be comparable. In the field of terrorist events in Africa, using the UCDP criterion for one-sided violence in combination with the CAMEO category of assault makes sense, but alternative approaches, for example, differentiating between state and nonstate organizations, still have to be tested. All in all, although the data are already available, automated methods are still time-consuming. It is necessary to engage extensively with the details of the databases to know how to reduce and prepare the data for the particular application. Working with automated procedures essentially means coping with the uncertainty that results from the data-generating processes.
Nonetheless, our analysis reconfirms the prevailing state of research on news value theory and, more specifically, on the news factors of damage and unexpectedness. With regard to the number of deaths and the extent of violence, the first hypothesis about a sudden rise in news coverage is clearly confirmed. This corresponds to the findings of previous studies on the relationship between terrorism and the media (Brosius & Weimann, 1991). The second hypothesis, that news coverage increases in proportion to the number of fatalities, is confirmed in principle as well but needs to be differentiated for different conditions. A ceiling effect becomes apparent for extreme events with more than about 40 fatalities. Presumably, such events cross the boundary between one or several individual fatalities and mass killings when it comes to the amount of news coverage. Whether and how the content and style of news change is a question for further study.
With regard to more common situations in which there are about 10 fatalities, the issue cycle starts with an increase in news coverage about terrorism of 84%. The coverage returns to normal in the following 2 weeks. About 5 weeks later, a follow-up phase is indicated by the model. This second rise contrasts with previous studies. Multiple peaks are assumed for longer wars (Miltner & Waldherr, 2013, p. 275) but not for unexpected events such as terrorist attacks. While media hypes in general can comprise multiple waves, these multiple waves were observed over a shorter period in former studies (Wien & Elmelund-Præstekær, 2009, p. 196). Reasons for the follow-up phase are easy to imagine. After disruptive events, it takes some time for political authorities to manage the consequences. For example, we found reports in our evaluation sample about arresting people (i.e., follow-up events). Nevertheless, the time span under consideration matters and, therefore, statistical factors may be at play as well. The variability of the estimates indicates that the model provided is just a starting point for further analyses of shapes and of specific countries.
Thus, the use of the databases and a pipeline of automated methods for data collection and analysis proves appropriate in news research. This is surprising given the high error rates (as measured by precision and recall). Despite the large number of error-prone selection steps and the resulting noise in the data, using the GDELT can lead to meaningful and even puzzling findings. We assumed that terrorist events would be immediately followed by an increase in news coverage. We further assumed that the severity of an event, in terms of fatalities, would be related to news coverage. The two hypotheses implement basic principles of issue cycles by describing attention over time and the characteristics of anchor events. Using a distributed lag model, these principles have been confirmed in a way that yields a visualization of the issue cycle. Instead of relying on single issue cycles, this study has shown how to illuminate the general pattern and how to investigate the basic principles of issue cycles.
Conclusion
We proposed to model the course of issue dynamics on an abstract level while at the same time taking into account the characteristics of concrete events. Consequently, the model represents an averaged view, which can be used for comparisons such as between countries. The database used permits scaling up to enable comparison of the continents of the world and longer time spans, aiming at even more general models. Thus, the distributed lag model in combination with large databases overcomes the limitation of other studies, which often rest on the interpretation of singular events.
However, while lag models are common in medical or economic studies, the details are not yet well understood for issue cycles. Hence, the analysis strategy needs to be verified for longer periods of time, other regions, and other topics. For example, the consequences of deciding on specific function shapes or the role of significance tests in large-n studies have to be further investigated. Therefore, we suggest further studies to explore different modeling strategies and data using the replication data and scripts. Such studies should also examine qualitatively the way in which such aggregated cycles are encountered in individual countries and which concrete circumstances influence their manifestation.
Supplemental Material
Supplemental Material, Online_Supp._Material for Distilling Issue Cycles From Large Databases: A Time-Series Analysis of Terrorism and Media in Africa by Jakob Jünger and Chantal Gärtner in Social Science Computer Review
Footnotes
Data Availability
Replication data and scripts are available from the Open Science Framework repository: https://osf.io/qvrew. Nonaggregated data are publicly available at the project pages of GDELT and UCDP (see https://www.gdeltproject.org/data.html and
).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Software Information
Analysis was conducted in R 3.6 (R Core Team, 2018) using the package dlnm (Gasparrini, 2011). Figures were produced using the package ggplot2 (Wickham, 2009). Data were downloaded using Facepager (Jünger & Keyling, 2018).
Supplemental Material
The supplemental material for this article is available online.
Notes
References
