Abstract
This paper focuses on forecasting Military Action-type events by both state and non-state actors. We demonstrate that the dynamics of these types of events can be adequately described by a Hidden Markov Model (HMM) in which the hidden states correspond to different operational regimes of an actor and observations correspond to event frequency, and that the HMM effectively predicts events with different lead times. We also demonstrate that one can enrich statistical time series-based methods that work only on historical data by exploiting predictive signals in real-time external data streams. We demonstrate the superior predictive power of the proposed models by evaluating them on recent data capturing the activities of two groups (ISIS and the Syrian Arab Military), two countries (Syria and Iraq), and two cities (Aleppo and Mosul). We also present an approach to converting the predictions of the proposed models into real-world warnings.
1. Introduction
There has been significant recent interest in modeling and predicting violent events, such as Military Action by state actors, or terrorist attacks by non-state actors, 1 collectively referred to as MANSA events. There are certain characteristics of MANSA event dynamics that make them particularly challenging to model. For instance, it is well established that the dynamics of terrorist attacks have distinctly non-Poissonian characteristics. 2 In particular, the inter-event duration distribution (which is exponential for the Poisson process) has been shown to be heavy tailed and bursty for a number of different event types. Thus, we need different mechanisms for adequately reproducing and predicting activity patterns with highly non-Poissonian statistics.
In this paper we study the problem of forecasting violent events using historical and external open source data (e.g., news articles, blogs, and tweets), given that the historical data may not be up-to-date. Specifically, we focus on predicting violent events in the Middle East and North Africa (MENA) region over the 14-month period from 1 August 2016 to 30 September 2017. For evaluation we use manually curated violent events with rich features (such as actor, target, time, and location) as well as news article data collected over the MENA region by Arabia Inform. 3
It has been previously demonstrated that the bursty dynamics of terrorist activity can be well-captured by an appropriately designed d-state Hidden Markov Model (HMM), where a hidden state characterizes a specific operational mode of an organization. 2 The simplest setting of
Another important challenge for developing high-fidelity models for MANSA events is the availability of reliable, up-to-date historical data for generating real-time predictions. Indeed, recent studies, such as Raghavan et al. 2 and Porter et al., 4 try to predict the number of terrorist attacks at time

Overview of event forecasting without recent historical data. The proposed model takes historical data and indicators from external sources as inputs, and the model makes forecasts without recent historical data. GSR: gold standard report.
Here we address this shortcoming of existing models by proposing to use additional (surrogate) data sources to compensate for the lack of most recent event data. In particular, we focus on a scenario where in addition to historical event counts, we also have a time-stamped set of documents that contains potentially relevant information about events. Our results indicate that the signals extracted from streaming news sources can indeed lead to more accurate forecasts.
The rest of the paper is organized as follows: Section 2 discusses relevant research on event forecasting and Section 3 presents models that we exploit for forecasting MANSA events. Finally, we present an evaluation of our models in Section 4 and discuss our findings in Section 5.
2. Related work
There has been significant interest in modeling the activities of terrorist groups.1,5,6 Enders and Sandler7,8 proposed a threshold autoregressive (TAR) model to study both short- and long-run spurts in terrorist activities. Dugan et al.9,10 suggested group-based trajectory analysis techniques (Cox proportional hazards model or zero-inflated Poisson model) to identify regional terrorism trends with similar developmental paths. More recently, Porter et al. 4 suggested the two-component self-exciting hurdle model (SEHM) and Raghavan et al. 2 proposed a d-state HMM for describing the activity profile of terrorist groups.
Developing a precise model for the dynamic behavior of time series is a challenging problem and an essential one for the success of forecasting methods. Researchers have extensively studied and used time series analysis in many domains, such as finance, 11 epidemiology,12,13 geophysics, 14 and sociology. 15 A popular strategy for analyzing time series data is using classical autoregressive models, such as AR, ARMA, ARIMA, and ARIMAX.14,16,17 Autoregressive models are widely used in intrusion detection, detecting denial-of-service (DoS) attacks, and network monitoring. 18 These models assume that the underlying data-generating process is linear, that is, the value at a time point is a linear combination of the past values. However, real-world time series exhibit volatility and nonlinearity. A way to deal with the problem of volatility is to employ ARCH and GARCH, which are extensions of classical autoregressive models. 19
The generation of temporal features from text corpora for event forecasting is a diverse practice in the prediction of civil unrest,20–22 crime,23–25 political violence,1,26,27 and epidemics. 28 Using datasets of social media or news articles, domain-relevant information is typically extracted using expert-generated keywords as a starting point. Techniques that generate features from social media text using some form of supervised learning—keyword counting, manual document filtering, document classification, etc.—include work in spatio-temporal forecasting of civil unrest by Zhao et al.29,30 using keywords to filter relevant information from social media posts. In the same domain, Compton et al. 31 use keywords and geographical terms to filter Twitter posts, performing manual annotation on a small set of tweets in order to produce detailed forecasts of the demographic, spatial, and temporal information of civil unrest events.
Emphasizing the role news articles can play as precursors to particular events, Ning et al. 32 propose a nested, multi-task learning approach to discover news articles that have a high impact on future event outcomes—whether or not a protest event occurs in a certain city. In this model, documents are represented as bag-of-words or a similarly unsupervised method of representation.
Forecasting military events has gained attention in recent years, as datasets have become more available. Zammit-Mangion et al. 33 apply a point process model to conflict events from the Afghan War Diary. Yonamine 34 models military events in Afghanistan using the Autoregressive Fractionally Integrated Moving Average (ARFIMA) to predict time series of district-level event counts. For a comprehensive review of datasets and models for the prediction of political violence, we refer the reader to Schrodt et al. 1
3. Models
The intuition behind time series models is that when events are correlated in time, then given a sequence of events, one can learn patterns of past events that are useful for predicting future events. Time series prediction techniques use historical data about events (with optional surrogate data) to learn a model of the process that produced these events. The model can, in turn, be used to predict new events. In this section, we describe how we apply two types of models (the HMM and autoregressive models) to address the challenge of modeling events executed by military and non-state actors.
3.1. Hidden Markov Models
We first present the HMM-based approach for modeling terrorist activities. In our context, the key idea of the HMM is that the current number of events (e.g., terrorist activities) depends on the past history of events through K dominant hidden states, which represent different operational phases of the terrorist activities. For example, the hidden states of a two-state HMM correspond to “low-activity” and “high-activity” processes, as shown in Figure 2. The process transitions probabilistically between low-activity and high-activity states. While in a particular state, the process outputs some events according to a state-dependent probability distribution.

(a) Two-state Hidden Markov Model (HMM) for predicting terrorist activities. (b) Rolled-out HMM with hidden states and observations.
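The generative process described above can be illustrated with a short simulation. This is a minimal sketch, not the model fitted in this paper: the function name `simulate_two_state_hmm`, the Gaussian emissions clipped at zero, and all numeric values are assumptions chosen only for illustration.

```python
import random


def simulate_two_state_hmm(n_days, stay_prob, means, stds, seed=0):
    """Simulate daily event counts from a two-state HMM (0 = low, 1 = high).

    stay_prob[s] is the (assumed) probability of remaining in state s.
    """
    rng = random.Random(seed)
    state = 0  # assumption: the actor starts in the low-activity state
    states, counts = [], []
    for _ in range(n_days):
        # Emit a count from the state-dependent Gaussian, clipped at zero
        # and rounded so that it is a valid event count.
        draw = rng.gauss(means[state], stds[state])
        counts.append(max(0, round(draw)))
        states.append(state)
        # Stay in the current state with probability stay_prob[state],
        # otherwise switch to the other state.
        if rng.random() >= stay_prob[state]:
            state = 1 - state
    return states, counts


# Illustrative values only: sticky states with different activity levels.
states, counts = simulate_two_state_hmm(
    n_days=120, stay_prob=(0.9, 0.8), means=(1.0, 6.0), stds=(1.0, 2.0)
)
```

With sticky states, the simulated counts show alternating quiet and bursty stretches, which is the qualitative behavior the two-state model is meant to capture.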
Let
3.1.1. Estimating HMM Parameters
The unknown parameters of the proposed HMM are
3.1.2. Predicting with the HMM
To predict the number of new events, we adopt a sliding window approach. We train our model with data determined by a user-defined time window (e.g., four months), estimate the expected number of events for a gap period (e.g., one month), and forecast for the following month. The expected number of events at time t given
where
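One standard way to obtain an expected count several steps ahead from an HMM is to filter the hidden state with the forward recursion, propagate the resulting state distribution through the transition matrix, and average the state means. The sketch below assumes Gaussian emissions and already-estimated parameters; the function names and the argument layout are hypothetical and are not taken from this paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def propagate(belief, trans):
    """One step of the Markov chain: belief times the transition matrix."""
    k = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(k)) for j in range(k)]


def filtered_state_probs(counts, init, trans, means, stds):
    """Normalized forward recursion: P(state at the last step | all counts)."""
    belief = list(init)
    for t, y in enumerate(counts):
        if t > 0:
            belief = propagate(belief, trans)
        weighted = [
            b * gaussian_pdf(y, m, s) for b, m, s in zip(belief, means, stds)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief


def expected_count_ahead(belief, trans, means, horizon):
    """Expected count `horizon` steps after the last observed day."""
    for _ in range(horizon):
        belief = propagate(belief, trans)
    return sum(b * m for b, m in zip(belief, means))
```

For a gap of one month followed by a one-month test period, `horizon` would range over the days of the test month counted from the last training day. As the horizon grows, the forecast approaches the mean under the chain's stationary distribution.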
3.2. Autoregressive models
We propose RARE—regularized autoregression with exogenous variables—for predicting terrorist activities. RARE is based on the ARX model—the autoregressive model with external variables 36 —and Lasso. 37 The key idea is to use penalized regression (e.g., Lasso) for selecting autoregressive terms as well as covariates. The model is robust to the absence of historical data and requires limited history for prediction.
Let
Here
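To make the idea of penalized selection of autoregressive terms and covariates concrete, the following sketch builds a design matrix from lagged counts plus exogenous features and fits a Lasso by cyclic coordinate descent. It is an illustration under stated assumptions, not the RARE implementation: the helper names, the unpenalized intercept, the absence of feature standardization, and the objective scaling (squared error divided by 2n) are all choices made here.

```python
def build_design(counts, exog, n_lags):
    """Rows hold [y[t-1], ..., y[t-n_lags]] followed by exog[t]; target is y[t]."""
    rows, targets = [], []
    for t in range(n_lags, len(counts)):
        lagged = [counts[t - k] for k in range(1, n_lags + 1)]
        rows.append(lagged + list(exog[t]))
        targets.append(counts[t])
    return rows, targets


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(rows, targets, penalty, n_iter=200):
    """Minimize (1/2n) * squared error + penalty * L1 norm of the coefficients."""
    n, d = len(rows), len(rows[0])
    coefs = [0.0] * d
    intercept = sum(targets) / n
    for _ in range(n_iter):
        for j in range(d):
            rho, norm = 0.0, 0.0
            for x, y in zip(rows, targets):
                # Prediction with feature j left out (partial residual).
                partial = intercept + sum(c * v for c, v in zip(coefs, x))
                partial -= coefs[j] * x[j]
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            coefs[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
        # The intercept is not penalized: set it to the mean residual.
        intercept = sum(
            y - sum(c * v for c, v in zip(coefs, x)) for x, y in zip(rows, targets)
        ) / n
    return intercept, coefs
```

Coefficients that end up exactly zero correspond to lags or external signals that the penalty has dropped. In practice the penalty would be chosen by a search over candidate values, and features on very different scales would normally be standardized first.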
For comparison, we also apply the widely used ARIMA model for forecasting events. ARIMA stands for autoregressive integrated moving average (MA). The key idea is that the number of current events (
Here is a constant,
We use maximum likelihood estimation for learning the parameters; more specifically, parameters are optimized with the LBFGS method. 38
These models assume that
We also compare our proposed methods against a base rate model, which predicts the number of future events as the average number of past events over a time window W. Formally:
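As a minimal sketch, the base rate forecast can be written as a windowed average of the most recent observed counts. The function name `base_rate_forecast` and the convention that the window covers the last `window` entries of the available history are assumptions made for illustration.

```python
def base_rate_forecast(counts, window):
    """Average of the last `window` observed daily counts.

    The same value is used as the forecast for every day of the test period.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = counts[-window:]
    return sum(recent) / len(recent)


history = [1, 2, 3, 4, 5, 6]
forecast = base_rate_forecast(history, window=3)
```

When recent history is unavailable because of the reporting gap, `counts` would simply end at the last day of the training period.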
3.3. Evaluation of time series models
We use three error measures for quantitative evaluation of our time series models: (a) mean absolute error (MAE); (b) root mean squared error (RMSE); and (c) mean absolute scaled error (MASE). 39
These measures are defined as follows in terms of forecasting error,
MAE:
RMSE:
MASE:
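The three measures can be sketched in a few lines using their commonly used definitions. The scaling in `mase` below assumes the usual convention of dividing by the in-sample mean absolute error of the one-step naive forecast on the training series; whether this matches the exact scaling used in this paper is an assumption.

```python
import math


def mae(actual, predicted):
    """Mean absolute error of the forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error of the forecasts."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mase(actual, predicted, training):
    """MAE scaled by the in-sample MAE of the naive (previous value) forecast."""
    naive_mae = sum(
        abs(training[i] - training[i - 1]) for i in range(1, len(training))
    ) / (len(training) - 1)
    return mae(actual, predicted) / naive_mae
```

A MASE below one indicates that the forecasts are, on average, more accurate than the naive forecast was on the training data.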
4. Experiments
We now present a case study for the proposed models using data on military and non-state actor events in the MENA region. Our goals are to answer the following questions.
Can the HMM capture latent structures in activities executed by various actors?
How do the proposed models perform with MANSA events at actor, country, and city levels?
Which external signals are good indicators for forecasting MANSA events?
How can we generate warnings given predicted event counts? How does the model perform in terms of quantitative evaluation of generated warnings?
4.1. Datasets
The ground truth information about MANSA events, called the gold standard report (GSR), is exclusively provided by the Center for Analytics at New Haven. The GSR is a list of MANSA events manually created by domain experts. Each event in the dataset has 22 different attributes: actor, actor status, approximate location, casualties, country, earliest reported date, encoding comment, event date, event id, event subtype, event type, first reported link, gold standard source link, latitude, longitude, news source, other links, revision date, state, target, target name, and target status. While much care was taken to address the attribution and duplication problems in the manual event documentation step, we also remove any duplicates in preprocessing using these attributes.
For evaluation we use ground truth time series of daily event counts based on manually extracted, structured reports on events, at actor, city, and country level (see Table 1). We use two actors—ISIS and the Syrian Arab Army, two countries—Syria and Iraq, and two cities—Aleppo and Mosul. In addition, we use surrogate data, which is generated from Arabic news articles originating from MENA countries.
Aggregates for countries and top-eight cities for MANSA events in the time period from August 2016 to October 2017.
In order to generate potentially predictive signals, we apply a temporal topic-based feature extraction approach to Arabia Inform news articles, 3 a corpus of news documents originating from MENA countries (see Figure 3), over a time span co-occurring with our GSR event time series. We consider the subset of the corpus that has at least one of our countries of interest (Iraq, Syria, Saudi Arabia, Lebanon, Yemen, Jordan) “tagged” as part of the meta-data provided from each document’s URL. As the corpus consists mostly of articles published in Egypt, and thus the majority of articles have “Egypt” as a tagged location, we exclude articles about places in Egypt. This largely Arabic corpus has approximately 20,000 documents per day, including a variety of topics spanning entertainment, politics, reporting articles, and general purpose news items.

Monthly aggregates of Arabia Inform news articles in our corpus, after filtering for documents that have one or more of the following countries “tagged” in the article meta-data: Lebanon, Jordan, Yemen, Saudi Arabia, Iraq, Syria.
4.1.1. Topic-based temporal feature generation
To learn latent shifts in the news corpus that carry information about our events of interest, we chose two topic modeling techniques. First, we train Latent Dirichlet Allocation (LDA) models with 100, 150, and 200 topics on the whole corpus and aggregate (see below for details) the posterior distributions of each topic over a given day’s documents. Second, we pre-train an LDA model on a set of 10,000 Arabic news articles reporting MANSA events, which were used to generate the ground truth event dataset, and repeat the temporal feature generation on the entire Arabia Inform corpus, 40 inferring topic posterior distributions over the same corpus. We use the Mallet LDA package, 41 with light stemming and stop-word removal as preprocessing. We found that the Mallet LDA package produces more coherent and consistent topics than the Corex 42 and Gensim LDA packages. 43 We use light stemming because the news articles in our dataset are predominantly in Arabic (90%), Arabic is a highly inflected language, 44 and the development of a proper Arabic lemmatizer is still an active area of research. In our experiments, the first method (not pre-trained) did not yield significant results, and thus we present only our findings with the pre-trained topic model.
In order to generate daily features given a trained topic model and a set of time-stamped documents, we denote
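One simple way to turn per-document topic posteriors into a daily time series is to average them over all documents published on the same day. The sketch below shows that aggregation only as an illustration; the use of the mean, the function name `daily_topic_features`, and the `(day, posterior)` input layout are assumptions rather than the exact scheme used in this paper.

```python
from collections import defaultdict


def daily_topic_features(documents, n_topics):
    """Average topic posteriors over the documents of each day.

    documents: iterable of (day, posterior) pairs, where `day` is any
    sortable key (e.g., an ISO date string) and `posterior` is a list of
    n_topics probabilities inferred by a trained topic model.
    """
    sums = defaultdict(lambda: [0.0] * n_topics)
    doc_counts = defaultdict(int)
    for day, posterior in documents:
        for k, prob in enumerate(posterior):
            sums[day][k] += prob
        doc_counts[day] += 1
    return {
        day: [total / doc_counts[day] for total in sums[day]]
        for day in sorted(sums)
    }


docs = [
    ("2017-01-01", [0.2, 0.8]),
    ("2017-01-01", [0.6, 0.4]),
    ("2017-01-02", [1.0, 0.0]),
]
features = daily_topic_features(docs, n_topics=2)
```

Each topic then yields one daily series that can be offered to a forecasting model as a candidate external signal.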
4.2. Structures in actors’ activities using a two-state Hidden Markov Model
We hypothesize that the number of activities performed by an actor might have hidden structures (e.g., high-activity and low-activity periods), which may not be well-captured using a simple counting process, such as the Poisson process. We employ the HMM for capturing hidden structures in activities by various actors in our dataset, such as ISIS, the Syrian Arab Military, the Iraqi Military, and the Russian Military. Figure 4 illustrates the result of the HMM for ISIS with various settings. Specifically, the datasets contain daily counts of terrorist events by ISIS in Iraq and Syria. We used the initial 80% of the data for training, using the Baum–Welch algorithm to estimate the model parameters. Here we report results using the Gaussian observation model, so that the total number of parameters is six: two transition probabilities, from the L state to the H state and vice versa, and four parameters for the observation model (two for each hidden state).

(a) The two-state Hidden Markov Model (HMM) trained on ISIS data. (b) The predictive performance of the method, measured in mean squared error, for different observation models and different lead times.
Figure 4 depicts the model learned via the Baum–Welch method. We observe that both hidden states have significant inertia; that is, the actor is more likely to stay in the same hidden state than to transition to a new one. Perhaps more importantly, the rate of events (as characterized by the mean of the Gaussian model) differs significantly between the states: The average number of attacks per day is
Next, we focus on the task of reconstructing the hidden trajectory of the actor. Toward that goal, we run the Viterbi algorithm, which returns a single (maximum a posteriori) hidden state sequence that best explains the observed counts. Figure 5 shows the event count together with the reconstructed hidden dynamics. Remarkably, even this simple two-state model is able to capture the spurts in the activity.

Reconstructed hidden state sequence (dotted red lines) together with the observed count sequence (blue) for three different time windows. The trajectory given by the dotted red line switches between the high-activity (upper line) and low-activity (lower line) states. (Color online only.)
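The reconstruction of the hidden trajectory can be sketched with a log-space Viterbi recursion for an HMM with Gaussian emissions. The parameter values in the usage example are illustrative assumptions, not the values estimated from the ISIS data, and the sketch assumes that all initial and transition probabilities are strictly positive.

```python
import math


def log_gaussian(x, mean, std):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def viterbi(counts, init, trans, means, stds):
    """Most likely hidden state sequence given the observed counts."""
    k = len(init)
    score = [
        math.log(init[j]) + log_gaussian(counts[0], means[j], stds[j])
        for j in range(k)
    ]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for y in counts[1:]:
        pointers, new_score = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + math.log(trans[i][j]))
            pointers.append(best_i)
            new_score.append(
                score[best_i]
                + math.log(trans[best_i][j])
                + log_gaussian(y, means[j], stds[j])
            )
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


path = viterbi(
    counts=[0, 1, 0, 7, 8, 6, 1, 0],
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    means=(1.0, 6.0),
    stds=(1.0, 2.0),
)
```

In this toy sequence the decoded path stays in the low-activity state for the first three days, switches to the high-activity state during the burst, and returns to the low-activity state afterwards.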
4.3. Predictive performance of HMM, ARIMA, and RARE models
The GSR represents the occurrence of an event on a given day at a specific location by a specific actor. The GSR is typically lagged (e.g., by a month), which poses a challenge for the prediction algorithm. In our evaluation settings, we assume a gap of a month between the last day of the training period and the first day of the testing period. We keep the test period to be a month, as the GSR is updated each month. For the RARE model, we use topic-based temporal features as external signals (see Section 4.1). Before applying temporal features, we first align them using correlation analysis with the GSR: we determine the lag where the maximum correlation occurred between a temporal feature and the GSR, and use the lag for alignment. We tested the RARE model with 50 and 100 external features and
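The lag-based alignment step can be sketched as a search over candidate lags for the one that maximizes the Pearson correlation between a shifted feature series and the event counts. The function names, the restriction to non-negative lags, the use of the signed (rather than absolute) correlation, and the requirement that both series have the same length are assumptions of this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(feature, counts, max_lag):
    """Lag (in days) at which feature[t - lag] correlates most with counts[t]."""
    best, best_corr = 0, -2.0
    for lag in range(max_lag + 1):
        # Pair the feature value from `lag` days earlier with today's count.
        shifted_feature = feature[: len(feature) - lag]
        later_counts = counts[lag:]
        corr = pearson(shifted_feature, later_counts)
        if corr > best_corr:
            best, best_corr = lag, corr
    return best, best_corr


feature = [0, 1, 0, 5, 0, 2, 0, 0, 7, 1, 0, 3, 0, 0, 4, 0, 1, 0, 6, 0]
# Toy counts that repeat the feature three days later.
counts = [0, 0, 0] + feature[:-3]
lag, corr = best_lag(feature, counts, max_lag=7)
```

Once the lag is known, the feature is shifted by that many days before being passed to the forecasting model.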
Figure 6(a) and Table 2 illustrate the models’ predictions and performance measures for ISIS activities over the month of January 2017, respectively. Here the models are trained with the data from 1 August 2016 to 30 November 2016, and we assume there is no GSR for the month of December 2016. Although the HMM performed better than the RARE model for this period in terms of performance measures, the RARE model captures the trends better than other models. Figure 6(b) and Table 3 present the models’ predictions and performance measures for Syrian Arab Military activities over the month of March 2017, respectively. Here we assume the absence of a GSR in February 2017, and the models are trained with the data from 1 August 2016 to 31 January 2017. The RARE model clearly outperforms the other models in terms of capturing the trends and performance measures.

Forecasting ISIS and Syrian Arab Military activities using the Hidden Markov Model (HMM), autoregressive integrated moving average (ARIMA) model, regularized autoregression with exogenous variables (RARE) model, and a base rate model: (a) models forecast ISIS activities over the period (January 2017) wherein the models are trained with the data from 1 August 2016 to 30 November 2016, and the month of December 2016 is considered as the gap period; (b) forecasting of Syrian Arab Military activities over the period (March 2017) wherein the models are trained with the data from 1 August 2016 to 31 January 2017, and the month of February 2017 is considered as the gap period. For both settings, the HMM with two hidden states and Gaussian emission probability is used, and the ARIMA and RARE models are identified using a grid search over parameters.
Forecasting of ISIS activities using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).
Forecasting of Syrian Arab Military activities using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).
We also evaluate our models for country-level event activities. We use six months of the GSR as training data starting from 1 August 2016, and use the model for predicting over a month, where there is a gap of a month between training and forecasting spans. We then shift the training period by a month and repeat the forecasting up to the month of September 2017. Tables 4 and 5 show the comparison between methods with different average metrics over seven months. We observed that the HMM and RARE model perform better compared to the other models. Figures 7(a) and (b) illustrate the models’ predictions over activities in Syria and Iraq for the months of August 2017 and May 2017, respectively.
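The rolling evaluation scheme described above (six months of training, a one-month gap, a one-month test period, shifted forward one month at a time) can be sketched as a generator of date boundaries. The function names and the half-open interval convention (end dates are exclusive) are assumptions of this illustration.

```python
from datetime import date


def add_months(first_of_month, n_months):
    """Return the first day of the month n_months after first_of_month."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + n_months
    return date(index // 12, index % 12 + 1, 1)


def rolling_splits(start, train_months, gap_months, last_test_month):
    """List of (train_start, train_end, test_start, test_end); ends exclusive."""
    splits = []
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        test_start = add_months(train_end, gap_months)
        test_end = add_months(test_start, 1)
        if test_start > last_test_month:
            break
        splits.append((train_start, train_end, test_start, test_end))
        train_start = add_months(train_start, 1)
    return splits


splits = rolling_splits(
    start=date(2016, 8, 1),
    train_months=6,
    gap_months=1,
    last_test_month=date(2017, 9, 1),
)
```

With these settings the first test month is March 2017 and the last is September 2017, giving the seven evaluation months over which the metrics are averaged.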
Forecasting of MANSA events in Syria using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics with average over seven months: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).
Forecasting of MANSA events in Iraq using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics with average over seven months: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).

Forecasting events in Syria and Iraq using the Hidden Markov Model (HMM), autoregressive integrated moving average (ARIMA) model, regularized autoregression with exogenous variables (RARE) model, and a base rate model. (a), (b) Models forecast activities in Syria and Iraq over the period August 2017 and May 2017, respectively. For both settings, the HMM with two hidden states and Gaussian emission probability is used, and the ARIMA and RARE models are identified using a grid search over parameters.
Finally, we also evaluate our models against city-level events with two cities—Mosul and Aleppo. Similar to country-level event data, we use six months of the GSR as training data starting from 1 August 2016, and use the model for predicting over a month, where there is a gap of a month between training and forecasting spans. We then shift the training period by a month and repeat the forecasting up to the month of September 2017. Tables 6 and 7 show the comparison between methods over different metrics across seven months. We observed that the HMM and RARE model perform better compared to other models for Aleppo, but the ARIMA model outperformed others for Mosul. The reason could be the sparsity in the city-level events. Figures 8(a) and (b) illustrate the models’ predictions over activities in Aleppo and Mosul for the months of August 2017 and May 2017, respectively.
Forecasting of MANSA events in Aleppo using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics with average over seven months: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).
Forecasting of MANSA events in Mosul using the Hidden Markov Model (HMM) and autoregressive models (autoregressive integrated moving average (ARIMA) and regularized autoregression with exogenous variables (RARE)). Methods are compared in terms of different performance metrics with average over seven months: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE).

Forecasting events in Aleppo and Mosul using the Hidden Markov Model (HMM), autoregressive integrated moving average (ARIMA) model, regularized autoregression with exogenous variables (RARE) model, and a base rate model. (a), (b) Models forecast activities in Aleppo and Mosul over the period May 2017. For both settings, the HMM with two hidden states and Gaussian emission probability is used, and the ARIMA and RARE models are identified using a grid search over parameters.
4.4. Important predictors in forecasting MANSA events
The RARE model identifies a subset of autoregressive variables and external variables that are predictive of the target, which is the number of events occurring each day. We analyze the topics selected by the algorithm. As an example, Tables 8 and 9 show some of the features identified by the RARE model with ISIS and Syrian Arab Military activities, respectively. We observe that many of the identified topics are meaningful and relevant to the events associated with ISIS and the Syrian Arab Military.
Representative features selected by the regularized autoregression with exogenous variables model with a training set for ISIS activities from 1 August 2016 to 30 November 2016.
Representative features selected by the regularized autoregression with exogenous variables model with a training set for Syrian Arab Military activities from 1 August 2016 to 31 January 2017.
4.5. Warning generation
The proposed models essentially forecast event counts, but an intelligence analyst may need more details about the events for better understanding and dissemination. We propose a two-phase algorithm for generating real-world warnings. We convert the event counts predicted by each model into meaningful warnings by sampling each event detail field from the empirical distribution of that field. To assess the efficacy of this approach, we generate warnings at the country level (Syria and Iraq) for two different types of events (military action and non-state actor events) over the months from March to September 2017. For each event count, we run six trials, generating six different sets of warnings. We match the generated warnings against GSR events using the Hungarian matching 45 algorithm as well as other numerical and string matching algorithms. If a warning occurs within seven days of the corresponding true event, then the warning is included for further analysis in terms of various metrics. Figures 9–11 illustrate the evaluation of warnings generated by the base rate model, the HMM, and the RARE model in terms of precision, recall, and quality score. Each box in the plots represents 50% of the data, and each vertical red line denotes the median. We can see that the RARE model performs better than the others in terms of precision, and performs slightly better than the base rate model in terms of warning quality.
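The sampling step of warning generation can be illustrated as follows. This sketch draws each field independently from the values observed in historical events, which amounts to sampling from that field's empirical distribution, and checks the seven-day criterion for a warning and event pair. The field names, function names, and toy events are hypothetical, and the Hungarian matching step is not shown.

```python
import random
from datetime import date


def generate_warnings(predicted_count, historical_events, fields, seed=0):
    """Create one warning per predicted event by sampling field values.

    Choosing uniformly among the observed values of a field (with
    repetitions kept) draws from that field's empirical distribution.
    """
    rng = random.Random(seed)
    warnings = []
    for _ in range(predicted_count):
        warning = {
            field: rng.choice([event[field] for event in historical_events])
            for field in fields
        }
        warnings.append(warning)
    return warnings


def within_window(warning_day, event_day, max_days=7):
    """True if the warning date is within max_days of the event date."""
    return abs((warning_day - event_day).days) <= max_days


# Hypothetical historical events with two detail fields.
history = [
    {"actor": "ISIS", "city": "Mosul"},
    {"actor": "ISIS", "city": "Aleppo"},
    {"actor": "Syrian Arab Military", "city": "Aleppo"},
]
warnings = generate_warnings(3, history, fields=("actor", "city"))
```

Sampling fields independently ignores dependencies between them (for example, between actor and location); sampling whole historical records instead would preserve such dependencies at the cost of less variety.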

Evaluation of warnings generated using the base rate model for two types of events in Syria and Iraq from 1 March 2017 to 30 September 2017. (Color online only.)

Evaluation of warnings generated using the Hidden Markov Model for two types of events in Syria and Iraq from 1 March 2017 to 30 September 2017. (Color online only.)

Evaluation of warnings generated using the regularized autoregression with exogenous variables model for two types of events in Syria and Iraq from 1 March 2017 to 30 September 2017. (Color online only.)
5. Discussion
We explore state-based (HMM) and autoregressive (ARIMA and RARE) models for generating event forecasts with external indicators. We observe that both the HMM and the RARE model perform quite well with a reasonable amount of data (actor- and country-level events), while performance deteriorates when events are sparse. Low event density and rare event types pose a challenge to our proposed models. Some of the countries (e.g., Saudi Arabia and Yemen) and most of the cities in our dataset have low event density, for which the HMM and the autoregressive models seem inadequate. In addition, some event types are rare, such as some epidemic diseases that occur far less often than flu epidemics. For these rare events, the HMM and the autoregressive models may not work well. To address these problems, we need predictive models that take the detailed event context in external sources into account.
In this study we model each actor independently of the others, although actors interact with each other in a real-world scenario. It would be interesting to pursue modeling actors with more than two operational states as well as interactions between multiple actors.
This study explores one external source (Arabia Inform news articles) for event forecasting. Our methods can be extended to deal with signals from additional sources, such as Twitter and blogs, which we plan to explore in the future. It is also possible to develop models that consider each of the data sources separately and that select subsets of external signals from each group for prediction. In addition to event count prediction models, we plan to explore models that not only forecast events but also identify the precursors to events in external sources.
Footnotes
Acknowledgements
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Funding
This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA).
